Roadmap
What we've shipped, what we're building now, and what we plan to build next.
Last Shipped
Folders for Prompt Organization
2/4/2026
Playground
Create folders and subfolders to organize prompts. Drag prompts between folders and search across everything.
Navigation Links from Traces to App/Environment/Variant
1/28/2026
Observability
Clickable links in observability traces to navigate to the application, variant, version, and environment used in each trace. Jump directly to the configuration that generated a specific trace.
Date Range Filtering in Metrics Dashboard
1/9/2026
Observability
Filter traces by date range in the metrics dashboard. View metrics for the last 6 hours, 24 hours, 7 days, or 30 days.
Test Set Versioning and New UI
1/20/2026
Evaluation
Track test set changes with versioning. Every edit creates a new version. Evaluations link to specific versions for reliable comparisons. Plus a rebuilt UI that scales to 100K+ rows with inline editing for chat messages and JSON.
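The versioning model described above can be illustrated with a small sketch: each edit appends an immutable snapshot, and an evaluation that pinned an earlier version keeps seeing exactly the rows it was run against. This is illustrative only; the class and field names are assumptions, not Agenta's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of append-only test set versioning: every edit
# produces a new immutable version, and evaluations pin a version number.
@dataclass
class TestSet:
    name: str
    versions: list = field(default_factory=list)  # list of row snapshots

    def commit(self, rows):
        """Each edit appends a new version; older versions stay untouched."""
        self.versions.append(list(rows))
        return len(self.versions)  # 1-based version number

    def get(self, version):
        return self.versions[version - 1]

ts = TestSet("qa-pairs")
v1 = ts.commit([{"q": "2+2?", "a": "4"}])
v2 = ts.commit([{"q": "2+2?", "a": "4"}, {"q": "3+3?", "a": "6"}])
# An evaluation that pinned v1 still sees the original single row.
assert ts.get(v1) == [{"q": "2+2?", "a": "4"}]
assert len(ts.get(v2)) == 2
```

Pinning a version number rather than "whatever the test set contains today" is what makes evaluation comparisons reproducible.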
Chat Sessions in Observability
1/9/2026
Observability
Track multi-turn conversations with session grouping. All traces with the same session ID are automatically grouped together, showing complete conversation flows with cost, latency, and token metrics per session.
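The grouping described above can be sketched in a few lines: traces sharing a session ID are rolled up into per-session totals. The field names (`session_id`, `cost`, `latency_ms`, `tokens`) are assumptions chosen for illustration, not the actual trace schema.

```python
from collections import defaultdict

# Illustrative only: group flat trace records by session_id and aggregate
# cost, latency, and token metrics per session (field names are assumed).
traces = [
    {"session_id": "s1", "cost": 0.002, "latency_ms": 410, "tokens": 120},
    {"session_id": "s1", "cost": 0.003, "latency_ms": 380, "tokens": 200},
    {"session_id": "s2", "cost": 0.001, "latency_ms": 250, "tokens": 80},
]

sessions = defaultdict(lambda: {"cost": 0.0, "latency_ms": 0, "tokens": 0, "turns": 0})
for t in traces:
    s = sessions[t["session_id"]]
    s["cost"] += t["cost"]
    s["latency_ms"] += t["latency_ms"]
    s["tokens"] += t["tokens"]
    s["turns"] += 1

# Session "s1" now shows the full two-turn conversation's totals.
assert sessions["s1"]["turns"] == 2
assert sessions["s1"]["tokens"] == 320
```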
PDF Support in the Playground
12/17/2025
Playground, Evaluation, Observability
Attach PDF documents to chat messages in the playground. Upload files, provide URLs, or use file IDs from provider APIs. Works with OpenAI, Gemini, and Claude models. PDFs are supported in evaluations and observability traces.
Provider Built-in Tools in the Playground
12/11/2025
Playground
Use provider built-in tools like web search, code execution, and file search directly in the Playground. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with prompts and automatically used via the LLM gateway.
In progress
Improving Navigation Between Test Sets in the Playground
Playground
We are making it easier to navigate and work with large test sets in the playground.
Prompt Snippets
Playground
Create reusable prompt snippets that can be referenced across multiple prompts. Reference specific versions or always use the latest version to maintain consistency across prompt variants.
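One way to picture snippet references is a template placeholder that resolves to either a pinned version or the latest one. The `{{snippet:name@version}}` syntax below is invented for this sketch and is not Agenta's actual reference format.

```python
import re

# Hypothetical snippet store: name -> {version number -> text}.
SNIPPETS = {
    "tone": {1: "Answer formally.", 2: "Answer concisely and formally."},
}

def resolve(template: str) -> str:
    """Replace {{snippet:name@version}} placeholders; "latest" picks the newest."""
    def sub(match):
        name, version = match.group(1), match.group(2)
        versions = SNIPPETS[name]
        v = max(versions) if version == "latest" else int(version)
        return versions[v]
    return re.sub(r"\{\{snippet:(\w+)@(\w+)\}\}", sub, template)

# A pinned reference stays stable; "latest" follows new snippet versions.
assert resolve("{{snippet:tone@1}} Q: ...") == "Answer formally. Q: ..."
assert resolve("{{snippet:tone@latest}}") == "Answer concisely and formally."
```

Pinning keeps a prompt variant reproducible, while "latest" propagates a snippet edit to every prompt that references it.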
AI-Powered Prompt Refinement in the Playground
Playground
Analyze prompts and suggest improvements based on best practices. Identify issues, propose refined versions, and allow users to accept, modify, or reject suggestions.
Open Observability Spans Directly in the Playground
Playground, Observability
Add a button in observability to open any chat span directly in the playground. Creates a stateless playground session pre-filled with the exact prompt, configuration, and inputs for immediate iteration.
Running Evaluators in the Playground
Playground, Evaluation
Run evaluators directly in the playground to get immediate quality feedback on prompt changes. Evaluate outputs inline as you iterate on prompts. Scores, pass/fail results, and evaluator reasoning appear right next to the LLM response.
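The score / pass-fail / reasoning triple described above maps naturally onto a small evaluator function. This shape is an assumption for illustration, not Agenta's actual evaluator interface.

```python
# Illustrative evaluator: compares one LLM output against an expected value
# and returns a score, a pass/fail verdict, and a short reasoning string.
def exact_match_evaluator(output: str, expected: str, threshold: float = 1.0):
    score = 1.0 if output.strip() == expected.strip() else 0.0
    return {
        "score": score,
        "passed": score >= threshold,
        "reason": "exact match" if score else f"expected {expected!r}, got {output!r}",
    }

result = exact_match_evaluator("4", "4")
assert result == {"score": 1.0, "passed": True, "reason": "exact match"}

failed = exact_match_evaluator("5", "4")
assert failed["passed"] is False
```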
Enterprise SSO
Security
Single sign-on (SSO) support for enterprise customers. Integrate your identity provider using SAML or OIDC for secure, centralized authentication. Log in with existing corporate credentials and control user provisioning and access.
US Region for Agenta Cloud
Misc
Agenta Cloud is adding a US-based region. Run your projects with all data stored within the United States. This helps meet data residency requirements that mandate data remain within a specific geography.
Creating Agents from the UI
Playground
Build and configure AI agents directly from the Agenta UI. Define agent workflows, select tools, and set up orchestration logic without writing code. Test and iterate on agent behavior in the playground, then deploy to production with versioning and observability built in.
Webhooks for Deployment Linked to CI
Integration
Trigger CI/CD pipelines automatically when you deploy a prompt version. Connect Agenta deployments to your existing CI workflows so that deploying a new version kicks off automated tests, approval gates, or release processes.
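A common pattern for deployment webhooks is an HMAC-signed payload that the CI side verifies before starting a pipeline. The payload fields and signing scheme below are assumptions for illustration, not Agenta's actual webhook contract.

```python
import hashlib
import hmac
import json

# Shared secret between the webhook sender and the CI receiver (assumed setup).
SECRET = b"shared-webhook-secret"

def sign(payload: dict) -> tuple[bytes, str]:
    """Serialize the deployment event and compute an HMAC-SHA256 signature."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body, signature

def verify(body: bytes, signature: str) -> bool:
    """CI side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

body, sig = sign({"event": "deployment", "environment": "production", "version": 7})
assert verify(body, sig)          # authentic event: start the pipeline
assert not verify(body, "0" * 64) # forged signature: reject
```

Verifying the signature before acting prevents an attacker who discovers the webhook URL from triggering releases.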
Planned
Prompt Caching in the SDK
SDK
We are adding prompt caching to the SDK, so fetched prompt configurations can be reused without a network round trip on every call.
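A minimal sketch of the idea, assuming a TTL-based cache in front of a network fetch (the class and function names are invented; this is not the SDK's actual interface):

```python
import time

class PromptCache:
    """Caches fetched prompt configs for ttl_seconds to skip repeat fetches."""

    def __init__(self, fetch, ttl_seconds: float = 60.0):
        self._fetch = fetch            # stand-in for the network call
        self._ttl = ttl_seconds
        self._store = {}               # slug -> (expires_at, config)

    def get(self, slug: str):
        entry = self._store.get(slug)
        if entry and entry[0] > time.monotonic():
            return entry[1]            # cache hit: no network call
        config = self._fetch(slug)     # cache miss or expired: refetch
        self._store[slug] = (time.monotonic() + self._ttl, config)
        return config

calls = []
def fetch_prompt(slug):               # hypothetical remote fetch
    calls.append(slug)
    return {"slug": slug, "template": "Hello {name}"}

cache = PromptCache(fetch_prompt, ttl_seconds=60)
cache.get("greeting")
cache.get("greeting")
assert calls == ["greeting"]          # second lookup served from cache
```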
Tagging Traces, Testsets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations, and prompts, making it easier to organize and filter your data.
Feature Requests
Upvote or comment on the features you care about or request a new feature.