Roadmap
What we've shipped, what we're building now, and what we plan to build next.
Last Shipped
Folders for Prompt Organization
2/4/2026
Playground
Create folders and subfolders to organize prompts. Drag prompts between folders and search across everything.
Navigation Links from Traces to App/Environment/Variant
1/28/2026
Observability
Clickable links in observability traces to navigate to the application, variant, version, and environment used in each trace. Jump directly to the configuration that generated a specific trace.
Date Range Filtering in Metrics Dashboard
1/9/2026
Observability
Filter traces by date range in the metrics dashboard. View metrics for the last 6 hours, 24 hours, 7 days, or 30 days.
Test Set Versioning and New UI
1/20/2026
Evaluation
Track test set changes with versioning. Every edit creates a new version. Evaluations link to specific versions for reliable comparisons. Plus a rebuilt UI that scales to 100K+ rows with inline editing for chat messages and JSON.
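The versioning model described above can be illustrated with a small sketch: each edit appends an immutable snapshot, and an evaluation that pinned an earlier version keeps seeing exactly the rows it was run against. This is illustrative only; the class and field names are assumptions, not Agenta's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of append-only test set versioning: every edit
# produces a new immutable version, and evaluations pin a version number.
@dataclass
class TestSet:
    name: str
    versions: list = field(default_factory=list)  # list of row snapshots

    def commit(self, rows):
        """Each edit appends a new version; older versions stay untouched."""
        self.versions.append(list(rows))
        return len(self.versions)  # 1-based version number

    def get(self, version):
        return self.versions[version - 1]

ts = TestSet("qa-pairs")
v1 = ts.commit([{"q": "2+2?", "a": "4"}])
v2 = ts.commit([{"q": "2+2?", "a": "4"}, {"q": "3+3?", "a": "6"}])
# An evaluation that pinned v1 still sees the original single row.
assert ts.get(v1) == [{"q": "2+2?", "a": "4"}]
assert len(ts.get(v2)) == 2
```

Pinning a version number rather than "whatever the test set contains today" is what makes evaluation comparisons reproducible.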
Chat Sessions in Observability
1/9/2026
Observability
Track multi-turn conversations with session grouping. All traces with the same session ID are automatically grouped together, showing complete conversation flows with cost, latency, and token metrics per session.
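The grouping described above can be sketched in a few lines: traces sharing a session ID are rolled up into per-session totals. The field names (`session_id`, `cost`, `latency_ms`, `tokens`) are assumptions chosen for illustration, not the actual trace schema.

```python
from collections import defaultdict

# Illustrative only: group flat trace records by session_id and aggregate
# cost, latency, and token metrics per session (field names are assumed).
traces = [
    {"session_id": "s1", "cost": 0.002, "latency_ms": 410, "tokens": 120},
    {"session_id": "s1", "cost": 0.003, "latency_ms": 380, "tokens": 200},
    {"session_id": "s2", "cost": 0.001, "latency_ms": 250, "tokens": 80},
]

sessions = defaultdict(lambda: {"cost": 0.0, "latency_ms": 0, "tokens": 0, "turns": 0})
for t in traces:
    s = sessions[t["session_id"]]
    s["cost"] += t["cost"]
    s["latency_ms"] += t["latency_ms"]
    s["tokens"] += t["tokens"]
    s["turns"] += 1

# Session "s1" now shows the full two-turn conversation's totals.
assert sessions["s1"]["turns"] == 2
assert sessions["s1"]["tokens"] == 320
```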
PDF Support in the Playground
12/17/2025
Playground, Evaluation, Observability
Attach PDF documents to chat messages in the playground. Upload files, provide URLs, or use file IDs from provider APIs. Works with OpenAI, Gemini, and Claude models. PDFs are supported in evaluations and observability traces.
Provider Built-in Tools in the Playground
12/11/2025
Playground
Use provider built-in tools like web search, code execution, and file search directly in the Playground. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with prompts and automatically used via the LLM gateway.
In progress
Improving Navigation Between Test Sets in the Playground
Playground
We are making it easier to navigate and work with large test sets in the playground.
Prompt Snippets
Playground
Create reusable prompt snippets that can be referenced across multiple prompts. Reference specific versions or always use the latest version to maintain consistency across prompt variants.
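One way to picture snippet references is a template placeholder that resolves to either a pinned version or the latest one. The `{{snippet:name@version}}` syntax below is invented for this sketch and is not Agenta's actual reference format.

```python
import re

# Hypothetical snippet store: name -> {version number -> text}.
SNIPPETS = {
    "tone": {1: "Answer formally.", 2: "Answer concisely and formally."},
}

def resolve(template: str) -> str:
    """Replace {{snippet:name@version}} placeholders; "latest" picks the newest."""
    def sub(match):
        name, version = match.group(1), match.group(2)
        versions = SNIPPETS[name]
        v = max(versions) if version == "latest" else int(version)
        return versions[v]
    return re.sub(r"\{\{snippet:(\w+)@(\w+)\}\}", sub, template)

# A pinned reference stays stable; "latest" follows new snippet versions.
assert resolve("{{snippet:tone@1}} Q: ...") == "Answer formally. Q: ..."
assert resolve("{{snippet:tone@latest}}") == "Answer concisely and formally."
```

Pinning keeps a prompt variant reproducible, while "latest" propagates a snippet edit to every prompt that references it.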
AI-Powered Prompt Refinement in the Playground
Playground
Analyze prompts and suggest improvements based on best practices. Identify issues, propose refined versions, and allow users to accept, modify, or reject suggestions.
Open Observability Spans Directly in the Playground
Playground, Observability
Add a button in observability to open any chat span directly in the playground. Creates a stateless playground session pre-filled with the exact prompt, configuration, and inputs for immediate iteration.
Running Evaluators in the Playground
Playground, Evaluation
Run evaluators directly in the playground to get immediate quality feedback on prompt changes. Evaluate outputs inline as you iterate on prompts. Scores, pass/fail results, and evaluator reasoning appear right next to the LLM response.
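The score / pass-fail / reasoning triple described above maps naturally onto a small evaluator function. This shape is an assumption for illustration, not Agenta's actual evaluator interface.

```python
# Illustrative evaluator: compares one LLM output against an expected value
# and returns a score, a pass/fail verdict, and a short reasoning string.
def exact_match_evaluator(output: str, expected: str, threshold: float = 1.0):
    score = 1.0 if output.strip() == expected.strip() else 0.0
    return {
        "score": score,
        "passed": score >= threshold,
        "reason": "exact match" if score else f"expected {expected!r}, got {output!r}",
    }

result = exact_match_evaluator("4", "4")
assert result == {"score": 1.0, "passed": True, "reason": "exact match"}

failed = exact_match_evaluator("5", "4")
assert failed["passed"] is False
```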
Enterprise SSO
Security
Single sign-on (SSO) support for enterprise customers. Integrate your identity provider using SAML or OIDC for secure, centralized authentication. Log in with existing corporate credentials and control user provisioning and access.
US Region for Agenta Cloud
Misc
Agenta Cloud is adding a US-based region. Run your projects with all data stored within the United States. This helps meet data residency requirements that mandate data remain within a specific geography.
Creating Agents from the UI
Playground
Build and configure AI agents directly from the Agenta UI. Define agent workflows, select tools, and set up orchestration logic without writing code. Test and iterate on agent behavior in the playground, then deploy to production with versioning and observability built in.
Webhooks for Deployment Linked to CI
Integration
Trigger CI/CD pipelines automatically when you deploy a prompt version. Connect Agenta deployments to your existing CI workflows so that deploying a new version kicks off automated tests, approval gates, or release processes.
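A common pattern for deployment webhooks is an HMAC-signed payload that the CI side verifies before starting a pipeline. The payload fields and signing scheme below are assumptions for illustration, not Agenta's actual webhook contract.

```python
import hashlib
import hmac
import json

# Shared secret between the webhook sender and the CI receiver (assumed setup).
SECRET = b"shared-webhook-secret"

def sign(payload: dict) -> tuple[bytes, str]:
    """Serialize the deployment event and compute an HMAC-SHA256 signature."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body, signature

def verify(body: bytes, signature: str) -> bool:
    """CI side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

body, sig = sign({"event": "deployment", "environment": "production", "version": 7})
assert verify(body, sig)          # authentic event: start the pipeline
assert not verify(body, "0" * 64) # forged signature: reject
```

Verifying the signature before acting prevents an attacker who discovers the webhook URL from triggering releases.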
Planned
Prompt Caching in the SDK
SDK
We are adding prompt caching to the SDK, so fetched prompt configurations can be reused without a network round trip on every call.
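A minimal sketch of the idea, assuming a TTL-based cache in front of a network fetch (the class and function names are invented; this is not the SDK's actual interface):

```python
import time

class PromptCache:
    """Caches fetched prompt configs for ttl_seconds to skip repeat fetches."""

    def __init__(self, fetch, ttl_seconds: float = 60.0):
        self._fetch = fetch            # stand-in for the network call
        self._ttl = ttl_seconds
        self._store = {}               # slug -> (expires_at, config)

    def get(self, slug: str):
        entry = self._store.get(slug)
        if entry and entry[0] > time.monotonic():
            return entry[1]            # cache hit: no network call
        config = self._fetch(slug)     # cache miss or expired: refetch
        self._store[slug] = (time.monotonic() + self._ttl, config)
        return config

calls = []
def fetch_prompt(slug):               # hypothetical remote fetch
    calls.append(slug)
    return {"slug": slug, "template": "Hello {name}"}

cache = PromptCache(fetch_prompt, ttl_seconds=60)
cache.get("greeting")
cache.get("greeting")
assert calls == ["greeting"]          # second lookup served from cache
```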
Tagging Traces, Testsets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations, and prompts, making it easier to organize and filter your data.
Feature Requests
Upvote or comment on the features you care about or request a new feature.