Project Overview¶
Vision¶
AgentTel is an open-source telemetry library and semantic convention extension for OpenTelemetry that makes application telemetry natively consumable by AI agents. It works across the full stack — JVM backends (Java, Kotlin, Scala) and browser frontends (TypeScript/JavaScript) — bridging the gap between human-oriented observability and the structured, contextual data that autonomous systems require.
Motivation¶
The Observability Gap¶
Today's observability stack — traces, metrics, logs — is designed for human operators working through dashboards and alert rules. When AI agents are tasked with autonomous incident response, they face fundamental limitations:
Missing Context. A span showing POST /api/payments with 312ms latency tells an agent nothing without knowing the baseline is 45ms, the operation is retryable, the service is Tier 1, and the team to page is #payments-oncall.
No Behavioral Baselines. Agents cannot distinguish "this is normal for a Tuesday morning" from "this is a 5x latency spike" without baseline data attached to the telemetry itself. External baseline systems add latency and require separate integrations.
No Topology Awareness. Understanding that payment-service depends on postgres (required, 5s timeout, circuit breaker enabled) and stripe-api (required, fallback to cached pricing) requires knowledge that isn't embedded in today's telemetry.
No Decision Metadata. When an agent detects an anomaly, it needs to know: Can I retry this? Is there a fallback? Should I page someone or auto-remediate? Today, this information lives in runbooks, tribal knowledge, and configuration files — not in the telemetry stream.
No Actionable Interface. Even with enriched telemetry, agents need a structured API to query live system state, understand incidents in context, and execute remediation — not just read historical traces.
The Solution¶
AgentTel enriches telemetry at the instrumentation layer — the earliest and most reliable point in the data pipeline — with five categories of agent-ready context:
- Topology — Service identity, ownership, dependency graph, consumer relationships
- Baselines — Static and rolling latency/error baselines per operation
- Decision Metadata — Retryability, idempotency, fallbacks, runbooks, escalation levels
- Anomaly Detection — Z-score deviation detection with pattern recognition
- SLO Tracking — Error budget consumption with burn rate alerting
It also provides an agent interface layer that packages this telemetry into structured formats AI agents can consume via the Model Context Protocol (MCP), complete with incident context building, remediation execution, and full action auditability.
Why Now¶
- OTel GenAI semantic conventions are still in development — there is an opportunity to influence them before stabilization
- JVM GenAI instrumentation is fragmented — Spring AI has basic Micrometer support, community projects are in SNAPSHOT, and there is no coverage for Anthropic, Bedrock, or OpenAI Java SDKs
- Enterprise Java has a massive installed base but is underserved by the Python-centric AI observability ecosystem
- Industry validation from Logz.io, Mezmo, Sawmills, Splunk, and Datadog confirms that agent-consumable telemetry is the frontier of observability
Design Principles¶
Annotations-First API¶
Developers declare operational semantics alongside their code:
@AgentOperation(
expectedLatencyP50 = "45ms",
retryable = true,
runbookUrl = "https://wiki/runbooks/process-payment"
)
This keeps context co-located with the code it describes, reviewed in pull requests, and versioned with the application.
Strict OpenTelemetry Extension¶
AgentTel is a semantic convention extension to OpenTelemetry, not a replacement. It uses the standard SpanProcessor and SpanExporter interfaces, adds attributes under the agenttel.* namespace, and coexists with all existing OTel conventions. Any OTel-compatible backend can ingest AgentTel-enriched spans.
Zero-Overhead When Disabled¶
All enrichment is conditional. When no annotations are present or AgentTel is not configured, the library adds zero overhead. Baseline computation uses lock-free ring buffers. Anomaly detection is O(1) per span.
Framework-Agnostic Core¶
The core library depends only on the OpenTelemetry SDK. Spring Boot integration is provided through a separate starter module. Frontend telemetry is provided through a standalone TypeScript SDK. The architecture supports future adapters for Quarkus, Micronaut, and other frameworks.
Optional Dependencies¶
GenAI instrumentation libraries (Spring AI, LangChain4j, Anthropic SDK, OpenAI SDK, AWS Bedrock SDK) are all compileOnly dependencies. They activate only when the corresponding library is present on the classpath. Users are never forced to pull in libraries they don't use.
Project Structure¶
agenttel/
├── agenttel-api/ # Annotations, attributes, enums (zero dependencies)
├── agenttel-core/ # Runtime engine (OTel SDK dependency only)
├── agenttel-genai/ # GenAI instrumentation (optional framework deps)
├── agenttel-agent/ # Agent interface layer (MCP, health, incidents, reporting)
├── agenttel-spring-boot-starter/ # Spring Boot auto-configuration
├── agenttel-javaagent-extension/ # Zero-code OTel javaagent extension
├── agenttel-web/ # Browser SDK (TypeScript) — frontend telemetry
├── agenttel-instrument/ # IDE MCP server (Python) — instrumentation automation
├── agenttel-testing/ # Test utilities
├── examples/
│ ├── spring-boot-example/ # Spring Boot + AgentTel demo
│ └── langchain4j-example/ # LangChain4j + AgentTel demo
├── dashboards/ # Grafana dashboard templates
└── docs/ # This documentation
Module Summary¶
| Module | Artifact | Dependencies | Description |
|---|---|---|---|
agenttel-api |
io.agenttel:agenttel-api |
None | Annotations, attribute constants, enums, data models |
agenttel-core |
io.agenttel:agenttel-core |
OTel SDK, Jackson | Span enrichment, baselines, anomaly detection, SLO tracking, events |
agenttel-genai |
io.agenttel:agenttel-genai |
OTel SDK + optional GenAI libs | LangChain4j, Spring AI, Anthropic/OpenAI/Bedrock instrumentation |
agenttel-agent |
io.agenttel:agenttel-agent |
OTel SDK, Jackson | MCP server, health aggregation, incident context, remediation, trend analysis, SLO reports, executive summaries, cross-stack context |
agenttel-spring-boot-starter |
io.agenttel:agenttel-spring-boot-starter |
Spring Boot | Auto-configuration for Spring Boot applications |
agenttel-javaagent-extension |
io.agenttel:agenttel-javaagent-extension |
OTel Javaagent | Zero-code enrichment for any JVM app — no Spring dependency |
agenttel-web |
@agenttel/web (npm) |
TypeScript, ES2020+ | Browser telemetry SDK — page loads, navigation, API calls, journeys, anomaly detection, cross-stack correlation |
agenttel-instrument |
agenttel-instrument (pip) |
Python 3.11+ | IDE MCP server — codebase analysis, config generation, validation, auto-improvements |
agenttel-testing |
io.agenttel:agenttel-testing |
OTel SDK Testing | Test utilities for verifying span enrichment |
Target Audience¶
- Platform engineers building internal developer platforms with AI-assisted incident response
- SRE teams adopting AIOps tooling that needs structured telemetry
- Backend developers who want their JVM services to be "agent-ready" with minimal code changes
- Frontend developers who want browser telemetry with journey tracking, anomaly detection, and cross-stack correlation
- AI/ML engineers building autonomous agents that interact with production systems
Current Status¶
AgentTel is in alpha (v0.1.0-alpha). The core instrumentation, GenAI support, and agent interface layer are implemented and tested. The API surface may evolve before 1.0.
Further Reading¶
- Semantic Conventions — Complete attribute and event schema
- Architecture — Technical design and data flow
- Agent Layer — MCP server, incident context, remediation
- GenAI Instrumentation — LLM framework instrumentation
- API Reference — Full API documentation
- Roadmap — Release plan and future work