API Reference
Complete reference for the AgentTel API surface: annotations, programmatic APIs, configuration properties, and enums.
Annotations
@AgentObservable
Service-level annotation declaring topology metadata. Applied to the main application class.
```java
@AgentObservable(
    service = "payment-service",
    team = "payments-platform",
    tier = ServiceTier.CRITICAL,
    domain = "commerce",
    onCallChannel = "#payments-oncall"
)
@SpringBootApplication
public class PaymentServiceApplication { }
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| service | String | "" | Service name (defaults to the Spring application name) |
| team | String | required | Owning team identifier |
| tier | ServiceTier | STANDARD | Service criticality tier |
| domain | String | "" | Business domain |
| onCallChannel | String | "" | Escalation channel |
@DeclareDependency
Declares a service dependency. Applied to the main application class. Repeatable.
```java
@DeclareDependency(
    name = "postgres",
    type = DependencyType.DATABASE,
    criticality = DependencyCriticality.REQUIRED,
    timeoutMs = 5000,
    circuitBreaker = true
)
@DeclareDependency(
    name = "stripe-api",
    type = DependencyType.EXTERNAL_API,
    criticality = DependencyCriticality.REQUIRED,
    fallback = "Return cached pricing"
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | String | required | Dependency name |
| type | DependencyType | required | Dependency type |
| criticality | DependencyCriticality | REQUIRED | Impact of failure |
| protocol | String | "" | Communication protocol |
| timeoutMs | long | 0 | Configured timeout in milliseconds |
| circuitBreaker | boolean | false | Whether a circuit breaker is enabled |
| fallback | String | "" | Fallback description |
| healthEndpoint | String | "" | Health check endpoint |
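Because @DeclareDependency is repeatable, one class can carry several instances, and standard Java reflection returns all of them through `AnnotatedElement.getAnnotationsByType`. The sketch below uses a hypothetical `@Dependency` annotation (with its container `@Dependencies`) in place of @DeclareDependency; it illustrates the Java mechanism only and does not show how AgentTel itself discovers the annotations.

```java
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for @DeclareDependency, reduced to two attributes.
@Retention(RetentionPolicy.RUNTIME)
@Repeatable(Dependencies.class)
@interface Dependency {
    String name();
    boolean circuitBreaker() default false;
}

// Container annotation the compiler generates when @Dependency is repeated.
@Retention(RetentionPolicy.RUNTIME)
@interface Dependencies {
    Dependency[] value();
}

@Dependency(name = "postgres", circuitBreaker = true)
@Dependency(name = "stripe-api")
class ExampleApplication { }

final class DependencyScanner {
    // getAnnotationsByType unwraps the container, so a single annotation and
    // a repeated one are read the same way, in declaration order.
    static List<String> dependencyNames(Class<?> type) {
        return Arrays.stream(type.getAnnotationsByType(Dependency.class))
                .map(Dependency::name)
                .collect(Collectors.toList());
    }
}
```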
@DeclareConsumer
Declares a downstream consumer of this service. Applied to the main application class. Repeatable.
```java
@DeclareConsumer(
    name = "checkout-service",
    pattern = ConsumptionPattern.SYNCHRONOUS,
    slaLatencyMs = 200
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | String | required | Consumer service name |
| pattern | ConsumptionPattern | SYNCHRONOUS | How the consumer calls this service |
| slaLatencyMs | long | 0 | Consumer's latency SLA in milliseconds |
@AgentOperation
Method-level annotation declaring operational semantics. Applied to Spring MVC/WebFlux endpoints or any traced method.
```java
@AgentOperation(
    expectedLatencyP50 = "45ms",
    expectedLatencyP99 = "200ms",
    expectedErrorRate = 0.001,
    retryable = true,
    idempotent = true,
    runbookUrl = "https://wiki/runbooks/process-payment",
    fallbackDescription = "Return cached pricing",
    escalationLevel = EscalationLevel.PAGE_ONCALL,
    safeToRestart = true
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| expectedLatencyP50 | String | "" | Expected P50 latency (e.g., "45ms", "1.5s") |
| expectedLatencyP99 | String | "" | Expected P99 latency |
| expectedErrorRate | double | 0.0 | Expected error rate as a fraction (0.0–1.0; 0.001 = 0.1%) |
| retryable | boolean | false | Whether the operation can be retried |
| retryAfterMs | long | 0 | Suggested retry delay in milliseconds |
| idempotent | boolean | false | Whether repeated calls have the same effect, so retries are safe |
| runbookUrl | String | "" | Operational runbook URL |
| fallbackDescription | String | "" | Description of fallback behavior |
| escalationLevel | EscalationLevel | NOTIFY_TEAM | How to escalate issues |
| safeToRestart | boolean | false | Whether the service can be safely restarted |
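expectedLatencyP50 and expectedLatencyP99 take duration strings such as "45ms" or "1.5s". The sketch below shows one way such strings could be normalized to milliseconds. `LatencySpec` and `parseMillis` are hypothetical names, and accepting only the `ms` and `s` units is an assumption; AgentTel's own parser may accept other formats.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts latency strings such as "45ms" or "1.5s"
// into milliseconds. Supporting only "ms" and "s" is an assumption.
final class LatencySpec {
    // A decimal number followed by a unit; matches() anchors the whole string.
    private static final Pattern FORMAT = Pattern.compile("\\s*(\\d+(?:\\.\\d+)?)\\s*(ms|s)\\s*");

    static OptionalDouble parseMillis(String value) {
        if (value == null || value.isEmpty()) {
            return OptionalDouble.empty(); // "" is the annotation default: not declared
        }
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized latency: " + value);
        }
        double amount = Double.parseDouble(m.group(1));
        // Seconds are scaled to milliseconds; "ms" values pass through unchanged.
        return OptionalDouble.of(m.group(2).equals("s") ? amount * 1000.0 : amount);
    }
}
```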
Programmatic API
AgentTelEngine
The main orchestrator. Builds and wires all core components.
```java
AgentTelEngine engine = AgentTelEngine.builder()
    .openTelemetry(openTelemetry)
    .topology(topologyRegistry)
    // Same values as the @AgentOperation example: P50 45 ms, P99 200 ms, error rate 0.001
    .addStaticBaseline("POST /api/payments",
        new OperationBaseline(45.0, 200.0, 0.001))
    .patternMatcher(new PatternMatcher(2.0, 5.0, 3))
    .rollingBaselineProvider(new RollingBaselineProvider(1000, 10))
    .sloTracker(sloTracker)
    .build();

// Get the SpanProcessor to register with the OTel SDK
SpanProcessor processor = engine.createSpanProcessor();
```
TopologyRegistry
```java
TopologyRegistry topology = new TopologyRegistry();
topology.setTeam("payments-platform");
topology.setTier(ServiceTier.CRITICAL);
topology.setDomain("commerce");
topology.setOnCallChannel("#payments-oncall");

// Arguments follow the @DeclareDependency attributes: name, type, criticality,
// protocol, timeoutMs, circuitBreaker, fallback, healthEndpoint
topology.registerDependency(new DependencyDescriptor(
    "postgres", DependencyType.DATABASE, DependencyCriticality.REQUIRED,
    "postgresql", 5000, true, "", "/health/postgres"
));

// Arguments follow the @DeclareConsumer attributes: name, pattern, slaLatencyMs
topology.registerConsumer(new ConsumerDescriptor(
    "checkout-service", ConsumptionPattern.SYNCHRONOUS, 200
));
```
RollingBaselineProvider
```java
// Window of 1000 observations; baseline valid after 10 samples
RollingBaselineProvider rolling = new RollingBaselineProvider(1000, 10);

// Record observations (typically done by the SpanProcessor):
// operation name, latency in ms, error flag
rolling.record("POST /api/payments", 45.0, false);

// Query the baseline (may be empty before enough samples are recorded)
Optional<RollingWindow.Snapshot> snapshot = rolling.getSnapshot("POST /api/payments");
// snapshot.get().p50(), .p99(), .mean(), .stddev(), .errorRate()

// As a BaselineProvider
Optional<OperationBaseline> baseline = rolling.getBaseline("POST /api/payments");
```
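RollingBaselineProvider keeps a sliding window of recent observations per operation and reports percentiles, mean, standard deviation, and error rate once enough samples exist. The self-contained sketch below shows that idea for a single operation. `SlidingWindow` and its nearest-rank percentile are illustrative choices, not AgentTel's implementation, which may use a different data structure or percentile estimator.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Optional;

// Hypothetical sliding window: keeps the last `capacity` observations and
// reports no baseline until at least `minSamples` have been recorded.
final class SlidingWindow {
    record Snapshot(double p50, double p99, double mean, double stddev, double errorRate) { }

    private record Observation(double latencyMs, boolean error) { }

    private final int capacity;
    private final int minSamples;
    private final Deque<Observation> window = new ArrayDeque<>();

    SlidingWindow(int capacity, int minSamples) {
        this.capacity = capacity;
        this.minSamples = minSamples;
    }

    void record(double latencyMs, boolean error) {
        if (window.size() == capacity) {
            window.removeFirst(); // evict the oldest observation
        }
        window.addLast(new Observation(latencyMs, error));
    }

    Optional<Snapshot> snapshot() {
        int n = window.size();
        if (n < minSamples) {
            return Optional.empty(); // too few samples for a trustworthy baseline
        }
        double[] sorted = window.stream().mapToDouble(Observation::latencyMs).sorted().toArray();
        double mean = Arrays.stream(sorted).average().orElse(0.0);
        // Population variance over the window.
        double variance = Arrays.stream(sorted).map(x -> (x - mean) * (x - mean)).sum() / n;
        long errors = window.stream().filter(Observation::error).count();
        return Optional.of(new Snapshot(
                percentile(sorted, 0.50), percentile(sorted, 0.99),
                mean, Math.sqrt(variance), (double) errors / n));
    }

    // Nearest-rank percentile on an ascending array.
    private static double percentile(double[] sorted, double q) {
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a capacity of 4, recording 10, 20, 30 (error), 40, and 50 ms evicts the 10 ms sample and yields P50 30 ms, P99 50 ms, mean 35 ms, and an error rate of 0.25.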
CompositeBaselineProvider
```java
// Chain: static → rolling → default
CompositeBaselineProvider composite = new CompositeBaselineProvider(
    staticProvider, rollingProvider
);

// Returns the first non-empty baseline
Optional<OperationBaseline> baseline = composite.getBaseline("POST /api/payments");
```
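CompositeBaselineProvider asks its providers in order and returns the first non-empty baseline. The pattern is small enough to sketch with `java.util.Optional`. `FirstNonEmpty` is a hypothetical generic class that models each provider as a function from operation name to `Optional`; it is not AgentTel's BaselineProvider interface.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical "first non-empty wins" chain over an ordered list of providers.
final class FirstNonEmpty<T> {
    private final List<Function<String, Optional<T>>> providers;

    FirstNonEmpty(List<Function<String, Optional<T>>> providers) {
        this.providers = List.copyOf(providers);
    }

    Optional<T> lookup(String operation) {
        for (Function<String, Optional<T>> provider : providers) {
            Optional<T> result = provider.apply(operation);
            if (result.isPresent()) {
                return result; // later providers are never consulted
            }
        }
        return Optional.empty(); // no provider knows this operation
    }
}
```

Ordering is the design choice that matters: putting the static provider first lets hand-declared baselines override learned ones, while the rolling provider fills in operations that have no static entry.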
SloTracker
```java
SloTracker tracker = new SloTracker();

tracker.register(SloDefinition.builder("payment-availability")
    .operationName("POST /api/payments")
    .type(SloDefinition.SloType.AVAILABILITY)
    .target(0.999) // 99.9% of requests succeed
    .build());

tracker.register(SloDefinition.builder("payment-latency")
    .operationName("POST /api/payments")
    .type(SloDefinition.SloType.LATENCY_P99)
    .target(0.99) // 99% of requests within the P99 latency threshold
    .build());

// Record outcomes (typically done by the SpanProcessor)
tracker.recordSuccess("POST /api/payments");
tracker.recordFailure("POST /api/payments");

// Query (illustrative values)
SloStatus status = tracker.getStatus("payment-availability");
// status.target()          → 0.999
// status.actual()          → 0.9995
// status.budgetRemaining() → 0.50
// status.burnRate()        → 0.50

// Alert check
List<SloAlert> alerts = tracker.checkAlerts();
// alerts.get(0).severity() → CRITICAL | WARNING | INFO
```
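SloStatus reports budget remaining and burn rate. Under the conventional SRE definitions (error budget = 1 − target; burn rate = observed error rate ÷ allowed error rate), the arithmetic looks like the sketch below. Whether SloTracker uses exactly these formulas, and over which time window, is an assumption; `ErrorBudget` is a hypothetical helper.

```java
// Hypothetical helper using the common SRE definitions of error budget and
// burn rate; SloTracker's actual formulas are an assumption.
final class ErrorBudget {
    // Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% target.
    static double allowedErrorRate(double target) {
        return 1.0 - target;
    }

    // Observed error rate divided by the allowed error rate. A value of 1.0
    // spends the budget exactly as fast as the target permits.
    static double burnRate(double target, double actual) {
        return (1.0 - actual) / allowedErrorRate(target);
    }

    // Share of the budget still unspent, assuming `actual` covers the whole
    // SLO window; floored at 0 once the budget is exhausted.
    static double budgetRemaining(double target, double actual) {
        return Math.max(0.0, 1.0 - burnRate(target, actual));
    }
}
```

For example, a 99.9% target with 99.95% measured availability gives a burn rate of 0.5 and half the budget remaining; at 99.5% the burn rate is 5.0 and the budget is exhausted.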
PatternMatcher
```java
PatternMatcher matcher = new PatternMatcher(2.0, 5.0, 3);

// Feed observations (typically done by the SpanProcessor)
matcher.recordLatency("POST /api/payments", 312.0);
matcher.recordDependencyError("stripe-api");

// Detect patterns for the current span: operation, latency (ms), error flag,
// and the operation's baseline snapshot
List<IncidentPattern> patterns = matcher.detectPatterns(
    "POST /api/payments", 312.0, true, baselineSnapshot
);
// patterns may contain LATENCY_DEGRADATION, CASCADE_FAILURE, etc.
```
ServiceHealthAggregator
```java
ServiceHealthAggregator health = new ServiceHealthAggregator(rollingBaselines, sloTracker);

// Feed span data: operation, latency (ms), error flag
health.recordSpan("POST /api/payments", 312.0, false);
// Dependency calls: dependency name, latency (ms), error flag
health.recordDependencyCall("stripe-api", 2100.0, true);

// Query
ServiceHealthSummary summary = health.getHealthSummary("payment-service");
Optional<OperationSummary> op = health.getOperationHealth("POST /api/payments");
```
MCP Server
```java
McpServer server = new AgentTelMcpServerBuilder()
    .port(8081)
    .contextProvider(agentContextProvider)
    .remediationExecutor(remediationExecutor)
    .build();

// Register custom tools
server.registerTool(
    new McpToolDefinition("custom_tool", "Description",
        Map.of("param", new ParameterDefinition("string", "Param description")),
        List.of("param")),
    args -> "Result: " + args.get("param")
);

server.start();
// ...
server.stop();
```
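server.registerTool pairs a tool definition with a handler that turns the call arguments into a text result. The sketch below mimics that shape in plain Java so it runs without AgentTel. `ToolRegistry` is hypothetical, and treating the last McpToolDefinition argument (`List.of("param")`) as the list of required parameters is an assumption based on its position.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: each tool has a name, required parameters, and a
// handler mapping call arguments to a text result.
final class ToolRegistry {
    private record Tool(List<String> required, Function<Map<String, Object>, String> handler) { }

    private final Map<String, Tool> tools = new HashMap<>();

    void register(String name, List<String> required, Function<Map<String, Object>, String> handler) {
        tools.put(name, new Tool(List.copyOf(required), handler));
    }

    String call(String name, Map<String, Object> args) {
        Tool tool = tools.get(name);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + name);
        }
        // Reject the call before invoking the handler if a required argument is missing.
        for (String param : tool.required()) {
            if (!args.containsKey(param)) {
                throw new IllegalArgumentException("Missing required parameter: " + param);
            }
        }
        return tool.handler().apply(args);
    }
}
```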
Configuration Properties
All properties are under the agenttel prefix in application.yml or application.properties.
Topology
```yaml
agenttel:
  topology:
    team: payments-platform        # Owning team
    tier: critical                 # critical | standard | internal | experimental
    domain: commerce               # Business domain
    on-call-channel: "#payments-oncall"
    repo-url: "https://github.com/org/payment-service"
```
Dependencies
```yaml
agenttel:
  dependencies:
    - name: postgres
      type: database               # internal_service | external_api | database | message_broker | cache | object_store | identity_provider
      criticality: required        # required | degraded | optional
      protocol: postgresql
      timeout-ms: 5000
      circuit-breaker: true
      fallback: "Return cached data"
      health-endpoint: "/health/postgres"
    - name: stripe-api
      type: external_api
      criticality: required
```
Consumers
```yaml
agenttel:
  consumers:
    - name: checkout-service
      pattern: synchronous         # synchronous | asynchronous | batch | streaming
      sla-latency-ms: 200
```
Baselines
```yaml
agenttel:
  baselines:
    rolling-window-size: 1000      # Observations per sliding window (default: 1000)
    rolling-min-samples: 10        # Min samples before baseline is valid (default: 10)
```
Anomaly Detection
```yaml
agenttel:
  anomaly-detection:
    z-score-threshold: 3.0         # Z-score threshold for anomaly detection (default: 3.0)
```
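A z-score measures how far an observation sits from its baseline in standard deviations: z = (x − mean) / stddev. The sketch below assumes AgentTel compares |z| against z-score-threshold with a strict greater-than, using the baseline's mean and stddev; both points are assumptions, and `ZScore` is a hypothetical helper.

```java
// Hypothetical z-score check; the comparison rule AgentTel applies is an assumption.
final class ZScore {
    static double of(double value, double mean, double stddev) {
        if (stddev <= 0.0) {
            return 0.0; // no spread in the baseline yet: treat as not anomalous
        }
        return (value - mean) / stddev;
    }

    // Flags values whose distance from the mean exceeds the threshold in
    // either direction.
    static boolean isAnomalous(double value, double mean, double stddev, double threshold) {
        return Math.abs(of(value, mean, stddev)) > threshold;
    }
}
```

With a baseline mean of 50 ms and a standard deviation of 20 ms, a 312 ms request scores z = 13.1, well past the default threshold of 3.0, while 110 ms sits exactly at 3.0 and is not flagged.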
Enums Reference
ServiceTier
| Value | Description |
|---|---|
| CRITICAL | User-facing, revenue-impacting |
| STANDARD | Important but not immediately revenue-impacting |
| INTERNAL | Internal tooling |
| EXPERIMENTAL | Non-production |
DependencyType
| Value | Description |
|---|---|
| INTERNAL_SERVICE | Another service in the same organization |
| EXTERNAL_API | Third-party API |
| DATABASE | Database (SQL or NoSQL) |
| MESSAGE_BROKER | Kafka, RabbitMQ, SQS, etc. |
| CACHE | Redis, Memcached, etc. |
| OBJECT_STORE | S3, GCS, etc. |
| IDENTITY_PROVIDER | Auth0, Okta, etc. |
DependencyCriticality
| Value | Description |
|---|---|
| REQUIRED | Failure causes outage |
| DEGRADED | Failure causes reduced functionality |
| OPTIONAL | Failure has no user impact |
EscalationLevel
| Value | Description |
|---|---|
| AUTO_RESOLVE | Agent can handle autonomously |
| NOTIFY_TEAM | Async notification |
| PAGE_ONCALL | Page on-call engineer |
| INCIDENT_COMMANDER | Full incident management |
ConsumptionPattern
| Value | Description |
|---|---|
| SYNCHRONOUS | Request-response |
| ASYNCHRONOUS | Fire-and-forget or callback |
| BATCH | Periodic bulk processing |
| STREAMING | Continuous data flow |
IncidentPattern
| Value | Description |
|---|---|
| CASCADE_FAILURE | Multiple downstream failures |
| MEMORY_LEAK | Monotonically increasing latency |
| THUNDERING_HERD | Traffic spike after recovery |
| COLD_START | High latency on fresh instances |
| ERROR_RATE_SPIKE | Sudden error rate increase |
| LATENCY_DEGRADATION | Sustained latency elevation |
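MEMORY_LEAK is described as monotonically increasing latency. The sketch below is a literal check of that description over the most recent observations. `MonotonicTrend` is hypothetical, and AgentTel's detector is not shown here; a production detector would likely tolerate noise (for example by fitting a slope) rather than require every sample to exceed the previous one.

```java
import java.util.List;

// Hypothetical check: true when latency rises strictly on each of the most
// recent `window` observations.
final class MonotonicTrend {
    static boolean isMonotonicallyIncreasing(List<Double> latencies, int window) {
        if (window < 2 || latencies.size() < window) {
            return false; // not enough data to call it a trend
        }
        List<Double> recent = latencies.subList(latencies.size() - window, latencies.size());
        for (int i = 1; i < recent.size(); i++) {
            if (recent.get(i) <= recent.get(i - 1)) {
                return false; // a flat step or a dip breaks the trend
            }
        }
        return true;
    }
}
```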