API Reference¶

Complete reference for the AgentTel API surface: annotations, programmatic APIs, configuration properties, and enums.

Annotations¶

@AgentObservable¶

Service-level annotation declaring topology metadata. Applied to the main application class.

@AgentObservable(
    service = "payment-service",
    team = "payments-platform",
    tier = ServiceTier.CRITICAL,
    domain = "commerce",
    onCallChannel = "#payments-oncall"
)
@SpringBootApplication
public class PaymentServiceApplication { }

Parameter	Type	Default	Description
`service`	`String`	`""`	Service name (defaults to Spring application name)
`team`	`String`	required	Owning team identifier
`tier`	`ServiceTier`	`STANDARD`	Service criticality tier
`domain`	`String`	`""`	Business domain
`onCallChannel`	`String`	`""`	Escalation channel

@DeclareDependency¶

Declares a service dependency. Applied to the main application class. Repeatable.

@DeclareDependency(
    name = "postgres",
    type = DependencyType.DATABASE,
    criticality = DependencyCriticality.REQUIRED,
    timeoutMs = 5000,
    circuitBreaker = true
)
@DeclareDependency(
    name = "stripe-api",
    type = DependencyType.EXTERNAL_API,
    criticality = DependencyCriticality.REQUIRED,
    fallback = "Return cached pricing"
)

Parameter	Type	Default	Description
`name`	`String`	required	Dependency name
`type`	`DependencyType`	required	Dependency type
`criticality`	`DependencyCriticality`	`REQUIRED`	Impact of failure
`protocol`	`String`	`""`	Communication protocol
`timeoutMs`	`long`	`0`	Configured timeout in milliseconds
`circuitBreaker`	`boolean`	`false`	Whether circuit breaker is enabled
`fallback`	`String`	`""`	Fallback description
`healthEndpoint`	`String`	`""`	Health check endpoint
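
Because `@DeclareDependency` is repeatable, several declarations can sit on one class. The sketch below shows how Java's repeatable-annotation mechanism works in general and how the declarations could be read back with reflection. `Dep`, `Deps`, and `RepeatableSketch` are hypothetical stand-ins invented for this example; they are not AgentTel types, and AgentTel's own processing may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Dep and Deps are NOT AgentTel's annotations.
final class RepeatableSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Repeatable(Deps.class)
    @interface Dep {
        String name();
        long timeoutMs() default 0;
    }

    // Container annotation that the compiler fills in when @Dep is repeated.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Deps {
        Dep[] value();
    }

    @Dep(name = "postgres", timeoutMs = 5000)
    @Dep(name = "stripe-api")
    static final class App {}

    /** Collects the declared names; getAnnotationsByType unwraps the container. */
    static List<String> dependencyNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Dep dep : type.getAnnotationsByType(Dep.class)) {
            names.add(dep.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(dependencyNames(App.class)); // prints [postgres, stripe-api]
    }
}
```

Reading with `getAnnotationsByType` rather than `getAnnotation` matters here: once an annotation is repeated, the compiler wraps the occurrences in the container, and `getAnnotation(Dep.class)` would return `null`.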

@DeclareConsumer¶

Declares a downstream consumer of this service. Applied to the main application class. Repeatable.

@DeclareConsumer(
    name = "checkout-service",
    pattern = ConsumptionPattern.SYNCHRONOUS,
    slaLatencyMs = 200
)

Parameter	Type	Default	Description
`name`	`String`	required	Consumer service name
`pattern`	`ConsumptionPattern`	`SYNCHRONOUS`	How the consumer calls this service
`slaLatencyMs`	`long`	`0`	Consumer's latency SLA in milliseconds

@AgentOperation¶

Method-level annotation declaring operational semantics. Applied to Spring MVC/WebFlux endpoints or any traced method.

@AgentOperation(
    expectedLatencyP50 = "45ms",
    expectedLatencyP99 = "200ms",
    expectedErrorRate = 0.001,
    retryable = true,
    idempotent = true,
    runbookUrl = "https://wiki/runbooks/process-payment",
    fallbackDescription = "Return cached pricing",
    escalationLevel = EscalationLevel.PAGE_ONCALL,
    safeToRestart = true
)

Parameter	Type	Default	Description
`expectedLatencyP50`	`String`	`""`	Expected P50 latency (e.g., `"45ms"`, `"1.5s"`)
`expectedLatencyP99`	`String`	`""`	Expected P99 latency
`expectedErrorRate`	`double`	`0.0`	Expected error rate (0.0–1.0)
`retryable`	`boolean`	`false`	Whether the operation can be retried
`retryAfterMs`	`long`	`0`	Suggested retry delay in milliseconds
`idempotent`	`boolean`	`false`	Whether repeated calls produce the same result, making retries safe
`runbookUrl`	`String`	`""`	Operational runbook URL
`fallbackDescription`	`String`	`""`	Description of fallback behavior
`escalationLevel`	`EscalationLevel`	`NOTIFY_TEAM`	How to escalate issues
`safeToRestart`	`boolean`	`false`	Whether the service can be safely restarted
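
The latency parameters take duration strings such as `"45ms"` or `"1.5s"`. As a rough illustration of how such strings can be normalized to milliseconds, here is a minimal sketch. `LatencySpec` and `parseMillis` are hypothetical names, and the set of accepted units (`ms` and `s` only) is an assumption; AgentTel's actual parser may accept other formats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AgentTel: converts "45ms" / "1.5s" to milliseconds.
final class LatencySpec {
    // A non-negative decimal number followed by a unit; assumes only "ms" and "s".
    private static final Pattern FORMAT =
        Pattern.compile("^\\s*(\\d+(?:\\.\\d+)?)\\s*(ms|s)\\s*$");

    private LatencySpec() {}

    /** Returns the duration in milliseconds, or throws on an unrecognized format. */
    static double parseMillis(String spec) {
        if (spec == null) {
            throw new IllegalArgumentException("latency spec is null");
        }
        Matcher m = FORMAT.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized latency spec: " + spec);
        }
        double value = Double.parseDouble(m.group(1));
        return m.group(2).equals("s") ? value * 1000.0 : value;
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("45ms")); // prints 45.0
        System.out.println(parseMillis("1.5s")); // prints 1500.0
    }
}
```

In this sketch the empty string (the annotation default) is rejected, so a caller would need to treat `""` as "no expectation declared" before parsing.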

Programmatic API¶

AgentTelEngine¶

The main orchestrator. Builds and wires all core components.

AgentTelEngine engine = AgentTelEngine.builder()
    .openTelemetry(openTelemetry)
    .topology(topologyRegistry)
    .addStaticBaseline("POST /api/payments",
        new OperationBaseline(45.0, 200.0, 0.001))
    .patternMatcher(new PatternMatcher(2.0, 5.0, 3))
    .rollingBaselineProvider(new RollingBaselineProvider(1000, 10))
    .sloTracker(sloTracker)
    .build();

// Get the SpanProcessor to register with the OTel SDK
SpanProcessor processor = engine.createSpanProcessor();

TopologyRegistry¶

TopologyRegistry topology = new TopologyRegistry();
topology.setTeam("payments-platform");
topology.setTier(ServiceTier.CRITICAL);
topology.setDomain("commerce");
topology.setOnCallChannel("#payments-oncall");

topology.registerDependency(new DependencyDescriptor(
    "postgres", DependencyType.DATABASE, DependencyCriticality.REQUIRED,
    "postgresql", 5000, true, "", "/health/postgres"
));

topology.registerConsumer(new ConsumerDescriptor(
    "checkout-service", ConsumptionPattern.SYNCHRONOUS, 200
));

RollingBaselineProvider¶

RollingBaselineProvider rolling = new RollingBaselineProvider(1000, 10);

// Record observations (typically done by SpanProcessor)
rolling.record("POST /api/payments", 45.0, false);

// Query baseline
Optional<RollingWindow.Snapshot> snapshot = rolling.getSnapshot("POST /api/payments");
// When present, the snapshot exposes p50(), p99(), mean(), stddev(), and errorRate()

// As a BaselineProvider
Optional<OperationBaseline> baseline = rolling.getBaseline("POST /api/payments");
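
To make the two constructor arguments concrete, here is a minimal sketch of a bounded sliding window that withholds statistics until a minimum number of samples has arrived. It assumes the arguments correspond to the window size and minimum sample count; `RollingWindowSketch` is a hypothetical class, the nearest-rank percentile is one of several possible definitions, and AgentTel's internal implementation may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalDouble;

// Illustrative sketch only; AgentTel's RollingBaselineProvider may work differently.
final class RollingWindowSketch {
    private final int capacity;    // assumed to play the role of the window size
    private final int minSamples;  // assumed to play the role of the minimum sample count
    private final Deque<Double> latencies = new ArrayDeque<>();
    private final Deque<Boolean> errors = new ArrayDeque<>();

    RollingWindowSketch(int capacity, int minSamples) {
        if (capacity < 1 || minSamples < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.capacity = capacity;
        this.minSamples = minSamples;
    }

    /** Adds one observation, evicting the oldest once the window is full. */
    void record(double latencyMs, boolean error) {
        if (latencies.size() == capacity) {
            latencies.removeFirst();
            errors.removeFirst();
        }
        latencies.addLast(latencyMs);
        errors.addLast(error);
    }

    /** Nearest-rank percentile (p in 0..100); empty until minSamples observations exist. */
    OptionalDouble percentile(double p) {
        if (latencies.size() < minSamples) return OptionalDouble.empty();
        double[] sorted = latencies.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return OptionalDouble.of(sorted[Math.max(rank, 1) - 1]);
    }

    /** Fraction of observations in the window that were errors. */
    OptionalDouble errorRate() {
        if (errors.size() < minSamples) return OptionalDouble.empty();
        long failed = errors.stream().filter(e -> e).count();
        return OptionalDouble.of((double) failed / errors.size());
    }

    public static void main(String[] args) {
        RollingWindowSketch window = new RollingWindowSketch(5, 3);
        for (double ms : new double[] {10, 20, 30, 40, 50, 60}) {
            window.record(ms, ms == 60);
        }
        // The window now holds 20..60; the first observation (10) was evicted.
        System.out.println(window.percentile(50));
        System.out.println(window.errorRate());
    }
}
```

Returning an empty result below the sample threshold mirrors the `Optional` return types above: callers can tell "no baseline yet" apart from "baseline of zero".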

CompositeBaselineProvider¶

// Chain: static → rolling → default
CompositeBaselineProvider composite = new CompositeBaselineProvider(
    staticProvider, rollingProvider
);

// Returns the first non-empty baseline
Optional<OperationBaseline> baseline = composite.getBaseline("POST /api/payments");
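
The "first non-empty" rule can be sketched independently of AgentTel's types. In the example below each provider is modeled as a plain `Function<String, Optional<T>>`, which is an assumption made for illustration; `FirstNonEmptyChain` is a hypothetical class and not the library's `CompositeBaselineProvider`.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch; providers are plain Functions, not AgentTel's BaselineProvider.
final class FirstNonEmptyChain<T> {
    private final List<Function<String, Optional<T>>> providers;

    @SafeVarargs
    FirstNonEmptyChain(Function<String, Optional<T>>... providers) {
        this.providers = List.of(providers);
    }

    /** Asks each provider in order and returns the first non-empty answer. */
    Optional<T> lookup(String operation) {
        for (Function<String, Optional<T>> provider : providers) {
            Optional<T> result = provider.apply(operation);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up P50 values standing in for a static and a rolling source.
        Map<String, Double> staticP50 = Map.of("POST /api/payments", 45.0);
        Map<String, Double> rollingP50 = Map.of("POST /api/payments", 52.3, "GET /api/health", 3.1);

        FirstNonEmptyChain<Double> chain = new FirstNonEmptyChain<Double>(
            op -> Optional.ofNullable(staticP50.get(op)),
            op -> Optional.ofNullable(rollingP50.get(op)));

        System.out.println(chain.lookup("POST /api/payments")); // prints Optional[45.0]
        System.out.println(chain.lookup("GET /api/health"));    // prints Optional[3.1]
        System.out.println(chain.lookup("DELETE /api/unknown")); // prints Optional.empty
    }
}
```

Ordering the static source first means an explicitly declared baseline always wins, and the rolling source only fills in operations that have no declaration.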

SloTracker¶

SloTracker tracker = new SloTracker();

tracker.register(SloDefinition.builder("payment-availability")
    .operationName("POST /api/payments")
    .type(SloDefinition.SloType.AVAILABILITY)
    .target(0.999)  // 99.9%
    .build());

tracker.register(SloDefinition.builder("payment-latency")
    .operationName("POST /api/payments")
    .type(SloDefinition.SloType.LATENCY_P99)
    .target(0.99)   // 99% under P99 threshold
    .build());

// Record (typically done by SpanProcessor)
tracker.recordSuccess("POST /api/payments");
tracker.recordFailure("POST /api/payments");

// Query
SloStatus status = tracker.getStatus("payment-availability");
// status.target()          → 0.999
// status.actual()          → 0.9995
// status.budgetRemaining() → 0.50
// status.burnRate()        → 0.50

// Alert check
List<SloAlert> alerts = tracker.checkAlerts();
// alerts.get(0).severity() → CRITICAL | WARNING | INFO
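
For readers unfamiliar with error budgets, the sketch below works through one common way to relate a target, an observed success ratio, a burn rate, and the remaining budget. These formulas are an assumption for illustration; `ErrorBudgetSketch` is a hypothetical class, and AgentTel's `SloStatus` may define or window these values differently.

```java
// Illustrative arithmetic only; AgentTel's SloStatus may define these values differently.
final class ErrorBudgetSketch {
    private ErrorBudgetSketch() {}

    /** Observed success ratio; 1.0 when nothing has been recorded. */
    static double actual(long successes, long failures) {
        long total = successes + failures;
        return total == 0 ? 1.0 : (double) successes / total;
    }

    /** Observed error rate divided by the error rate the target allows. */
    static double burnRate(double target, long successes, long failures) {
        if (target <= 0.0 || target >= 1.0) {
            throw new IllegalArgumentException("target must be strictly between 0 and 1");
        }
        double allowedErrorRate = 1.0 - target;
        double observedErrorRate = 1.0 - actual(successes, failures);
        return observedErrorRate / allowedErrorRate;
    }

    /** Fraction of the error budget still unspent; negative once the budget is exceeded. */
    static double budgetRemaining(double target, long successes, long failures) {
        return 1.0 - burnRate(target, successes, failures);
    }

    public static void main(String[] args) {
        // 10,000 requests with 5 failures against a 99.9% target.
        System.out.println(actual(9_995, 5));
        System.out.println(burnRate(0.999, 9_995, 5));
        System.out.println(budgetRemaining(0.999, 9_995, 5));
    }
}
```

Under these definitions, 5 failures in 10,000 requests against a 99.9% target uses about half of the budget: the target tolerates roughly 10 failures, so both the burn rate and the remaining budget come out near 0.5.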

PatternMatcher¶

PatternMatcher matcher = new PatternMatcher(2.0, 5.0, 3);

// Feed observations (typically done by SpanProcessor)
matcher.recordLatency("POST /api/payments", 312.0);
matcher.recordDependencyError("stripe-api");

// Detect patterns
List<IncidentPattern> patterns = matcher.detectPatterns(
    "POST /api/payments", 312.0, true, baselineSnapshot
);
// patterns may contain: LATENCY_DEGRADATION, CASCADE_FAILURE, etc.

ServiceHealthAggregator¶

ServiceHealthAggregator health = new ServiceHealthAggregator(rollingBaselines, sloTracker);

// Feed span data
health.recordSpan("POST /api/payments", 312.0, false);
health.recordDependencyCall("stripe-api", 2100.0, true);

// Query
ServiceHealthSummary summary = health.getHealthSummary("payment-service");
Optional<OperationSummary> op = health.getOperationHealth("POST /api/payments");

MCP Server¶

McpServer server = new AgentTelMcpServerBuilder()
    .port(8081)
    .contextProvider(agentContextProvider)
    .remediationExecutor(remediationExecutor)
    .build();

// Register custom tools
server.registerTool(
    new McpToolDefinition("custom_tool", "Description",
        Map.of("param", new ParameterDefinition("string", "Param description")),
        List.of("param")),
    args -> "Result: " + args.get("param")
);

server.start();
// ...
server.stop();

Configuration Properties¶

All properties sit under the `agenttel` prefix in `application.yml` or `application.properties`.

Topology¶

agenttel:
  topology:
    team: payments-platform       # Owning team
    tier: critical                # critical | standard | internal | experimental
    domain: commerce              # Business domain
    on-call-channel: "#payments-oncall"
    repo-url: "https://github.com/org/payment-service"

Dependencies¶

agenttel:
  dependencies:
    - name: postgres
      type: database              # database | internal_service | external_api | message_broker | cache | ...
      criticality: required       # required | degraded | optional
      protocol: postgresql
      timeout-ms: 5000
      circuit-breaker: true
      fallback: "Return cached data"
      health-endpoint: "/health/postgres"
    - name: stripe-api
      type: external_api
      criticality: required

Consumers¶

agenttel:
  consumers:
    - name: checkout-service
      pattern: synchronous        # synchronous | asynchronous | batch | streaming
      sla-latency-ms: 200

Baselines¶

agenttel:
  baselines:
    rolling-window-size: 1000     # Observations per sliding window (default: 1000)
    rolling-min-samples: 10       # Min samples before baseline is valid (default: 10)

Anomaly Detection¶

agenttel:
  anomaly-detection:
    z-score-threshold: 3.0        # Z-score threshold for anomaly detection (default: 3.0)
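
A z-score expresses how many standard deviations an observation sits from the baseline mean. The sketch below shows the arithmetic behind a threshold of this kind. `ZScoreSketch` is a hypothetical class, and the one-sided comparison (only slower-than-usual latencies are flagged) is an assumption; AgentTel's detector may apply the threshold differently.

```java
// Illustrative sketch; AgentTel's anomaly detector may apply the threshold differently.
final class ZScoreSketch {
    private ZScoreSketch() {}

    /** Standard deviations between the observation and the mean; 0.0 when there is no spread. */
    static double zScore(double observed, double mean, double stddev) {
        if (stddev <= 0.0) {
            return 0.0; // no variation recorded yet, so nothing can be called unusual
        }
        return (observed - mean) / stddev;
    }

    /** Assumes a one-sided check: only values above the mean can be anomalous. */
    static boolean isAnomalous(double observed, double mean, double stddev, double threshold) {
        return zScore(observed, mean, stddev) > threshold;
    }

    public static void main(String[] args) {
        double mean = 45.0;    // made-up baseline mean latency in ms
        double stddev = 12.0;  // made-up baseline standard deviation in ms

        System.out.println(zScore(312.0, mean, stddev));           // prints 22.25
        System.out.println(isAnomalous(312.0, mean, stddev, 3.0)); // prints true
        System.out.println(isAnomalous(60.0, mean, stddev, 3.0));  // prints false
    }
}
```

Lowering the threshold makes detection more sensitive at the cost of more false positives; a value of 3.0 flags only observations more than three standard deviations above the mean.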

Enums Reference¶

ServiceTier¶

Value	Description
`CRITICAL`	User-facing, revenue-impacting
`STANDARD`	Important but not immediately revenue-impacting
`INTERNAL`	Internal tooling
`EXPERIMENTAL`	Non-production

DependencyType¶

Value	Description
`INTERNAL_SERVICE`	Another service in the same organization
`EXTERNAL_API`	Third-party API
`DATABASE`	Database (SQL or NoSQL)
`MESSAGE_BROKER`	Kafka, RabbitMQ, SQS, etc.
`CACHE`	Redis, Memcached, etc.
`OBJECT_STORE`	S3, GCS, etc.
`IDENTITY_PROVIDER`	Auth0, Okta, etc.

DependencyCriticality¶

Value	Description
`REQUIRED`	Failure causes outage
`DEGRADED`	Failure causes reduced functionality
`OPTIONAL`	Failure has no user impact

EscalationLevel¶

Value	Description
`AUTO_RESOLVE`	Agent can handle autonomously
`NOTIFY_TEAM`	Async notification
`PAGE_ONCALL`	Page on-call engineer
`INCIDENT_COMMANDER`	Full incident management

ConsumptionPattern¶

Value	Description
`SYNCHRONOUS`	Request-response
`ASYNCHRONOUS`	Fire-and-forget or callback
`BATCH`	Periodic bulk processing
`STREAMING`	Continuous data flow

IncidentPattern¶

Value	Description
`CASCADE_FAILURE`	Multiple downstream failures
`MEMORY_LEAK`	Monotonically increasing latency
`THUNDERING_HERD`	Traffic spike after recovery
`COLD_START`	High latency on fresh instances
`ERROR_RATE_SPIKE`	Sudden error rate increase
`LATENCY_DEGRADATION`	Sustained latency elevation