Skip to content

Semantic Conventions

AgentTel defines a set of semantic convention extensions to OpenTelemetry, organized into categories of agent-ready attributes plus structured events. Backend attributes use the agenttel.* namespace and frontend attributes use the agenttel.client.* namespace. All coexist with standard OTel conventions.

Design Philosophy

Standard OpenTelemetry conventions answer "What happened?" — an HTTP span records the method, URL, status code, and duration. AgentTel adds "What does an AI agent need to know to reason about and act on this?" — the behavioral baseline, whether retrying is safe, who to page, and what the dependency graph looks like.


1. Topology Attributes

Service identity and dependency graph. Set as resource attributes at startup.

Service Identity

Attribute Type Description Example
agenttel.topology.team string Owning team identifier "payments-platform"
agenttel.topology.tier string Service criticality tier "critical"
agenttel.topology.domain string Business domain "commerce"
agenttel.topology.on_call_channel string Escalation channel "#payments-oncall"
agenttel.topology.repo_url string Source repository URL "https://github.com/org/repo"

Service Tiers

Tier Value Meaning
Critical "critical" User-facing, revenue-impacting. Pages on-call immediately.
Standard "standard" Important but not immediately revenue-impacting.
Internal "internal" Internal tooling and infrastructure.
Experimental "experimental" Non-production or experimental services.

Dependency Graph

Attribute Type Description
agenttel.topology.dependencies string (JSON) JSON array of dependency descriptors
agenttel.topology.consumers string (JSON) JSON array of consumer descriptors

Dependency Descriptor Schema:

{
  "name": "postgres",
  "type": "database",
  "criticality": "required",
  "protocol": "postgresql",
  "timeout_ms": 5000,
  "circuit_breaker": true,
  "fallback": "Return cached data",
  "health_endpoint": "/health/postgres"
}

Dependency Types: internal_service, external_api, database, message_broker, cache, object_store, identity_provider

Dependency Criticality:

Level Value Meaning
Required "required" Failure causes outage. No fallback.
Degraded "degraded" Failure causes reduced functionality. Partial fallback available.
Optional "optional" Failure has no direct user impact.

Consumer Descriptor Schema:

{
  "name": "checkout-service",
  "consumption_pattern": "synchronous",
  "sla_latency_ms": 200
}

Consumption Patterns: synchronous, asynchronous, batch, streaming


2. Baseline Attributes

What "normal" looks like for each operation. Set as span attributes by the AgentTelSpanProcessor.

Attribute Type Description Example
agenttel.baseline.latency_p50_ms double Expected P50 latency 45.0
agenttel.baseline.latency_p99_ms double Expected P99 latency 200.0
agenttel.baseline.error_rate double Expected error rate (0.0–1.0) 0.001
agenttel.baseline.throughput_rps double Expected requests per second 150.0
agenttel.baseline.source string How the baseline was determined "static"

Baseline Sources

Source Value Description
Static "static" From @AgentOperation annotation or configuration file
Rolling "rolling" Computed from a sliding window of observed traffic
Composite "composite" Static baseline with rolling fallback for gaps
Default "default" System default when no baseline is available

Rolling Baseline Metrics

The RollingBaselineProvider maintains per-operation sliding windows that compute:

Metric Description
P50, P95, P99 Latency percentiles from observed traffic
Mean, Stddev Statistical summary for z-score anomaly detection
Error Rate Observed error rate over the window
Sample Count Number of observations in the current window

Baseline Confidence

Added at export time by the AgentTelEnrichingSpanExporter. Tells agents how much to trust the baseline.

Attribute Type Description Example
agenttel.baseline.sample_count long Number of observations in current baseline 250
agenttel.baseline.confidence string Confidence level based on sample count "high"
Sample Count Confidence Meaning
< 30 "low" Baseline is unreliable — insufficient data
30–200 "medium" Baseline is usable but may not capture edge cases
> 200 "high" Baseline is reliable and statistically significant

Configuration

Property Default Description
agenttel.baselines.rolling-window-size 1000 Number of observations per sliding window
agenttel.baselines.rolling-min-samples 10 Minimum samples before a rolling baseline is considered valid

3. Decision Attributes

What an AI agent is permitted and equipped to do. Set as span attributes from @AgentOperation annotations.

Attribute Type Description Example
agenttel.decision.retryable boolean Whether the operation can be retried true
agenttel.decision.retry_after_ms long Suggested retry delay in milliseconds 1000
agenttel.decision.idempotent boolean Whether repeated calls produce the same result true
agenttel.decision.fallback_available boolean Whether a fallback path exists true
agenttel.decision.fallback_description string Human-readable fallback description "Return cached pricing"
agenttel.decision.runbook_url string Link to operational runbook "https://wiki/..."
agenttel.decision.escalation_level string Escalation procedure "page_oncall"
agenttel.decision.safe_to_restart boolean Whether service restart is safe during this operation true

Escalation Levels

Level Value Meaning
Auto-Resolve "auto_resolve" Agent can handle autonomously without human involvement
Notify Team "notify_team" Send asynchronous notification to the owning team
Page On-Call "page_oncall" Page the on-call engineer immediately
Incident Commander "incident_commander" Escalate to incident management process

4. Anomaly Attributes

Real-time deviation detection. Set as span attributes by the AgentTelSpanProcessor when anomalous behavior is detected.

Attribute Type Description Example
agenttel.anomaly.detected boolean Whether an anomaly was detected on this span true
agenttel.anomaly.pattern string Identified incident pattern "cascade_failure"
agenttel.anomaly.score double Anomaly severity score (0.0–1.0) 0.85
agenttel.anomaly.latency_z_score double Z-score of latency deviation from baseline 4.2

Incident Patterns

Pattern Value Detection Method Description
Cascade Failure "cascade_failure" 3+ dependencies with errors in recent window Multiple downstream services failing simultaneously
Latency Degradation "latency_degradation" Current latency > 2x rolling P50 Sustained latency elevation above baseline
Error Rate Spike "error_rate_spike" Recent error rate > 5x baseline Sudden increase in error rate
Memory Leak "memory_leak" Positive slope in latency linear regression Monotonically increasing latency trend
Thundering Herd "thundering_herd" Traffic burst exceeding normal patterns Sudden traffic spike after recovery
Cold Start "cold_start" High latency with low request count Elevated latency on fresh instances

Detection Configuration

Property Default Description
agenttel.anomaly-detection.z-score-threshold 3.0 Z-score above which latency is anomalous
latencyDegradationThreshold 2.0 Multiplier over P50 to trigger degradation pattern
errorRateSpikeThreshold 5.0 Multiplier over baseline error rate to trigger spike pattern
cascadeFailureMinServices 3 Minimum failing dependencies for cascade detection

5. Error Classification Attributes

Structured error categorization added at export time by the AgentTelEnrichingSpanExporter. Tells agents why a span failed, not just that it failed.

Attribute Type Description Example
agenttel.error.category string Error category for agent decision-making "dependency_timeout"
agenttel.error.root_exception string Root exception class name "java.net.SocketTimeoutException"
agenttel.error.dependency string Dependency involved in the error (if applicable) "postgres"

Error Categories

Category Value Classification Rules Agent Action
Dependency Timeout "dependency_timeout" Exception contains Timeout/SocketTimeout Retry with backoff, check dependency health
Connection Error "connection_error" Exception contains Connection/ConnectException Check dependency availability, circuit break
Code Bug "code_bug" NullPointer, ClassCast, IndexOutOfBounds, IllegalState Do not retry — needs code fix
Rate Limited "rate_limited" HTTP 429 Back off, reduce traffic, request quota increase
Auth Failure "auth_failure" HTTP 401/403 Check credentials/tokens, do not retry
Resource Exhaustion "resource_exhaustion" OutOfMemory, StackOverflow Scale up, restart instances
Data Validation "data_validation" HTTP 400/422, Validation/IllegalArgument exceptions Do not retry — fix input
Unknown "unknown" Everything else Investigate manually

6. Causality & Severity Attributes

Root cause analysis and business impact assessment, added at export time by the AgentTelEnrichingSpanExporter.

Causality Attributes

Attribute Type Description Example
agenttel.cause.hint string Human-readable cause description "Dependency postgres is unhealthy: Connection refused"
agenttel.cause.category string Cause category "dependency"
agenttel.cause.dependency string Specific dependency if cause is dependency-related "postgres"

Cause Categories: dependency, code, infrastructure, traffic, unknown

Severity Attributes

Attribute Type Description Example
agenttel.severity.anomaly_score double Anomaly score (mirrors anomaly.score) 0.85
agenttel.severity.user_facing boolean Whether this affects user-facing services true
agenttel.severity.business_impact string Business impact level "critical"

Business Impact Levels:

Impact Condition
"critical" Anomaly score > 0.8
"high" Error on critical-tier service
"medium" Error on standard service or moderate anomaly
"low" Minor anomaly or data validation error

7. Change Correlation Attributes

Correlates anomalies with recent changes. Added to incident context by the ChangeCorrelationEngine.

Attribute Type Description Example
agenttel.correlation.likely_cause string Most likely change type "deployment"
agenttel.correlation.change_id string ID of the correlated change "deploy-v2.1.0"
agenttel.correlation.time_delta_ms long Time between change and anomaly onset 1800000
agenttel.correlation.confidence double Correlation confidence (0.0–1.0) 0.85

Change Types: DEPLOYMENT, CONFIG, SCALING, FEATURE_FLAG, DEPENDENCY_UPDATE


SLO Attributes

Error budget consumption tracking. Set as span attributes when SLOs are registered.

Attribute Type Description Example
agenttel.slo.name string SLO identifier "payment-availability"
agenttel.slo.target double SLO target (0.0–1.0) 0.999
agenttel.slo.budget_remaining double Remaining error budget fraction (0.0–1.0) 0.85
agenttel.slo.burn_rate double Budget consumption rate 0.15

SLO Types

Type Description Example Target
AVAILABILITY Percentage of successful (non-error) requests 99.9%
LATENCY_P99 Percentage of requests completing under P99 threshold 99.0%
LATENCY_P50 Percentage of requests completing under P50 threshold 95.0%
ERROR_RATE Maximum acceptable error rate 0.1%

Alert Thresholds

Budget alerts are emitted when remaining budget crosses these thresholds:

Remaining Budget Severity Action
<= 50% INFO Informational — budget consumption is elevated
<= 25% WARNING Warning — budget at risk of exhaustion
<= 10% CRITICAL Critical — budget nearly exhausted, immediate action needed

6. GenAI Attributes

Extensions for AI/ML workload observability. Set on spans created by GenAI instrumentation wrappers.

Standard OTel GenAI Attributes

AgentTel populates the emerging OTel GenAI semantic conventions:

Attribute Type Description
gen_ai.operation.name string "chat", "text_completion", "embeddings"
gen_ai.system string Provider: "openai", "anthropic", "aws_bedrock"
gen_ai.request.model string Requested model identifier
gen_ai.response.model string Actual model used in response
gen_ai.usage.input_tokens long Input/prompt token count
gen_ai.usage.output_tokens long Output/completion token count
gen_ai.response.finish_reasons string[] Completion stop reasons

AgentTel GenAI Extensions

Attribute Type Description Example
agenttel.genai.framework string Instrumentation source "langchain4j", "spring_ai"
agenttel.genai.cost_usd double Estimated cost in USD 0.000795
agenttel.genai.prompt_template_id string Prompt template identifier "customer-support-v2"
agenttel.genai.prompt_template_version string Prompt template version "1.3"
agenttel.genai.rag_source_count long Number of RAG sources retrieved 5
agenttel.genai.rag_relevance_score_avg double Average relevance score 0.87
agenttel.genai.guardrail_triggered boolean Whether a guardrail fired false
agenttel.genai.guardrail_name string Name of triggered guardrail "pii_filter"
agenttel.genai.cache_hit boolean Whether a cached response was used false

7. Frontend Attributes

Client-side telemetry from agenttel-web (browser SDK). Set on spans emitted by the browser and exported via OTLP.

Resource Attributes

Set once per browser application at initialization.

Attribute Type Description Example
agenttel.client.app.name string Application name "checkout-web"
agenttel.client.app.version string Application version "1.0.0"
agenttel.client.app.platform string Platform identifier "browser"
agenttel.client.app.environment string Deployment environment "production"
agenttel.client.topology.team string Owning team "checkout-frontend"
agenttel.client.topology.domain string Business domain "commerce"

Page & Route Attributes

Attribute Type Description Example
agenttel.client.page.url string Current page URL (path only, no query/hash) "/checkout/payment"
agenttel.client.page.route string Matched route pattern "/checkout/:step"
agenttel.client.page.title string Document title "Checkout - Payment"
agenttel.client.page.business_criticality string Route business criticality "revenue"

Business Criticality Values: revenue, engagement, internal

Baseline Attributes

Per-route baselines for frontend operations.

Attribute Type Description Example
agenttel.client.baseline.page_load_p50_ms double Expected page load P50 800.0
agenttel.client.baseline.page_load_p99_ms double Expected page load P99 2000.0
agenttel.client.baseline.api_call_p50_ms double Expected API response P50 300.0
agenttel.client.baseline.error_rate double Expected error rate (0.0–1.0) 0.01

Decision Attributes

Per-route decision metadata for agent reasoning.

Attribute Type Description Example
agenttel.client.decision.escalation_level string Escalation procedure "page_oncall"
agenttel.client.decision.runbook_url string Operational runbook "https://wiki/runbooks/checkout"
agenttel.client.decision.fallback_page string Fallback route on failure "/maintenance"
agenttel.client.decision.retry_on_failure boolean Whether to retry failed page loads true

Anomaly Attributes

Client-side anomaly detection results.

Attribute Type Description Example
agenttel.client.anomaly.detected boolean Whether a client-side anomaly was detected true
agenttel.client.anomaly.pattern string Detected anomaly pattern "rage_click"
agenttel.client.anomaly.score double Anomaly severity (0.0–1.0) 0.75

Client-Side Anomaly Patterns:

Pattern Value Detection Description
Rage Click "rage_click" N+ clicks on same element within time window User frustration — UI is unresponsive
API Failure Cascade "api_failure_cascade" N+ API failures within time window Backend instability visible to user
Slow Page Load "slow_page_load" Load time exceeds baseline by multiplier Performance degradation on route
Error Loop "error_loop" N+ errors on same route within time window Repeating failure preventing user progress
Funnel Drop-off "funnel_dropoff" Journey abandonment above baseline User journey failing at specific step

Journey Attributes

Multi-step user journey tracking.

Attribute Type Description Example
agenttel.client.journey.name string Journey identifier "checkout"
agenttel.client.journey.step int Current step index (0-based) 3
agenttel.client.journey.step_name string Step route/name "/checkout/payment"
agenttel.client.journey.status string Journey status "in_progress"
agenttel.client.journey.duration_ms double Time since journey start 45000.0

Journey Status Values: in_progress, completed, abandoned

Correlation Attributes

Cross-stack trace linking between frontend and backend.

Attribute Type Description Example
agenttel.client.correlation.backend_trace_id string Backend trace ID from response "abc123def456"
agenttel.client.correlation.traceparent string W3C Trace Context header sent "00-abc...-01"

Page Load Attributes

Captured from the Navigation Timing API on page load spans.

Attribute Type Description Example
agenttel.client.page_load.dom_load_ms double DOM content loaded time 450.0
agenttel.client.page_load.ttfb_ms double Time to first byte 120.0
agenttel.client.page_load.transfer_size_bytes long Page transfer size 245000

API Call Attributes

Captured from intercepted fetch and XMLHttpRequest calls.

Attribute Type Description Example
agenttel.client.api.method string HTTP method "POST"
agenttel.client.api.url string Request URL (path only) "/api/payments"
agenttel.client.api.status_code int Response status code 200
agenttel.client.api.duration_ms double Response time 312.0

Anomaly Detection Configuration

Property Default Description
rageClickThreshold 3 Clicks on same element to trigger rage click
rageClickWindowMs 2000 Time window for rage click detection
apiFailureCascadeThreshold 3 API failures to trigger cascade
apiFailureCascadeWindowMs 10000 Time window for cascade detection
slowPageLoadMultiplier 2.0 Multiplier over baseline P50 to trigger slow load
errorLoopThreshold 5 Errors on same route to trigger error loop
errorLoopWindowMs 30000 Time window for error loop detection

8. Structured Events

AgentTel emits structured events via the OTel Logs API for significant state changes that agents should react to.

agenttel.anomaly.detected

Emitted when a span's behavior deviates significantly from baseline.

{
  "event.name": "agenttel.anomaly.detected",
  "severity": "WARN",
  "body": {
    "operation": "POST /api/payments",
    "pattern": "latency_degradation",
    "anomaly_score": 0.85,
    "z_score": 4.2,
    "current_latency_ms": 312.0,
    "baseline_p50_ms": 45.0
  }
}

agenttel.slo.budget_alert

Emitted when an SLO's error budget crosses a threshold (50%, 25%, 10%).

{
  "event.name": "agenttel.slo.budget_alert",
  "severity": "WARN",
  "body": {
    "slo_name": "payment-availability",
    "severity": "WARNING",
    "budget_remaining": 0.22,
    "burn_rate": 0.78
  }
}

agenttel.dependency.state_change

Emitted when a dependency's observed health transitions.

{
  "event.name": "agenttel.dependency.state_change",
  "severity": "WARN",
  "body": {
    "dependency": "postgres",
    "previous_state": "healthy",
    "current_state": "degraded",
    "error_rate": 0.15
  }
}

Relationship to OpenTelemetry

AgentTel is a strict extension of OpenTelemetry. Backend attributes use the agenttel.* namespace, frontend attributes use agenttel.client.*, and GenAI attributes use the emerging gen_ai.* conventions. AgentTel-enriched spans remain fully compatible with any OTel backend — Jaeger, Zipkin, Grafana Tempo, Datadog, Splunk, New Relic, and others.

The backend library implements standard OTel interfaces (SpanProcessor, SpanExporter, Resource) and composes cleanly with any other OTel instrumentation. The frontend SDK exports spans via OTLP HTTP to any OTel-compatible collector.