Semantic Conventions¶

AgentTel defines a set of semantic convention extensions to OpenTelemetry, organized into categories of agent-ready attributes plus structured events. Backend attributes use the agenttel.* namespace and frontend attributes use the agenttel.client.* namespace. Both namespaces coexist with standard OTel conventions and do not replace them.

Design Philosophy¶

Standard OpenTelemetry conventions answer "What happened?" — an HTTP span records the method, URL, status code, and duration. AgentTel adds "What does an AI agent need to know to reason about and act on this?" — the behavioral baseline, whether retrying is safe, who to page, and what the dependency graph looks like.

1. Topology Attributes¶

Service identity and dependency graph. Set as resource attributes at startup.

Service Identity¶

Attribute	Type	Description	Example
`agenttel.topology.team`	string	Owning team identifier	`"payments-platform"`
`agenttel.topology.tier`	string	Service criticality tier	`"critical"`
`agenttel.topology.domain`	string	Business domain	`"commerce"`
`agenttel.topology.on_call_channel`	string	Escalation channel	`"#payments-oncall"`
`agenttel.topology.repo_url`	string	Source repository URL	`"https://github.com/org/repo"`

Service Tiers¶

Tier	Value	Meaning
Critical	`"critical"`	User-facing, revenue-impacting. Pages on-call immediately.
Standard	`"standard"`	Important but not immediately revenue-impacting.
Internal	`"internal"`	Internal tooling and infrastructure.
Experimental	`"experimental"`	Non-production or experimental services.

Dependency Graph¶

Attribute	Type	Description
`agenttel.topology.dependencies`	string (JSON)	JSON array of dependency descriptors
`agenttel.topology.consumers`	string (JSON)	JSON array of consumer descriptors

Dependency Descriptor Schema:

{
  "name": "postgres",
  "type": "database",
  "criticality": "required",
  "protocol": "postgresql",
  "timeout_ms": 5000,
  "circuit_breaker": true,
  "fallback": "Return cached data",
  "health_endpoint": "/health/postgres"
}

Dependency Types: internal_service, external_api, database, message_broker, cache, object_store, identity_provider

Dependency Criticality:

Level	Value	Meaning
Required	`"required"`	Failure causes outage. No fallback.
Degraded	`"degraded"`	Failure causes reduced functionality. Partial fallback available.
Optional	`"optional"`	Failure has no direct user impact.

Consumer Descriptor Schema:

{
  "name": "checkout-service",
  "consumption_pattern": "synchronous",
  "sla_latency_ms": 200
}

Consumption Patterns: synchronous, asynchronous, batch, streaming
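
Because the dependency and consumer graphs are carried as JSON strings, a producer has to serialize the descriptor arrays before setting them as attributes. The following is a minimal Python sketch of that step, not AgentTel's implementation. The helper name `topology_attributes` is hypothetical, and the validation it performs is an assumption based only on the value lists above.

```python
import json

# Allowed values, copied from the lists in this section.
DEPENDENCY_TYPES = {
    "internal_service", "external_api", "database", "message_broker",
    "cache", "object_store", "identity_provider",
}
CRITICALITY_LEVELS = {"required", "degraded", "optional"}
CONSUMPTION_PATTERNS = {"synchronous", "asynchronous", "batch", "streaming"}


def topology_attributes(team, tier, dependencies, consumers):
    """Hypothetical helper: validate descriptors and flatten them to attributes."""
    for dep in dependencies:
        if dep["type"] not in DEPENDENCY_TYPES:
            raise ValueError(f"unknown dependency type: {dep['type']}")
        if dep["criticality"] not in CRITICALITY_LEVELS:
            raise ValueError(f"unknown criticality: {dep['criticality']}")
    for consumer in consumers:
        if consumer["consumption_pattern"] not in CONSUMPTION_PATTERNS:
            raise ValueError(f"unknown pattern: {consumer['consumption_pattern']}")
    return {
        "agenttel.topology.team": team,
        "agenttel.topology.tier": tier,
        # Attribute values are flat, so each array is serialized to a JSON string.
        "agenttel.topology.dependencies": json.dumps(dependencies),
        "agenttel.topology.consumers": json.dumps(consumers),
    }


attrs = topology_attributes(
    team="payments-platform",
    tier="critical",
    dependencies=[{"name": "postgres", "type": "database", "criticality": "required"}],
    consumers=[{"name": "checkout-service", "consumption_pattern": "synchronous"}],
)
```

A consumer of these attributes reverses the step with `json.loads` to recover the descriptor list.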

2. Baseline Attributes¶

What "normal" looks like for each operation. Set as span attributes by the AgentTelSpanProcessor.

Attribute	Type	Description	Example
`agenttel.baseline.latency_p50_ms`	double	Expected P50 latency	`45.0`
`agenttel.baseline.latency_p99_ms`	double	Expected P99 latency	`200.0`
`agenttel.baseline.error_rate`	double	Expected error rate (0.0–1.0)	`0.001`
`agenttel.baseline.throughput_rps`	double	Expected requests per second	`150.0`
`agenttel.baseline.source`	string	How the baseline was determined	`"static"`

Baseline Sources¶

Source	Value	Description
Static	`"static"`	From `@AgentOperation` annotation or configuration file
Rolling	`"rolling"`	Computed from a sliding window of observed traffic
Composite	`"composite"`	Static baseline with rolling fallback for gaps
Default	`"default"`	System default when no baseline is available

Rolling Baseline Metrics¶

The RollingBaselineProvider maintains per-operation sliding windows that compute:

Metric	Description
P50, P95, P99	Latency percentiles from observed traffic
Mean, Stddev	Statistical summary for z-score anomaly detection
Error Rate	Observed error rate over the window
Sample Count	Number of observations in the current window

Baseline Confidence¶

Added at export time by the AgentTelEnrichingSpanExporter. Tells agents how much to trust the baseline.

Attribute	Type	Description	Example
`agenttel.baseline.sample_count`	long	Number of observations in current baseline	`250`
`agenttel.baseline.confidence`	string	Confidence level based on sample count	`"high"`

Sample Count	Confidence	Meaning
< 30	`"low"`	Baseline is unreliable — insufficient data
30–200	`"medium"`	Baseline is usable but may not capture edge cases
> 200	`"high"`	Baseline is reliable and statistically significant
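
The sample-count thresholds in the table above translate directly into a small mapping. This Python sketch uses a hypothetical function name, `baseline_confidence`, and assumes the boundaries are inclusive exactly as the table states them (30 and 200 both fall in the medium band).

```python
def baseline_confidence(sample_count: int) -> str:
    """Map a baseline's sample count to the confidence label from the table."""
    if sample_count < 30:
        return "low"
    if sample_count <= 200:
        return "medium"
    return "high"
```

An agent might, for example, refuse to act on a z-score when the label is `"low"`, since the mean and standard deviation behind it rest on too few observations.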

Configuration¶

Property	Default	Description
`agenttel.baselines.rolling-window-size`	`1000`	Number of observations per sliding window
`agenttel.baselines.rolling-min-samples`	`10`	Minimum samples before a rolling baseline is considered valid
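
To make the two properties concrete, here is a minimal Python sketch of a bounded sliding window that produces the metrics listed under Rolling Baseline Metrics. It is an illustration only: the class name `RollingWindow`, the nearest-rank percentile method, and the use of population standard deviation are all assumptions, and the actual RollingBaselineProvider may compute these differently.

```python
import math
from collections import deque
from statistics import mean, pstdev


class RollingWindow:
    """Hypothetical per-operation sliding window over recent observations."""

    def __init__(self, window_size=1000, min_samples=10):
        # deque(maxlen=...) silently evicts the oldest entry once full.
        self.latencies = deque(maxlen=window_size)
        self.errors = deque(maxlen=window_size)
        self.min_samples = min_samples

    def record(self, latency_ms, is_error=False):
        self.latencies.append(latency_ms)
        self.errors.append(1 if is_error else 0)

    def _percentile(self, q):
        # Nearest-rank percentile (an assumed method, chosen for simplicity).
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]

    def snapshot(self):
        """Return the current metrics, or None below the minimum sample count."""
        n = len(self.latencies)
        if n < self.min_samples:
            return None
        return {
            "p50": self._percentile(0.50),
            "p95": self._percentile(0.95),
            "p99": self._percentile(0.99),
            "mean": mean(self.latencies),
            "stddev": pstdev(self.latencies),
            "error_rate": sum(self.errors) / n,
            "sample_count": n,
        }
```

Returning `None` below the minimum sample count mirrors the idea that a rolling baseline is not considered valid until enough traffic has been observed.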

3. Decision Attributes¶

What an AI agent is permitted and equipped to do. Set as span attributes from @AgentOperation annotations.

Attribute	Type	Description	Example
`agenttel.decision.retryable`	boolean	Whether the operation can be retried	`true`
`agenttel.decision.retry_after_ms`	long	Suggested retry delay in milliseconds	`1000`
`agenttel.decision.idempotent`	boolean	Whether repeated calls produce the same result	`true`
`agenttel.decision.fallback_available`	boolean	Whether a fallback path exists	`true`
`agenttel.decision.fallback_description`	string	Human-readable fallback description	`"Return cached pricing"`
`agenttel.decision.runbook_url`	string	Link to operational runbook	`"https://wiki/..."`
`agenttel.decision.escalation_level`	string	Escalation procedure	`"page_oncall"`
`agenttel.decision.safe_to_restart`	boolean	Whether service restart is safe during this operation	`true`

Escalation Levels¶

Level	Value	Meaning
Auto-Resolve	`"auto_resolve"`	Agent can handle autonomously without human involvement
Notify Team	`"notify_team"`	Send asynchronous notification to the owning team
Page On-Call	`"page_oncall"`	Page the on-call engineer immediately
Incident Commander	`"incident_commander"`	Escalate to incident management process
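
One way an agent could consume these attributes is sketched below in Python. The policy itself is an assumption made for illustration, not behavior defined by AgentTel: it retries only when the operation is both retryable and idempotent, then prefers a fallback, and otherwise escalates. The function name `plan_action` and the `"notify_team"` default are hypothetical.

```python
def plan_action(attrs: dict) -> dict:
    """Hypothetical agent policy driven by agenttel.decision.* span attributes."""
    retryable = attrs.get("agenttel.decision.retryable", False)
    idempotent = attrs.get("agenttel.decision.idempotent", False)
    # Conservative assumption: only retry when a repeat cannot cause side effects.
    if retryable and idempotent:
        return {
            "action": "retry",
            "delay_ms": attrs.get("agenttel.decision.retry_after_ms", 0),
        }
    if attrs.get("agenttel.decision.fallback_available", False):
        return {
            "action": "fallback",
            "detail": attrs.get("agenttel.decision.fallback_description", ""),
        }
    return {
        "action": "escalate",
        # Assumed default when the span carries no escalation level.
        "level": attrs.get("agenttel.decision.escalation_level", "notify_team"),
    }
```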

4. Anomaly Attributes¶

Real-time deviation detection. Set as span attributes by the AgentTelSpanProcessor when anomalous behavior is detected.

Attribute	Type	Description	Example
`agenttel.anomaly.detected`	boolean	Whether an anomaly was detected on this span	`true`
`agenttel.anomaly.pattern`	string	Identified incident pattern	`"cascade_failure"`
`agenttel.anomaly.score`	double	Anomaly severity score (0.0–1.0)	`0.85`
`agenttel.anomaly.latency_z_score`	double	Z-score of latency deviation from baseline	`4.2`

Incident Patterns¶

Pattern	Value	Detection Method	Description
Cascade Failure	`"cascade_failure"`	3+ dependencies with errors in recent window	Multiple downstream services failing simultaneously
Latency Degradation	`"latency_degradation"`	Current latency > 2x rolling P50	Sustained latency elevation above baseline
Error Rate Spike	`"error_rate_spike"`	Recent error rate > 5x baseline	Sudden increase in error rate
Memory Leak	`"memory_leak"`	Positive slope in latency linear regression	Monotonically increasing latency trend
Thundering Herd	`"thundering_herd"`	Traffic burst exceeding normal patterns	Sudden traffic spike after recovery
Cold Start	`"cold_start"`	High latency with low request count	Elevated latency on fresh instances

Detection Configuration¶

Property	Default	Description
`agenttel.anomaly-detection.z-score-threshold`	`3.0`	Z-score above which latency is anomalous
`latencyDegradationThreshold`	`2.0`	Multiplier over P50 to trigger degradation pattern
`errorRateSpikeThreshold`	`5.0`	Multiplier over baseline error rate to trigger spike pattern
`cascadeFailureMinServices`	`3`	Minimum failing dependencies for cascade detection
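
The threshold-based checks above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and the shape of the `baseline` dict are hypothetical, only three of the six patterns are covered, and how AgentTel derives `agenttel.anomaly.score` from these signals is not specified here, so it is not computed.

```python
def latency_z_score(latency_ms, mean_ms, stddev_ms):
    """Standard z-score; returns 0.0 when the baseline has no spread."""
    if stddev_ms <= 0:
        return 0.0
    return (latency_ms - mean_ms) / stddev_ms


def detect_patterns(
    latency_ms,
    baseline,               # assumed keys: p50, mean, stddev, error_rate
    recent_error_rate,
    failing_dependencies,   # names of dependencies with recent errors
    z_score_threshold=3.0,
    latency_degradation_threshold=2.0,
    error_rate_spike_threshold=5.0,
    cascade_failure_min_services=3,
):
    z = latency_z_score(latency_ms, baseline["mean"], baseline["stddev"])
    patterns = []
    if len(failing_dependencies) >= cascade_failure_min_services:
        patterns.append("cascade_failure")
    if latency_ms > latency_degradation_threshold * baseline["p50"]:
        patterns.append("latency_degradation")
    # Skip the spike check when the baseline error rate is zero.
    if baseline["error_rate"] > 0 and (
        recent_error_rate > error_rate_spike_threshold * baseline["error_rate"]
    ):
        patterns.append("error_rate_spike")
    return {
        "z_score": z,
        "latency_anomalous": z > z_score_threshold,
        "patterns": patterns,
    }
```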

5. Error Classification Attributes¶

Structured error categorization added at export time by the AgentTelEnrichingSpanExporter. Tells agents why a span failed, not just that it failed.

Attribute	Type	Description	Example
`agenttel.error.category`	string	Error category for agent decision-making	`"dependency_timeout"`
`agenttel.error.root_exception`	string	Root exception class name	`"java.net.SocketTimeoutException"`
`agenttel.error.dependency`	string	Dependency involved in the error (if applicable)	`"postgres"`

Error Categories¶

Category	Value	Classification Rules	Agent Action
Dependency Timeout	`"dependency_timeout"`	Exception contains `Timeout`/`SocketTimeout`	Retry with backoff, check dependency health
Connection Error	`"connection_error"`	Exception contains `Connection`/`ConnectException`	Check dependency availability, circuit break
Code Bug	`"code_bug"`	`NullPointer`, `ClassCast`, `IndexOutOfBounds`, `IllegalState`	Do not retry — needs code fix
Rate Limited	`"rate_limited"`	HTTP 429	Back off, reduce traffic, request quota increase
Auth Failure	`"auth_failure"`	HTTP 401/403	Check credentials/tokens, do not retry
Resource Exhaustion	`"resource_exhaustion"`	`OutOfMemory`, `StackOverflow`	Scale up, restart instances
Data Validation	`"data_validation"`	HTTP 400/422, `Validation`/`IllegalArgument` exceptions	Do not retry — fix input
Unknown	`"unknown"`	Everything else	Investigate manually
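
The classification rules in the table amount to an ordered series of checks. The Python sketch below shows one possible ordering. The precedence (HTTP status first, then exception name, with timeouts checked before connection errors) is an assumption, since the table does not say which rule wins when several match, and `classify_error` is a hypothetical name.

```python
from typing import Optional


def classify_error(exception_class: str = "", http_status: Optional[int] = None) -> str:
    """Return an agenttel.error.category value for a failed span."""
    # HTTP status rules (assumed to take precedence over exception names).
    if http_status == 429:
        return "rate_limited"
    if http_status in (401, 403):
        return "auth_failure"
    if http_status in (400, 422):
        return "data_validation"

    # Exception-name rules: substring matches on the root exception class.
    def has(*needles):
        return any(n in exception_class for n in needles)

    if has("Timeout"):  # also covers SocketTimeout
        return "dependency_timeout"
    if has("Connection", "ConnectException"):
        return "connection_error"
    if has("NullPointer", "ClassCast", "IndexOutOfBounds", "IllegalState"):
        return "code_bug"
    if has("OutOfMemory", "StackOverflow"):
        return "resource_exhaustion"
    if has("Validation", "IllegalArgument"):
        return "data_validation"
    return "unknown"
```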

6. Causality & Severity Attributes¶

Root cause analysis and business impact assessment, added at export time by the AgentTelEnrichingSpanExporter.

Causality Attributes¶

Attribute	Type	Description	Example
`agenttel.cause.hint`	string	Human-readable cause description	`"Dependency postgres is unhealthy: Connection refused"`
`agenttel.cause.category`	string	Cause category	`"dependency"`
`agenttel.cause.dependency`	string	Specific dependency if cause is dependency-related	`"postgres"`

Cause Categories: dependency, code, infrastructure, traffic, unknown

Severity Attributes¶

Attribute	Type	Description	Example
`agenttel.severity.anomaly_score`	double	Anomaly score (mirrors `agenttel.anomaly.score`)	`0.85`
`agenttel.severity.user_facing`	boolean	Whether this affects user-facing services	`true`
`agenttel.severity.business_impact`	string	Business impact level	`"critical"`

Business Impact Levels:

Impact	Condition
`"critical"`	Anomaly score > 0.8
`"high"`	Error on critical-tier service
`"medium"`	Error on standard service or moderate anomaly
`"low"`	Minor anomaly or data validation error
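
Read top to bottom, the table suggests a first-match evaluation, which the following Python sketch illustrates. Several details are assumptions: the table does not define a "moderate" anomaly, so the 0.5 cutoff is a placeholder, the data-validation case is not modeled separately, and `business_impact` is a hypothetical name.

```python
def business_impact(anomaly_score, is_error, tier, moderate_anomaly_threshold=0.5):
    """Derive agenttel.severity.business_impact, assuming first match wins."""
    if anomaly_score > 0.8:
        return "critical"
    if is_error and tier == "critical":
        return "high"
    # The 0.5 default for a "moderate" anomaly is an assumed placeholder.
    if (is_error and tier == "standard") or anomaly_score >= moderate_anomaly_threshold:
        return "medium"
    return "low"
```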

7. Change Correlation Attributes¶

Correlates anomalies with recent changes. Added to incident context by the ChangeCorrelationEngine.

Attribute	Type	Description	Example
`agenttel.correlation.likely_cause`	string	Most likely change type	`"deployment"`
`agenttel.correlation.change_id`	string	ID of the correlated change	`"deploy-v2.1.0"`
`agenttel.correlation.time_delta_ms`	long	Time between change and anomaly onset	`1800000`
`agenttel.correlation.confidence`	double	Correlation confidence (0.0–1.0)	`0.85`

Change Types: DEPLOYMENT, CONFIG, SCALING, FEATURE_FLAG, DEPENDENCY_UPDATE
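
As a rough illustration of how a change could be matched to an anomaly, the Python sketch below picks the most recent change that precedes the anomaly onset within a lookback window. This is an assumed heuristic, not the ChangeCorrelationEngine's algorithm. The one-hour lookback, the record fields (`id`, `type`, `timestamp_ms`), and the lowercasing of the change type to match the example value are all hypothetical. Confidence scoring is not specified here, so the sketch leaves it out.

```python
def correlate(anomaly_onset_ms, changes, lookback_ms=3_600_000):
    """Return correlation attributes for the latest change before the anomaly."""
    candidates = [
        c for c in changes
        if 0 <= anomaly_onset_ms - c["timestamp_ms"] <= lookback_ms
    ]
    if not candidates:
        return None
    latest = max(candidates, key=lambda c: c["timestamp_ms"])
    return {
        # Lowercased to match the "deployment" example value (an assumption).
        "agenttel.correlation.likely_cause": latest["type"].lower(),
        "agenttel.correlation.change_id": latest["id"],
        "agenttel.correlation.time_delta_ms": anomaly_onset_ms - latest["timestamp_ms"],
    }
```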

8. SLO Attributes¶

Error budget consumption tracking. Set as span attributes when SLOs are registered.

Attribute	Type	Description	Example
`agenttel.slo.name`	string	SLO identifier	`"payment-availability"`
`agenttel.slo.target`	double	SLO target (0.0–1.0)	`0.999`
`agenttel.slo.budget_remaining`	double	Remaining error budget fraction (0.0–1.0)	`0.85`
`agenttel.slo.burn_rate`	double	Budget consumption rate	`0.15`

SLO Types¶

Type	Description	Example Target
`AVAILABILITY`	Percentage of successful (non-error) requests	99.9%
`LATENCY_P99`	Percentage of requests completing under P99 threshold	99.0%
`LATENCY_P50`	Percentage of requests completing under P50 threshold	95.0%
`ERROR_RATE`	Maximum acceptable error rate	0.1%

Alert Thresholds¶

Budget alerts are emitted when remaining budget crosses these thresholds:

Remaining Budget	Severity	Action
<= 50%	`INFO`	Informational — budget consumption is elevated
<= 25%	`WARNING`	Warning — budget at risk of exhaustion
<= 10%	`CRITICAL`	Critical — budget nearly exhausted, immediate action needed
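
The threshold table can be paired with a standard error-budget calculation, sketched here in Python. The budget formula is the conventional one (failures consumed divided by failures allowed by the target) and is assumed rather than taken from AgentTel. Both function names are hypothetical.

```python
def budget_remaining(target, total_requests, failed_requests):
    """Fraction of the error budget left, clamped to [0.0, 1.0]."""
    allowed_failures = (1.0 - target) * total_requests
    if allowed_failures <= 0:
        return 0.0 if failed_requests else 1.0
    return max(0.0, 1.0 - failed_requests / allowed_failures)


def alert_severity(remaining):
    """Map remaining budget to the alert severity, or None above 50%."""
    if remaining <= 0.10:
        return "CRITICAL"
    if remaining <= 0.25:
        return "WARNING"
    if remaining <= 0.50:
        return "INFO"
    return None
```

For a 0.999 availability target over 100,000 requests, roughly 100 failures are allowed. With 78 failures, about 22% of the budget remains, which falls in the `WARNING` band.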

9. GenAI Attributes¶

Extensions for AI/ML workload observability. Set on spans created by GenAI instrumentation wrappers.

Standard OTel GenAI Attributes¶

AgentTel populates the emerging OTel GenAI semantic conventions:

Attribute	Type	Description
`gen_ai.operation.name`	string	`"chat"`, `"text_completion"`, `"embeddings"`
`gen_ai.system`	string	Provider: `"openai"`, `"anthropic"`, `"aws_bedrock"`
`gen_ai.request.model`	string	Requested model identifier
`gen_ai.response.model`	string	Actual model used in response
`gen_ai.usage.input_tokens`	long	Input/prompt token count
`gen_ai.usage.output_tokens`	long	Output/completion token count
`gen_ai.response.finish_reasons`	string[]	Completion stop reasons

AgentTel GenAI Extensions¶

Attribute	Type	Description	Example
`agenttel.genai.framework`	string	Instrumentation source	`"langchain4j"`, `"spring_ai"`
`agenttel.genai.cost_usd`	double	Estimated cost in USD	`0.000795`
`agenttel.genai.prompt_template_id`	string	Prompt template identifier	`"customer-support-v2"`
`agenttel.genai.prompt_template_version`	string	Prompt template version	`"1.3"`
`agenttel.genai.rag_source_count`	long	Number of RAG sources retrieved	`5`
`agenttel.genai.rag_relevance_score_avg`	double	Average relevance score	`0.87`
`agenttel.genai.guardrail_triggered`	boolean	Whether a guardrail fired	`false`
`agenttel.genai.guardrail_name`	string	Name of triggered guardrail	`"pii_filter"`
`agenttel.genai.cache_hit`	boolean	Whether a cached response was used	`false`
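
A cost estimate such as `agenttel.genai.cost_usd` is typically derived from the token counts and per-token prices. The Python sketch below shows that arithmetic. The per-million-token prices are made-up inputs for illustration, and how AgentTel sources its pricing data is not described here.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate request cost from token usage and per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: $0.15 per million input tokens, $0.60 per million output.
cost = estimate_cost_usd(300, 1250, 0.15, 0.60)
```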

10. Frontend Attributes¶

Client-side telemetry from agenttel-web (browser SDK). Set on spans emitted by the browser and exported via OTLP.

Resource Attributes¶

Set once per browser application at initialization.

Attribute	Type	Description	Example
`agenttel.client.app.name`	string	Application name	`"checkout-web"`
`agenttel.client.app.version`	string	Application version	`"1.0.0"`
`agenttel.client.app.platform`	string	Platform identifier	`"browser"`
`agenttel.client.app.environment`	string	Deployment environment	`"production"`
`agenttel.client.topology.team`	string	Owning team	`"checkout-frontend"`
`agenttel.client.topology.domain`	string	Business domain	`"commerce"`

Page & Route Attributes¶

Attribute	Type	Description	Example
`agenttel.client.page.url`	string	Current page URL (path only, no query/hash)	`"/checkout/payment"`
`agenttel.client.page.route`	string	Matched route pattern	`"/checkout/:step"`
`agenttel.client.page.title`	string	Document title	`"Checkout - Payment"`
`agenttel.client.page.business_criticality`	string	Route business criticality	`"revenue"`

Business Criticality Values: revenue, engagement, internal

Baseline Attributes¶

Per-route baselines for frontend operations.

Attribute	Type	Description	Example
`agenttel.client.baseline.page_load_p50_ms`	double	Expected page load P50	`800.0`
`agenttel.client.baseline.page_load_p99_ms`	double	Expected page load P99	`2000.0`
`agenttel.client.baseline.api_call_p50_ms`	double	Expected API response P50	`300.0`
`agenttel.client.baseline.error_rate`	double	Expected error rate (0.0–1.0)	`0.01`

Decision Attributes¶

Per-route decision metadata for agent reasoning.

Attribute	Type	Description	Example
`agenttel.client.decision.escalation_level`	string	Escalation procedure	`"page_oncall"`
`agenttel.client.decision.runbook_url`	string	Operational runbook	`"https://wiki/runbooks/checkout"`
`agenttel.client.decision.fallback_page`	string	Fallback route on failure	`"/maintenance"`
`agenttel.client.decision.retry_on_failure`	boolean	Whether to retry failed page loads	`true`

Anomaly Attributes¶

Client-side anomaly detection results.

Attribute	Type	Description	Example
`agenttel.client.anomaly.detected`	boolean	Whether a client-side anomaly was detected	`true`
`agenttel.client.anomaly.pattern`	string	Detected anomaly pattern	`"rage_click"`
`agenttel.client.anomaly.score`	double	Anomaly severity (0.0–1.0)	`0.75`

Client-Side Anomaly Patterns:

Pattern	Value	Detection	Description
Rage Click	`"rage_click"`	N+ clicks on same element within time window	User frustration — UI is unresponsive
API Failure Cascade	`"api_failure_cascade"`	N+ API failures within time window	Backend instability visible to user
Slow Page Load	`"slow_page_load"`	Load time exceeds baseline by multiplier	Performance degradation on route
Error Loop	`"error_loop"`	N+ errors on same route within time window	Repeating failure preventing user progress
Funnel Drop-off	`"funnel_dropoff"`	Journey abandonment above baseline	User journey failing at specific step

Journey Attributes¶

Multi-step user journey tracking.

Attribute	Type	Description	Example
`agenttel.client.journey.name`	string	Journey identifier	`"checkout"`
`agenttel.client.journey.step`	long	Current step index (0-based)	`3`
`agenttel.client.journey.step_name`	string	Step route/name	`"/checkout/payment"`
`agenttel.client.journey.status`	string	Journey status	`"in_progress"`
`agenttel.client.journey.duration_ms`	double	Time since journey start	`45000.0`

Journey Status Values: in_progress, completed, abandoned

Correlation Attributes¶

Cross-stack trace linking between frontend and backend.

Attribute	Type	Description	Example
`agenttel.client.correlation.backend_trace_id`	string	Backend trace ID from response	`"abc123def456"`
`agenttel.client.correlation.traceparent`	string	W3C Trace Context header sent	`"00-abc...-01"`
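
The `traceparent` value follows the W3C Trace Context layout: a two-digit version, a 32-digit trace ID, a 16-digit parent span ID, and two digits of flags, all lowercase hex and separated by hyphens. The Python sketch below parses the version `00` layout so the trace ID can be matched against backend spans. The function name `parse_traceparent` is hypothetical.

```python
import re

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str):
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```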

Page Load Attributes¶

Captured from the Navigation Timing API on page load spans.

Attribute	Type	Description	Example
`agenttel.client.page_load.dom_load_ms`	double	DOM content loaded time	`450.0`
`agenttel.client.page_load.ttfb_ms`	double	Time to first byte	`120.0`
`agenttel.client.page_load.transfer_size_bytes`	long	Page transfer size	`245000`

API Call Attributes¶

Captured from intercepted fetch and XMLHttpRequest calls.

Attribute	Type	Description	Example
`agenttel.client.api.method`	string	HTTP method	`"POST"`
`agenttel.client.api.url`	string	Request URL (path only)	`"/api/payments"`
`agenttel.client.api.status_code`	long	Response status code	`200`
`agenttel.client.api.duration_ms`	double	Response time	`312.0`

Anomaly Detection Configuration¶

Property	Default	Description
`rageClickThreshold`	`3`	Clicks on same element to trigger rage click
`rageClickWindowMs`	`2000`	Time window for rage click detection
`apiFailureCascadeThreshold`	`3`	API failures to trigger cascade
`apiFailureCascadeWindowMs`	`10000`	Time window for cascade detection
`slowPageLoadMultiplier`	`2.0`	Multiplier over baseline P50 to trigger slow load
`errorLoopThreshold`	`5`	Errors on same route to trigger error loop
`errorLoopWindowMs`	`30000`	Time window for error loop detection
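
To show how a threshold and a time window combine, here is a Python sketch of rage-click and slow-page-load detection using the defaults above. It is written in Python for consistency with the other sketches on this page, whereas the browser SDK itself runs in the browser. The class and function names are hypothetical, and the real detector may differ in how it tracks and expires clicks.

```python
from collections import defaultdict, deque


class RageClickDetector:
    """Hypothetical detector: N+ clicks on one element within a time window."""

    def __init__(self, threshold=3, window_ms=2000):
        self.threshold = threshold
        self.window_ms = window_ms
        self._clicks = defaultdict(deque)  # element id -> recent click times

    def on_click(self, element_id, timestamp_ms) -> bool:
        clicks = self._clicks[element_id]
        clicks.append(timestamp_ms)
        # Drop clicks that have fallen out of the window.
        while clicks and timestamp_ms - clicks[0] > self.window_ms:
            clicks.popleft()
        return len(clicks) >= self.threshold


def is_slow_page_load(load_ms, baseline_p50_ms, multiplier=2.0):
    """True when the load time exceeds the baseline P50 by the multiplier."""
    return load_ms > multiplier * baseline_p50_ms
```

The same sliding-window shape applies to the API failure cascade and error loop patterns, with failures or errors recorded per route instead of clicks per element.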

11. Structured Events¶

AgentTel emits structured events via the OTel Logs API for significant state changes that agents should react to.

agenttel.anomaly.detected¶

Emitted when a span's behavior deviates significantly from baseline.

{
  "event.name": "agenttel.anomaly.detected",
  "severity": "WARN",
  "body": {
    "operation": "POST /api/payments",
    "pattern": "latency_degradation",
    "anomaly_score": 0.85,
    "z_score": 4.2,
    "current_latency_ms": 312.0,
    "baseline_p50_ms": 45.0
  }
}

agenttel.slo.budget_alert¶

Emitted when an SLO's error budget crosses a threshold (50%, 25%, 10%).

{
  "event.name": "agenttel.slo.budget_alert",
  "severity": "WARN",
  "body": {
    "slo_name": "payment-availability",
    "severity": "WARNING",
    "budget_remaining": 0.22,
    "burn_rate": 0.78
  }
}

agenttel.dependency.state_change¶

Emitted when a dependency's observed health transitions.

{
  "event.name": "agenttel.dependency.state_change",
  "severity": "WARN",
  "body": {
    "dependency": "postgres",
    "previous_state": "healthy",
    "current_state": "degraded",
    "error_rate": 0.15
  }
}
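
On the receiving side, an agent can dispatch on `event.name` and read the fields shown in the three payloads above. The Python sketch below assumes events arrive as JSON strings in exactly that shape, which may not match how a given log pipeline delivers them. The function name `summarize_event` is hypothetical.

```python
import json


def summarize_event(raw: str) -> str:
    """Turn an AgentTel structured event into a one-line summary."""
    event = json.loads(raw)
    name = event["event.name"]
    body = event["body"]
    if name == "agenttel.anomaly.detected":
        ratio = body["current_latency_ms"] / body["baseline_p50_ms"]
        return f"{body['operation']}: {body['pattern']} ({ratio:.1f}x baseline P50)"
    if name == "agenttel.slo.budget_alert":
        return (f"{body['slo_name']}: {body['severity']}, "
                f"{body['budget_remaining']:.0%} budget remaining")
    if name == "agenttel.dependency.state_change":
        return f"{body['dependency']}: {body['previous_state']} -> {body['current_state']}"
    return f"unhandled event: {name}"
```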

Relationship to OpenTelemetry¶

AgentTel is a strict extension of OpenTelemetry: it adds attributes and events alongside the standard ones. Backend attributes use the agenttel.* namespace, frontend attributes use agenttel.client.*, and GenAI spans carry the emerging gen_ai.* conventions together with AgentTel's own agenttel.genai.* extensions. AgentTel-enriched spans remain fully compatible with any OTel backend, including Jaeger, Zipkin, Grafana Tempo, Datadog, Splunk, and New Relic.

The backend library implements standard OTel interfaces (SpanProcessor, SpanExporter, Resource) and composes cleanly with any other OTel instrumentation. The frontend SDK exports spans via OTLP HTTP to any OTel-compatible collector.