Attribute Dictionary¶
Complete reference for every attribute AgentTel adds to OpenTelemetry spans and resources. Each entry describes what the attribute is, why an AI agent needs it, and when it appears.
Quick navigation: Topology | Baselines | Decisions | Anomaly | Error Classification | Causality | Severity | Change Correlation | SLO | Deployment | GenAI | Agent Identity | Sessions | Circuit Breaker | Agentic | Frontend
Alphabetical Index¶
All agenttel.* attributes, followed by the standard OTel gen_ai.* attributes, sorted alphabetically. Click any key to jump to its category.
| Attribute Key | Category |
|---|---|
| agenttel.agent.id | Agent Identity |
| agenttel.agent.role | Agent Identity |
| agenttel.agent.session_id | Agent Identity |
| agenttel.agentic.agent.framework | Agentic |
| agenttel.agentic.agent.name | Agentic |
| agenttel.agentic.agent.type | Agentic |
| agenttel.agentic.agent.version | Agentic |
| agenttel.agentic.capability.system_prompt_hash | Agentic |
| agenttel.agentic.capability.tool_count | Agentic |
| agenttel.agentic.capability.tools | Agentic |
| agenttel.agentic.code.exit_code | Agentic |
| agenttel.agentic.code.language | Agentic |
| agenttel.agentic.code.sandboxed | Agentic |
| agenttel.agentic.code.status | Agentic |
| agenttel.agentic.conversation.id | Agentic |
| agenttel.agentic.conversation.message_count | Agentic |
| agenttel.agentic.conversation.speaker_role | Agentic |
| agenttel.agentic.conversation.turn | Agentic |
| agenttel.agentic.cost.cached_read_tokens | Agentic |
| agenttel.agentic.cost.cached_write_tokens | Agentic |
| agenttel.agentic.cost.input_tokens | Agentic |
| agenttel.agentic.cost.llm_calls | Agentic |
| agenttel.agentic.cost.output_tokens | Agentic |
| agenttel.agentic.cost.reasoning_tokens | Agentic |
| agenttel.agentic.cost.total_usd | Agentic |
| agenttel.agentic.error.category | Agentic |
| agenttel.agentic.error.retryable | Agentic |
| agenttel.agentic.error.source | Agentic |
| agenttel.agentic.eval.criteria | Agentic |
| agenttel.agentic.eval.feedback | Agentic |
| agenttel.agentic.eval.score | Agentic |
| agenttel.agentic.eval.scorer_name | Agentic |
| agenttel.agentic.eval.type | Agentic |
| agenttel.agentic.guardrail.action | Agentic |
| agenttel.agentic.guardrail.name | Agentic |
| agenttel.agentic.guardrail.reason | Agentic |
| agenttel.agentic.guardrail.triggered | Agentic |
| agenttel.agentic.handoff.chain_depth | Agentic |
| agenttel.agentic.handoff.from_agent | Agentic |
| agenttel.agentic.handoff.reason | Agentic |
| agenttel.agentic.handoff.to_agent | Agentic |
| agenttel.agentic.human.checkpoint_type | Agentic |
| agenttel.agentic.human.decision | Agentic |
| agenttel.agentic.human.wait_ms | Agentic |
| agenttel.agentic.invocation.goal | Agentic |
| agenttel.agentic.invocation.id | Agentic |
| agenttel.agentic.invocation.max_steps | Agentic |
| agenttel.agentic.invocation.status | Agentic |
| agenttel.agentic.invocation.steps | Agentic |
| agenttel.agentic.memory.items | Agentic |
| agenttel.agentic.memory.operation | Agentic |
| agenttel.agentic.memory.store_type | Agentic |
| agenttel.agentic.orchestration.aggregation | Agentic |
| agenttel.agentic.orchestration.coordinator_id | Agentic |
| agenttel.agentic.orchestration.parallel_branches | Agentic |
| agenttel.agentic.orchestration.pattern | Agentic |
| agenttel.agentic.orchestration.stage | Agentic |
| agenttel.agentic.orchestration.total_stages | Agentic |
| agenttel.agentic.quality.eval_score | Agentic |
| agenttel.agentic.quality.goal_achieved | Agentic |
| agenttel.agentic.quality.human_interventions | Agentic |
| agenttel.agentic.quality.loop_detected | Agentic |
| agenttel.agentic.quality.loop_iterations | Agentic |
| agenttel.agentic.reranker.input_documents | Agentic |
| agenttel.agentic.reranker.model | Agentic |
| agenttel.agentic.reranker.output_documents | Agentic |
| agenttel.agentic.reranker.top_score | Agentic |
| agenttel.agentic.retrieval.document_count | Agentic |
| agenttel.agentic.retrieval.query | Agentic |
| agenttel.agentic.retrieval.relevance_score_avg | Agentic |
| agenttel.agentic.retrieval.relevance_score_min | Agentic |
| agenttel.agentic.retrieval.store_type | Agentic |
| agenttel.agentic.retrieval.top_k | Agentic |
| agenttel.agentic.step.iteration | Agentic |
| agenttel.agentic.step.number | Agentic |
| agenttel.agentic.step.tool_name | Agentic |
| agenttel.agentic.step.tool_status | Agentic |
| agenttel.agentic.step.type | Agentic |
| agenttel.agentic.task.depth | Agentic |
| agenttel.agentic.task.id | Agentic |
| agenttel.agentic.task.name | Agentic |
| agenttel.agentic.task.parent_id | Agentic |
| agenttel.agentic.task.status | Agentic |
| agenttel.anomaly.detected | Anomaly |
| agenttel.anomaly.latency_z_score | Anomaly |
| agenttel.anomaly.pattern | Anomaly |
| agenttel.anomaly.score | Anomaly |
| agenttel.baseline.confidence | Baselines |
| agenttel.baseline.error_rate | Baselines |
| agenttel.baseline.latency_p50_ms | Baselines |
| agenttel.baseline.latency_p99_ms | Baselines |
| agenttel.baseline.sample_count | Baselines |
| agenttel.baseline.slo | Baselines |
| agenttel.baseline.source | Baselines |
| agenttel.baseline.throughput_rps | Baselines |
| agenttel.baseline.updated_at | Baselines |
| agenttel.cause.category | Causality |
| agenttel.cause.correlated_event_id | Causality |
| agenttel.cause.correlated_span_id | Causality |
| agenttel.cause.dependency | Causality |
| agenttel.cause.hint | Causality |
| agenttel.cause.started_at | Causality |
| agenttel.circuit_breaker.dependency | Circuit Breaker |
| agenttel.circuit_breaker.failure_count | Circuit Breaker |
| agenttel.circuit_breaker.name | Circuit Breaker |
| agenttel.circuit_breaker.new_state | Circuit Breaker |
| agenttel.circuit_breaker.previous_state | Circuit Breaker |
| agenttel.client.anomaly.detected | Frontend |
| agenttel.client.anomaly.pattern | Frontend |
| agenttel.client.anomaly.score | Frontend |
| agenttel.client.app.environment | Frontend |
| agenttel.client.app.name | Frontend |
| agenttel.client.app.platform | Frontend |
| agenttel.client.app.version | Frontend |
| agenttel.client.baseline.api_call_p50_ms | Frontend |
| agenttel.client.baseline.interaction_error_rate | Frontend |
| agenttel.client.baseline.page_load_p50_ms | Frontend |
| agenttel.client.baseline.page_load_p99_ms | Frontend |
| agenttel.client.baseline.source | Frontend |
| agenttel.client.correlation.backend_operation | Frontend |
| agenttel.client.correlation.backend_service | Frontend |
| agenttel.client.correlation.backend_trace_id | Frontend |
| agenttel.client.decision.escalation_level | Frontend |
| agenttel.client.decision.fallback_page | Frontend |
| agenttel.client.decision.retry_on_failure | Frontend |
| agenttel.client.decision.runbook_url | Frontend |
| agenttel.client.decision.user_facing | Frontend |
| agenttel.client.interaction.outcome | Frontend |
| agenttel.client.interaction.response_time_ms | Frontend |
| agenttel.client.interaction.target | Frontend |
| agenttel.client.interaction.type | Frontend |
| agenttel.client.journey.name | Frontend |
| agenttel.client.journey.started_at | Frontend |
| agenttel.client.journey.step | Frontend |
| agenttel.client.journey.total_steps | Frontend |
| agenttel.client.page.business_criticality | Frontend |
| agenttel.client.page.route | Frontend |
| agenttel.client.page.title | Frontend |
| agenttel.client.topology.domain | Frontend |
| agenttel.client.topology.team | Frontend |
| agenttel.correlation.change_id | Change Correlation |
| agenttel.correlation.confidence | Change Correlation |
| agenttel.correlation.likely_cause | Change Correlation |
| agenttel.correlation.time_delta_ms | Change Correlation |
| agenttel.decision.escalation_level | Decisions |
| agenttel.decision.fallback_available | Decisions |
| agenttel.decision.fallback_description | Decisions |
| agenttel.decision.idempotent | Decisions |
| agenttel.decision.known_issue_id | Decisions |
| agenttel.decision.retryable | Decisions |
| agenttel.decision.retry_after_ms | Decisions |
| agenttel.decision.runbook_url | Decisions |
| agenttel.decision.safe_to_restart | Decisions |
| agenttel.deployment.commit_sha | Deployment |
| agenttel.deployment.id | Deployment |
| agenttel.deployment.previous_version | Deployment |
| agenttel.deployment.strategy | Deployment |
| agenttel.deployment.timestamp | Deployment |
| agenttel.deployment.version | Deployment |
| agenttel.error.category | Error Classification |
| agenttel.error.dependency | Error Classification |
| agenttel.error.root_exception | Error Classification |
| agenttel.genai.cache_hit | GenAI |
| agenttel.genai.cost_usd | GenAI |
| agenttel.genai.framework | GenAI |
| agenttel.genai.guardrail_name | GenAI |
| agenttel.genai.guardrail_triggered | GenAI |
| agenttel.genai.rag_relevance_score_avg | GenAI |
| agenttel.genai.rag_source_count | GenAI |
| agenttel.session.id | Sessions |
| agenttel.session.incident_id | Sessions |
| agenttel.severity.anomaly_score | Severity |
| agenttel.severity.business_impact | Severity |
| agenttel.severity.impact_scope | Severity |
| agenttel.severity.pattern | Severity |
| agenttel.severity.user_facing | Severity |
| agenttel.slo.budget_remaining | SLO |
| agenttel.slo.burn_rate | SLO |
| agenttel.slo.name | SLO |
| agenttel.slo.target | SLO |
| agenttel.topology.consumers | Topology |
| agenttel.topology.dependencies | Topology |
| agenttel.topology.domain | Topology |
| agenttel.topology.on_call_channel | Topology |
| agenttel.topology.repo_url | Topology |
| agenttel.topology.team | Topology |
| agenttel.topology.tier | Topology |
| gen_ai.operation.name | GenAI (OTel Standard) |
| gen_ai.request.max_tokens | GenAI (OTel Standard) |
| gen_ai.request.model | GenAI (OTel Standard) |
| gen_ai.request.temperature | GenAI (OTel Standard) |
| gen_ai.request.top_p | GenAI (OTel Standard) |
| gen_ai.response.finish_reasons | GenAI (OTel Standard) |
| gen_ai.response.id | GenAI (OTel Standard) |
| gen_ai.response.model | GenAI (OTel Standard) |
| gen_ai.system | GenAI (OTel Standard) |
| gen_ai.usage.input_tokens | GenAI (OTel Standard) |
| gen_ai.usage.output_tokens | GenAI (OTel Standard) |
Topology¶
Service identity and dependency graph. Set once per service as OTel Resource attributes at startup. These attributes travel with every span exported by the service, giving agents immediate context about ownership, criticality, and the dependency graph without requiring a separate lookup.
Set by:
AgentTelResourceProvider (OTel SPI ResourceProvider) -- runs at SDK initialization, reads from AgentTelGlobalState, which is populated by @AgentObservable annotations, YAML configuration, or programmatic registration via TopologyRegistry.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.topology.team | string | Free-form, e.g. "payments-platform" | Agent knows who to page when something breaks |
| agenttel.topology.tier | string | critical, standard, internal, experimental | Agent prioritizes critical services over internal tooling |
| agenttel.topology.domain | string | Free-form, e.g. "commerce" | Agent scopes blast radius to the right business domain |
| agenttel.topology.on_call_channel | string | Free-form, e.g. "#payments-oncall" | Agent knows where to escalate when human intervention is needed |
| agenttel.topology.repo_url | string | URL, e.g. "https://github.com/org/repo" | Agent can link alerts to source code for faster diagnosis |
| agenttel.topology.dependencies | string (JSON) | JSON array of dependency descriptors | Agent understands the upstream dependency graph |
| agenttel.topology.consumers | string (JSON) | JSON array of consumer descriptors | Agent understands downstream impact of failures |
Detailed Reference¶
agenttel.topology.tier¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelResourceProvider |
| Appears on | Resource attributes |
| Default | Not set (attribute absent if not configured) |
Why: An AI agent responding to an incident must prioritize. A failure in a critical service (user-facing, revenue-impacting) demands an immediate page, while the same failure in an experimental service might only warrant a log entry. Without tier information, the agent treats all services equally, leading to alert fatigue or missed critical issues.
Use case: Agent receives anomaly alerts from both payment-service (tier=critical) and internal-report-generator (tier=internal). It pages on-call for the payment service immediately but only sends a Slack notification for the report generator.
Example value: "critical"
Possible values:
| Tier | Meaning |
|---|---|
| critical | User-facing, revenue-impacting. Pages on-call immediately. |
| standard | Important but not immediately revenue-impacting. |
| internal | Internal tooling and infrastructure. |
| experimental | Non-production or experimental services. |
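
The tier-based triage from the use case could be sketched roughly as follows. This is an illustrative sketch, not part of AgentTel: the function name `route_alert` and the action labels are hypothetical, and the actions for `standard` and for a missing tier are assumptions. Only the tier names and the attribute key come from the tables above.

```python
# Hypothetical tier-to-action mapping; only the tier names are taken from the docs.
TIER_ACTIONS = {
    "critical": "page_oncall",          # documented: pages on-call immediately
    "standard": "notify_team",          # assumption: no action is documented for this tier
    "internal": "slack_notification",   # mirrors the report-generator use case
    "experimental": "log_only",         # mirrors "might only warrant a log entry"
}


def route_alert(resource_attributes: dict) -> str:
    """Choose a response for an alert based on agenttel.topology.tier."""
    tier = resource_attributes.get("agenttel.topology.tier")
    # The attribute is absent when no tier is configured; falling back to
    # a team notification is an assumption made for this sketch.
    return TIER_ACTIONS.get(tier, "notify_team")


print(route_alert({"agenttel.topology.tier": "critical"}))
print(route_alert({"agenttel.topology.tier": "internal"}))
```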
agenttel.topology.dependencies¶
| Property | Value |
|---|---|
| Type | string (JSON-encoded array) |
| Set by | AgentTelResourceProvider |
| Appears on | Resource attributes |
| Default | Not set (attribute absent if no dependencies declared) |
Why: When an agent detects a failure, it needs to understand whether the root cause is in this service or in a dependency. The dependency graph -- including criticality, timeout configuration, circuit breaker status, and fallback availability -- lets the agent trace failures upstream and determine the correct remediation path.
Use case: Agent sees payment-service throwing SocketTimeoutException. It checks agenttel.topology.dependencies, finds that postgres is a required dependency with circuit_breaker: true and timeout_ms: 5000. The agent knows to check postgres health and that a circuit breaker should eventually protect the service.
Example value:
```json
[
  {
    "name": "postgres",
    "type": "database",
    "criticality": "required",
    "protocol": "postgresql",
    "timeout_ms": 5000,
    "circuit_breaker": true,
    "fallback": "Return cached data",
    "health_endpoint": "/health/postgres"
  }
]
```
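
Because the attribute is a JSON-encoded string, an agent has to decode it before reasoning about the graph. The sketch below assumes the field names shown in the example value; `required_dependencies` is a hypothetical helper name, not an AgentTel API.

```python
import json


def required_dependencies(resource_attributes: dict) -> list:
    """Return the dependency descriptors marked as required."""
    raw = resource_attributes.get("agenttel.topology.dependencies")
    if raw is None:
        # Attribute is absent when no dependencies are declared.
        return []
    descriptors = json.loads(raw)
    return [d for d in descriptors if d.get("criticality") == "required"]


attrs = {
    "agenttel.topology.dependencies": json.dumps([
        {"name": "postgres", "type": "database", "criticality": "required",
         "timeout_ms": 5000, "circuit_breaker": True},
    ])
}
for dep in required_dependencies(attrs):
    print(dep["name"], dep["timeout_ms"], dep["circuit_breaker"])
```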
agenttel.topology.consumers¶
| Property | Value |
|---|---|
| Type | string (JSON-encoded array) |
| Set by | AgentTelResourceProvider |
| Appears on | Resource attributes |
| Default | Not set (attribute absent if no consumers declared) |
Why: When a service degrades, the agent needs to know which downstream services are affected. Consumer descriptors encode who calls this service, whether those calls are synchronous (blocking the caller) or asynchronous (buffered), and what SLA expectations exist. This lets the agent accurately scope the blast radius of an incident.
Use case: Agent detects latency degradation in pricing-service. It reads agenttel.topology.consumers and finds that checkout-service calls it synchronously with a 200ms SLA. The agent knows checkout will be directly impacted and escalates accordingly.
Example value:
Baselines¶
What "normal" looks like for each operation. Set as span attributes on every span for a registered operation. Baselines are the foundation for anomaly detection -- without knowing what "normal" is, an agent cannot determine whether current behavior is problematic.
Set by:
AgentTelSpanProcessor (static/rolling baselines from @AgentOperation annotations or YAML config) and AgentTelEnrichingSpanExporter (confidence metrics added at export time).
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.baseline.latency_p50_ms | double | >= 0, e.g. 45.0 | Agent knows the median expected latency |
| agenttel.baseline.latency_p99_ms | double | >= 0, e.g. 200.0 | Agent knows the tail latency expectation |
| agenttel.baseline.error_rate | double | 0.0--1.0, e.g. 0.001 | Agent knows the expected background error rate |
| agenttel.baseline.throughput_rps | double | >= 0, e.g. 150.0 | Agent knows expected traffic volume |
| agenttel.baseline.source | string | static, rolling, composite, default | Agent knows how the baseline was determined |
| agenttel.baseline.updated_at | string | ISO 8601 timestamp | Agent knows how fresh the baseline is |
| agenttel.baseline.slo | string | SLO identifier, e.g. "payment-availability" | Agent links the baseline to a specific SLO |
| agenttel.baseline.sample_count | long | >= 0, e.g. 250 | Agent gauges statistical significance |
| agenttel.baseline.confidence | string | low, medium, high | Agent weighs how much to trust the baseline |
Detailed Reference¶
agenttel.baseline.latency_p50_ms¶
| Property | Value |
|---|---|
| Type | double |
| Set by | AgentTelSpanProcessor |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no baseline is registered for the operation) |
Why: The P50 (median) latency is the single most useful baseline metric for an AI agent. It represents what a typical request looks like. When the agent observes a span whose duration is 5x or 10x the P50, it can immediately flag a latency degradation anomaly. Without this number, the agent has no frame of reference for whether 312ms is good, bad, or catastrophic for a given operation.
Use case: Agent detects that the current span for POST /api/payments took 312ms while agenttel.baseline.latency_p50_ms is 45ms. This is a 6.9x deviation, clearly indicating a latency degradation anomaly. The agent checks the dependency graph and finds the root cause is elevated postgres latency.
Example value: 45.0
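
The deviation arithmetic in the use case can be checked with a short sketch. The helper name `latency_deviation` is hypothetical; it only divides the observed span duration by the baseline attribute and is not how AgentTel itself scores anomalies.

```python
def latency_deviation(duration_ms: float, span_attributes: dict):
    """Return observed duration as a multiple of the P50 baseline, or None."""
    p50 = span_attributes.get("agenttel.baseline.latency_p50_ms")
    if not p50:
        # No baseline registered (attribute absent) or a zero baseline:
        # there is no frame of reference, so report nothing.
        return None
    return duration_ms / p50


ratio = latency_deviation(312.0, {"agenttel.baseline.latency_p50_ms": 45.0})
print(f"{ratio:.1f}x the P50 baseline")  # 312 / 45 rounds to 6.9
```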
agenttel.baseline.confidence¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no rolling baseline data exists) |
Why: Not all baselines are equally trustworthy. A rolling baseline computed from 5 observations is far less reliable than one computed from 500. The confidence level tells the agent whether to act decisively on a deviation or to treat it as uncertain. An agent should never page on-call based on a low-confidence baseline.
Use case: Agent detects a 3x latency deviation on a newly deployed endpoint. It checks agenttel.baseline.confidence and finds low (only 12 samples). Instead of paging on-call, the agent logs the anomaly and continues collecting data. Once confidence reaches high, the same deviation would trigger an immediate escalation.
Example value: "high"
Confidence thresholds:
| Sample Count | Confidence | Meaning |
|---|---|---|
| < 30 | low | Baseline is unreliable -- insufficient data |
| 30--200 | medium | Baseline is usable but may not capture edge cases |
| > 200 | high | Baseline is statistically significant and reliable |
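
The thresholds in the table translate directly into a small function. This is a sketch that mirrors the table only; how AgentTelEnrichingSpanExporter computes the value internally is not shown here, and `baseline_confidence` is a hypothetical name.

```python
def baseline_confidence(sample_count: int) -> str:
    """Map a sample count to the confidence level from the thresholds table."""
    if sample_count < 30:
        return "low"
    if sample_count <= 200:
        return "medium"
    return "high"


for count in (12, 30, 200, 250):
    print(count, baseline_confidence(count))
```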
agenttel.baseline.source¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelSpanProcessor |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no baseline is registered) |
Why: An agent's response should vary depending on how the baseline was determined. A static baseline from configuration reflects an intentional SLA target. A rolling baseline computed from live traffic reflects actual behavior (which may have drifted). A default baseline is a system-provided fallback with minimal confidence. Knowing the source lets the agent calibrate its anomaly detection thresholds appropriately.
Use case: Agent detects elevated latency. The baseline source is rolling, meaning it was computed from recent traffic. The agent knows this baseline adapts over time and checks the updated_at timestamp to ensure it is fresh enough to be meaningful.
Example value: "static"
Possible values:
| Source | Meaning |
|---|---|
| static | From @AgentOperation annotation or YAML configuration file |
| rolling | Computed from a sliding window of observed traffic |
| composite | Static baseline with rolling fallback for unset fields |
| default | System default when no baseline is available |
Decisions¶
What an AI agent is permitted and equipped to do when a problem occurs. Set as span attributes from @AgentOperation annotations or YAML configuration. Decision attributes encode human operator intent -- they are the guardrails that prevent an agent from taking harmful actions.
Set by:
AgentTelSpanProcessor -- reads from OperationContextRegistry, which is populated by @AgentOperation annotations (scanned by AgentTelAnnotationBeanPostProcessor in Spring Boot) or YAML config (loaded by AgentTelConfigLoader in the javaagent extension).
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.decision.retryable | boolean | true / false | Agent knows if retrying the operation is safe |
| agenttel.decision.retry_after_ms | long | >= 0, e.g. 1000 | Agent knows how long to wait before retrying |
| agenttel.decision.idempotent | boolean | true / false | Agent knows if duplicate calls are safe |
| agenttel.decision.fallback_available | boolean | true / false | Agent knows an alternative path exists |
| agenttel.decision.fallback_description | string | Free-form, e.g. "Return cached pricing" | Agent knows what the fallback does |
| agenttel.decision.runbook_url | string | URL | Agent can reference operational documentation |
| agenttel.decision.escalation_level | string | auto_resolve, notify_team, page_oncall, incident_commander | Agent knows the correct escalation path |
| agenttel.decision.known_issue_id | string | Issue ID, e.g. "JIRA-1234" | Agent links the problem to a known issue |
| agenttel.decision.safe_to_restart | boolean | true / false | Agent knows if restarting the service is safe |
Detailed Reference¶
agenttel.decision.retryable¶
| Property | Value |
|---|---|
| Type | boolean |
| Set by | AgentTelSpanProcessor |
| Appears on | Span attributes |
| Default | Not set (attribute absent -- agent should assume not retryable) |
Why: Retrying a failed operation is one of the most common automated remediation actions, but it is also one of the most dangerous. Retrying a non-idempotent payment operation could charge a customer twice. This attribute encodes human operator knowledge about whether retry is safe for each specific operation.
Use case: Agent detects a dependency_timeout error on POST /api/payments. It checks agenttel.decision.retryable and finds false. Even though the error is transient, the agent does not retry because the operation is not marked as safe to retry. Instead, it follows the escalation path.
Example value: true
agenttel.decision.escalation_level¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelSpanProcessor |
| Appears on | Span attributes |
| Default | Not set (attribute absent -- agent should default to notify_team) |
Why: Different operations warrant different levels of human involvement when they fail. A background data sync job might be safe for the agent to handle autonomously, while a payment processing failure requires an immediate page to the on-call engineer. The escalation level encodes this operational judgment so the agent responds proportionally.
Use case: Agent detects cascading failures in the payment service. It checks agenttel.decision.escalation_level and finds page_oncall. It immediately pages the on-call engineer via the channel specified in agenttel.topology.on_call_channel, rather than attempting autonomous remediation.
Example value: "page_oncall"
Possible values:
| Level | Meaning |
|---|---|
| auto_resolve | Agent can handle autonomously without human involvement |
| notify_team | Send asynchronous notification to the owning team |
| page_oncall | Page the on-call engineer immediately |
| incident_commander | Escalate to incident management process |
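
One way an agent might combine the decision attributes with their documented defaults is sketched below. The defaults for retryable (treated as false when absent) and escalation_level (notify_team when absent) follow the property tables above; the `plan_response` name, the shape of the returned plan, and the retry delay of 0 when retry_after_ms is missing are assumptions.

```python
def plan_response(span_attributes: dict) -> dict:
    """Build a simple response plan from agenttel.decision.* attributes."""
    # Absent retryable means "assume not retryable".
    retryable = bool(span_attributes.get("agenttel.decision.retryable", False))
    # Absent escalation_level means "default to notify_team".
    level = span_attributes.get("agenttel.decision.escalation_level", "notify_team")

    plan = {"retry": retryable, "escalation": level}
    if retryable:
        # Assumption: no delay when the attribute is missing.
        plan["retry_after_ms"] = span_attributes.get("agenttel.decision.retry_after_ms", 0)
    return plan


print(plan_response({
    "agenttel.decision.retryable": False,
    "agenttel.decision.escalation_level": "page_oncall",
}))
print(plan_response({}))
```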
agenttel.decision.fallback_description¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelSpanProcessor |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no fallback is described) |
Why: When agenttel.decision.fallback_available is true, the agent needs to know what the fallback actually does so it can decide whether activating it is appropriate for the current failure mode. A fallback that returns cached data is suitable for a dependency timeout but not for a data corruption issue.
Use case: Agent detects that the pricing service dependency is down. It checks agenttel.decision.fallback_available (true) and reads agenttel.decision.fallback_description: "Return cached pricing from Redis, stale up to 5 minutes." The agent activates the fallback and notifies the team that cached pricing is being served.
Example value: "Return cached pricing from Redis, stale up to 5 minutes"
Anomaly¶
Real-time deviation detection results. Set as span attributes by the AgentTelSpanProcessor when a span's behavior deviates significantly from the registered baseline. Anomaly attributes are only present on spans where anomalous behavior was detected -- their absence means the span is behaving normally.
Set by:
AgentTelSpanProcessor via the AnomalyDetector and PatternMatcher components -- runs during onEnd() span processing, comparing observed behavior against baselines from OperationContextRegistry.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.anomaly.detected | boolean | true / false | Agent knows this span is anomalous |
| agenttel.anomaly.pattern | string | cascade_failure, latency_degradation, error_rate_spike, memory_leak, thundering_herd, cold_start | Agent knows the type of incident |
| agenttel.anomaly.score | double | 0.0--1.0 | Agent gauges the severity of the anomaly |
| agenttel.anomaly.latency_z_score | double | Any positive value, typically 0--10+ | Agent measures how many standard deviations from normal |
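
For readers unfamiliar with z-scores, the textbook calculation is sketched below: the distance of an observation from the mean, expressed in standard deviations. This illustrates the general statistic only. The window of recent latencies and the `latency_z_score` helper are hypothetical, and the exact formula AnomalyDetector uses is not documented on this page.

```python
import statistics


def latency_z_score(observed_ms: float, recent_latencies_ms: list):
    """Textbook z-score of one observation against a sample of recent latencies."""
    mean = statistics.mean(recent_latencies_ms)
    std = statistics.pstdev(recent_latencies_ms)
    if std == 0:
        # All samples identical: the z-score is undefined.
        return None
    return (observed_ms - mean) / std


z = latency_z_score(70.0, [40.0, 50.0, 60.0])
print(round(z, 2))
```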
Detailed Reference¶
agenttel.anomaly.pattern¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelSpanProcessor (via PatternMatcher) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no pattern is detected) |
Why: Knowing that something is anomalous is necessary but insufficient. An agent needs to know what kind of anomaly it is to take the right action. A cascade_failure requires checking multiple dependencies, a memory_leak requires restarting instances, and a cold_start requires patience. The pattern classification maps directly to different remediation playbooks.
Use case: Agent sees agenttel.anomaly.pattern = cascade_failure on the payment service. It checks agenttel.topology.dependencies and finds that 3 of 4 dependencies are returning errors. The agent identifies the common upstream cause (a failing load balancer) and creates an incident linking all affected services.
Example value: "cascade_failure"
Pattern detection methods:
| Pattern | Detection | Typical Remediation |
|---|---|---|
| cascade_failure | 3+ dependencies with errors in recent window | Identify common upstream cause, circuit break |
| latency_degradation | Current latency > 2x rolling P50 | Check dependency latency, scale up |
| error_rate_spike | Recent error rate > 5x baseline | Check recent deployments, rollback if needed |
| memory_leak | Positive slope in latency linear regression | Restart instances, investigate heap usage |
| thundering_herd | Traffic burst exceeding normal patterns | Rate limit, shed load, scale out |
| cold_start | High latency with low request count | Wait for warm-up, pre-warm caches |
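
Two of the detection rules above have explicit numeric thresholds and can be expressed in a few lines. The sketch below covers only latency_degradation (> 2x rolling P50) and error_rate_spike (> 5x baseline); `match_simple_patterns` and its parameters are hypothetical and do not reflect the actual PatternMatcher interface.

```python
def match_simple_patterns(latency_ms: float, rolling_p50_ms: float,
                          recent_error_rate: float, baseline_error_rate: float) -> list:
    """Apply the two threshold-based rules from the pattern table."""
    patterns = []
    # latency_degradation: current latency > 2x rolling P50
    if rolling_p50_ms > 0 and latency_ms > 2 * rolling_p50_ms:
        patterns.append("latency_degradation")
    # error_rate_spike: recent error rate > 5x baseline
    if baseline_error_rate > 0 and recent_error_rate > 5 * baseline_error_rate:
        patterns.append("error_rate_spike")
    return patterns


print(match_simple_patterns(312.0, 45.0, 0.002, 0.001))
print(match_simple_patterns(50.0, 45.0, 0.01, 0.001))
```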
agenttel.anomaly.score¶
| Property | Value |
|---|---|
| Type | double |
| Set by | AgentTelSpanProcessor (via AnomalyDetector) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no anomaly is detected) |
Why: The anomaly score provides a normalized severity metric (0.0 to 1.0) that lets agents compare anomalies across different operations and services. A score of 0.3 might warrant monitoring, while 0.9 demands immediate action. This score feeds into the severity assessment and business impact calculation.
Use case: Agent receives anomaly alerts from two services simultaneously. payment-service has agenttel.anomaly.score = 0.92 and notification-service has score = 0.35. The agent triages the payment service first because the higher score indicates a more severe deviation from normal behavior.
Example value: 0.85
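
The triage in the use case amounts to sorting alerts by score, highest first. In this sketch each alert is a plain dict; the `service` key and the `triage_order` helper are hypothetical, and treating a missing score as 0.0 is an assumption.

```python
def triage_order(alerts: list) -> list:
    """Order alerts so the highest agenttel.anomaly.score is handled first."""
    return sorted(
        alerts,
        key=lambda alert: alert.get("agenttel.anomaly.score", 0.0),
        reverse=True,
    )


alerts = [
    {"service": "notification-service", "agenttel.anomaly.score": 0.35},
    {"service": "payment-service", "agenttel.anomaly.score": 0.92},
]
for alert in triage_order(alerts):
    print(alert["service"], alert["agenttel.anomaly.score"])
```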
Error Classification¶
Structured error categorization that tells agents why a span failed, not just that it failed. Set as span attributes at export time for spans with error status.
Set by:
AgentTelEnrichingSpanExporter (via ErrorClassifier) -- runs during span export, analyzes exception types, HTTP status codes, and exception messages to classify errors.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.error.category | string | dependency_timeout, connection_error, code_bug, rate_limited, auth_failure, resource_exhaustion, data_validation, unknown | Agent knows the failure class and appropriate response |
| agenttel.error.root_exception | string | Java exception class name, e.g. "java.net.SocketTimeoutException" | Agent classifies the root cause at the code level |
| agenttel.error.dependency | string | Dependency name, e.g. "postgres" | Agent knows which dependency caused the failure |
Detailed Reference¶
agenttel.error.category¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter (via ErrorClassifier) |
| Appears on | Span attributes |
| Default | "unknown" (set on all error spans; defaults to unknown when classification rules do not match) |
Why: Standard OTel error status tells the agent that a span failed, but the same "error" status covers both a NullPointerException (code bug, do not retry) and a SocketTimeoutException (transient dependency issue, retry is appropriate). Error classification maps raw exceptions to actionable categories, each with a distinct remediation strategy.
Use case: Agent sees error spans on POST /api/payments. It reads agenttel.error.category = dependency_timeout and agenttel.error.dependency = postgres. Instead of investigating application code, the agent checks postgres health, finds connection pool exhaustion, and triggers a scaling action.
Example value: "dependency_timeout"
Classification rules:
| Category | Triggering Conditions | Agent Action |
|---|---|---|
| dependency_timeout | Exception contains Timeout / SocketTimeout | Retry with backoff, check dependency health |
| connection_error | Exception contains Connection / ConnectException | Check dependency availability, circuit break |
| code_bug | NullPointerException, ClassCastException, IndexOutOfBoundsException, IllegalStateException | Do not retry -- needs code fix |
| rate_limited | HTTP 429 | Back off, reduce traffic, request quota increase |
| auth_failure | HTTP 401 / 403 | Check credentials/tokens, do not retry |
| resource_exhaustion | OutOfMemoryError, StackOverflowError | Scale up, restart instances |
| data_validation | HTTP 400 / 422, ValidationException, IllegalArgumentException | Do not retry -- fix input |
| unknown | Everything else | Investigate manually |
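
The rules in the table can be read as an ordered series of checks. The sketch below is a rough re-expression of them, not the real ErrorClassifier: the `classify_error` name is hypothetical, and evaluating the rules in table order with matching on the simple class name is an assumption.

```python
CODE_BUGS = {"NullPointerException", "ClassCastException",
             "IndexOutOfBoundsException", "IllegalStateException"}
RESOURCE_ERRORS = {"OutOfMemoryError", "StackOverflowError"}
VALIDATION_ERRORS = {"ValidationException", "IllegalArgumentException"}


def classify_error(exception_class: str = None, http_status: int = None) -> str:
    """Map an exception class name and/or HTTP status to an error category."""
    # Strip the package so "java.net.SocketTimeoutException" -> "SocketTimeoutException".
    simple = (exception_class or "").rsplit(".", 1)[-1]
    if "Timeout" in simple:
        return "dependency_timeout"
    if "Connection" in simple or "ConnectException" in simple:
        return "connection_error"
    if simple in CODE_BUGS:
        return "code_bug"
    if http_status == 429:
        return "rate_limited"
    if http_status in (401, 403):
        return "auth_failure"
    if simple in RESOURCE_ERRORS:
        return "resource_exhaustion"
    if http_status in (400, 422) or simple in VALIDATION_ERRORS:
        return "data_validation"
    return "unknown"


print(classify_error("java.net.SocketTimeoutException"))
print(classify_error(http_status=429))
```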
agenttel.error.root_exception¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter (via ErrorClassifier) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no exception is recorded on the span) |
Why: While agenttel.error.category provides high-level classification, the root exception class name gives agents the precision to match against known issues, search issue trackers, and correlate with specific code paths. It records the deepest cause in the exception chain, stripping away wrapper exceptions.
Use case: Agent sees agenttel.error.root_exception = org.postgresql.util.PSQLException and cross-references it with agenttel.decision.known_issue_id = "JIRA-5678". It finds the known issue is a connection pool sizing bug with a documented workaround and applies the fix automatically.
Example value: "java.net.SocketTimeoutException"
Causality¶
Root cause analysis attributes that help agents trace failures back to their origin. Set as span attributes at export time.
Set by:
AgentTelEnrichingSpanExporter (via CausalityTracker and OperationDependencyTracker) -- runs during span export, correlates error spans with dependency health data and recent events.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.cause.hint | string | Human-readable description | Agent gets a concise root cause explanation |
| agenttel.cause.category | string | dependency, code, infrastructure, traffic, unknown | Agent categorizes the root cause domain |
| agenttel.cause.dependency | string | Dependency name | Agent identifies the specific failing dependency |
| agenttel.cause.correlated_span_id | string | Span ID (hex) | Agent traces to the root cause span |
| agenttel.cause.correlated_event_id | string | Event ID | Agent links to the triggering event |
| agenttel.cause.started_at | string | ISO 8601 timestamp | Agent knows when the issue first appeared |
Detailed Reference¶
agenttel.cause.hint¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter (via CausalityTracker) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when causality cannot be determined) |
Why: The cause hint is a human-readable, agent-consumable explanation of why a span failed. It synthesizes information from error classification, dependency health, and recent events into a single actionable sentence. This is the attribute agents include in incident summaries and escalation messages.
Use case: Agent constructs an incident report and includes the cause hint: "Dependency postgres is unhealthy: Connection refused on host db-primary:5432. First observed 3 minutes ago." This gives the on-call engineer immediate context without needing to dig through logs.
Example value: "Dependency postgres is unhealthy: Connection refused on host db-primary:5432"
agenttel.cause.category¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter (via CausalityTracker) |
| Appears on | Span attributes |
| Default | "unknown" (set when causality analysis runs but cannot determine a specific category) |
Why: The cause category tells the agent which domain the root cause belongs to, driving the selection of the appropriate remediation playbook. A dependency cause means the agent should investigate upstream services. A code cause means no automated fix is possible. An infrastructure cause points to compute, network, or storage issues.
Use case: Agent sees agenttel.cause.category = infrastructure combined with agenttel.anomaly.pattern = latency_degradation. It checks node-level metrics, finds elevated CPU on the host, and triggers an auto-scaling action rather than investigating application code.
Example value: "dependency"
Possible values:
| Category | Meaning |
|---|---|
| dependency | Failure caused by an upstream dependency |
| code | Failure caused by application code (bugs, unhandled cases) |
| infrastructure | Failure caused by compute, network, or storage issues |
| traffic | Failure caused by traffic patterns (overload, thundering herd) |
| unknown | Root cause could not be determined |
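As a rough illustration of how an agent might turn the category into a playbook choice, the sketch below maps each documented value to a playbook name. The playbook names and the `select_playbook` helper are hypothetical; only the category values come from the table above.

```python
# Hypothetical playbook names keyed by the documented category values.
PLAYBOOKS = {
    "dependency": "check_upstream_health",
    "code": "escalate_to_owning_team",
    "infrastructure": "inspect_node_metrics",
    "traffic": "review_rate_limits_and_scaling",
    "unknown": "manual_investigation",
}


def select_playbook(span_attributes: dict) -> str:
    """Pick a playbook from agenttel.cause.category, defaulting to 'unknown'."""
    category = span_attributes.get("agenttel.cause.category", "unknown")
    # Unrecognized values fall back to the same path as 'unknown'.
    return PLAYBOOKS.get(category, PLAYBOOKS["unknown"])
```

Treating a missing or unrecognized category the same as unknown keeps the agent on the manual investigation path rather than guessing.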
Severity¶
Business impact assessment that helps agents prioritize their response. Set as span attributes at export time.
Set by:
AgentTelEnrichingSpanExporter -- synthesizes anomaly scores, service tier, and error status into a business impact assessment.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.severity.anomaly_score | double | 0.0--1.0 | Agent gauges the overall severity magnitude |
| agenttel.severity.pattern | string | Incident pattern name | Agent knows the type of incident for playbook selection |
| agenttel.severity.impact_scope | string | operation_specific, service_wide, cross_service | Agent scopes the blast radius |
| agenttel.severity.business_impact | string | critical, high, medium, low | Agent prioritizes response based on business impact |
| agenttel.severity.user_facing | boolean | true / false | Agent knows if end users are affected |
Detailed Reference¶
agenttel.severity.business_impact¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no anomaly or error is detected) |
Why: The business impact level is the single most important triage signal for an agent. It combines anomaly severity with service tier to produce a unified priority. An error on a critical-tier service is automatically high impact even if the anomaly score is moderate, while the same error on an internal-tier service is low impact.
Use case: Agent is handling three simultaneous alerts. It sorts by agenttel.severity.business_impact: the critical alert (payment processing down, score > 0.8) gets immediate attention, the high alert (error on critical-tier order service) gets queued, and the low alert (validation error on internal admin tool) gets logged for later review.
Example value: "critical"
Determination rules:
| Impact | Condition |
|---|---|
| critical | Anomaly score > 0.8 |
| high | Error on critical-tier service |
| medium | Error on standard-tier service or moderate anomaly |
| low | Minor anomaly or data validation error |
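The determination rules can be read as a top-to-bottom decision, sketched below. Several details are assumptions rather than documented behavior: the rules are evaluated in table order, a "moderate" anomaly means a score of at least 0.5, and the tier names are the strings "critical", "standard", and "internal". The `business_impact` function is hypothetical.

```python
from typing import Optional


def business_impact(
    anomaly_score: float,
    is_error: bool,
    service_tier: str,
    moderate_threshold: float = 0.5,  # assumption: not specified in the rules
) -> Optional[str]:
    """Apply the determination rules in table order; None means attribute absent."""
    if not is_error and anomaly_score <= 0.0:
        return None  # no anomaly or error detected
    if anomaly_score > 0.8:
        return "critical"
    if is_error and service_tier == "critical":
        return "high"
    if (is_error and service_tier == "standard") or anomaly_score >= moderate_threshold:
        return "medium"
    return "low"
```

Under these assumptions an error on an internal-tier service with a low score resolves to low, which matches the description of the attribute above.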
agenttel.severity.impact_scope¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEnrichingSpanExporter |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no severity assessment is performed) |
Why: An issue affecting a single operation requires a different response than one affecting the entire service or multiple services. The impact scope tells the agent whether to investigate narrowly (one endpoint) or broadly (service-wide or cross-service), and whether to coordinate with agents monitoring other services.
Use case: Agent sees agenttel.severity.impact_scope = cross_service on an anomaly in the API gateway. It queries health data for all downstream services, correlates the timing with a recent deployment event, and coordinates a multi-service incident response.
Example value: "service_wide"
Change Correlation¶
Correlates anomalies with recent changes to help agents identify the probable trigger. Set on incident context objects constructed by the agent layer.
Set by:
ChangeCorrelationEngine in the agenttel-agent module -- analyzes recent deployment events, configuration changes, and scaling events against anomaly onset timestamps.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.correlation.likely_cause | string | deployment, config, scaling, feature_flag, dependency_update | Agent identifies the probable trigger for the incident |
| agenttel.correlation.change_id | string | Change identifier, e.g. "deploy-v2.1.0" | Agent links to the specific change for rollback decisions |
| agenttel.correlation.time_delta_ms | long | >= 0, e.g. 1800000 | Agent gauges temporal proximity between change and anomaly |
| agenttel.correlation.confidence | double | 0.0--1.0 | Agent weighs the strength of the correlation |
Detailed Reference¶
agenttel.correlation.likely_cause¶
| Property | Value |
|---|---|
| Type | string |
| Set by | ChangeCorrelationEngine |
| Appears on | Incident context (used in MCP tool responses and incident reports) |
| Default | Not set (attribute absent when no correlated change is found) |
Why: Recent changes are among the most common causes of production incidents. When an agent can automatically correlate an anomaly with a deployment that happened 10 minutes ago, it can suggest or execute a rollback without human investigation. This attribute identifies the type of change most likely responsible.
Use case: Agent detects an error rate spike starting 12 minutes ago. ChangeCorrelationEngine finds a deployment (deploy-v2.1.0) that completed 15 minutes ago with confidence 0.92. The agent recommends rollback to v2.0.9 and includes the deployment diff link in the incident report.
Example value: "deployment"
agenttel.correlation.confidence¶
| Property | Value |
|---|---|
| Type | double |
| Set by | ChangeCorrelationEngine |
| Appears on | Incident context |
| Default | Not set (attribute absent when no correlated change is found) |
Why: Not all correlations are meaningful -- a config change 6 hours ago is less likely to be the cause than a deployment 5 minutes ago. The confidence score encodes temporal proximity, change scope, and historical correlation patterns so the agent can decide whether to recommend rollback (high confidence) or just flag the correlation for human review (low confidence).
Use case: Agent finds two correlated changes: a deployment 3 minutes before the anomaly (confidence 0.95) and a config change 2 hours before (confidence 0.15). It recommends rolling back the deployment and ignores the config change.
Example value: 0.85
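A minimal sketch of how an agent might act on the confidence score follows. The 0.8 rollback threshold, the `triage_changes` helper, and the action labels are assumptions for illustration; the document does not define a cutoff.

```python
def triage_changes(changes: list, rollback_threshold: float = 0.8) -> list:
    """Rank correlated changes by confidence and attach a suggested action."""
    ranked = sorted(
        changes,
        key=lambda c: c["agenttel.correlation.confidence"],
        reverse=True,
    )
    result = []
    for change in ranked:
        confident = change["agenttel.correlation.confidence"] >= rollback_threshold
        # High confidence: recommend rollback. Otherwise leave it to a human.
        action = "recommend_rollback" if confident else "flag_for_review"
        result.append((change["agenttel.correlation.change_id"], action))
    return result


# Example input; "config-1234" is a made-up change identifier.
example_changes = [
    {"agenttel.correlation.change_id": "config-1234",
     "agenttel.correlation.confidence": 0.15},
    {"agenttel.correlation.change_id": "deploy-v2.1.0",
     "agenttel.correlation.confidence": 0.95},
]
```

Sorting first means the strongest candidate is always presented at the top of an incident report, whatever order the changes were discovered in.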
SLO¶
Error budget consumption tracking. Set as span attributes when SLOs are registered for the operation.
Set by:
AgentTelSpanProcessor (via SloTracker) -- evaluates each span against registered SLO definitions and computes budget consumption in real time.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.slo.name | string | SLO identifier, e.g. "payment-availability" | Agent tracks which SLO is being measured |
| agenttel.slo.target | double | 0.0--1.0, e.g. 0.999 | Agent knows the SLO target to compare against |
| agenttel.slo.budget_remaining | double | 0.0--1.0, e.g. 0.85 | Agent knows how much error budget remains |
| agenttel.slo.burn_rate | double | >= 0, e.g. 0.15 | Agent detects accelerating budget consumption |
Detailed Reference¶
agenttel.slo.budget_remaining¶
| Property | Value |
|---|---|
| Type | double |
| Set by | AgentTelSpanProcessor (via SloTracker) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no SLO is registered for the operation) |
Why: Error budget remaining is the key metric for SLO-driven incident response. When the budget is > 50%, minor anomalies can be monitored. When it drops below 25%, the agent should restrict risky changes. Below 10%, the agent should escalate aggressively and consider freezing deployments. This single number drives a graduated response strategy.
Use case: Agent detects intermittent errors on the payment service. It checks agenttel.slo.budget_remaining = 0.12 (12% remaining) and agenttel.slo.burn_rate = 0.78 (consuming budget at 78% of the sustainable rate). The agent emits a critical SLO budget alert and recommends pausing non-essential deployments until the error rate stabilizes.
Example value: 0.85
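The graduated response described above could be sketched as a simple threshold ladder. The action labels and the `budget_response` helper are hypothetical, and the text does not say what to do between 25% and 50%, so that band is an assumption.

```python
def budget_response(budget_remaining: float) -> str:
    """Map agenttel.slo.budget_remaining (0.0-1.0) to a response tier."""
    if budget_remaining < 0.10:
        return "escalate_and_consider_deploy_freeze"
    if budget_remaining < 0.25:
        return "restrict_risky_changes"
    if budget_remaining > 0.50:
        return "monitor"
    # Assumption: the 25-50% band is not specified in the text.
    return "monitor_closely"
```

With the value from the use case, `budget_response(0.12)` lands in the restrict tier, one step short of the aggressive escalation reserved for budgets below 10%.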
agenttel.slo.burn_rate¶
| Property | Value |
|---|---|
| Type | double |
| Set by | AgentTelSpanProcessor (via SloTracker) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when no SLO is registered for the operation) |
Why: While budget_remaining is a snapshot, burn_rate shows the velocity of budget consumption. A burn rate of 1.0 means the budget is being consumed at exactly the sustainable rate. A burn rate of 10.0 means the budget will be exhausted 10x faster than expected. This lets agents predict budget exhaustion and escalate proactively before the budget runs out.
Use case: Agent sees agenttel.slo.burn_rate = 5.2, meaning the error budget is being consumed roughly 5x faster than sustainable. Even though budget_remaining is still 0.45 (healthy), the agent projects budget exhaustion within hours and proactively notifies the team.
Example value: 0.15
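One way to project exhaustion from these two numbers is sketched below. It assumes that a burn rate of 1.0 consumes the whole budget over exactly one SLO window and that the window is 30 days; neither the window length nor the `hours_to_exhaustion` helper comes from AgentTel.

```python
def hours_to_exhaustion(
    budget_remaining: float,
    burn_rate: float,
    slo_window_hours: float = 30 * 24,  # assumption: 30-day SLO window
) -> float:
    """Estimate hours until the error budget reaches zero at the current burn rate."""
    if burn_rate <= 0:
        return float("inf")  # nothing is being consumed
    # At burn rate 1.0 a full budget lasts one window, so a fraction b
    # of the budget at rate r lasts b * window / r.
    return budget_remaining * slo_window_hours / burn_rate
```

Under the 30-day assumption, a remaining budget of 0.45 at a burn rate of 5.2 lasts about 62 hours, far shorter than the 324 hours the same budget would last at the sustainable rate.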
Deployment¶
Deployment tracking attributes for change correlation. Set on span events and startup events by the deployment event emitter.
Set by:
DeploymentEventEmitter (in agenttel-core) and AgentTelDeploymentEventListener (in agenttel-spring-boot-starter) -- emits a structured event at service startup containing deployment metadata.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.deployment.id | string | Deployment ID, e.g. "deploy-20240115-1430" | Agent tracks individual deployments |
| agenttel.deployment.version | string | Version string, e.g. "2.1.0" | Agent knows the current running version |
| agenttel.deployment.commit_sha | string | Git SHA, e.g. "a1b2c3d4" | Agent can link to exact code changes |
| agenttel.deployment.previous_version | string | Version string, e.g. "2.0.9" | Agent knows which version to roll back to |
| agenttel.deployment.strategy | string | blue-green, canary, rolling | Agent understands the deployment mechanism |
| agenttel.deployment.timestamp | string | ISO 8601, e.g. "2024-01-15T14:30:00Z" | Agent knows when the deployment happened |
Detailed Reference¶
agenttel.deployment.previous_version¶
| Property | Value |
|---|---|
| Type | string |
| Set by | DeploymentEventEmitter |
| Appears on | Span events (deployment event) |
| Default | Not set (attribute absent on first deployment or when previous version is unknown) |
Why: When an agent decides to recommend or execute a rollback, it needs to know which version to roll back to. Without previous_version, the agent can identify that the current deployment is problematic but cannot specify the safe target version, so a human has to determine the rollback target.
Use case: Agent correlates a latency spike with the deployment of version 2.1.0 (deployed 8 minutes ago). It reads agenttel.deployment.previous_version = "2.0.9" and recommends: "Roll back payment-service from v2.1.0 to v2.0.9 -- latency degradation correlated with deployment (confidence: 0.93)."
Example value: "2.0.9"
agenttel.deployment.strategy¶
| Property | Value |
|---|---|
| Type | string |
| Set by | DeploymentEventEmitter |
| Appears on | Span events (deployment event) |
| Default | Not set (attribute absent when deployment strategy is not configured) |
Why: The deployment strategy determines the blast radius of a bad deployment and the rollback mechanism. A canary deployment means only a fraction of traffic is affected -- the agent can halt the canary rather than doing a full rollback. A blue-green deployment allows instant rollback by switching traffic. A rolling deployment may require waiting for all instances to be replaced.
Use case: Agent detects errors in a canary deployment (strategy=canary). Instead of triggering a full rollback, it halts canary promotion and routes all traffic back to the stable version, minimizing user impact.
Example value: "canary"
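The strategy-dependent responses described above might be combined with previous_version as in the following sketch. The `rollback_plan` helper and its output strings are hypothetical; only the attribute keys and strategy values come from this section.

```python
def rollback_plan(deployment_event: dict) -> str:
    """Suggest a rollback action from deployment event attributes."""
    strategy = deployment_event.get("agenttel.deployment.strategy")
    previous = deployment_event.get("agenttel.deployment.previous_version")
    if strategy == "canary":
        # Only a fraction of traffic is affected, so stop the canary.
        return "halt canary promotion and route traffic to the stable version"
    if previous is None:
        # Without a target version the agent cannot name a safe rollback.
        return "escalate: rollback target unknown"
    if strategy == "blue-green":
        return f"switch traffic back to {previous}"
    # Rolling (or unspecified) strategy: a full rollback is needed.
    return f"roll back to {previous}"
```

The missing-target branch reflects the point made for previous_version: when the attribute is absent, the agent hands the decision to a human instead of guessing.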
GenAI¶
Attributes for AI/ML workload observability. Combines the standard OTel gen_ai.* namespace with AgentTel extensions in the agenttel.genai.* namespace.
Set by: GenAI instrumentation wrappers in the
agenttel-genai module -- TracingChatLanguageModel (LangChain4j), SpringAiSpanEnricher (Spring AI), TracingAnthropicClient, TracingOpenAIClient, BedrockTracing, and CostEnrichingSpanExporter.
AgentTel GenAI Extensions¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.genai.framework | string | langchain4j, spring_ai, anthropic, openai, bedrock | Agent knows the instrumentation source framework |
| agenttel.genai.cost_usd | double | >= 0, e.g. 0.000795 | Agent tracks per-request cost for budget monitoring |
| agenttel.genai.rag_source_count | long | >= 0, e.g. 5 | Agent monitors RAG retrieval volume |
| agenttel.genai.rag_relevance_score_avg | double | 0.0--1.0, e.g. 0.87 | Agent assesses retrieval quality |
| agenttel.genai.guardrail_triggered | boolean | true / false | Agent monitors safety guardrail activations |
| agenttel.genai.guardrail_name | string | Guardrail identifier, e.g. "pii_filter" | Agent knows which guardrail was triggered |
| agenttel.genai.cache_hit | boolean | true / false | Agent tracks cache efficiency for cost optimization |
Standard OTel GenAI Attributes¶
These follow the emerging OTel GenAI semantic conventions (gen_ai.* namespace). AgentTel populates them for all supported providers.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| gen_ai.operation.name | string | chat, text_completion, embeddings | Agent identifies the GenAI operation type |
| gen_ai.system | string | openai, anthropic, aws_bedrock | Agent knows which provider is being called |
| gen_ai.request.model | string | Model identifier, e.g. "gpt-4" | Agent knows which model was requested |
| gen_ai.response.model | string | Model identifier, e.g. "gpt-4-0125-preview" | Agent knows the actual model that responded |
| gen_ai.usage.input_tokens | long | >= 0 | Agent monitors input token consumption |
| gen_ai.usage.output_tokens | long | >= 0 | Agent monitors output token consumption |
| gen_ai.request.temperature | double | 0.0--2.0 | Agent sees the sampling temperature used |
| gen_ai.request.max_tokens | long | >= 0 | Agent sees the max output token limit |
| gen_ai.request.top_p | double | 0.0--1.0 | Agent sees the nucleus sampling parameter |
| gen_ai.response.id | string | Response identifier | Agent correlates responses across retries |
| gen_ai.response.finish_reasons | string[] | stop, length, tool_calls | Agent knows why generation stopped |
Detailed Reference¶
agenttel.genai.cost_usd¶
| Property | Value |
|---|---|
| Type | double |
| Set by | CostEnrichingSpanExporter (via ModelCostCalculator) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when model pricing is not configured) |
Why: GenAI API costs can spike unexpectedly -- a prompt injection or retry loop can burn through budget in minutes. By attaching per-request cost to every span, agents can detect cost anomalies in real time, enforce budget limits, and alert when spending exceeds thresholds.
Use case: Agent aggregates agenttel.genai.cost_usd over a 5-minute window and detects that spending is 10x the rolling average. It investigates and finds a retry loop caused by a downstream timeout, halts the retry, and reports the cost impact ($47.30 in unnecessary spending).
Example value: 0.000795
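The spike check in the use case could look roughly like the sketch below, which compares the spend in the current window against the average of earlier windows. The `cost_spike` helper, the windowing scheme, and the default 10x multiplier (taken from the use case) are illustrative assumptions, not AgentTel behavior.

```python
def cost_spike(
    window_costs: list,
    baseline_window_totals: list,
    multiplier: float = 10.0,
) -> bool:
    """Return True when current-window spend exceeds multiplier x the baseline average.

    window_costs: agenttel.genai.cost_usd values from spans in the current window.
    baseline_window_totals: total spend of each earlier window of the same length.
    """
    if not baseline_window_totals:
        return False  # no history to compare against
    current = sum(window_costs)
    baseline = sum(baseline_window_totals) / len(baseline_window_totals)
    return baseline > 0 and current > multiplier * baseline
```

Comparing window totals rather than per-request costs lets the check catch a retry loop, where each request is cheap but the request count explodes.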
agenttel.genai.rag_relevance_score_avg¶
| Property | Value |
|---|---|
| Type | double |
| Set by | TracingContentRetriever (LangChain4j) or SpringAiSpanEnricher |
| Appears on | Span attributes |
| Default | Not set (attribute absent when RAG is not used or relevance scores are not available) |
Why: Low retrieval quality directly impacts LLM response quality. When the average relevance score drops below a threshold, it means the retrieval pipeline is returning irrelevant documents, leading to hallucinations or poor answers. Agents can detect this degradation and alert on retrieval quality before users notice response quality issues.
Use case: Agent monitors agenttel.genai.rag_relevance_score_avg across requests to a customer support chatbot. It detects a drop from 0.87 to 0.42 after a vector index rebuild. The agent alerts the ML engineering team that retrieval quality has degraded and the index rebuild may need to be reverted.
Example value: 0.87
gen_ai.response.finish_reasons¶
| Property | Value |
|---|---|
| Type | string[] (string array) |
| Set by | GenAI instrumentation wrappers (TracingChatLanguageModel, TracingAnthropicClient, etc.) |
| Appears on | Span attributes |
| Default | Not set (attribute absent when the provider does not return finish reasons) |
Why: The finish reason tells the agent why the LLM stopped generating. A stop finish is normal. A length finish means the output was truncated at max_tokens, which may indicate the response is incomplete and the user got a degraded experience. A tool_calls finish means the LLM wants to invoke a tool. Agents can detect elevated length finishes as a quality degradation signal.
Use case: Agent detects that 40% of responses from the code generation endpoint finish with reason length. It recommends increasing max_tokens from 1024 to 2048 for that operation and estimates the cost impact using agenttel.genai.cost_usd data.
Example value: ["stop"]
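Measuring the share of truncated responses, as in the use case, can be sketched as follows. The `truncation_rate` helper is hypothetical and treats each span as a plain dictionary of attributes.

```python
def truncation_rate(spans: list) -> float:
    """Fraction of spans whose finish reasons include 'length'."""
    total = 0
    truncated = 0
    for attrs in spans:
        reasons = attrs.get("gen_ai.response.finish_reasons")
        if not reasons:
            continue  # attribute absent: provider returned no finish reasons
        total += 1
        if "length" in reasons:
            truncated += 1
    return truncated / total if total else 0.0


# Five example spans, two of which were cut off at max_tokens.
sample_spans = [
    {"gen_ai.response.finish_reasons": ["stop"]},
    {"gen_ai.response.finish_reasons": ["length"]},
    {"gen_ai.response.finish_reasons": ["stop"]},
    {"gen_ai.response.finish_reasons": ["length"]},
    {"gen_ai.response.finish_reasons": ["tool_calls"]},
]
```

Spans without the attribute are excluded from the denominator so that providers that omit finish reasons do not dilute the rate.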
Agent Identity¶
Tracks which AI agent performed each action. Set on action spans created when agents interact with the system through MCP tools.
Set by:
AgentActionTracker in the agenttel-agent module -- wraps each agent action (MCP tool invocation, remediation execution) in a span with identity attributes.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.agent.id | string | Agent identifier, e.g. "diag-agent-1" | Track which agent performed each action |
| agenttel.agent.role | string | observer, diagnostician, remediator, admin | Track the agent's role and permission level |
| agenttel.agent.session_id | string | Session UUID | Link actions to a collaboration session |
Detailed Reference¶
agenttel.agent.role¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentActionTracker |
| Appears on | Span attributes (on agent action spans) |
| Default | Not set (attribute absent when agent identity is not registered) |
Why: Different agents have different permission levels. An observer agent can read telemetry but cannot take remediation actions. A remediator can execute approved playbooks. An admin can perform any action. By recording the role on each span, the system maintains an audit trail of who did what, and the ToolPermissionRegistry can enforce role-based access control on MCP tools.
Use case: Audit review reveals that a remediator-role agent restarted a payment service instance during an incident. The role on the span confirms the agent was authorized for this action class. If an observer-role agent had attempted the same action, the ToolPermissionRegistry would have denied it.
Example value: "diagnostician"
Predefined roles:
| Role | Permissions |
|---|---|
| observer | Read-only access to telemetry data, health status, and incident context |
| diagnostician | Observer permissions plus ability to run diagnostic queries and trace analysis |
| remediator | Diagnostician permissions plus ability to execute approved remediation playbooks |
| admin | Full access to all tools including service restarts and configuration changes |
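Because each role includes the permissions of the one before it, a role check can be sketched as a rank comparison. This is an illustration only: the `is_allowed` helper is hypothetical and does not reflect the actual ToolPermissionRegistry API.

```python
# Roles in increasing order of privilege, as listed in the table above.
ROLE_RANK = {"observer": 0, "diagnostician": 1, "remediator": 2, "admin": 3}


def is_allowed(agent_role: str, required_role: str) -> bool:
    """Return True when agent_role carries at least the required role's permissions."""
    if agent_role not in ROLE_RANK or required_role not in ROLE_RANK:
        return False  # unknown roles are denied by default
    return ROLE_RANK[agent_role] >= ROLE_RANK[required_role]
```

Denying unknown roles by default keeps an agent with an unregistered identity from gaining access through a typo or a missing attribute.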
Sessions¶
Shared incident session tracking for multi-agent collaboration. Set on session-related operations managed by the SessionManager.
Set by:
SessionManager and IncidentSession in the agenttel-agent module -- creates and manages collaborative sessions where multiple agents can work on the same incident.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.session.id | string | UUID, e.g. "a3f2b1c4-d5e6-7890-abcd-ef1234567890" | Uniquely identifies the collaboration session |
| agenttel.session.incident_id | string | Incident identifier, e.g. "inc-payment-spike-20240115" | Links the session to a specific incident |
Detailed Reference¶
agenttel.session.id¶
| Property | Value |
|---|---|
| Type | string |
| Set by | SessionManager |
| Appears on | Span attributes (on session-scoped operations) |
| Default | Not set (attribute absent when no session is active) |
Why: When multiple agents collaborate on an incident, they need a shared context. The session ID links all agent actions, diagnostic queries, and remediation steps to the same incident investigation. This enables post-incident review of the full agent collaboration timeline and prevents duplicate work.
Use case: A diagnostician agent and a remediator agent are both investigating a payment service outage. Both record agenttel.session.id = "a3f2b1c4" on their spans. In post-incident review, the team can trace the complete investigation: the diagnostician identified postgres as the root cause at T+2min, and the remediator executed a connection pool scaling action at T+4min.
Example value: "a3f2b1c4-d5e6-7890-abcd-ef1234567890"
Circuit Breaker¶
Circuit breaker state change tracking. Set on event attributes when circuit breakers transition between states.
Set by:
AgentTelEventEmitter in the agenttel-core module -- emits structured events when circuit breaker state changes are recorded.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.circuit_breaker.name | string | Breaker identifier, e.g. "postgres-breaker" | Agent identifies which circuit breaker changed |
| agenttel.circuit_breaker.previous_state | string | closed, open, half_open | Agent knows the state before the transition |
| agenttel.circuit_breaker.new_state | string | closed, open, half_open | Agent knows the current state |
| agenttel.circuit_breaker.failure_count | long | >= 0 | Agent knows how many failures triggered the transition |
| agenttel.circuit_breaker.dependency | string | Dependency name, e.g. "postgres" | Agent links the breaker to a specific dependency |
Detailed Reference¶
agenttel.circuit_breaker.new_state¶
| Property | Value |
|---|---|
| Type | string |
| Set by | AgentTelEventEmitter |
| Appears on | Event attributes (on circuit breaker state change events) |
| Default | Not set (only present on circuit breaker events) |
Why: Circuit breaker state transitions are critical operational signals. When a breaker opens, it means a dependency has exceeded its failure threshold and the service is now returning fallback responses (or failing fast). When it transitions to half-open, the service is testing whether the dependency has recovered. The agent needs to know these transitions to understand service behavior and correlate them with anomalies.
Use case: Agent sees agenttel.circuit_breaker.new_state = open for the postgres breaker. It correlates this with the dependency_timeout errors on POST /api/payments and confirms that the circuit breaker is protecting the service. The agent monitors for the half_open transition to verify recovery.
Example value: "open"
State machine:
| State | Meaning |
|---|---|
| closed | Normal operation -- requests are forwarded to the dependency |
| open | Failure threshold exceeded -- requests are short-circuited (fallback or fail-fast) |
| half_open | Testing recovery -- a limited number of requests are forwarded to check if the dependency has recovered |
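An agent consuming these events might sanity-check transitions and watch for recovery, as sketched below. The allowed transitions follow the conventional circuit breaker pattern and are an assumption; the document lists the states but not the legal edges. Both helper names are hypothetical.

```python
# Assumed legal transitions in the conventional circuit breaker pattern.
ALLOWED_TRANSITIONS = {
    "closed": {"open"},
    "open": {"half_open"},
    "half_open": {"closed", "open"},
}


def is_valid_transition(previous_state: str, new_state: str) -> bool:
    """Check a (previous_state, new_state) pair against the assumed state machine."""
    return new_state in ALLOWED_TRANSITIONS.get(previous_state, set())


def dependency_recovered(event: dict) -> bool:
    """True when a breaker event reports half_open -> closed."""
    return (
        event.get("agenttel.circuit_breaker.previous_state") == "half_open"
        and event.get("agenttel.circuit_breaker.new_state") == "closed"
    )
```

The recovery check corresponds to the last step in the use case, where the agent waits for the half_open probe to succeed before treating the dependency as healthy.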
Agentic¶
Agent lifecycle instrumentation attributes from the agenttel-agentic module. These attributes instrument the AI agent runtime: invocations, reasoning, orchestration, cost, quality, and safety.
Set by:
agenttel-agentic module classes -- AgentTracer, AgentInvocation, scope classes (ToolCallScope, TaskScope, HandoffScope, etc.), AgentCostAggregator, GuardrailRecorder, LoopDetector, and QualityTracker.
The agenttel.agentic.* namespace contains 70+ attributes across 17 categories. For the complete reference with all enum values, span names, and detailed descriptions, see the Agentic Attributes Reference.
Summary of categories:
| Category | Key Attributes | Span Name |
|---|---|---|
| Agent Identity | agent.name, agent.type, agent.framework | invoke_agent |
| Invocation | invocation.id, invocation.goal, invocation.status, invocation.steps | invoke_agent |
| Step / Reasoning | step.number, step.type, step.iteration | agenttel.agentic.step |
| Tool Calls | step.tool_name, step.tool_status | agenttel.agentic.tool_call |
| Task Tracking | task.id, task.name, task.depth, task.parent_id | agenttel.agentic.task |
| Orchestration | orchestration.pattern, orchestration.stage, orchestration.parallel_branches | agenttel.agentic.session |
| Handoff | handoff.from_agent, handoff.to_agent, handoff.chain_depth | agenttel.agentic.handoff |
| Cost | cost.total_usd, cost.input_tokens, cost.output_tokens, cost.llm_calls | On invoke_agent / session |
| Quality | quality.goal_achieved, quality.loop_detected, quality.eval_score | On invoke_agent |
| Guardrail | guardrail.triggered, guardrail.name, guardrail.action | agenttel.agentic.guardrail |
| Human Checkpoint | human.checkpoint_type, human.decision, human.wait_ms | agenttel.agentic.human_input |
| Code Execution | code.language, code.status, code.sandboxed | agenttel.agentic.code_execution |
| Evaluation | eval.scorer_name, eval.score, eval.type | agenttel.agentic.evaluate |
| Retrieval | retrieval.query, retrieval.document_count, retrieval.relevance_score_avg | agenttel.agentic.retriever |
| Reranker | reranker.model, reranker.input_documents, reranker.top_score | agenttel.agentic.reranker |
| Memory | memory.operation, memory.store_type, memory.items | agenttel.agentic.memory |
| Error Classification | error.source, error.category, error.retryable | On invoke_agent |
Frontend¶
Client-side telemetry from the @agenttel/web browser SDK. These attributes provide full-stack observability by tracking user-facing behavior, client-side anomalies, and cross-stack trace correlation.
Frontend attributes use the agenttel.client.* namespace to distinguish them from server-side attributes.
Set by:
@agenttel/web browser SDK -- instruments page loads, API calls, user interactions, and user journeys. Exports spans via OTLP HTTP to any OTel-compatible collector.
Resource Attributes¶
Set once per browser application at SDK initialization.
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.app.name | string | App name, e.g. "checkout-web" | Agent identifies the frontend application |
| agenttel.client.app.version | string | Semver, e.g. "1.0.0" | Agent tracks frontend version for change correlation |
| agenttel.client.app.platform | string | browser | Agent knows the runtime platform |
| agenttel.client.app.environment | string | production, staging, etc. | Agent filters by environment |
| agenttel.client.topology.team | string | Team name, e.g. "checkout-frontend" | Agent routes frontend issues to the right team |
| agenttel.client.topology.domain | string | Business domain, e.g. "commerce" | Agent groups frontend with related backend services |
Page and Route Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.page.route | string | Route pattern, e.g. "/checkout/:step" | Agent groups spans by route for baseline comparison |
| agenttel.client.page.title | string | Document title, e.g. "Checkout - Payment" | Agent includes human-readable page context in alerts |
| agenttel.client.page.business_criticality | string | revenue, engagement, internal | Agent prioritizes revenue-impacting pages |
Baseline Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.baseline.page_load_p50_ms | double | >= 0, e.g. 800.0 | Agent knows expected page load time |
| agenttel.client.baseline.page_load_p99_ms | double | >= 0, e.g. 2000.0 | Agent knows tail page load expectation |
| agenttel.client.baseline.api_call_p50_ms | double | >= 0, e.g. 300.0 | Agent knows expected API response time from the browser |
| agenttel.client.baseline.interaction_error_rate | double | 0.0--1.0, e.g. 0.01 | Agent knows expected client-side error rate |
| agenttel.client.baseline.source | string | static, rolling | Agent knows how the baseline was determined |
Decision Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.decision.escalation_level | string | auto_resolve, notify_team, page_oncall, incident_commander | Agent knows the client-side escalation path |
| agenttel.client.decision.runbook_url | string | URL | Agent references frontend operational docs |
| agenttel.client.decision.fallback_page | string | Route path, e.g. "/maintenance" | Agent knows where to redirect on failure |
| agenttel.client.decision.retry_on_failure | boolean | true / false | Agent knows if page reload is safe |
| agenttel.client.decision.user_facing | boolean | true / false | Agent confirms this affects real users |
Anomaly Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.anomaly.detected | boolean | true / false | Agent knows a client-side anomaly was detected |
| agenttel.client.anomaly.pattern | string | rage_click, api_failure_cascade, slow_page_load, error_loop, funnel_dropoff | Agent knows the type of user-facing issue |
| agenttel.client.anomaly.score | double | 0.0--1.0 | Agent gauges client-side anomaly severity |
Client-side anomaly patterns:
| Pattern | Detection | Impact |
|---|---|---|
| rage_click | N+ clicks on same element within time window | User frustration -- UI is unresponsive |
| api_failure_cascade | N+ API failures within time window | Backend instability visible to user |
| slow_page_load | Load time exceeds baseline by multiplier | Performance degradation on route |
| error_loop | N+ errors on same route within time window | Repeating failure preventing user progress |
| funnel_dropoff | Journey abandonment above baseline | User journey failing at specific step |
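The "N+ events within a time window" rule behind rage_click can be sketched with a sliding window per element. The sketch is in Python for illustration and is not the @agenttel/web detector; the `detect_rage_clicks` helper, the default of 3 clicks, and the 1000 ms window are assumptions.

```python
from collections import defaultdict, deque


def detect_rage_clicks(clicks, min_clicks: int = 3, window_ms: int = 1000) -> set:
    """Return targets that received min_clicks or more clicks within window_ms.

    clicks: iterable of (timestamp_ms, target) tuples ordered by time.
    """
    recent = defaultdict(deque)  # target -> timestamps inside the window
    flagged = set()
    for ts, target in clicks:
        window = recent[target]
        window.append(ts)
        # Drop clicks that fell out of the window ending at this click.
        while ts - window[0] > window_ms:
            window.popleft()
        if len(window) >= min_clicks:
            flagged.add(target)
    return flagged


# Example input; "a#help" is a made-up element identifier.
sample_clicks = [
    (0, "button#submit-payment"),
    (200, "button#submit-payment"),
    (450, "button#submit-payment"),
    (5000, "a#help"),
    (9000, "a#help"),
]
```

The same windowing approach would apply to api_failure_cascade and error_loop by keying on the route or API endpoint instead of the clicked element.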
Journey Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.journey.name | string | Journey name, e.g. "checkout" | Agent tracks critical user journeys |
| agenttel.client.journey.step | int | 0-based step index | Agent knows which step the user is on |
| agenttel.client.journey.total_steps | int | Total steps in journey | Agent knows journey completion progress |
| agenttel.client.journey.started_at | string | ISO 8601 timestamp | Agent measures total journey duration |
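As an illustration of how an agent might combine started_at, step, and total_steps, the sketch below computes the elapsed journey time and the number of steps after the current one. JourneyProgress and its method names are hypothetical. The sketch assumes started_at carries a UTC offset (for example a trailing Z) and that step is 0-based, as the table states.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical helper for reading journey attributes; not part of AgentTel. */
class JourneyProgress {
    /** Elapsed time since agenttel.client.journey.started_at (ISO 8601 with offset). */
    static Duration elapsed(String startedAt, Instant now) {
        Instant start = OffsetDateTime.parse(startedAt).toInstant();
        return Duration.between(start, now);
    }

    /** Number of steps that follow the current 0-based step. */
    static int stepsAfterCurrent(int step, int totalSteps) {
        return totalSteps - step - 1;
    }
}
```

For a journey with total_steps = 4 and step = 2, one step remains after the current one.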
Interaction Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.interaction.type | string | click, submit, custom | Agent categorizes user interaction type |
| agenttel.client.interaction.target | string | Element identifier, e.g. "button#submit-payment" | Agent identifies the UI element involved |
| agenttel.client.interaction.outcome | string | success, error | Agent knows if the interaction succeeded |
| agenttel.client.interaction.response_time_ms | double | >= 0 | Agent measures interaction responsiveness |
Correlation Attributes¶
| Attribute | Type | Possible Values | Why an Agent Needs This |
|---|---|---|---|
| agenttel.client.correlation.backend_trace_id | string | 32-char hex trace ID | Agent links browser spans to backend traces |
| agenttel.client.correlation.backend_service | string | Service name | Agent knows which backend service handled the request |
| agenttel.client.correlation.backend_operation | string | Operation name | Agent traces to the specific backend operation |
Detailed Reference¶
agenttel.client.anomaly.pattern¶
| Property | Value |
|---|---|
| Type | string |
| Set by | @agenttel/web browser SDK (anomaly detector) |
| Appears on | Span attributes (on client-side spans) |
| Default | Not set (attribute absent when no anomaly is detected) |
Why: Client-side anomalies like rage clicks and error loops are signals of user frustration that backend metrics may not capture. A backend service can return 200 OK while the JavaScript rendering is broken, leaving users unable to complete their task. Client-side pattern detection catches these user-facing issues that would otherwise go unnoticed.
Use case: Agent detects agenttel.client.anomaly.pattern = rage_click on the checkout page's "Submit Payment" button. It checks agenttel.client.correlation.backend_trace_id and finds the backend call succeeded (200 OK, 45ms). The issue is a frontend rendering bug where the button appears clickable but the form submission is blocked by a JavaScript error. The agent alerts the frontend team with the specific element identifier.
Example value: "rage_click"
agenttel.client.correlation.backend_trace_id¶
| Property | Value |
|---|---|
| Type | string |
| Set by | @agenttel/web browser SDK (from traceparent response header or server-timing header) |
| Appears on | Span attributes (on client-side API call spans) |
| Default | Not set (attribute absent when the backend does not return trace context in response headers) |
Why: Full-stack incident investigation requires linking what the user sees in the browser to what happened on the server. The backend trace ID lets an agent follow a user's API call from the browser, through the API gateway, to the backend service, and into its dependencies. Without this link, frontend and backend incidents are investigated in isolation, missing the full picture.
Use case: Agent detects a slow page load on the checkout page. It reads agenttel.client.correlation.backend_trace_id = "abc123def456789012345678abcdef01" and queries the backend tracing system. It finds that the backend span for POST /api/payments shows a dependency_timeout on the fraud detection service, confirming that the slow page load is caused by a backend dependency issue, not a frontend problem.
Example value: "abc123def456789012345678abcdef01"
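The "Set by" row mentions the traceparent header. In the W3C Trace Context format, a version 00 value has four dash-separated fields (version, trace ID, parent ID, flags), and the trace ID is 32 lowercase hex characters. The sketch below shows one way the trace ID could be extracted and validated. TraceParent.extractTraceId is a hypothetical helper written for illustration, not part of the @agenttel/web SDK, and it only handles the four-field layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for a W3C traceparent value (four-field layout only). */
class TraceParent {
    // version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
    private static final Pattern FORMAT =
        Pattern.compile("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");
    // An all-zero trace ID is invalid per the W3C specification.
    private static final String INVALID_TRACE_ID = "00000000000000000000000000000000";

    /** Returns the 32-char trace ID, or empty if the value is missing or malformed. */
    static Optional<String> extractTraceId(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(headerValue.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(2);
        return INVALID_TRACE_ID.equals(traceId) ? Optional.empty() : Optional.of(traceId);
    }
}
```

Returning an empty Optional for a missing or malformed value mirrors the documented default, where the attribute is simply absent.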
agenttel.client.page.business_criticality¶
| Property | Value |
|---|---|
| Type | string |
| Set by | @agenttel/web browser SDK (from route configuration) |
| Appears on | Span attributes (on page-scoped spans) |
| Default | Not set (attribute absent when business criticality is not configured for the route) |
Why: Not all pages are equally important. The checkout page directly impacts revenue, while the blog page impacts engagement but not transactions. Business criticality lets the agent prioritize frontend issues the same way agenttel.topology.tier prioritizes backend services -- revenue-impacting pages get immediate attention.
Use case: Agent receives anomaly alerts from both the checkout page (criticality=revenue) and the help center page (criticality=engagement). It pages on-call for the checkout page issue immediately because errors there directly lose revenue, while sending a Slack notification for the help center issue.
Example value: "revenue"
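The use case above amounts to a lookup from page criticality to an escalation level. This is a minimal sketch, assuming only the two criticality values that appear in this section (revenue, engagement) and the escalation levels listed for the client decision attributes. EscalationPolicy and its fallback to notify_team for absent or unrecognized values are hypothetical choices, not documented AgentTel behavior.

```java
import java.util.Map;

/** Hypothetical mapping from page business criticality to an escalation level. */
class EscalationPolicy {
    // Assumed fallback when the attribute is absent or has an unrecognized value.
    private static final String DEFAULT = "notify_team";

    private static final Map<String, String> BY_CRITICALITY = Map.of(
        "revenue", "page_oncall",
        "engagement", "notify_team");

    static String escalationFor(String businessCriticality) {
        if (businessCriticality == null) {
            return DEFAULT; // attribute not configured for the route
        }
        return BY_CRITICALITY.getOrDefault(businessCriticality, DEFAULT);
    }
}
```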
Java Constant Reference¶
All backend attribute keys are defined as typed AttributeKey<T> constants in io.agenttel.api.attributes.AgentTelAttributes. Agentic attributes are in io.agenttel.api.attributes.AgenticAttributes. GenAI attributes have additional constants in io.agenttel.genai.conventions.AgentTelGenAiAttributes and io.agenttel.genai.conventions.GenAiAttributes.
Using these constants instead of raw strings provides compile-time type safety:
```java
import io.agenttel.api.attributes.AgentTelAttributes;

// Type-safe attribute access.
// Assumes `span` exposes getAttribute(AttributeKey<T>), e.g. an OpenTelemetry SDK ReadableSpan.
Double p50 = span.getAttribute(AgentTelAttributes.BASELINE_LATENCY_P50_MS);     // Double
String tier = span.getAttribute(AgentTelAttributes.TOPOLOGY_TIER);              // String
Boolean retryable = span.getAttribute(AgentTelAttributes.DECISION_RETRYABLE);   // Boolean
Long retryAfter = span.getAttribute(AgentTelAttributes.DECISION_RETRY_AFTER_MS); // Long
```
Constant Naming Convention¶
The constant name follows the pattern: CATEGORY_FIELD_NAME
| Namespace | Constant Prefix | Example |
|---|---|---|
| agenttel.topology.* | TOPOLOGY_ | TOPOLOGY_TIER |
| agenttel.baseline.* | BASELINE_ | BASELINE_LATENCY_P50_MS |
| agenttel.decision.* | DECISION_ | DECISION_RETRYABLE |
| agenttel.anomaly.* | ANOMALY_ | ANOMALY_DETECTED |
| agenttel.error.* | ERROR_ | ERROR_CATEGORY |
| agenttel.cause.* | CAUSE_ | CAUSE_HINT |
| agenttel.severity.* | SEVERITY_ | SEVERITY_BUSINESS_IMPACT |
| agenttel.correlation.* | CORRELATION_ | CORRELATION_LIKELY_CAUSE |
| agenttel.slo.* | SLO_ | SLO_BUDGET_REMAINING |
| agenttel.deployment.* | DEPLOYMENT_ | DEPLOYMENT_VERSION |
| agenttel.genai.* | GENAI_ | GENAI_COST_USD |
| agenttel.agent.* | AGENT_ | AGENT_ROLE |
| agenttel.session.* | SESSION_ | SESSION_ID |
| agenttel.circuit_breaker.* | CIRCUIT_BREAKER_ | CIRCUIT_BREAKER_NEW_STATE |
| agenttel.agentic.* | Various (e.g., AGENT_NAME) | AgenticAttributes.AGENT_NAME |
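For the non-agentic backend namespaces in the table, the constant name appears to follow mechanically from the attribute key: drop the agenttel. prefix, replace dots with underscores, and upper-case the result. The helper below is a hypothetical illustration of that convention. It is not part of the AgentTel API, and it assumes the convention holds for every key, which the table suggests but does not guarantee. It deliberately rejects agenttel.agentic.* keys, whose constants in AgenticAttributes do not follow the pattern.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the CATEGORY_FIELD_NAME convention. */
class ConstantNames {
    private static final String PREFIX = "agenttel.";
    private static final String AGENTIC_PREFIX = "agenttel.agentic.";

    static String constantNameFor(String attributeKey) {
        if (!attributeKey.startsWith(PREFIX) || attributeKey.startsWith(AGENTIC_PREFIX)) {
            throw new IllegalArgumentException("No mechanical mapping for: " + attributeKey);
        }
        return attributeKey.substring(PREFIX.length())
            .replace('.', '_')
            .toUpperCase(Locale.ROOT);
    }
}
```

Under this sketch, agenttel.topology.tier maps to TOPOLOGY_TIER and agenttel.agent.role maps to AGENT_ROLE, matching the examples in the table.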
Attribute Lifecycle Summary¶
The following table summarizes when and where each category of attributes is set:
| Category | Set By | Set When | Appears On |
|---|---|---|---|
| Topology | AgentTelResourceProvider | SDK initialization (once per service) | Resource attributes |
| Baselines | AgentTelSpanProcessor | onStart() for every span of a registered operation | Span attributes |
| Baseline Confidence | AgentTelEnrichingSpanExporter | Export time | Span attributes |
| Decisions | AgentTelSpanProcessor | onStart() for every span of a registered operation | Span attributes |
| Anomaly | AgentTelSpanProcessor | onEnd() when deviation from baseline is detected | Span attributes |
| Error Classification | AgentTelEnrichingSpanExporter | Export time, for error spans only | Span attributes |
| Causality | AgentTelEnrichingSpanExporter | Export time, for error/anomalous spans | Span attributes |
| Severity | AgentTelEnrichingSpanExporter | Export time, for error/anomalous spans | Span attributes |
| Change Correlation | ChangeCorrelationEngine | During incident context construction | Incident context |
| SLO | AgentTelSpanProcessor (via SloTracker) | onEnd() for spans with registered SLOs | Span attributes |
| Deployment | DeploymentEventEmitter | Service startup | Event attributes |
| GenAI | GenAI wrappers + CostEnrichingSpanExporter | Span creation (wrappers) and export (cost) | Span attributes |
| Agent Identity | AgentActionTracker | Agent action execution | Span attributes |
| Sessions | SessionManager | Session creation | Span attributes |
| Circuit Breaker | AgentTelEventEmitter | Circuit breaker state transition | Event attributes |
| Agentic | agenttel-agentic module | Agent invocations, steps, tool calls, orchestrations | Span attributes |
| Frontend | @agenttel/web SDK | Various (page load, API call, interaction, journey) | Span + Resource attributes |