Attribute Dictionary

Complete reference for every attribute AgentTel adds to OpenTelemetry spans and resources. Each entry describes what the attribute is, why an AI agent needs it, and when it appears.

Quick navigation: Topology | Baselines | Decisions | Anomaly | Error Classification | Causality | Severity | Change Correlation | SLO | Deployment | GenAI | Agent Identity | Sessions | Circuit Breaker | Agentic | Frontend

Alphabetical Index

All attributes, sorted alphabetically. This includes the standard OpenTelemetry `gen_ai.*` keys listed under GenAI (OTel Standard).

Attribute Key	Category
`agenttel.agent.id`	Agent Identity
`agenttel.agent.role`	Agent Identity
`agenttel.agent.session_id`	Agent Identity
`agenttel.agentic.agent.framework`	Agentic
`agenttel.agentic.agent.name`	Agentic
`agenttel.agentic.agent.type`	Agentic
`agenttel.agentic.agent.version`	Agentic
`agenttel.agentic.capability.system_prompt_hash`	Agentic
`agenttel.agentic.capability.tool_count`	Agentic
`agenttel.agentic.capability.tools`	Agentic
`agenttel.agentic.code.exit_code`	Agentic
`agenttel.agentic.code.language`	Agentic
`agenttel.agentic.code.sandboxed`	Agentic
`agenttel.agentic.code.status`	Agentic
`agenttel.agentic.conversation.id`	Agentic
`agenttel.agentic.conversation.message_count`	Agentic
`agenttel.agentic.conversation.speaker_role`	Agentic
`agenttel.agentic.conversation.turn`	Agentic
`agenttel.agentic.cost.cached_read_tokens`	Agentic
`agenttel.agentic.cost.cached_write_tokens`	Agentic
`agenttel.agentic.cost.input_tokens`	Agentic
`agenttel.agentic.cost.llm_calls`	Agentic
`agenttel.agentic.cost.output_tokens`	Agentic
`agenttel.agentic.cost.reasoning_tokens`	Agentic
`agenttel.agentic.cost.total_usd`	Agentic
`agenttel.agentic.error.category`	Agentic
`agenttel.agentic.error.retryable`	Agentic
`agenttel.agentic.error.source`	Agentic
`agenttel.agentic.eval.criteria`	Agentic
`agenttel.agentic.eval.feedback`	Agentic
`agenttel.agentic.eval.score`	Agentic
`agenttel.agentic.eval.scorer_name`	Agentic
`agenttel.agentic.eval.type`	Agentic
`agenttel.agentic.guardrail.action`	Agentic
`agenttel.agentic.guardrail.name`	Agentic
`agenttel.agentic.guardrail.reason`	Agentic
`agenttel.agentic.guardrail.triggered`	Agentic
`agenttel.agentic.handoff.chain_depth`	Agentic
`agenttel.agentic.handoff.from_agent`	Agentic
`agenttel.agentic.handoff.reason`	Agentic
`agenttel.agentic.handoff.to_agent`	Agentic
`agenttel.agentic.human.checkpoint_type`	Agentic
`agenttel.agentic.human.decision`	Agentic
`agenttel.agentic.human.wait_ms`	Agentic
`agenttel.agentic.invocation.goal`	Agentic
`agenttel.agentic.invocation.id`	Agentic
`agenttel.agentic.invocation.max_steps`	Agentic
`agenttel.agentic.invocation.status`	Agentic
`agenttel.agentic.invocation.steps`	Agentic
`agenttel.agentic.memory.items`	Agentic
`agenttel.agentic.memory.operation`	Agentic
`agenttel.agentic.memory.store_type`	Agentic
`agenttel.agentic.orchestration.aggregation`	Agentic
`agenttel.agentic.orchestration.coordinator_id`	Agentic
`agenttel.agentic.orchestration.parallel_branches`	Agentic
`agenttel.agentic.orchestration.pattern`	Agentic
`agenttel.agentic.orchestration.stage`	Agentic
`agenttel.agentic.orchestration.total_stages`	Agentic
`agenttel.agentic.quality.eval_score`	Agentic
`agenttel.agentic.quality.goal_achieved`	Agentic
`agenttel.agentic.quality.human_interventions`	Agentic
`agenttel.agentic.quality.loop_detected`	Agentic
`agenttel.agentic.quality.loop_iterations`	Agentic
`agenttel.agentic.reranker.input_documents`	Agentic
`agenttel.agentic.reranker.model`	Agentic
`agenttel.agentic.reranker.output_documents`	Agentic
`agenttel.agentic.reranker.top_score`	Agentic
`agenttel.agentic.retrieval.document_count`	Agentic
`agenttel.agentic.retrieval.query`	Agentic
`agenttel.agentic.retrieval.relevance_score_avg`	Agentic
`agenttel.agentic.retrieval.relevance_score_min`	Agentic
`agenttel.agentic.retrieval.store_type`	Agentic
`agenttel.agentic.retrieval.top_k`	Agentic
`agenttel.agentic.step.iteration`	Agentic
`agenttel.agentic.step.number`	Agentic
`agenttel.agentic.step.tool_name`	Agentic
`agenttel.agentic.step.tool_status`	Agentic
`agenttel.agentic.step.type`	Agentic
`agenttel.agentic.task.depth`	Agentic
`agenttel.agentic.task.id`	Agentic
`agenttel.agentic.task.name`	Agentic
`agenttel.agentic.task.parent_id`	Agentic
`agenttel.agentic.task.status`	Agentic
`agenttel.anomaly.detected`	Anomaly
`agenttel.anomaly.latency_z_score`	Anomaly
`agenttel.anomaly.pattern`	Anomaly
`agenttel.anomaly.score`	Anomaly
`agenttel.baseline.confidence`	Baselines
`agenttel.baseline.error_rate`	Baselines
`agenttel.baseline.latency_p50_ms`	Baselines
`agenttel.baseline.latency_p99_ms`	Baselines
`agenttel.baseline.sample_count`	Baselines
`agenttel.baseline.slo`	Baselines
`agenttel.baseline.source`	Baselines
`agenttel.baseline.throughput_rps`	Baselines
`agenttel.baseline.updated_at`	Baselines
`agenttel.cause.category`	Causality
`agenttel.cause.correlated_event_id`	Causality
`agenttel.cause.correlated_span_id`	Causality
`agenttel.cause.dependency`	Causality
`agenttel.cause.hint`	Causality
`agenttel.cause.started_at`	Causality
`agenttel.circuit_breaker.dependency`	Circuit Breaker
`agenttel.circuit_breaker.failure_count`	Circuit Breaker
`agenttel.circuit_breaker.name`	Circuit Breaker
`agenttel.circuit_breaker.new_state`	Circuit Breaker
`agenttel.circuit_breaker.previous_state`	Circuit Breaker
`agenttel.client.anomaly.detected`	Frontend
`agenttel.client.anomaly.pattern`	Frontend
`agenttel.client.anomaly.score`	Frontend
`agenttel.client.app.environment`	Frontend
`agenttel.client.app.name`	Frontend
`agenttel.client.app.platform`	Frontend
`agenttel.client.app.version`	Frontend
`agenttel.client.baseline.api_call_p50_ms`	Frontend
`agenttel.client.baseline.interaction_error_rate`	Frontend
`agenttel.client.baseline.page_load_p50_ms`	Frontend
`agenttel.client.baseline.page_load_p99_ms`	Frontend
`agenttel.client.baseline.source`	Frontend
`agenttel.client.correlation.backend_operation`	Frontend
`agenttel.client.correlation.backend_service`	Frontend
`agenttel.client.correlation.backend_trace_id`	Frontend
`agenttel.client.decision.escalation_level`	Frontend
`agenttel.client.decision.fallback_page`	Frontend
`agenttel.client.decision.retry_on_failure`	Frontend
`agenttel.client.decision.runbook_url`	Frontend
`agenttel.client.decision.user_facing`	Frontend
`agenttel.client.interaction.outcome`	Frontend
`agenttel.client.interaction.response_time_ms`	Frontend
`agenttel.client.interaction.target`	Frontend
`agenttel.client.interaction.type`	Frontend
`agenttel.client.journey.name`	Frontend
`agenttel.client.journey.started_at`	Frontend
`agenttel.client.journey.step`	Frontend
`agenttel.client.journey.total_steps`	Frontend
`agenttel.client.page.business_criticality`	Frontend
`agenttel.client.page.route`	Frontend
`agenttel.client.page.title`	Frontend
`agenttel.client.topology.domain`	Frontend
`agenttel.client.topology.team`	Frontend
`agenttel.correlation.change_id`	Change Correlation
`agenttel.correlation.confidence`	Change Correlation
`agenttel.correlation.likely_cause`	Change Correlation
`agenttel.correlation.time_delta_ms`	Change Correlation
`agenttel.decision.escalation_level`	Decisions
`agenttel.decision.fallback_available`	Decisions
`agenttel.decision.fallback_description`	Decisions
`agenttel.decision.idempotent`	Decisions
`agenttel.decision.known_issue_id`	Decisions
`agenttel.decision.retry_after_ms`	Decisions
`agenttel.decision.retryable`	Decisions
`agenttel.decision.runbook_url`	Decisions
`agenttel.decision.safe_to_restart`	Decisions
`agenttel.deployment.commit_sha`	Deployment
`agenttel.deployment.id`	Deployment
`agenttel.deployment.previous_version`	Deployment
`agenttel.deployment.strategy`	Deployment
`agenttel.deployment.timestamp`	Deployment
`agenttel.deployment.version`	Deployment
`agenttel.error.category`	Error Classification
`agenttel.error.dependency`	Error Classification
`agenttel.error.root_exception`	Error Classification
`agenttel.genai.cache_hit`	GenAI
`agenttel.genai.cost_usd`	GenAI
`agenttel.genai.framework`	GenAI
`agenttel.genai.guardrail_name`	GenAI
`agenttel.genai.guardrail_triggered`	GenAI
`agenttel.genai.rag_relevance_score_avg`	GenAI
`agenttel.genai.rag_source_count`	GenAI
`agenttel.session.id`	Sessions
`agenttel.session.incident_id`	Sessions
`agenttel.severity.anomaly_score`	Severity
`agenttel.severity.business_impact`	Severity
`agenttel.severity.impact_scope`	Severity
`agenttel.severity.pattern`	Severity
`agenttel.severity.user_facing`	Severity
`agenttel.slo.budget_remaining`	SLO
`agenttel.slo.burn_rate`	SLO
`agenttel.slo.name`	SLO
`agenttel.slo.target`	SLO
`agenttel.topology.consumers`	Topology
`agenttel.topology.dependencies`	Topology
`agenttel.topology.domain`	Topology
`agenttel.topology.on_call_channel`	Topology
`agenttel.topology.repo_url`	Topology
`agenttel.topology.team`	Topology
`agenttel.topology.tier`	Topology
`gen_ai.operation.name`	GenAI (OTel Standard)
`gen_ai.request.max_tokens`	GenAI (OTel Standard)
`gen_ai.request.model`	GenAI (OTel Standard)
`gen_ai.request.temperature`	GenAI (OTel Standard)
`gen_ai.request.top_p`	GenAI (OTel Standard)
`gen_ai.response.finish_reasons`	GenAI (OTel Standard)
`gen_ai.response.id`	GenAI (OTel Standard)
`gen_ai.response.model`	GenAI (OTel Standard)
`gen_ai.system`	GenAI (OTel Standard)
`gen_ai.usage.input_tokens`	GenAI (OTel Standard)
`gen_ai.usage.output_tokens`	GenAI (OTel Standard)

Topology

Service identity and dependency graph. Set once per service as OTel Resource attributes at startup. These attributes travel with every span exported by the service, giving agents immediate context about ownership, criticality, and the dependency graph without requiring a separate lookup.

Set by: AgentTelResourceProvider (OTel SPI ResourceProvider) -- runs at SDK initialization and reads from AgentTelGlobalState, which is populated by @AgentObservable annotations, YAML configuration, or programmatic registration via TopologyRegistry.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.topology.team`	string	Free-form, e.g. `"payments-platform"`	Agent knows who to page when something breaks
`agenttel.topology.tier`	string	`critical`, `standard`, `internal`, `experimental`	Agent prioritizes critical services over internal tooling
`agenttel.topology.domain`	string	Free-form, e.g. `"commerce"`	Agent scopes blast radius to the right business domain
`agenttel.topology.on_call_channel`	string	Free-form, e.g. `"#payments-oncall"`	Agent knows where to escalate when human intervention is needed
`agenttel.topology.repo_url`	string	URL, e.g. `"https://github.com/org/repo"`	Agent can link alerts to source code for faster diagnosis
`agenttel.topology.dependencies`	string (JSON)	JSON array of dependency descriptors	Agent understands the upstream dependency graph
`agenttel.topology.consumers`	string (JSON)	JSON array of consumer descriptors	Agent understands downstream impact of failures

Detailed Reference

`agenttel.topology.tier`

Property	Value
Type	`string`
Set by	`AgentTelResourceProvider`
Appears on	Resource attributes
Default	Not set (attribute absent if not configured)

Why: An AI agent responding to an incident must prioritize. A failure in a critical service (user-facing, revenue-impacting) demands an immediate page, while the same failure in an experimental service might only warrant a log entry. Without tier information, the agent treats all services equally, leading to alert fatigue or missed critical issues.

Use case: Agent receives anomaly alerts from both payment-service (tier=critical) and internal-report-generator (tier=internal). It pages on-call for the payment service immediately but only sends a Slack notification for the report generator.

Example value: "critical"

Possible values:

Tier	Meaning
`critical`	User-facing, revenue-impacting. Pages on-call immediately.
`standard`	Important but not immediately revenue-impacting.
`internal`	Internal tooling and infrastructure.
`experimental`	Non-production or experimental services.
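The sketch below is a rough illustration of the tier-based prioritization described above. It maps `agenttel.topology.tier` to a triage action and assumes the agent sees resource attributes as a `Map<String, String>`. `ServiceTier`, `TriageAction`, and `TierTriage` are hypothetical names, not AgentTel classes. How `standard`, `internal`, and an absent tier are handled is an illustrative policy, not documented behavior.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// The four documented values of agenttel.topology.tier.
enum ServiceTier { CRITICAL, STANDARD, INTERNAL, EXPERIMENTAL }

// Hypothetical agent-side actions; these are not AgentTel types.
enum TriageAction { PAGE_ONCALL, NOTIFY_TEAM, LOG_ONLY }

final class TierTriage {
    static final String TIER_KEY = "agenttel.topology.tier";

    private TierTriage() {}

    // Returns the tier, or empty when the attribute is absent or holds an unknown value.
    static Optional<ServiceTier> tierOf(Map<String, String> resourceAttributes) {
        String raw = resourceAttributes.get(TIER_KEY);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(ServiceTier.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException unknownTier) {
            return Optional.empty();
        }
    }

    // Critical pages and experimental only logs. Standard, internal, and untiered
    // services get a team notification (an illustrative policy, not AgentTel's).
    static TriageAction actionFor(Map<String, String> resourceAttributes) {
        Optional<ServiceTier> tier = tierOf(resourceAttributes);
        if (!tier.isPresent()) {
            return TriageAction.NOTIFY_TEAM;
        }
        switch (tier.get()) {
            case CRITICAL:
                return TriageAction.PAGE_ONCALL;
            case EXPERIMENTAL:
                return TriageAction.LOG_ONLY;
            default: // STANDARD, INTERNAL
                return TriageAction.NOTIFY_TEAM;
        }
    }
}
```

`tierOf` is kept separate from `actionFor`. This lets an agent detect and report a missing or misspelled tier instead of silently folding it into a default.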

`agenttel.topology.dependencies`

Property	Value
Type	`string` (JSON-encoded array)
Set by	`AgentTelResourceProvider`
Appears on	Resource attributes
Default	Not set (attribute absent if no dependencies declared)

Why: When an agent detects a failure, it needs to understand whether the root cause is in this service or in a dependency. The dependency graph -- including criticality, timeout configuration, circuit breaker status, and fallback availability -- lets the agent trace failures upstream and determine the correct remediation path.

Use case: Agent sees payment-service throwing SocketTimeoutException. It checks agenttel.topology.dependencies and finds that postgres is a required dependency with circuit_breaker: true and timeout_ms: 5000. The agent knows to check postgres health, and it knows that a circuit breaker should eventually protect the service.

Example value:

[
  {
    "name": "postgres",
    "type": "database",
    "criticality": "required",
    "protocol": "postgresql",
    "timeout_ms": 5000,
    "circuit_breaker": true,
    "fallback": "Return cached data",
    "health_endpoint": "/health/postgres"
  }
]
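The sketch below is a minimal version of the lookup in the use case above. It assumes the JSON array has already been deserialized, with whatever JSON library the agent uses (omitted here). The target is a hypothetical `DependencyDescriptor` class that mirrors a subset of the example's keys. `DependencyTriage` is also illustrative and not part of AgentTel.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Mirrors a subset of the keys in the example value above (type, protocol and
// health_endpoint omitted). Parsing the attribute string into these objects
// needs a JSON library, which this sketch leaves out.
final class DependencyDescriptor {
    final String name;
    final String criticality;     // e.g. "required"
    final long timeoutMs;         // "timeout_ms"
    final boolean circuitBreaker; // "circuit_breaker"
    final String fallback;        // null when no fallback is declared

    DependencyDescriptor(String name, String criticality, long timeoutMs,
                         boolean circuitBreaker, String fallback) {
        this.name = Objects.requireNonNull(name, "name");
        this.criticality = criticality;
        this.timeoutMs = timeoutMs;
        this.circuitBreaker = circuitBreaker;
        this.fallback = fallback;
    }
}

// Hypothetical helper, not an AgentTel API.
final class DependencyTriage {
    private DependencyTriage() {}

    // Names of dependencies declared as required: the first places to check on timeouts.
    static List<String> requiredDependencyNames(List<DependencyDescriptor> dependencies) {
        return dependencies.stream()
                .filter(d -> "required".equals(d.criticality))
                .map(d -> d.name)
                .collect(Collectors.toList());
    }

    // True if the named dependency is declared with a circuit breaker.
    static boolean hasCircuitBreaker(List<DependencyDescriptor> dependencies, String name) {
        return dependencies.stream()
                .anyMatch(d -> d.name.equals(name) && d.circuitBreaker);
    }
}
```

With the example value above, `requiredDependencyNames` returns `[postgres]` and `hasCircuitBreaker(deps, "postgres")` returns `true`. That is the reasoning the agent follows in the use case.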

`agenttel.topology.consumers`

Property	Value
Type	`string` (JSON-encoded array)
Set by	`AgentTelResourceProvider`
Appears on	Resource attributes
Default	Not set (attribute absent if no consumers declared)

Why: When a service degrades, the agent needs to know which downstream services are affected. Consumer descriptors encode who calls this service, whether those calls are synchronous (blocking the caller) or asynchronous (buffered), and what SLA expectations exist. This lets the agent accurately scope the blast radius of an incident.

Use case: Agent detects latency degradation in pricing-service. It reads agenttel.topology.consumers and finds that checkout-service calls it synchronously with a 200ms SLA. The agent knows checkout will be directly impacted and escalates accordingly.

Example value:

[
  {
    "name": "checkout-service",
    "consumption_pattern": "synchronous",
    "sla_latency_ms": 200
  }
]
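The blast-radius reasoning in the use case can be expressed as a filter over consumer descriptors. This sketch assumes the attribute has already been deserialized from JSON into the hypothetical `ConsumerDescriptor` class. It treats a synchronous consumer as directly impacted when the observed latency exceeds its `sla_latency_ms`, or when it declares no SLA at all. That second rule is an assumption. `BlastRadius` is not an AgentTel API.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Mirrors the keys in the example value above; JSON parsing is left out.
final class ConsumerDescriptor {
    final String name;
    final String consumptionPattern; // "synchronous" or "asynchronous"
    final Long slaLatencyMs;         // null when no SLA is declared

    ConsumerDescriptor(String name, String consumptionPattern, Long slaLatencyMs) {
        this.name = Objects.requireNonNull(name, "name");
        this.consumptionPattern = consumptionPattern;
        this.slaLatencyMs = slaLatencyMs;
    }
}

// Hypothetical helper, not an AgentTel API.
final class BlastRadius {
    private BlastRadius() {}

    // Synchronous consumers block on this service, so they feel the degradation directly.
    // A consumer is reported when its SLA is exceeded or when it declares no SLA.
    static List<String> directlyImpacted(List<ConsumerDescriptor> consumers, double observedLatencyMs) {
        return consumers.stream()
                .filter(c -> "synchronous".equals(c.consumptionPattern))
                .filter(c -> c.slaLatencyMs == null || observedLatencyMs > c.slaLatencyMs)
                .map(c -> c.name)
                .collect(Collectors.toList());
    }
}
```

Asynchronous consumers are excluded because their calls are buffered. They may still feel a prolonged outage, which an agent would need to assess separately.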

Baselines

What "normal" looks like for each operation. Set as span attributes on every span for a registered operation. Baselines are the foundation for anomaly detection -- without knowing what "normal" is, an agent cannot determine whether current behavior is problematic.

Set by: AgentTelSpanProcessor (static/rolling baselines from @AgentOperation annotations or YAML config) and AgentTelEnrichingSpanExporter (confidence metrics added at export time).

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.baseline.latency_p50_ms`	double	>= 0, e.g. `45.0`	Agent knows the median expected latency
`agenttel.baseline.latency_p99_ms`	double	>= 0, e.g. `200.0`	Agent knows the tail latency expectation
`agenttel.baseline.error_rate`	double	0.0--1.0, e.g. `0.001`	Agent knows the expected background error rate
`agenttel.baseline.throughput_rps`	double	>= 0, e.g. `150.0`	Agent knows expected traffic volume
`agenttel.baseline.source`	string	`static`, `rolling`, `composite`, `default`	Agent knows how the baseline was determined
`agenttel.baseline.updated_at`	string	ISO 8601 timestamp	Agent knows how fresh the baseline is
`agenttel.baseline.slo`	string	SLO identifier, e.g. `"payment-availability"`	Agent links the baseline to a specific SLO
`agenttel.baseline.sample_count`	long	>= 0, e.g. `250`	Agent gauges statistical significance
`agenttel.baseline.confidence`	string	`low`, `medium`, `high`	Agent weighs how much to trust the baseline

Detailed Reference

`agenttel.baseline.latency_p50_ms`

Property	Value
Type	`double`
Set by	`AgentTelSpanProcessor`
Appears on	Span attributes
Default	Not set (attribute absent when no baseline is registered for the operation)

Why: The P50 (median) latency is the single most useful baseline metric for an AI agent. It represents what a typical request looks like. When the agent observes a span whose duration is 5x or 10x the P50, it can immediately flag a latency degradation anomaly. Without this number, the agent has no frame of reference for whether 312ms is good, bad, or catastrophic for a given operation.

Use case: Agent detects that the current span for POST /api/payments took 312ms while agenttel.baseline.latency_p50_ms is 45ms. This is a 6.9x deviation, clearly indicating a latency degradation anomaly. The agent checks the dependency graph and finds the root cause is elevated postgres latency.

Example value: 45.0
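The deviation in the use case is the observed duration divided by the baseline median (312 / 45 ≈ 6.9). The sketch below computes that ratio from span attributes, assumed here to be a `Map<String, Object>`. The class name is hypothetical. The caller picks the threshold, for example the 2x rule listed for `latency_degradation` in the Anomaly section.

```java
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical helper, not an AgentTel API.
final class LatencyCheck {
    static final String P50_KEY = "agenttel.baseline.latency_p50_ms";

    private LatencyCheck() {}

    // Observed duration as a multiple of the baseline median.
    // Empty when no baseline is registered (attribute absent) or the median is not positive.
    static OptionalDouble deviation(Map<String, Object> spanAttributes, double observedMs) {
        Object p50 = spanAttributes.get(P50_KEY);
        if (!(p50 instanceof Number)) {
            return OptionalDouble.empty();
        }
        double median = ((Number) p50).doubleValue();
        if (median <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(observedMs / median);
    }

    // True only when a baseline exists and the ratio exceeds the caller's threshold.
    static boolean exceeds(Map<String, Object> spanAttributes, double observedMs, double threshold) {
        OptionalDouble ratio = deviation(spanAttributes, observedMs);
        return ratio.isPresent() && ratio.getAsDouble() > threshold;
    }
}
```

With the use-case values, `deviation` returns about 6.93 for a 312 ms span against a 45.0 ms median. A missing baseline yields an empty result rather than a false alarm.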

`agenttel.baseline.confidence`

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter`
Appears on	Span attributes
Default	Not set (attribute absent when no rolling baseline data exists)

Why: Not all baselines are equally trustworthy. A rolling baseline computed from 5 observations is far less reliable than one computed from 500. The confidence level tells the agent whether to act decisively on a deviation or to treat it as uncertain. An agent should never page on-call based on a low-confidence baseline.

Use case: Agent detects a 3x latency deviation on a newly deployed endpoint. It checks agenttel.baseline.confidence and finds low (only 12 samples). Instead of paging on-call, the agent logs the anomaly and continues collecting data. Once confidence reaches high, the same deviation would trigger an immediate escalation.

Example value: "high"

Confidence thresholds:

Sample Count	Confidence	Meaning
< 30	`low`	Baseline is unreliable -- insufficient data
30--200	`medium`	Baseline is usable but may not capture edge cases
> 200	`high`	Baseline is statistically significant and reliable
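The thresholds table translates directly into code. This sketch illustrates it and is not the AgentTel source. The enum name is hypothetical. Treating exactly 30 and exactly 200 samples as `medium` is an assumption about how the table's 30--200 range is bounded.

```java
import java.util.Locale;

// The documented values of agenttel.baseline.confidence.
enum BaselineConfidence {
    LOW, MEDIUM, HIGH;

    // Attribute value as it appears on spans ("low", "medium", "high").
    String attributeValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    // Thresholds from the table above; 30 and 200 are treated as medium (inclusive range).
    static BaselineConfidence fromSampleCount(long sampleCount) {
        if (sampleCount < 30) {
            return LOW;
        }
        if (sampleCount <= 200) {
            return MEDIUM;
        }
        return HIGH;
    }

    // Encodes the rule stated above: never page on-call from a low-confidence baseline.
    boolean allowsPaging() {
        return this != LOW;
    }
}
```

Whether a `medium` baseline should page is left to the agent's policy. The sketch only blocks the case the text rules out explicitly.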

`agenttel.baseline.source`

Property	Value
Type	`string`
Set by	`AgentTelSpanProcessor`
Appears on	Span attributes
Default	Not set (attribute absent when no baseline is registered)

Why: An agent's response should vary depending on how the baseline was determined. A static baseline from configuration reflects an intentional SLA target. A rolling baseline computed from live traffic reflects actual behavior (which may have drifted). A default baseline is a system-provided fallback with minimal confidence. Knowing the source lets the agent calibrate its anomaly detection thresholds appropriately.

Use case: Agent detects elevated latency. The baseline source is rolling, meaning it was computed from recent traffic. The agent knows this baseline adapts over time and checks agenttel.baseline.updated_at to ensure it is fresh enough to be meaningful.

Example value: "static"

Possible values:

Source	Meaning
`static`	From `@AgentOperation` annotation or YAML configuration file
`rolling`	Computed from a sliding window of observed traffic
`composite`	Static baseline with rolling fallback for unset fields
`default`	System default when no baseline is available
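The `composite` row describes a field-by-field merge: a static value wins, and a rolling value fills any field the static baseline leaves unset. This sketch shows that merge on a hypothetical `BaselineValues` holder, where `null` means "not set". It covers only three of the baseline fields and is not AgentTel's internal representation.

```java
// Hypothetical value holder; a null field means "not set" for that baseline.
final class BaselineValues {
    final Double latencyP50Ms;
    final Double latencyP99Ms;
    final Double errorRate;

    BaselineValues(Double latencyP50Ms, Double latencyP99Ms, Double errorRate) {
        this.latencyP50Ms = latencyP50Ms;
        this.latencyP99Ms = latencyP99Ms;
        this.errorRate = errorRate;
    }

    // "composite": keep every field the static baseline sets, fill the rest from the rolling one.
    static BaselineValues composite(BaselineValues staticBaseline, BaselineValues rolling) {
        return new BaselineValues(
                firstSet(staticBaseline.latencyP50Ms, rolling.latencyP50Ms),
                firstSet(staticBaseline.latencyP99Ms, rolling.latencyP99Ms),
                firstSet(staticBaseline.errorRate, rolling.errorRate));
    }

    private static Double firstSet(Double preferred, Double fallback) {
        return preferred != null ? preferred : fallback;
    }
}
```

Take a static P50 of 45.0 ms and a rolling baseline with a P50 of 52.0 ms and a P99 of 210.0 ms. The merge yields a P50 of 45.0 ms (static wins) and a P99 of 210.0 ms (filled from rolling).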

Decisions

What an AI agent is permitted and equipped to do when a problem occurs. Set as span attributes from @AgentOperation annotations or YAML configuration. Decision attributes encode human operator intent -- they are the guardrails that prevent an agent from taking harmful actions.

Set by: AgentTelSpanProcessor -- reads from OperationContextRegistry, which is populated by @AgentOperation annotations (scanned by AgentTelAnnotationBeanPostProcessor in Spring Boot) or YAML config (loaded by AgentTelConfigLoader in the javaagent extension).

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.decision.retryable`	boolean	`true` / `false`	Agent knows if retrying the operation is safe
`agenttel.decision.retry_after_ms`	long	>= 0, e.g. `1000`	Agent knows how long to wait before retrying
`agenttel.decision.idempotent`	boolean	`true` / `false`	Agent knows if duplicate calls are safe
`agenttel.decision.fallback_available`	boolean	`true` / `false`	Agent knows an alternative path exists
`agenttel.decision.fallback_description`	string	Free-form, e.g. `"Return cached pricing"`	Agent knows what the fallback does
`agenttel.decision.runbook_url`	string	URL	Agent can reference operational documentation
`agenttel.decision.escalation_level`	string	`auto_resolve`, `notify_team`, `page_oncall`, `incident_commander`	Agent knows the correct escalation path
`agenttel.decision.known_issue_id`	string	Issue ID, e.g. `"JIRA-1234"`	Agent links the problem to a known issue
`agenttel.decision.safe_to_restart`	boolean	`true` / `false`	Agent knows if restarting the service is safe

Detailed Reference

`agenttel.decision.retryable`

Property	Value
Type	`boolean`
Set by	`AgentTelSpanProcessor`
Appears on	Span attributes
Default	Not set (attribute absent -- agent should assume not retryable)

Why: Retrying a failed operation is one of the most common automated remediation actions, but it is also one of the most dangerous. Retrying a non-idempotent payment operation could charge a customer twice. This attribute encodes human operator knowledge about whether retry is safe for each specific operation.

Use case: Agent detects a dependency_timeout error on POST /api/payments. It checks agenttel.decision.retryable and finds false. Even though the error is transient, the agent does not retry because the operation is not marked as safe to retry. Instead, it follows the escalation path.

Example value: true
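The important detail is the default: an absent `agenttel.decision.retryable` means "do not retry". This sketch honors that default and applies `agenttel.decision.retry_after_ms` when a retry is allowed. It assumes attributes arrive as a `Map<String, Object>` with booleans stored as `Boolean`. `RetryDecision` is a hypothetical name. Retrying immediately when no delay is given is an assumption.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not an AgentTel API.
final class RetryDecision {
    static final String RETRYABLE_KEY = "agenttel.decision.retryable";
    static final String RETRY_AFTER_KEY = "agenttel.decision.retry_after_ms";

    private RetryDecision() {}

    // Empty means "do not retry". Only an explicit boolean true permits a retry, so a
    // missing attribute is treated as not retryable, matching the documented default.
    static OptionalLong retryDelayMs(Map<String, Object> spanAttributes) {
        if (!Boolean.TRUE.equals(spanAttributes.get(RETRYABLE_KEY))) {
            return OptionalLong.empty();
        }
        Object delay = spanAttributes.get(RETRY_AFTER_KEY);
        // Without retry_after_ms the sketch retries immediately (an assumption).
        long delayMs = delay instanceof Number ? Math.max(0L, ((Number) delay).longValue()) : 0L;
        return OptionalLong.of(delayMs);
    }
}
```

Returning an `OptionalLong` rather than a boolean keeps the "whether" and the "when" of a retry in one decision. The caller cannot check one and forget the other.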

`agenttel.decision.escalation_level`

Property	Value
Type	`string`
Set by	`AgentTelSpanProcessor`
Appears on	Span attributes
Default	Not set (attribute absent -- agent should default to `notify_team`)

Why: Different operations warrant different levels of human involvement when they fail. A background data sync job might be safe for the agent to handle autonomously, while a payment processing failure requires an immediate page to the on-call engineer. The escalation level encodes this operational judgment so the agent responds proportionally.

Use case: Agent detects cascading failures in the payment service. It checks agenttel.decision.escalation_level and finds page_oncall. It immediately pages the on-call engineer via the channel specified in agenttel.topology.on_call_channel, rather than attempting autonomous remediation.

Example value: "page_oncall"

Possible values:

Level	Meaning
`auto_resolve`	Agent can handle autonomously without human involvement
`notify_team`	Send asynchronous notification to the owning team
`page_oncall`	Page the on-call engineer immediately
`incident_commander`	Escalate to incident management process
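Parsing the escalation level is straightforward. The useful parts are the documented fallback to `notify_team` when the attribute is absent, and a single place that answers "does a human need to be involved?". The enum below is hypothetical, and its handling of unrecognized values (also `notify_team`) is an assumption.

```java
import java.util.Locale;
import java.util.Map;

// The documented values of agenttel.decision.escalation_level, least to most severe.
enum EscalationLevel {
    AUTO_RESOLVE, NOTIFY_TEAM, PAGE_ONCALL, INCIDENT_COMMANDER;

    static final String KEY = "agenttel.decision.escalation_level";

    // An absent attribute maps to NOTIFY_TEAM, as the documented default recommends.
    // Unrecognized values also map to NOTIFY_TEAM here, which is an assumption.
    static EscalationLevel fromSpan(Map<String, Object> spanAttributes) {
        Object raw = spanAttributes.get(KEY);
        if (raw == null) {
            return NOTIFY_TEAM;
        }
        try {
            return valueOf(raw.toString().trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException unknownLevel) {
            return NOTIFY_TEAM;
        }
    }

    // Only auto_resolve lets the agent act without involving a human.
    boolean requiresHuman() {
        return this != AUTO_RESOLVE;
    }
}
```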

`agenttel.decision.fallback_description`

Property	Value
Type	`string`
Set by	`AgentTelSpanProcessor`
Appears on	Span attributes
Default	Not set (attribute absent when no fallback is described)

Why: When agenttel.decision.fallback_available is true, the agent needs to know what the fallback actually does so it can decide whether activating it is appropriate for the current failure mode. A fallback that returns cached data is suitable for a dependency timeout but not for a data corruption issue.

Use case: Agent detects that the pricing service dependency is down. It checks agenttel.decision.fallback_available (true) and reads agenttel.decision.fallback_description: "Return cached pricing from Redis, stale up to 5 minutes." The agent activates the fallback and notifies the team that cached pricing is being served.

Example value: "Return cached pricing from Redis, stale up to 5 minutes"

Anomaly

Real-time deviation detection results. Set as span attributes by the AgentTelSpanProcessor when a span's behavior deviates significantly from the registered baseline. Anomaly attributes are only present on spans where anomalous behavior was detected -- their absence means the span is behaving normally.

Set by: AgentTelSpanProcessor via the AnomalyDetector and PatternMatcher components -- runs during onEnd() span processing, comparing observed behavior against baselines from OperationContextRegistry.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.anomaly.detected`	boolean	`true` / `false`	Agent knows this span is anomalous
`agenttel.anomaly.pattern`	string	`cascade_failure`, `latency_degradation`, `error_rate_spike`, `memory_leak`, `thundering_herd`, `cold_start`	Agent knows the type of incident
`agenttel.anomaly.score`	double	0.0--1.0	Agent gauges the severity of the anomaly
`agenttel.anomaly.latency_z_score`	double	Any positive value, typically 0--10+	Agent measures how many standard deviations from normal
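`agenttel.anomaly.latency_z_score` expresses the deviation in standard deviations. This page does not spell out how AgentTel computes it. The sketch below therefore uses the textbook definition, z = (observed - mean) / σ, over a window of baseline latency samples with the population standard deviation. The class name and the shape of the sample window are assumptions.

```java
// Hypothetical helper: the textbook z-score, not necessarily AgentTel's exact computation.
final class LatencyZScore {
    private LatencyZScore() {}

    // z = (observed - mean) / standard deviation, using the population standard
    // deviation of the baseline samples. Returns 0 when the samples have no spread,
    // to avoid dividing by zero (a simplifying choice).
    static double of(double[] baselineSamplesMs, double observedMs) {
        if (baselineSamplesMs.length == 0) {
            throw new IllegalArgumentException("at least one baseline sample is required");
        }
        double sum = 0;
        for (double sample : baselineSamplesMs) {
            sum += sample;
        }
        double mean = sum / baselineSamplesMs.length;
        double squaredDeviations = 0;
        for (double sample : baselineSamplesMs) {
            squaredDeviations += (sample - mean) * (sample - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / baselineSamplesMs.length);
        return stdDev == 0 ? 0 : (observedMs - mean) / stdDev;
    }
}
```

For baseline samples of 40 ms and 50 ms (mean 45, σ 5), a 60 ms span scores z = 3.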

Detailed Reference

`agenttel.anomaly.pattern`

Property	Value
Type	`string`
Set by	`AgentTelSpanProcessor` (via `PatternMatcher`)
Appears on	Span attributes
Default	Not set (attribute absent when no pattern is detected)

Why: Knowing that something is anomalous is necessary but insufficient. An agent needs to know what kind of anomaly it is to take the right action. A cascade_failure requires checking multiple dependencies, a memory_leak requires restarting instances, and a cold_start requires patience. The pattern classification maps directly to different remediation playbooks.

Use case: Agent sees agenttel.anomaly.pattern = cascade_failure on the payment service. It checks agenttel.topology.dependencies and finds that 3 of its 4 dependencies are returning errors. The agent identifies the common upstream cause (a failing load balancer) and creates an incident linking all affected services.

Example value: "cascade_failure"

Pattern detection methods:

Pattern	Detection	Typical Remediation
`cascade_failure`	3+ dependencies with errors in recent window	Identify common upstream cause, circuit break
`latency_degradation`	Current latency > 2x rolling P50	Check dependency latency, scale up
`error_rate_spike`	Recent error rate > 5x baseline	Check recent deployments, rollback if needed
`memory_leak`	Positive slope in latency linear regression	Restart instances, investigate heap usage
`thundering_herd`	Traffic burst exceeding normal patterns	Rate limit, shed load, scale out
`cold_start`	High latency with low request count	Wait for warm-up, pre-warm caches
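The detection column can be read as a set of simple rules. This sketch encodes the four rows that give a concrete test: 3+ failing dependencies, more than 5x the baseline error rate, more than 2x the rolling P50, and a positive latency regression slope. It is a hypothetical illustration, not AgentTel's `PatternMatcher`. The order in which rules are checked is an assumption. `thundering_herd` and `cold_start` are omitted because the table gives no numeric rule for them.

```java
import java.util.Optional;

// Hypothetical sketch of the thresholds in the table above, not AgentTel's PatternMatcher.
final class PatternRules {
    private PatternRules() {}

    // Checks run from broadest to narrowest scope; this precedence is an assumption.
    static Optional<String> classify(int dependenciesWithErrors,
                                     double currentLatencyMs, double rollingP50Ms,
                                     double recentErrorRate, double baselineErrorRate) {
        if (dependenciesWithErrors >= 3) {
            return Optional.of("cascade_failure");
        }
        // A zero baseline error rate makes "5x baseline" meaningless, so the check is skipped.
        if (baselineErrorRate > 0 && recentErrorRate > 5 * baselineErrorRate) {
            return Optional.of("error_rate_spike");
        }
        if (rollingP50Ms > 0 && currentLatencyMs > 2 * rollingP50Ms) {
            return Optional.of("latency_degradation");
        }
        return Optional.empty();
    }

    // Least-squares slope of latency against sample index; a positive value is the
    // memory_leak signal described in the table.
    static double latencySlope(double[] latenciesMs) {
        int n = latenciesMs.length;
        if (n < 2) {
            return 0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : latenciesMs) {
            meanY += y;
        }
        meanY /= n;
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < n; i++) {
            numerator += (i - meanX) * (latenciesMs[i] - meanY);
            denominator += (i - meanX) * (i - meanX);
        }
        return numerator / denominator;
    }
}
```

A real detector would also need a minimum slope and window size before calling something a leak. With only the sign of the slope, random noise alone would trigger it about half the time.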

`agenttel.anomaly.score`

Property	Value
Type	`double`
Set by	`AgentTelSpanProcessor` (via `AnomalyDetector`)
Appears on	Span attributes
Default	Not set (attribute absent when no anomaly is detected)

Why: The anomaly score provides a normalized severity metric (0.0 to 1.0) that lets agents compare anomalies across different operations and services. A score of 0.3 might warrant monitoring, while 0.9 demands immediate action. This score feeds into the severity assessment and business impact calculation.

Use case: Agent receives anomaly alerts from two services simultaneously. payment-service has agenttel.anomaly.score = 0.92 and notification-service has score = 0.35. The agent triages the payment service first because the higher score indicates a more severe deviation from normal behavior.

Example value: 0.85

Error Classification

Structured error categorization that tells agents why a span failed, not just that it failed. Set as span attributes at export time for spans with error status.

Set by: AgentTelEnrichingSpanExporter (via ErrorClassifier) -- runs during span export, analyzes exception types, HTTP status codes, and exception messages to classify errors.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.error.category`	string	`dependency_timeout`, `connection_error`, `code_bug`, `rate_limited`, `auth_failure`, `resource_exhaustion`, `data_validation`, `unknown`	Agent knows the failure class and appropriate response
`agenttel.error.root_exception`	string	Java exception class name, e.g. `"java.net.SocketTimeoutException"`	Agent classifies the root cause at the code level
`agenttel.error.dependency`	string	Dependency name, e.g. `"postgres"`	Agent knows which dependency caused the failure

Detailed Reference

`agenttel.error.category`

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter` (via `ErrorClassifier`)
Appears on	Span attributes
Default	`"unknown"` (set on all error spans; defaults to `unknown` when classification rules do not match)

Why: Standard OTel error status tells the agent that a span failed, but the same "error" status covers both a NullPointerException (code bug, do not retry) and a SocketTimeoutException (transient dependency issue, retry is appropriate). Error classification maps raw exceptions to actionable categories, each with a distinct remediation strategy.

Use case: Agent sees error spans on POST /api/payments. It reads agenttel.error.category = dependency_timeout and agenttel.error.dependency = postgres. Instead of investigating application code, the agent checks postgres health, finds connection pool exhaustion, and triggers a scaling action.

Example value: "dependency_timeout"

Classification rules:

Category	Triggering Conditions	Agent Action
`dependency_timeout`	Exception contains `Timeout` / `SocketTimeout`	Retry with backoff, check dependency health
`connection_error`	Exception contains `Connection` / `ConnectException`	Check dependency availability, circuit break
`code_bug`	`NullPointerException`, `ClassCastException`, `IndexOutOfBoundsException`, `IllegalStateException`	Do not retry -- needs code fix
`rate_limited`	HTTP 429	Back off, reduce traffic, request quota increase
`auth_failure`	HTTP 401 / 403	Check credentials/tokens, do not retry
`resource_exhaustion`	`OutOfMemoryError`, `StackOverflowError`	Scale up, restart instances
`data_validation`	HTTP 400 / 422, `ValidationException`, `IllegalArgumentException`	Do not retry -- fix input
`unknown`	Everything else	Investigate manually
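The classification rules above can be approximated in a few lines. This is a hypothetical sketch, not AgentTel's `ErrorClassifier`. It assumes the caller passes the root exception (or `null`) plus an optional HTTP status. "Exception contains" is interpreted as matching the exception's simple class name. Exception rules are checked before HTTP status, and that precedence is an assumption.

```java
// Hypothetical approximation of the rules in the table above, not AgentTel's ErrorClassifier.
final class ErrorCategorySketch {
    private ErrorCategorySketch() {}

    // error may be null (e.g. a plain HTTP error response); httpStatus may be null.
    static String classify(Throwable error, Integer httpStatus) {
        if (error != null) {
            String name = error.getClass().getSimpleName();
            if (name.contains("Timeout")) {
                return "dependency_timeout";
            }
            if (name.contains("Connection") || name.contains("ConnectException")) {
                return "connection_error";
            }
            if (error instanceof OutOfMemoryError || error instanceof StackOverflowError) {
                return "resource_exhaustion";
            }
            if (error instanceof NullPointerException || error instanceof ClassCastException
                    || error instanceof IndexOutOfBoundsException || error instanceof IllegalStateException) {
                return "code_bug";
            }
            if (error instanceof IllegalArgumentException || name.equals("ValidationException")) {
                return "data_validation";
            }
        }
        if (httpStatus != null) {
            switch (httpStatus) {
                case 429:
                    return "rate_limited";
                case 401:
                case 403:
                    return "auth_failure";
                case 400:
                case 422:
                    return "data_validation";
                default:
                    break;
            }
        }
        return "unknown";
    }
}
```

Matching `IllegalArgumentException` with `instanceof` also catches subclasses such as `NumberFormatException`, which fits the "fix input" action. Pure name matching would miss them.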

`agenttel.error.root_exception`

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter` (via `ErrorClassifier`)
Appears on	Span attributes
Default	Not set (attribute absent when no exception is recorded on the span)

Why: While agenttel.error.category provides high-level classification, the root exception class name gives agents the precision to match against known issues, search issue trackers, and correlate with specific code paths. It records the deepest cause in the exception chain, stripping away wrapper exceptions.

Use case: Agent sees agenttel.error.root_exception = org.postgresql.util.PSQLException and cross-references it with agenttel.decision.known_issue_id = "JIRA-5678". It finds the known issue is a connection pool sizing bug with a documented workaround and applies the workaround automatically.

Example value: "java.net.SocketTimeoutException"
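Stripping wrapper exceptions amounts to following `Throwable.getCause()` to the end of the chain. This sketch shows that walk with a guard against cause cycles and reports the fully qualified class name, the format of the example value above. `RootException` is a hypothetical helper, not the AgentTel implementation.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper mirroring the behavior described above: report the deepest
// cause in the exception chain. Not the AgentTel ErrorClassifier itself.
final class RootException {
    private RootException() {}

    // Follows getCause() to the end of the chain; the identity set guards against cycles.
    static Throwable rootCause(Throwable error) {
        Set<Throwable> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = error;
        while (current.getCause() != null && visited.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    // Fully qualified class name, the format used by agenttel.error.root_exception.
    static String attributeValue(Throwable error) {
        return rootCause(error).getClass().getName();
    }
}
```

For a `RuntimeException` wrapping an `UncheckedIOException` that wraps a `SocketTimeoutException`, `attributeValue` returns `"java.net.SocketTimeoutException"`.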

Causality

Root cause analysis attributes that help agents trace failures back to their origin. Set as span attributes at export time.

Set by: AgentTelEnrichingSpanExporter (via CausalityTracker and OperationDependencyTracker) -- runs during span export, correlates error spans with dependency health data and recent events.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.cause.hint`	string	Human-readable description	Agent gets a concise root cause explanation
`agenttel.cause.category`	string	`dependency`, `code`, `infrastructure`, `traffic`, `unknown`	Agent categorizes the root cause domain
`agenttel.cause.dependency`	string	Dependency name	Agent identifies the specific failing dependency
`agenttel.cause.correlated_span_id`	string	Span ID (hex)	Agent traces to the root cause span
`agenttel.cause.correlated_event_id`	string	Event ID	Agent links to the triggering event
`agenttel.cause.started_at`	string	ISO 8601 timestamp	Agent knows when the issue first appeared

Detailed Reference

`agenttel.cause.hint`

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter` (via `CausalityTracker`)
Appears on	Span attributes
Default	Not set (attribute absent when causality cannot be determined)

Why: The cause hint is a human-readable, agent-consumable explanation of why a span failed. It synthesizes information from error classification, dependency health, and recent events into a single actionable sentence. This is the attribute agents include in incident summaries and escalation messages.

Use case: Agent constructs an incident report and includes the cause hint: "Dependency postgres is unhealthy: Connection refused on host db-primary:5432. First observed 3 minutes ago." This gives the on-call engineer immediate context without needing to dig through logs.

Example value: "Dependency postgres is unhealthy: Connection refused on host db-primary:5432"

`agenttel.cause.category`¶

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter` (via `CausalityTracker`)
Appears on	Span attributes
Default	`"unknown"` (set when causality analysis runs but cannot determine a specific category)

Why: The cause category tells the agent which domain the root cause belongs to, and that drives the choice of remediation playbook. A dependency cause means the agent should investigate upstream services. A code cause means no automated fix is possible and a code change is required. An infrastructure cause points to compute, network, or storage issues. A traffic cause points to load patterns such as overload or a thundering herd.

Use case: Agent sees agenttel.cause.category = infrastructure combined with agenttel.anomaly.pattern = latency_degradation. It checks node-level metrics, finds elevated CPU on the host, and triggers an auto-scaling action rather than investigating application code.

Example value: "dependency"

Possible values:

Category	Meaning
`dependency`	Failure caused by an upstream dependency
`code`	Failure caused by application code (bugs, unhandled cases)
`infrastructure`	Failure caused by compute, network, or storage issues
`traffic`	Failure caused by traffic patterns (overload, thundering herd)
`unknown`	Root cause could not be determined
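The category table maps naturally onto a playbook switch. The sketch below assumes the agent already has a span's attributes as a `Map<String, Object>`. The `CausePlaybook` enum, its constants, and `CauseRouter` are hypothetical names used only for illustration. The attribute key and its five values are the only parts taken from this reference.

```java
import java.util.Map;

// Hypothetical playbooks; the names are illustrative, not part of AgentTel.
enum CausePlaybook {
    INVESTIGATE_UPSTREAM,    // dependency: investigate the failing upstream service
    ESCALATE_TO_DEVELOPERS,  // code: needs a code change, no automated fix
    CHECK_INFRASTRUCTURE,    // infrastructure: compute, network, or storage
    HANDLE_TRAFFIC,          // traffic: overload or thundering herd
    MANUAL_TRIAGE            // unknown, absent, or unrecognized category
}

final class CauseRouter {
    static final String CAUSE_CATEGORY = "agenttel.cause.category";

    /** Chooses a playbook from a span's attributes. */
    static CausePlaybook route(Map<String, Object> spanAttributes) {
        Object category = spanAttributes.get(CAUSE_CATEGORY);
        if (category == null) {
            return CausePlaybook.MANUAL_TRIAGE; // attribute absent: causality analysis did not run
        }
        return switch (category.toString()) {
            case "dependency" -> CausePlaybook.INVESTIGATE_UPSTREAM;
            case "code" -> CausePlaybook.ESCALATE_TO_DEVELOPERS;
            case "infrastructure" -> CausePlaybook.CHECK_INFRASTRUCTURE;
            case "traffic" -> CausePlaybook.HANDLE_TRAFFIC;
            default -> CausePlaybook.MANUAL_TRIAGE; // "unknown" or an unexpected value
        };
    }
}
```

Falling back to manual triage for unrecognized values keeps the agent safe if new categories are added later.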

Severity¶

Business impact assessment that helps agents prioritize their response. Set as span attributes at export time.

Set by: AgentTelEnrichingSpanExporter -- synthesizes anomaly scores, service tier, and error status into a business impact assessment.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.severity.anomaly_score`	double	0.0--1.0	Agent gauges the overall severity magnitude
`agenttel.severity.pattern`	string	Incident pattern name	Agent knows the type of incident for playbook selection
`agenttel.severity.impact_scope`	string	`operation_specific`, `service_wide`, `cross_service`	Agent scopes the blast radius
`agenttel.severity.business_impact`	string	`critical`, `high`, `medium`, `low`	Agent prioritizes response based on business impact
`agenttel.severity.user_facing`	boolean	`true` / `false`	Agent knows if end users are affected

Detailed Reference¶

`agenttel.severity.business_impact`¶

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter`
Appears on	Span attributes
Default	Not set (attribute absent when no anomaly or error is detected)

Why: The business impact level is the single most important triage signal for an agent. It combines anomaly severity with service tier to produce a unified priority. An error on a critical-tier service is automatically high impact even if the anomaly score is moderate, while the same error on an internal-tier service is low impact.

Use case: Agent is handling three simultaneous alerts. It sorts by agenttel.severity.business_impact: the critical alert (payment processing down, score > 0.8) gets immediate attention, the high alert (error on critical-tier order service) gets queued, and the low alert (validation error on internal admin tool) gets logged for later review.

Example value: "critical"

Determination rules:

Impact	Condition
`critical`	Anomaly score > 0.8
`high`	Error on critical-tier service
`medium`	Error on standard-tier service or moderate anomaly
`low`	Minor anomaly or data validation error
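A minimal sketch of these rules follows. It assumes the agent knows the anomaly score, the service tier, and whether the span errored. The rows are evaluated top to bottom and the first match wins; this precedence is an assumption, because the table does not state one. Several other details are also assumptions. The reference does not define "moderate", so the 0.5 cutoff is illustrative. The tier spellings `"critical"`, `"standard"`, and `"internal"` are assumed. Data-validation errors are not modeled separately, so they reach `low` only on tiers other than critical or standard.

```java
import java.util.Optional;

final class BusinessImpactRules {
    static final double CRITICAL_SCORE = 0.8;  // from the table: anomaly score > 0.8
    static final double MODERATE_SCORE = 0.5;  // ASSUMPTION: "moderate" is not defined in the reference

    /**
     * @param anomalyScore agenttel.severity.anomaly_score, or 0.0 when no anomaly was detected
     * @param tier         service tier; "critical", "standard", "internal" are assumed spellings
     * @param isError      whether the span ended in error
     * @return the impact level, or empty when there is neither an anomaly nor an error
     */
    static Optional<String> businessImpact(double anomalyScore, String tier, boolean isError) {
        if (anomalyScore > CRITICAL_SCORE) {
            return Optional.of("critical");
        }
        if (isError && "critical".equals(tier)) {
            return Optional.of("high");
        }
        if ((isError && "standard".equals(tier)) || anomalyScore >= MODERATE_SCORE) {
            return Optional.of("medium");
        }
        if (isError || anomalyScore > 0.0) {
            return Optional.of("low"); // minor anomaly, or an error on a lower tier
        }
        return Optional.empty(); // attribute not set
    }
}
```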

`agenttel.severity.impact_scope`¶

Property	Value
Type	`string`
Set by	`AgentTelEnrichingSpanExporter`
Appears on	Span attributes
Default	Not set (attribute absent when no severity assessment is performed)

Why: An issue affecting a single operation requires a different response than one affecting the entire service or multiple services. The impact scope tells the agent whether to investigate narrowly (one endpoint) or broadly (service-wide or cross-service), and whether to coordinate with agents monitoring other services.

Use case: Agent sees agenttel.severity.impact_scope = cross_service on an anomaly in the API gateway. It queries health data for all downstream services, correlates the timing with a recent deployment event, and coordinates a multi-service incident response.

Example value: "service_wide"

Change Correlation¶

Correlates anomalies with recent changes to help agents identify the probable trigger. Set on incident context objects constructed by the agent layer.

Set by: ChangeCorrelationEngine in the agenttel-agent module -- analyzes recent deployment events, configuration changes, and scaling events against anomaly onset timestamps.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.correlation.likely_cause`	string	`deployment`, `config`, `scaling`, `feature_flag`, `dependency_update`	Agent identifies the probable trigger for the incident
`agenttel.correlation.change_id`	string	Change identifier, e.g. `"deploy-v2.1.0"`	Agent links to the specific change for rollback decisions
`agenttel.correlation.time_delta_ms`	long	>= 0, e.g. `1800000`	Agent gauges temporal proximity between change and anomaly
`agenttel.correlation.confidence`	double	0.0--1.0	Agent weighs the strength of the correlation

Detailed Reference¶

`agenttel.correlation.likely_cause`¶

Property	Value
Type	`string`
Set by	`ChangeCorrelationEngine`
Appears on	Incident context (used in MCP tool responses and incident reports)
Default	Not set (attribute absent when no correlated change is found)

Why: Recent changes are among the most common causes of production incidents. When an agent can automatically correlate an anomaly with a deployment that happened 10 minutes ago, it can suggest or execute a rollback without human investigation. This attribute identifies the type of change most likely responsible.

Use case: Agent detects an error rate spike starting 12 minutes ago. ChangeCorrelationEngine finds a deployment (deploy-v2.1.0) that completed 15 minutes ago with confidence 0.92. The agent recommends rollback to v2.0.9 and includes the deployment diff link in the incident report.

Example value: "deployment"

`agenttel.correlation.confidence`¶

Property	Value
Type	`double`
Set by	`ChangeCorrelationEngine`
Appears on	Incident context
Default	Not set (attribute absent when no correlated change is found)

Why: Not all correlations are meaningful -- a config change 6 hours ago is less likely to be the cause than a deployment 5 minutes ago. The confidence score encodes temporal proximity, change scope, and historical correlation patterns so the agent can decide whether to recommend rollback (high confidence) or just flag the correlation for human review (low confidence).

Use case: Agent finds two correlated changes: a deployment 3 minutes before the anomaly (confidence 0.95) and a config change 2 hours before (confidence 0.15). It recommends rolling back the deployment and ignores the config change.

Example value: 0.85
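The rollback-or-review decision in the use case can be sketched as a threshold check over every correlated change. The following names are hypothetical: `CorrelatedChange`, `CorrelationAction`, and `CorrelationTriage`. The two thresholds are illustrative assumptions, because this reference does not say where "high" and "low" confidence begin.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One correlated change as an agent might read it from incident context (hypothetical type).
record CorrelatedChange(String changeId, String likelyCause, double confidence) {}

enum CorrelationAction { RECOMMEND_ROLLBACK, FLAG_FOR_REVIEW, IGNORE }

record CorrelationDecision(CorrelatedChange change, CorrelationAction action) {}

final class CorrelationTriage {
    // ASSUMPTION: illustrative thresholds, not values defined by AgentTel.
    static final double ROLLBACK_THRESHOLD = 0.8;
    static final double REVIEW_THRESHOLD = 0.3;

    /** Classifies every correlated change, highest confidence first. */
    static List<CorrelationDecision> triage(List<CorrelatedChange> changes) {
        List<CorrelatedChange> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparingDouble(CorrelatedChange::confidence).reversed());
        List<CorrelationDecision> decisions = new ArrayList<>();
        for (CorrelatedChange change : sorted) {
            CorrelationAction action;
            if (change.confidence() >= ROLLBACK_THRESHOLD) {
                action = CorrelationAction.RECOMMEND_ROLLBACK;
            } else if (change.confidence() >= REVIEW_THRESHOLD) {
                action = CorrelationAction.FLAG_FOR_REVIEW;
            } else {
                action = CorrelationAction.IGNORE;
            }
            decisions.add(new CorrelationDecision(change, action));
        }
        return decisions;
    }
}
```

In the use case above, the deployment at confidence 0.95 would sort first and be marked for rollback, and the config change at 0.15 would be ignored.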

SLO¶

Error budget consumption tracking. Set as span attributes when SLOs are registered for the operation.

Set by: AgentTelSpanProcessor (via SloTracker) -- evaluates each span against registered SLO definitions and computes budget consumption in real time.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.slo.name`	string	SLO identifier, e.g. `"payment-availability"`	Agent tracks which SLO is being measured
`agenttel.slo.target`	double	0.0--1.0, e.g. `0.999`	Agent knows the SLO target to compare against
`agenttel.slo.budget_remaining`	double	0.0--1.0, e.g. `0.85`	Agent knows how much error budget remains
`agenttel.slo.burn_rate`	double	>= 0, e.g. `0.15`	Agent detects accelerating budget consumption

Detailed Reference¶

`agenttel.slo.budget_remaining`¶

Property	Value
Type	`double`
Set by	`AgentTelSpanProcessor` (via `SloTracker`)
Appears on	Span attributes
Default	Not set (attribute absent when no SLO is registered for the operation)

Why: Error budget remaining is the key metric for SLO-driven incident response. When the budget is > 50%, minor anomalies can be monitored. When it drops below 25%, the agent should restrict risky changes. Below 10%, the agent should escalate aggressively and consider freezing deployments. This single number drives a graduated response strategy.

Use case: Agent detects intermittent errors on the payment service. It checks agenttel.slo.budget_remaining = 0.12 (12% remaining) and agenttel.slo.burn_rate = 0.78 (consuming budget at 78% of the sustainable rate). Although the burn rate is below 1.0, the remaining budget is already under the 25% threshold. The agent therefore emits a critical SLO budget alert and recommends pausing non-essential deployments until the error rate stabilizes.

Example value: 0.85
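The graduated response described above can be written as a small policy function. This is a sketch under two assumptions. First, the `BudgetResponse` names are hypothetical. Second, the reference gives no guidance for a budget between 25% and 50%, so the `HEIGHTENED_MONITORING` band is an assumed label.

```java
enum BudgetResponse {
    MONITOR,                 // > 50% remaining: minor anomalies can be monitored
    HEIGHTENED_MONITORING,   // 25%-50%: ASSUMPTION, the reference does not specify this band
    RESTRICT_RISKY_CHANGES,  // < 25% remaining
    ESCALATE_AND_FREEZE      // < 10% remaining: escalate and consider freezing deployments
}

final class SloBudgetPolicy {
    /** Maps agenttel.slo.budget_remaining (0.0-1.0) to a response tier. */
    static BudgetResponse respond(double budgetRemaining) {
        if (budgetRemaining < 0.10) return BudgetResponse.ESCALATE_AND_FREEZE;
        if (budgetRemaining < 0.25) return BudgetResponse.RESTRICT_RISKY_CHANGES;
        if (budgetRemaining > 0.50) return BudgetResponse.MONITOR;
        return BudgetResponse.HEIGHTENED_MONITORING;
    }
}
```

With the use case's value of 0.12, this returns `RESTRICT_RISKY_CHANGES`, which matches the recommendation to pause non-essential deployments.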

`agenttel.slo.burn_rate`¶

Property	Value
Type	`double`
Set by	`AgentTelSpanProcessor` (via `SloTracker`)
Appears on	Span attributes
Default	Not set (attribute absent when no SLO is registered for the operation)

Why: While budget_remaining is a snapshot, burn_rate shows the velocity of budget consumption. A burn rate of 1.0 means the budget is being consumed at exactly the sustainable rate. A burn rate of 10.0 means the budget will be exhausted 10x faster than expected. This lets agents predict budget exhaustion and escalate proactively before the budget runs out.

Use case: Agent sees agenttel.slo.burn_rate = 5.2, meaning the error budget is being consumed 5.2x faster than the sustainable rate. Even though budget_remaining is still 0.45 (healthy), the agent projects that at this rate the remaining budget will last less than a tenth of the SLO window, so it proactively notifies the team.

Example value: 0.15
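The exhaustion projection follows from the usual SRE definition of burn rate, which matches the description above. A burn rate of 1.0 spends the whole budget over exactly one SLO window, so the remaining budget lasts budget_remaining * window / burn_rate. A minimal sketch is shown below. The `BurnRateProjection` class is hypothetical. The window length must be supplied by the caller, because none of the attributes listed in this section carries it.

```java
import java.time.Duration;
import java.util.Optional;

final class BurnRateProjection {
    /**
     * Projects how long the remaining error budget lasts if the current burn rate holds.
     * A burn rate of 1.0 consumes the whole budget over exactly one SLO window.
     */
    static Optional<Duration> timeToExhaustion(double budgetRemaining, double burnRate, Duration sloWindow) {
        if (burnRate <= 0.0) {
            return Optional.empty(); // budget is not being consumed
        }
        double seconds = budgetRemaining * sloWindow.getSeconds() / burnRate;
        return Optional.of(Duration.ofSeconds(Math.round(seconds)));
    }
}
```

The use case has 0.45 remaining and a burn rate of 5.2. With an assumed 30-day window, this returns roughly 62 hours, or about 2.6 days.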

Deployment¶

Deployment tracking attributes for change correlation. Set on span events and startup events by the deployment event emitter.

Set by: DeploymentEventEmitter (in agenttel-core) and AgentTelDeploymentEventListener (in agenttel-spring-boot-starter) -- emits a structured event at service startup containing deployment metadata.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.deployment.id`	string	Deployment ID, e.g. `"deploy-20240115-1430"`	Agent tracks individual deployments
`agenttel.deployment.version`	string	Version string, e.g. `"2.1.0"`	Agent knows the current running version
`agenttel.deployment.commit_sha`	string	Git SHA, e.g. `"a1b2c3d4"`	Agent can link to exact code changes
`agenttel.deployment.previous_version`	string	Version string, e.g. `"2.0.9"`	Agent knows which version to roll back to
`agenttel.deployment.strategy`	string	`blue-green`, `canary`, `rolling`	Agent understands the deployment mechanism
`agenttel.deployment.timestamp`	string	ISO 8601, e.g. `"2024-01-15T14:30:00Z"`	Agent knows when the deployment happened

Detailed Reference¶

`agenttel.deployment.previous_version`¶

Property	Value
Type	`string`
Set by	`DeploymentEventEmitter`
Appears on	Span events (deployment event)
Default	Not set (attribute absent on first deployment or when previous version is unknown)

Why: When an agent decides to recommend or execute a rollback, it needs to know which version to roll back to. Without previous_version, the agent can identify that the current deployment is problematic but cannot specify the safe target version, so a human must determine the rollback target.

Use case: Agent correlates a latency spike with the deployment of version 2.1.0 (deployed 8 minutes ago). It reads agenttel.deployment.previous_version = "2.0.9" and recommends: "Roll back payment-service from v2.1.0 to v2.0.9 -- latency degradation correlated with deployment (confidence: 0.93)."

Example value: "2.0.9"

`agenttel.deployment.strategy`¶

Property	Value
Type	`string`
Set by	`DeploymentEventEmitter`
Appears on	Span events (deployment event)
Default	Not set (attribute absent when deployment strategy is not configured)

Why: The deployment strategy determines the blast radius of a bad deployment and the rollback mechanism. A canary deployment means only a fraction of traffic is affected -- the agent can halt the canary rather than doing a full rollback. A blue-green deployment allows instant rollback by switching traffic. A rolling deployment may require waiting for all instances to be replaced.

Use case: Agent detects errors in a canary deployment (strategy=canary). Instead of triggering a full rollback, it halts canary promotion and routes all traffic back to the stable version, minimizing user impact.

Example value: "canary"
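An agent can combine this attribute with agenttel.deployment.previous_version to choose a rollback mechanism. The sketch below is hypothetical: `RollbackPlanner` and its recommendation strings are illustrative, not an AgentTel API. It only encodes the per-strategy behavior described in the paragraph above.

```java
final class RollbackPlanner {
    /**
     * Suggests a rollback approach from agenttel.deployment.strategy and
     * agenttel.deployment.previous_version; either may be null when the attribute is absent.
     */
    static String plan(String strategy, String previousVersion) {
        String s = strategy == null ? "" : strategy;
        if (s.equals("canary")) {
            // Only a fraction of traffic is affected: stop promotion instead of a full rollback.
            return "Halt canary promotion and route all traffic back to the stable version";
        }
        if (previousVersion == null) {
            return "Rollback target unknown: escalate to a human";
        }
        return switch (s) {
            case "blue-green" -> "Switch traffic back to the environment running " + previousVersion;
            case "rolling" -> "Roll back to " + previousVersion + " and wait for every instance to be replaced";
            default -> "Roll back to " + previousVersion + "; strategy unknown, so confirm the mechanism first";
        };
    }
}
```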

GenAI¶

Attributes for AI/ML workload observability. Combines the standard OTel gen_ai.* namespace with AgentTel extensions in the agenttel.genai.* namespace.

Set by: GenAI instrumentation wrappers in the agenttel-genai module -- TracingChatLanguageModel (LangChain4j), SpringAiSpanEnricher (Spring AI), TracingAnthropicClient, TracingOpenAIClient, BedrockTracing, and CostEnrichingSpanExporter.

AgentTel GenAI Extensions¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.genai.framework`	string	`langchain4j`, `spring_ai`, `anthropic`, `openai`, `bedrock`	Agent knows the instrumentation source framework
`agenttel.genai.cost_usd`	double	>= 0, e.g. `0.000795`	Agent tracks per-request cost for budget monitoring
`agenttel.genai.rag_source_count`	long	>= 0, e.g. `5`	Agent monitors RAG retrieval volume
`agenttel.genai.rag_relevance_score_avg`	double	0.0--1.0, e.g. `0.87`	Agent assesses retrieval quality
`agenttel.genai.guardrail_triggered`	boolean	`true` / `false`	Agent monitors safety guardrail activations
`agenttel.genai.guardrail_name`	string	Guardrail identifier, e.g. `"pii_filter"`	Agent knows which guardrail was triggered
`agenttel.genai.cache_hit`	boolean	`true` / `false`	Agent tracks cache efficiency for cost optimization

Standard OTel GenAI Attributes¶

These follow the emerging OTel GenAI semantic conventions (gen_ai.* namespace). AgentTel populates them for all supported providers.

Attribute	Type	Possible Values	Why an Agent Needs This
`gen_ai.operation.name`	string	`chat`, `text_completion`, `embeddings`	Agent identifies the GenAI operation type
`gen_ai.system`	string	`openai`, `anthropic`, `aws_bedrock`	Agent knows which provider is being called
`gen_ai.request.model`	string	Model identifier, e.g. `"gpt-4"`	Agent knows which model was requested
`gen_ai.response.model`	string	Model identifier, e.g. `"gpt-4-0125-preview"`	Agent knows the actual model that responded
`gen_ai.usage.input_tokens`	long	>= 0	Agent monitors input token consumption
`gen_ai.usage.output_tokens`	long	>= 0	Agent monitors output token consumption
`gen_ai.request.temperature`	double	0.0--2.0	Agent sees the sampling temperature used
`gen_ai.request.max_tokens`	long	>= 0	Agent sees the max output token limit
`gen_ai.request.top_p`	double	0.0--1.0	Agent sees the nucleus sampling parameter
`gen_ai.response.id`	string	Response identifier	Agent correlates responses across retries
`gen_ai.response.finish_reasons`	string[]	`stop`, `length`, `tool_calls`	Agent knows why generation stopped

Detailed Reference¶

`agenttel.genai.cost_usd`¶

Property	Value
Type	`double`
Set by	`CostEnrichingSpanExporter` (via `ModelCostCalculator`)
Appears on	Span attributes
Default	Not set (attribute absent when model pricing is not configured)

Why: GenAI API costs can spike unexpectedly -- a prompt injection or retry loop can burn through budget in minutes. By attaching per-request cost to every span, agents can detect cost anomalies in real time, enforce budget limits, and alert when spending exceeds thresholds.

Use case: Agent aggregates agenttel.genai.cost_usd over a 5-minute window and detects that spending is 10x the rolling average. It investigates and finds a retry loop caused by a downstream timeout, halts the retry, and reports the cost impact ($47.30 in unnecessary spending).

Example value: 0.000795
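The 5-minute aggregation in the use case can be sketched as fixed windows compared against the average of recent windows. The following are assumptions chosen for illustration: the `CostSpikeDetector` class, fixed windows rather than sliding ones, spans arriving in end-time order, and the 10x multiplier.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Sums agenttel.genai.cost_usd per fixed window and flags windows far above the recent average. */
final class CostSpikeDetector {
    private final Duration window;
    private final int historySize;
    private final double spikeMultiplier;
    private final Deque<Double> history = new ArrayDeque<>(); // totals of recently closed windows
    private Instant windowStart;
    private double currentTotal;

    CostSpikeDetector(Duration window, int historySize, double spikeMultiplier) {
        this.window = window;
        this.historySize = historySize;
        this.spikeMultiplier = spikeMultiplier;
    }

    /** Adds one span's cost (spans assumed in end-time order); returns true if the current window is a spike. */
    boolean record(Instant spanEnd, double costUsd) {
        if (windowStart == null) {
            windowStart = spanEnd;
        }
        // Close every window that ended at or before this span; idle windows count as $0.
        while (!spanEnd.isBefore(windowStart.plus(window))) {
            history.addLast(currentTotal);
            if (history.size() > historySize) {
                history.removeFirst();
            }
            currentTotal = 0.0;
            windowStart = windowStart.plus(window);
        }
        currentTotal += costUsd;
        return isSpike();
    }

    private boolean isSpike() {
        double average = history.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        return average > 0.0 && currentTotal >= spikeMultiplier * average; // no baseline, no alert
    }
}
```

For example, `new CostSpikeDetector(Duration.ofMinutes(5), 12, 10.0)` compares each 5-minute window against the average of the previous hour.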

`agenttel.genai.rag_relevance_score_avg`¶

Property	Value
Type	`double`
Set by	`TracingContentRetriever` (LangChain4j) or `SpringAiSpanEnricher`
Appears on	Span attributes
Default	Not set (attribute absent when RAG is not used or relevance scores are not available)

Why: Low retrieval quality directly impacts LLM response quality. When the average relevance score drops below a threshold, it means the retrieval pipeline is returning irrelevant documents, leading to hallucinations or poor answers. Agents can detect this degradation and alert on retrieval quality before users notice response quality issues.

Use case: Agent monitors agenttel.genai.rag_relevance_score_avg across requests to a customer support chatbot. It detects a drop from 0.87 to 0.42 after a vector index rebuild. The agent alerts the ML engineering team that retrieval quality has degraded and the index rebuild may need to be reverted.

Example value: 0.87

`gen_ai.response.finish_reasons`¶

Property	Value
Type	`string[]` (string array)
Set by	GenAI instrumentation wrappers (`TracingChatLanguageModel`, `TracingAnthropicClient`, etc.)
Appears on	Span attributes
Default	Not set (attribute absent when the provider does not return finish reasons)

Why: The finish reason tells the agent why the LLM stopped generating. A stop finish is normal. A length finish means the output was truncated at max_tokens, which may indicate the response is incomplete and the user got a degraded experience. A tool_calls finish means the LLM wants to invoke a tool. Agents can detect elevated length finishes as a quality degradation signal.

Use case: Agent detects that 40% of responses from the code generation endpoint finish with reason length. It recommends increasing max_tokens from 1024 to 2048 for that operation and estimates the cost impact using agenttel.genai.cost_usd data.

Example value: ["stop"]
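The 40% figure in the use case is the share of spans whose finish_reasons contains `length`. A minimal sketch of that calculation follows. The `FinishReasonMonitor` class is hypothetical, and the 20% alert threshold is an illustrative assumption.

```java
import java.util.List;

final class FinishReasonMonitor {
    // ASSUMPTION: alert threshold chosen for illustration only.
    static final double TRUNCATION_ALERT_RATIO = 0.2;

    /**
     * Fraction of spans whose gen_ai.response.finish_reasons contains "length".
     * Spans without the attribute should be passed as an empty list.
     */
    static double truncationRatio(List<List<String>> finishReasonsPerSpan) {
        if (finishReasonsPerSpan.isEmpty()) {
            return 0.0;
        }
        long truncated = finishReasonsPerSpan.stream()
                .filter(reasons -> reasons != null && reasons.contains("length"))
                .count();
        return (double) truncated / finishReasonsPerSpan.size();
    }

    static boolean shouldRecommendHigherMaxTokens(List<List<String>> finishReasonsPerSpan) {
        return truncationRatio(finishReasonsPerSpan) >= TRUNCATION_ALERT_RATIO;
    }
}
```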

Agent Identity¶

Tracks which AI agent performed each action. Set on action spans created when agents interact with the system through MCP tools.

Set by: AgentActionTracker in the agenttel-agent module -- wraps each agent action (MCP tool invocation, remediation execution) in a span with identity attributes.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.agent.id`	string	Agent identifier, e.g. `"diag-agent-1"`	Track which agent performed each action
`agenttel.agent.role`	string	`observer`, `diagnostician`, `remediator`, `admin`	Track the agent's role and permission level
`agenttel.agent.session_id`	string	Session UUID	Link actions to a collaboration session

Detailed Reference¶

`agenttel.agent.role`¶

Property	Value
Type	`string`
Set by	`AgentActionTracker`
Appears on	Span attributes (on agent action spans)
Default	Not set (attribute absent when agent identity is not registered)

Why: Different agents have different permission levels. An observer agent can read telemetry but cannot take remediation actions. A remediator can execute approved playbooks. An admin can perform any action. By recording the role on each span, the system maintains an audit trail of who did what, and the ToolPermissionRegistry can enforce role-based access control on MCP tools.

Use case: Audit review reveals that a remediator-role agent restarted a payment service instance during an incident. The role on the span confirms the agent was authorized for this action class. If an observer-role agent had attempted the same action, the ToolPermissionRegistry would have denied it.

Example value: "diagnostician"

Predefined roles:

Role	Permissions
`observer`	Read-only access to telemetry data, health status, and incident context
`diagnostician`	Observer permissions plus ability to run diagnostic queries and trace analysis
`remediator`	Diagnostician permissions plus ability to execute approved remediation playbooks
`admin`	Full access to all tools including service restarts and configuration changes
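Because each role includes the permissions of the one before it, the roles form an ordered hierarchy. A minimal sketch of a role check that follows this ordering is shown below. It is not the ToolPermissionRegistry API. `AgentRole`, `ToolGate`, and the tool names and their minimum roles are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Declared from least to most privileged; each role includes every role before it.
enum AgentRole {
    OBSERVER, DIAGNOSTICIAN, REMEDIATOR, ADMIN;

    boolean includes(AgentRole required) {
        return compareTo(required) >= 0; // enum order encodes the hierarchy
    }
}

final class ToolGate {
    // HYPOTHETICAL tool names and minimum roles, for illustration only.
    static final Map<String, AgentRole> MIN_ROLE = Map.of(
            "get_service_health", AgentRole.OBSERVER,
            "run_trace_analysis", AgentRole.DIAGNOSTICIAN,
            "execute_playbook", AgentRole.REMEDIATOR,
            "change_configuration", AgentRole.ADMIN);

    /** Checks an agenttel.agent.role value against a tool's minimum role; unknown roles or tools are denied. */
    static boolean isAllowed(String roleAttribute, String tool) {
        AgentRole required = MIN_ROLE.get(tool);
        if (required == null || roleAttribute == null) {
            return false;
        }
        try {
            return AgentRole.valueOf(roleAttribute.toUpperCase(Locale.ROOT)).includes(required);
        } catch (IllegalArgumentException unknownRole) {
            return false;
        }
    }
}
```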

Sessions¶

Shared incident session tracking for multi-agent collaboration. Set on session-related operations managed by the SessionManager.

Set by: SessionManager and IncidentSession in the agenttel-agent module -- creates and manages collaborative sessions where multiple agents can work on the same incident.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.session.id`	string	UUID, e.g. `"a3f2b1c4-d5e6-7890-abcd-ef1234567890"`	Uniquely identifies the collaboration session
`agenttel.session.incident_id`	string	Incident identifier, e.g. `"inc-payment-spike-20240115"`	Links the session to a specific incident

Detailed Reference¶

`agenttel.session.id`¶

Property	Value
Type	`string`
Set by	`SessionManager`
Appears on	Span attributes (on session-scoped operations)
Default	Not set (attribute absent when no session is active)

Why: When multiple agents collaborate on an incident, they need a shared context. The session ID links all agent actions, diagnostic queries, and remediation steps to the same incident investigation. This enables post-incident review of the full agent collaboration timeline and prevents duplicate work.

Use case: A diagnostician agent and a remediator agent are both investigating a payment service outage. Both record agenttel.session.id = "a3f2b1c4-d5e6-7890-abcd-ef1234567890" on their spans. In post-incident review, the team can trace the complete investigation: the diagnostician identified postgres as the root cause at T+2min, and the remediator executed a connection pool scaling action at T+4min.

Example value: "a3f2b1c4-d5e6-7890-abcd-ef1234567890"

Circuit Breaker¶

Circuit breaker state change tracking. Set on event attributes when circuit breakers transition between states.

Set by: AgentTelEventEmitter in the agenttel-core module -- emits structured events when circuit breaker state changes are recorded.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.circuit_breaker.name`	string	Breaker identifier, e.g. `"postgres-breaker"`	Agent identifies which circuit breaker changed
`agenttel.circuit_breaker.previous_state`	string	`closed`, `open`, `half_open`	Agent knows the state before the transition
`agenttel.circuit_breaker.new_state`	string	`closed`, `open`, `half_open`	Agent knows the current state
`agenttel.circuit_breaker.failure_count`	long	>= 0	Agent knows how many failures triggered the transition
`agenttel.circuit_breaker.dependency`	string	Dependency name, e.g. `"postgres"`	Agent links the breaker to a specific dependency

Detailed Reference¶

`agenttel.circuit_breaker.new_state`¶

Property	Value
Type	`string`
Set by	`AgentTelEventEmitter`
Appears on	Event attributes (on circuit breaker state change events)
Default	Not set (only present on circuit breaker events)

Why: Circuit breaker state transitions are critical operational signals. When a breaker opens, it means a dependency has exceeded its failure threshold and the service is now returning fallback responses (or failing fast). When it transitions to half-open, the service is testing whether the dependency has recovered. The agent needs to know these transitions to understand service behavior and correlate them with anomalies.

Use case: Agent sees agenttel.circuit_breaker.new_state = open for the postgres breaker. It correlates this with the dependency_timeout errors on POST /api/payments and confirms that the circuit breaker is protecting the service. The agent monitors for the half_open transition to verify recovery.

Example value: "open"

State machine:

State	Meaning
`closed`	Normal operation -- requests are forwarded to the dependency
`open`	Failure threshold exceeded -- requests are short-circuited (fallback or fail-fast)
`half_open`	Testing recovery -- a limited number of requests are forwarded to check if the dependency has recovered
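To make the state machine concrete, here is a toy breaker that records the agenttel.circuit_breaker.* attributes on each transition. It is a hypothetical illustration, not AgentTelEventEmitter or any particular resilience library. The `SketchCircuitBreaker` class is an assumption, as are the consecutive-failure counting and the decision to leave cooldown timing to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy breaker that records the agenttel.circuit_breaker.* attributes on every transition. */
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final String name;
    private final String dependency;
    private final int failureThreshold;
    private State state = State.CLOSED;
    private int failureCount = 0;
    final List<Map<String, Object>> events = new ArrayList<>();

    SketchCircuitBreaker(String name, String dependency, int failureThreshold) {
        this.name = name;
        this.dependency = dependency;
        this.failureThreshold = failureThreshold;
    }

    State state() { return state; }

    void onFailure() {
        if (state == State.OPEN) {
            return; // requests are short-circuited, so nothing reaches the dependency
        }
        failureCount++;
        if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
            transition(State.OPEN); // trial request failed, or threshold exceeded
        }
    }

    void onSuccess() {
        if (state == State.HALF_OPEN) {
            transition(State.CLOSED); // dependency recovered
        } else if (state == State.CLOSED) {
            failureCount = 0; // ASSUMPTION: count consecutive failures only
        }
    }

    /** Called by the owner once the open-state cooldown has elapsed. */
    void onCooldownElapsed() {
        if (state == State.OPEN) {
            transition(State.HALF_OPEN);
        }
    }

    private void transition(State next) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("agenttel.circuit_breaker.name", name);
        attributes.put("agenttel.circuit_breaker.previous_state", state.name().toLowerCase(Locale.ROOT));
        attributes.put("agenttel.circuit_breaker.new_state", next.name().toLowerCase(Locale.ROOT));
        attributes.put("agenttel.circuit_breaker.failure_count", (long) failureCount);
        attributes.put("agenttel.circuit_breaker.dependency", dependency);
        events.add(attributes);
        state = next;
        if (next == State.CLOSED) {
            failureCount = 0;
        }
    }
}
```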

Agentic¶

Agent lifecycle instrumentation attributes from the agenttel-agentic module. These attributes instrument the AI agent runtime: invocations, reasoning, orchestration, cost, quality, and safety.

Set by: agenttel-agentic module classes -- AgentTracer, AgentInvocation, scope classes (ToolCallScope, TaskScope, HandoffScope, etc.), AgentCostAggregator, GuardrailRecorder, LoopDetector, and QualityTracker.

The agenttel.agentic.* namespace contains 70+ attributes across 17 categories. For the complete reference with all enum values, span names, and detailed descriptions, see the Agentic Attributes Reference.

Summary of categories:

Category	Key Attributes	Span Name
Agent Identity	`agent.name`, `agent.type`, `agent.framework`	`invoke_agent`
Invocation	`invocation.id`, `invocation.goal`, `invocation.status`, `invocation.steps`	`invoke_agent`
Step / Reasoning	`step.number`, `step.type`, `step.iteration`	`agenttel.agentic.step`
Tool Calls	`step.tool_name`, `step.tool_status`	`agenttel.agentic.tool_call`
Task Tracking	`task.id`, `task.name`, `task.depth`, `task.parent_id`	`agenttel.agentic.task`
Orchestration	`orchestration.pattern`, `orchestration.stage`, `orchestration.parallel_branches`	`agenttel.agentic.session`
Handoff	`handoff.from_agent`, `handoff.to_agent`, `handoff.chain_depth`	`agenttel.agentic.handoff`
Cost	`cost.total_usd`, `cost.input_tokens`, `cost.output_tokens`, `cost.llm_calls`	On `invoke_agent` / session
Quality	`quality.goal_achieved`, `quality.loop_detected`, `quality.eval_score`	On `invoke_agent`
Guardrail	`guardrail.triggered`, `guardrail.name`, `guardrail.action`	`agenttel.agentic.guardrail`
Human Checkpoint	`human.checkpoint_type`, `human.decision`, `human.wait_ms`	`agenttel.agentic.human_input`
Code Execution	`code.language`, `code.status`, `code.sandboxed`	`agenttel.agentic.code_execution`
Evaluation	`eval.scorer_name`, `eval.score`, `eval.type`	`agenttel.agentic.evaluate`
Retrieval	`retrieval.query`, `retrieval.document_count`, `retrieval.relevance_score_avg`	`agenttel.agentic.retriever`
Reranker	`reranker.model`, `reranker.input_documents`, `reranker.top_score`	`agenttel.agentic.reranker`
Memory	`memory.operation`, `memory.store_type`, `memory.items`	`agenttel.agentic.memory`
Error Classification	`error.source`, `error.category`, `error.retryable`	On `invoke_agent`

Frontend¶

Client-side telemetry from the @agenttel/web browser SDK. These attributes provide full-stack observability by tracking user-facing behavior, client-side anomalies, and cross-stack trace correlation.

Frontend attributes use the agenttel.client.* namespace to distinguish them from server-side attributes.

Set by: @agenttel/web browser SDK -- instruments page loads, API calls, user interactions, and user journeys. Exports spans via OTLP HTTP to any OTel-compatible collector.

Resource Attributes¶

Set once per browser application at SDK initialization.

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.app.name`	string	App name, e.g. `"checkout-web"`	Agent identifies the frontend application
`agenttel.client.app.version`	string	Semver, e.g. `"1.0.0"`	Agent tracks frontend version for change correlation
`agenttel.client.app.platform`	string	`browser`	Agent knows the runtime platform
`agenttel.client.app.environment`	string	`production`, `staging`, etc.	Agent filters by environment
`agenttel.client.topology.team`	string	Team name, e.g. `"checkout-frontend"`	Agent routes frontend issues to the right team
`agenttel.client.topology.domain`	string	Business domain, e.g. `"commerce"`	Agent groups frontend with related backend services

Page and Route Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.page.route`	string	Route pattern, e.g. `"/checkout/:step"`	Agent groups spans by route for baseline comparison
`agenttel.client.page.title`	string	Document title, e.g. `"Checkout - Payment"`	Agent includes human-readable page context in alerts
`agenttel.client.page.business_criticality`	string	`revenue`, `engagement`, `internal`	Agent prioritizes revenue-impacting pages

Baseline Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.baseline.page_load_p50_ms`	double	>= 0, e.g. `800.0`	Agent knows expected page load time
`agenttel.client.baseline.page_load_p99_ms`	double	>= 0, e.g. `2000.0`	Agent knows tail page load expectation
`agenttel.client.baseline.api_call_p50_ms`	double	>= 0, e.g. `300.0`	Agent knows expected API response time from the browser
`agenttel.client.baseline.interaction_error_rate`	double	0.0--1.0, e.g. `0.01`	Agent knows expected client-side error rate
`agenttel.client.baseline.source`	string	`static`, `rolling`	Agent knows how the baseline was determined

Decision Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.decision.escalation_level`	string	`auto_resolve`, `notify_team`, `page_oncall`, `incident_commander`	Agent knows the client-side escalation path
`agenttel.client.decision.runbook_url`	string	URL	Agent references frontend operational docs
`agenttel.client.decision.fallback_page`	string	Route path, e.g. `"/maintenance"`	Agent knows where to redirect on failure
`agenttel.client.decision.retry_on_failure`	boolean	`true` / `false`	Agent knows if page reload is safe
`agenttel.client.decision.user_facing`	boolean	`true` / `false`	Agent confirms this affects real users

Anomaly Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.anomaly.detected`	boolean	`true` / `false`	Agent knows a client-side anomaly was detected
`agenttel.client.anomaly.pattern`	string	`rage_click`, `api_failure_cascade`, `slow_page_load`, `error_loop`, `funnel_dropoff`	Agent knows the type of user-facing issue
`agenttel.client.anomaly.score`	double	0.0--1.0	Agent gauges client-side anomaly severity

Client-side anomaly patterns:

Pattern	Detection	Impact
`rage_click`	N+ clicks on same element within time window	User frustration -- UI is unresponsive
`api_failure_cascade`	N+ API failures within time window	Backend instability visible to user
`slow_page_load`	Load time exceeds baseline by multiplier	Performance degradation on route
`error_loop`	N+ errors on same route within time window	Repeating failure preventing user progress
`funnel_dropoff`	Journey abandonment above baseline	User journey failing at specific step
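The `rage_click` row can be read as a sliding-window count of clicks per element. The sketch below is written in Java for consistency with the other examples on this page, although the @agenttel/web detector runs in the browser. The `RageClickDetector` class, the click threshold, and the window length are hypothetical, because this reference does not give the SDK's actual values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class RageClickDetector {
    private final int clickThreshold;  // the "N" in "N+ clicks" (assumed value supplied by caller)
    private final long windowMs;       // the time window (assumed value supplied by caller)
    private final Map<String, Deque<Long>> clicksByTarget = new HashMap<>();

    RageClickDetector(int clickThreshold, long windowMs) {
        this.clickThreshold = clickThreshold;
        this.windowMs = windowMs;
    }

    /** Records a click on an element (agenttel.client.interaction.target); returns true on a rage-click burst. */
    boolean onClick(String target, long timestampMs) {
        Deque<Long> clicks = clicksByTarget.computeIfAbsent(target, t -> new ArrayDeque<>());
        clicks.addLast(timestampMs);
        // Drop clicks on this element that fall outside the window.
        while (!clicks.isEmpty() && timestampMs - clicks.peekFirst() > windowMs) {
            clicks.removeFirst();
        }
        return clicks.size() >= clickThreshold;
    }
}
```

For example, `new RageClickDetector(3, 1000)` reports a burst on the third click on the same element within one second.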

Journey Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.journey.name`	string	Journey name, e.g. `"checkout"`	Agent tracks critical user journeys
`agenttel.client.journey.step`	int	0-based step index	Agent knows which step the user is on
`agenttel.client.journey.total_steps`	int	Total steps in journey	Agent knows journey completion progress
`agenttel.client.journey.started_at`	string	ISO 8601 timestamp	Agent measures total journey duration

Interaction Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.interaction.type`	string	`click`, `submit`, `custom`	Agent categorizes user interaction type
`agenttel.client.interaction.target`	string	Element identifier, e.g. `"button#submit-payment"`	Agent identifies the UI element involved
`agenttel.client.interaction.outcome`	string	`success`, `error`	Agent knows if the interaction succeeded
`agenttel.client.interaction.response_time_ms`	double	>= 0	Agent measures interaction responsiveness

Correlation Attributes¶

Attribute	Type	Possible Values	Why an Agent Needs This
`agenttel.client.correlation.backend_trace_id`	string	32-char hex trace ID	Agent links browser spans to backend traces
`agenttel.client.correlation.backend_service`	string	Service name	Agent knows which backend service handled the request
`agenttel.client.correlation.backend_operation`	string	Operation name	Agent traces to the specific backend operation

Detailed Reference¶

`agenttel.client.anomaly.pattern`¶

Property	Value
Type	`string`
Set by	`@agenttel/web` browser SDK (anomaly detector)
Appears on	Span attributes (on client-side spans)
Default	Not set (attribute absent when no anomaly is detected)

Why: Client-side anomalies like rage clicks and error loops are signals of user frustration that backend metrics may not capture. A backend service can return 200 OK while the JavaScript rendering is broken, leaving users unable to complete their task. Client-side pattern detection catches these user-facing issues that would otherwise go unnoticed.

Use case: Agent detects agenttel.client.anomaly.pattern = rage_click on the checkout page's "Submit Payment" button. It checks agenttel.client.correlation.backend_trace_id and finds the backend call succeeded (200 OK, 45ms). The issue is a frontend rendering bug where the button appears clickable but the form submission is blocked by a JavaScript error. The agent alerts the frontend team with the specific element identifier.

Example value: "rage_click"

`agenttel.client.correlation.backend_trace_id`¶

Property	Value
Type	`string`
Set by	`@agenttel/web` browser SDK (from the `traceparent` or `Server-Timing` response header)
Appears on	Span attributes (on client-side API call spans)
Default	Not set (attribute absent when the backend does not return trace context in response headers)
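An agent-side consumer that receives raw header values could extract the trace ID the same way. The sketch below is an illustrative parser for the W3C Trace Context `traceparent` value (`version-traceid-parentid-flags`), not the `@agenttel/web` implementation; `TraceparentParser` is a hypothetical name, it accepts only version `00`, and it does not handle the `Server-Timing` path.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for a W3C Trace Context traceparent value (version 00 only).
final class TraceparentParser {

    // version "00", 32-hex trace ID, 16-hex parent ID, 2-hex flags; lowercase only
    private static final Pattern TRACEPARENT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    private static final String ZERO_TRACE_ID = "0".repeat(32);
    private static final String ZERO_PARENT_ID = "0".repeat(16);

    /** Returns the 32-char trace ID, or empty if the value is malformed or invalid. */
    static Optional<String> backendTraceId(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        Matcher m = TRACEPARENT.matcher(headerValue.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // All-zero trace or parent IDs are invalid per the specification.
        if (m.group(1).equals(ZERO_TRACE_ID) || m.group(2).equals(ZERO_PARENT_ID)) {
            return Optional.empty();
        }
        return Optional.of(m.group(1));
    }
}
```

Rejecting malformed values instead of storing them means the attribute stays absent, matching the documented default, rather than holding an ID the backend tracing system can never resolve.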

Why: Full-stack incident investigation requires linking what the user sees in the browser to what happened on the server. The backend trace ID lets an agent follow a user's API call from the browser, through the API gateway, to the backend service, and into its dependencies. Without this link, frontend and backend incidents are investigated in isolation, missing the full picture.

Use case: Agent detects a slow page load on the checkout page. It reads `agenttel.client.correlation.backend_trace_id = "abc123def456789012345678abcdef01"` and queries the backend tracing system. It finds that the backend span for `POST /api/payments` shows a `dependency_timeout` on the fraud detection service, confirming that the slow page load is caused by a backend dependency issue, not a frontend problem.

Example value: "abc123def456789012345678abcdef01"

`agenttel.client.page.business_criticality`¶

Property	Value
Type	`string`
Set by	`@agenttel/web` browser SDK (from route configuration)
Appears on	Span attributes (on page-scoped spans)
Default	Not set (attribute absent when business criticality is not configured for the route)

Why: Not all pages are equally important. The checkout page directly impacts revenue, while the blog page impacts engagement but not transactions. Business criticality lets the agent prioritize frontend issues the same way `agenttel.topology.tier` prioritizes backend services: revenue-impacting pages get immediate attention.

Use case: Agent receives anomaly alerts from both the checkout page (criticality=`revenue`) and the help center page (criticality=`engagement`). It pages on-call immediately for the checkout page issue, because errors there directly lose revenue, and sends a Slack notification for the help center issue.
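The routing decision in this use case can be expressed as a small policy function. This is a hypothetical sketch: only `revenue` and `engagement` appear in this reference, so the `CriticalityRouter` class, the `Channel` enum, the `TICKET` fallback for absent or unrecognized values, and the case-insensitive matching are assumptions.

```java
import java.util.Locale;

// Hypothetical notification policy for an agent; not an AgentTel API.
final class CriticalityRouter {

    enum Channel { PAGE_ON_CALL, SLACK, TICKET }

    /** Maps agenttel.client.page.business_criticality to a notification channel. */
    static Channel route(String criticality) {
        if (criticality == null) {
            return Channel.TICKET;             // attribute absent: route not configured
        }
        switch (criticality.toLowerCase(Locale.ROOT)) {
            case "revenue":
                return Channel.PAGE_ON_CALL;   // errors here lose revenue directly
            case "engagement":
                return Channel.SLACK;          // user-visible, but not transactional
            default:
                return Channel.TICKET;         // unknown values: lowest urgency (assumption)
        }
    }
}
```

Defaulting unknown values to the least disruptive channel avoids paging on-call for pages nobody has classified; a stricter policy could instead treat unknown criticality as high priority.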

Example value: "revenue"

Java Constant Reference¶

All backend attribute keys are defined as typed `AttributeKey<T>` constants in `io.agenttel.api.attributes.AgentTelAttributes`. Agentic attributes are in `io.agenttel.api.attributes.AgenticAttributes`. GenAI attributes have additional constants in `io.agenttel.genai.conventions.AgentTelGenAiAttributes` and `io.agenttel.genai.conventions.GenAiAttributes`.

Using these constants instead of raw strings provides compile-time type safety:

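```java
import io.agenttel.api.attributes.AgentTelAttributes;
import io.opentelemetry.sdk.trace.ReadableSpan;

class AttributeAccessExample {
    // Type-safe attribute access, e.g. from a SpanProcessor's onEnd(ReadableSpan span)
    static void inspect(ReadableSpan span) {
        Double p50 = span.getAttribute(AgentTelAttributes.BASELINE_LATENCY_P50_MS);      // Double
        Boolean retryable = span.getAttribute(AgentTelAttributes.DECISION_RETRYABLE);    // Boolean
        Long retryAfter = span.getAttribute(AgentTelAttributes.DECISION_RETRY_AFTER_MS); // Long
        // Topology attributes live on the Resource, not on individual spans
        String tier = span.toSpanData().getResource()
                .getAttribute(AgentTelAttributes.TOPOLOGY_TIER);                         // String
    }
}
```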

Constant Naming Convention¶

The constant name follows the pattern: CATEGORY_FIELD_NAME

Namespace	Constant Prefix	Example
`agenttel.topology.*`	`TOPOLOGY_`	`TOPOLOGY_TIER`
`agenttel.baseline.*`	`BASELINE_`	`BASELINE_LATENCY_P50_MS`
`agenttel.decision.*`	`DECISION_`	`DECISION_RETRYABLE`
`agenttel.anomaly.*`	`ANOMALY_`	`ANOMALY_DETECTED`
`agenttel.error.*`	`ERROR_`	`ERROR_CATEGORY`
`agenttel.cause.*`	`CAUSE_`	`CAUSE_HINT`
`agenttel.severity.*`	`SEVERITY_`	`SEVERITY_BUSINESS_IMPACT`
`agenttel.correlation.*`	`CORRELATION_`	`CORRELATION_LIKELY_CAUSE`
`agenttel.slo.*`	`SLO_`	`SLO_BUDGET_REMAINING`
`agenttel.deployment.*`	`DEPLOYMENT_`	`DEPLOYMENT_VERSION`
`agenttel.genai.*`	`GENAI_`	`GENAI_COST_USD`
`agenttel.agent.*`	`AGENT_`	`AGENT_ROLE`
`agenttel.session.*`	`SESSION_`	`SESSION_ID`
`agenttel.circuit_breaker.*`	`CIRCUIT_BREAKER_`	`CIRCUIT_BREAKER_NEW_STATE`
`agenttel.agentic.*`	Various (e.g., `AGENT_NAME`)	`AgenticAttributes.AGENT_NAME`
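Because the convention is mechanical for the namespaces listed above, an agent or code generator could derive a constant name from an attribute key. The `ConstantNames` helper below is illustrative and not part of the AgentTel API; it only applies the rule and does not confirm that a constant with that name exists.

```java
import java.util.Locale;

// Illustrative helper applying the CATEGORY_FIELD_NAME rule; not an AgentTel API.
// It derives a name only; it does not check that such a constant exists.
final class ConstantNames {

    private static final String PREFIX = "agenttel.";

    static String constantNameFor(String attributeKey) {
        // agenttel.agentic.* uses its own names in AgenticAttributes, and
        // agenttel.client.* keys are not listed in the naming table.
        if (!attributeKey.startsWith(PREFIX)
                || attributeKey.startsWith(PREFIX + "agentic.")
                || attributeKey.startsWith(PREFIX + "client.")) {
            throw new IllegalArgumentException("Naming rule does not apply to " + attributeKey);
        }
        // Drop "agenttel.", turn dots into underscores, and uppercase the rest.
        return attributeKey.substring(PREFIX.length())
                .replace('.', '_')
                .toUpperCase(Locale.ROOT);
    }
}
```

For example, `agenttel.baseline.latency_p50_ms` maps to `BASELINE_LATENCY_P50_MS` and `agenttel.circuit_breaker.new_state` to `CIRCUIT_BREAKER_NEW_STATE`, matching the rows in the table.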

Attribute Lifecycle Summary¶

The following table summarizes when and where each category of attributes is set:

Category	Set By	Set When	Appears On
Topology	`AgentTelResourceProvider`	SDK initialization (once per service)	Resource attributes
Baselines	`AgentTelSpanProcessor`	`onStart()` for every span of a registered operation	Span attributes
Baseline Confidence	`AgentTelEnrichingSpanExporter`	Export time	Span attributes
Decisions	`AgentTelSpanProcessor`	`onStart()` for every span of a registered operation	Span attributes
Anomaly	`AgentTelSpanProcessor`	`onEnd()` when deviation from baseline is detected	Span attributes
Error Classification	`AgentTelEnrichingSpanExporter`	Export time, for error spans only	Span attributes
Causality	`AgentTelEnrichingSpanExporter`	Export time, for error/anomalous spans	Span attributes
Severity	`AgentTelEnrichingSpanExporter`	Export time, for error/anomalous spans	Span attributes
Change Correlation	`ChangeCorrelationEngine`	During incident context construction	Incident context
SLO	`AgentTelSpanProcessor` (via `SloTracker`)	`onEnd()` for spans with registered SLOs	Span attributes
Deployment	`DeploymentEventEmitter`	Service startup	Event attributes
GenAI	GenAI wrappers + `CostEnrichingSpanExporter`	Span creation (wrappers) and export (cost)	Span attributes
Agent Identity	`AgentActionTracker`	Agent action execution	Span attributes
Sessions	`SessionManager`	Session creation	Span attributes
Circuit Breaker	`AgentTelEventEmitter`	Circuit breaker state transition	Event attributes
Agentic	`agenttel-agentic` module	Agent invocations, steps, tool calls, orchestrations	Span attributes
Frontend	`@agenttel/web` SDK	Various (page load, API call, interaction, journey)	Span + Resource attributes
