Event Catalog¶
Complete reference for every structured event AgentTel emits via the OpenTelemetry Logs API.
Overview¶
AgentTel emits events for significant state changes that AI agents should react to. Events are
emitted as OTel log records with a structured JSON body using the AgentTelEventEmitter class,
which wraps the OTel Logs Bridge API.
All events use:
event.nameattribute (OTel standard) to identify the event type- Severity level: INFO, WARN, or ERROR
- JSON-serialized body with event-specific fields
Event names are defined as constants in io.agenttel.api.events.AgentTelEvents.
Event Summary¶
| Event | Severity | Trigger | Agent Action |
|---|---|---|---|
agenttel.anomaly.detected |
WARN | Span deviates from baseline | Investigate via get_incident_context |
agenttel.slo.budget_alert |
WARN / ERROR | SLO budget crosses threshold | Check SLO compliance via get_slo_report |
agenttel.dependency.state_change |
WARN | Dependency health transitions | Correlate with operation health |
agenttel.circuit_breaker.state_change |
WARN | Circuit breaker transitions | Monitor self-protection status |
agenttel.deployment.info |
INFO | Application starts | Record for change correlation |
How Events Are Emitted¶
All events flow through AgentTelEventEmitter, which serializes the body map to JSON and
emits an OTel LogRecord:
otelLogger.logRecordBuilder()
.setSeverity(severity)
.setAttribute(AttributeKey.stringKey("event.name"), eventName)
.setBody(bodyJson)
.emit();
Events are transported through the standard OTel Logs pipeline, meaning they appear in any configured OTel log exporter (OTLP, console, etc.).
agenttel.anomaly.detected¶
Emitted when a span's behavior deviates significantly from baseline. The anomaly detector uses z-score comparison: if the absolute z-score of the observed latency exceeds the configured threshold (default: 3.0), the span is classified as anomalous. Pattern matching may also trigger this event independently when higher-level incident patterns (cascade failure, thundering herd, etc.) are detected.
Trigger
When a span's latency z-score exceeds the configured threshold (default: 3.0), OR when
the PatternMatcher detects a known incident pattern from accumulated span data.
Emitter¶
- Class:
AgentTelEventEmitter(called fromAgentTelSpanProcessor.onEnd()) - Constant:
AgentTelEvents.ANOMALY_DETECTED
Fields¶
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
operation |
string | Yes | The operation that triggered the anomaly | "POST /api/payments" |
latency_ms |
double | Yes | Actual latency observed for this span (ms) | 312.0 |
anomaly_score |
double | No | Severity score normalized to 0.0-1.0, calculated as min(1.0, abs(z_score) / 4.0) |
0.85 |
z_score |
double | No | Standard deviations from the rolling baseline mean | 4.2 |
pattern |
string | No | Incident pattern identifier (present when pattern matching triggers the event) | "latency_degradation" |
pattern_description |
string | No | Human-readable description of the detected pattern | "Sustained latency increase beyond baseline" |
When the event is triggered by z-score anomaly detection, anomaly_score and z_score are
present but pattern and pattern_description are absent. When triggered by pattern matching,
pattern and pattern_description are present but anomaly_score and z_score are absent.
Both sets of fields may be present in a combined event.
Incident Patterns¶
The pattern field uses one of the following values, defined in IncidentPattern:
| Pattern | Value | Description |
|---|---|---|
| Cascade Failure | cascade_failure |
Multiple dependent services failing simultaneously |
| Latency Degradation | latency_degradation |
Sustained latency increase beyond baseline |
| Error Rate Spike | error_rate_spike |
Sudden increase in error rate beyond baseline |
| Memory Leak | memory_leak |
Steadily increasing latency with increasing error rate |
| Thundering Herd | thundering_herd |
Sudden spike in request rate after recovery |
| Cold Start | cold_start |
High latency on first requests after deployment |
Example Payload¶
Z-score triggered anomaly:
{
"event.name": "agenttel.anomaly.detected",
"severity": "WARN",
"body": {
"operation": "POST /api/payments",
"latency_ms": 312.0,
"anomaly_score": 0.85,
"z_score": 4.2
}
}
Pattern-matching triggered anomaly:
{
"event.name": "agenttel.anomaly.detected",
"severity": "WARN",
"body": {
"operation": "POST /api/payments",
"latency_ms": 312.0,
"pattern": "latency_degradation",
"pattern_description": "Sustained latency increase beyond baseline"
}
}
Agent Workflow¶
When this event fires, the recommended agent workflow is:
- Call
get_incident_contextwith the operation name to get full diagnosis including baselines, dependencies, and change correlation - Call
get_error_analysiswith the operation name to understand error category breakdown - Call
get_playbookwith the pattern name to get a structured remediation plan - If the playbook confidence is high enough, call
list_remediation_actionsto see available automated fixes - Execute remediation via
execute_remediationif the action does not require human approval, or escalate otherwise
Related¶
- Span attributes:
agenttel.anomaly.detected(boolean),agenttel.anomaly.pattern(string),agenttel.anomaly.score(double),agenttel.anomaly.latency_z_score(double) - MCP tools:
get_incident_context,get_error_analysis,get_playbook,execute_remediation - Configuration:
agenttel.anomaly-detection.z-score-threshold(default:3.0) - Source:
AgentTelSpanProcessorinagenttel-core
agenttel.slo.budget_alert¶
Emitted when an SLO's error budget crosses a threshold. The SloTracker checks all registered
SLOs after every span and fires an alert when budget remaining falls below 50%, 25%, or 10%.
The OTel severity escalates with the alert level: INFO at 50%, WARN at 25%, ERROR at 10%.
Trigger
When the remaining error budget for any registered SLO drops below 50%, 25%, or 10%.
Emitter¶
- Class:
AgentTelEventEmitter(called fromAgentTelSpanProcessor.emitSloAlerts()viaSloTracker) - Constant:
AgentTelEvents.SLO_BUDGET_ALERT
Fields¶
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
slo_name |
string | Yes | SLO identifier as registered in configuration | "payment-availability" |
severity |
string | Yes | Alert tier: CRITICAL (<=10%), WARNING (<=25%), INFO (<=50%) |
"WARNING" |
budget_remaining |
double | Yes | Remaining budget as a fraction 0.0-1.0 | 0.22 |
burn_rate |
double | Yes | Current burn rate: how fast the budget is being consumed. A value of 1.0 means the budget will be exactly exhausted at the end of the window. | 0.78 |
Alert Severity Tiers¶
| Tier | Budget Remaining | OTel Severity | Recommended Response |
|---|---|---|---|
INFO |
<= 50% | INFO | Monitor, no immediate action |
WARNING |
<= 25% | WARN | Investigate proactively |
CRITICAL |
<= 10% | ERROR | Immediate investigation required |
Example Payload¶
{
"event.name": "agenttel.slo.budget_alert",
"severity": "WARN",
"body": {
"slo_name": "payment-availability",
"severity": "WARNING",
"budget_remaining": 0.22,
"burn_rate": 0.78
}
}
Critical budget exhaustion:
{
"event.name": "agenttel.slo.budget_alert",
"severity": "ERROR",
"body": {
"slo_name": "payment-availability",
"severity": "CRITICAL",
"budget_remaining": 0.05,
"burn_rate": 1.42
}
}
Agent Workflow¶
When this event fires, the recommended agent workflow depends on the severity tier:
INFO (budget <= 50%):
1. Call get_slo_report to review overall SLO posture
2. Log the observation; no immediate action required
WARNING (budget <= 25%):
1. Call get_slo_report to confirm budget trajectory
2. Call get_trend_analysis for the affected operation to understand if the trend is
worsening or stabilizing
3. If worsening, call get_incident_context to begin proactive investigation
CRITICAL (budget <= 10%):
1. Call get_incident_context immediately for the affected operation
2. Call get_error_analysis to identify the primary error contributors
3. Call get_playbook for the relevant pattern
4. Execute remediation or escalate to on-call
Related¶
- Span attributes:
agenttel.slo.name(string),agenttel.slo.target(double),agenttel.slo.budget_remaining(double),agenttel.slo.burn_rate(double) - MCP tools:
get_slo_report,get_trend_analysis,get_incident_context - Configuration: SLO definitions in
agenttel.slosconfiguration block - Source:
SloTrackerinagenttel-core
agenttel.dependency.state_change¶
Emitted when a dependency's observed health transitions from one state to another (e.g.,
healthy to degraded, degraded to unhealthy, or back to healthy). State transitions are tracked
by the CausalityTracker, which monitors client-span error rates per dependency.
Trigger
When the observed health state of a dependency changes based on client-span error rates.
Emitter¶
- Class:
AgentTelEventEmitter - Constant:
AgentTelEvents.DEPENDENCY_STATE_CHANGE
Fields¶
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
dependency |
string | Yes | Dependency name as declared in configuration | "postgres" |
previous_state |
string | Yes | Previous health state: healthy, degraded, or unhealthy |
"healthy" |
current_state |
string | Yes | New health state | "degraded" |
error_rate |
double | Yes | Current error rate observed for this dependency (0.0-1.0) | 0.15 |
Example Payload¶
Dependency degradation:
{
"event.name": "agenttel.dependency.state_change",
"severity": "WARN",
"body": {
"dependency": "postgres",
"previous_state": "healthy",
"current_state": "degraded",
"error_rate": 0.15
}
}
Dependency recovery:
{
"event.name": "agenttel.dependency.state_change",
"severity": "WARN",
"body": {
"dependency": "postgres",
"previous_state": "degraded",
"current_state": "healthy",
"error_rate": 0.01
}
}
Agent Workflow¶
When this event fires, the recommended agent workflow is:
- Call
get_service_healthto see the full dependency map and which operations are affected - Correlate with any concurrent
agenttel.anomaly.detectedevents -- if postgres goes degraded andPOST /api/paymentserrors spike simultaneously, the dependency is likely the root cause - Call
get_incident_contextfor any operations that depend on the affected dependency - If the dependency transitions to
unhealthy, check whether a circuit breaker is protecting the system by looking for a correspondingagenttel.circuit_breaker.state_changeevent
Related¶
- MCP tools:
get_service_health,get_incident_context,get_cross_stack_context - Configuration: Dependency declarations via
@DeclareDependencyannotation oragenttel.dependenciesYAML block - Source:
CausalityTrackerinagenttel-core
agenttel.circuit_breaker.state_change¶
Emitted when a circuit breaker transitions between states. Circuit breakers protect operations from cascading failures by temporarily stopping calls to an unhealthy dependency.
Trigger
When a circuit breaker transitions state: closed to open, open to half_open, or half_open to closed/open.
Emitter¶
- Class:
AgentTelEventEmitter - Constant:
AgentTelEvents.CIRCUIT_BREAKER_STATE_CHANGE
Fields¶
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
name |
string | Yes | Circuit breaker identifier | "postgres-cb" |
previous_state |
string | Yes | Previous state: closed, open, or half_open |
"closed" |
new_state |
string | Yes | New state after transition | "open" |
failure_count |
long | Yes | Accumulated failure count that triggered the transition | 15 |
dependency |
string | Yes | Name of the associated dependency | "postgres" |
Circuit Breaker States¶
| State | Meaning |
|---|---|
closed |
Normal operation. Requests flow through to the dependency. |
open |
Dependency is unhealthy. Requests are rejected immediately (fast-fail). |
half_open |
Testing recovery. A limited number of requests are allowed through to probe the dependency. |
Example Payload¶
Circuit breaker opens due to failures:
{
"event.name": "agenttel.circuit_breaker.state_change",
"severity": "WARN",
"body": {
"name": "postgres-cb",
"previous_state": "closed",
"new_state": "open",
"failure_count": 15,
"dependency": "postgres"
}
}
Circuit breaker starts recovery probe:
{
"event.name": "agenttel.circuit_breaker.state_change",
"severity": "WARN",
"body": {
"name": "postgres-cb",
"previous_state": "open",
"new_state": "half_open",
"failure_count": 15,
"dependency": "postgres"
}
}
Circuit breaker recovery succeeds:
{
"event.name": "agenttel.circuit_breaker.state_change",
"severity": "WARN",
"body": {
"name": "postgres-cb",
"previous_state": "half_open",
"new_state": "closed",
"failure_count": 0,
"dependency": "postgres"
}
}
Agent Workflow¶
When this event fires, the recommended agent workflow is:
- If the transition is to
open: the system is self-protecting. Callget_service_healthto assess impact on dependent operations. No remediation is needed for the circuit breaker itself -- focus on the underlying dependency. - If the transition is to
half_open: recovery is being tested. Monitor for a subsequent transition back toclosed(success) oropen(failure). - If the transition is to
closed: the dependency has recovered. Callget_trend_analysisto verify the recovery is stable and not a brief respite. - Correlate with
agenttel.dependency.state_changeevents for the same dependency to build a complete timeline of the incident.
Related¶
- MCP tools:
get_service_health,get_incident_context,get_trend_analysis - Configuration:
circuitBreaker = trueon@DeclareDependencyannotation or in YAML dependency declaration - Source:
AgentTelEventEmitterinagenttel-core
agenttel.deployment.info¶
Emitted once at application startup when agenttel.deployment.emit-on-startup is true
(the default). Captures deployment metadata that the ChangeCorrelationEngine uses to
correlate anomalies with recent deployments.
Trigger
At application startup, if deployment event emission is enabled.
Emitter¶
- Class:
DeploymentEventEmitter - Constant:
AgentTelEvents.DEPLOYMENT_INFO
Fields¶
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
version |
string | No | Application version | "2.3.1" |
commit_sha |
string | No | Git commit SHA of the deployed code | "a1b2c3d" |
previous_version |
string | No | Version of the previous deployment | "2.3.0" |
strategy |
string | No | Deployment strategy used | "blue-green" |
timestamp |
string | Yes | ISO 8601 timestamp of when the event was emitted | "2026-03-06T14:30:00Z" |
All fields except timestamp are optional. Only non-empty values are included in the body.
The timestamp field is always present and set to Instant.now().toString() at emission time.
Example Payload¶
Full deployment event:
{
"event.name": "agenttel.deployment.info",
"severity": "INFO",
"body": {
"version": "2.3.1",
"commit_sha": "a1b2c3d",
"previous_version": "2.3.0",
"strategy": "blue-green",
"timestamp": "2026-03-06T14:30:00Z"
}
}
Minimal deployment event (only version known):
{
"event.name": "agenttel.deployment.info",
"severity": "INFO",
"body": {
"version": "2.3.1",
"timestamp": "2026-03-06T14:30:00Z"
}
}
Agent Workflow¶
When this event fires, the recommended agent workflow is:
- Record the deployment metadata for future correlation
- If an
agenttel.anomaly.detectedevent fires within minutes of a deployment, callget_incident_context-- theChangeCorrelationEnginewill automatically flag the deployment as a probable cause - Compare
versionandprevious_versionto understand whether this is a major or minor release, which affects rollback risk assessment
Related¶
- Span attributes:
agenttel.deployment.id,agenttel.deployment.version,agenttel.deployment.commit_sha,agenttel.deployment.previous_version,agenttel.deployment.strategy,agenttel.deployment.timestamp - MCP tools:
get_incident_context(includes change correlation data) - Configuration:
agenttel.deployment.emit-on-startup(default:true),agenttel.deployment.version,agenttel.deployment.commit-sha,agenttel.deployment.previous-version,agenttel.deployment.strategy - Source:
DeploymentEventEmitterinagenttel-core
Configuration Reference¶
Events are controlled through the standard AgentTel configuration. Below are the properties that affect event emission.
Anomaly Detection¶
agenttel:
anomaly-detection:
z-score-threshold: 3.0 # Z-score above which a span is anomalous (default: 3.0)
Lowering the threshold produces more events (higher sensitivity); raising it reduces noise.
SLO Definitions¶
SLO budget alerts require SLOs to be registered. Registration happens through the
agenttel.slos configuration block:
agenttel:
slos:
payment-availability:
operation-name: "POST /api/payments"
target: 0.999 # 99.9% availability target
Deployment Events¶
agenttel:
deployment:
emit-on-startup: true # Emit deployment.info at startup (default: true)
version: "2.3.1"
commit-sha: "a1b2c3d"
previous-version: "2.3.0"
strategy: "blue-green"
Consuming Events¶
OTel Log Exporter¶
Events are emitted as OTel log records and are exported through the configured OTel log exporter. To see events in the console during development:
To export to an OTLP-compatible backend:
Programmatic Subscription¶
Events flow through the standard OTel Logs pipeline. To process events programmatically,
configure a custom LogRecordProcessor in your OTel SDK setup and filter on the event.name
attribute.
MCP Agent Integration¶
AI agents connected via the MCP server do not receive events directly through the event pipeline. Instead, events trigger state changes that agents observe through MCP tool calls:
- Anomaly events update the
ServiceHealthAggregator, visible viaget_service_health - SLO budget alerts update
SloTrackerstate, visible viaget_slo_report - Dependency state changes update the
CausalityTracker, visible viaget_incident_context
For real-time event-driven agent workflows, configure your OTel log exporter to push events to a message queue or webhook that your agent framework consumes.