Agent Observability

Instrument your AI agent's lifecycle — invocations, reasoning steps, tool calls, task decomposition, handoffs, human checkpoints, code execution, evaluation, RAG, memory access, and error classification — with structured OpenTelemetry spans.

Overview

| Feature | What's Tracked | Key Span |
| --- | --- | --- |
| Agent Identity | Name, type, framework, version | invoke_agent |
| Invocation Lifecycle | Goal, status, step count, max steps | invoke_agent |
| Reasoning Steps | Thought, action, observation, evaluation, revision | agenttel.agentic.step |
| Tool Calls | Tool name, status (success/error/timeout) | agenttel.agentic.tool_call |
| Task Decomposition | Nested tasks with depth and parent tracking | agenttel.agentic.task |
| Handoffs | Source agent, target agent, reason, chain depth | agenttel.agentic.handoff |
| Human Checkpoints | Approval, feedback, correction, decision + wait time | agenttel.agentic.human_input |
| Code Execution | Language, exit code, sandbox status | agenttel.agentic.code_execution |
| Evaluation | Scorer name, criteria, score, feedback, eval type | agenttel.agentic.evaluate |
| RAG Pipeline | Retriever query, document count, relevance; reranker model | agenttel.agentic.retriever / agenttel.agentic.reranker |
| Guardrails | Name of the triggered guardrail, action (block/warn/log/escalate), reason | agenttel.agentic.guardrail |
| Memory Access | Read, write, delete, search operations | agenttel.agentic.memory |
| Error Classification | Source (llm/tool/agent/guardrail/timeout/network), retryable | On invoke_agent |

Dependencies

Maven:

<dependency>
    <groupId>dev.agenttel</groupId>
    <artifactId>agenttel-agentic</artifactId>
    <version>0.2.0-alpha</version>
</dependency>

Gradle:

implementation("dev.agenttel:agenttel-agentic:0.2.0-alpha")

Python:

pip install agenttel

Quick Start

import io.agenttel.agentic.trace.AgentTracer;
import io.agenttel.agentic.AgentType;
import io.agenttel.agentic.StepType;

// 1. Create a tracer
AgentTracer tracer = AgentTracer.create(openTelemetry)
    .agentName("incident-responder")
    .agentType(AgentType.SINGLE)
    .framework("custom")
    .build();

// 2. Start an invocation
try (AgentInvocation invocation = tracer.invoke("Diagnose high latency")) {
    // 3. Record reasoning steps
    invocation.step(StepType.THOUGHT, "Need to check service health metrics");

    // 4. Make a tool call
    try (ToolCallScope tool = invocation.toolCall("get_service_health")) {
        var health = mcpClient.call("get_service_health", params);
        tool.success();
    }

    // 5. Record another step
    invocation.step(StepType.OBSERVATION, "Latency elevated on POST /api/payments");

    // 6. Complete the invocation
    invocation.complete(true);
}

Python:

from agenttel.agentic.tracer import AgentTracer
from agenttel.enums import AgentType, StepType

tracer = (AgentTracer.create(otel)
    .agent_name("incident-responder")
    .agent_type(AgentType.SINGLE)
    .framework("custom")
    .build())

with tracer.invoke("Diagnose high latency") as invocation:
    invocation.step(StepType.THOUGHT, "Need to check service health metrics")

    with invocation.tool_call("get_service_health") as tool:
        health = mcp_client.call("get_service_health", params)
        tool.set_result(health)

    invocation.step(StepType.OBSERVATION, "Latency elevated on POST /api/payments")
    invocation.complete(goal_achieved=True)

Span output:

invoke_agent
  agenttel.agentic.agent.name          = "incident-responder"
  agenttel.agentic.agent.type          = "single"
  agenttel.agentic.invocation.goal     = "Diagnose high latency"
  agenttel.agentic.invocation.status   = "success"
  agenttel.agentic.invocation.steps    = 3
  agenttel.agentic.quality.goal_achieved = true
  └── agenttel.agentic.step
      agenttel.agentic.step.number     = 1
      agenttel.agentic.step.type       = "thought"
  └── agenttel.agentic.tool_call
      agenttel.agentic.step.tool_name  = "get_service_health"
      agenttel.agentic.step.tool_status = "success"
  └── agenttel.agentic.step
      agenttel.agentic.step.number     = 3
      agenttel.agentic.step.type       = "observation"

Agent Identity

Configure agent identity via the builder, YAML config, or @AgentMethod annotation. These attributes appear on every invoke_agent span.

Programmatic (Builder)

AgentTracer tracer = AgentTracer.create(openTelemetry)
    .agentName("code-reviewer")
    .agentType(AgentType.WORKER)
    .framework("langchain4j")
    .agentVersion("2.1.0")
    .build();

Python:

tracer = (AgentTracer.create(otel)
    .agent_name("code-reviewer")
    .agent_type(AgentType.WORKER)
    .framework("langchain")
    .build())

YAML Configuration

Define agent identity and guardrails in application.yml. YAML config takes priority over annotations and builder defaults.

agenttel:
  agentic:
    loop-threshold: 3              # global default
    default-max-steps: 50          # global default
    agents:
      incident-responder:
        type: react
        framework: langchain4j
        version: "2.0"
        max-steps: 100
        loop-threshold: 5
        cost-budget-usd: 2.0
      code-reviewer:
        type: worker
        framework: custom
        max-steps: 20

When AgentTracer.invoke("incident-responder", goal) is called, the config registry automatically applies the agent's type, framework, version, and maxSteps from YAML.

@AgentMethod Annotation

Annotate a method to automatically wrap it in an AgentInvocation scope. No manual AgentTracer calls needed.

@AgentMethod(name = "incident-responder", type = "react", maxSteps = 100)
public IncidentReport diagnose(String incidentId) {
    // Method body is automatically wrapped in AgentInvocation
    // The span gets agent.name, agent.type, invocation.goal, invocation.status
}

Config Priority

YAML config > @AgentMethod annotation > programmatic AgentTracer.Builder defaults. When YAML defines an agent, its values take precedence over annotation attributes.
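
The precedence rule above can be pictured as a layered merge. The following is a minimal sketch using only the Python standard library; it is not the library's implementation, and `resolve_agent_config` and the dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional


def resolve_agent_config(
    builder_defaults: Dict[str, Any],
    annotation: Optional[Dict[str, Any]] = None,
    yaml_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge the three sources so that YAML wins over the annotation,
    and the annotation wins over builder defaults (hypothetical helper)."""
    resolved = dict(builder_defaults)
    # Layers are applied from lowest to highest priority; later updates win.
    for layer in (annotation, yaml_config):
        if layer:
            # Ignore unset (None) values so a partial layer does not erase
            # a value supplied by a lower-priority source.
            resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved


config = resolve_agent_config(
    builder_defaults={"type": "single", "framework": "custom", "max_steps": 50},
    annotation={"type": "react", "max_steps": 100},
    yaml_config={"framework": "langchain4j", "max_steps": 30},
)
print(config)
```

Here `type` comes from the annotation because YAML does not set it, while `framework` and `max_steps` come from YAML.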

AgentType Enum

| Value | Description |
| --- | --- |
| single | Standalone agent operating independently |
| orchestrator | Coordinates other agents in a multi-agent system |
| worker | Executes tasks assigned by an orchestrator |
| evaluator | Evaluates output quality of other agents |
| critic | Provides feedback to improve agent output |
| router | Routes requests to the appropriate agent |

Invocation Lifecycle

An invocation represents a complete goal-directed execution of an agent.

// Basic invocation
try (AgentInvocation inv = tracer.invoke("Analyze customer feedback")) {
    // ... agent logic ...
    inv.complete(true);  // goal achieved
}

// Invoke a named sub-agent
try (AgentInvocation inv = tracer.invoke("summarizer", "Summarize findings")) {
    inv.complete(true);
}

// Invoke with explicit parent context
try (AgentInvocation inv = tracer.invoke("analyst", "Check metrics", parentCtx)) {
    inv.complete(InvocationStatus.ESCALATED);
}

Python:

# Basic invocation
with tracer.invoke("Analyze customer feedback") as inv:
    # ... agent logic ...
    inv.complete(goal_achieved=True)

InvocationStatus Enum

| Value | Description |
| --- | --- |
| success | Goal was achieved |
| failure | Agent could not achieve the goal |
| timeout | Invocation exceeded time limit |
| escalated | Agent escalated to a human or higher authority |
| human_intervened | A human took over the task |

Max Steps Guardrail

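try (AgentInvocation inv = tracer.invoke("Process batch")) {
    int maxSteps = 50;
    inv.maxSteps(maxSteps);  // Sets agenttel.agentic.invocation.max_steps

    boolean timedOut = false;
    while (hasMoreWork()) {
        if (inv.stepCount() >= maxSteps) {
            timedOut = true;  // step budget exhausted before the work finished
            break;
        }
        inv.step(StepType.ACTION, "Process next item");
    }

    // Complete exactly once, with the status that matches how the loop ended
    if (timedOut) {
        inv.complete(InvocationStatus.TIMEOUT);
    } else {
        inv.complete(true);
    }
}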
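
The control flow of a step budget can be sketched independently of the SDK. The example below uses only the Python standard library and does not call AgentTel; `run_with_step_budget` is a hypothetical helper, and the returned strings simply mirror the `timeout` and `success` values of InvocationStatus.

```python
from typing import Iterable, List, Tuple


def run_with_step_budget(items: Iterable[str], max_steps: int) -> Tuple[str, List[str]]:
    """Process items until they run out or the step budget is spent."""
    steps: List[str] = []
    for item in items:
        if len(steps) >= max_steps:
            # Work remains but the budget is exhausted: report a timeout.
            # Returning here guarantees the outcome is decided exactly once.
            return "timeout", steps
        steps.append(f"Process {item}")
    return "success", steps


status, steps = run_with_step_budget(["a", "b", "c"], max_steps=2)
print(status, len(steps))
```

Returning from inside the loop is one way to make sure a timed-out run is never also reported as a success.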

Reasoning Steps

Steps represent the thought-action-observation loop of an agent.

Fire-and-Forget Steps

// Simple step — records and immediately ends the span
invocation.step(StepType.THOUGHT, "Analyzing error patterns");
invocation.step(StepType.ACTION, "Querying metrics API");
invocation.step(StepType.OBSERVATION, "Error rate is 5.2%");

Scoped Steps

// Scoped step — caller controls when the span ends
try (StepScope step = invocation.beginStep(StepType.ACTION)) {
    // Long-running action
    var result = callExternalApi();
    step.span().addEvent("API returned " + result.status());
}

// Scoped step with iteration number (for loops)
for (int i = 0; i < maxRetries; i++) {
    try (StepScope step = invocation.beginStep(StepType.REVISION, i)) {
        // Revise output based on feedback
    }
}

StepType Enum

| Value | Description |
| --- | --- |
| thought | Internal reasoning or planning |
| action | Executing an action (API call, tool use) |
| observation | Processing results of an action |
| evaluation | Assessing quality of output |
| revision | Revising output based on feedback |

Tool Calls

Track external tool invocations with success/error/timeout status.

try (ToolCallScope tool = invocation.toolCall("search_documents")) {
    try {
        var results = searchService.search(query);
        tool.success();
    } catch (TimeoutException e) {
        tool.timeout();
    } catch (Exception e) {
        tool.error(e.getMessage());
    }
}

Python:

with invocation.tool_call("search_documents") as tool:
    try:
        results = search_service.search(query)
        tool.set_result(results)
    except TimeoutError:
        tool.set_error("timeout")
    except Exception as e:
        tool.set_error(str(e))

The ToolCallScope sets agenttel.agentic.step.tool_name and agenttel.agentic.step.tool_status.


Task Decomposition

Track hierarchical task breakdown with depth and parent tracking.

try (TaskScope mainTask = invocation.task("Analyze codebase")) {
    // Nested sub-tasks
    try (TaskScope subTask = mainTask.subTask("Parse source files")) {
        // ...
        subTask.complete();
    }

    try (TaskScope subTask = mainTask.subTask("Extract dependencies")) {
        // ...
        subTask.complete();
    }

    mainTask.complete();
}

Python:

with invocation.task("Analyze codebase") as main_task:
    with main_task.subtask("Parse source files") as sub:
        ...
    with main_task.subtask("Extract dependencies") as sub:
        ...

Each TaskScope gets a unique task.id, tracks task.depth (0 for root, incrementing for children), and records task.parent_id for nested tasks.
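
To make the depth and parent bookkeeping concrete, here is a minimal standalone sketch of the same idea. It is not the SDK's TaskScope: `TaskNode`, the `task-N` id format, and the "Tokenize" sub-task are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Simple process-wide counter used to hand out unique ids (illustrative only).
_ids = itertools.count(1)


@dataclass
class TaskNode:
    """Hypothetical stand-in for a task span's id/depth/parent attributes."""
    name: str
    depth: int = 0
    parent_id: Optional[str] = None
    task_id: str = field(default_factory=lambda: f"task-{next(_ids)}")

    def subtask(self, name: str) -> "TaskNode":
        # A child is one level deeper and points back at its parent's id.
        return TaskNode(name=name, depth=self.depth + 1, parent_id=self.task_id)


root = TaskNode("Analyze codebase")
parse = root.subtask("Parse source files")
tokenize = parse.subtask("Tokenize")

for node in (root, parse, tokenize):
    print(node.task_id, node.depth, node.parent_id)
```

The root has depth 0 and no parent; each nested task increments the depth by one and records the id of the task that created it.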


Handoffs

Track delegation from one agent to another.

// Basic handoff
try (HandoffScope handoff = invocation.handoff("specialist-agent", "Requires domain expertise")) {
    // The target agent executes within this scope
    try (AgentInvocation specialist = tracer.invoke("specialist-agent", "Handle domain task")) {
        specialist.complete(true);
    }
}

// Handoff with chain depth tracking
try (HandoffScope handoff = invocation.handoff("escalation-agent", "Budget exceeded", 2)) {
    // chain_depth=2 means this is the 3rd agent in the chain
}

Python:

with invocation.handoff("specialist-agent", "Requires domain expertise") as handoff:
    with tracer.invoke("specialist-agent", "Handle domain task") as specialist:
        specialist.complete(goal_achieved=True)
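
The chain depth convention from the Java comment (chain_depth=2 is the third agent) can be illustrated with a small standalone sketch. It assumes the originating agent sits at depth 0; `Handoff`, `build_handoffs`, the "triage-agent" name, and the first reason string are hypothetical and not part of the SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    """Hypothetical record of one delegation between agents."""
    from_agent: str
    to_agent: str
    reason: str
    chain_depth: int


def build_handoffs(chain: List[str], reasons: List[str]) -> List[Handoff]:
    """Turn an ordered agent chain into handoff records.

    The first agent is at depth 0, so the handoff that reaches the
    agent at index i carries chain_depth == i.
    """
    return [
        Handoff(chain[i - 1], chain[i], reasons[i - 1], i)
        for i in range(1, len(chain))
    ]


handoffs = build_handoffs(
    ["triage-agent", "specialist-agent", "escalation-agent"],
    ["Requires domain expertise", "Budget exceeded"],
)
for h in handoffs:
    print(h.from_agent, "->", h.to_agent, "depth", h.chain_depth)
```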

Human Checkpoints

Track human-in-the-loop interactions with automatic wait time computation.

try (HumanCheckpointScope checkpoint =
        invocation.humanCheckpoint(HumanCheckpointType.APPROVAL, "Approve deployment rollback")) {

    boolean approved = awaitHumanApproval();
    checkpoint.decision(approved ? "approved" : "rejected");
    // wait_ms is automatically computed from scope open to decision()
}

Python:

with invocation.human_checkpoint(HumanCheckpointType.APPROVAL, "Approve deployment rollback") as checkpoint:
    approved = await_human_approval()
    checkpoint.decision("approved" if approved else "rejected")
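
The wait time measurement (from scope open to the decision() call) can be sketched with a plain context manager. This is an illustration of the timing idea only, not the SDK's checkpoint scope; `CheckpointTimer` is a hypothetical class, and the sleep stands in for a human taking time to respond.

```python
import time
from typing import Optional


class CheckpointTimer:
    """Hypothetical scope that measures time from open to decision()."""

    def __init__(self) -> None:
        self.decision_value: Optional[str] = None
        self.wait_ms: Optional[float] = None
        self._opened_at: Optional[float] = None

    def __enter__(self) -> "CheckpointTimer":
        # A monotonic clock is unaffected by wall-clock adjustments.
        self._opened_at = time.monotonic()
        return self

    def decision(self, value: str) -> None:
        self.decision_value = value
        self.wait_ms = (time.monotonic() - self._opened_at) * 1000.0

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # never swallow exceptions


with CheckpointTimer() as checkpoint:
    time.sleep(0.05)  # stand-in for waiting on a human
    checkpoint.decision("approved")

print(checkpoint.decision_value, round(checkpoint.wait_ms))
```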

HumanCheckpointType Enum

| Value | Description |
| --- | --- |
| approval | Binary yes/no approval gate |
| feedback | Free-form feedback from human |
| correction | Human corrects agent output |
| decision | Human makes a multi-option choice |

Code Execution

Track code execution by agents with sandbox status.

// Basic code execution
try (CodeExecutionScope exec = invocation.codeExecution("python")) {
    var result = sandbox.run(code);
    exec.complete(result.exitCode());
}

// Sandboxed execution
try (CodeExecutionScope exec = invocation.codeExecution("javascript", true)) {
    var result = sandbox.run(code);
    exec.complete(0);  // exit code
}

Python:

with invocation.code_execution("python") as exec_scope:
    result = sandbox.run(code)
    exec_scope.complete(result.exit_code)

The scope tracks code.language, code.status (success/error), code.exit_code, and code.sandboxed.
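
As a rough illustration of how these attributes relate, the sketch below runs a snippet in a child Python process and derives a status from the exit code. It assumes exit code 0 maps to success and anything else to error, which the text above does not state explicitly; `run_and_classify` is a hypothetical helper and a plain subprocess is not a sandbox.

```python
import subprocess
import sys
from typing import Any, Dict


def run_and_classify(source: str, timeout_s: float = 10.0) -> Dict[str, Any]:
    """Run Python source in a child process and build code.* style attributes."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return {
        "code.language": "python",
        "code.exit_code": completed.returncode,
        # Assumption: zero exit code means success, non-zero means error.
        "code.status": "success" if completed.returncode == 0 else "error",
        "code.sandboxed": False,  # a child process alone provides no isolation
    }


ok = run_and_classify("print('hello')")
failed = run_and_classify("import sys; sys.exit(3)")
print(ok["code.status"], failed["code.status"], failed["code.exit_code"])
```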


Evaluation and Scoring

First-class evaluation spans for output quality assessment.

// Custom evaluation
try (EvaluationScope eval = invocation.evaluate("quality-scorer", "relevance")) {
    double score = scorer.evaluate(output);
    eval.score(score);
    eval.feedback("Response addresses the core question but misses edge cases");
}

// With eval type
try (EvaluationScope eval =
        invocation.evaluate("gpt4-judge", "accuracy", EvalType.LLM_JUDGE)) {
    eval.score(0.85);
}

Python:

with invocation.evaluate("quality-scorer", "relevance") as eval_scope:
    score = scorer.evaluate(output)
    eval_scope.score(score)
    eval_scope.feedback("Response addresses the core question but misses edge cases")

EvalType Enum

| Value | Description |
| --- | --- |
| llm_judge | LLM-based evaluation (e.g., GPT-4 as judge) |
| heuristic | Rule-based or algorithmic scoring |
| human | Human evaluator scoring |
| custom | Custom evaluation method |

RAG Pipeline

Track retrieval and reranking in RAG workflows.

String query = "What is the refund policy?";
List<ScoredDoc> docs;  // ScoredDoc is a placeholder for your vector store's result type

// Retriever
try (RetrieverScope ret = invocation.retrieve(query, "pgvector", 10)) {
    docs = vectorStore.search(query, 10);
    ret.complete(docs.size(), avgRelevanceScore(docs), minRelevanceScore(docs));
}

// Reranker (uses the docs returned by the retriever above)
try (RerankerScope rerank = invocation.rerank("cross-encoder-v1", 10)) {
    var reranked = reranker.rerank(docs, query);
    rerank.complete(reranked.size(), reranked.get(0).score());
}

Python:

query = "What is the refund policy?"

with invocation.retrieve(query, "pgvector", 10) as ret:
    docs = vector_store.search(query, 10)
    ret.complete(len(docs), avg_relevance(docs), min_relevance(docs))

with invocation.rerank("cross-encoder-v1", 10) as rerank:
    reranked = reranker.rerank(docs, query)
    rerank.complete(len(reranked), reranked[0].score)

The retriever scope sets retrieval.query, retrieval.store_type, retrieval.top_k, retrieval.document_count, and relevance scores. The reranker scope sets reranker.model, reranker.input_documents, reranker.output_documents, and reranker.top_score.
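
The Python example calls avg_relevance and min_relevance without defining them. One possible shape for such helpers is sketched below; `ScoredDoc` and its `score` field are assumptions about what the vector store returns, and returning 0.0 for an empty result is a design choice rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredDoc:
    """Hypothetical retrieval result with a relevance score."""
    text: str
    score: float


def avg_relevance(docs: Sequence[ScoredDoc]) -> float:
    """Mean score, or 0.0 when nothing was retrieved."""
    return sum(d.score for d in docs) / len(docs) if docs else 0.0


def min_relevance(docs: Sequence[ScoredDoc]) -> float:
    """Lowest score, or 0.0 when nothing was retrieved."""
    return min((d.score for d in docs), default=0.0)


docs = [
    ScoredDoc("Refunds are accepted within 30 days", 0.75),
    ScoredDoc("Shipping policy", 0.5),
    ScoredDoc("Gift cards", 0.25),
]
print(len(docs), avg_relevance(docs), min_relevance(docs))
```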


Error Classification

Classify errors by source and retryability.

try (AgentInvocation inv = tracer.invoke("Process request")) {
    try {
        processRequest();
        inv.complete(true);
    } catch (TimeoutException e) {
        inv.classifyError(ErrorSource.TIMEOUT, "api_timeout", true);
        inv.complete(false);
    } catch (Exception e) {
        inv.classifyError(ErrorSource.AGENT, "unhandled_exception", false);
        inv.complete(false);
    }
}

ErrorSource Enum

| Value | Description |
| --- | --- |
| llm | Error from the LLM provider |
| tool | Error from a tool invocation |
| agent | Error in agent logic |
| guardrail | Error triggered by a guardrail |
| timeout | Operation timed out |
| network | Network connectivity error |
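
A classification like the Java example can be centralized in one mapping function. The sketch below is standalone Python: the `ErrorSource` enum only mirrors the values in the table and is not imported from the SDK, `classify` is a hypothetical helper, and which errors count as retryable is an assumption you would tune for your own agent.

```python
from enum import Enum
from typing import Tuple


class ErrorSource(Enum):
    """Local mirror of the documented values (not the SDK's enum)."""
    LLM = "llm"
    TOOL = "tool"
    AGENT = "agent"
    GUARDRAIL = "guardrail"
    TIMEOUT = "timeout"
    NETWORK = "network"


def classify(exc: BaseException) -> Tuple[ErrorSource, str, bool]:
    """Map an exception to (source, error category, retryable)."""
    if isinstance(exc, TimeoutError):
        return ErrorSource.TIMEOUT, "api_timeout", True
    if isinstance(exc, ConnectionError):
        return ErrorSource.NETWORK, "connection_error", True
    # Anything unrecognized is treated as a non-retryable agent error.
    return ErrorSource.AGENT, "unhandled_exception", False


for err in (TimeoutError(), ConnectionResetError(), ValueError("bad input")):
    source, category, retryable = classify(err)
    print(source.value, category, retryable)
```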

Agent Capabilities and Conversation Tracking

try (AgentInvocation inv = tracer.invoke("Handle support query")) {
    // Declare available tools
    inv.tools(List.of("search_docs", "create_ticket", "send_email"));

    // Track system prompt version
    inv.systemPromptHash("sha256:abc123...");

    // Set conversation context
    inv.conversation("conv-456", 3);  // conversation ID, turn number
    inv.messageCount(12);

    inv.complete(true);
}
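
The value passed to systemPromptHash is shown only as an elided placeholder. One way to produce a stable identifier of that general shape is to hash the prompt text, as in this sketch; the "sha256:" prefix followed by a hex digest is inferred from the example above, and `system_prompt_hash` is a hypothetical helper rather than an SDK function.

```python
import hashlib


def system_prompt_hash(prompt: str) -> str:
    """Return a stable fingerprint for a system prompt.

    The same prompt text always yields the same value, so a change in
    the hash signals that the prompt was edited.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


v1 = system_prompt_hash("You are a support agent. Be concise.")
v2 = system_prompt_hash("You are a support agent. Be concise and polite.")
print(v1)
print(v1 == v2)
```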

Guardrails

Record guardrail activations as child spans.

// On an invocation
invocation.guardrail("content-filter", GuardrailAction.BLOCK,
    "PII detected in output");

// Via GuardrailRecorder (standalone)
GuardrailRecorder recorder = new GuardrailRecorder(tracer);
recorder.record("budget-limit", GuardrailAction.ESCALATE,
    "Cost exceeded $10 threshold");

Python:

from agenttel.agentic.guardrail import GuardrailRecorder
from agenttel.enums import GuardrailAction

recorder = GuardrailRecorder()
recorder.record("content-filter", GuardrailAction.BLOCK, "PII detected in output")

See the Guardrails & Safety Guide for more detail on guardrails, loop detection, and quality tracking.


Memory Access

Track agent memory operations.

// Read from memory
tracer.memory(MemoryOperation.READ, "conversation_history", 5);

// Write to memory
tracer.memory(MemoryOperation.WRITE, "vector_store", 3);

// Search memory
tracer.memory(MemoryOperation.SEARCH, "knowledge_base", 10);

Python:

tracer.memory(MemoryOperation.READ, "conversation_history", 5)
tracer.memory(MemoryOperation.WRITE, "vector_store", 3)

MemoryOperation Enum

| Value | Description |
| --- | --- |
| read | Reading from memory store |
| write | Writing to memory store |
| delete | Deleting from memory store |
| search | Searching/querying memory store |

Spring Boot Auto-Configuration

When agenttel-agentic is on the classpath, AgentTelAgenticAutoConfiguration automatically creates:

  • An AgentConfigRegistry bean (populated from agenttel.agentic.agents.* YAML config)
  • An AgentTracer bean (config-aware, inject and use directly)
  • An AgentMethodAspect bean (wraps @AgentMethod-annotated methods in invocation scopes)
  • An AgentCostAggregator bean (registered as an OTel SpanProcessor)

Programmatic Approach

Inject AgentTracer and call invoke() directly:

@Service
public class MyAgentService {

    private final AgentTracer agentTracer;

    public MyAgentService(AgentTracer agentTracer) {
        this.agentTracer = agentTracer;
    }

    public String processQuery(String query) {
        try (AgentInvocation inv = agentTracer.invoke("Process user query")) {
            inv.step(StepType.THOUGHT, "Analyzing query: " + query);
            // ... agent logic; doProcess is a placeholder for it ...
            String result = doProcess(query);
            inv.complete(true);
            return result;
        }
    }
}

Annotation Approach

Use @AgentMethod for automatic invocation wrapping — no AgentTracer calls needed:

@Service
public class MyAgentService {

    @AgentMethod(name = "query-processor", type = "single")
    public String processQuery(String query) {
        // Automatically wrapped in AgentInvocation
        // On success: complete(true), on exception: span records error
        return doProcess(query);
    }
}

YAML-Only Approach

Combine @AgentMethod with YAML config for zero-code agent identity:

@AgentMethod(name = "query-processor")  // name only, rest from YAML
public String processQuery(String query) { ... }

application.yml:

agenttel:
  agentic:
    agents:
      query-processor:
        type: single
        framework: custom
        max-steps: 30

Tip

To customize the auto-configured AgentTracer, define your own @Bean AgentTracer — the auto-configuration backs off when a user-defined bean exists (@ConditionalOnMissingBean).


FastAPI Integration

When using the Python SDK with FastAPI, call instrument_fastapi(app) and build the AgentTracer from the tracer_provider of the engine it returns:

from fastapi import FastAPI
from agenttel.fastapi import instrument_fastapi
from agenttel.agentic.tracer import AgentTracer
from agenttel.enums import AgentType, StepType

app = FastAPI()
engine = instrument_fastapi(app)

tracer = (AgentTracer.create(engine.tracer_provider)
    .agent_name("incident-responder")
    .agent_type(AgentType.SINGLE)
    .build())

@app.post("/api/diagnose")
async def diagnose(incident_id: str):
    with tracer.invoke(f"Diagnose {incident_id}") as inv:
        inv.step(StepType.THOUGHT, "Analyzing metrics")
        inv.complete(goal_achieved=True)
        return {"status": "resolved"}

Testing

Use AgenticAssertions from agenttel-testing for fluent test assertions.

import static io.agenttel.testing.AgenticAssertions.*;

@Test
void agentAchievesGoal() {
    // ... run agent logic ...

    assertAgentInvocation(collector)
        .hasAgentName("incident-responder")
        .hasGoal("Diagnose high latency")
        .wasSuccessful()
        .hasStepCount(4);

    // Assert by agent name
    assertAgentInvocation(collector, "summarizer")
        .wasSuccessful();

    // Assert orchestration
    assertOrchestrationPattern(collector, "sequential")
        .hasTotalStages(3);

    // Check specific span types
    List<SpanData> steps = getStepSpans(collector);
    List<SpanData> tools = getToolCallSpans(collector);
    List<SpanData> guardrails = getGuardrailSpans(collector);
}

Span Hierarchy

%%{init: {'theme': 'base', 'themeVariables': {'lineColor': '#6366f1'}}}%%
graph TB
    SESSION["agenttel.agentic.session<br/><small>orchestration.pattern</small>"]
    INV1["invoke_agent<br/><small>agent.name, invocation.goal</small>"]
    INV2["invoke_agent<br/><small>agent.name, invocation.goal</small>"]
    STEP1["agenttel.agentic.step<br/><small>step.type=thought</small>"]
    TOOL["agenttel.agentic.tool_call<br/><small>step.tool_name</small>"]
    STEP2["agenttel.agentic.step<br/><small>step.type=observation</small>"]
    TASK["agenttel.agentic.task<br/><small>task.name, task.depth</small>"]
    SUBTASK["agenttel.agentic.task<br/><small>task.depth=1</small>"]
    HANDOFF["agenttel.agentic.handoff<br/><small>handoff.from_agent, to_agent</small>"]
    HUMAN["agenttel.agentic.human_input<br/><small>human.checkpoint_type</small>"]
    CODE["agenttel.agentic.code_execution<br/><small>code.language</small>"]
    EVAL["agenttel.agentic.evaluate<br/><small>eval.scorer_name</small>"]
    RET["agenttel.agentic.retriever<br/><small>retrieval.query</small>"]
    RERANK["agenttel.agentic.reranker<br/><small>reranker.model</small>"]
    GUARD["agenttel.agentic.guardrail<br/><small>guardrail.name</small>"]
    MEM["agenttel.agentic.memory<br/><small>memory.operation</small>"]

    SESSION --> INV1
    SESSION --> INV2
    INV1 --> STEP1
    INV1 --> TOOL
    INV1 --> STEP2
    INV1 --> TASK
    INV1 --> HANDOFF
    INV1 --> HUMAN
    INV1 --> CODE
    INV1 --> EVAL
    INV1 --> RET
    INV1 --> RERANK
    INV1 --> GUARD
    TASK --> SUBTASK
    SESSION -.-> MEM

    style SESSION fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style INV1 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style INV2 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style STEP1 fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style STEP2 fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style TOOL fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style TASK fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style SUBTASK fill:#a5b4fc,stroke:#6366f1,color:#1e1b4b
    style HANDOFF fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style HUMAN fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style CODE fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style EVAL fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style RET fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style RERANK fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style GUARD fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style MEM fill:#a5b4fc,stroke:#6366f1,color:#1e1b4b

Further Reading