Observability & Tracing
Station includes built-in OpenTelemetry (OTEL) support for complete execution observability. Every agent execution, LLM call, and tool invocation is automatically traced.
What Gets Traced
| Component | Details Captured |
|---|---|
| Agent Executions | Complete timeline from start to finish |
| LLM Calls | Every OpenAI/Anthropic/Gemini API call with latency |
| MCP Tool Usage | Individual tool calls to AWS, databases, etc. |
| Database Operations | Query performance and data access patterns |
| GenKit Spans | Dotprompt execution, generation flow, model interactions |
Quick Start with Jaeger
The fastest way to get tracing running locally:
# Start Jaeger
stn jaeger up
# Jaeger UI available at http://localhost:16686
Station automatically detects Jaeger and sends traces to http://localhost:4318.
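To confirm the collector is reachable before running agents, you can probe it directly (a quick sanity check; the ports assume the Jaeger defaults above):
# OTLP HTTP ingest listens on 4318 (a GET returns 405 because the endpoint only accepts POST)
curl -i http://localhost:4318/v1/traces
# The Jaeger query API lists services that have reported spans
curl -s http://localhost:16686/api/services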
Example Trace
incident_coordinator (18.2s)
├── assess_severity (0.5s)
├── delegate_logs_investigator (4.1s)
│   └── __get_logs (3.2s)
├── delegate_metrics_investigator (3.8s)
│   └── __query_time_series (2.9s)
├── delegate_change_detective (2.4s)
│   └── __get_recent_deployments (1.8s)
└── synthesize_findings (1.2s)
Configuration
Environment Variable (Recommended)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
stn serve
Config File
# config.yaml
otel_endpoint: "http://localhost:4318"
MCP Client Configuration
When connecting MCP clients, include the OTEL endpoint:
{
"mcpServers": {
"station": {
"command": "stn",
"args": ["stdio"],
"env": {
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"
}
}
}
}
Or with Claude Code CLI:
claude mcp add station -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 --scope user -- stn stdio
Tracing Backends
Station works with any OpenTelemetry-compatible backend.
Jaeger (Local Development)
# Built-in command
stn jaeger up
# Or manual Docker
docker run -d --name jaeger \
-p 16686:16686 \
-p 4318:4318 \
-e COLLECTOR_OTLP_ENABLED=true \
jaegertracing/all-in-one:latest
Grafana Tempo
# docker-compose.yml
services:
tempo:
image: grafana/tempo:latest
command: ["-config.file=/etc/tempo.yaml"]
volumes:
- ./tempo.yaml:/etc/tempo.yaml
ports:
- "4318:4318" # OTLP HTTP
- "3200:3200" # Tempo API
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
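The Tempo service above mounts a tempo.yaml that is not shown; a minimal single-binary configuration could look like the following (a sketch only: the storage paths are assumptions to adapt to your setup):
# tempo.yaml (minimal example)
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:4318"
storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks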
Datadog APM
# Install Datadog Agent with OTLP support
DD_API_KEY=<your-key> DD_SITE="datadoghq.com" \
DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT="0.0.0.0:4318" \
bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
# Configure Station
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
Honeycomb
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key"
AWS X-Ray
# Run AWS OTEL Collector
# Note: the collector also needs AWS credentials (environment variables or a mounted ~/.aws profile) to export to X-Ray
docker run -d \
-p 4318:4318 \
-e AWS_REGION=us-east-1 \
amazon/aws-otel-collector:latest
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
New Relic
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=your-license-key"
Azure Monitor
# Use Azure Monitor OpenTelemetry Exporter
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=..."
export OTEL_EXPORTER_OTLP_ENDPOINT=https://dc.services.visualstudio.com/v2/track
Span Details
Station captures rich span information:
Agent Execution Span
{
"name": "agent.execute",
"attributes": {
"agent.id": "21",
"agent.name": "incident_coordinator",
"agent.environment": "production",
"task": "Investigate API timeout",
"model": "gpt-4o-mini",
"max_steps": 20
}
}
LLM Call Span
{
"name": "llm.generate",
"attributes": {
"model": "gpt-4o-mini",
"provider": "openai",
"input_tokens": 1250,
"output_tokens": 380,
"latency_ms": 1240
}
}
Tool Call Span
{
"name": "tool.call",
"attributes": {
"tool.name": "__get_logs",
"tool.server": "datadog",
"duration_ms": 320,
"success": true
}
}
Viewing Traces
Jaeger UI
- Open http://localhost:16686
- Select "station" from the Service dropdown
- Click "Find Traces"
- Click on a trace to see the full execution timeline
Filtering Traces
In Jaeger, use tags to filter:
agent.name=incident_coordinator
model=gpt-4o-mini
error=true
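The same filters work outside the UI through Jaeger's HTTP query API, which is handy for scripting (shown against the default all-in-one port; the tags parameter is a URL-encoded JSON map):
# Find up to 20 traces for the station service where error=true
curl -s "http://localhost:16686/api/traces?service=station&limit=20&tags=%7B%22error%22%3A%22true%22%7D"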
Production Setup
High-Volume Environments
For production, use sampling to reduce trace volume:
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1 # Sample 10% of traces
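The sampler is selected through the standard OpenTelemetry SDK environment variables, so other samplers defined by the spec can be chosen the same way (assuming Station honors them as it does the ratio sampler above):
export OTEL_TRACES_SAMPLER=always_on        # record every trace
export OTEL_TRACES_SAMPLER=always_off       # disable trace recording entirely
export OTEL_TRACES_SAMPLER=traceidratio     # ratio sampling without consulting the parent span's decision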
Secure Endpoints
# TLS endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.example.com:4318
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
# With authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"
Docker Deployment
# docker-compose.yml
services:
station:
image: ghcr.io/cloudshipai/station:latest
environment:
- OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
depends_on:
- jaeger
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686"
environment:
- COLLECTOR_OTLP_ENABLED=true
Troubleshooting
No Traces Appearing
- Check endpoint connectivity:
  curl -v http://localhost:4318/v1/traces  # Should return 405 Method Not Allowed (POST required)
- Verify environment variable:
  echo $OTEL_EXPORTER_OTLP_ENDPOINT
- Check Station logs:
  stn logs | grep -i otel
Traces Missing Tool Calls
Ensure MCP servers are configured with tracing:
{
"mcpServers": {
"my-server": {
"command": "my-mcp-server",
"env": {
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"
}
}
}
}
High Latency in Traces
If traces show high latency:
- Check network connectivity to tracing backend
- Consider async export: traces are sent asynchronously by default
- For high-volume, use sampling (see Production Setup)
Next Steps
- Deployment Monitoring - Metrics and alerting
- Scheduling - Automated agent runs
- CloudShip Integration - Centralized observability