Observability with OpenTelemetry
OpenTelemetry exporter for AIP integrity checkpoints and AAP verification results.
Send AIP/AAP telemetry to any OTel-compatible observability platform — Langfuse, Arize Phoenix,
Datadog, Grafana — with zero custom code.
Why
AIP and AAP produce rich alignment telemetry: integrity verdicts, concerns, verification results,
coherence scores, drift alerts. But this data is only useful if it’s observable. This exporter
bridges the gap between protocol output and your existing observability stack by mapping everything
onto OpenTelemetry spans, events, and metrics.
```text
AIP/AAP Protocol Output ──→ aip-otel-exporter ──→ OTel SDK ──→ Your Platform
                                                                    │
                                                                    ├── Langfuse
                                                                    ├── Arize Phoenix
                                                                    ├── Datadog
                                                                    ├── Grafana / Tempo
                                                                    └── Any OTLP endpoint
```
Three Integration Layers
| Layer | TypeScript | Python | OTel SDK? | Use Case |
|---|---|---|---|---|
| Manual API | `@mnemom/aip-otel-exporter` | `aip-otel-exporter[otel]` | Yes | Full control, works everywhere |
| Auto-instrumentation | `@mnemom/aip-otel-exporter/auto` | `AIPInstrumentor` | Yes | Wraps AIP/AAP calls automatically |
| CF Workers adapter | `@mnemom/aip-otel-exporter/workers` | — | No | Cloudflare Workers edge runtime |
Quick Start
TypeScript

```bash
npm install @mnemom/aip-otel-exporter @opentelemetry/api
```

```typescript
import { createAIPOTelRecorder } from "@mnemom/aip-otel-exporter";

const recorder = createAIPOTelRecorder({ tracerProvider });

recorder.recordIntegrityCheck(signal); // AIP integrity check → span
recorder.recordVerification(result);   // AAP verification → span
recorder.recordCoherence(result);      // AAP coherence → span
recorder.recordDrift(alerts, count);   // AAP drift detection → span
```
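The recorder accepts duck-typed inputs, so any object carrying the expected fields works. As a hedged sketch, a minimal integrity signal might look like the following; the field names here are inferred from the attributes reference below and may not match the actual protocol types exactly:

```typescript
// Hypothetical signal shape — field names inferred from the attributes
// reference; the real AIP types may differ.
interface IntegritySignal {
  proceed: boolean;
  recommended_action: string;
  concerns: { category: string; severity: string }[];
  checkpoint?: {
    checkpoint_id: string;
    verdict: "clear" | "review_needed" | "boundary_violation";
    agent_id: string;
  };
}

const signal: IntegritySignal = {
  proceed: true,
  recommended_action: "continue",
  concerns: [],
  checkpoint: {
    checkpoint_id: "chk-001",
    verdict: "clear",
    agent_id: "my-agent",
  },
};

console.log(signal.checkpoint?.verdict); // "clear"
```

Because inputs are duck-typed, you can pass protocol output straight through without importing AIP/AAP type definitions.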
Python

```bash
pip install aip-otel-exporter[otel]
```

```python
from aip_otel_exporter import AIPOTelRecorder

recorder = AIPOTelRecorder(tracer_provider=provider)

recorder.record_integrity_check(signal)
recorder.record_verification(result)
recorder.record_coherence(result)
recorder.record_drift(alerts, traces_analyzed=50)
```
Span Hierarchy
Spans are created as children of the current active span via `context.active()`:

```text
your_application_span
├── aip.integrity_check
│   ├── event: aip.concern (one per concern)
│   └── event: aip.drift_alert (when drift active)
├── aap.verify_trace
│   └── event: aap.violation (one per violation)
├── aap.check_coherence
└── aap.detect_drift
    └── event: aap.drift_alert (one per alert)
```
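To make the parent/child and span/event relationship concrete, here is a toy model using plain objects (deliberately not the OTel SDK) that mirrors the tree above:

```typescript
// Toy span model for illustration only — the real exporter uses the
// OTel tracer and context.active() to establish this hierarchy.
type ToySpan = { name: string; events: string[]; children: ToySpan[] };

function startChild(parent: ToySpan, name: string): ToySpan {
  const child: ToySpan = { name, events: [], children: [] };
  parent.children.push(child);
  return child;
}

const app: ToySpan = { name: "your_application_span", events: [], children: [] };

const integrity = startChild(app, "aip.integrity_check");
integrity.events.push("aip.concern");      // one event per concern
const drift = startChild(app, "aap.detect_drift");
drift.events.push("aap.drift_alert");      // one event per alert

console.log(app.children.map((c) => c.name));
// ["aip.integrity_check", "aap.detect_drift"]
```

The key point is that concerns, violations, and drift alerts are span *events*, not separate spans, which keeps trace cardinality low.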
Attributes Reference
For the complete attributes and metrics reference, see OTel Attributes.
aip.integrity_check — 22 attributes (including 2 GenAI SIG aliases)

| Attribute | Type | Source |
|---|---|---|
| `aip.integrity.checkpoint_id` | string | checkpoint |
| `aip.integrity.verdict` | string | checkpoint (clear / review_needed / boundary_violation) |
| `aip.integrity.agent_id` | string | checkpoint |
| `aip.integrity.card_id` | string | checkpoint |
| `aip.integrity.session_id` | string | checkpoint |
| `aip.integrity.thinking_hash` | string | checkpoint (SHA-256) |
| `aip.integrity.proceed` | boolean | signal |
| `aip.integrity.recommended_action` | string | signal |
| `aip.integrity.concerns_count` | int | signal |
| `aip.integrity.analysis_model` | string | analysis_metadata |
| `aip.integrity.analysis_duration_ms` | float | analysis_metadata |
| `aip.integrity.thinking_tokens` | int | analysis_metadata |
| `aip.integrity.truncated` | boolean | analysis_metadata |
| `aip.integrity.extraction_confidence` | float | analysis_metadata |
| `aip.conscience.consultation_depth` | string | conscience_context |
| `aip.conscience.values_checked_count` | int | conscience_context |
| `aip.conscience.conflicts_count` | int | conscience_context |
| `aip.window.size` | int | window_summary |
| `aip.window.integrity_ratio` | float | window_summary (0.0–1.0) |
| `aip.window.drift_alert_active` | boolean | window_summary |
| `gen_ai.evaluation.verdict` | string | GenAI SIG forward-compat |
| `gen_ai.evaluation.score` | float | GenAI SIG forward-compat |
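The mapping from protocol output to these attributes can be sketched as a flattening pass that silently skips missing fields, mirroring the graceful-degradation design principle. The helper below is illustrative, not the exporter's actual implementation, and covers only a few of the attributes:

```typescript
// Illustrative sketch: flatten a duck-typed signal into span attributes,
// skipping absent fields. Attribute keys follow the table above.
type Attrs = Record<string, string | number | boolean>;

function toIntegrityAttributes(signal: any): Attrs {
  const attrs: Attrs = {};
  const put = (key: string, value: unknown) => {
    // Missing fields are skipped, never thrown on.
    if (value !== undefined && value !== null) {
      attrs[key] = value as string | number | boolean;
    }
  };
  put("aip.integrity.verdict", signal.checkpoint?.verdict);
  put("aip.integrity.agent_id", signal.checkpoint?.agent_id);
  put("aip.integrity.proceed", signal.proceed);
  put("aip.integrity.concerns_count", signal.concerns?.length);
  // GenAI SIG forward-compat alias duplicates the verdict.
  put("gen_ai.evaluation.verdict", signal.checkpoint?.verdict);
  return attrs;
}

const attrs = toIntegrityAttributes({
  proceed: true,
  concerns: [],
  checkpoint: { verdict: "clear" }, // agent_id absent → attribute skipped
});

console.log(attrs["aip.integrity.verdict"]); // "clear"
console.log("aip.integrity.agent_id" in attrs); // false
```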
aap.verify_trace — 8 attributes
| Attribute | Type |
|---|---|
| `aap.verification.result` | boolean |
| `aap.verification.similarity_score` | float |
| `aap.verification.violations_count` | int |
| `aap.verification.warnings_count` | int |
| `aap.verification.trace_id` | string |
| `aap.verification.card_id` | string |
| `aap.verification.duration_ms` | float |
| `aap.verification.checks_performed` | string (comma-separated) |
aap.check_coherence — 5 attributes
| Attribute | Type |
|---|---|
| `aap.coherence.compatible` | boolean |
| `aap.coherence.score` | float (0.0–1.0) |
| `aap.coherence.proceed` | boolean |
| `aap.coherence.matched_count` | int |
| `aap.coherence.conflict_count` | int |
aap.detect_drift — 2 attributes
| Attribute | Type |
|---|---|
| `aap.drift.alerts_count` | int |
| `aap.drift.traces_analyzed` | int |
Metrics
9 metric instruments for aggregate monitoring:
| Metric | Type | Labels |
|---|---|---|
| `aip.integrity_checks.total` | Counter | verdict, agent_id |
| `aip.concerns.total` | Counter | category, severity |
| `aip.analysis.duration_ms` | Histogram | — |
| `aip.window.integrity_ratio` | Histogram | — |
| `aip.drift_alerts.total` | Counter | — |
| `aap.verifications.total` | Counter | verified |
| `aap.violations.total` | Counter | type, severity |
| `aap.verification.duration_ms` | Histogram | — |
| `aap.coherence.score` | Histogram | compatible |
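Labeled counters like `aip.integrity_checks.total` aggregate one time series per unique label combination. The sketch below is a minimal in-memory stand-in (not the OTel Metrics API) showing how verdict/agent labels partition the counts:

```typescript
// Minimal label-aware counter for illustration — in production this is an
// OTel Counter instrument; this toy class only shows the labeling semantics.
class LabeledCounter {
  private counts = new Map<string, number>();

  private key(labels: Record<string, string>): string {
    // Sort entries so label order doesn't create distinct series.
    return JSON.stringify(Object.entries(labels).sort());
  }

  add(labels: Record<string, string>, value = 1): void {
    const k = this.key(labels);
    this.counts.set(k, (this.counts.get(k) ?? 0) + value);
  }

  get(labels: Record<string, string>): number {
    return this.counts.get(this.key(labels)) ?? 0;
  }
}

const integrityChecks = new LabeledCounter(); // aip.integrity_checks.total
integrityChecks.add({ verdict: "clear", agent_id: "my-agent" });
integrityChecks.add({ verdict: "clear", agent_id: "my-agent" });
integrityChecks.add({ verdict: "review_needed", agent_id: "my-agent" });

console.log(integrityChecks.get({ verdict: "clear", agent_id: "my-agent" })); // 2
```

Keeping label sets small (verdict, agent_id, category, severity) is what makes these counters cheap to aggregate in Prometheus-style backends.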
Policy Evaluation Spans
When the Policy Engine evaluates requests at the gateway, it emits `policy.evaluate` spans. Record these alongside integrity checks for complete governance telemetry:
```typescript
import { createAIPOTelRecorder } from "@mnemom/aip-otel-exporter";

const recorder = createAIPOTelRecorder({ tracerProvider });

// Record a policy evaluation result
recorder.recordPolicyEvaluation({
  agent_id: "my-agent",
  policy_id: "research-agent-policy",
  policy_version: 2,
  verdict: "pass",
  violations_count: 0,
  warnings_count: 1,
  coverage_pct: 92.5,
  context: "gateway",
  enforcement_mode: "enforce",
  duration_ms: 3.2,
});
```
Policy Dashboard Example
A Grafana dashboard panel for policy evaluation monitoring:
```json
{
  "title": "Policy Evaluation Overview",
  "panels": [
    {
      "title": "Policy Verdicts (24h)",
      "type": "piechart",
      "targets": [{ "expr": "sum by (verdict) (increase(policy_evaluations_total[24h]))" }]
    },
    {
      "title": "Policy Violations by Type",
      "type": "timeseries",
      "targets": [{ "expr": "sum by (type, severity) (rate(policy_violations_total[5m]))" }]
    },
    {
      "title": "Policy Evaluation Latency (p99)",
      "type": "stat",
      "targets": [{ "expr": "histogram_quantile(0.99, rate(policy_evaluation_duration_ms_bucket[5m]))" }]
    },
    {
      "title": "Coverage by Agent",
      "type": "table",
      "targets": [{ "expr": "avg by (agent_id) (policy_coverage_pct)" }]
    }
  ]
}
```
Output Analysis in Traces
When output analysis is enabled for an agent (`analyze_output: true`), the observer passes `output_block_hash` and `analysis_scope` through `linkCheckpointToTrace()`. These fields appear in the `aip.integrity_check` span alongside the existing thinking-block attributes.
To filter traces by output-aware analysis in Grafana:
```text
{span.aip.integrity.analysis_scope = "thinking_and_output"} | rate() by (span.aip.integrity.verdict)
```
This query surfaces only checkpoints where the conscience prompt included both thinking and output content, making it easy to compare verdict distributions between thinking-only and output-aware analysis.
In the observer context, `enforcement_mode` is always `observe` — the observer performs post-action evaluation and never blocks requests. The `policy.enforcement_mode` attribute on observer-emitted `policy.evaluate` spans is therefore always `observe`, distinguishing them from gateway-emitted spans, which may be `warn` or `enforce`.
Dashboard Templates
Pre-built dashboards are available in the aip-otel-exporter repository:
- `grafana-aip-overview.json` — Fleet-wide integrity monitoring
- `grafana-aip-detail.json` — Per-agent deep-dive
- `datadog-aip-overview.json` — Datadog importable dashboard
See the dashboards README for import instructions.
Integration examples are available in the examples directory:
| Platform | File |
|---|---|
| Langfuse | `langfuse.ts` |
| Arize Phoenix | `arize-phoenix.ts` |
| Datadog | `datadog.ts` |
| Cloudflare Workers | `cloudflare-workers.ts` |
Performance
Measured via `npm run bench` (Vitest bench, Node 22, Apple M-series):
| Operation | Mean | p99 | Ops/sec |
|---|---|---|---|
| `recordIntegrityCheck()` | 0.007 ms | 0.023 ms | 142,540 |
| `recordVerification()` | 0.003 ms | 0.004 ms | 310,510 |
| `recordCoherence()` | 0.003 ms | 0.003 ms | 321,385 |
| `recordDrift()` | 0.003 ms | 0.007 ms | 295,807 |
| Workers `createOTLPSpan()` | 0.003 ms | 0.004 ms | 341,778 |
| Workers `serializeExportPayload()` | 0.004 ms | 0.006 ms | 234,860 |
All operations average under 0.01 ms, so recording adds negligible overhead on hot paths.
Design Principles
- Duck-typed inputs — no hard dependency on AIP/AAP packages; works with any compatible shape.
- Graceful degradation — missing fields are silently skipped; the recorder never throws.
- Zero-overhead Workers — the CF Workers adapter uses only `fetch()` + `crypto`, no OTel SDK.
- GenAI SIG forward-compat — `gen_ai.evaluation.*` aliases for future OTel GenAI SIG alignment.
Standards Alignment
The exporter follows the OpenTelemetry Semantic Conventions for span naming and attribute structure. Forward-compatible aliases (`gen_ai.evaluation.*`) track the emerging OTel GenAI SIG conventions for AI/ML observability.
This exporter is part of the Mnemom trust infrastructure:
- AIP — Agent Integrity Protocol (real-time thinking analysis)
- AAP — Agent Alignment Protocol (behavioral verification)
- aip-otel-exporter — This package (observability bridge)
Additional Resources
| Document | Description |
|---|---|
| CHANGELOG | Release history |
| CONTRIBUTING | Development setup and contribution guide |
| Security Policy | Security policy and threat model |
| TypeScript README | TypeScript package documentation |
| Python README | Python package documentation |
| Dashboards README | Dashboard import instructions |
Status
Version 0.1.0 — Initial release.
| Component | Status |
|---|---|
| TypeScript Manual API | Stable |
| TypeScript Auto-instrumentation | Stable |
| TypeScript Workers Adapter | Stable |
| Python Manual API | Stable |
| Python Auto-instrumentation | Stable |
| Metrics API | Stable |
| Dashboard Templates | Stable |