Observability with OpenTelemetry
OpenTelemetry exporter for AIP integrity checkpoints and AAP verification results.
Send AIP/AAP telemetry to any OTel-compatible observability platform — Langfuse, Arize Phoenix,
Datadog, Grafana — with zero custom code.
Why
AIP and AAP produce rich alignment telemetry: integrity verdicts, concerns, verification results,
coherence scores, drift alerts. But this data is only useful if it’s observable. This exporter
bridges the gap between protocol output and your existing observability stack by mapping everything
onto OpenTelemetry spans, events, and metrics.
```text
AIP/AAP Protocol Output ──→ aip-otel-exporter ──→ OTel SDK ──→ Your Platform
                                                                    │
                                                                    ├── Langfuse
                                                                    ├── Arize Phoenix
                                                                    ├── Datadog
                                                                    ├── Grafana / Tempo
                                                                    └── Any OTLP endpoint
```
Three Integration Layers
| Layer | TypeScript | Python | OTel SDK? | Use Case |
|---|---|---|---|---|
| Manual API | `@mnemom/aip-otel-exporter` | `aip-otel-exporter[otel]` | Yes | Full control, works everywhere |
| Auto-instrumentation | `@mnemom/aip-otel-exporter/auto` | `AIPInstrumentor` | Yes | Wraps AIP/AAP calls automatically |
| CF Workers adapter | `@mnemom/aip-otel-exporter/workers` | — | No | Cloudflare Workers edge runtime |
Quick Start
TypeScript

```bash
npm install @mnemom/aip-otel-exporter @opentelemetry/api
```

```typescript
import { createAIPOTelRecorder } from "@mnemom/aip-otel-exporter";

const recorder = createAIPOTelRecorder({ tracerProvider });

recorder.recordIntegrityCheck(signal); // AIP integrity check → span
recorder.recordVerification(result);   // AAP verification → span
recorder.recordCoherence(result);      // AAP coherence → span
recorder.recordDrift(alerts, count);   // AAP drift detection → span
```
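The recorder accepts duck-typed inputs, so any object carrying the expected fields works. As a hedged sketch, a minimal integrity signal might look like the following; the field names here are inferred from the attributes reference below and may not match the actual protocol types exactly:

```typescript
// Hypothetical signal shape — field names inferred from the attributes
// reference; the real AIP types may differ.
interface IntegritySignal {
  proceed: boolean;
  recommended_action: string;
  concerns: { category: string; severity: string }[];
  checkpoint?: {
    checkpoint_id: string;
    verdict: "clear" | "review_needed" | "boundary_violation";
    agent_id: string;
  };
}

const signal: IntegritySignal = {
  proceed: true,
  recommended_action: "continue",
  concerns: [],
  checkpoint: {
    checkpoint_id: "chk-001",
    verdict: "clear",
    agent_id: "my-agent",
  },
};

console.log(signal.checkpoint?.verdict); // "clear"
```

Because inputs are duck-typed, you can pass protocol output straight through without importing AIP/AAP type definitions.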
Python

```bash
pip install aip-otel-exporter[otel]
```

```python
from aip_otel_exporter import AIPOTelRecorder

recorder = AIPOTelRecorder(tracer_provider=provider)

recorder.record_integrity_check(signal)
recorder.record_verification(result)
recorder.record_coherence(result)
recorder.record_drift(alerts, traces_analyzed=50)
```
Span Hierarchy
Spans are created as children of the current active span via `context.active()`:

```text
your_application_span
├── aip.integrity_check
│   ├── event: aip.concern (one per concern)
│   └── event: aip.drift_alert (when drift active)
├── aap.verify_trace
│   └── event: aap.violation (one per violation)
├── aap.check_coherence
└── aap.detect_drift
    └── event: aap.drift_alert (one per alert)
```
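To make the parent/child and span/event relationship concrete, here is a toy model using plain objects (deliberately not the OTel SDK) that mirrors the tree above:

```typescript
// Toy span model for illustration only — the real exporter uses the
// OTel tracer and context.active() to establish this hierarchy.
type ToySpan = { name: string; events: string[]; children: ToySpan[] };

function startChild(parent: ToySpan, name: string): ToySpan {
  const child: ToySpan = { name, events: [], children: [] };
  parent.children.push(child);
  return child;
}

const app: ToySpan = { name: "your_application_span", events: [], children: [] };

const integrity = startChild(app, "aip.integrity_check");
integrity.events.push("aip.concern");      // one event per concern
const drift = startChild(app, "aap.detect_drift");
drift.events.push("aap.drift_alert");      // one event per alert

console.log(app.children.map((c) => c.name));
// ["aip.integrity_check", "aap.detect_drift"]
```

The key point is that concerns, violations, and drift alerts are span *events*, not separate spans, which keeps trace cardinality low.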
Attributes Reference
For the complete attributes and metrics reference, see OTel Attributes.
aip.integrity_check — 22 attributes (including 2 GenAI SIG aliases)

| Attribute | Type | Source |
|---|---|---|
| `aip.integrity.checkpoint_id` | string | checkpoint |
| `aip.integrity.verdict` | string | checkpoint (clear / review_needed / boundary_violation) |
| `aip.integrity.agent_id` | string | checkpoint |
| `aip.integrity.card_id` | string | checkpoint |
| `aip.integrity.session_id` | string | checkpoint |
| `aip.integrity.thinking_hash` | string | checkpoint (SHA-256) |
| `aip.integrity.proceed` | boolean | signal |
| `aip.integrity.recommended_action` | string | signal |
| `aip.integrity.concerns_count` | int | signal |
| `aip.integrity.analysis_model` | string | analysis_metadata |
| `aip.integrity.analysis_duration_ms` | float | analysis_metadata |
| `aip.integrity.thinking_tokens` | int | analysis_metadata |
| `aip.integrity.truncated` | boolean | analysis_metadata |
| `aip.integrity.extraction_confidence` | float | analysis_metadata |
| `aip.conscience.consultation_depth` | string | conscience_context |
| `aip.conscience.values_checked_count` | int | conscience_context |
| `aip.conscience.conflicts_count` | int | conscience_context |
| `aip.window.size` | int | window_summary |
| `aip.window.integrity_ratio` | float | window_summary (0.0–1.0) |
| `aip.window.drift_alert_active` | boolean | window_summary |
| `gen_ai.evaluation.verdict` | string | GenAI SIG forward-compat |
| `gen_ai.evaluation.score` | float | GenAI SIG forward-compat |
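The mapping from protocol output to these attributes can be sketched as a flattening pass that silently skips missing fields, mirroring the graceful-degradation design principle. The helper below is illustrative, not the exporter's actual implementation, and covers only a few of the attributes:

```typescript
// Illustrative sketch: flatten a duck-typed signal into span attributes,
// skipping absent fields. Attribute keys follow the table above.
type Attrs = Record<string, string | number | boolean>;

function toIntegrityAttributes(signal: any): Attrs {
  const attrs: Attrs = {};
  const put = (key: string, value: unknown) => {
    // Missing fields are skipped, never thrown on.
    if (value !== undefined && value !== null) {
      attrs[key] = value as string | number | boolean;
    }
  };
  put("aip.integrity.verdict", signal.checkpoint?.verdict);
  put("aip.integrity.agent_id", signal.checkpoint?.agent_id);
  put("aip.integrity.proceed", signal.proceed);
  put("aip.integrity.concerns_count", signal.concerns?.length);
  // GenAI SIG forward-compat alias duplicates the verdict.
  put("gen_ai.evaluation.verdict", signal.checkpoint?.verdict);
  return attrs;
}

const attrs = toIntegrityAttributes({
  proceed: true,
  concerns: [],
  checkpoint: { verdict: "clear" }, // agent_id absent → attribute skipped
});

console.log(attrs["aip.integrity.verdict"]); // "clear"
console.log("aip.integrity.agent_id" in attrs); // false
```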
aap.verify_trace — 8 attributes
| Attribute | Type |
|---|---|
| `aap.verification.result` | boolean |
| `aap.verification.similarity_score` | float |
| `aap.verification.violations_count` | int |
| `aap.verification.warnings_count` | int |
| `aap.verification.trace_id` | string |
| `aap.verification.card_id` | string |
| `aap.verification.duration_ms` | float |
| `aap.verification.checks_performed` | string (comma-separated) |
aap.check_coherence — 5 attributes
| Attribute | Type |
|---|---|
| `aap.coherence.compatible` | boolean |
| `aap.coherence.score` | float (0.0–1.0) |
| `aap.coherence.proceed` | boolean |
| `aap.coherence.matched_count` | int |
| `aap.coherence.conflict_count` | int |
aap.detect_drift — 2 attributes
| Attribute | Type |
|---|---|
| `aap.drift.alerts_count` | int |
| `aap.drift.traces_analyzed` | int |
Metrics
9 metric instruments for aggregate monitoring:
| Metric | Type | Labels |
|---|---|---|
| `aip.integrity_checks.total` | Counter | verdict, agent_id |
| `aip.concerns.total` | Counter | category, severity |
| `aip.analysis.duration_ms` | Histogram | — |
| `aip.window.integrity_ratio` | Histogram | — |
| `aip.drift_alerts.total` | Counter | — |
| `aap.verifications.total` | Counter | verified |
| `aap.violations.total` | Counter | type, severity |
| `aap.verification.duration_ms` | Histogram | — |
| `aap.coherence.score` | Histogram | compatible |
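Labeled counters like `aip.integrity_checks.total` aggregate one time series per unique label combination. The sketch below is a minimal in-memory stand-in (not the OTel Metrics API) showing how verdict/agent labels partition the counts:

```typescript
// Minimal label-aware counter for illustration — in production this is an
// OTel Counter instrument; this toy class only shows the labeling semantics.
class LabeledCounter {
  private counts = new Map<string, number>();

  private key(labels: Record<string, string>): string {
    // Sort entries so label order doesn't create distinct series.
    return JSON.stringify(Object.entries(labels).sort());
  }

  add(labels: Record<string, string>, value = 1): void {
    const k = this.key(labels);
    this.counts.set(k, (this.counts.get(k) ?? 0) + value);
  }

  get(labels: Record<string, string>): number {
    return this.counts.get(this.key(labels)) ?? 0;
  }
}

const integrityChecks = new LabeledCounter(); // aip.integrity_checks.total
integrityChecks.add({ verdict: "clear", agent_id: "my-agent" });
integrityChecks.add({ verdict: "clear", agent_id: "my-agent" });
integrityChecks.add({ verdict: "review_needed", agent_id: "my-agent" });

console.log(integrityChecks.get({ verdict: "clear", agent_id: "my-agent" })); // 2
```

Keeping label sets small (verdict, agent_id, category, severity) is what makes these counters cheap to aggregate in Prometheus-style backends.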
Policy Evaluation Spans
When the Policy Engine evaluates requests at the gateway, it emits `policy.evaluate` spans. Record these alongside integrity checks for complete governance telemetry:
```typescript
import { createAIPOTelRecorder } from "@mnemom/aip-otel-exporter";

const recorder = createAIPOTelRecorder({ tracerProvider });

// Record a policy evaluation result
recorder.recordPolicyEvaluation({
  agent_id: "my-agent",
  policy_id: "research-agent-policy",
  policy_version: 2,
  verdict: "pass",
  violations_count: 0,
  warnings_count: 1,
  coverage_pct: 92.5,
  context: "gateway",
  enforcement_mode: "enforce",
  duration_ms: 3.2,
});
```
Policy Dashboard Example
A Grafana dashboard panel for policy evaluation monitoring:
```json
{
  "title": "Policy Evaluation Overview",
  "panels": [
    {
      "title": "Policy Verdicts (24h)",
      "type": "piechart",
      "targets": [{ "expr": "sum by (verdict) (increase(policy_evaluations_total[24h]))" }]
    },
    {
      "title": "Policy Violations by Type",
      "type": "timeseries",
      "targets": [{ "expr": "sum by (type, severity) (rate(policy_violations_total[5m]))" }]
    },
    {
      "title": "Policy Evaluation Latency (p99)",
      "type": "stat",
      "targets": [{ "expr": "histogram_quantile(0.99, rate(policy_evaluation_duration_ms_bucket[5m]))" }]
    },
    {
      "title": "Coverage by Agent",
      "type": "table",
      "targets": [{ "expr": "avg by (agent_id) (policy_coverage_pct)" }]
    }
  ]
}
```
Output Analysis in Traces
When output analysis is enabled for an agent (`analyze_output: true`), the observer passes `output_block_hash` and `analysis_scope` through `linkCheckpointToTrace()`. These fields appear in the `aip.integrity_check` span alongside the existing thinking-block attributes.
To filter traces by output-aware analysis in Grafana:
```text
{span.aip.integrity.analysis_scope = "thinking_and_output"} | rate() by (span.aip.integrity.verdict)
```
This query surfaces only checkpoints where the conscience prompt included both thinking and output content, making it easy to compare verdict distributions between thinking-only and output-aware analysis.
In the observer context, `enforcement_mode` is always `observe` — the observer performs post-action evaluation and never blocks requests. The `policy.enforcement_mode` attribute on observer-emitted `policy.evaluate` spans is therefore always `observe`, distinguishing them from gateway-emitted spans, which may be `warn` or `enforce`.
Dashboard Templates
Pre-built dashboards are available in the aip-otel-exporter repository:
- `grafana-aip-overview.json` — Fleet-wide integrity monitoring
- `grafana-aip-detail.json` — Per-agent deep-dive
- `datadog-aip-overview.json` — Datadog importable dashboard
See the dashboards README for import instructions.
Integration examples are available in the examples directory:
| Platform | File |
|---|---|
| Langfuse | `langfuse.ts` |
| Arize Phoenix | `arize-phoenix.ts` |
| Datadog | `datadog.ts` |
| Cloudflare Workers | `cloudflare-workers.ts` |
Performance
Measured via `npm run bench` (Vitest bench, Node 22, Apple M-series):
| Operation | Mean | p99 | Ops/sec |
|---|---|---|---|
| `recordIntegrityCheck()` | 0.007 ms | 0.023 ms | 142,540 |
| `recordVerification()` | 0.003 ms | 0.004 ms | 310,510 |
| `recordCoherence()` | 0.003 ms | 0.003 ms | 321,385 |
| `recordDrift()` | 0.003 ms | 0.007 ms | 295,807 |
| Workers `createOTLPSpan()` | 0.003 ms | 0.004 ms | 341,778 |
| Workers `serializeExportPayload()` | 0.004 ms | 0.006 ms | 234,860 |
All operations average under 0.01 ms, so recording adds negligible overhead on hot paths.
Design Principles
- Duck-typed inputs — no hard dependency on AIP/AAP packages; works with any compatible shape.
- Graceful degradation — missing fields are silently skipped; the recorder never throws.
- Zero-overhead Workers — the CF Workers adapter uses only `fetch()` + `crypto`, no OTel SDK.
- GenAI SIG forward-compat — `gen_ai.evaluation.*` aliases for future OTel GenAI SIG alignment.
Standards Alignment
The exporter follows the OpenTelemetry Semantic Conventions for span naming and attribute structure. Forward-compatible aliases (`gen_ai.evaluation.*`) track the emerging OTel GenAI SIG conventions for AI/ML observability.
This exporter is part of the Mnemom trust infrastructure:
- AIP — Agent Integrity Protocol (real-time thinking analysis)
- AAP — Agent Alignment Protocol (behavioral verification)
- aip-otel-exporter — This package (observability bridge)
Additional Resources
| Document | Description |
|---|---|
| CHANGELOG | Release history |
| CONTRIBUTING | Development setup and contribution guide |
| Security Policy | Security policy and threat model |
| TypeScript README | TypeScript package documentation |
| Python README | Python package documentation |
| Dashboards README | Dashboard import instructions |
Status
Version 0.1.0 — Initial release.
| Component | Status |
|---|---|
| TypeScript Manual API | Stable |
| TypeScript Auto-instrumentation | Stable |
| TypeScript Workers Adapter | Stable |
| Python Manual API | Stable |
| Python Auto-instrumentation | Stable |
| Metrics API | Stable |
| Dashboard Templates | Stable |