Self-telemetry
The per-component numbers you see on the topology canvas —
records/sec on each receiver, errors/sec on each exporter, queue
depth on the backed-up edge — come from the collector itself. Every
managed collector pushes its own internal metrics (Alloy’s
prometheus.exporter.self, otelcol-contrib’s otelcol_* family) to
LinkMesh over standard OTLP. LinkMesh maps those metrics onto the
pipeline shape and renders them.
The chain has no proprietary middleware in the data path. The collector exposes its metrics the same way it always has; LinkMesh just becomes one of the destinations.
What you see in the UI
For every collector enrolled via OpAMP or remotecfg, the topology
canvas shows:
- Per-edge throughput labels: records/sec on each connection between a receiver, processor, and exporter. Updated every ~30s.
- Per-component error rates: errors/sec, computed from the collector’s refused/send_failed counters.
- Exporter queue depth: current queue size vs. capacity, useful for spotting back-pressure before it becomes data loss.
- Host metrics on the collector detail page: CPU, memory, uptime, restart count.
For what each sparkline, queue bar and error flag on the canvas actually means — element by element — see Reading throughput on the canvas.
If a number is missing where you expect one, see Self-telemetry troubleshooting.
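As a rough illustration of why queue depth is shown against capacity, the sketch below turns the two numbers into a fill ratio and a simple back-pressure flag. The function names and the 0.8 threshold are assumptions made for this example; the canvas may use different cut-offs.

```python
def queue_fill_ratio(queue_size: int, queue_capacity: int) -> float:
    """Return how full the exporter queue is, as a value between 0.0 and 1.0."""
    if queue_capacity <= 0:
        # No capacity reported (or queueing disabled): treat the queue as empty.
        return 0.0
    return min(1.0, queue_size / queue_capacity)


def is_backed_up(queue_size: int, queue_capacity: int, threshold: float = 0.8) -> bool:
    """Flag an exporter whose queue is at or above the (assumed) threshold."""
    return queue_fill_ratio(queue_size, queue_capacity) >= threshold
```

A queue sitting at 900 of 1000 slots would be flagged here, while 100 of 1000 would not. Once the queue is full, the exporter starts dropping data, which is why the ratio is worth watching early.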
How it works
flowchart LR
subgraph host["Managed host"]
collector["Collector<br/>(Alloy or otelcol-contrib)"]
scrape["prometheus.exporter.self<br/>+ service.telemetry"]
internal["Internal metrics<br/>otelcol_receiver_accepted_logs<br/>otelcol_exporter_sent_metric_points<br/>…"]
collector --> internal
internal --> scrape
end
scrape -- "OTLP/HTTP + Bearer<br/>POST /v1/metrics" --> server["LinkMesh server<br/>OTLP receiver"]
server -- "translates otelcol_*<br/>into ComponentThroughput" --> ui["Topology canvas<br/>per-edge numbers + dashboards"]
Three pieces compose:
- The collector scrapes its own internal metrics: Alloy via prometheus.exporter.self, otelcol-contrib via its built-in service.telemetry.metrics exporter. These metrics have been part of every OTel collector since v0.85; nothing here is LinkMesh-specific.
- The collector pushes the scrape result over OTLP to a destination LinkMesh configured at enrollment time. The destination URL is <server>/v1/metrics; the Authorization header is a per-collector bearer token (one token per collector, covering both OTLP push and the native remote-config poll).
- The LinkMesh server translates the standard otelcol_* metrics into per-component throughput records. The metric names (otelcol_receiver_accepted_*, otelcol_exporter_sent_*, etc.) map onto receivers/processors/exporters by their OTel SDK instance name; rate computation happens at read time so the UI stays responsive.
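The read-time rate computation can be pictured as follows. This is a minimal sketch, not LinkMesh's actual code: it assumes the server stores raw cumulative counter samples and derives a per-second rate from two of them when the UI asks. `CounterSample` and `rate_per_sec` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CounterSample:
    """One stored observation of a cumulative counter (hypothetical shape)."""
    timestamp_s: float  # seconds since epoch
    value: float        # cumulative count, e.g. an otelcol_receiver_accepted_* metric


def rate_per_sec(older: CounterSample, newer: CounterSample) -> Optional[float]:
    """Return the average per-second rate between two samples.

    Returns None when the samples are not in time order. A drop in the
    counter is treated as a collector restart, so the newer value is
    counted from zero.
    """
    elapsed = newer.timestamp_s - older.timestamp_s
    if elapsed <= 0:
        return None
    delta = newer.value - older.value
    if delta < 0:
        # Counter reset: everything in the newer sample accrued after the restart.
        delta = newer.value
    return delta / elapsed
```

Storing the raw counters and dividing at read time keeps ingestion cheap and lets the same samples serve different time windows.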
Where the bearer token comes from
The bearer is minted once at enrollment and lives in the collector’s local config — no operator action required to set it up. Two paths, depending on runtime:
- otelcol-contrib via OpAMP: LinkMesh’s OpAMP server mints the token immediately after the collector’s first handshake and pushes it via OpAMP’s ConnectionSettings.OwnMetrics offer. The supervisor receives it and installs it on the collector’s own_metrics pipeline. The operator sees a registered then own_metrics_offered event on the collector’s Events tab.
- Alloy via remotecfg: linkmesh-agent writes a bootstrap config.alloy at install time that contains the bearer + endpoints. Subsequent token rotations happen via the same bootstrap-rewrite path.
The bearer is opaque to the operator — there’s no UI surface to copy it around. To rotate, deregister + re-enroll the collector; this revokes the old token and mints a fresh one.
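For orientation, the request a collector ends up sending with this token looks roughly like the sketch below. It only builds the request and does not send it. The helper name, the example hostname, and the protobuf content type (the default OTLP/HTTP encoding) are assumptions for illustration; in practice the collector's own OTLP exporter does this work.

```python
import urllib.request


def build_metrics_push(server: str, bearer_token: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP metrics push request."""
    url = server.rstrip("/") + "/v1/metrics"
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            # Per-collector bearer token minted at enrollment.
            "Authorization": f"Bearer {bearer_token}",
            # Assumed encoding: binary protobuf, the OTLP/HTTP default.
            "Content-Type": "application/x-protobuf",
        },
    )


# Hypothetical server URL and token, for illustration only.
request = build_metrics_push("https://linkmesh.example.com", "example-token", b"")
```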
What gets reported
The collector’s otelcol_* self-metrics, mapped per OTel SDK
convention:
| What | Collector metric | LinkMesh field |
|---|---|---|
| Receiver accepted | otelcol_receiver_accepted_* | recordsPerSec |
| Receiver refused | otelcol_receiver_refused_* | errorsPerSec |
| Processor incoming | otelcol_processor_incoming_items | incomingPerSec |
| Processor outgoing | otelcol_processor_outgoing_items | recordsPerSec |
| Exporter sent | otelcol_exporter_sent_* | recordsPerSec |
| Exporter send failed | otelcol_exporter_send_failed_* | errorsPerSec |
| Exporter queue size | otelcol_exporter_queue_size | queueSize |
| Exporter queue capacity | otelcol_exporter_queue_capacity | queueCapacity |
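The table above can be read as a prefix lookup. The following sketch is one way to express it, assuming metric names arrive as plain strings; `classify` is a hypothetical helper, and the component kinds are inferred from the metric names rather than taken from LinkMesh's code.

```python
from typing import Optional, Tuple

# (metric name prefix, component kind, LinkMesh field), copied from the table.
_MAPPING = [
    ("otelcol_receiver_accepted_", "receiver", "recordsPerSec"),
    ("otelcol_receiver_refused_", "receiver", "errorsPerSec"),
    ("otelcol_processor_incoming_items", "processor", "incomingPerSec"),
    ("otelcol_processor_outgoing_items", "processor", "recordsPerSec"),
    ("otelcol_exporter_sent_", "exporter", "recordsPerSec"),
    ("otelcol_exporter_send_failed_", "exporter", "errorsPerSec"),
    ("otelcol_exporter_queue_size", "exporter", "queueSize"),
    ("otelcol_exporter_queue_capacity", "exporter", "queueCapacity"),
]


def classify(metric_name: str) -> Optional[Tuple[str, str]]:
    """Return (component kind, LinkMesh field) for a metric, or None if unmapped."""
    for prefix, kind, field in _MAPPING:
        if metric_name.startswith(prefix):
            return kind, field
    return None
```

Matching on prefixes means the signal-specific suffixes behind each `*` in the table all land on the same field.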
Plus host metrics on the collector detail page:
| What | Collector metric | LinkMesh field |
|---|---|---|
| CPU utilisation | system.cpu.utilization | hostCpuPercent |
| Memory utilisation | system.memory.utilization | hostMemoryPercent |
| Process CPU | process.cpu.utilization | cpuPercent |
| Process memory | process.memory.usage | memoryMb |
| Uptime | process.uptime | uptimeSeconds |
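The field names suggest some unit conversion on the way in. In the OpenTelemetry semantic conventions the utilization metrics are ratios between 0 and 1 and process.memory.usage is in bytes, so a conversion along these lines is plausible. This is a sketch under that assumption: the helper names are hypothetical, and whether LinkMesh divides by 1024 * 1024 or by 1,000,000 for memoryMb is not stated here.

```python
def to_percent(utilization: float) -> float:
    """Convert a 0..1 utilization ratio to a percentage, clamped to 0..100."""
    return max(0.0, min(100.0, utilization * 100.0))


def bytes_to_mb(num_bytes: float) -> float:
    """Convert bytes to megabytes (assumes 1 MB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)
```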
What does NOT get reported
The OTLP push is scoped to the collector self-metrics and host metrics listed above. None of the following ever leaves the collector via the LinkMesh OTLP endpoint:
- Customer telemetry content (logs, metrics, traces flowing through the pipeline). That goes to the destinations you configured — Grafana Cloud, Loki, Mimir, whatever.
- Collector application logs (Alloy’s stderr, otelcol-contrib’s stdout). Stays on the host’s journal / log file.
- Collector configuration. LinkMesh already knows it — it generated it. The collector doesn’t re-emit it.
- Any host-process data outside the collector’s own service.telemetry scope.
The push is exactly the same payload an operator would get by
configuring a prometheus.scrape against the collector’s
/metrics endpoint and forwarding to their own backend. LinkMesh
just happens to be one such backend.
Standards mode vs Managed mode
Both modes use the same self-telemetry path — the difference is only in who configures it.
- Standards mode (otelcol-contrib via OpAMP, or Alloy via remotecfg): LinkMesh’s enrollment flow configures self-telemetry out-of-band as part of bringing the collector online. The operator sees throughput numbers without touching the collector’s local config.
- Managed mode (linkmesh-agent supervises a collector): same story. The agent’s bootstrap config wires self-telemetry on first start. The operator gets the same numbers; the wiring path is just agent-driven instead of OpAMP-driven.
Either way, the topology canvas renders identically. The mode affects how config arrives at the collector (see Native remote config), not how self-telemetry reports back.
Self-telemetry over time
ComponentThroughput records expire after a 24-hour TTL, which is
enough to drive the topology canvas and the last-day chart on the
collector detail page. Long-term retention is the customer’s
responsibility through their own backends (the same backends the
pipeline destinations write to). If you need 30-day historical
throughput trends, point a Prometheus scrape at the collector’s
/metrics endpoint and pair it with your existing observability
stack.
See also
- Native remote config — the other half of the enrollment story: how config gets to the collector. Same per-collector bearer covers both endpoints.
- Enrol a Standards collector — walkthrough; the self-telemetry path lights up automatically as part of step 4.
- Self-telemetry troubleshooting — when the numbers don’t show up.