Self-telemetry

The per-component numbers you see on the topology canvas — records/sec on each receiver, errors/sec on each exporter, queue depth on the backed-up edge — come from the collector itself. Every managed collector pushes its own internal metrics (Alloy’s prometheus.exporter.self, otelcol-contrib’s otelcol_* family) to LinkMesh over standard OTLP. LinkMesh maps those metrics onto the pipeline shape and renders them.

The chain has no proprietary middleware in the data path. The collector exposes its metrics the same way it always has; LinkMesh just becomes one of the destinations.

What you see in the UI

For every collector enrolled via OpAMP or remotecfg, the topology canvas shows:

  • Per-edge throughput labels — records/sec on each connection between a receiver, processor, and exporter. Updated every ~30s.
  • Per-component error rates — errors/sec, computed from the collector’s refused / send_failed counters.
  • Exporter queue depth — current queue size vs. capacity, useful for spotting back-pressure before it becomes data loss.
  • Host metrics on the collector detail page — CPU, memory, uptime, restart count.
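
As an illustration of the queue-depth reading, the bar can be thought of as a simple fill ratio of size over capacity. The sketch below is a minimal example under stated assumptions: the function names and the 0.8 warning threshold are hypothetical, not LinkMesh's actual values.

```python
def queue_fill_ratio(queue_size: float, queue_capacity: float) -> float:
    """Fraction of the exporter queue currently in use (0.0 to 1.0)."""
    if queue_capacity <= 0:
        raise ValueError("queue capacity must be positive")
    return queue_size / queue_capacity


def is_backing_up(queue_size: float, queue_capacity: float,
                  threshold: float = 0.8) -> bool:
    """True when the queue is at or above the (hypothetical) warning threshold."""
    return queue_fill_ratio(queue_size, queue_capacity) >= threshold
```

A queue holding 900 of 1000 slots has a fill ratio of 0.9 and would be flagged here; one holding 100 would not.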

For what each sparkline, queue bar and error flag on the canvas actually means — element by element — see Reading throughput on the canvas.

If a number is missing where you expect one, see Self-telemetry troubleshooting.

How it works

flowchart LR
    subgraph host["Managed host"]
        collector["Collector<br/>(Alloy or otelcol-contrib)"]
        scrape["prometheus.exporter.self<br/>+ service.telemetry"]
        internal["Internal metrics<br/>otelcol_receiver_accepted_logs<br/>otelcol_exporter_sent_metric_points<br/>…"]
        collector --> internal
        internal --> scrape
    end
    scrape -- "OTLP/HTTP + Bearer<br/>POST /v1/metrics" --> server["LinkMesh server<br/>OTLP receiver"]
    server -- "translates otelcol_*<br/>into ComponentThroughput" --> ui["Topology canvas<br/>per-edge numbers + dashboards"]

Three pieces compose:

  1. The collector scrapes its own internal metrics: Alloy via prometheus.exporter.self, otelcol-contrib via its built-in service.telemetry.metrics exporter. These metrics have been part of every OTel collector since v0.85; nothing here is LinkMesh-specific.

  2. The collector pushes the scrape result over OTLP to a destination LinkMesh configured at enrollment time. The destination URL is <server>/v1/metrics; the Authorization header is a per-collector bearer token (one token per collector, covering both OTLP push and the native remote-config poll).

  3. The LinkMesh server translates the standard otelcol_* metrics into per-component throughput records. The metric names (otelcol_receiver_accepted_*, otelcol_exporter_sent_*, etc.) map onto receivers/processors/exporters by their OTel SDK instance name; rate computation happens at read time so the UI stays responsive.
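
The read-time rate computation in step 3 can be sketched as follows. This is a minimal illustration, not LinkMesh's implementation: the Sample type and per_second_rate function are hypothetical names, and the counter-reset handling is an assumption about how a restart would be treated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One reading of a cumulative counter such as otelcol_receiver_accepted_logs."""
    unix_seconds: float
    value: float


def per_second_rate(older: Sample, newer: Sample) -> Optional[float]:
    """Derive a per-second rate from two cumulative counter samples."""
    elapsed = newer.unix_seconds - older.unix_seconds
    if elapsed <= 0:
        return None  # samples out of order or identical: no rate can be derived
    delta = newer.value - older.value
    if delta < 0:
        # Assumption: a decrease means the collector restarted and the counter
        # began again from zero, so the new value is the whole increase.
        delta = newer.value
    return delta / elapsed
```

For example, a counter that moves from 100 to 400 across a 30-second window yields 10 records/sec. Computing this when the UI asks for it, rather than on ingest, means the server only has to store the raw samples.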

Where the bearer token comes from

The bearer is minted once at enrollment and lives in the collector’s local config — no operator action required to set it up. Two paths, depending on runtime:

  • otelcol-contrib via OpAMP: LinkMesh’s OpAMP server mints the token immediately after the collector’s first handshake and pushes it via OpAMP’s ConnectionSettings.OwnMetrics offer. The supervisor receives it and installs it on the collector’s own_metrics pipeline. The operator sees a registered event followed by an own_metrics_offered event on the collector’s Events tab.
  • Alloy via remotecfg: linkmesh-agent writes a bootstrap config.alloy at install time that contains the bearer and the endpoints. Subsequent token rotations happen via the same bootstrap-rewrite path.

The bearer is opaque to the operator; there is no UI surface for copying it around. To rotate it, deregister and re-enroll the collector, which revokes the old token and mints a fresh one.
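
On the wire, the bearer is an ordinary HTTP Authorization header on the OTLP/HTTP push to <server>/v1/metrics. The sketch below shows the rough shape of that request, assuming the standard OTLP/HTTP protobuf content type; build_metrics_push is a hypothetical helper, and the server URL and token are supplied by the caller. It builds the request without sending it.

```python
import urllib.request


def build_metrics_push(server: str, bearer_token: str,
                       payload: bytes) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP metrics push request.

    payload is assumed to be an already-serialized OTLP metrics export request.
    """
    url = server.rstrip("/") + "/v1/metrics"
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/x-protobuf",
        },
    )
```

In practice the collector builds this request itself from its own configuration; the sketch only makes the header layout concrete.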

What gets reported

The collector’s otelcol_* self-metrics, mapped per OTel SDK convention:

| What                    | Collector metric                 | LinkMesh field |
|-------------------------|----------------------------------|----------------|
| Receiver accepted       | otelcol_receiver_accepted_*      | recordsPerSec  |
| Receiver refused        | otelcol_receiver_refused_*       | errorsPerSec   |
| Processor incoming      | otelcol_processor_incoming_items | incomingPerSec |
| Processor outgoing      | otelcol_processor_outgoing_items | recordsPerSec  |
| Exporter sent           | otelcol_exporter_sent_*          | recordsPerSec  |
| Exporter send failed    | otelcol_exporter_send_failed_*   | errorsPerSec   |
| Exporter queue size     | otelcol_exporter_queue_size      | queueSize      |
| Exporter queue capacity | otelcol_exporter_queue_capacity  | queueCapacity  |
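
The mapping in the table above can be read as a lookup from metric name to component kind and LinkMesh field. The sketch below is one way to express it, assuming prefix matching for the wildcard rows and exact matching for the rest; the classify function and both table names are hypothetical, not LinkMesh's code.

```python
from typing import Optional, Tuple

# Wildcard rows from the table: (name prefix, component kind, LinkMesh field).
PREFIX_MAP = [
    ("otelcol_receiver_accepted_", "receiver", "recordsPerSec"),
    ("otelcol_receiver_refused_", "receiver", "errorsPerSec"),
    ("otelcol_exporter_sent_", "exporter", "recordsPerSec"),
    ("otelcol_exporter_send_failed_", "exporter", "errorsPerSec"),
]

# Rows with a fixed metric name.
EXACT_MAP = {
    "otelcol_processor_incoming_items": ("processor", "incomingPerSec"),
    "otelcol_processor_outgoing_items": ("processor", "recordsPerSec"),
    "otelcol_exporter_queue_size": ("exporter", "queueSize"),
    "otelcol_exporter_queue_capacity": ("exporter", "queueCapacity"),
}


def classify(metric_name: str) -> Optional[Tuple[str, str]]:
    """Return (component kind, LinkMesh field), or None for unmapped metrics."""
    if metric_name in EXACT_MAP:
        return EXACT_MAP[metric_name]
    for prefix, kind, field in PREFIX_MAP:
        if metric_name.startswith(prefix):
            return (kind, field)
    return None
```

Metrics that match no row return None, which matches the idea that anything outside this table is simply not rendered on the canvas.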

Plus host metrics on the collector detail page:

| What               | Collector metric          | LinkMesh field    |
|--------------------|---------------------------|-------------------|
| CPU utilisation    | system.cpu.utilization    | hostCpuPercent    |
| Memory utilisation | system.memory.utilization | hostMemoryPercent |
| Process CPU        | process.cpu.utilization   | cpuPercent        |
| Process memory     | process.memory.usage      | memoryMb          |
| Uptime             | process.uptime            | uptimeSeconds     |

What does NOT get reported

The OTLP push is scoped to internal collector metrics only. None of the following ever leaves the collector via the LinkMesh OTLP endpoint:

  • Customer telemetry content (logs, metrics, traces flowing through the pipeline). That goes to the destinations you configured, such as Grafana Cloud, Loki, or Mimir.
  • Collector application logs (Alloy’s stderr, otelcol-contrib’s stdout). These stay in the host’s journal or log file.
  • Collector configuration. LinkMesh already knows it — it generated it. The collector doesn’t re-emit it.
  • Any host-process data outside the collector’s own service.telemetry scope.

The push is exactly the same payload an operator would get by configuring a prometheus.scrape against the collector’s /metrics endpoint and forwarding to their own backend. LinkMesh just happens to be one such backend.

Standards mode vs Managed mode

Both modes use the same self-telemetry path — the difference is only in who configures it.

  • Standards mode (otelcol-contrib via OpAMP, or Alloy via remotecfg): LinkMesh’s enrollment flow configures self-telemetry out-of-band as part of bringing the collector online. The operator sees throughput numbers without touching the collector’s local config.
  • Managed mode (linkmesh-agent supervises a collector): the same applies. The agent’s bootstrap config wires self-telemetry on first start, so the operator gets the same numbers; the wiring path is just agent-driven instead of OpAMP-driven.

Either way, the topology canvas renders identically. The mode affects how config arrives at the collector (see Native remote config), not how self-telemetry reports back.

Self-telemetry over time

ComponentThroughput records expire on a 24-hour TTL, which is enough to drive the topology canvas and the last-day chart on the collector detail page. Long-term retention is the customer’s responsibility through their own backends (the same backends the pipeline destinations write to). If you need 30-day historical throughput trends, point a Prometheus scrape at the collector’s /metrics endpoint and feed the result into your existing observability stack.
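
To make the 24-hour window concrete, the sketch below filters a list of records down to those still inside the retention period. It is an illustration only: the ThroughputRecord fields and the live_records function are hypothetical, and the real expiry presumably happens in the server's storage layer rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

RETENTION = timedelta(hours=24)  # retention window stated in the text above


@dataclass(frozen=True)
class ThroughputRecord:
    """Hypothetical stand-in for a stored ComponentThroughput record."""
    component: str
    recorded_at: datetime


def live_records(records: List[ThroughputRecord],
                 now: datetime) -> List[ThroughputRecord]:
    """Keep only records newer than the retention cutoff."""
    cutoff = now - RETENTION
    return [r for r in records if r.recorded_at > cutoff]
```

Anything older than the cutoff is gone from LinkMesh's view, which is why longer trends need a backend you control.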

See also