Self-telemetry troubleshooting
LinkMesh renders per-component throughput by reading the collector’s
own self-telemetry — Alloy’s prometheus.exporter.self,
otelcol-contrib’s otelcol_* metrics. If the topology canvas shows the
collector but the edges are blank, one link in that chain isn’t
delivering. Work through this list top-down; each section diagnoses
one specific failure mode.
Symptom: topology canvas shows the collector but no throughput numbers
The most common cause is that the collector hasn’t started pushing its own_metrics yet, or those pushes aren’t reaching the LinkMesh OTLP receiver.
Check 1: did the collector receive an own_metrics offer?
For otelcol-contrib via OpAMP, the LinkMesh server pushes a one-time
own_metrics offer immediately after enrollment. The collector’s
Events tab should show an own_metrics_offered event within ~5
seconds of registered.
If the event is missing, the collector most likely enrolled before LinkMesh server v1.101 (the release that introduced own_metrics offers). Re-enroll the collector to trigger a fresh offer:
- Stop the collector + agent on the host
- In the LinkMesh UI: Collectors → … → Deregister
- Mint a fresh token + re-run the install one-liner
For Alloy via remotecfg, there is no separate “offer” event — the
bootstrap config that linkmesh-agent wrote already includes the
own_metrics exporter. Skip to Check 2.
Check 2: is the collector actually pushing OTLP?
On the host, watch the collector’s logs for OTLP send activity:
# Alloy
sudo journalctl -u alloy -f | grep -i 'otelcol.exporter.otlphttp'
# otelcol-contrib
sudo journalctl -u otelcol-contrib -f | grep -i 'metrics exporter'
Expected: periodic lines about successful exports every ~30 seconds.
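If tailing the log is inconvenient, the export cadence can also be checked by counting matching lines over a fixed window. The following is a minimal sketch, not something LinkMesh ships: count_export_lines is a hypothetical helper name and the 5-minute window is an arbitrary choice. At one export every ~30 seconds, a count of roughly 10 over 5 minutes would be consistent with a healthy collector.

```shell
# Hypothetical helper (not part of LinkMesh): count the log lines on stdin
# that match the given pattern, case-insensitively.
count_export_lines() {
  pattern="$1"
  # grep -c prints the number of matching lines. It exits 1 when the count
  # is zero, so the exit status is masked to keep the helper usable in
  # scripts that run with "set -e".
  grep -i -c -- "$pattern" || true
}

# Usage on the host (pick the line for your collector):
#   sudo journalctl -u alloy --since '5 minutes ago' --no-pager \
#     | count_export_lines 'otelcol.exporter.otlphttp'
#   sudo journalctl -u otelcol-contrib --since '5 minutes ago' --no-pager \
#     | count_export_lines 'metrics exporter'
```

A count of 0 points to the same conclusion as a silent tail: the collector is not exporting.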
If you see no OTLP activity at all, the bootstrap config didn’t land correctly. Check the file exists and looks right:
# Alloy
sudo cat /etc/alloy/config.alloy | head -5
# the first line should start with: // linkmesh-bootstrap
# otelcol-contrib (OpAMP-managed): config lives under the supervisor's
# remote-config directory rather than a static file
sudo find /var/lib/otelcol-contrib -name 'effective_config*'
Check 3: are OTLP pushes reaching the server?
If the collector logs OTLP activity but throughput still doesn’t show, something between the collector and LinkMesh is dropping the pushes. Check the LinkMesh server-side logs:
# On the LinkMesh server (k8s)
kubectl -n linkmesh logs deploy/linkmesh-server -f | grep -i 'otlp.dispatch\|otlp.auth'
# On a single-host install
journalctl -u linkmesh-server -f | grep -i 'otlp.dispatch\|otlp.auth'
Three failure modes to look for:
- otlp.auth: rejected reason=missing_or_malformed_bearer. The collector is reaching the OTLP receiver but not sending the Authorization header. The bootstrap config or the OpAMP own_metrics offer didn’t include the bearer. See Check 1.
- otlp.auth: rejected reason=token_unknown_or_revoked_or_expired. The bearer the collector is sending no longer verifies. Most common cause: you deregistered the collector and it then came back online. Re-enroll to mint a fresh token.
- otlp.dispatch: unknown collectorId. The bearer verifies but the CollectorID it resolves to has been deregistered from the fleet. Re-enroll to recreate the Collector record.
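When the server log is busy, it can help to bucket rejections by failure mode rather than reading them line by line. The sketch below assumes only that each log line contains one of the three message strings quoted above verbatim; the rest of the log line format is unknown, and classify_otlp_log_line is a hypothetical helper name.

```shell
# Hypothetical helper: map one server log line to the remedy listed above.
# It matches on the message strings quoted in this guide and makes no
# assumption about timestamps or other fields around them.
classify_otlp_log_line() {
  case "$1" in
    *"otlp.auth: rejected reason=missing_or_malformed_bearer"*)
      echo "no bearer sent: revisit Check 1" ;;
    *"otlp.auth: rejected reason=token_unknown_or_revoked_or_expired"*)
      echo "bearer no longer valid: re-enroll for a fresh token" ;;
    *"otlp.dispatch: unknown collectorId"*)
      echo "collector deregistered: re-enroll to recreate the record" ;;
    *)
      echo "no known failure mode" ;;
  esac
}

# Usage on a single-host install: summarise the last 10 minutes.
#   journalctl -u linkmesh-server --since '10 minutes ago' --no-pager \
#     | while IFS= read -r line; do classify_otlp_log_line "$line"; done \
#     | sort | uniq -c
```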
If you see none of these but throughput is still missing, the collector probably can’t reach the LinkMesh OTLP endpoint at all (firewall, DNS, TLS). Test from the host:
curl -v -X POST -H 'Authorization: Bearer test' \
  -H 'Content-Type: application/x-protobuf' \
  https://your-server.example.com/v1/metrics
# Expected: HTTP 401 (proves the endpoint is reachable + auth is wired)
Check 4: is per-edge throughput working but only on some edges?
Some edges show numbers, others don’t. This means the OTLP path works but specific components aren’t being attributed correctly. Most common cause: the component naming in the collector’s emitted metrics doesn’t match what the topology lookup expects.
Open the collector’s Events tab in the LinkMesh UI and look for
component_naming_mismatch warnings. The fix depends on what the
warning says:
- “unknown component”: the metric arrived but the component name doesn’t match any source/destination activation. Usually means the collector is running a config the LinkMesh UI doesn’t know about (manual edit, or an older config still active because reload didn’t fire). Restart the collector.
- “orphan throughput”: a row exists in ComponentThroughput but the topology lookup can’t find a matching edge. Usually means a source/destination was deleted from the UI while the collector was still pushing for it. Self-resolves within 24h via TTL.
Symptom: host metrics show but per-edge throughput doesn’t
Host metrics (CPU, memory, uptime) come from a different path than
per-edge throughput — system.cpu.utilization etc. land via the same
OTLP push but get mapped onto the Collector record, not
ComponentThroughput. If host metrics work but edges don’t:
- OTLP receiver auth + transport are fine (host metrics arrived).
- The component-metric mapping is the issue.
Most common cause: the collector’s pipeline isn’t actually processing data yet. Per-edge throughput needs two consecutive samples 30s apart to compute a rate — a fresh enrollment with no data flowing will show nothing until data arrives.
Push some test data through the collector’s pipeline and wait 60s.
Symptom: throughput shows but the numbers look wrong
“Records/sec spikes to millions right after collector restart”
You’re looking at the lifetime cumulative counter instead of a rate. The first sample after restart can’t compute a rate (no prior sample), so it gets skipped — but if your dashboard is reading raw ComponentThroughput records directly (not the rendered topology), the counter values are in there. Wait 30 seconds for the second sample; the rate will be sensible.
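For dashboards that read the raw counters, the rate has to be derived from two consecutive samples. The sketch below illustrates that arithmetic under stated assumptions and is not LinkMesh's actual implementation: rate_per_sec is a hypothetical helper, it uses integer division, and treating a counter that went backwards as "skip" is an assumption about how a restart shows up in the data.

```shell
# Hypothetical helper: records/sec between two cumulative counter samples.
# Args: previous counter value (empty if none), current counter value,
#       seconds between the two samples.
# Prints "skip" when no rate can be computed: there is no prior sample,
# the interval is not positive, or the counter went backwards (assumed
# here to indicate a collector restart).
rate_per_sec() {
  prev="$1"
  curr="$2"
  interval="$3"
  if [ -z "$prev" ] || [ "$interval" -le 0 ] || [ "$curr" -lt "$prev" ]; then
    echo "skip"
    return 0
  fi
  # Integer division: the delta in records over the elapsed seconds.
  echo $(( (curr - prev) / interval ))
}

# Usage:
#   rate_per_sec 1200000 1203000 30   # two samples 30s apart
#   rate_per_sec '' 1200000 30        # first sample after restart
```

Plotting the counter value itself (1200000 in the example) instead of the delta is what produces the apparent spike to millions.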
“Numbers are zero but I see data flowing in Grafana Cloud”
Your collector is forwarding data correctly but not pushing self-telemetry. Two possibilities:
- Alloy’s prometheus.exporter.self is disabled or returning empty: check that the bootstrap config has the prometheus.exporter.self and prometheus.scrape blocks intact.
- otelcol-contrib’s service.telemetry.metrics is configured to send to a different endpoint (often a customer Prometheus). Either reconfigure to push to LinkMesh as well, or accept that LinkMesh won’t show per-edge numbers for this collector.
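For the Alloy case, a quick way to confirm both blocks are still present is to search the config for their names. This is a rough sketch: missing_self_telemetry_blocks is a hypothetical helper, and it only checks that the block names appear somewhere in the file, so a block that is commented out would still count as present.

```shell
# Hypothetical helper: print the name of each self-telemetry block that
# does not appear anywhere in the given Alloy config file.
missing_self_telemetry_blocks() {
  config="$1"
  for block in prometheus.exporter.self prometheus.scrape; do
    # -F treats the block name as a literal string; -q suppresses output.
    grep -qF -- "$block" "$config" || echo "$block"
  done
}

# Usage (run as a user that can read the file):
#   missing_self_telemetry_blocks /etc/alloy/config.alloy
# No output means both block names were found.
```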
When all else fails
Capture the collector’s effective config + recent logs and file a support ticket — include the collector’s CollectorID, the timestamp range, and which check above you got stuck on:
# Alloy
sudo cat /etc/alloy/config.alloy > /tmp/effective-config.txt
sudo journalctl -u alloy --since '15 minutes ago' > /tmp/collector-logs.txt
# otelcol-contrib
sudo cat /var/lib/otelcol-contrib/effective_config.yaml > /tmp/effective-config.txt
sudo journalctl -u otelcol-contrib --since '15 minutes ago' > /tmp/collector-logs.txt
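If a single attachment is more convenient for the ticket, the two captured files can be bundled into one archive. This is an optional sketch, not a documented support requirement: bundle_support_files and the archive name are hypothetical, and the helper assumes both files were written to the same directory, as in the commands above.

```shell
# Hypothetical helper: pack the captured config and logs into one archive.
# Args: directory holding the two files, path of the archive to create.
bundle_support_files() {
  dir="$1"
  out="$2"
  # -C switches into the directory first, so the archive stores bare
  # file names rather than full paths.
  tar -czf "$out" -C "$dir" effective-config.txt collector-logs.txt
}

# Usage:
#   bundle_support_files /tmp /tmp/linkmesh-support.tgz
```

Review the config for secrets such as bearer tokens before attaching it.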