The agent — host-side control plane
linkmesh-agent is the small daemon that runs alongside an
OpenTelemetry collector on every host LinkMesh manages. Its job is
control-plane work — install the collector, supervise its process,
handle OS integration, report host context back to the server. It is
intentionally not part of the telemetry data path: your logs,
metrics, and traces flow through the collector directly to your
destinations and never touch the agent.
This page defines the agent’s role for operators evaluating where it fits in their stack. If you’re deciding between Managed mode (run the agent) and Standards mode (don’t), the What the agent does / What the agent does NOT do sections are the right starting point.
When you want the agent — and when you don’t
| Scenario | Use the agent? |
|---|---|
| Linux/macOS host where you want LinkMesh to install + supervise the collector | Yes (Managed mode). Agent handles upstream-Alloy install, systemd unit, restart-on-failure. |
| Windows host with Alloy via MSI, agent supervises it as a Windows Service | Yes (Managed mode). Same as Linux but the agent talks to the Windows Service Manager instead of systemd. |
| Kubernetes — Alloy/otelcol DaemonSet managed by your existing K8s operator | No. Let the K8s operator own lifecycle; the collector enrols via OpAMP directly. |
| Air-gapped host with operator-managed packages | Optional. The agent installs from your internal apt/yum mirror (pass --mirror=<your-mirror> to linkmesh-agent install alloy), or you skip the agent entirely and run the collector standalone via OpAMP. |
| Collector already installed + supervised by Ansible/Puppet/Chef | No. Standards mode is cleaner — let your existing config-management own install + supervise; the collector enrols via OpAMP. |
The shorthand: the agent earns its place when LinkMesh is the easiest way to handle install and lifecycle on a host. If you already have a tool that does that, the agent is duplicate machinery.
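The scenario table can be read as a small decision rule. The sketch below is only an illustration of that rule, not part of LinkMesh: the `Host` fields and the `recommend_mode` function are hypothetical names introduced here, and the returned strings simply mirror the table's Yes / No / Optional answers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a host, for illustration only."""
    platform: str                          # "linux", "macos", "windows", or "kubernetes"
    lifecycle_owner: Optional[str] = None  # e.g. "ansible", "puppet", "chef"; None if nothing owns the collector
    air_gapped: bool = False               # True when packages come from an operator-managed mirror


def recommend_mode(host: Host) -> str:
    """Return "managed", "standards", or "optional", following the scenario table."""
    # Something else already installs and supervises the collector:
    # the agent would be duplicate machinery.
    if host.platform == "kubernetes" or host.lifecycle_owner is not None:
        return "standards"
    # Air-gapped hosts can go either way (agent with a mirror, or standalone collector).
    if host.air_gapped:
        return "optional"
    # Otherwise LinkMesh installs and supervises the collector.
    return "managed"


if __name__ == "__main__":
    print(recommend_mode(Host(platform="windows")))
    print(recommend_mode(Host(platform="linux", lifecycle_owner="ansible")))
```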
What the agent does
Five things, in priority order of “what you’d notice missing”:
- Installs the collector from upstream channels. Grafana Alloy from apt.grafana.com/rpm.grafana.com by default (or your own internal mirror via --mirror=<your-mirror> for air-gap); pinned to the version in LinkMesh’s compat matrix. Apt/yum/Helm/MSI per platform.
- Writes the bootstrap config exactly once. A tiny config.alloy (~30 lines) pointing the collector at LinkMesh’s remote-config endpoint + own-metrics endpoint. After that, the collector pulls its real pipeline config directly from LinkMesh; the agent doesn’t touch customer config files.
- Supervises the collector process. Starts it via systemd / Windows Service / launchd; watches for crash loops; restarts on failure with backoff; logs the process state back to LinkMesh.
- Handles cross-OS lifecycle operations. Operator-triggered restart, upgrade (pinned-version bump), rollback, uninstall — each one knows about the platform’s package manager + service model under the hood.
- Reports host context back to LinkMesh. Heartbeat carries hostname, OS, agent version, collector status. The LinkMesh fleet UI uses this to render the host card you click on.
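To make "watches for crash loops; restarts on failure with backoff" concrete, here is a minimal sketch of one common way to do it: capped exponential backoff plus a sliding-window crash-loop check. The page does not state the agent's actual policy, so every number here (base delay, factor, cap, window, threshold) and both function names are assumptions chosen for illustration.

```python
from typing import List, Sequence


def restart_delays(base_s: float = 1.0, factor: float = 2.0,
                   cap_s: float = 60.0, attempts: int = 6) -> List[float]:
    """Delays (seconds) before each restart attempt: exponential growth, capped at cap_s."""
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))
        delay *= factor
    return delays


def is_crash_loop(exit_times: Sequence[float], now: float,
                  window_s: float = 60.0, threshold: int = 3) -> bool:
    """True when at least `threshold` exits happened within the last `window_s` seconds."""
    recent = [t for t in exit_times if now - t <= window_s]
    return len(recent) >= threshold


if __name__ == "__main__":
    print(restart_delays(attempts=8))
    print(is_crash_loop([0.0, 10.0, 20.0], now=30.0))
```

The cap keeps a persistently failing collector from being retried ever more rarely, while the window check separates a one-off crash from a loop worth surfacing to the operator.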
What the agent does NOT do
The agent is intentionally not:
- A proprietary OTel distribution. No custom forks of Alloy or otelcol-contrib; the agent installs upstream packages bit-for-bit.
- The config pusher for Alloy or OpAMP-supervised otelcol-contrib. Alloy’s remotecfg and otelcol-contrib’s OpAMP supervisor both fetch config server-to-collector directly. The agent’s only config write is the one-time bootstrap; everything after that goes around the agent.
- A telemetry scraper. The agent does not scrape the collector’s /metrics endpoint and forward the result over gRPC. The collector’s own self-telemetry push to LinkMesh handles that path now.
- A data-plane component. Your customer logs/metrics/traces flow collector → destination. The agent never touches them. (If the agent process dies, your telemetry keeps flowing; only management actions stop until the agent restarts.)
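The "one-time bootstrap" rule can be pictured as an exclusive-create write: the file is created only if it does not already exist, and it is never rewritten afterwards. This is a sketch of that idea under stated assumptions, not the agent's code; `write_bootstrap_once` is a hypothetical helper, and the file contents are a placeholder rather than a real Alloy configuration.

```python
import os
from pathlib import Path


def write_bootstrap_once(path: Path, contents: str) -> bool:
    """Create `path` with `contents` only if it does not exist yet.

    Returns True when the file was written, False when it was already there.
    O_EXCL makes the existence check and the creation a single atomic step.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False  # already bootstrapped: leave the file untouched
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(contents)
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "config.alloy"
        print(write_bootstrap_once(target, "placeholder bootstrap\n"))  # first call writes
        print(write_bootstrap_once(target, "something else\n"))         # second call is a no-op
```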
How the pieces compose
A managed host runs three independent processes:

    ┌───────────────────────────────────────────────────────────────────┐
    │ Host (Linux/macOS/Windows)                                        │
    │                                                                   │
    │  ┌──────────────────┐         ┌────────────────────────────────┐  │
    │  │ linkmesh-agent   │ ──┐     │ Collector (Alloy / otelcol)    │  │
    │  │ (control plane)  │   │     │ (data plane)                   │  │
    │  └────────┬─────────┘   │     └────────────┬───────────────────┘  │
    │           │             │                  │                      │
    │           │ mTLS gRPC   │ systemctl etc.   │ remotecfg / OpAMP    │
    │           │             │                  │ + own_metrics OTLP   │
    └───────────┼─────────────┼──────────────────┼──────────────────────┘
                │             │                  │
                ▼             ▼                  ▼
         ┌────────────┐ ┌──────────────┐ ┌──────────────────┐
         │ LinkMesh   │ │ systemd / SC │ │ LinkMesh         │
         │ server     │ │ service      │ │ server           │
         │ (heartbeat,│ │ model        │ │ (config + OTLP   │
         │ commands)  │ │              │ │ ingest)          │
         └────────────┘ └──────────────┘ └──────────────────┘

Two distinct channels touch LinkMesh:
- Agent → server (mTLS gRPC) carries heartbeats, status, and operator-triggered commands (restart, upgrade, rollback). This channel does NOT carry collector configuration anymore — that goes via the collector’s native channel.
- Collector → server (HTTPS, per-collector bearer) carries config pulls (Alloy’s remotecfg, otelcol-contrib’s OpAMP) AND self-telemetry OTLP pushes. One bearer token per collector covers both directions.
That separation matters: if the agent dies, the collector keeps running with its last-applied config and keeps reporting self-telemetry. The fleet UI marks the agent offline but the data-plane is unaffected.
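As a rough picture of what travels on the agent → server channel, the sketch below builds a heartbeat from the four fields this page names (hostname, OS, agent version, collector status). The field names, the `build_heartbeat` and `encode` helpers, and the JSON encoding are all assumptions made for illustration; the real channel is mTLS gRPC and its message schema is not shown here. Note what is absent: no collector configuration and no telemetry.

```python
import json
import platform
import socket
from dataclasses import asdict, dataclass

# Process states taken from the "Lifecycle states" table on this page.
COLLECTOR_STATES = {"running", "starting", "crashed", "stopped"}


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat shape; field names are illustrative."""
    hostname: str
    os_name: str
    agent_version: str
    collector_status: str


def build_heartbeat(agent_version: str, collector_status: str) -> Heartbeat:
    if collector_status not in COLLECTOR_STATES:
        raise ValueError(f"unknown collector status: {collector_status!r}")
    return Heartbeat(
        hostname=socket.gethostname(),
        os_name=platform.system(),
        agent_version=agent_version,
        collector_status=collector_status,
    )


def encode(hb: Heartbeat) -> bytes:
    """Stand-in serialization (JSON) so the payload is easy to inspect."""
    return json.dumps(asdict(hb), sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    print(encode(build_heartbeat("0.0.0-example", "running")).decode("utf-8"))
```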
Lifecycle states
Operators see three “are things working” axes on the collector detail page. They’re orthogonal — any combination is possible.
| Axis | States | Reported by |
|---|---|---|
| Agent | online / offline | mTLS gRPC heartbeat to LinkMesh server |
| Collector process | running / starting / crashed / stopped | Agent observes via systemd / SC |
| Collector data flow | healthy / unhealthy | Collector self-telemetry — receiver-accept-rate, exporter-success-rate |
A common scenario:
- Agent: online ✓
- Collector process: crashed ✗ (agent restarting it with backoff)
- Collector data flow: unhealthy ✗
The fleet UI shows all three so operators triage at the right layer instead of staring at a single red light.
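One way to read the three axes together is as a short triage routine that reports which layers need attention. The sketch below is an illustration only: the `triage` function is hypothetical, and the rule that an offline agent makes the process state untrustworthy is an inference from the table (the process state is reported by the agent), not documented behaviour.

```python
from typing import List


def triage(agent: str, process: str, data_flow: str) -> List[str]:
    """Return the layers that need attention, in the order an operator might check them.

    agent:     "online" | "offline"
    process:   "running" | "starting" | "crashed" | "stopped"
    data_flow: "healthy" | "unhealthy"
    """
    findings = []
    if agent == "offline":
        # Assumption: process state is observed by the agent, so it may be stale here.
        findings.append("agent: offline (management actions unavailable, process state may be stale)")
    elif process != "running":
        findings.append(f"collector process: {process}")
    # Data flow comes from collector self-telemetry, independent of the agent.
    if data_flow == "unhealthy":
        findings.append("collector data flow: unhealthy")
    return findings


if __name__ == "__main__":
    # The "common scenario" from this section.
    for line in triage("online", "crashed", "unhealthy"):
        print(line)
```

An empty list means all three axes look fine; an offline agent with healthy data flow yields a single agent finding, which matches the point that the data plane keeps working without the agent.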
Future scope
The agent’s role is intentionally minimal today. The architecture leaves room for these without changing the boundary:
- Log auto-discovery: scan the host for known log paths (/var/log/nginx/*.log, /var/log/postgresql/*.log) and suggest pipeline templates. Pure host-side observation; doesn’t touch the collector’s data path.
- Service auto-detection: detect running services (Postgres, Redis, Nginx) and recommend matching receiver configs the operator can adopt with one click.
- Multi-collector orchestration: supervise more than one collector on a host (typical: one Alloy for metrics, one otelcol-contrib for traces). Each collector keeps its own remote-config + own-metrics path; the agent just multiplies the supervision pattern.
- Remote shell / debug actions: operator-triggered “tail the collector’s logs” or “run alloy config-print” without the operator needing SSH to the host.
All of these are control-plane work — none expand the agent into the data plane. The boundary holds.
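To show why something like log auto-discovery stays on the control-plane side, here is a purely illustrative sketch of the idea: it only lists files matching the two example patterns above and never reads or forwards their contents. It does not describe any shipped feature; `KNOWN_LOG_GLOBS`, `discover_logs`, and the `root` parameter (added so the function can be tried against a scratch directory) are hypothetical.

```python
import glob
import os
from typing import Dict, List

# The two example patterns from the list above, written relative to a root directory.
KNOWN_LOG_GLOBS = {
    "nginx": "var/log/nginx/*.log",
    "postgresql": "var/log/postgresql/*.log",
}


def discover_logs(root: str = "/") -> Dict[str, List[str]]:
    """Map service name -> sorted list of matching log file paths under `root`.

    Only file names are inspected; log contents are never opened.
    """
    found = {}
    for service, pattern in KNOWN_LOG_GLOBS.items():
        matches = sorted(glob.glob(os.path.join(root, pattern)))
        if matches:
            found[service] = matches
    return found


if __name__ == "__main__":
    for service, paths in discover_logs().items():
        print(service, len(paths), "log file(s)")
```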
See also
- Native remote config — how the collector pulls its config directly from LinkMesh (the half of the story the agent is not in).
- Self-telemetry — how the collector pushes its own internal metrics to LinkMesh for the topology canvas (also not the agent).
- Enrol a Standards collector — when you want to skip the agent entirely.