The agent — host-side control plane

linkmesh-agent is the small daemon that runs alongside an OpenTelemetry collector on every host LinkMesh manages. Its job is control-plane work: install the collector, supervise its process, handle OS integration, and report host context back to the server. It is intentionally not part of the telemetry data path: your logs, metrics, and traces flow through the collector directly to your destinations and never touch the agent.

This page defines the agent’s role for operators evaluating where it fits in their stack. If you’re deciding between Managed mode (run the agent) and Standards mode (don’t), the scenario table below and the “What the agent does” / “What the agent does NOT do” sections are the right starting point.

When you want the agent — and when you don’t

Scenario	Use the agent?
Linux/macOS host where you want LinkMesh to install + supervise the collector	Yes (Managed mode). Agent handles upstream-Alloy install, systemd unit, restart-on-failure.
Windows host with Alloy installed via MSI and supervised by the agent as a Windows Service	Yes (Managed mode). Same as Linux, but the agent talks to the Windows Service Control Manager (SC) instead of systemd.
Kubernetes — Alloy/otelcol DaemonSet managed by your existing K8s operator	No. Let the K8s operator own lifecycle; the collector enrols via OpAMP directly.
Air-gapped host with operator-managed packages	Optional. The agent installs from your internal apt/yum mirror (pass `--mirror=<your-mirror>` to `linkmesh-agent install alloy`), or you skip the agent entirely and run the collector standalone via OpAMP.
Collector already installed + supervised by Ansible/Puppet/Chef	No. Standards mode is cleaner — let your existing config-management own install + supervise; the collector enrols via OpAMP.

The shorthand: the agent earns its place when LinkMesh is the easiest way to handle collector install and lifecycle on a host. If you already have a tool that does that, the agent is duplicate machinery.

What the agent does

Five things, in priority order of “what you’d notice missing”:

Installs the collector from upstream channels. Grafana Alloy from apt.grafana.com / rpm.grafana.com by default (or your own internal mirror via --mirror=<your-mirror> for air-gap); pinned to the version in LinkMesh’s compat matrix. Apt/yum/Helm/MSI per platform.
Writes the bootstrap config exactly once. A tiny config.alloy (~30 lines) pointing the collector at LinkMesh’s remote-config endpoint + own-metrics endpoint. After that, the collector pulls its real pipeline config directly from LinkMesh — the agent doesn’t touch customer config files.
Supervises the collector process. Starts it via systemd / Windows Service / launchd; watches for crash loops; restarts on failure with backoff; logs the process state back to LinkMesh.
Handles cross-OS lifecycle operations. Operator-triggered restart, upgrade (pinned-version bump), rollback, uninstall — each one knows about the platform’s package manager + service model under the hood.
Reports host context back to LinkMesh. Heartbeat carries hostname, OS, agent version, collector status. The LinkMesh fleet UI uses this to render the host card you click on.
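The "restarts on failure with backoff" behaviour in the supervision item can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: `RestartPolicy`, its field names, and every numeric value are assumptions chosen for the example, and exponential backoff with a cap is one common choice rather than a documented LinkMesh default.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    """Hypothetical restart policy; values are illustrative, not LinkMesh defaults."""
    base_delay_s: float = 1.0      # delay before the first restart attempt
    max_delay_s: float = 60.0      # cap so delays do not grow without bound
    crash_loop_threshold: int = 5  # consecutive failures treated as a crash loop


def next_restart_delay(policy: RestartPolicy, consecutive_failures: int) -> float:
    """Exponential backoff: base * 2^(failures - 1), capped at max_delay_s."""
    if consecutive_failures < 1:
        return 0.0  # nothing has failed yet, so there is nothing to wait for
    delay = policy.base_delay_s * (2 ** (consecutive_failures - 1))
    return min(delay, policy.max_delay_s)


def is_crash_loop(policy: RestartPolicy, consecutive_failures: int) -> bool:
    """True once the collector has failed often enough to flag a crash loop."""
    return consecutive_failures >= policy.crash_loop_threshold
```

With these assumed values the delays run 1 s, 2 s, 4 s, and so on until they hit the 60 s cap. A real supervisor would also reset the failure counter after the collector has stayed up for some period.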

What the agent does NOT do

The agent is intentionally not:

A proprietary OTel distribution. No custom forks of Alloy or otelcol-contrib; the agent installs upstream packages bit-for-bit.
The config pusher for Alloy or OpAMP-supervised otelcol-contrib. Alloy’s remotecfg and otelcol-contrib’s OpAMP supervisor both fetch config server-to-collector directly. The agent’s only config write is the one-time bootstrap; everything after that goes around the agent.
A telemetry scraper. The agent does not scrape the collector’s /metrics endpoint and forward the result over gRPC. The collector’s own self-telemetry push to LinkMesh handles that path now.
A data-plane component. Your customer logs/metrics/traces flow collector → destination. The agent never touches them. (If the agent process dies, your telemetry keeps flowing — only management actions stop until the agent restarts.)
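The "one-time bootstrap" write mentioned above can be illustrated with an exclusive-create pattern. This is a sketch under assumptions: the function name, the file mode, and the idea that the agent skips the write when the file already exists are all hypothetical, and the example contents are a placeholder rather than real Alloy configuration.

```python
import os
from pathlib import Path


def write_bootstrap_once(path: Path, contents: str) -> bool:
    """Create the bootstrap config only if it does not exist yet.

    Returns True if the file was written, False if it already existed
    (in which case it is left untouched).
    """
    try:
        # O_EXCL makes creation fail when the file exists, so the
        # "check, then create" step is a single atomic operation.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o640)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(contents)
    return True
```

Calling the function a second time with different contents returns `False` and leaves the first version in place, which matches the idea that later config changes reach the collector through its own channel rather than through the agent.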

How the pieces compose

A managed host runs three independent processes:

┌───────────────────────────────────────────────────────────────────┐
│  Host (Linux/macOS/Windows)                                       │
│                                                                   │
│  ┌──────────────────┐         ┌────────────────────────────────┐  │
│  │ linkmesh-agent   │ ──┐     │ Collector (Alloy / otelcol)    │  │
│  │ (control plane)  │   │     │ (data plane)                   │  │
│  └────────┬─────────┘   │     └────────────┬───────────────────┘  │
│           │             │                  │                      │
│           │ mTLS gRPC   │ systemctl etc.   │ remotecfg / OpAMP    │
│           │             │                  │  + own_metrics OTLP  │
└───────────┼─────────────┼──────────────────┼──────────────────────┘
            │             │                  │
            ▼             ▼                  ▼
      ┌────────────┐  ┌──────────────┐  ┌──────────────────┐
      │  LinkMesh  │  │ systemd / SC │  │   LinkMesh       │
      │  server    │  │   service    │  │   server         │
      │ (heartbeat,│  │   model      │  │ (config + OTLP   │
      │  commands) │  │              │  │  ingest)         │
      └────────────┘  └──────────────┘  └──────────────────┘

Two distinct channels touch LinkMesh:

Agent → server (mTLS gRPC) carries heartbeats, status, and operator-triggered commands (restart, upgrade, rollback). This channel does NOT carry collector configuration anymore — that goes via the collector’s native channel.
Collector → server (HTTPS, per-collector bearer) carries config pulls (Alloy’s remotecfg, otelcol-contrib’s OpAMP) AND self-telemetry OTLP pushes. One bearer token per collector covers both directions.

That separation matters: if the agent dies, the collector keeps running with its last-applied config and keeps reporting self-telemetry. The fleet UI marks the agent offline but the data-plane is unaffected.
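To make the agent-to-server channel concrete, here is a rough sketch of a heartbeat built from the fields this page lists (hostname, OS, agent version, collector status). The class and function names are assumptions, and JSON is used only for readability; the real channel is mTLS gRPC and its message schema is not shown on this page. Note that the payload has no field for collector configuration.

```python
import json
import platform
import socket
from dataclasses import asdict, dataclass

# Collector process states as listed in the "Lifecycle states" table.
COLLECTOR_STATES = {"running", "starting", "crashed", "stopped"}


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat shape: host context only, no collector config."""
    hostname: str
    os: str
    agent_version: str
    collector_status: str


def build_heartbeat(agent_version: str, collector_status: str) -> Heartbeat:
    """Collect host context for one heartbeat; rejects unknown collector states."""
    if collector_status not in COLLECTOR_STATES:
        raise ValueError(f"unknown collector status: {collector_status!r}")
    return Heartbeat(
        hostname=socket.gethostname(),
        os=platform.system(),
        agent_version=agent_version,
        collector_status=collector_status,
    )


if __name__ == "__main__":
    # "0.0.0-example" is a placeholder version string.
    print(json.dumps(asdict(build_heartbeat("0.0.0-example", "running"))))
```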

Lifecycle states

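Operators see three “are things working” axes on the collector detail page. Each axis is reported independently, so most combinations can occur, although a crashed or stopped collector process will normally show unhealthy data flow as well.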

Axis	States	Reported by
Agent	online / offline	mTLS gRPC heartbeat to LinkMesh server
Collector process	running / starting / crashed / stopped	Agent observes via systemd / SC / launchd
Collector data flow	healthy / unhealthy	Collector self-telemetry — receiver-accept-rate, exporter-success-rate

A common scenario:

Agent: online ✓
Collector process: crashed ✗ (agent restarting it with backoff)
Collector data flow: unhealthy ✗

The fleet UI shows all three so operators triage at the right layer instead of staring at a single red light.
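Triage across the three axes can be sketched as a small function that names the layer to look at first. This is an illustration only: the function name, the layer labels, and the ordering rules are assumptions, not the fleet UI's actual logic.

```python
from typing import List


def layers_needing_attention(
    agent_online: bool, process_state: str, data_flow_healthy: bool
) -> List[str]:
    """Return the layers an operator should look at, most fundamental first."""
    layers: List[str] = []
    if not agent_online:
        # Management actions are unavailable; the data plane may still be fine.
        layers.append("agent")
    if process_state in {"crashed", "stopped"}:
        # Unhealthy data flow is a consequence here, so it is not listed separately.
        layers.append("collector-process")
    elif process_state == "running" and not data_flow_healthy:
        # The process is up, so the problem is in the pipeline itself.
        layers.append("data-flow")
    return layers
```

For the common scenario above (agent online, process crashed, data flow unhealthy) this returns `["collector-process"]`, pointing at the process layer rather than the pipeline.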

Future scope

The agent’s role is intentionally minimal today. The architecture leaves room for these without changing the boundary:

Log auto-discovery — scan the host for known log paths (/var/log/nginx/*.log, /var/log/postgresql/*.log) and suggest pipeline templates. Pure host-side observation; doesn’t touch the collector’s data path.
Service auto-detection — detect running services (Postgres, Redis, Nginx) and recommend matching receiver configs the operator can adopt with one click.
Multi-collector orchestration — supervise more than one collector on a host (typical: one Alloy for metrics, one otelcol-contrib for traces). Each collector keeps its own remote-config + own-metrics path; the agent just multiplies the supervision pattern.
Remote shell / debug actions — operator-triggered “tail the collector’s logs” or “run alloy config-print” without the operator needing SSH to the host.

All of these are control-plane work — none expand the agent into the data plane. The boundary holds.