Run a highly-available deployment
By default the LinkMesh server runs as a single instance. That’s simple and fine for many deployments — and even if the server is down, your collectors keep running and your telemetry keeps flowing, because telemetry never travels through the server. What you lose while a single instance is down is the control plane: the UI, the API, config push, and fleet status.
A highly-available (HA) deployment runs several server instances so that losing one node doesn’t take the control plane with it.
The topology
flowchart TB
LB[Load balancer]
subgraph Servers["LinkMesh server instances (stateless)"]
S1[Instance 1]
S2[Instance 2]
S3[Instance 3]
end
Mongo[(MongoDB replica set<br/>shared fleet state)]
Git[(Git-backed config store<br/>shared configuration)]
LB --> S1 & S2 & S3
S1 & S2 & S3 --> Mongo
S1 & S2 & S3 --> Git
Four things make it work:
- Several server instances, each identical and stateless — any instance can serve any request.
- A load balancer in front of them, spreading traffic and routing around an instance that goes down. It fronts both the web/API port and the gRPC endpoint collectors connect to.
- A shared MongoDB database, run as a replica set, holding the fleet state every instance reads and writes.
- A shared Git-backed config store, so every instance serves the same pipelines, sources, destinations, and routes.
What HA requires
- Switch to the MongoDB storage backend. BoltDB is a local file and can’t be shared between instances, so it can’t back an HA deployment; MongoDB is mandatory. See Storage backends for the comparison and Deploy with MongoDB for setup.
- Use a MongoDB replica set. The server relies on database transactions for correctness across instances, and MongoDB only offers transactions on a replica set. Three members is the usual production minimum.
- Point every instance at the same database and the same config store. All instances must share one MongoDB connection target and one Git-backed configuration source, or they’ll serve divergent views of the fleet.
- Put a load balancer in front. Plain HTTP load balancing works for the web UI and REST API. The collector-facing gRPC endpoint also sits behind it; see Run behind a reverse proxy for the proxy and TLS specifics.
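The requirements above lend themselves to a small preflight check before instances are started. The sketch below is illustrative only: `InstanceConfig`, its fields, and `check_ha_config` are hypothetical names, not LinkMesh configuration keys or tooling. The only thing borrowed from MongoDB is the standard connection-string form, in which a `replicaSet` option names the replica set.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class InstanceConfig:
    """Hypothetical per-instance settings; not real LinkMesh config keys."""
    name: str
    mongo_uri: str
    config_repo: str


def check_ha_config(instances):
    """Return a list of problems; an empty list means the wiring looks consistent."""
    problems = []
    if len(instances) < 2:
        problems.append("HA needs at least two server instances")

    # Every instance must share one database target and one config store.
    if len({i.mongo_uri for i in instances}) > 1:
        problems.append("instances point at different MongoDB targets")
    if len({i.config_repo for i in instances}) > 1:
        problems.append("instances point at different config stores")

    for inst in instances:
        parts = urlsplit(inst.mongo_uri)
        if parts.scheme not in ("mongodb", "mongodb+srv"):
            problems.append(f"{inst.name}: storage backend is not MongoDB")
            continue
        # Connection-string option names are case-insensitive.
        options = {k.lower() for k in parse_qs(parts.query)}
        # An SRV URI can supply replicaSet through DNS, so only plain URIs are flagged.
        if parts.scheme == "mongodb" and "replicaset" not in options:
            problems.append(f"{inst.name}: connection string names no replica set")
    return problems
```

Calling `check_ha_config` with three instances that share a URI such as `mongodb://db1:27017,db2:27017,db3:27017/linkmesh?replicaSet=rs0` returns an empty list. An instance with a different URI, or one that omits `replicaSet`, shows up as a named problem.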
Failover behaviour
The server is stateless per request, so failover is simply “the load balancer stops routing to a dead instance and uses the others.” There’s no leader to promote and no session to migrate.
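To make that concrete, here is a toy model of what the load balancer does. It is a sketch under simplifying assumptions, not a description of any particular load balancer: `RoundRobinBalancer` and the instance names are hypothetical, and real products add active health checks, connection draining, and long-lived gRPC stream handling.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips instances marked unhealthy."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._ring = cycle(self.instances)

    def mark_down(self, instance):
        """Stop routing to an instance, e.g. after a failed health check."""
        self.healthy.discard(instance)

    def mark_up(self, instance):
        """Resume routing to a known instance once it is healthy again."""
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self):
        """Return the next healthy instance, or raise if none are left."""
        # One full pass over the ring is enough to find any healthy instance.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy server instances")
```

Because no instance holds session state, the balancer needs no affinity logic: after `mark_down("s2")`, requests simply alternate between the remaining instances, and `mark_up("s2")` puts it back in rotation.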
On the database side, MongoDB replica-set primary failover is transparent to the server: the driver discovers the new primary within a few seconds and reconnects. In-flight writes during a failover follow MongoDB’s write-concern and retryable-write semantics; the default w: majority with retryable writes gives you “the write committed, or you got an error you can safely retry.”
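For a caller, “an error you can safely retry” usually turns into a bounded retry loop with backoff. The following is a minimal sketch under stated assumptions: `TransientWriteError` and `write_with_retry` are hypothetical names, not part of LinkMesh or of any MongoDB driver, and the sketch assumes the wrapped operation is safe to repeat.

```python
import time


class TransientWriteError(Exception):
    """Hypothetical stand-in for an error raised while a new primary is elected."""


def write_with_retry(write, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call write() until it succeeds or attempts run out, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return write()
        except TransientWriteError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

Only the transient error is retried; anything else propagates immediately. The `sleep` parameter exists so the backoff can be replaced in tests.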
Multi-instance correctness
Running more than one instance means two of them can act on the same data at the same time. The server is built for that:
- Multi-document writes — token rotation, collector deregistration cascades, enrollment, and similar — are wrapped in database transactions, so they commit atomically or not at all.
- Read-modify-write paths, such as editing a user or saving the routing canvas, use optimistic concurrency. If two instances race on the same record, one wins and the other gets a 409 Conflict to retry, rather than silently overwriting.
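The optimistic-concurrency pattern in the second bullet can be illustrated with a versioned record. This is a generic sketch of the technique, not LinkMesh’s implementation: `VersionedStore`, `Conflict`, and `update` are hypothetical, and an in-memory dictionary stands in for the shared database.

```python
import threading


class Conflict(Exception):
    """Models an HTTP 409 Conflict response."""
    status = 409


class VersionedStore:
    """In-memory stand-in for a shared database with per-record versions."""

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()

    def get(self, key):
        """Return (value, version); a missing record reads as (None, 0)."""
        with self._lock:
            return self._records.get(key, (None, 0))

    def put(self, key, value, expected_version):
        """Write only if the record is still at expected_version, else raise Conflict."""
        with self._lock:
            _, current = self._records.get(key, (None, 0))
            if current != expected_version:
                raise Conflict(f"{key}: expected v{expected_version}, found v{current}")
            self._records[key] = (value, current + 1)
            return current + 1


def update(store, key, change, max_retries=5):
    """Read-modify-write loop: re-read and reapply the change after each conflict."""
    for _ in range(max_retries):
        value, version = store.get(key)
        try:
            return store.put(key, change(value), version)
        except Conflict:
            continue  # someone else wrote first; start again from fresh data
    raise Conflict(f"{key}: gave up after {max_retries} conflicts")
```

If two writers both read a record at version 1, the first `put` succeeds and bumps it to version 2. The second `put` still expects version 1, so it raises `Conflict` instead of overwriting, and the caller re-reads before trying again.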
What’s turn-key, and what isn’t
Today, HA is a bring-your-own-infrastructure setup: you provide the load balancer, the MongoDB replica set, and the shared config store, and wire the instances to them using this guide plus Deploy with MongoDB.
Turn-key Kubernetes manifests and a Helm chart for a fully-tuned HA topology are planned but not yet shipped. Until then, the lower-level path on this page is the supported way to run HA.
Related
- System requirements & sizing — how many instances, what to size each one, and the internal-latency considerations for the server-to-MongoDB link
- Storage backends — why HA needs MongoDB rather than the default BoltDB
- Deploy with MongoDB — replica-set setup, sizing, and backups
- Run behind a reverse proxy — fronting the web and gRPC endpoints
- Configuration reference: the storage.* and database.* keys each instance needs