
Run a highly-available deployment

By default the LinkMesh server runs as a single instance. That’s simple and adequate for many deployments. Even while the server is down, your collectors keep running and your telemetry keeps flowing, because telemetry never travels through the server. What you lose while a single instance is down is the control plane: the UI, the API, config push, and fleet status.

A highly available (HA) deployment runs several server instances so that losing one node doesn’t take the control plane with it.

The topology

```mermaid
flowchart TB
    LB[Load balancer]
    subgraph Servers["LinkMesh server instances (stateless)"]
        S1[Instance 1]
        S2[Instance 2]
        S3[Instance 3]
    end
    Mongo[(MongoDB replica set<br/>shared fleet state)]
    Git[(Git-backed config store<br/>shared configuration)]

    LB --> S1 & S2 & S3
    S1 & S2 & S3 --> Mongo
    S1 & S2 & S3 --> Git
```

Four things make it work:

  1. Several server instances, each identical and stateless — any instance can serve any request.
  2. A load balancer in front of them, spreading traffic and routing around an instance that goes down. It fronts both the web/API port and the gRPC endpoint collectors connect to.
  3. A shared MongoDB database, run as a replica set, holding the fleet state every instance reads and writes.
  4. A shared Git-backed config store, so every instance serves the same pipelines, sources, destinations, and routes.

What HA requires

  1. Switch to the MongoDB storage backend. BoltDB stores its data in a local file that can’t be shared between instances, so it can’t back an HA deployment; MongoDB is mandatory. See Storage backends for the comparison and Deploy with MongoDB for setup.

  2. Use a MongoDB replica set. The server relies on database transactions for correctness across instances, and MongoDB doesn’t support transactions on a standalone server. Three members is the usual production minimum, because it keeps a majority of members available when any one of them fails.

  3. Point every instance at the same database and the same config store. All instances must share one MongoDB connection target and one Git-backed configuration source, or they’ll serve divergent views of the fleet.

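  4. Put a load balancer in front. Plain HTTP load balancing works for the web UI and REST API. The collector-facing gRPC endpoint sits behind the same load balancer; because gRPC runs over HTTP/2, that listener must support HTTP/2 or pass the TCP connection through untouched. See Run behind a reverse proxy for the proxy and TLS specifics.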
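The first three requirements can be checked before rollout. The sketch below is a hypothetical pre-flight script, not a LinkMesh tool: it inspects a standard mongodb:// connection string for the replicaSet, w, and retryWrites options and confirms that every instance shares one database and one config store. The per-instance keys mongo_uri and config_repo, the host names, the linkmesh database name, and the repository URL are assumptions made for illustration; map them onto however you actually configure each instance.

```python
# Hypothetical pre-flight check for an HA rollout. The instance config keys
# ("mongo_uri", "config_repo"), host names, database name, and repo URL are
# illustrative assumptions, not documented LinkMesh settings.
from urllib.parse import parse_qs, urlsplit


def check_ha_mongo_uri(uri: str) -> list[str]:
    """Flag options in a mongodb:// URI that would undermine an HA deployment."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        # mongodb+srv:// takes hosts and some options from DNS; not handled here.
        return [f"unsupported scheme {parts.scheme!r}"]
    # Connection-string option names are case-insensitive, so normalise them.
    # parse_qs returns a list per key; keep the last value given.
    opts = {k.lower(): v[-1] for k, v in parse_qs(parts.query).items()}
    problems = []
    if "replicaset" not in opts:
        problems.append("no replicaSet option: transactions need a replica set")
    # An absent w falls back to the server default, which this guide gives as majority.
    if opts.get("w", "majority") != "majority":
        problems.append(f"write concern w={opts['w']} is not majority")
    # Drivers enable retryable writes unless retryWrites=false is set.
    if opts.get("retrywrites", "true").lower() == "false":
        problems.append("retryWrites is disabled")
    return problems


def check_shared_targets(instances: dict[str, dict[str, str]]) -> list[str]:
    """Every instance must point at the same database and the same config store."""
    problems = []
    for key in ("mongo_uri", "config_repo"):
        values = {cfg.get(key) for cfg in instances.values()}
        if len(values) > 1:
            problems.append(f"instances disagree on {key}: {sorted(map(str, values))}")
    return problems


if __name__ == "__main__":
    uri = ("mongodb://m1.example.internal:27017,m2.example.internal:27017,"
           "m3.example.internal:27017/linkmesh?replicaSet=rs0&w=majority&retryWrites=true")
    print(check_ha_mongo_uri(uri))  # → []
    print(check_ha_mongo_uri("mongodb://localhost:27017/linkmesh"))
    fleet = {name: {"mongo_uri": uri,
                    "config_repo": "git@git.example.internal:ops/linkmesh-config.git"}
             for name in ("s1", "s2", "s3")}
    print(check_shared_targets(fleet))  # → []
```

A seed list of fewer than three hosts is not flagged on purpose: the driver discovers the remaining replica-set members on its own, so the number of hosts in the URI says nothing reliable about the number of members.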

Failover behaviour

The server is stateless per request, so failover is simply “the load balancer stops routing to a dead instance and uses the others.” There’s no leader to promote and no session to migrate.
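Because every instance can serve every request, the load balancer’s whole failover job is to skip instances that fail their health check. This toy round-robin model illustrates that rule; the instance names and the mark_down/mark_up hooks are hypothetical, and a real load balancer would drive them from its own active health checks.

```python
# Toy model of a round-robin load balancer in front of stateless instances.
# Instance names and the mark_down/mark_up health hooks are illustrative only.
from itertools import cycle


class RoundRobinBalancer:
    def __init__(self, instances: list[str]) -> None:
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._order = cycle(self.instances)

    def mark_down(self, name: str) -> None:
        """Called when an instance fails its health check."""
        self.healthy.discard(name)

    def mark_up(self, name: str) -> None:
        """Called when an instance passes its health check again."""
        self.healthy.add(name)

    def pick(self) -> str:
        # Visit each instance at most once per request, skipping unhealthy ones.
        # No session affinity is needed: any healthy instance will do.
        for _ in range(len(self.instances)):
            candidate = next(self._order)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["s1", "s2", "s3"])
    lb.mark_down("s2")  # instance 2 stops answering health checks
    print([lb.pick() for _ in range(4)])  # → ['s1', 's3', 's1', 's3']
```

Nothing is migrated when s2 drops out: requests simply land on s1 and s3 until s2 is marked healthy again.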

On the database side, a MongoDB replica-set primary failover is transparent to the server: the driver discovers the new primary, typically within a few seconds, and reconnects. Writes in flight during a failover follow MongoDB’s write-concern and retryable-write semantics. With the default w: majority and retryable writes enabled, every write either commits or returns an error you can safely retry.
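The majority arithmetic explains both the default write concern and the three-member minimum. A w: majority write is acknowledged only once a majority of members holds it, and any two majorities of the same set overlap, so the majority that elects the next primary always includes a member with that write; this is why a majority-acknowledged write is not lost when the primary fails over. The sketch assumes every member is a voting, data-bearing member (no arbiters).

```python
# Quorum arithmetic for a replica set, assuming every member is a voting,
# data-bearing member (no arbiters, no non-voting members).


def majority(members: int) -> int:
    """Members needed to acknowledge a w: majority write or to elect a primary."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority, and therefore a primary, remains."""
    return members - majority(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: majority {majority(n)}, survives {tolerated_failures(n)} failure(s)")
```

Run as a script, it shows that one or two members survive no failures, three survive one, and four still survive only one; five is the next size that tolerates an additional failure.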

Multi-instance correctness

Running more than one instance means two of them can act on the same data at the same time. The server is built for that:

  • Multi-document writes (token rotation, collector deregistration cascades, enrollment, and similar) are wrapped in database transactions, so each one either commits in full or not at all.
  • Read-modify-write paths, such as editing a user or saving the routing canvas, use optimistic concurrency. If two instances race to update the same record, one wins and the other gets a 409 Conflict and must retry, instead of silently overwriting the first write.
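Optimistic concurrency can be sketched with a version number on each record. This is a minimal in-memory model, not LinkMesh’s implementation: the Store class, the version field, the Conflict exception standing in for a 409 response, and the update_with_retry helper are all hypothetical. Against MongoDB, the compare-and-set step would typically be a single update filtered on both the document id and the expected version, treating zero matched documents as a conflict.

```python
# Hypothetical in-memory model of optimistic concurrency control.
# Store, Record.version, Conflict, and update_with_retry are illustrations,
# not LinkMesh APIs.
from dataclasses import dataclass
from typing import Callable


class Conflict(Exception):
    """Stands in for the HTTP 409 Conflict returned to the losing writer."""

    status = 409


@dataclass(frozen=True)
class Record:
    data: dict
    version: int


class Store:
    def __init__(self) -> None:
        self._rows: dict[str, Record] = {}

    def put(self, key: str, data: dict) -> None:
        self._rows[key] = Record(dict(data), 1)

    def get(self, key: str) -> Record:
        return self._rows[key]

    def update(self, key: str, data: dict, expected_version: int) -> Record:
        # Compare-and-set: write only if nobody has bumped the version since
        # the caller read it. A real database must perform this check atomically.
        current = self._rows[key]
        if current.version != expected_version:
            raise Conflict(f"{key}: expected v{expected_version}, found v{current.version}")
        updated = Record(dict(data), current.version + 1)
        self._rows[key] = updated
        return updated


def update_with_retry(store: Store, key: str, change: Callable[[dict], dict],
                      attempts: int = 3) -> Record:
    """Read, apply `change`, and write; on a conflict, re-read and try again."""
    for _ in range(attempts):
        snapshot = store.get(key)
        try:
            return store.update(key, change(dict(snapshot.data)), snapshot.version)
        except Conflict:
            continue  # another writer won the race; start over from fresh data
    raise Conflict(f"{key}: still conflicting after {attempts} attempts")


if __name__ == "__main__":
    store = Store()
    store.put("user:alice", {"role": "viewer", "email": "alice@old.example.com"})
    seen_by_a = store.get("user:alice")  # instance A reads v1
    seen_by_b = store.get("user:alice")  # instance B reads v1
    # Instance A changes the role and wins the race (v1 -> v2).
    store.update("user:alice", {"role": "editor", "email": "alice@old.example.com"},
                 seen_by_a.version)
    try:
        # Instance B's stale write would reset the role to "viewer"; it is rejected.
        store.update("user:alice", {"role": "viewer", "email": "alice@new.example.com"},
                     seen_by_b.version)
    except Conflict as exc:
        print(exc.status, exc)
    # B retries from fresh data, keeping A's role change (v2 -> v3).
    print(update_with_retry(store, "user:alice",
                            lambda d: {**d, "email": "alice@new.example.com"}))
```

The retry helper re-reads before each attempt, so the losing instance applies its change on top of the winner’s write rather than overwriting it.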

What’s turn-key, and what isn’t

Today, HA is a bring-your-own-infrastructure setup: you provide the load balancer, the MongoDB replica set, and the shared config store, and wire the instances to them using this guide plus Deploy with MongoDB.

Turn-key Kubernetes manifests and a Helm chart for a fully tuned HA topology are planned but not yet shipped. Until then, the lower-level path on this page is the supported way to run HA.