Deploy LinkMesh server with MongoDB

LinkMesh server ships with an embedded BoltDB store by default. That’s fine for single-host deployments and dev. For HA (multiple server instances behind a load balancer), you need MongoDB — and you need it as a replica set, not a standalone.

This page walks you from “I have a Mongo cluster” to “the server’s talking to it correctly”, and explains the replica-set requirement so you can size the deployment right.

Deciding whether you need MongoDB at all? Start with Storage backends. For the full multi-instance picture this unlocks, see Run a highly-available deployment.

Why replica set, not standalone?

LinkMesh server uses Mongo transactions for a handful of multi-document write paths: token rotation, collector deregistration cascades, audit-log atomicity, and similar. Mongo transactions require a replica set; a standalone Mongo (as opposed to a single-member replica set) rejects WithTransaction calls with code 20 (IllegalOperation).

The server gracefully degrades to sequential writes on a standalone Mongo for dev convenience — you’ll see a warning logged once at startup:

WARN RunInTransaction: standalone Mongo detected — falling back to
     sequential writes (no atomicity). Production deploys MUST use a
     replica set.

That warning is your signal. If you see it in a production log, the deployment isn’t actually HA-correct. A crash partway through a token rotation can leave a collector with no valid bearer token; a failed audit write can leave a security event with no trail. Fix it by reconfiguring Mongo as a replica set.

Prerequisites

A current LinkMesh server release (the MongoDB storage backend is built in)
A MongoDB replica set with at least 3 members for production, or a single-node replica set for dev (starting mongod with the --replSet flag and running rs.initiate() once is enough for a single member)
Network reachability from every LinkMesh server instance to every Mongo replica-set member (mongodb+srv:// discovers the member list from DNS; with the plain mongodb:// scheme, list every member in the URI as the seed list)
Mongo 6.0+ recommended; older versions work but lack some of the aggregation operators the ComponentThroughput store uses
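
For the plain mongodb:// scheme, the seed-list URI can be assembled along these lines. This is a minimal sketch, not the server's own URI builder: the function name build_seed_list_uri, the mongo-N.example.internal host names, the rs0 set name, and the sample password are all illustrative assumptions.

```python
from urllib.parse import quote, urlencode


def build_seed_list_uri(hosts, database, replica_set, user=None, password=None):
    """Build a plain mongodb:// URI that names several replica-set members."""
    if not hosts:
        raise ValueError("at least one host is required")
    credentials = ""
    if user is not None:
        # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
        credentials = quote(user, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    options = urlencode(
        {"replicaSet": replica_set, "retryWrites": "true", "w": "majority"}
    )
    return f"mongodb://{credentials}{','.join(hosts)}/{database}?{options}"


# Hypothetical three-member set; replace hosts and credentials with your own.
uri = build_seed_list_uri(
    [
        "mongo-0.example.internal:27017",
        "mongo-1.example.internal:27017",
        "mongo-2.example.internal:27017",
    ],
    database="linkmesh",
    replica_set="rs0",
    user="linkmesh-app",
    password="p@ss:word",
)
print(uri)
```

Percent-encoding the credentials matters because an unescaped @ or : in a password changes how the URI is parsed.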

Sizing

The LinkMesh data set is modest and bounded by TTLs — the componentthroughputs collection (self-telemetry, 24h TTL) is the workhorse, and the WiredTiger cache is the main knob to get right. The collection-by-collection volume estimates, working-set guidance, and cache settings live in one place:

➡️ System requirements & sizing → External MongoDB sizing

Configure the LinkMesh server

The server reads its backend choice from storage.backend and its connection details from database.uri (or from the individual fields database.server, database.user, database.password, and database.database).

Set storage.backend: mongodb in config.yaml:

storage:
  backend: mongodb
  # auditLogRetentionDays: 365  # default; raise for stricter compliance

The boltPath field is ignored when backend is mongodb.

Provide the connection string. Either set database.uri directly (recommended for mongodb+srv:// URIs):
```
database:
  uri: "mongodb+srv://linkmesh-app:CHANGEME@cluster.example.mongodb.net/linkmesh?retryWrites=true&w=majority"
```
Or assemble from parts (the server builds the URI; useful when you inject secrets via env vars and want the database name out of the secret):
```
database:
  server: cluster.example.mongodb.net
  user: linkmesh-app
  password: ${MONGO_PASSWORD}   # interpolated from env
  database: linkmesh
```
The server defaults the database name to signalflow (the legacy name from before the rebrand). Set database: linkmesh explicitly on a fresh install; existing installs that already have a signalflow database keep working.
Restart the server. Watch the startup log for:
```
INFO Connected to MongoDB database=linkmesh
INFO MongoDB indexes ensured
```
These two lines mean the connection is alive and the ~30 indexes the data layer expects have been created.
Verify the transaction substrate. If the log shows:
```
WARN RunInTransaction: standalone Mongo detected — falling back ...
```
you’ve connected to a standalone, not a replica set. For dev, reconfigure Mongo as a single-member replica set (start mongod with --replSet rs0, then run mongosh --eval "rs.initiate()" once); for production, expand to a 3-member set.
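
You can also check the topology from the Mongo side instead of waiting for the server log. The sketch below classifies the reply of MongoDB's hello command: a replica-set member reports a setName field, and a mongos router reports msg set to "isdbgrid". The function name classify_topology and the trimmed sample replies are illustrative assumptions; this is not how the LinkMesh server itself performs the check.

```python
def classify_topology(hello):
    """Classify a MongoDB `hello` command response.

    Returns "replica-set", "sharded", or "standalone".
    """
    if hello.get("msg") == "isdbgrid":
        return "sharded"        # answered by a mongos router
    if "setName" in hello:
        return "replica-set"    # member of an initiated replica set
    # A mongod started with --replSet but never initiated also lands here.
    return "standalone"


# Trimmed example replies; real ones carry many more fields.
standalone_reply = {"isWritablePrimary": True, "maxWireVersion": 17, "ok": 1.0}
replica_reply = {
    "isWritablePrimary": True,
    "setName": "rs0",
    "hosts": ["localhost:27017"],
    "ok": 1.0,
}

print(classify_topology(standalone_reply))
print(classify_topology(replica_reply))
```

To get a real reply, run db.hello() in mongosh, or client.admin.command("hello") if you happen to have PyMongo installed.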

Switching from BoltDB

Today there’s no in-place migration tool — export/import is planned but not yet shipped. For greenfield deploys, just start fresh with Mongo from the beginning.

If you’ve accumulated dev state on Bolt and want to move it: re-create your collectors via the enrollment flow against the new Mongo-backed server. Sources/destinations/pipelines live in your gitops repo, so they come along for the ride when the new server clones it.

Operating notes

Backups

Mongo’s native dump/restore (mongodump, mongorestore) is the canonical path. Run scheduled dumps against a secondary to avoid impacting the primary. The TTL’d collections (componentthroughputs, collectorevents, auditlogs) account for most of the dump size; restore time scales with how much of that TTL-bounded history you keep.
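
A scheduled job might build the dump command roughly like this. It is a sketch under assumptions: the backup-user account, the cluster host, and the /backups path are placeholders, and the helper name mongodump_command is made up for illustration. The flags themselves (--uri, --readPreference, --gzip, --archive) are standard mongodump options.

```python
import shlex


def mongodump_command(uri, archive_path):
    """Return the argv for a compressed, single-file dump read from a secondary."""
    return [
        "mongodump",
        f"--uri={uri}",
        "--readPreference=secondary",   # keep dump reads off the primary
        "--gzip",                       # compress the output
        f"--archive={archive_path}",    # write one archive file, not a directory
    ]


# Placeholder URI and path; substitute your own.
argv = mongodump_command(
    "mongodb+srv://backup-user:CHANGEME@cluster.example.mongodb.net/linkmesh",
    "/backups/linkmesh.archive.gz",
)
print(shlex.join(argv))
```

Pass argv to subprocess.run(argv, check=True) to execute it. The matching restore uses mongorestore with the same --gzip and --archive options.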

Monitoring

The server reports its own metrics through own_metrics, an OTLP push to itself. Mongo-side metrics come from the Mongo cluster directly: Atlas operators get this in the UI; self-hosted operators typically scrape mongodb_exporter.

The RunInTransaction: standalone Mongo detected warning is your single most important alert. Wire it into your log monitoring so a misconfigured deploy doesn’t silently degrade correctness.
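
If your log pipeline has no ready-made rule for this, the check itself is a plain substring match. The following is a minimal sketch; the function name find_standalone_warnings and the sample log lines are illustrative, and a real deployment would more likely express the same match as an alert rule in its log platform.

```python
STANDALONE_MARKER = "RunInTransaction: standalone Mongo detected"


def find_standalone_warnings(lines):
    """Return (line_number, text) pairs for every line carrying the warning."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if STANDALONE_MARKER in line
    ]


# Sample lines modelled on the startup log shown earlier on this page.
sample_log = [
    "INFO Connected to MongoDB database=linkmesh\n",
    "WARN RunInTransaction: standalone Mongo detected\n",
    "INFO MongoDB indexes ensured\n",
]

hits = find_standalone_warnings(sample_log)
print(hits)
```

Because the warning is logged once at startup, alert on any occurrence rather than on a rate.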

Failover

LinkMesh server is stateless on a per-request basis; failover is “kill the pod, the load balancer routes to another instance”. A Mongo replica-set primary failover is transparent to the server: the Mongo driver re-discovers the new primary within a few seconds.

In-flight writes during a failover are subject to Mongo’s write-concern + retryable-writes semantics. The default w: majority with retryable writes gives you “the write committed, or you got an error and can retry safely” — appropriate for this workload.

Multi-instance correctness

When you run more than one server instance, concurrent writes stay correct:

Multi-document write paths (token rotation, collector deregistration cascade, OpAMP re-registration, OpAMP first-time enrollment) are wrapped in Mongo transactions, so they commit atomically.
Read-modify-write paths (user updates, routing-canvas saves) use optimistic-concurrency version fields and return 409 Conflict when two instances race the same record.
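
The version-field pattern behind those 409 responses can be illustrated with an in-memory sketch. Everything here (RecordStore, handle_update, the user-1 record) is hypothetical and only models the idea; the server's actual data layer is not shown in this page. Against Mongo, the compare-and-bump step is commonly done as one conditional update that filters on both the document id and the expected version.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the caller's copy."""


class RecordStore:
    """In-memory stand-in for a collection whose documents carry a version."""

    def __init__(self):
        self._records = {}

    def insert(self, key, data):
        self._records[key] = {"version": 1, "data": dict(data)}

    def read(self, key):
        record = self._records[key]
        return record["version"], dict(record["data"])

    def update(self, key, expected_version, data):
        record = self._records[key]
        if record["version"] != expected_version:
            raise VersionConflict(key)
        record["version"] += 1          # bump so stale writers are rejected
        record["data"] = dict(data)
        return record["version"]


def handle_update(store, key, expected_version, data):
    """Map the outcome of an update to an HTTP-style status code."""
    try:
        store.update(key, expected_version, data)
        return 200
    except VersionConflict:
        return 409


store = RecordStore()
store.insert("user-1", {"role": "viewer"})

# Two instances read the same version, then both try to write.
version_a, _ = store.read("user-1")
version_b, _ = store.read("user-1")

first = handle_update(store, "user-1", version_a, {"role": "admin"})
second = handle_update(store, "user-1", version_b, {"role": "editor"})
print(first, second)
```

The second writer loses because its copy is stale; a client that receives 409 should re-read the record and reapply its change.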

One known edge case: the OpAMP enrollment cap-check has a small multi-instance race window. Two instances can each admit a collector against the last available slot, leaving the fleet one over its cap. Single-instance deployments never hit it, and multi-instance deployments hit it only when two enrollments race for the final slot; a hardening fix is planned.

What’s next

Configuration reference — every config field documented
Self-telemetry — what the server pushes about itself
Run a highly-available deployment — the multi-instance topology this MongoDB backend unlocks. This how-to is the lower-level “I’m bringing my own Mongo” path; turn-key Kubernetes manifests and a Helm chart for a fully-tuned HA topology are planned.