Enrollment troubleshooting
If a host doesn’t show up in your fleet after pasting the one-liner, work
through this list top-down. The script prints clear [FAIL] markers — match
the error against the section heading.
“LINKMESH_SERVER env var required” / “LINKMESH_TOKEN env var required”
You ran the script without the env vars. Use sudo -E (preserves env) and set
both:

    curl -fsSL https://linkmesh.io/install.sh | \
      LINKMESH_SERVER=https://your-server.example.com \
      LINKMESH_TOKEN=eyJ... \
      sudo -E sh

If you used plain sudo sh (no -E), the env vars get stripped before the
script sees them.
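To confirm which variable is missing before piping anything into sh, a small pre-flight check can help. This is a minimal sketch; missing_enrol_vars is a hypothetical helper and not part of install.sh.

```sh
# Pre-flight sketch: print the names of required variables that are unset or
# empty. missing_enrol_vars is a hypothetical helper, not part of install.sh.
missing_enrol_vars() {
    missing=""
    [ -n "${LINKMESH_SERVER:-}" ] || missing="$missing LINKMESH_SERVER"
    [ -n "${LINKMESH_TOKEN:-}" ] || missing="$missing LINKMESH_TOKEN"
    # Drop the leading space and print whatever is missing (empty line if nothing).
    printf '%s\n' "${missing# }"
    # Exit status 0 only when both variables are present.
    [ -z "$missing" ]
}
```

Running the same check inside sudo -E sh -c '...' shows whether the variables survive the privilege switch on your host.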
“Enrollment failed”: token expired
Tokens default to a 15-minute lifetime. If the host took longer than that to install the package (slow network, big mirror), the token has already expired.
Fix: mint a new token in the wizard and re-run. The script is idempotent: it won’t reinstall the package, it only re-enrols.
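If you note the time a token was minted, the expiry arithmetic is easy to script. The sketch below assumes you record the mint time yourself as a Unix timestamp; token_expired is a hypothetical helper, and the 900-second default mirrors the 15-minute lifetime described above.

```sh
# token_expired MINTED_EPOCH [NOW_EPOCH] [TTL_SECONDS]
# Hypothetical helper: succeeds (exit 0) when the token is past its lifetime.
token_expired() {
    minted=$1
    now=${2:-$(date +%s)}   # defaults to the current time
    ttl=${3:-900}           # 15 minutes, the default lifetime described above
    [ $(( now - minted )) -ge "$ttl" ]
}

# Example: MINTED was saved with `date +%s` when the wizard issued the token.
# if token_expired "$MINTED"; then echo "mint a new token and re-run"; fi
```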
“Enrollment failed”: server unreachable
The host can’t reach your LinkMesh server. Check:

    curl -v https://your-server.example.com/api/v1/health

Common causes:
- Outbound HTTPS blocked by host firewall
- Internal DNS doesn’t resolve the server hostname from this host
- Server is behind a VPN the host isn’t on
For air-gapped fleets, use the server-hosted install variant documented at linkmesh.io/install instead. With it, the agent host only needs to reach the LinkMesh server, not the public internet.
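curl's exit status already narrows down which of the causes above applies. The sketch below maps the standard curl exit codes (6, 7, 22, 28) to those causes; the function names are hypothetical and the wording of each hint is a suggestion, not LinkMesh output.

```sh
# Translate a curl exit status into the likely cause from the list above.
explain_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        6)  echo "DNS: hostname does not resolve from this host" ;;
        7)  echo "connect failed: check host firewall or VPN" ;;
        22) echo "server answered with an HTTP error" ;;
        28) echo "timed out: firewall dropping packets or VPN not connected" ;;
        *)  echo "curl exit $1: see EXIT CODES in the curl man page" ;;
    esac
}

# Probe the health endpoint of the server URL passed as $1 and explain the result.
check_server() {
    rc=0
    # -f: fail on HTTP errors; -sS: quiet but keep error text; bounded wait.
    curl -fsS -o /dev/null --max-time 10 "$1/api/v1/health" || rc=$?
    explain_curl_exit "$rc"
}

# Example: check_server https://your-server.example.com
```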
“Enrollment failed”: invalid token (410 Gone)
Token was already used. Each token enrols exactly one collector. Mint a fresh one for the next host.
If you didn’t intentionally use it twice, check the audit log — someone or something has already redeemed this token.
“linkmesh-agent restarted” but collector doesn’t appear in the UI
Wait 30 seconds. The agent’s first heartbeat to the server takes up to a heartbeat interval to land. If it’s still missing after 60s:

    journalctl -u linkmesh-agent -f

Common failure modes the journal will reveal:
- Clock skew: host clock is more than ~5 minutes off from server. mTLS cert validation rejects the connection. Fix with timedatectl set-ntp true.
- gRPC port 50051 blocked: agent uses gRPC on :50051 to talk to the server. Check nc -vz your-server.example.com 50051 from the host.
- TLS cert mismatch: the journal shows x509: certificate is valid for …, not <addr>. The server’s gRPC cert doesn’t cover the address the agent dials. See “tls: failed to verify certificate” just below for the fix.
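The clock-skew case can be checked without reading the journal by comparing the local clock with the Date header the server returns. This is a sketch under assumptions: the helper names are hypothetical, server_epoch relies on GNU date (-d) and on the health endpoint answering HEAD requests, and the 300-second limit stands in for the "~5 minutes" above.

```sh
# Absolute difference between two Unix timestamps, in seconds.
skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then diff=$(( 0 - diff )); fi
    echo "$diff"
}

# Succeeds when the skew is within the limit (default 300 s, roughly 5 minutes).
skew_ok() {
    [ "$(skew_seconds "$1" "$2")" -le "${3:-300}" ]
}

# Server time taken from the HTTP Date header of the health endpoint.
# Assumes GNU date (-d) and that the endpoint answers HEAD requests.
server_epoch() {
    hdr=$(curl -sI "$1/api/v1/health" | tr -d '\r' |
        awk 'tolower($1) == "date:" { sub(/^[^:]*: */, ""); print }')
    date -d "$hdr" +%s
}

# Example:
# skew_ok "$(date +%s)" "$(server_epoch https://your-server.example.com)" ||
#     echo "clock skew too large; try: timedatectl set-ntp true"
```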
“tls: failed to verify certificate” / “certificate is valid for … not …”
The agent’s journal repeats a line like:

    Stream connect failed: rpc error: ... tls: failed to verify certificate: x509: certificate is valid for 10.0.1.3, 127.0.0.1, ::1, not 35.230.148.135

The agent reached the server and trusts its CA, but the server’s gRPC
certificate has no SAN for the address the agent connects to
(35.230.148.135 above). The agent verifies the server certificate strictly by
default, so the right fix is on the server, and it takes ~30 seconds.
Why it happens: install auto-detects SANs from the server’s hostname and its local network-interface IPs. On a cloud VM the external IP is a 1:1 NAT that never appears on the VM’s interface, so it can’t be auto-detected — you reach the box on an address its own cert doesn’t know about.
Fix — on the server, add the address agents use as a SAN. Cert subcommands
need exclusive database access, so stop the service first, and run as the
linkmesh user so the service can read the new key:

    sudo systemctl stop linkmesh-server
    # a DNS name (preferred; survives IP changes):
    sudo runuser -u linkmesh -- linkmesh-server cert init --san dns:linkmesh.example.com
    # …or a bare IP, matching the "not <addr>" in the error:
    sudo runuser -u linkmesh -- linkmesh-server cert init --san ip:35.230.148.135
    sudo systemctl start linkmesh-server

cert init keeps the auto-detected local SANs and merges in the ones you pass.
Confirm with linkmesh-server cert show, then watch the agent reconnect on its
own backoff; no action is needed on the agent host.
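Picking the right --san value amounts to reading the address after "not" in the journal line. The sketch below automates that step; san_arg_from_error is a hypothetical helper, it assumes the address is the last thing on the line, and treating an IPv6 literal as ip: is an assumption about what cert init accepts.

```sh
# Derive a --san argument from the agent's x509 error line.
# Hypothetical helper; assumes the offending address ends the line.
san_arg_from_error() {
    # Keep only what follows the final ", not " in the line.
    addr=${1##*, not }
    # No ", not " present: nothing to extract.
    if [ "$addr" = "$1" ]; then return 1; fi
    case "$addr" in
        *:*)       echo "ip:$addr" ;;    # IPv6 literal (assumed to be accepted)
        *[!0-9.]*) echo "dns:$addr" ;;   # contains letters or hyphens: a hostname
        *)         echo "ip:$addr" ;;    # digits and dots only: IPv4
    esac
}

# Example:
# san_arg_from_error "x509: certificate is valid for 10.0.1.3, 127.0.0.1, ::1, not 35.230.148.135"
```

The printed value is what you would pass after --san in the cert init command above.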
Throwaway lab and can’t touch the server right now? The agent has an escape
hatch — certificates.insecureSkipVerify: true in
/etc/linkmesh-agent/config.yaml — that connects without verifying the server
cert. It disables MITM protection on the control plane and the agent will nag
with a WARN on every connect, so treat it as dev-only and never ship it. The
fix above is the right one for anything real; details under the agent
config.yaml entry in the configuration reference.
“Two operators using the same token”
The first one wins. The second gets 410 Gone. The second operator should
mint their own token.
If you’re scripting bulk enrolment via Ansible/Terraform, mint one token per host inside the loop — don’t share.
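The per-host pattern looks roughly like the loop below. It is a sketch only: mint_token and enrol_host are hypothetical stubs standing in for however you mint tokens and run the one-liner on each host; neither is a LinkMesh command.

```sh
# Hypothetical stubs: replace with your own token-minting call and remote runner.
mint_token() { echo "token-for-$1"; }
enrol_host() { echo "enrol $1 with $2"; }

# Enrol every host given as an argument, minting a fresh token per host.
enrol_all() {
    for host in "$@"; do
        # Mint inside the loop so no two hosts ever share a token.
        token=$(mint_token "$host") || return 1
        enrol_host "$host" "$token" || echo "enrolment failed for $host" >&2
    done
}

# Example: enrol_all web-1 web-2 db-1
```

Minting outside the loop and reusing the value would reproduce the 410 Gone failure on every host after the first.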
Still stuck?
- Check the server’s audit log under Settings → Audit Log for the bootstrap attempt — server-side error messages are richer than what the script can surface to a piped curl.
- Open the relevant journal: journalctl -u alloy -f (or -u otelcol-contrib) for the collector runtime, journalctl -u linkmesh-agent -f for the management client.