Operations & Access
How to check the system's health, get access, and recover from common failures. Adapted from the operational runbook — verify specifics against current infrastructure.
Quick health checks
The fastest ways to tell if the platform is healthy:
- Status dashboard: https://status.macaronikid.com/ reports the three layers (front end, API, admin) at a glance.
- Public pages: load national.macaronikid.com (home) and national.macaronikid.com/articles (slightly more DB-dependent); slow loads hint at database strain.
- Direct API calls: these should return structured data instantly. Reload each a few times, since the API is load-balanced and successive requests can land on different nodes:
  api.macaronikid.com/api/v1/town/data/national
  api.macaronikid.com/api/v1/towns/locations
- Sentry: application errors and performance regressions surface here.
- Linode & CloudFlare dashboards: Linode for per-server CPU/network graphs; CloudFlare for traffic/DDoS signals.
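The direct API checks can be scripted. This is a minimal sketch that uses only Python's standard library and assumes the endpoints are served over HTTPS and answer with JSON. The https:// scheme, the three attempts per URL, and the 2-second "slow" cutoff are assumptions, not documented values.

```python
import json
import time
import urllib.request
from typing import Callable, List, Tuple

# Endpoints from the health-check list; the https:// scheme is an assumption.
API_URLS = [
    "https://api.macaronikid.com/api/v1/town/data/national",
    "https://api.macaronikid.com/api/v1/towns/locations",
]
SLOW_SECONDS = 2.0  # hypothetical cutoff for "not instant"


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Return (HTTP status, body) for one GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def probe(url: str, attempts: int = 3,
          get: Callable[[str], Tuple[int, bytes]] = fetch) -> List[str]:
    """Hit a load-balanced URL several times; return one problem string per bad attempt."""
    problems = []
    for i in range(1, attempts + 1):
        start = time.monotonic()
        try:
            status, body = get(url)
            json.loads(body)  # "structured data": assume the API answers with JSON
        except Exception as exc:  # network error, HTTP error status, or non-JSON body
            problems.append(f"attempt {i}: {type(exc).__name__}: {exc}")
            continue
        elapsed = time.monotonic() - start
        if status != 200:
            problems.append(f"attempt {i}: HTTP {status}")
        elif elapsed > SLOW_SECONDS:
            problems.append(f"attempt {i}: slow ({elapsed:.1f}s)")
    return problems


if __name__ == "__main__":
    for url in API_URLS:
        issues = probe(url)
        print("OK  " if not issues else "FAIL", url)
        for issue in issues:
            print("     ", issue)
```

Because each attempt may reach a different API node, a single failing attempt out of three is itself a useful signal that one node is unhealthy.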
Getting SSH access
Developer access is granted by adding the developer's public key to each server. In brief:
- Generate a key pair (ssh-keygen on Linux/macOS; PuTTYgen on Windows).
- From the Linode Cloud terminal, sign into each server and append the public key to ~/.ssh/authorized_keys.
- Access is needed per server for SSH, SFTP, and Mongo administration.
https://cloud.linode.com/ — the console for all servers (API nodes, Mongo replica set, web). Individual servers expose CPU/network graphs used for the checks below.
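Appending the key by hand works, but a small script avoids duplicate entries and wrong permissions. This is a minimal sketch. It assumes it runs on the server as the user you're granting access to, and that the key is in OpenSSH format. In PuTTYgen, copy the key from the "Public key for pasting into OpenSSH authorized_keys file" box rather than using the saved public-key file. The 0700/0600 modes follow the usual OpenSSH convention; this runbook does not specify them.

```python
import sys
from pathlib import Path
from typing import Optional

# OpenSSH public-key lines start with the key type, e.g. "ssh-ed25519 AAAA... comment".
KEY_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def add_authorized_key(pubkey: str, home: Optional[Path] = None) -> bool:
    """Append one public key to ~/.ssh/authorized_keys if it isn't already there.

    Returns True if the key was added, False if it was already present.
    """
    key = pubkey.strip()
    if not key.startswith(KEY_PREFIXES):
        raise ValueError("not an OpenSSH-format public key line")
    ssh_dir = (home or Path.home()) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    ssh_dir.chmod(0o700)  # sshd's StrictModes check dislikes loosely permissioned dirs
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False  # idempotent: running twice doesn't duplicate the key
    with auth.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # don't glue the new key onto an unterminated last line
        fh.write(key + "\n")
    auth.chmod(0o600)
    return True


if __name__ == "__main__":
    # Usage on the server: python3 add_key.py < id_ed25519.pub
    print("added" if add_authorized_key(sys.stdin.read()) else "already present")
```

Repeat this on every server the developer needs to reach, since access is granted per server.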
Recovery runbook
API instance unresponsive
If performance degrades (often most visible in the admin panel), an API instance may have crashed without recovering. The production API nodes are api-1, api-2, and api-4 (PM2, behind a NodeBalancer). SSH into each and use PM2:
pm2 monit # inspect per-instance CPU/memory/status
pm2 restart API # restart the cluster on that node
Stagger the restarts a few minutes apart so the other nodes keep serving traffic through the NodeBalancer. Right after a restart you may briefly see Mongo connection errors in the console. These should stop shortly, and a steady stream of green lines means the node has recovered. Re-check the API URLs above afterward.
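The staggered restart can be driven from a workstation. This is a minimal sketch that runs the runbook's own `pm2 restart API` on each PM2 node in turn. It assumes the node names api-1, api-2, and api-4 resolve as SSH hosts (for example via ~/.ssh/config) and that a three-minute gap counts as "a few minutes". Both are assumptions.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# PM2-managed production API nodes; api-3 is Coolify-managed and deliberately excluded.
API_NODES = ["api-1", "api-2", "api-4"]
GAP_SECONDS = 180  # assumption: "a few minutes" between nodes


def ssh(host: str, command: str) -> int:
    """Run one remote command over ssh and return its exit code."""
    return subprocess.run(["ssh", host, command]).returncode


def rolling_restart(nodes: Sequence[str] = API_NODES,
                    gap: float = GAP_SECONDS,
                    run: Callable[[str, str], int] = ssh,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Restart the PM2 'API' app on each node in turn, pausing between nodes.

    Stops at the first failure so a bad restart isn't repeated on every node.
    Returns the nodes that were restarted successfully.
    """
    done: List[str] = []
    for i, host in enumerate(nodes):
        if i > 0:
            sleep(gap)  # give the previous node time to recover behind the NodeBalancer
        if run(host, "pm2 restart API") != 0:
            raise RuntimeError(f"pm2 restart failed on {host}; restarted so far: {done}")
        done.append(host)
    return done


if __name__ == "__main__":
    print("restarted:", rolling_restart())
```

The script does not watch the console for the green lines; checking each node's logs, and the API URLs, between steps is still a manual judgment call.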
api-3 runs the API worker and is managed by Coolify rather than PM2, so pm2 commands won't control it; manage it through Coolify instead (see Environments & Deployment).
Database overload
Check the Mongo servers' CPU graphs in the Linode dashboard (mongo-1 / mongo-2 / mongo-3). Healthy CPU is roughly 15% or below; sustained higher load may warrant a restart. mongo-2 has historically run hot and often acts as primary.
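The "roughly 15% or below" rule can be made concrete. This sketch assumes you already have a node's recent CPU percentages, for example read off its Linode graph, listed oldest first. The definition of "sustained" used here (the last six samples all above the threshold) is a hypothetical choice, not a documented one.

```python
import sys
from typing import Sequence

HEALTHY_CPU_PCT = 15.0  # from the runbook: healthy is roughly 15% or below
SUSTAINED_SAMPLES = 6   # hypothetical: consecutive samples that count as "sustained"


def cpu_verdict(samples: Sequence[float],
                threshold: float = HEALTHY_CPU_PCT,
                window: int = SUSTAINED_SAMPLES) -> str:
    """Classify a Mongo node from its most recent CPU percentages (oldest first)."""
    recent = list(samples)[-window:]
    if len(recent) < window:
        return "insufficient data"
    if all(pct > threshold for pct in recent):
        return "sustained high load: consider a restart"
    if any(pct > threshold for pct in recent):
        return "spiky: keep watching"
    return "healthy"


if __name__ == "__main__":
    # Usage: python3 cpu_check.py <pct> <pct> ... (oldest first)
    print(cpu_verdict([float(x) for x in sys.argv[1:]]))
```

Given mongo-2's history of running hot, a "spiky" verdict there is less alarming than the same verdict on mongo-1 or mongo-3.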
A Mongo node has crashed
If a node's CPU graph flatlines or cuts off, the node has likely crashed. Linode sends an email on a hard crash, but not when a node hangs. To recover:
- Do not restart Mongo directly; reboot the server (shutdown -r now). The node should rejoin the replica set as SECONDARY.
- At a stable moment, force the PRIMARY back to mongo-1 (the backed-up node).
- API instances obtain their DB connection at startup and won't reconnect automatically if a Mongo node drops, so after restoring Mongo, run pm2 restart API on each API node.
Deployment notes
- API: PM2 deploy pulls origin/master and restarts the cluster. When updating manually, SSH into each API node one at a time.
- web2: image build via GitHub Actions; the legacy git-deploy.php webhook ("Git Deployment Hamster") pulls code on push and has a deployment-status URL to confirm success.
- admin panel: image build via GitHub Actions.