OpenReplay on Hetzner
Self-hosted OpenReplay running on a single Hetzner Cloud VM. The VM userData runs the documented openreplay -i $DOMAIN installer (which sets up K3s with embedded containerd, plus helm/templater/kubectl as standalone binaries — no Docker on the host).
Architecture
Two-domain split:
- openreplay.studyflash.dev: admin dashboard. CF-proxied A record at the VM's public IP. Inherits the existing *.studyflash.dev Cloudflare Access wildcard, so the dashboard requires SSO.
- or.studyflash.com: tracker ingest endpoint. Bound via WorkersCustomDomain to a Cloudflare Worker (openreplay-ingest) whose source (scripts/ingest-worker.js) reverse-proxies every path to https://openreplay.studyflash.dev/<same-path>. A path-scoped CF Access bypass on openreplay.studyflash.dev/ingest/* lets the Worker's tracker payloads through without an Access challenge. The host is neutral so that ad-blocker rules pattern-matching openreplay.* don't break the tracker.
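The reverse-proxy mapping described above can be sketched as a pure URL rewrite. This is a minimal illustration, not the contents of scripts/ingest-worker.js: the function name rewriteToOrigin and the ORIGIN_HOST constant are hypothetical, and forwarding the query string unchanged is an assumption.

```typescript
// Hypothetical sketch of the ingest Worker's URL mapping;
// this is NOT the real scripts/ingest-worker.js.
const ORIGIN_HOST = "openreplay.studyflash.dev"; // dashboard origin named in this README

// Map an incoming tracker URL to the same path on the origin host.
// Assumption: the query string is forwarded unchanged.
function rewriteToOrigin(requestUrl: string): string {
  const url = new URL(requestUrl);
  url.protocol = "https:";
  url.hostname = ORIGIN_HOST;
  url.port = ""; // always use the default HTTPS port
  return url.toString();
}
```

In a Worker, the fetch handler would presumably pass rewriteToOrigin(request.url) to fetch along with the original method, headers, and body.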
TLS:
- User TLS terminates at the Cloudflare edge with CF's Universal SSL cert (browsers see studyflash.dev).
- CF→origin uses Full (Strict) mode and validates the Cloudflare Origin CA cert the VM presents (15-year validity).
- Origin CA cert + key are issued out-of-band (see TLS section below) and stored base64-encoded in Infisical. Pulumi reads them at pulumi up time, decodes them, marks them as secret, and bakes the PEMs into the VM's userData (encrypted in Pulumi state under GCP KMS). bootstrap.sh writes them into the openreplay-ssl Secret in the app namespace before openreplay -i runs, so the OpenReplay Ingress finds the cert immediately on first install. No cert-manager, no Let's Encrypt, no DNS-01.
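The read-and-decode step can be sketched as follows. This is an illustration under assumptions, not the stack's actual code: the helper name decodePemFromEnv is hypothetical, and it assumes the secrets reach the process as environment variables via infisical run. In a Pulumi program the returned string would then be wrapped with pulumi.secret(...) before being embedded in userData.

```typescript
// Hypothetical helper: read a base64-encoded PEM from the environment and decode it.
// Assumption: Infisical injects the secrets as environment variables.
type Env = Record<string, string | undefined>;

function decodePemFromEnv(env: Env, name: string): string {
  const b64 = env[name];
  if (!b64) {
    // Fail loudly so a missing secret stops provisioning early.
    throw new Error(name + " is not set; issue the Origin CA cert and store it in Infisical first");
  }
  const pem = Buffer.from(b64, "base64").toString("utf8");
  if (pem.indexOf("-----BEGIN ") === -1) {
    // Cheap sanity check only; this does not validate the certificate itself.
    throw new Error(name + " does not decode to a PEM block");
  }
  return pem;
}
```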
Network lockdown:
- Hetzner firewall: 22 (SSH) and ICMP open to the world; 443 locked to Cloudflare's published IP ranges (cloudflare.getIpRangesOutput()); port 80 closed entirely. On 443, the origin does not answer direct IP probes from non-CF addresses; it still responds to SSH and ICMP.
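As a rough sketch, the rule set above could be built like this. The FirewallRule interface is a local type modeled on the Pulumi hcloud provider's firewall rule arguments (an assumption, not an import), and buildFirewallRules is a hypothetical helper; in the real stack the CIDR list would come from cloudflare.getIpRangesOutput().

```typescript
// Local rule shape, modeled on the hcloud provider's firewall rule arguments (assumption).
interface FirewallRule {
  direction: "in";
  protocol: "tcp" | "icmp";
  port?: string;
  sourceIps: string[];
}

const ANYWHERE = ["0.0.0.0/0", "::/0"];

// Build inbound rules: SSH and ICMP from anywhere, HTTPS from Cloudflare only.
// There is deliberately no rule for port 80, so it stays closed.
function buildFirewallRules(cloudflareCidrs: string[]): FirewallRule[] {
  return [
    { direction: "in", protocol: "tcp", port: "22", sourceIps: ANYWHERE },
    { direction: "in", protocol: "icmp", sourceIps: ANYWHERE },
    { direction: "in", protocol: "tcp", port: "443", sourceIps: cloudflareCidrs.slice() },
  ];
}
```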
Tracker init in client apps:
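import OpenReplay from "@openreplay/tracker";

const tracker = new OpenReplay({
  projectKey: "...",
  // Point the tracker at the neutral ingest host, not the dashboard domain.
  ingestPoint: "https://or.studyflash.com/ingest",
});
tracker.start();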
Stack config (Pulumi.<stack>.yaml)
| Key | Default | Notes |
|---|---|---|
| studyflash-openreplay:domain | openreplay.studyflash.dev | Dashboard hostname. CF-proxied A record, Origin CA cert on the VM. |
| studyflash-openreplay:ingestDomain | or.studyflash.com | Tracker ingest hostname. Bound to the openreplay-ingest Worker via WorkersCustomDomain on the studyflash.com zone. |
| studyflash-openreplay:serverType | ccx23 | 4 vCPU dedicated / 16 GB / 160 GB. Bundled disk is below the 240 GB hard minimum, so a Volume is attached (see below). |
| studyflash-openreplay:location | nbg1 | Hetzner DC. Volume is created in the same location. |
| studyflash-openreplay:dataVolumeSize | 100 | GB. Attached as ext4 and bind-mounted at /var/lib/rancher + /var/lib/openreplay so K3s PVCs (replays, MinIO, ClickHouse) land on the volume. 160 + 100 = 260 GB total, comfortably above 240. |
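To illustrate how the defaults in the table combine with per-stack overrides, here is a minimal sketch. The StackConfig type and resolveConfig helper are hypothetical; the actual program presumably reads these keys through pulumi.Config rather than a plain object.

```typescript
// Defaults copied from the table; StackConfig and resolveConfig are hypothetical names.
interface StackConfig {
  domain: string;
  ingestDomain: string;
  serverType: string;
  location: string;
  dataVolumeSize: number; // GB
}

const DEFAULTS: StackConfig = {
  domain: "openreplay.studyflash.dev",
  ingestDomain: "or.studyflash.com",
  serverType: "ccx23",
  location: "nbg1",
  dataVolumeSize: 100,
};

// Merge per-stack overrides over the defaults, rejecting a non-positive volume size.
function resolveConfig(overrides: Partial<StackConfig> = {}): StackConfig {
  const config = { ...DEFAULTS, ...overrides };
  if (!(config.dataVolumeSize > 0)) {
    throw new Error("dataVolumeSize must be a positive number of GB");
  }
  return config;
}
```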
Secrets / Infisical
Read from Infisical at /infra/openreplay/:
- HCLOUD_TOKEN: Hetzner Cloud API token
- CLOUDFLARE_API_TOKEN: Cloudflare API token (DNS, Workers, Access apps, Rulesets, Worker custom domains, all on the studyflash account)
- PULUMI_BACKEND_URL: R2 backend URL (S3-compatible)
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY: R2 credentials for the Pulumi backend
- OPENREPLAY_ORIGIN_CERT_B64: base64-encoded Origin CA cert PEM
- OPENREPLAY_ORIGIN_KEY_B64: base64-encoded private key PEM matching the cert
GCP credentials for the KMS secrets provider come from the operator's local
gcloud auth application-default login — not Infisical. Pulumi state lives on
the shared R2 backend; secrets are encrypted with the same pulumi-state GCP
KMS key used by the other infra stacks.
Volume protection
The data volume has both protect: true (Pulumi-side) and deleteProtection: true (Hetzner API). Replacing only the Server (pulumi up --replace <server-urn>) is fine: the new VM mounts the volume and K3s resumes from the existing data dir. Replacing the Volume requires lifting both protections manually first; doing so destroys all OpenReplay data (Postgres, ClickHouse, MinIO, replays).
Hetzner does not have automatic volume backups (unlike AWS EBS); for real
durability we should layer on app-level backups (pg_dump → R2,
mc mirror → R2, clickhouse-backup) as a follow-up.
First run
cd infra/openreplay
pnpm install
# Log in to the R2 Pulumi backend (Infisical injects PULUMI_BACKEND_URL)
infisical run --env prod --path /infra/openreplay/ -- pulumi login "$PULUMI_BACKEND_URL"
# Initialize the prod stack against the project's GCP KMS secrets provider.
# This populates the `encryptedkey` field in Pulumi.prod.yaml automatically.
infisical run --env prod --path /infra/openreplay/ -- \
pulumi stack init prod \
--secrets-provider="gcpkms://projects/studyflash-security/locations/europe-west6/keyRings/pulumi-state/cryptoKeys/pulumi-state"
# Verify types compile
pnpm typecheck
# Provision (will fail loudly if OPENREPLAY_ORIGIN_CERT_B64/KEY_B64 aren't in
# Infisical — see "TLS" section for how to issue + populate them).
pnpm run pulumi:up
SSH access
The VM ships with Ubuntu's defaults: port 22 open, password auth on, no
static SSH keys baked in. For first login, regenerate the root password
with hcloud server reset-password openreplay (Hetzner returns it
inline) or grab it from the Hetzner Cloud panel. After logging in, add
your own pubkey to /root/.ssh/authorized_keys so subsequent sessions
use key auth.
A short-lived-cert replacement (Infisical SSH or similar) is on the
roadmap but not in place — bootstrap.sh does not configure any SSH CA
or restrict password auth.
TLS
The Origin CA cert + key are issued once out-of-band, stored in Infisical,
and read into Pulumi state as secrets. Pulumi-side issuance via
cloudflare.OriginCaCertificate isn't viable: the CF /certificates
endpoint doesn't accept modern API tokens (you get 1016 regardless of perms),
and the Pulumi cloudflare provider's apiKey field rejects CF's current
cfk_… key format because it validates against the legacy 37-hex-char schema.
Until both sides catch up, we stay out-of-band.
To issue (one-time bootstrap, then never again):
# 1) In Cloudflare dashboard: SSL/TLS → Origin Server → Create Certificate.
# Pick RSA, list "openreplay.studyflash.dev", validity 15 years.
# Copy BOTH the cert PEM and the private key PEM (the key is shown once).
# 2) Base64-encode both (single-line, no quoting issues for Infisical).
#    -w0 disables line wrapping in GNU base64; other base64 builds may need a different flag.
CERT_B64=$(base64 -w0 < cert.pem)
KEY_B64=$(base64 -w0 < key.pem)
# 3) Store in Infisical at /infra/openreplay/:
infisical secrets set \
--projectId 0cfec798-5081-4028-b142-a46080728d1f --env prod --path /infra/openreplay/ \
"OPENREPLAY_ORIGIN_CERT_B64=$CERT_B64" \
"OPENREPLAY_ORIGIN_KEY_B64=$KEY_B64"
Renewal: not a near-term concern, since the Origin CA cert is issued with 15-year validity.
Outputs
- openreplayUrl: https://<domain> (dashboard, behind CF Access)
- ingestUrl: https://<ingestDomain> (tracker ingestPoint)
- vmIp: Hetzner public IPv4 (informational; firewalled to CF only)
Resource sizing
OpenReplay's docs list a 2 vCPU / 8 GB / 50 GB minimum for low-to-moderate
traffic. We default to ccx23 (4 vCPU dedicated / 16 GB / 160 GB) plus a
100 GB data volume — total 260 GB, comfortably above the project's 240 GB
hard minimum. The volume is bind-mounted under /var/lib/rancher and
/var/lib/openreplay before the installer runs, so K3s persistent volumes
(session replays, MinIO, ClickHouse, Postgres) live on the volume rather
than the bundled root disk.
If you want a single-disk setup, switching serverType to cpx41
(8 vCPU shared / 16 GB / 240 GB) and dropping the volume also works.
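The disk arithmetic behind both options can be checked with a small sketch. The disk sizes and the 240 GB threshold are the figures quoted in this README; the helper names are hypothetical.

```typescript
// Bundled disk sizes in GB as quoted in this README; helper names are hypothetical.
const BUNDLED_DISK_GB: Record<string, number> = { ccx23: 160, cpx41: 240 };
const MIN_TOTAL_DISK_GB = 240; // the hard minimum cited in this README

// Total disk available to the VM: bundled root disk plus the attached data volume.
function totalDiskGb(serverType: string, dataVolumeGb: number): number {
  const bundled = BUNDLED_DISK_GB[serverType];
  if (bundled === undefined) {
    throw new Error("unknown server type: " + serverType);
  }
  return bundled + dataVolumeGb;
}

function meetsMinimum(serverType: string, dataVolumeGb: number): boolean {
  return totalDiskGb(serverType, dataVolumeGb) >= MIN_TOTAL_DISK_GB;
}
```

With these figures, ccx23 plus the 100 GB volume gives 260 GB, cpx41 alone lands exactly on 240 GB, and ccx23 without a volume falls short.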
CI
Intentionally not wired into .github/workflows/infra.yml yet — the
auto-up step in that workflow is currently disabled across all stacks, so
adding a job here would be dead weight. Run pnpm run pulumi:up locally for now.