infra/README.md

infra/

Pulumi projects for everything we self-host. One project per stack (infra/<stack>/), each with its own Pulumi.yaml, index.ts, and Pulumi.<env>.yaml. Stacks are independent — they share the R2 state backend and the pulumi-state GCP KMS key for secret encryption, but otherwise don't depend on each other.

Stacks

| Folder | What |
| --- | --- |
| dns/ | Cloudflare zones, DNS records, and redirect rulesets across all studyflash.* domains |
| gatus/ | Gatus uptime monitoring |
| grafana/ | Self-hosted Grafana + Prometheus + Pushgateway |
| kms/ | The shared pulumi-state GCP KMS key all other stacks use as their secrets provider |
| learning-api/ | learning-api hosts: the Hetzner UAT VM, Azure Container Apps for prod, GCP MIGs for parser |
| openreplay/ | Self-hosted OpenReplay on a dedicated Hetzner Cloud VM |
| triton-inference/ | Triton Inference Server + vLLM sidecar GPU infra |
| turborepo-cache/ | Turborepo remote cache backend on GCP |
| zero-trust/ | Cloudflare Zero Trust device profiles and WARP split tunnel |

Creating a new stack

# 1. From the folder that will own the stack (under infra/ for shared
#    infra, internal/<service>/ for a service-scoped stack).
cd infra/<stack>   # or: cd internal/<service>

# 2. Init Pulumi.yaml with runtime: nodejs and pnpm as the package
#    manager. The entry file defaults to index.ts; to use another name
#    (e.g. infra.ts), set it in package.json (see step 4), not in
#    Pulumi.yaml. Mirror the description style of sibling stacks so
#    the stack catalog above stays readable.

# 3. Add the stack itself. ALWAYS pass --secrets-provider pointing at
#    the shared GCP KMS key. Never accept the default passphrase
#    provider: it diverges from every other stack and trips local
#    deploys.
AWS_PROFILE=studyflash-pulumi pnpm exec pulumi stack init prod \
  --secrets-provider="gcpkms://projects/studyflash-security/locations/europe-west6/keyRings/pulumi-state/cryptoKeys/pulumi-state"

# 4. Set the package.json entry so Pulumi finds your TS:
#       "main": "infra.ts"
#    Pulumi reads package.json's `main` field, NOT Pulumi.yaml's
#    runtime.options.main. Don't burn an hour on this.

# 5. Wrap apply in `infisical run` so secrets at /infra/<stack>/ (or
#    /internal/<service>/) show up as env vars. Add a `pnpm run up`
#    script in package.json that does the wrapping; do NOT invoke
#    `pulumi up` directly from CI or local shells.

Prerequisites for any new stack contributor: a studyflash-pulumi profile in ~/.aws/credentials (used for the R2 state backend and selected with AWS_PROFILE=studyflash-pulumi), and GCP Application Default Credentials with access to the pulumi-state KMS key (gcloud auth application-default login).
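
Both prerequisites can be checked before the first pulumi call. The following is a minimal preflight sketch, not a script that exists in this repo. The function names are hypothetical, and the ADC path assumes the default gcloud config directory on Linux/macOS.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the repo.

# Succeeds if the AWS shared credentials file ($1) contains a [profile]
# section header matching the profile name ($2).
has_aws_profile() {
  local file="$1" profile="$2"
  [ -f "$file" ] && grep -qxF "[$profile]" "$file"
}

# Succeeds if Application Default Credentials are discoverable: either
# GOOGLE_APPLICATION_CREDENTIALS points at a file, or the file written by
# `gcloud auth application-default login` exists in the gcloud config
# directory ($1, defaulting to the standard location).
has_gcp_adc() {
  local config_dir="${1:-${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}}"
  [ -f "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] ||
    [ -f "$config_dir/application_default_credentials.json" ]
}

# Runs both checks and reports every missing prerequisite on stderr.
preflight() {
  local status=0
  has_aws_profile "$HOME/.aws/credentials" studyflash-pulumi ||
    { echo "missing AWS profile: studyflash-pulumi" >&2; status=1; }
  has_gcp_adc ||
    { echo "missing GCP ADC; run: gcloud auth application-default login" >&2; status=1; }
  return "$status"
}
```

Calling `preflight` returns non-zero if either credential is absent. It only confirms that credentials exist locally; it does not verify that the account can actually use the pulumi-state KMS key.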

Lessons (read before adding a new stack)

Provider sourcing — use Pulumi's TF bridge, never a third-party NPM package

For any vendor where Pulumi doesn't ship an official @pulumi/<vendor> package, generate the SDK locally from the vendor's own Terraform provider:

cd infra/<stack>
pulumi package add terraform-provider <Org>/<provider>

This emits a typed SDK at sdks/<provider>/, importable as @pulumi/<provider>. Trust chain: vendor's TF provider → Pulumi codegen → us. No third-party intermediary. The Pulumi.yaml's packages: block is the lock; anyone running pulumi install regenerates the SDK from there.

Do not depend on community NPM packages of the form pulumi-X (e.g. pulumi-infisical from hckhanh). They look convenient but they add an unrelated single-maintainer dependency on top of the same TF provider Pulumi can codegen for us natively. Past mistake; don't repeat.
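
One way to keep this rule from regressing is a small guard that scans a stack's package.json for unscoped pulumi-X names. This is a sketch under assumptions: the function names are hypothetical, and it relies on the fact that official packages and locally generated SDKs are scoped as @pulumi/<name>, so a quoted key starting with pulumi- is the pattern to flag.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not part of the repo.

# Prints the unscoped pulumi-X keys found in a package.json ($1), one per
# line. Scoped names such as "@pulumi/infisical" do not match because the
# pattern requires pulumi- directly after the opening quote.
community_pulumi_deps() {
  grep -oE '"pulumi-[A-Za-z0-9._-]+"[[:space:]]*:' "$1" |
    sed -E 's/^"([^"]+)".*/\1/' |
    sort -u || true
}

# Returns non-zero and lists the offenders when any such key is present.
check_no_community_pulumi_deps() {
  local found
  found="$(community_pulumi_deps "$1")"
  if [ -n "$found" ]; then
    echo "community Pulumi packages found in $1:" >&2
    echo "$found" >&2
    return 1
  fi
}
```

For example, `check_no_community_pulumi_deps infra/<stack>/package.json` could run in CI or a pre-commit hook. The match is purely textual, so any key named pulumi-something outside the dependency blocks would also be flagged.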

Mark Pulumi-emitted Infisical secrets

Tag every infisical.Secret resource with the pulumi-managed Infisical tag (declare an infisical.SecretTag resource once per stack and reuse its id via tagIds). Hand-edited values stay untagged; the UI then distinguishes them at a glance.

Keep deploy auth out of Pulumi.yaml

Deploy-time creds (PULUMI_BACKEND_URL, R2 keys, provider tokens, …) live in Infisical, never in committed config. The canonical pnpm run up shape wraps the apply in infisical run --path /infra/<stack>/, so static secrets at that path show up as env vars to Pulumi. Cross-folder references (${prod.<other-path>.<KEY>}) resolve client-side and let one folder reference another without duplicating values.
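
The wrapping described above might look roughly like the sketch below. The function name, the DRY_RUN switch, and the assumption that the Infisical environment slug matches the Pulumi stack name (prod) are all hypothetical; the real pnpm run up script in each stack may differ.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; name, DRY_RUN switch, and env/stack defaults are
# assumptions, not the repo's actual script.

# Runs `pulumi up` for a stack folder ($1) inside `infisical run`, so the
# secrets at /infra/<stack>/ are exported as env vars to Pulumi.
# $2 is the environment (default: prod), used for both the Infisical
# environment and the Pulumi stack name.
# With DRY_RUN set, the command is printed instead of executed.
run_up() {
  local stack="$1" env="${2:-prod}"
  set -- infisical run --env="$env" --path="/infra/${stack}/" -- \
    pulumi up --stack "$env"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 run_up dns` prints the command without touching Infisical or Pulumi, which is a cheap way to confirm the path before a real apply.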

SSH

Use Infisical to broker SSH connections via Dynamic Secrets — short-lived certs minted by packages/devtools/lease-ssh-cert.ts. The Hetzner VMs provisioned by learning-api and openreplay are the typical targets.