Got things working. Added CLAUDE instructions

2026-03-14 13:48:23 -04:00
parent 2ab06e86f8
commit 1622177ba0
17 changed files with 1190 additions and 80 deletions

# Session Handoff: Appwrite Stack Setup & Infrastructure Hardening
**Date:** 2026-03-14

**Session Duration:** ~3 hours

**Session Focus:** Bring the Appwrite stack to a production-ready state on bab1.mgmt.toal.ca: env templating, systemd, secrets, networking, monitoring, and ansible-navigator fixes

**Context Usage at Handoff:** ~64%

---
## What Was Accomplished
1. Created Appwrite `.env` Jinja2 template → `playbooks/templates/appwrite.env.j2`
2. Created systemd unit template → `playbooks/templates/appwrite.service.j2`
3. Created Prometheus node exporter playbook → `playbooks/install_node_exporter.yml`
4. Moved all non-secret Appwrite vars to inventory → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/appwrite.yml`
5. Created monitoring vars → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/monitoring.yml`
6. Created secrets file using HashiCorp Vault lookups → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/secrets.yml`
7. Rewrote `playbooks/install_appwrite.yml` — added .env deploy, systemd, tags (`deps`/`image`/`configure`), handler, production compose URL
8. Heavily extended `playbooks/tasks/patch_appwrite_compose.yml` — Traefik pin, image fix, forwardedHeaders, proxyProtocol, handler notifications
9. Added docker prune after upgrade → `playbooks/upgrade_appwrite.yml`
10. Added `community.hashi_vault` to `requirements.yml`
11. Fixed ansible-navigator pipelining — moved config to `environment-variables.set` in `~/.ansible-navigator.yml`; also added SSH multiplexing, fact caching, retry file suppression, profile_tasks via CALLBACKS_ENABLED
12. Deleted `secrets.yml.example` — contained plaintext secrets (security risk)
---
## Exact State of Work in Progress
- **503 from appwrite.toal.ca**: proxyProtocol patch added to `patch_appwrite_compose.yml` but **not yet re-run against the host**. The appwrite stack on bab1 is still running the old compose without `proxyProtocol.trustedIPs`. Next action: run `ansible-navigator run playbooks/install_appwrite.yml --mode stdout --skip-tags deps,image` to apply patches and restart.
- **Vault secret not populated**: `kv/oys/bab-appwrite` in HashiCorp Vault (http://nas.lan.toal.ca:8200) has not been populated, so the lookups in `secrets.yml` will fail until it is. (Resolved later on 2026-03-14: populated by hand; see Open Questions.)
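
The first item above depends on a tag-and-handler layout in which `--skip-tags deps,image` still re-renders the configuration and restarts the stack. The excerpt below is a hypothetical sketch in the style of `install_appwrite.yml`, not its actual contents. The install directory `/opt/appwrite`, the handler name `Restart appwrite`, and the exact task split are assumptions. `--skip-tags` skips only the listed tags, so every `configure` task (and any untagged task) still runs, and any number of notifications produce a single restart.

```yaml
# Hypothetical excerpt; /opt/appwrite and the handler name are assumptions.
- name: Install and configure Appwrite (sketch)
  hosts: bab1.mgmt.toal.ca
  become: true
  vars:
    appwrite_dir: /opt/appwrite  # hypothetical install directory
  tasks:
    # Tasks tagged `deps` (Docker packages) and `image` (compose download,
    # image pulls) would sit here; --skip-tags deps,image skips only those.

    - name: Render .env from the Jinja2 template
      ansible.builtin.template:
        src: templates/appwrite.env.j2
        dest: "{{ appwrite_dir }}/.env"
        mode: "0600"  # holds the vault_appwrite_* secrets
      notify: Restart appwrite
      tags: [configure]

    - name: Install the systemd unit
      ansible.builtin.template:
        src: templates/appwrite.service.j2
        dest: /etc/systemd/system/appwrite.service
        mode: "0644"
      notify: Restart appwrite
      tags: [configure]

    - name: Enable the unit at boot
      ansible.builtin.systemd_service:
        name: appwrite.service
        enabled: true
        daemon_reload: true
      tags: [configure]

  handlers:
    # Runs once after the play's tasks, however many tasks notified it.
    - name: Restart appwrite
      ansible.builtin.systemd_service:
        name: appwrite.service
        state: restarted
        daemon_reload: true
```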
---
## Decisions Made This Session
- **DECISION: HashiCorp Vault over ansible-vault for secrets** BECAUSE AAP and dev workflows both need access, while 1Password-based ansible-vault is local-only. Vault API path: `kv/data/oys/bab-appwrite` (KV v2 mount `kv`; the equivalent CLI path is `kv/oys/bab-appwrite`). All secrets are stored as fields in one KV secret. STATUS: confirmed.
- **DECISION: appwrite.io/install/compose as compose source** BECAUSE the GitHub raw URL pointed to a dev compose (image: appwrite-dev, custom entrypoint: `php -e app/http.php`) that fails with the official image. STATUS: confirmed.
- **DECISION: Traefik pinned to 2.11.31** BECAUSE traefik:2.11 (floating tag) is incompatible with Docker Engine >= 29. STATUS: confirmed.
- **DECISION: systemd Type=oneshot RemainAfterExit=yes** BECAUSE `docker compose up -d` exits after starting containers; oneshot keeps the unit in "active" state. STATUS: confirmed.
- **DECISION: node exporter uses security_opts label=disable** BECAUSE on RHEL 9 with SELinux enforcing, `:z` on a `/` bind-mount would recursively relabel the entire filesystem. label=disable avoids this for a read-only mount. STATUS: confirmed.
- **DECISION: ANSIBLE_VAULT_IDENTITY_LIST moved to navigator set env vars** BECAUSE `ansible.config.path` does not auto-mount the file into the EE — the path is set via ANSIBLE_CONFIG env var but the file is never present at that path inside the container. STATUS: confirmed.
- **DECISION: profile_tasks via ANSIBLE_CALLBACKS_ENABLED, not ANSIBLE_STDOUT_CALLBACK** BECAUSE profile_tasks is an aggregate callback, not a stdout callback. Setting it as STDOUT_CALLBACK caused a `'sort_order'` error. STATUS: confirmed.
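
To make the `Type=oneshot` plus `RemainAfterExit=yes` decision concrete, the sketch below inlines an illustrative unit into an Ansible task. The repo renders the real unit from `playbooks/templates/appwrite.service.j2`. The working directory `/opt/appwrite` and the `/usr/bin/docker` path are assumptions.

```yaml
# Illustrative only: the real unit comes from playbooks/templates/appwrite.service.j2.
# /opt/appwrite and /usr/bin/docker are assumptions.
- name: Install appwrite.service (inline sketch)
  ansible.builtin.copy:
    dest: /etc/systemd/system/appwrite.service
    mode: "0644"
    content: |
      [Unit]
      Description=Appwrite stack (docker compose)
      Requires=docker.service
      After=docker.service network-online.target
      Wants=network-online.target

      [Service]
      # oneshot: `docker compose up -d` starts the containers and then exits.
      Type=oneshot
      # Keep the unit "active" after ExecStart has exited.
      RemainAfterExit=yes
      WorkingDirectory=/opt/appwrite
      ExecStart=/usr/bin/docker compose up -d --remove-orphans
      ExecStop=/usr/bin/docker compose down

      [Install]
      WantedBy=multi-user.target
```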
---
## Key Numbers Generated or Discovered This Session
- `appwrite_version: "1.8.1"` — current pinned version in install_appwrite.yml
- `appwrite_traefik_version: "2.11.31"` (minimum Traefik version for Docker Engine >= 29)
- `appwrite_web_port: 8080` — host port mapping for Traefik HTTP
- `appwrite_websecure_port: 8443` — host port mapping for Traefik HTTPS
- `appwrite_traefik_trusted_ips: "192.168.0.0/22"` — HAProxy subnet, used for both forwardedHeaders AND proxyProtocol trustedIPs
- `node_exporter_version: "1.9.0"`, `node_exporter_port: 9100`
- HAProxy backend config: `send-proxy-v2 check-send-proxy` on both `appwrite` and `babdevapi` backends → Traefik MUST have proxyProtocol enabled
- Context at handoff: 128.2k / 200k tokens (64%)
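
Because HAProxy sends `send-proxy-v2`, each Traefik entrypoint needs the HAProxy subnet trusted for both `proxyProtocol` and `forwardedHeaders`. Below is a rough sketch of the patched `traefik` service. It assumes the upstream entrypoint names `appwrite_web` and `appwrite_websecure` (still unverified for the production compose) and container ports 80/443. The tasks in `patch_appwrite_compose.yml` may express this differently.

```yaml
# Sketch of the patched traefik service in docker-compose.yml.
# Entrypoint names and container ports 80/443 are assumptions from the upstream compose.
services:
  traefik:
    image: traefik:2.11.31  # pinned: the floating traefik:2.11 tag broke on Docker Engine >= 29
    ports:
      - "8080:80"   # appwrite_web_port
      - "8443:443"  # appwrite_websecure_port
    command:
      # ...other upstream flags (providers, etc.) omitted
      - --entrypoints.appwrite_web.address=:80
      - --entrypoints.appwrite_websecure.address=:443
      # Accept PROXY v2 headers and X-Forwarded-* only from the HAProxy subnet.
      - --entrypoints.appwrite_web.proxyProtocol.trustedIPs=192.168.0.0/22
      - --entrypoints.appwrite_web.forwardedHeaders.trustedIPs=192.168.0.0/22
      - --entrypoints.appwrite_websecure.proxyProtocol.trustedIPs=192.168.0.0/22
      - --entrypoints.appwrite_websecure.forwardedHeaders.trustedIPs=192.168.0.0/22
```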
---
## Conditional Logic Established
- IF compose source is GitHub raw URL THEN it may be the dev build compose (image: appwrite-dev) BECAUSE Appwrite's main branch docker-compose.yml is for local development
- IF Traefik `proxyProtocol.trustedIPs` is not set THEN HAProxy `send-proxy-v2` causes 503 BECAUSE Traefik reads the PROXY protocol header as malformed HTTP/TLS data
- IF `ansible.config.path` is set in navigator config WITHOUT a volume mount THEN the ansible.cfg settings are silently ignored inside the EE BECAUSE the file is not present at that path in the container
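
Because `ansible.config.path` leaves the file outside the EE, the settings must travel as environment variables. Below is a minimal sketch of `~/.ansible-navigator.yml` along those lines. The values are common choices, not necessarily the ones in the real file. Fact caching is left out, because a cache path inside the EE container does not survive between runs unless it is mounted. `ANSIBLE_VAULT_IDENTITY_LIST` is omitted because its value is not recorded here.

```yaml
---
ansible-navigator:
  execution-environment:
    environment-variables:
      # Injected into the EE container, so no ansible.cfg is needed inside it.
      set:
        ANSIBLE_PIPELINING: "True"
        # SSH multiplexing; each navigator run is a fresh container, so reuse is per run.
        ANSIBLE_SSH_ARGS: "-C -o ControlMaster=auto -o ControlPersist=60s"
        ANSIBLE_RETRY_FILES_ENABLED: "False"
        # Aggregate callback: enable it here, not as the stdout callback.
        # The FQCN assumes ansible.posix is installed in the EE.
        ANSIBLE_CALLBACKS_ENABLED: "ansible.posix.profile_tasks"
```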
---
## Files Created or Modified
| File Path | Action | Description |
|-----------|--------|-------------|
| `playbooks/templates/appwrite.env.j2` | Created | Full Appwrite .env template; secrets use `vault_appwrite_*` vars |
| `playbooks/templates/appwrite.service.j2` | Created | systemd unit, Type=oneshot RemainAfterExit=yes |
| `playbooks/install_appwrite.yml` | Modified | Added .env deploy, systemd, tags, handler, production compose URL |
| `playbooks/tasks/patch_appwrite_compose.yml` | Modified | Added Traefik pin, image fix, forwardedHeaders, proxyProtocol, handler notifications |
| `playbooks/upgrade_appwrite.yml` | Modified | Added docker prune after upgrade |
| `playbooks/install_node_exporter.yml` | Created | Prometheus node exporter; pid_mode=host, label=disable, SYS_TIME cap |
| `requirements.yml` | Modified | Added community.hashi_vault |
| `~/.ansible-navigator.yml` | Modified | Replaced file-mount approach with environment-variables.set; added SSH mux, fact caching |
| `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/appwrite.yml` | Created | All non-secret Appwrite vars |
| `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/monitoring.yml` | Created | node_exporter_version, node_exporter_port |
| `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/secrets.yml` | Modified | HashiCorp Vault lookups for vault_appwrite_* vars |
| `secrets.yml.example` | Deleted | Contained plaintext secrets — security risk |
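
The node exporter settings in the table map onto `community.docker.docker_container` options roughly as follows. This is a sketch, not the contents of `install_node_exporter.yml`. The image name, host networking, the `/host` mount point, and `--path.rootfs` follow the upstream node_exporter Docker guidance and are assumptions here.

```yaml
- name: Deploy Prometheus node exporter (sketch)
  hosts: bab1.mgmt.toal.ca
  become: true
  tasks:
    - name: Run node_exporter container
      community.docker.docker_container:
        name: node_exporter
        image: "quay.io/prometheus/node-exporter:v{{ node_exporter_version }}"
        state: started
        restart_policy: unless-stopped
        network_mode: host   # assumption: listen directly on the host
        pid_mode: host       # see host processes
        capabilities:
          - SYS_TIME
        security_opts:
          # Turn off SELinux label separation instead of relabelling "/" with :z.
          - label=disable
        volumes:
          - "/:/host:ro,rslave"
        command:
          - "--path.rootfs=/host"
          - "--web.listen-address=:{{ node_exporter_port }}"
```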
---
## What the NEXT Session Should Do
1. **First** (done 2026-03-14; see Open Questions): Populate HashiCorp Vault secret at `kv/oys/bab-appwrite` with fields: `openssl_key`, `db_pass`, `db_root_pass`, `smtp_password`, `executor_secret`, `github_client_secret`, `github_webhook_secret`, `github_private_key`
2. **Then**: Run `ansible-navigator run playbooks/install_appwrite.yml --mode stdout --skip-tags deps,image` to apply proxyProtocol patch and restart the Appwrite stack
3. **Then**: Verify `curl -v https://appwrite.toal.ca` no longer returns 503
4. **Then**: Install `community.hashi_vault` in the `ee-demo` execution environment (done 2026-03-14; see Open Questions)
5. **Then**: Run `ansible-navigator run playbooks/install_node_exporter.yml --mode stdout` to deploy node exporter
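
Step 3 can also be scripted so the check runs from the same navigator session. Below is a sketch using `ansible.builtin.uri`. It assumes the EE can reach appwrite.toal.ca. It treats only a 503 or a connection error as a failure, because the expected healthy status code is not recorded here.

```yaml
- name: Check Appwrite through HAProxy (sketch)
  hosts: localhost
  gather_facts: false
  tasks:
    - name: GET https://appwrite.toal.ca without following redirects
      ansible.builtin.uri:
        url: https://appwrite.toal.ca
        follow_redirects: none
      register: appwrite_probe
      # uri reports status -1 when the connection itself fails.
      failed_when: appwrite_probe.status in [503, -1]

    - name: Show the status code
      ansible.builtin.debug:
        var: appwrite_probe.status
```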
---
## Open Questions Requiring User Input
- [x] **Vault secret population**: RESOLVED 2026-03-14 — populated by hand at `kv/oys/bab-appwrite`.
- [x] **`_APP_DOMAIN_TARGET`**: RESOLVED 2026-03-14 — added to `appwrite.env.j2` defaulting to `appwrite_domain`. Fixes `Domain::__construct() null` in console.php:49.
- [x] **community.hashi_vault in EE**: RESOLVED 2026-03-14 — added to `ee-demo` EE image.
- [x] **SSH_AUTH_SOCK not passed to EE**: RESOLVED 2026-03-14 — confirmed working.
---
## Assumptions That Need Validation
- ASSUMED: `appwrite.io/install/compose` returns the production compose for 1.8.x — validate by inspecting the downloaded file on next run
- ASSUMED: Traefik entrypoint names in the production compose are `appwrite_web` and `appwrite_websecure`; these were confirmed in the dev compose, so verify they match in the production compose
- ASSUMED: `community.hashi_vault.hashi_vault` lookup returns `data.data` fields directly for KV v2 — validate by running a test lookup
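
The third assumption can be checked with a throwaway play. This sketch reads the secret with the same `community.hashi_vault.hashi_vault` lookup and asserts that the fields appear at the top level. If the lookup instead returned the raw KV v2 response, the fields would sit under `data.data` and the assert would fail. Only the URL is passed, so add whatever auth options `secrets.yml` already uses (not reproduced here). `bab_appwrite_secret` is a hypothetical variable name. If the assert fails, recent versions of the collection also offer a KV v2-specific `community.hashi_vault.vault_kv2_get` lookup, whose `.secret` attribute holds the fields.

```yaml
- name: Validate the KV v2 lookup shape (sketch)
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Read kv/oys/bab-appwrite
      ansible.builtin.set_fact:
        # Add the auth options secrets.yml uses; only the URL is given here.
        bab_appwrite_secret: >-
          {{ lookup('community.hashi_vault.hashi_vault',
                    'secret=kv/data/oys/bab-appwrite',
                    url='http://nas.lan.toal.ca:8200') }}
      no_log: true  # the secret includes github_private_key and DB passwords

    - name: Fields should be top-level keys, not nested under data.data
      ansible.builtin.assert:
        that:
          - "'openssl_key' in bab_appwrite_secret"
          - "'db_pass' in bab_appwrite_secret"
          - "'data' not in bab_appwrite_secret"
        quiet: true
```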
---
## What NOT to Re-Read
- The HAProxy config (provided inline by user) — key facts preserved above
- The original Appwrite `.env` (provided inline by user) — fields captured in `appwrite.env.j2`
---
## Files to Load Next Session
- `playbooks/install_appwrite.yml` — if continuing install/configure work
- `playbooks/tasks/patch_appwrite_compose.yml` — if debugging compose patches
- `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/secrets.yml` — if working on vault integration
- `docs/summaries/handoff-2026-03-14-appwrite-setup.md` — this file (load at session start)