Got things working. Added CLAUDE instructions

2026-03-14 13:48:23 -04:00
parent 2ab06e86f8
commit 1622177ba0
17 changed files with 1190 additions and 80 deletions

@@ -0,0 +1,29 @@
# Project Brief: BAB Backend Ansible
**Created:** 2026-03-14
**Last Updated:** 2026-03-14
## Project
- **Name:** BAB (Borrow a Boat) Backend Ansible
- **Type:** Ansible automation for Appwrite-based backend on RHEL 9
- **Host:** `bab1.mgmt.toal.ca`
- **Production Runner:** AAP (Ansible Automation Platform)
- **Dev Runner:** ansible-navigator with `ee-demo` execution environment
## Scope
Full lifecycle management of an Appwrite backend: host provisioning, Nginx, Gitea Act Runner, database schema, seed data, user provisioning, TLS certificates, EDA rulebooks for Gitea webhooks and Alertmanager alerts, and ServiceNow integration for incident/problem creation.
## Input Documents
| Document | Path | Processed? | Summary At |
|----------|------|-----------|------------|
| Architecture reference | `docs/context/architecture.md` | Yes | self |
## Known Constraints
- No inventory file in repo — dev inventory at `~/Dev/inventories/bab-inventory/`, prod managed by AAP
- Sensitive files gitignored: `ansible.cfg`, `secrets.yml`, `.vault_password`
- `provision_database.yml` idempotency is incomplete — noted in that file
- Do not refer to AWX; production platform is AAP
## Project Phase Tracker
| Phase | Status | Summary File | Date |
|-------|--------|-------------|------|
| Initial setup | Complete | — | 2026-03-14 |

@@ -0,0 +1,43 @@
---
name: Appwrite domain target fix and idempotency
description: Corrections to previous session's diagnosis and compose download idempotency
type: decision
date: 2026-03-14
---
## Decisions / Corrections
### _APP_DOMAIN_TARGET_CNAME (CORRECTS previous handoff)
Previous session recorded: `_APP_DOMAIN_TARGET` added to fix null Domain crash.
**That was wrong.** `_APP_DOMAIN_TARGET` has been deprecated since Appwrite 1.7.0.
The compose file's `environment:` blocks pass only:
- `_APP_DOMAIN_TARGET_CNAME`
- `_APP_DOMAIN_TARGET_A`
- `_APP_DOMAIN_TARGET_AAAA`
- `_APP_DOMAIN_TARGET_CAA`
`_APP_DOMAIN_TARGET` is never injected into containers. It was silently ignored.
**Fix:** Replaced `_APP_DOMAIN_TARGET` with `_APP_DOMAIN_TARGET_CNAME` in
`playbooks/templates/appwrite.env.j2`. Added `_APP_DOMAIN_TARGET_CAA` (default: '').
`_APP_DOMAIN_TARGET_CNAME` defaults to `appwrite_domain` (appwrite.toal.ca).
**Why:** PHP `console.php:49` constructs a Domain object from `_APP_DOMAIN_TARGET_CNAME`.
Null → TypeError crash on every `/v1/console/variables` request.
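For orientation, the `environment:` pass-through described in this section looks roughly like the fragment below. This is an abridged sketch, not a copy of the downloaded compose file: the service name `appwrite` and the bare-name list form are assumptions about its layout, and the real block lists many more variables.

```yaml
# Abridged, illustrative compose fragment (assumed layout, not the real file).
services:
  appwrite:
    environment:
      # Only the record-type-specific variables are passed through from .env.
      - _APP_DOMAIN_TARGET_CNAME
      - _APP_DOMAIN_TARGET_A
      - _APP_DOMAIN_TARGET_AAAA
      - _APP_DOMAIN_TARGET_CAA
      # _APP_DOMAIN_TARGET is not listed, so a value set in .env never
      # reaches the container.
```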
### get_url force: true removed (idempotency)
`force: true` on the compose download caused the task to always report `changed`,
triggering a service restart on every playbook run.
**Fix:** Removed `force: true` from `playbooks/install_appwrite.yml` get_url task.
The file is now downloaded only if it is absent; `playbooks/upgrade_appwrite.yml` handles re-downloads.
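A minimal sketch of what the download task might look like after this change. The task name, the `appwrite_dir` variable, the file mode, the handler name, and the `https://` scheme are assumptions for illustration; only the module, the compose source, and the absence of `force: true` come from this record.

```yaml
# Sketch of the compose download task after the fix (names are hypothetical).
- name: Download Appwrite compose file
  ansible.builtin.get_url:
    url: "https://appwrite.io/install/compose"      # scheme assumed
    dest: "{{ appwrite_dir }}/docker-compose.yml"   # hypothetical variable
    mode: "0644"
    # force is left at its default (false): when dest is a file path and the
    # file already exists, nothing is downloaded and the task reports "ok".
  notify: Restart appwrite                           # hypothetical handler name
```

Because the task no longer reports `changed` on every run, the restart handler fires only when the file is first created.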
## State After This Session
- Appwrite console loads without error ✅
- Stack running on bab1.mgmt.toal.ca ✅
- install_appwrite.yml is idempotent ✅
- node_exporter install: complete, metrics confirmed ✅

@@ -0,0 +1,84 @@
# Session Handoff: Appwrite Stack Setup & Infrastructure Hardening
**Date:** 2026-03-14
**Session Duration:** ~4 hours
**Session Focus:** Bring Appwrite stack to production-ready state on bab1.mgmt.toal.ca
**Context Usage at Handoff:** ~70%
---
## Current State
The install playbook is ready to run. All open questions from the session are resolved. The stack on bab1 is running, but with an unpatched compose file (no proxyProtocol, old entrypoint issue). **One run of the playbook will bring everything current.**
---
## What Was Accomplished This Session
1. Appwrite `.env` Jinja2 template → `playbooks/templates/appwrite.env.j2`
2. Systemd unit template → `playbooks/templates/appwrite.service.j2`
3. Prometheus node exporter playbook → `playbooks/install_node_exporter.yml`
4. Appwrite inventory vars → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/appwrite.yml`
5. Monitoring inventory vars → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/monitoring.yml`
6. HashiCorp Vault secret lookups → `~/Dev/inventories/bab-inventory/host_vars/bab1.mgmt.toal.ca/secrets.yml`
7. `playbooks/install_appwrite.yml` — .env deploy, systemd, tags (`deps`/`image`/`configure`), restart handler, production compose URL (`appwrite.io/install/compose`)
8. `playbooks/tasks/patch_appwrite_compose.yml` — Traefik 2.11.31 pin, image fix (appwrite-dev→official), forwardedHeaders + proxyProtocol trustedIPs for both entrypoints, handler notifications
9. `playbooks/upgrade_appwrite.yml` — docker prune after upgrade
10. `requirements.yml` — added `community.hashi_vault`
11. `~/.ansible-navigator.yml` — pipelining fixed (ANSIBLE_CONFIG file was never mounted into EE; replaced with `environment-variables.set`); SSH multiplexing, fact caching, profile_tasks via CALLBACKS_ENABLED
12. Deleted `secrets.yml.example` — contained plaintext secrets
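As an illustration of item 6, a Vault-backed `secrets.yml` could take roughly the following shape. This is a sketch under assumptions: the variable names and the `openssl_key` field are hypothetical, the split of `kv/data/oys/bab-appwrite` into mount `kv` and secret path `oys/bab-appwrite` assumes a KV v2 engine, and the Vault address and credentials are assumed to come from the environment.

```yaml
# host_vars sketch; variable and key names below are hypothetical.
# Reads the whole KV v2 secret once and exposes its key/value dictionary.
appwrite_secrets: "{{ lookup('community.hashi_vault.vault_kv2_get', 'oys/bab-appwrite', engine_mount_point='kv').secret }}"

# Individual values are then taken from that dictionary.
appwrite_openssl_key: "{{ appwrite_secrets.openssl_key }}"
```

Keeping only lookups in the file means the same host vars work from both AAP and the dev runner, with no plaintext values in the inventory.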
---
## Key Numbers
- `appwrite_version: "1.8.1"`
- `appwrite_traefik_version: "2.11.31"` — minimum for Docker Engine >= 29
- `appwrite_web_port: 8080`, `appwrite_websecure_port: 8443`
- `appwrite_traefik_trusted_ips: "192.168.0.0/22"` — HAProxy subnet; used for both `forwardedHeaders.trustedIPs` and `proxyProtocol.trustedIPs`
- `node_exporter_version: "1.9.0"`, `node_exporter_port: 9100`
- Vault path: `kv/data/oys/bab-appwrite` (populated 2026-03-14)
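To show how these values fit together, the patched Traefik service might look roughly like the fragment below. This is an abridged sketch: the entrypoint names `appwrite_web` and `appwrite_websecure` are assumptions to check against the downloaded compose file, and the unrelated flags are omitted.

```yaml
# Illustrative fragment of the patched Traefik service (abridged).
# Entrypoint names are assumed, not taken from this handoff.
services:
  traefik:
    image: traefik:2.11.31   # pinned instead of the floating 2.11 tag
    command:
      # Trust X-Forwarded-* headers only from the HAProxy subnet.
      - --entrypoints.appwrite_web.forwardedHeaders.trustedIPs=192.168.0.0/22
      - --entrypoints.appwrite_websecure.forwardedHeaders.trustedIPs=192.168.0.0/22
      # Accept PROXY protocol headers only from the same subnet.
      - --entrypoints.appwrite_web.proxyProtocol.trustedIPs=192.168.0.0/22
      - --entrypoints.appwrite_websecure.proxyProtocol.trustedIPs=192.168.0.0/22
```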
---
## Decisions Made
| Decision | Rationale |
|----------|-----------|
| HashiCorp Vault for secrets | AAP + dev both need access; 1Password ansible-vault is local-only |
| `appwrite.io/install/compose` as compose source | GitHub raw URL pointed to dev compose with `image: appwrite-dev` and broken entrypoint override |
| Traefik pinned to 2.11.31 | Floating `traefik:2.11` tag incompatible with Docker Engine >= 29 |
| `proxyProtocol.trustedIPs` on both Traefik entrypoints | HAProxy uses `send-proxy-v2` on both `appwrite` and `babdevapi` backends; without this Traefik returns 503 |
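| `_APP_DOMAIN_TARGET` added to .env template (superseded: the variable is deprecated since Appwrite 1.7.0 and is never injected into the containers; `_APP_DOMAIN_TARGET_CNAME` is used instead) | Appwrite 1.8.x `console.php:49` constructs a `Domain` object from `_APP_DOMAIN_TARGET_CNAME`; null = crash |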
| systemd `Type=oneshot RemainAfterExit=yes` | `docker compose up -d` exits after starting containers; oneshot keeps unit active |
| node exporter `security_opts: label=disable` | `:z` on `/` bind-mount would recursively relabel entire filesystem under RHEL 9 SELinux |
| `profile_tasks` via `ANSIBLE_CALLBACKS_ENABLED` | It's an aggregate callback, not a stdout callback; `ANSIBLE_STDOUT_CALLBACK=profile_tasks` causes `'sort_order'` error |
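The pipelining and `profile_tasks` decisions can be expressed in the navigator settings file roughly as follows. This is a sketch, not the actual `~/.ansible-navigator.yml`: only the keys relevant to those two decisions are shown, and the exact set of variables in the real file is not recorded here.

```yaml
# ~/.ansible-navigator.yml sketch; only the relevant keys are shown.
ansible-navigator:
  execution-environment:
    environment-variables:
      set:
        # Set inside the EE directly, since a host-side ansible.cfg is not
        # mounted into the container.
        ANSIBLE_PIPELINING: "True"
        # profile_tasks runs alongside the stdout callback rather than
        # replacing it.
        ANSIBLE_CALLBACKS_ENABLED: "profile_tasks"
```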
---
## What the NEXT Session Should Do
1. **Run the install playbook** (skipping deps and image pull since stack is already running):
```bash
ansible-navigator run playbooks/install_appwrite.yml --mode stdout --skip-tags deps,image
```
2. **Verify** `curl -v https://appwrite.toal.ca` returns 200 (not 503)
3. **Verify** Appwrite console loads without `Domain::__construct() null` error
4. **Run node exporter**:
```bash
ansible-navigator run playbooks/install_node_exporter.yml --mode stdout
```
5. **Verify** `curl http://bab1.mgmt.toal.ca:9100/metrics` returns Prometheus metrics
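The two HTTP checks above could also be captured as a small play, which might be convenient for repeat runs. This is a hypothetical sketch: no such playbook exists in the repo, it assumes the controller can reach both endpoints, and the `node_` substring test is only a rough heuristic for node exporter output.

```yaml
# Hypothetical verification play; not part of the repository.
- name: Verify Appwrite and node exporter endpoints
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Appwrite front end answers with 200 rather than 503
      ansible.builtin.uri:
        url: https://appwrite.toal.ca
        status_code: 200

    - name: Fetch node exporter metrics
      ansible.builtin.uri:
        url: http://bab1.mgmt.toal.ca:9100/metrics
        return_content: true
      register: metrics

    - name: Response looks like node exporter output
      ansible.builtin.assert:
        that:
          - "'node_' in metrics.content"
```

The console check in step 3 still needs a browser or an authenticated API call, so it is left out of the sketch.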
---
## Open Questions
None. All issues from the session are resolved.
---
## Files to Load Next Session
- `playbooks/install_appwrite.yml` — if continuing install/configure work
- `playbooks/tasks/patch_appwrite_compose.yml` — if debugging compose patches
- `docs/context/architecture.md` — for Appwrite API or EDA work