diff --git a/docs/summaries/handoff-2026-03-14-appwrite-bootstrap-backup.md b/docs/archive/handoffs/handoff-2026-03-14-appwrite-bootstrap-backup.md
similarity index 100%
rename from docs/summaries/handoff-2026-03-14-appwrite-bootstrap-backup.md
rename to docs/archive/handoffs/handoff-2026-03-14-appwrite-bootstrap-backup.md
diff --git a/docs/summaries/handoff-2026-03-15-appwrite-function-dns-fix.md b/docs/summaries/handoff-2026-03-15-appwrite-function-dns-fix.md
new file mode 100644
index 0000000..6078eac
--- /dev/null
+++ b/docs/summaries/handoff-2026-03-15-appwrite-function-dns-fix.md
@@ -0,0 +1,72 @@
+# Session Handoff: Appwrite Function DNS Fix
+**Date:** 2026-03-15
+**Session Duration:** ~1.5 hours
+**Session Focus:** Diagnosed and fixed curl error 6 in Appwrite function executor caused by Docker inheriting host search domain
+**Context Usage at Handoff:** ~60%
+
+## What Was Accomplished
+
+1. Diagnosed SMTP auth failure in `appwrite-worker-mails` — deferred (credentials/provider issue, not automation)
+2. Diagnosed `userinfo` function curl error 6 (CURLE_COULDNT_RESOLVE_HOST) in `openruntimes-executor`
+3. Identified `_APP_EXECUTOR_RUNTIME_NETWORK` mismatch (`appwrite_runtimes` vs actual Docker network `runtimes`) → fixed in env template default
+4. Traced root cause to `search mgmt.toal.ca` in container resolv.conf inherited from host → fixed by shortening system hostname from `bab1.mgmt.toal.ca` to `bab1`
+5. Added pre-flight assertions to `install_appwrite.yml` to prevent recurrence
+6. Cleaned up ineffective `daemon.json` task added and removed this session
+
+## Exact State of Work in Progress
+
+- SMTP authentication failure (`appwrite-worker-mails`): NOT investigated. Separate issue from DNS fix. Deferred.
+- All DNS/function work: COMPLETE. `userinfo` function confirmed working after hostname change.
+
+## Decisions Made This Session
+
+- `_APP_EXECUTOR_RUNTIME_NETWORK` default corrected to `runtimes` BECAUSE the network created by the Appwrite docker-compose file is literally named `runtimes`; it is not prefixed with the compose project name to become `appwrite_runtimes` — STATUS: confirmed, deployed to host
+- Docker `daemon.json` `"dns-search": []` REJECTED BECAUSE Docker treats empty array as no-op (`# Overrides: []` in container resolv.conf confirms it had no effect)
+- System hostname shortened to `bab1` BECAUSE FQDN hostname causes NetworkManager to write `search mgmt.toal.ca` into `/etc/resolv.conf`, which Docker inherits into all containers — STATUS: confirmed fix, function working
+
+## Key Numbers Generated or Discovered This Session
+
+- Runtime container IP on `runtimes` network: `172.20.0.3`
+- Executor IP on `runtimes` network: `172.20.0.2`
+- Executor IP on `appwrite` network: `172.19.0.5`
+- openruntimes executor image: `openruntimes/executor:0.7.22`
+- Appwrite version in `install_appwrite.yml`: `1.8.1`
+- Docker.php error line: 1161 — curl call to `http://{random_32_hex}:3000/`
+- Runtime hostname format: `bin2hex(random_bytes(16))` = 32-char hex, e.g. `c6991893fe570ce5c669d50ed6e7a985`
+
+## Conditional Logic Established
+
+- IF system hostname is FQDN (contains `.`) THEN NetworkManager writes `search <domain>` to `/etc/resolv.conf` AND Docker inherits it into all containers AND Appwrite executor curl calls to runtime containers fail with error 6 BECAUSE musl resolver appends search domain to unqualified names and does not fall back on SERVFAIL
+- IF `ping {hostname}` resolves but `curl http://{hostname}/` returns error 6 THEN suspect c-ares or `/etc/hosts` vs DNS split — trailing dot in URL (`curl http://{hostname}.:port/`) is a reliable test for whether Docker's embedded DNS has the record
+- IF `_APP_EXECUTOR_RUNTIME_NETWORK` does not match the actual Docker network name the executor is connected to THEN runtime containers are placed on a different network than the executor and communication fails with error 6
+
+## Files Created or Modified
+
+| File Path | Action | Description |
+|-----------|--------|-------------|
+| `playbooks/templates/appwrite.env.j2` | Modified | `_APP_EXECUTOR_RUNTIME_NETWORK`, `OPEN_RUNTIMES_NETWORK`, `_APP_FUNCTIONS_RUNTIMES_NETWORK`, `_APP_COMPUTE_RUNTIMES_NETWORK` defaults changed from `appwrite_runtimes` to `runtimes` |
+| `playbooks/install_appwrite.yml` | Modified | Added pre-flight assertions: hostname must not be FQDN, `/etc/resolv.conf` must have no `search` line. Added explanatory comment block citing the executor curl error 6 failure mode. |
+
+## What the NEXT Session Should Do
+
+1. **First**: Read this handoff
+2. **If SMTP is the goal**: Check `vault_appwrite_smtp_password` value and `appwrite_smtp_username` format against the SMTP provider. The template at `playbooks/templates/appwrite.env.j2` lines 74-78 is correct structurally. The issue is likely credentials or `_APP_SMTP_SECURE` value (`true` string vs `tls`/empty).
+3. **If function work continues**: The `userinfo` function and DNS are working. Next functional gap is unknown — check Appwrite function logs directly.
+
+## Open Questions Requiring User Input
+
+- [ ] SMTP failure (`appwrite-worker-mails` SMTP Error: Could not authenticate) — what provider and were credentials recently rotated? Impacts email delivery for all Appwrite auth flows.
+
+## Assumptions That Need Validation
+
+- ASSUMED: Shortening the hostname to `bab1` has no negative side effects on other services on this host (Nginx, AAP connectivity, TLS certs) — validate by checking that `bab1.mgmt.toal.ca` still resolves externally and TLS certs are not hostname-bound to the FQDN system hostname.
+
+## What NOT to Re-Read
+
+- `docs/summaries/handoff-2026-03-14-appwrite-bootstrap-backup.md` — archived, superseded by this handoff
+
+## Files to Load Next Session
+
+- `playbooks/templates/appwrite.env.j2` — if working on SMTP or any env configuration
+- `playbooks/install_appwrite.yml` — if adding further host setup tasks
+- `docs/context/architecture.md` — if working on playbooks or EDA rulebooks
diff --git a/playbooks/install_appwrite.yml b/playbooks/install_appwrite.yml
index 3c73a92..677b36f 100644
--- a/playbooks/install_appwrite.yml
+++ b/playbooks/install_appwrite.yml
@@ -5,6 +5,34 @@
   tags: deps
 
   tasks:
+    # A FQDN system hostname causes NetworkManager to write the domain suffix as a
+    # 'search' entry in /etc/resolv.conf. Docker inherits this into every container.
+    # The Appwrite executor uses randomly-generated short hostnames to reach runtime
+    # containers via DNS; with a search domain present, those names get the suffix
+    # appended, upstream DNS returns SERVFAIL, and musl's resolver does not fall back
+    # to the absolute name — breaking function execution with curl error 6.
+    - name: Assert system hostname is not a FQDN
+      ansible.builtin.assert:
+        that: "'.' not in ansible_nodename"
+        fail_msg: >-
+          System hostname '{{ ansible_nodename }}' is a FQDN. Shorten it first:
+          hostnamectl set-hostname {{ ansible_nodename.split('.')[0] }}
+
+    - name: Check for search domain in /etc/resolv.conf
+      ansible.builtin.command:
+        cmd: grep -c '^search ' /etc/resolv.conf
+      register: resolv_search
+      changed_when: false
+      failed_when: false
+
+    - name: Assert no search domain in /etc/resolv.conf
+      ansible.builtin.assert:
+        that: resolv_search.rc != 0
+        fail_msg: >-
+          /etc/resolv.conf contains a 'search' domain. This is typically caused by a
+          FQDN system hostname. Shorten the hostname and reconnect the NM interface
+          to regenerate resolv.conf without the search entry.
+
     - name: Update all packages to latest
       ansible.builtin.dnf:
         name: "*"
diff --git a/playbooks/templates/appwrite.env.j2 b/playbooks/templates/appwrite.env.j2
index 1dc0182..7788336 100644
--- a/playbooks/templates/appwrite.env.j2
+++ b/playbooks/templates/appwrite.env.j2
@@ -127,14 +127,14 @@ _APP_FUNCTIONS_RUNTIMES={{ appwrite_functions_runtimes | default('node-16.0,php-
 _APP_EXECUTOR_SECRET={{ vault_appwrite_executor_secret }}
 _APP_EXECUTOR_HOST={{ appwrite_executor_host | default('http://exc1/v1') }}
 _APP_BROWSER_HOST={{ appwrite_browser_host | default('http://appwrite-browser:3000/v1') }}
-_APP_EXECUTOR_RUNTIME_NETWORK={{ appwrite_executor_runtime_network | default('appwrite_runtimes') }}
+_APP_EXECUTOR_RUNTIME_NETWORK={{ appwrite_executor_runtime_network | default('runtimes') }}
 _APP_FUNCTIONS_ENVS={{ appwrite_functions_envs | default('node-16.0,php-7.4,python-3.9,ruby-3.0') }}
 _APP_FUNCTIONS_INACTIVE_THRESHOLD={{ appwrite_functions_inactive_threshold | default(60) }}
 _APP_COMPUTE_INACTIVE_THRESHOLD={{ appwrite_compute_inactive_threshold | default(60) }}
 DOCKERHUB_PULL_USERNAME={{ appwrite_dockerhub_username | default('') }}
 DOCKERHUB_PULL_PASSWORD={{ appwrite_dockerhub_password | default('') }}
 DOCKERHUB_PULL_EMAIL={{ appwrite_dockerhub_email | default('') }}
-OPEN_RUNTIMES_NETWORK={{ appwrite_open_runtimes_network | default('appwrite_runtimes') }}
+OPEN_RUNTIMES_NETWORK={{ appwrite_open_runtimes_network | default('runtimes') }}
 _APP_FUNCTIONS_RUNTIMES_NETWORK={{ appwrite_functions_runtimes_network | default('runtimes') }}
 _APP_COMPUTE_RUNTIMES_NETWORK={{ appwrite_compute_runtimes_network | default('runtimes') }}
 _APP_DOCKER_HUB_USERNAME={{ appwrite_docker_hub_username | default('') }}
diff --git a/rulebooks/alertmanager_listener.yml b/rulebooks/alertmanager_listener.yml
index 9a3f419..7ce1772 100644
--- a/rulebooks/alertmanager_listener.yml
+++ b/rulebooks/alertmanager_listener.yml
@@ -2,10 +2,13 @@
 - name: Listen for Alertmanager events
   hosts: all
   sources:
-    - name: Ansible Alertmanager listener
-      ansible.eda.alertmanager:
+    - name: Listener
+      ansible.eda.webhook:
         port: 9101
         host: 0.0.0.0
+      filters:
+        - eda.builtin.event_splitter:
+            splitter_key: payload.alerts
   rules:
     - name: Resolve Disk Usage
       condition:
@@ -17,6 +20,7 @@
           name: Demo - Clean Log Directory
           organization: OYS
           job_args:
+            limit: "{{ event.alert.labels.instance }}"
             extra_vars:
               alertmanager_annotations: "{{ event.alert.annotations }}"
               alertmanager_generator_url: "{{ event.alert.generatorURL }}"
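
Reviewer note: the two pre-flight conditions that this diff adds to `playbooks/install_appwrite.yml` (hostname must not contain a dot, `/etc/resolv.conf` must carry no `search` entry) can also be checked by hand before running the playbook. The following is a minimal Python sketch of that logic and is not part of the commit. The function names `search_domains` and `preflight_problems` are hypothetical, and the sketch assumes the resolv.conf(5) behaviour that the last `search` line wins.

```python
from __future__ import annotations


def search_domains(resolv_conf_text: str) -> list[str]:
    """Return the domains from the effective 'search' line of a resolv.conf body."""
    domains: list[str] = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "search":
            # Assumption: the last 'search' line replaces any earlier one.
            domains = fields[1:]
    return domains


def preflight_problems(hostname: str, resolv_conf_text: str) -> list[str]:
    """Mirror the two playbook assertions; an empty list means both pass."""
    problems: list[str] = []
    if "." in hostname:
        short = hostname.split(".")[0]
        problems.append(f"hostname '{hostname}' is a FQDN; shorten it to '{short}'")
    domains = search_domains(resolv_conf_text)
    if domains:
        problems.append("resolv.conf has search domain(s): " + ", ".join(domains))
    return problems


# Sample inputs using the host names recorded in the handoff.
print(preflight_problems("bab1.mgmt.toal.ca", "search mgmt.toal.ca\nnameserver 10.0.0.1\n"))
print(preflight_problems("bab1", "nameserver 10.0.0.1\n"))
```

On a real host the inputs would be the full node name (for example from `platform.node()`) and the contents of `/etc/resolv.conf`. The sketch takes strings instead so that it can be exercised without touching the system.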