# Hyper-V Windows Server Automation - Project Documentation
## Project Overview
This project demonstrates enterprise-grade automation for the complete lifecycle of Windows Server VMs on Hyper-V using Ansible Automation Platform (AAP), implementing GitOps and Infrastructure as Code (IaC) principles.
**Target Audience**: Enterprise IT operations teams, infrastructure engineers, platform engineers
**Deployment Platform**: Ansible Automation Platform 2.x (automation controller, formerly Ansible Tower)
**Future Roadmap**: Event-Driven Ansible (EDA) integration for reactive automation
## Architecture
### Technology Stack
- **Automation Engine**: Ansible Core 2.15+
- **Platform**: Ansible Automation Platform 2.4+
- **Hypervisor**: Microsoft Hyper-V (Windows Server 2019/2022)
- **Guest OS**: Windows Server 2019/2022
- **CMDB**: ServiceNow ITSM
- **Version Control**: Git (GitOps workflow)
- **Authentication**: Active Directory / Kerberos
### Connectivity Model
```
Ansible Automation Platform
↓ (WinRM over HTTPS/Kerberos)
Windows Hyper-V Host(s)
↓ (Hyper-V PowerShell)
Windows Server VMs
↓ (REST API)
ServiceNow CMDB
```
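The first hop in this model (AAP to Windows over WinRM with HTTPS and Kerberos) is normally expressed as connection variables. The following is a minimal sketch of what `group_vars/hyperv_hosts.yml` might contain, using the standard WinRM connection variables; the actual file contents are not shown in this document, so the values are assumptions.

```yaml
# group_vars/hyperv_hosts.yml (illustrative; values are assumptions)
ansible_connection: winrm
ansible_port: 5986                              # WinRM HTTPS listener
ansible_winrm_scheme: https
ansible_winrm_transport: kerberos               # authenticate as an Active Directory principal
ansible_winrm_server_cert_validation: validate  # keep certificate validation on outside of labs
```

Kerberos transport also needs the Kerberos client libraries in the execution environment, which is one reason a custom execution environment is often used for Windows automation.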
### Core Use Cases
1. **VM Provisioning**: Automated creation of Windows Server VMs using unattended installation (autounattend.xml)
2. **Patch Management**: Automated Windows Update deployment triggered by Git commits
3. **Application Deployment**: Install and configure applications (IIS demonstration)
4. **Configuration Management**: Day-2 operations and drift remediation
5. **CMDB Synchronization**: Bidirectional sync with ServiceNow CMDB
## Project Structure
```
.
├── ansible.cfg                  # Ansible configuration with Windows/WinRM defaults
├── collections/
│   └── requirements.yml         # Required Ansible collections
├── inventory/
│   ├── production/
│   │   └── hosts.yml            # Production inventory
│   └── staging/
│       └── hosts.yml            # Staging inventory (future)
├── group_vars/
│   ├── all.yml                  # Global variables
│   ├── hyperv_hosts.yml         # Hyper-V host configuration
│   ├── windows_servers.yml      # Windows Server defaults
│   └── web_servers.yml          # IIS/web server configuration
├── host_vars/                   # Host-specific variables (future)
├── playbooks/
│   ├── provision-vm.yml         # VM provisioning workflow
│   ├── patch-vms.yml            # Windows Update automation
│   ├── install-iis.yml          # IIS deployment
│   └── sync-cmdb.yml            # ServiceNow CMDB sync
├── roles/                       # Custom roles (future development)
│   ├── windows_baseline/        # Windows hardening & baseline config
│   ├── hyperv_vm/               # Hyper-V VM management
│   ├── iis_webapp/              # IIS application deployment
│   └── servicenow_sync/         # ServiceNow integration
├── templates/                   # Jinja2 templates (future)
│   └── autounattend.xml.j2      # Windows unattended install template
└── README.md                    # Quick start guide
```
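For orientation, `collections/requirements.yml` in a project like this would typically list the Windows and ServiceNow collections. The sketch below is an assumption about its contents, not a copy of the real file; `community.windows` in particular is included only as a commonly paired collection.

```yaml
# collections/requirements.yml (illustrative; pin versions to suit your environment)
collections:
  - name: ansible.windows      # core Windows modules (win_ping, win_updates, win_reboot, ...)
  - name: community.windows    # additional Windows modules (assumed, not confirmed by this project)
  - name: servicenow.itsm      # ServiceNow ITSM/CMDB modules
```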
## Key Design Patterns
### GitOps Workflow
All infrastructure changes flow through Git:
1. Engineer creates feature branch
2. Updates inventory or group_vars to define desired state
3. Commits and creates pull request
4. AAP webhook triggers job template for validation
5. After merge, AAP webhook triggers deployment
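In step 2, "desired state" usually means a change to inventory or variables rather than to playbook logic. A minimal sketch of such a change, assuming a YAML inventory at `inventory/production/hosts.yml`; the host names and the `vm_cpu_count` and `vm_memory_mb` variables are hypothetical, and nesting `web_servers` under `windows_servers` is an assumption.

```yaml
# inventory/production/hosts.yml (illustrative)
all:
  children:
    hyperv_hosts:
      hosts:
        hv01.example.com:
    windows_servers:
      children:
        web_servers:
          hosts:
            DEMO-WEB01:
              vm_ip_address: 192.168.1.101
              vm_cpu_count: 2        # hypothetical variable
              vm_memory_mb: 4096     # hypothetical variable
```

A pull request that adds a host entry like `DEMO-WEB01` is then the reviewable unit of change that the validation and deployment webhooks act on.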
### Idempotency
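All playbooks must be idempotent: safe to run repeatedly, reporting no changes once the target already matches the desired state. Use:
- Explicit `state: present` / `state: absent` on modules that support it
- Conditional tasks with `when:`
- `changed_when:` and `failed_when:` so tasks report accurate status
- Check mode support (`--check`)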
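Hyper-V operations are often driven through PowerShell, where idempotency has to be implemented by hand. The task below is a minimal sketch, assuming the `ansible.windows.win_powershell` module and a `vm_name` variable; the VM generation and memory size are placeholder values. It creates the VM only when it is missing and reports a change only in that case.

```yaml
- name: Ensure the VM exists on the Hyper-V host
  ansible.windows.win_powershell:
    parameters:
      Name: "{{ vm_name }}"
    script: |
      [CmdletBinding()]
      param ([string]$Name)

      # Report "ok" unless the VM actually has to be created
      $Ansible.Changed = $false
      if (-not (Get-VM -Name $Name -ErrorAction SilentlyContinue)) {
          New-VM -Name $Name -Generation 2 -MemoryStartupBytes 4GB | Out-Null
          $Ansible.Changed = $true
      }
```

As far as I know, `win_powershell` skips the script in check mode unless the script declares `SupportsShouldProcess`, so this sketch is safe under `--check` but does not preview the change.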
### Credential Management
- **Never commit secrets to Git**
- Use AAP credential types for:
  - Machine credentials (WinRM)
  - ServiceNow credentials
  - Domain join credentials
- Use Ansible Vault for sensitive variables in development
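For the development case, a common convention is to keep the readable variable in `group_vars` and point it at a value stored in a vault-encrypted file. The sketch below assumes that convention; both variable names and the account name are hypothetical.

```yaml
# group_vars/windows_servers.yml (illustrative)
domain_join_user: "svc-domainjoin@EXAMPLE.COM"             # hypothetical service account
domain_join_password: "{{ vault_domain_join_password }}"   # defined in a vault-encrypted vars file, never in plain text
```

In AAP the same variables would instead be injected by a credential at job launch, so the playbooks do not need to change between environments.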
### Role-Based Organization
Future development should extract common patterns into roles:
- `roles/windows_baseline/`: Base Windows configuration
- `roles/hyperv_vm/`: VM lifecycle management
- `roles/iis_webapp/`: IIS deployment patterns
## Technical Requirements
### Prerequisites
1. **Ansible Automation Platform**
   - AAP 2.4 or later
   - Controller configured with Windows machine credentials
   - Execution environment with Windows collections
2. **Hyper-V Environment**
   - Windows Server 2019/2022 with Hyper-V role
   - WinRM enabled and configured
   - Kerberos authentication configured
   - Sufficient storage for VM images
3. **Network Requirements**
   - WinRM ports (5985/5986) open from AAP to Hyper-V hosts
   - WinRM ports open from AAP to managed Windows VMs
   - DNS resolution for all hosts
   - Active Directory domain membership
4. **ServiceNow**
   - ServiceNow instance with CMDB
   - API user credentials
   - CMDB table structure defined
### Windows Remote Management Setup
On all Windows hosts (Hyper-V and VMs):
```powershell
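# Create an HTTPS listener (requires a server authentication certificate matching the host name)
winrm quickconfig -transport:https
# Allow Kerberos authentication
winrm set winrm/config/service/auth '@{Kerberos="true"}'
# Reject unencrypted traffic
winrm set winrm/config/service '@{AllowUnencrypted="false"}'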
```
## Development Guidelines
### Adding New Playbooks
1. Create playbook in `playbooks/` directory
2. Use descriptive names: `verb-noun.yml` (e.g., `deploy-webapp.yml`)
3. Include proper documentation in header
4. Add tags for selective execution
5. Implement check mode support
6. Test in staging environment first
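A skeleton that follows these steps might look like the sketch below. The playbook name comes from the naming example above; the task, path, and tag are hypothetical placeholders rather than part of this project.

```yaml
---
# playbooks/deploy-webapp.yml
# Purpose:  Deploy the demo web application to IIS hosts (hypothetical example)
# Usage:    ansible-playbook playbooks/deploy-webapp.yml --limit web_servers --tags deploy
# Requires: ansible.windows collection

- name: Deploy web application
  hosts: web_servers
  gather_facts: false
  tasks:
    - name: Ensure the site content directory exists
      ansible.windows.win_file:
        path: C:\inetpub\demo-app      # hypothetical path
        state: directory
      tags:
        - deploy
```

Preferring modules that implement check mode themselves, as `win_file` does, gives check mode support without extra work in the playbook.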
### Variable Precedence
Variables are resolved in this order, from lowest to highest precedence:
1. `group_vars/all.yml` - Global defaults
2. `group_vars/<group>.yml` - Group-specific
3. `host_vars/<host>.yml` - Host-specific
4. Playbook `vars:` - Playbook overrides
5. Extra vars (`-e`) - Runtime overrides
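To make the ordering concrete, the sketch below shows one hypothetical variable, `patch_reboot_timeout`, defined at three levels. Each YAML document stands for a separate file; the file for `DEMO-WEB01` is an assumption, since `host_vars/` is currently empty.

```yaml
---
# group_vars/all.yml
patch_reboot_timeout: 600        # global default
---
# group_vars/web_servers.yml
patch_reboot_timeout: 1200       # overrides the global default for web servers
---
# host_vars/DEMO-WEB01.yml
patch_reboot_timeout: 1800       # overrides the group value for this one host
```

With these files, `DEMO-WEB01` resolves the value to 1800, other web servers to 1200, and everything else to 600. Passing `-e patch_reboot_timeout=300` at run time would override all three.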
### Testing Strategy
1. **Syntax Check**: `ansible-playbook --syntax-check playbook.yml`
2. **Check Mode**: `ansible-playbook --check playbook.yml`
3. **Limit Scope**: Use `--limit` to test on a single host first
4. **Verbose Output**: Use `-v`, `-vv`, `-vvv` for debugging
5. **Staging First**: Always test in staging before production
### Windows Module Best Practices
- Use fully qualified collection names such as `ansible.windows.win_reboot` instead of bare `win_*` short names
- Always handle reboots explicitly with `ansible.windows.win_reboot`
- Use `register:` to capture task output
- Check `reboot_required` in results
- Use `failed_when:` for expected error conditions
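Several of these practices combine naturally in a patching task. The following is a minimal sketch, assuming the `ansible.windows.win_updates` and `ansible.windows.win_reboot` modules; the update categories and timeout are example values, not this project's settings.

```yaml
- name: Install security and critical updates
  ansible.windows.win_updates:
    category_names:
      - SecurityUpdates
      - CriticalUpdates
    state: installed
  register: update_result

- name: Reboot only when the installed updates require it
  ansible.windows.win_reboot:
    reboot_timeout: 1800
  when: update_result.reboot_required
```

`win_updates` can also reboot on its own through its `reboot` option; the explicit two-task form is shown here because it keeps the reboot visible and easy to gate on a maintenance window.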
## AAP Integration
### Job Templates to Create
1. **Provision VM** - `playbooks/provision-vm.yml`
   - Survey for VM name, IP, CPU, RAM
   - Credential: Hyper-V machine credential
   - Webhook enabled for GitOps
2. **Patch VMs** - `playbooks/patch-vms.yml`
   - Limit pattern for selective patching
   - Scheduled for maintenance windows
   - Credential: Windows machine credential
3. **Deploy IIS** - `playbooks/install-iis.yml`
   - Limit to web_servers group
   - Credential: Windows machine credential
4. **Sync CMDB** - `playbooks/sync-cmdb.yml`
   - Scheduled daily
   - Credentials: Windows + ServiceNow
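In keeping with the GitOps approach, job templates can themselves be defined as code instead of being created by hand in the UI. The sketch below assumes the `ansible.controller` collection (`awx.awx` upstream) is available and that controller connection details are supplied through the environment; the organization, project, and inventory names are assumptions.

```yaml
- name: Define AAP job templates as code
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Patch VMs job template
      ansible.controller.job_template:
        name: Patch VMs
        organization: Default                 # assumption
        project: hyperv-demo                  # assumption: project name in the controller
        inventory: Production                 # assumption
        playbook: playbooks/patch-vms.yml
        credentials:
          - Windows machine credential
        ask_limit_on_launch: true             # allows a limit pattern for selective patching
        state: present
```

Schedules and surveys would be managed separately; they are left out to keep the sketch short.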
### Workflow Templates (Future)
Create workflows for complex orchestration:
- **Full VM Lifecycle**: Provision → Configure → Deploy App → Update CMDB
- **Patch & Compliance**: Patch → Verify → Update CMDB → Generate Report
### Event-Driven Ansible (Future)
Planned EDA integrations:
- ServiceNow incident triggers remediation playbook
- Windows Event Log monitoring triggers security response
- Hyper-V alerts trigger capacity management
- Git webhook triggers deployment pipeline
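As an illustration of the last item, an EDA rulebook for a Git webhook might look like the sketch below. It assumes the `ansible.eda.webhook` event source and a GitHub-style push payload with a `ref` field; the port, branch, job template, and organization names are assumptions.

```yaml
- name: Trigger deployment from Git pushes
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Run deployment when main is updated
      condition: event.payload.ref == "refs/heads/main"
      action:
        run_job_template:
          name: Provision VM
          organization: Default
```

The other planned integrations would follow the same shape, with a different event source and condition in front of the same kind of action.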
## Common Tasks
### Bootstrap a New VM
```bash
# Provision VM
ansible-playbook playbooks/provision-vm.yml \
  -e vm_name=DEMO-WEB01 \
  -e vm_ip_address=192.168.1.101
# Configure baseline (windows-baseline.yml is not yet listed under Project Structure)
ansible-playbook playbooks/windows-baseline.yml --limit DEMO-WEB01
# Deploy application
ansible-playbook playbooks/install-iis.yml --limit DEMO-WEB01
# Update CMDB
ansible-playbook playbooks/sync-cmdb.yml --limit DEMO-WEB01
```
### Patch All Windows Servers
```bash
ansible-playbook playbooks/patch-vms.yml --limit windows_servers
```
### Update Specific Group
```bash
ansible-playbook playbooks/patch-vms.yml --limit web_servers
```
## Troubleshooting
### WinRM Connection Issues
```bash
# Test WinRM connectivity
ansible hyperv_hosts -m ansible.windows.win_ping
# Check with verbose output
ansible hyperv_hosts -m ansible.windows.win_ping -vvv
```
### Common Issues
1. **Kerberos Authentication Failure**
   - Verify DNS resolution (forward and reverse)
   - Check domain join status
   - Verify time synchronization
   - Check Kerberos ticket: `klist`
2. **Module Not Found**
   - Install collections: `ansible-galaxy collection install -r collections/requirements.yml`
   - Verify in AAP execution environment
3. **Timeout Issues**
   - Increase timeout in `ansible.cfg`
   - Check network connectivity
   - Verify WinRM service running
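For timeout issues specifically, WinRM timeouts can also be raised per group through inventory variables, which may be easier to scope than a global `ansible.cfg` change. A minimal sketch with example values, assuming the standard WinRM connection variables:

```yaml
# group_vars/windows_servers.yml (illustrative values)
ansible_winrm_operation_timeout_sec: 60   # WinRM operation timeout
ansible_winrm_read_timeout_sec: 90        # HTTP read timeout; must be larger than the operation timeout
```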
## Security Considerations
### Credential Storage
- Use AAP credential vault (not Ansible Vault in production)
- Rotate credentials regularly
- Use least-privilege service accounts
- Separate credentials per environment
### Network Security
- Use WinRM over HTTPS (port 5986)
- Enable Kerberos encryption
- Implement network segmentation
- Use jump hosts/bastion for AAP
### Compliance
- Enable audit logging in AAP
- Log all playbook runs
- Track changes in ServiceNow CMDB
- Implement change approval workflow
## Future Enhancements
### Phase 2 - Advanced Features
- [ ] Custom execution environment with all dependencies
- [ ] Ansible Vault integration for secrets
- [ ] Enhanced autounattend.xml templating
- [ ] VM template/image management
- [ ] Backup and DR automation
### Phase 3 - EDA Integration
- [ ] ServiceNow incident-driven remediation
- [ ] Windows Event Log monitoring
- [ ] Hyper-V performance monitoring
- [ ] Self-healing automation
### Phase 4 - Enterprise Scale
- [ ] Multi-region Hyper-V clusters
- [ ] RBAC and delegation model
- [ ] Compliance scanning and remediation
- [ ] Cost tracking and optimization
- [ ] Disaster recovery automation
## Contributing
This is a demonstration project. When extending:
1. Follow existing patterns and structure
2. Test thoroughly in staging
3. Document all variables in group_vars
4. Use semantic versioning for releases
5. Update this CLAUDE.md with architectural changes
## References
- [Ansible Windows Guide](https://docs.ansible.com/ansible/latest/os_guide/windows_usage.html)
- [Ansible Automation Platform Docs](https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform)
- [ServiceNow ITSM Collection](https://github.com/ansible-collections/servicenow.itsm)
- [Event-Driven Ansible](https://www.ansible.com/products/event-driven-ansible)