
AverAzure — Day 5 Session Context: Azure VMs + Docker Swarm

Starting point

AverAzure stack previously proved working on:

  • Local (Podman on Fedora) via docker compose
  • VPS (real Docker alongside Coolify) via docker compose

Goal today: deploy the same stack on genuine Azure infrastructure — two Azure VMs in a Docker Swarm cluster behind a VNet with NSG rules. This is the final piece of the Azure learning plan.

The VPS was restored from backup at the start of the session (it had been left broken by the previous Swarm experiments).


Concepts covered before touching the portal

Docker Swarm network layers (three separate networks, three separate jobs)

ingress network

  • Hosts Docker's IPVS load balancer
  • Owns the published ports (8080, 15672, 5341, 8081)
  • When you curl any node on port 8080, IPVS intercepts and routes to whichever container holds a replica, even if that container is on a different VM
  • This is Swarm's routing mesh: hit any node, reach any service

docker_gwbridge

  • A virtual Ethernet bridge that lives on each VM
  • Connects the ingress mesh to containers via veth pairs (virtual cables in the kernel)
  • Once IPVS picks a node, gwbridge hands the packet into the container's network namespace via the container's eth0
  • This is why ASPNETCORE_URLS must be http://0.0.0.0:8080, not http://+:8080: gwbridge is IPv4 only, so Kestrel must match

overlay network (aver-overlay)

  • Spans both VMs: a flat virtual network for container-to-container traffic
  • Each container gets its own IP from the Swarm pool (10.20.0.0/16): api=10.20.0.4, rabbitmq=10.20.1.6, seq=10.20.2.5
  • Your code says rabbitmq:5672; Swarm resolves the name to the container IP
  • The packet gets wrapped in VXLAN UDP (port 4789) using the VM IPs as the outer envelope
  • It ships across the Azure VNet to the other VM, gets unwrapped, and is delivered to the container
  • Without overlay: containers would share the VM's IP, couldn't be addressed individually, and you'd hardcode VM IPs everywhere
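All three layers are visible on a live node with stock Docker and iproute2 commands (a sketch; the interface name eth0 and the network names assume the setup above):

# The three networks, side by side
docker network ls
# The overlay pool the container IPs come from
docker network inspect aver-overlay --format '{{json .IPAM.Config}}'
# The per-VM bridge is a real kernel interface
ip link show docker_gwbridge
# Watch the VXLAN envelopes travel between the VMs (Ctrl+C to stop)
sudo tcpdump -ni eth0 udp port 4789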

Why you need NSG rules even though overlay handles container traffic

NSG operates at the VM level. The overlay rides on top of it. VXLAN packets travel between VMs as UDP 4789 using the VM IPs — NSG sees these. If NSG blocks UDP 4789, the VXLAN tunnel never forms and containers can't find each other across VMs.
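This session used the portal, but the VXLAN rule as an az CLI call would look roughly like this (rule name and priority are illustrative):

az network nsg rule create \
  --resource-group learn_week_1 \
  --nsg-name aver-nsg \
  --name Allow-VXLAN \
  --priority 110 \
  --direction Inbound \
  --access Allow \
  --protocol Udp \
  --source-address-prefixes 10.1.0.0/24 \
  --destination-port-ranges 4789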

IP ranges and what they are

  • 10.x.x.x is a private range — only meaningful inside your network, never routed on public internet
  • 10.1.0.0/16 — VNet address space, Azure owns this range for your VNet
  • 10.1.0.0/24 — subnet, where VMs get their IPs (10.1.0.4, 10.1.0.5)
  • 10.20.0.0/16 — Swarm overlay pool, container IPs assigned by Docker
  • These are completely separate layers. NSG sees VM IPs. Overlay sees container IPs.
  • /16 = 65,536 addresses. /24 = 256 addresses. The number after the / is how many leading bits are fixed; the rest are free to assign (32 - 24 = 8 free bits, 2^8 = 256 addresses).

VNet vs subnet

  • VNet (aver-vnet) = the plot of land. Owns 10.1.0.0/16. Nothing from public internet gets in unless explicitly allowed.
  • Subnet (default) = a carved-out section of that land. 10.1.0.0/24. VMs get IPs from here.
  • Multiple subnets can exist in one VNet with different rules.
  • VMs are placed in a subnet at creation time — can't change region or VNet after.

NSG vs UFW

  • UFW lives inside the VM — traffic reaches the VM first, then gets filtered
  • NSG sits outside the VM at Azure network level — traffic filtered before reaching the VM
  • NSG replaces UFW on Azure. No need for UFW when NSG is properly configured.
  • Can associate NSG to subnet (all VMs get same rules) or to individual NIC (per-VM rules)
  • Subnet association is cleaner — one place to manage rules for all nodes
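The subnet association done in the portal below also has a one-line CLI form (a sketch with this session's names):

az network vnet subnet update \
  --resource-group learn_week_1 \
  --vnet-name aver-vnet \
  --name default \
  --network-security-group aver-nsg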

Azure VM hardening vs VPS hardening

VPS: you manually disable root login, set up UFW, install fail2ban, and configure unattended upgrades. Azure VMs: root SSH is disabled by default, NSG replaces UFW, fail2ban is unnecessary when port 22 is scoped to your IP, and automatic patching is available at provisioning time. Significantly less ops work.

Azure VM pricing

  • Pay per second, no upfront commitment
  • B1s at $0.0145/hr — pennies for a few hours
  • Free tier: 750 hours/month of B1s for 12 months
  • Deallocate (not just stop) to stop compute charges
  • Static public IPs charge even when VM is stopped — delete them if not coming back
  • Diagnostic logs, outbound data transfer, disk — all can add up silently. Set a billing alert.
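Deallocation is worth scripting so it actually happens at the end of a session; a sketch using this session's resource names:

# Stop compute charges (Stopped (deallocated), not merely Stopped)
az vm deallocate -g learn_week_1 -n aver-manager
az vm deallocate -g learn_week_1 -n aver-worker
# Static public IPs keep billing while VMs are deallocated; review and delete if done
az network public-ip list -g learn_week_1 -o table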

DefaultAzureCredential chain on a plain Azure VM

Tries in order: EnvironmentCredential → WorkloadIdentityCredential → ManagedIdentityCredential → Visual Studio → Azure CLI → PowerShell → Azure Developer CLI. On a plain VM with no Managed Identity and no env vars, all of them fail with CredentialUnavailableException. Fix options:

1. Assign a Managed Identity to the VM (Azure-specific, breaks the cloud-agnostic design)
2. Pass SP credentials as exported env vars before docker stack deploy
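For reference, option 1 is a single CLI call plus an RBAC role assignment; only the identity step is sketched here (not taken in this session):

az vm identity assign -g learn_week_1 -n aver-manager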

docker stack deploy vs docker compose — env var handling

docker compose reads .env files automatically. docker stack deploy does not; it's a Swarm command, not a Compose command. Variables in the stack file (${AZURE_TENANT_ID}) must be exported in the shell before deploying. Swarm resolves them at deploy time, bakes the actual values into the service definition, stores that in the Raft log, and sends it to whichever node runs the container. The shell session is irrelevant after deployment.
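If you want compose-style .env behaviour with docker stack deploy, one common shell workaround is to auto-export the file before deploying (a sketch; assumes a plain KEY=value .env with no quoting surprises):

set -a                      # auto-export every variable assigned from here on
. ./.env                    # source the .env file into the shell
set +a                      # stop auto-exporting
docker stack deploy -c docker-stack.yml --with-registry-auth aver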

Docker Secrets

  • Values stored encrypted (AES-256) in Swarm's Raft log, distributed across manager nodes
  • Mounted as tmpfs files at /run/secrets/<name> inside containers — in-memory, never written to disk
  • Immutable — to change a secret: remove it, recreate it, force service update
  • Not auto-read as env vars — need an entrypoint script to export them before app starts
  • Canonical pattern:
# Create secrets on manager
echo "value" | docker secret create azure_tenant_id -
echo "value" | docker secret create azure_client_id -
echo "value" | docker secret create azure_client_secret -
# Stack file (image: line omitted here; the api image must run entrypoint.sh as its ENTRYPOINT)
services:
  api:
    secrets:
      - azure_tenant_id
      - azure_client_id
      - azure_client_secret
secrets:
  azure_tenant_id:
    external: true
  azure_client_id:
    external: true
  azure_client_secret:
    external: true
# entrypoint.sh (set as the image ENTRYPOINT so it runs before the app)
#!/bin/sh
set -eu  # fail fast if a secret file is missing
export AZURE_TENANT_ID=$(cat /run/secrets/azure_tenant_id)
export AZURE_CLIENT_ID=$(cat /run/secrets/azure_client_id)
export AZURE_CLIENT_SECRET=$(cat /run/secrets/azure_client_secret)
exec dotnet AverAzure.dll  # exec hands PID 1 to dotnet so it receives stop signals
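Because secrets are immutable, rotation is the remove/recreate/update dance from the bullet above. A sketch, assuming the stack was deployed under the name aver (so the service is aver_api):

# Detach the old secret, replace it, reattach; Swarm restarts the service tasks
docker service update --secret-rm azure_client_secret aver_api
docker secret rm azure_client_secret
echo "new-value" | docker secret create azure_client_secret -
docker service update --secret-add azure_client_secret aver_api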

Secrets management honest assessment

  • Docker Secrets: encrypted at rest, never on disk, but no rotation, no versioning, no audit trail
  • Managed Identity + Key Vault: best on Azure, zero credentials anywhere, but Azure-specific
  • HashiCorp Vault: canonical cloud-agnostic solution, works anywhere
  • For current architecture: Docker Secrets for three SP bootstrap credentials, Key Vault for everything else
  • The chicken-and-egg: SP credentials are needed to authenticate to Key Vault. They're the only three secrets you need to protect externally. Everything else lives in the vault.

Swarm manager quorum

  • Odd number of managers needed for fault tolerance (1, 3, 5)
  • 1 manager: no redundancy — manager dies, cluster unavailable
  • 3 managers: tolerates 1 failure
  • Workers don't vote — only managers count for quorum
  • Current setup: 1 manager, 1 worker. Fine for learning. Production PEPPOL would need 3 managers minimum.
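Growing toward the production shape is mostly promotions; a sketch with one hypothetical third node:

docker node promote aver-worker
docker node promote <third-node>   # hypothetical; 3 managers = majority of 2, tolerates 1 loss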

What we messed up

Region mismatch (main mistake of the day)

Created the VNet and NSG in West India to match existing resources, but the subscription doesn't support VM creation in West India: B1s isn't available there. Had to delete both and recreate in South India. Lesson: check VM availability in the target region before creating dependent networking resources. VNet and NSG are region-locked at creation; they cannot be moved, only deleted and recreated. Azure Resource Mover exists but is overkill for empty resources.
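The check is one CLI query; a sketch (southindia is the CLI name for the South India region):

# Lists B1s availability and any subscription restrictions in the region
az vm list-skus --location southindia --size Standard_B1s --output table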

NSG Allow-Swarm scoped to Any initially

Initially set the source to Any on the 2377 rule, which was rightly challenged: security by obscurity is not security. Fixed to 10.1.0.0/24, so only VMs within the subnet can reach the Swarm control-plane port.

Forgot ASPNETCORE_URLS fix

Stack file still had http://+:8080 from the VPS session. Changed to http://0.0.0.0:8080 before deploying — same fix as the VPS, same reason (Swarm overlay IPv4 only).

.env file not picked up

Created .env file on the manager thinking docker stack deploy would read it like compose does. It doesn't. Had to export the three SP credentials in the shell before redeploying.


Step by step: what we did in the portal

1. Create VNet

  • Search Virtual Networks → Create
  • Resource group: learn_week_1
  • Name: aver-vnet
  • Region: South India
  • Address space: 10.1.0.0/16
  • Subnet name: default
  • Subnet range: 10.1.0.0/24
  • All security options (Bastion, Firewall, DDoS): Disabled
  • Hit Create
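For reproducibility, the same VNet as a single CLI call (a sketch with the exact values above):

az network vnet create \
  --resource-group learn_week_1 \
  --name aver-vnet \
  --location southindia \
  --address-prefixes 10.1.0.0/16 \
  --subnet-name default \
  --subnet-prefixes 10.1.0.0/24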

2. Create NSG

  • Search Network Security Groups → Create
  • Resource group: learn_week_1
  • Name: aver-nsg
  • Region: South India
  • Hit Create
  • Open aver-nsg → Inbound security rules → Add four rules (reconstructed in the sketch after this list)
  • Then: Subnets → Associate → select aver-vnet → select default subnet
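The rules table isn't reproduced in these notes; reconstructed from the sections above, the four inbound rules were roughly the following (names and priorities are illustrative; Swarm also uses port 7946 TCP/UDP for node gossip, which the notes don't mention and which may have been folded into one of these):

# Priority  Name         Source        Protocol  Ports
# 100       Allow-SSH    <your IP>     TCP       22
# 110       Allow-Swarm  10.1.0.0/24   TCP       2377
# 120       Allow-VXLAN  10.1.0.0/24   UDP       4789
# 130       Allow-Apps   Any           TCP       8080, 8081, 5341, 15672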

3. Create aver-manager VM

  • Search Virtual Machines → Create → Virtual machine
  • Resource group: learn_week_1
  • Name: aver-manager
  • Region: South India
  • Image: Ubuntu Server 24.04 LTS x64
  • Size: Standard_B1s (must search — default selection is expensive)
  • Authentication: SSH public key
  • Username: azureuser
  • SSH key source: Generate new key pair
  • Key pair name: aver-key
  • Public inbound ports: None (NSG on subnet handles this)
  • Disks tab: leave defaults
  • Networking tab:
      • Virtual network: aver-vnet
      • Subnet: default (10.1.0.0/24)
      • NIC NSG: None (subnet NSG already covers it)
      • Tick: Delete public IP and NIC when VM is deleted
  • Review + create → Create
  • DOWNLOAD aver-key.pem IMMEDIATELY when prompted — cannot retrieve again
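A rough CLI equivalent of the same VM (a sketch; the portal flow above is what was actually used, and key handling differs: --generate-ssh-keys uses your local ~/.ssh keys rather than an Azure-stored aver-key pair):

az vm create \
  --resource-group learn_week_1 \
  --name aver-manager \
  --location southindia \
  --image Ubuntu2404 \
  --size Standard_B1s \
  --admin-username azureuser \
  --generate-ssh-keys \
  --vnet-name aver-vnet \
  --subnet default \
  --nsg ""   # no NIC-level NSG; the subnet NSG covers it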

4. Create aver-worker VM

  • Same as manager except:
      • Name: aver-worker
      • SSH key source: Use existing key stored in Azure → select aver-key
  • Everything else identical

5. Note IPs

  • aver-manager: public 20.219.66.48, private 10.1.0.4
  • aver-worker: public 52.140.55.120, private 10.1.0.5

Step by step: terminal commands

On local machine

chmod 400 ~/Downloads/aver-key.pem
ssh -i ~/Downloads/aver-key.pem azureuser@20.219.66.48

On aver-manager

# Install Docker
curl -fsSL https://get.docker.com | sudo sh

# Add user to docker group
sudo usermod -aG docker azureuser
newgrp docker

# Init Swarm with custom address pool
docker swarm init --advertise-addr 10.1.0.4 --default-addr-pool 10.20.0.0/16 --default-addr-pool-mask-length 24

# Note the join token from output — needed for worker

# Clone repo
git clone https://<pat>@github.com/abhishek0052off/AverAzure.git
cd AverAzure

# Fix ASPNETCORE_URLS in stack file
nano docker-stack.yml
# Change http://+:8080 to http://0.0.0.0:8080
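# (alternative not used in the session: the same edit as a sed one-liner)
sed -i 's|http://+:8080|http://0.0.0.0:8080|' docker-stack.yml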

# Login to GHCR
echo <pat> | docker login ghcr.io -u abhishek0052off --password-stdin

# Export SP credentials
export AZURE_TENANT_ID=...
export AZURE_CLIENT_ID=...
export AZURE_CLIENT_SECRET=...

# Deploy stack
docker stack deploy -c docker-stack.yml --with-registry-auth aver

# Verify
docker service ls
docker node ls
curl http://20.219.66.48:8080/health

On aver-worker (separate terminal)

ssh -i ~/Downloads/aver-key.pem azureuser@52.140.55.120

# Install Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker azureuser
newgrp docker

# Join Swarm
docker swarm join --token SWMTKN-1-5ogfj51i964sonnj2ja0z6mkbpz17q3rtof17ieaeje6fsfsop-9ms9q3xqwfkedq4t6fsp4go51 10.1.0.4:2377
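One verification step not captured above: back on the manager, confirm both nodes registered.

# On aver-manager
docker node ls   # expect aver-manager as Leader, aver-worker with status Ready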

Proved working

  • GET /health → 200 from both curl and browser
  • POST /api/invoices → 200, file uploaded to averblobstore, InvoiceUploadedEvent published to domain.events exchange, consumer received from aver.invoice-uploaded.queue, ACKed, logged in Seq
  • POST /api/images → Azure Function triggered on originals container, thumbnails and web versions generated
  • Scalar UI at http://20.219.66.48:8080/scalar/v1
  • Seq at http://20.219.66.48:8081

Full end to end flow proved on genuine Azure infrastructure.


What's next

  • Implement Docker Secrets + entrypoint script for SP credentials — before calling secrets story complete
  • GitHub Actions CI/CD — auto build, push to GHCR, redeploy stack on push to main
  • Interview narrative across all five services
  • Update PEPPOL resume bullet to reflect NSG rules and Swarm cluster specifics
  • DEALLOCATE both VMs when not in use — portal → VM → Stop → confirm Stopped (deallocated)

How secrets work in this stack

Three Azure Service Principal credentials are stored as Docker Secrets on the Swarm manager using docker secret create. Swarm encrypts them and stores them in its Raft log — distributed across manager nodes, never written to disk in plain text.

When the api container starts, Swarm mounts the secrets as files inside the container at /run/secrets/azure_tenant_id, /run/secrets/azure_client_id, and /run/secrets/azure_client_secret.

The entrypoint.sh script runs before the .NET app starts. It reads each file and exports the value as an environment variable:

export AZURE_TENANT_ID=$(cat /run/secrets/azure_tenant_id)

The .NET app starts, DefaultAzureCredential picks up the env vars via EnvironmentCredential, authenticates to Azure Blob Storage. No credentials in the stack file, no exports needed on the shell, no .env files on disk.
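A quick way to verify this wiring from the manager (a sketch; assumes the stack is named aver so the task container matches aver_api, and that the container user can read PID 1's environment):

# Secrets mounted as tmpfs files?
docker exec $(docker ps -qf name=aver_api) ls -l /run/secrets/
# Env vars present in the running app (PID 1 after exec)?
docker exec $(docker ps -qf name=aver_api) sh -c "tr '\0' '\n' < /proc/1/environ | grep ^AZURE_"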

What survives a VM restart:

  • Docker Secrets — yes, stored in Swarm's Raft log
  • Running containers: the containers themselves don't survive, but Swarm automatically recreates them with the same config
  • Shell exports — no, but they're not needed anymore

The one manual step that was ever needed: Creating the secrets once on the manager via docker secret create. After that, everything is automatic.