CAP AND HLD
Starting point
No prior HLD (high-level design) knowledge. Starting cold. Goal was to build genuine understanding of distributed systems concepts before drilling interview problems. Student was questioning whether distributed systems belonged on the CV at all.
What we did
CAP theorem introduction — first attempt
Introduced CAP with a two-node database replication scenario. This caused immediate confusion because the student didn't have a mental model for why two nodes would each have their own copy of the data. The node/replication framing was wrong for where the student was starting from.
Reset — grounded in PEPPOL flow
Asked the student to describe their own PEPPOL flow. Student described the full system — Invoice API, UBL Generator, Relay Service, webhook handling, distributed locking via row competition, correlation GUIDs, 24hr ambiguity window.
Used this as the actual CAP example:
- UBL Generator writes, raises SendOverPeppol command
- If RabbitMQ goes down at that moment — event lost, Relay never processes
- Student's design: lock invoice in "sending", wait for confirmation, cleanup resets after 24hrs if unresolved
- This is a CP decision — correctness over availability — the right call for compliance
CAP reframed as: when two parts of your system lose contact, do you protect correctness or keep moving. Student already made this decision correctly without knowing the term.
Failure modes walkthrough
Went through each handoff point in the PEPPOL flow:
- Invoice API → DB write: atomic transaction, clean failure, rolls back
- Invoice API → RabbitMQ: publisher confirms + idempotency token + durable queue + persistent messages — covered
- UBL Generator → lock acquisition: row-level competition, rowCount 0 backs off — covered
- UBL Generator → raises SendOverPeppol: same publisher confirms coverage
- Relay → PEPPOL network: ambiguous state — resolved by webhook coming back, 24hr window
- True failure mode identified: invoice sent successfully but webhook never arrives — outside the system boundary; the SBDH header is not an idempotency key the access point accepts, so it's handled manually
Student correctly identified this as the boundary of what the system can control. Confirmed this is not a design flaw — it's a compliance reality every PEPPOL implementation has.
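A minimal sketch of the publish-side protections listed above (publisher confirms, idempotency token, durable queue, persistent messages), assuming the RabbitMQ.Client 6.x synchronous API (7.x is async and differs); the queue name, payload shape, and token choice are illustrative, not the student's actual code:

```csharp
using System;
using System.Text;
using RabbitMQ.Client;

var invoiceId = Guid.NewGuid();   // hypothetical, would come from the DB write

var factory = new ConnectionFactory { HostName = "rabbitmq" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.ConfirmSelect();                          // enable publisher confirms
channel.QueueDeclare("send-over-peppol", durable: true,
                     exclusive: false, autoDelete: false, arguments: null);

var props = channel.CreateBasicProperties();
props.Persistent = true;                          // survive a broker restart
props.MessageId = invoiceId.ToString();           // idempotency token for the consumer

var body = Encoding.UTF8.GetBytes($"{{\"invoiceId\":\"{invoiceId}\"}}");
channel.BasicPublish(exchange: "", routingKey: "send-over-peppol",
                     basicProperties: props, body: body);

// Block until the broker acks, or throw. The CP choice lives here:
// fail the operation loudly rather than pretend the command was published.
channel.WaitForConfirmsOrDie(TimeSpan.FromSeconds(5));
```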
Relay → Merchant API discussion
Student asked if direct HTTP call from Relay to Merchant API is a violation. Discussed:
- Direct call creates coupling — Relay needs to know Merchant API exists, its URL, its contract
- Relay's stated purpose is network-agnostic — it knows sender and receiver, not tenant specifics
- Event-based is cleaner — Relay raises WebhookReceived, Merchant API consumes it, Relay walks away
- At 15-20 senders and 40-70k invoices/month, a direct call is operationally fine
- Student confirmed they want to refine this — event-based is the right direction
Distributed systems on the CV
Student questioned whether distributed systems was justified on the CV. Confirmed it is:
- Publisher confirms with idempotency tokens
- Distributed locking via row-level DB competition
- Correlation IDs spanning service boundaries
- Eventual consistency with explicit status lifecycle
- Cleanup service handling ambiguous states
- Conscious CP decision for compliance
Gap is vocabulary, not knowledge. Student knows the concepts, just doesn't have the interview terms yet.
Saga pattern
Student correctly identified they don't do distributed transactions. Confirmed their design is choreography-based Saga:
- Each service does a local transaction
- Publishes event to trigger next step
- Failures handled by status-based compensation not distributed rollback
- No central orchestrator
Student pushed back on the Saga label initially — was thinking of orchestration-based Saga only. Clarified the two flavours. Student's system matches choreography-based Saga exactly.
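A compressed sketch of what choreography-based Saga looks like in code, loosely modelled on the UBL Generator step; the event names, the IBus abstraction, and the status values are hypothetical stand-ins:

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

// Choreography-based Saga in miniature: each service commits a local
// transaction, then publishes the event that triggers the next step.
// No central coordinator; compensation is status-based.
public interface IBus { Task Publish<T>(T evt); }
public record InvoiceCreated(Guid InvoiceId);
public record UblGenerated(Guid InvoiceId);

public class UblGeneratorHandler
{
    private readonly IDbConnection _db;
    private readonly IBus _bus;
    public UblGeneratorHandler(IDbConnection db, IBus bus) => (_db, _bus) = (db, bus);

    public async Task Handle(InvoiceCreated evt)
    {
        // 1. Local transaction only; no lock ever spans services.
        using (var tx = _db.BeginTransaction())
        {
            // ... generate UBL, persist it, move status created -> sending ...
            tx.Commit();
        }

        // 2. Publish the event that triggers the next local transaction.
        await _bus.Publish(new UblGenerated(evt.InvoiceId));

        // 3. If step 2 never resolves, the cleanup service resets the
        //    status after 24h: compensation, not distributed rollback.
    }
}
```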
Read replicas
Explained what a read replica is — second copy of DB kept in sync automatically at the DB engine level, not application level. Application configures two connection strings, writes go to primary, reads go to replica.
Replication lag explained — small delay between write hitting primary and appearing on replica.
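A sketch of the two-connection-string split, assuming Npgsql 7+ (NpgsqlDataSource); hostnames and schema are hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

// Two connection strings, one decision: writes always hit the primary,
// reads may hit the replica and tolerate replication lag.
public class InvoiceStore
{
    private readonly NpgsqlDataSource _primary =
        NpgsqlDataSource.Create("Host=pg-primary;Database=invoices");
    private readonly NpgsqlDataSource _replica =
        NpgsqlDataSource.Create("Host=pg-replica;Database=invoices");

    // Soft gate: a stale answer here is acceptable, the write is the real guard.
    public async Task<string?> GetStatus(Guid invoiceId)
    {
        await using var cmd = _replica.CreateCommand(
            "SELECT ord_status FROM invoices WHERE id = $1");
        cmd.Parameters.Add(new NpgsqlParameter { Value = invoiceId });
        return (string?)await cmd.ExecuteScalarAsync();
    }
}
```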
Student identified that stale reads could be dangerous at the point of checking if an order has been invoiced. Worked through why this doesn't break their lock acquisition:
- Stale read lets a second service instance get past the status check
- Lock acquisition hits primary via UPDATE WHERE ord_status = 'created'
- Primary has the truth — returns rowCount 0 to the loser
- Student arrived at this reasoning independently
Named the pattern: optimistic reads, pessimistic writes.
Explained PostgreSQL row-level locking — when two UPDATEs hit the same row simultaneously, PostgreSQL serialises them. The second waits on the row lock, the first commits, the second re-checks its WHERE clause, finds no matching row, and returns rowCount 0. Student confirmed they understood PostgreSQL blocks the second request until the first resolves.
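A sketch of the hard gate, again assuming Npgsql 7+; the invoices table and ord_status column come from the notes above, everything else is illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

// Hard gate: the lock IS the UPDATE, always against the primary.
// PostgreSQL serialises concurrent UPDATEs on the same row; the loser
// waits on the row lock, re-checks the WHERE clause after the winner
// commits, matches nothing, and gets rowCount 0.
public static class SendLock
{
    public static async Task<bool> TryAcquire(NpgsqlDataSource primary, Guid invoiceId)
    {
        await using var cmd = primary.CreateCommand(
            "UPDATE invoices SET ord_status = 'sending' " +
            "WHERE id = $1 AND ord_status = 'created'");
        cmd.Parameters.Add(new NpgsqlParameter { Value = invoiceId });

        // rowCount 1: we won the lock. rowCount 0: someone else did; back off.
        return await cmd.ExecuteNonQueryAsync() == 1;
    }
}
```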
Full defence in layers confirmed:
- Stale read slips through soft gate
- Lock acquisition hits primary
- PostgreSQL serialises concurrent attempts
- Winner gets rowCount 1, loser gets rowCount 0
Database per service
Explained the principle — each service owns its data exclusively, no other service touches its tables directly.
Went through all five gotchas:
Gotcha 1 — No cross-service joins. Solutions: API composition (call each service, join in memory — simple but painful at scale) or read model via events (separate reporting DB populated by consuming events — what production systems actually do).
Gotcha 2 — No distributed transactions. Solution: choreography-based Saga — local transactions per service, compensating actions on failure. Student already has this.
Gotcha 3 — No referential integrity across services. DB foreign keys only work within one DB. Enforce at application level — validate before writing, or accept eventual consistency with reconciliation.
Gotcha 4 — Reporting becomes hard. Solution: dedicated reporting service with denormalised read store, populated by events from all services.
Gotcha 5 — Schema changes move coupling from DB layer to event schema layer. Solution: event versioning — publish v1 and v2 simultaneously during transition, consumers migrate at own pace.
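A sketch of the gotcha 5 transition strategy; event names and the bus abstraction are hypothetical:

```csharp
using System;
using System.Threading.Tasks;

// Dual-publish during a schema transition so consumers migrate
// at their own pace, then retire v1 once everyone has moved.
public interface IEventBus { Task Publish<T>(T evt); }

public record InvoiceCreatedV1(Guid InvoiceId, string CustomerName);
public record InvoiceCreatedV2(Guid InvoiceId, string CustomerName, string TenantId);

public class InvoiceEvents
{
    private readonly IEventBus _bus;
    public InvoiceEvents(IEventBus bus) => _bus = bus;

    public async Task PublishCreated(Guid id, string customer, string tenant)
    {
        // Old consumers stay on v1, new consumers move to v2.
        await _bus.Publish(new InvoiceCreatedV1(id, customer));
        await _bus.Publish(new InvoiceCreatedV2(id, customer, tenant));
        // Retire the v1 publish once every consumer has migrated.
    }
}
```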
Student's current boundaries confirmed as defensible — Invoice API and UBL Generator sharing a DB is correct (same bounded context), Relay having its own store is correct (different bounded context).
Reporting DB and read model
Explained the full pattern:
- All services publish events to RabbitMQ
- Read Model Service consumes all events
- Writes to a denormalised flat reporting DB — one row per invoice, all fields pre-joined
- Dashboard BFF reads from reporting DB directly
- No cross-service joins at query time
Showed the flat table structure — order fields, invoice fields, peppol fields, payment fields all in one row. Showed event handlers that progressively build the row as events arrive.
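A sketch of that progressive build-up, assuming Npgsql 7+; table, column, and event names are hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

// Read model projection: each handler upserts its slice of the flat row,
// so the row fills in as events arrive.
public record InvoiceCreated(Guid InvoiceId, string CustomerName);
public record PeppolDelivered(Guid InvoiceId, DateTime DeliveredAt);

public class InvoiceReportProjection
{
    private readonly NpgsqlDataSource _reportingDb;
    public InvoiceReportProjection(NpgsqlDataSource db) => _reportingDb = db;

    public async Task On(InvoiceCreated evt)
    {
        await using var cmd = _reportingDb.CreateCommand(
            "INSERT INTO invoice_report (invoice_id, customer_name) VALUES ($1, $2) " +
            "ON CONFLICT (invoice_id) DO UPDATE SET customer_name = EXCLUDED.customer_name");
        cmd.Parameters.Add(new NpgsqlParameter { Value = evt.InvoiceId });
        cmd.Parameters.Add(new NpgsqlParameter { Value = evt.CustomerName });
        await cmd.ExecuteNonQueryAsync();
    }

    public async Task On(PeppolDelivered evt)
    {
        // A later event fills in more columns of the same pre-joined row.
        await using var cmd = _reportingDb.CreateCommand(
            "UPDATE invoice_report SET peppol_status = 'delivered', delivered_at = $2 " +
            "WHERE invoice_id = $1");
        cmd.Parameters.Add(new NpgsqlParameter { Value = evt.InvoiceId });
        cmd.Parameters.Add(new NpgsqlParameter { Value = evt.DeliveredAt });
        await cmd.ExecuteNonQueryAsync();
    }
}
```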
Explained eventual consistency in the read model — row builds up over time as events flow through. Acceptable for dashboards, not for transactional operations.
Showed aggressive indexing strategy on reporting DB — read-only from application perspective, can index heavily without write performance impact.
BFF pattern
Student identified the canonical pattern correctly — each service is the access point to its own DB, BFF sits in front and aggregates.
BFF confirmed as external facing, registered client with auth server, aggregates data for a specific frontend's needs.
Infrastructure top-down
Covered every layer — DNS, CDN, Load Balancer, API Gateway, Services, Cache, Database, Object Storage. Each layer explained with its job and relevance to student's stack.
Load balancer vs API Gateway distinction clarified — student correctly questioned the ordering. Confirmed there are actually two load balancers in a large system: one in front of API Gateway, one in front of each service. In smaller systems they collapse into one. Nginx in student's stack does both.
YARP as API Gateway
Covered Azure API Management (APIM) vs YARP. YARP confirmed as the right fit — cloud agnostic, runs in a Docker container, sits in the Swarm stack, full control, free.
Showed complete YARP implementation:
- Program.cs with JWT validation and claims injection middleware
- appsettings.json routing config with AuthorizationPolicy per route
- Invoice API changes — JWT validation removed, replaced with custom authentication handler that reads injected headers and builds ClaimsPrincipal
- Docker Swarm stack with only YARP port exposed to host
- Full request flow for POST /invoices
Student suggested keeping [Authorize] attribute and building ClaimsPrincipal from headers instead of removing it. Correct instinct — implemented via custom authentication handler inheriting from AuthenticationHandler. Controllers stay identical.
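A sketch of that handler, assuming .NET 8 (the three-argument AuthenticationHandler constructor); header names, claim types, and the config key are hypothetical:

```csharp
using System.Security.Claims;
using System.Text.Encodings.Web;
using Microsoft.AspNetCore.Authentication;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

// Trust headers injected by YARP, rebuild the ClaimsPrincipal so
// [Authorize] and User.FindFirst() keep working unchanged.
public class GatewayHeadersAuthHandler : AuthenticationHandler<AuthenticationSchemeOptions>
{
    private readonly IConfiguration _config;

    public GatewayHeadersAuthHandler(
        IOptionsMonitor<AuthenticationSchemeOptions> options,
        ILoggerFactory logger, UrlEncoder encoder, IConfiguration config)
        : base(options, logger, encoder) => _config = config;

    protected override Task<AuthenticateResult> HandleAuthenticateAsync()
    {
        // Reject anything that didn't come through the gateway
        // (shared internal secret, pulled from configuration).
        if (Request.Headers["X-Internal-Secret"] != _config["Gateway:SharedSecret"])
            return Task.FromResult(AuthenticateResult.Fail("Not from gateway"));

        if (!Request.Headers.TryGetValue("X-User-Id", out var userId) ||
            !Request.Headers.TryGetValue("X-Tenant-Id", out var tenantId))
            return Task.FromResult(AuthenticateResult.Fail("Missing gateway headers"));

        var identity = new ClaimsIdentity(new[]
        {
            new Claim(ClaimTypes.NameIdentifier, userId.ToString()),
            new Claim("tenant_id", tenantId.ToString()),
        }, Scheme.Name);

        var ticket = new AuthenticationTicket(new ClaimsPrincipal(identity), Scheme.Name);
        return Task.FromResult(AuthenticateResult.Success(ticket));
    }
}
```

Registered with something like `builder.Services.AddAuthentication("GatewayHeaders").AddScheme<AuthenticationSchemeOptions, GatewayHeadersAuthHandler>("GatewayHeaders", null)`; the scheme name is arbitrary.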
Public vs authenticated endpoints — webhook endpoint must be public (PEPPOL access point can't send JWT). HMAC signature verification and IP whitelisting as webhook security patterns.
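A sketch of HMAC verification for the webhook; the header name, hex encoding, and secret handling are assumptions, since a real access point documents its own signing scheme:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Verify the webhook body against an HMAC-SHA256 signature header.
public static class WebhookSignature
{
    public static bool IsValid(string body, string signatureHex, string secret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var computed = Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(body)));

        // Constant-time comparison to avoid timing side channels.
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(computed),
            Encoding.UTF8.GetBytes(signatureHex.ToUpperInvariant()));
    }
}
```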
BFF auth flow
Covered full auth code flow without PKCE — BFF is confidential client, keeps client secret server side. Angular redirects to Auth Server, user logs in on Auth Server page, Auth Server redirects to BFF callback with auth code, BFF exchanges code for tokens using client secret, BFF issues session cookie to Angular.
Student questioned why callback can't be on Angular — confirmed it can but then Angular either needs the client secret (exposed in browser) or you use PKCE and tokens live in browser. BFF callback exists to keep exchange server side.
Student correctly identified the callback is just a GET endpoint, no UI needed.
Angular auth state — BFF exposes /me endpoint. Angular calls it on load, gets user info (userId, tenantId, name, roles), stores in AuthService signal. Cookie is HttpOnly so JavaScript can't read it directly.
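A minimal /me sketch (ASP.NET Core minimal API); claim type names are hypothetical, and the cookie scheme registration is elided here (see the token storage sketch below):

```csharp
using System.Security.Claims;

var builder = WebApplication.CreateBuilder(args);
// ... cookie authentication registered here (next sketch) ...
builder.Services.AddAuthorization();
var app = builder.Build();

// Angular calls this on load; the HttpOnly session cookie authenticates
// the request, so JavaScript never needs to read the cookie itself.
app.MapGet("/me", (ClaimsPrincipal user) => Results.Ok(new
{
    userId   = user.FindFirst(ClaimTypes.NameIdentifier)?.Value,
    tenantId = user.FindFirst("tenant_id")?.Value,
    name     = user.Identity?.Name,
    roles    = user.FindAll(ClaimTypes.Role).Select(c => c.Value),
})).RequireAuthorization();

app.Run();
```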
Token storage — BFF stores tokens server side, either Redis (session ID in cookie, tokens in Redis) or encrypted cookie (tokens encrypted inside cookie). ASP.NET Core cookie auth middleware handles encrypted cookie by default.
Student correctly identified that BFF only needs to decrypt the cookie when making a downstream resource call, not on every request. Middleware just validates signature and expiry on every request — lightweight.
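A sketch of the encrypted-cookie option using the stock ASP.NET Core cookie auth middleware, with the cookie attributes matching the table below; nothing here is exotic, it is the framework default plus explicit flags:

```csharp
using Microsoft.AspNetCore.Authentication.Cookies;

var builder = WebApplication.CreateBuilder(args);

// Cookie auth: HttpOnly, Secure, SameSite Strict. The middleware signs and
// encrypts the cookie payload by default (Data Protection), and the
// per-request signature/expiry check is cheap; full token use only matters
// when the BFF actually makes a downstream call.
builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie(options =>
    {
        options.Cookie.HttpOnly = true;
        options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
        options.Cookie.SameSite = SameSiteMode.Strict;
    });
builder.Services.AddAuthorization();

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();
app.Run();
```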
Concepts covered
CAP theorem — consistency, availability, partition tolerance. P is not optional in a distributed system. Real choice is CP or AP when partition happens. Student's PEPPOL system is CP-leaning — correctness over availability for compliance reasons.
Network partition — not split data. Both nodes alive, both have full data, network between them is cut. Replication sync breaks. Copies diverge.
Replication — DB engine keeps two copies in sync automatically. Application just uses two connection strings. Not application-level responsibility.
Replication lag — small delay between write hitting primary and appearing on replica. Soft gate (status check) can be fooled by stale read. Hard gate (lock acquisition on primary) cannot.
Optimistic reads, pessimistic writes — read loosely, accept potential staleness. Write operation itself is the real guard. Student arrived at this pattern naturally.
PostgreSQL row-level locking — exclusive lock on row when UPDATE touches it. Concurrent UPDATEs serialised automatically. No application-level coordination needed.
Saga pattern — sequence of local transactions, each publishing an event to trigger the next step, failures handled by compensating actions not distributed rollback. Two flavours: choreography (student has this) and orchestration (heavyweight, central coordinator).
Database per service — service is the only access point to its own DB. Five gotchas: no joins, no distributed transactions, no referential integrity, hard reporting, schema coupling moves to event layer.
Read model via events — separate denormalised reporting DB populated by consuming events. Eventual consistency acceptable for dashboards. Enables complex cross-service queries without joins.
BFF pattern — Backend for Frontend. Server-side aggregator for a specific frontend's data needs. Owns the auth code flow, holds tokens server side, issues session cookie to frontend.
Auth code flow without PKCE — BFF as confidential client. Angular initiates redirect, Auth Server handles login, redirect lands at BFF callback (just a GET endpoint), BFF exchanges code for tokens using client secret, issues cookie. Angular calls /me endpoint to get user info.
YARP — Microsoft reverse proxy library. Runs as ASP.NET Core app in Docker container. Handles routing, auth, rate limiting. Fits cloud-agnostic self-hosted philosophy.
Custom authentication handler — AuthenticationHandler subclass that builds ClaimsPrincipal from injected headers instead of JWT. Keeps [Authorize] attribute and User.FindFirst() calls intact in controllers. Trusts gateway-injected headers, validated by shared internal secret.
What we messed up
CAP explanation started wrong
Jumped straight to two-node database replication before establishing that two nodes each have their own copy of the data. Student had no mental model for why that would be the case. Caused significant confusion.
Fix: should have asked about student's existing architecture first, then grounded the explanation in their actual system. That's what we ended up doing anyway — just took longer to get there.
Remember: always ground new concepts in the student's existing system before introducing abstract scenarios.
Saga label pushed back
Called student's design a Saga too early without distinguishing the two flavours. Student associated Saga with orchestration-based only — the heavyweight version with a central coordinator. This was a fair pushback.
Fix: always distinguish choreography-based vs orchestration-based Saga upfront. Student has choreography-based. They should not claim orchestration-based Saga.
Key values and config to remember
| Item | Value |
|---|---|
| Manager VM | aver-manager, 10.1.0.4, 20.219.66.48 |
| Worker VM | aver-worker, 10.1.0.5, 52.140.55.120 |
| Swarm address pool | 10.20.0.0/16 |
| VNet | aver-vnet, 10.1.0.0/16 |
| Blob Storage | averblobstore, West India |
| Key Vault | peppol-vault, West India |
| API internal port | 8080 |
| YARP gateway | only published port in Swarm stack |
| Cookie type | HttpOnly, Secure, SameSite Strict |
Unanswered questions / things to investigate
- Does the cleanup service cover invoices stuck in "created" status (post DB commit, pre RabbitMQ publish) or only invoices stuck in "sending"? This was flagged as a potential gap and not resolved.
- When cleanup resets a stuck invoice, does it reset to pre-UBL or pre-invoice? UBL schema already generated — regenerating risks breaking correlation.
- PEPPOL network query API — is there any way to query send status directly rather than depending entirely on the webhook? Flagged as unknown.
- Redis vs encrypted cookie for BFF token storage — not decided for student's stack.
- Token refresh handling in BFF — covered conceptually, not implemented.
What's next
- HLD framework — five steps, apply to every problem
- First drill problem — URL shortener, student drives clarifying questions
- Capacity estimation practice — back-of-envelope numbers
- Core building blocks drill — cache strategies, sharding, consistent hashing
- Revisit cleanup service gap — confirm whether "created" status is covered alongside "sending"