
Pagination — Session Recap


Starting point

Fresh topic. No prior pagination context. Starting from zero as part of the DB Design pillar drill series. Student comfortable with basic SQL, no prior exposure to keyset mechanics or cursor design.


What we did

  1. Explained why pagination exists — unbounded queries destroy memory, network, and client rendering
  2. Explained offset pagination mechanics — how LIMIT/OFFSET works, what the DB actually does internally
  3. Explained what an index does and does not fix for offset pagination
  4. Introduced the phantom row problem
  5. Explained keyset pagination mechanics — cursor structure, tie-breaker, constant cost
  6. Drilled three real-world scenarios — export, load more, page numbers
  7. Covered filters with keyset — composition, filter change behaviour
  8. Covered sort changes with keyset — cursor invalidation, restart
  9. Covered dynamic cursor structure for multiple sort columns
  10. Covered filter encoding in cursor — canon for public API vs internal tool
  11. Searched production canon — confirmed Google/Stripe/GitHub pattern
  12. Drilled keyset SQL — student wrote page 1 and page 2 queries
  13. Covered index design for keyset queries — equality, sort, range order
  14. Introduced and covered backward pagination — the flip trick, reverse in code
  15. Showed production grade C# implementation — controller, service, query builder, cursor helper
  16. Discussed EF Core vs Dapper for keyset — why raw SQL wins on the read path
  17. Connected keyset to CQRS read path pattern

Concepts covered

Why pagination exists

Unbounded queries on large tables destroy memory, network, and client rendering. Pagination returns a subset and gives the client a way to ask for the next subset.

Offset pagination — how it actually works

SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 0;   -- page 1
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 20;  -- page 2
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 9980; -- page 500

Server computes OFFSET = (page - 1) * page_size. API surface: GET /orders?page=2&page_size=20.

The DB does not jump to the offset position. It scans through every row before the offset, counts them, discards them, then returns the next N. Cost is linear with depth.

What an index on the sort column does: Eliminates the filesort — rows are already in order in the index, no in-memory sort needed. For page 1 this is a massive win. For page 500 the index still removes the sort cost but cannot remove the skip cost — the DB still walks 9,980 index entries and discards them. The win gets proportionally smaller the deeper you go.

The phantom row problem: Structural — cannot be fixed with offset pagination. New row inserted at the top shifts every offset. A row the client already saw appears on the next page again. A deleted row disappears entirely from the result set across pages. The client never knows.
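The shift is easy to see in a toy simulation — Python here purely for illustration, with list slicing standing in for LIMIT/OFFSET:

```python
# Illustrative simulation of the phantom row problem with offset paging.
# Rows are newest-first; page size 2.
rows = ["e", "d", "c", "b", "a"]  # newest first

def page(rows, page_num, size=2):
    # OFFSET = (page - 1) * size, then take the next `size` rows
    off = (page_num - 1) * size
    return rows[off:off + size]

page1 = page(rows, 1)   # ["e", "d"]
rows.insert(0, "f")     # a new row lands at the top between requests
page2 = page(rows, 2)   # ["d", "c"] -- "d" was already delivered on page 1
```

Every row shifted down by one, so the last row of page 1 reappears as the first row of page 2 — the client sees a duplicate and never knows why.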

Keyset pagination — core mechanics

Never skips rows. Uses the last row's sort column value as the position anchor.

-- Page 1 — no cursor
SELECT * FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Page 2 — cursor decoded from last row of page 1
SELECT * FROM orders
WHERE created_at < '2024-01-15 10:30:00'
   OR (created_at = '2024-01-15 10:30:00' AND id < 'uuid')
ORDER BY created_at DESC, id DESC
LIMIT 20;

The DB seeks directly to the index entry at the cursor position and reads forward. Page 1 and page 50,000 cost the same.

Why the OR tie-breaker: Two rows can share the same created_at. If the page boundary falls between them, created_at < cursorDt alone also skips the row not yet returned — rows lost silently. (created_at, id) is always unique, making the boundary unambiguous.
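The lost-row failure can be sketched in a few lines — Python for illustration, with list comprehensions standing in for the WHERE clause:

```python
# Two rows share created_at "10:30" and the page boundary falls between them.
# Rows in (created_at DESC, id DESC) order:
rows = [("10:31", 5), ("10:30", 9), ("10:30", 3), ("10:29", 7)]
cursor = ("10:30", 9)  # last row of a 2-row page 1

# Naive predicate: created_at < cursor only -- loses ("10:30", 3)
naive = [r for r in rows if r[0] < cursor[0]]

# Tie-broken predicate, same shape as the SQL:
# created_at < c OR (created_at = c AND id < c_id)
keyset = [r for r in rows
          if r[0] < cursor[0] or (r[0] == cursor[0] and r[1] < cursor[1])]
```

The naive version silently drops the second "10:30" row; the tie-broken version recovers it.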

Why id alone is not enough as cursor: WHERE id < 8821 ORDER BY amount DESC is garbage. ID encodes insertion order, not amount order. A row with id = 100 could have amount = 50000. The cursor must encode the value of whatever column you are sorting by. id is only the tie-breaker.

What keyset breaks:

- No arbitrary page jumping — cannot go to page 47 without walking pages 1–46
- Sort change invalidates cursor — must restart from page 1
- Filter change invalidates cursor — must restart from page 1
- No cheap total count — COUNT(*) is expensive; cache it, approximate it, or omit it

The three scenario decision

| Use case | Strategy | Reason |
|---|---|---|
| Export 2M rows to Excel | Async job + polling + download link | Pagination is the wrong tool. Schedule a job, client polls for status, server returns a download URL when ready |
| Load more / infinite scroll | Keyset | Constant cost, no phantom rows, matches the UX pattern exactly |
| Page 3 of 140 admin screen | Offset | Arbitrary jumping required, total count required — pay the cost or cache it |

The export answer was unprompted and correct — recognised that pagination was the wrong abstraction entirely for that use case.

Total count cost

COUNT(*) with filters on a large table cannot be avoided if you need exact totals. Options in practice: accept the cost on every request, cache with short TTL, show approximate counts ("140+"), or compute only on the first page request and carry it forward client-side.
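The short-TTL cache option can be sketched as follows — Python for brevity, all names illustrative; the real compute callback would run the expensive COUNT(*) query:

```python
import time

# Minimal sketch of a short-TTL count cache, keyed by filter combination.
_cache = {}  # filter_key -> (timestamp, total)

def cached_count(filter_key, compute, ttl=30.0):
    """Return a cached total for a filter combination, recomputing after ttl seconds."""
    now = time.monotonic()
    hit = _cache.get(filter_key)
    if hit and now - hit[0] < ttl:
        return hit[1]          # fresh enough -- skip the expensive query
    total = compute()          # the real COUNT(*) query goes here
    _cache[filter_key] = (now, total)
    return total

calls = []
def fake_count():
    calls.append(1)
    return 140

cached_count("status=pending", fake_count)  # computes
cached_count("status=pending", fake_count)  # served from cache within TTL
```

Slightly stale totals are almost always acceptable for a pager; exact totals on every request rarely are worth the cost.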

Dynamic cursor structure

Single object handles all sort combinations. Server reads sort_column and sort_direction to know which inequality to apply:

{
  "sort_column": "amount",
  "sort_direction": "asc",
  "sort_value": 149.99,
  "id": "some-uuid"
}
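A minimal encode/decode sketch for this payload — Python for illustration (the session's real implementation was C#), using URL-safe base64 of the JSON:

```python
import base64
import json

# Opaque cursor = URL-safe base64 of the JSON payload above.
def encode_cursor(payload: dict) -> str:
    raw = json.dumps(payload, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode()))

token = encode_cursor({"sort_column": "amount", "sort_direction": "asc",
                       "sort_value": 149.99, "id": "some-uuid"})
cursor = decode_cursor(token)
```

The client never inspects the token; it passes it back verbatim, and the server reads sort_column/sort_direction on decode to pick the inequality.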

Filters with keyset

Filters compose cleanly as additional WHERE conditions. The cursor defines position, filters define which result set. They are separate concerns.

WHERE status = 'pending'
AND (amount > 149.99 OR (amount = 149.99 AND id > 'uuid'))
ORDER BY amount ASC, id ASC
LIMIT 20;

Filter change = cursor invalid = reset to page 1. The cursor was pointing to a position in a filtered result set that no longer exists.

Filter encoding in cursor — production canon

| Context | Canon | Reason |
|---|---|---|
| Public API (Stripe, GitHub, Google) | Encode filters inside cursor | Cursor is fully self-contained; server validates on decode, rejects and resets on mismatch; client cannot tamper |
| Internal ERP | Trust the client | Frontend resets cursor on filter change, no over-engineering needed |

Google, Stripe, Shopify all use fully opaque page tokens that encode sort values, direction, and filter state. Client treats it as a black box and passes it back verbatim.

Cursor encryption vs signing

Tampering with the cursor gives the client a different position in the result set — no privilege escalation, no writes. For internal ERP with authenticated users plain base64 is fine.

For public APIs where cursor encodes sensitive internals: HMAC sign the cursor. Append a truncated HMAC of the payload signed with a server secret. On decode verify signature — mismatch returns 400.

Encryption is overkill in most cases. Only justified if the cursor encodes genuinely sensitive business data.
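The sign-and-verify flow can be sketched like this — Python for illustration; the secret and the 16-character truncation are illustrative choices:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # illustrative; load from config in practice

def sign_cursor(payload: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    # Truncated HMAC of the payload, signed with the server secret
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:16]
    return body + "." + sig

def verify_cursor(token: str) -> dict:
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")  # map to HTTP 400 at the API layer
    return json.loads(base64.urlsafe_b64decode(body.encode()))
```

Note the constant-time compare_digest for the signature check; a plain == comparison would leak timing information.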

Index design for keyset queries

Query:

WHERE status = 'pending'
AND (amount > 149.99 OR (amount = 149.99 AND id > 'uuid'))
ORDER BY amount ASC, id ASC
LIMIT 20;

Correct index: (status, amount, id)

Rule: Equality columns first, sort column second, range/tie-breaker last.

Why this order precisely: The index is a B-tree — rows physically sorted in declaration order. With (status, amount, id):

  1. Binary search lands directly on the status = 'pending' section — all pending rows contiguous
  2. Within pending, amount is already sorted — DB seeks to > 149.99 and walks forward, no filesort
  3. Within matching amounts, id is sorted — tie-breaker resolves cleanly

If amount came after id: within each status rows are sorted by id first. Amount is scattered. DB has to read all pending rows and sort by amount in memory — filesort. Index useless for the sort.

The bookshelf analogy: Books sorted by spine label (status, amount, id). Walk to the pending shelf, find the 149.99 section, everything to the right is your result set, pull 20 books. Zero wasted work.

The precise rule: The index must match the sort order of the query for the DB to avoid a filesort. Range columns go last because once you apply a range, the index cannot help with anything to its right — you are already scanning within that range.

Backward pagination — the flip trick

The user's sort direction is a display contract — it never changes. The trick is to flip the DB query direction to make LIMIT land on the adjacent rows, then reverse the result in application code.

Without the flip: WHERE amount > 499 ORDER BY amount DESC LIMIT 20 returns the global top 20 rows above 499 — could be several pages away, not the adjacent page.

With the flip:

-- User wants amount DESC, currently on page 3, first row amount = 499
-- To get page 2: flip to ASC, fetch 20 rows above 499, reverse result

SELECT * FROM orders
WHERE amount > 499
   OR (amount = 499 AND id > 'uuid')
ORDER BY amount ASC, id ASC   -- flipped
LIMIT 20;

-- Result comes back: 500, 510, 520 ... 699 (ASC)
-- Reverse in code: 699 ... 520, 510, 500 (DESC)
-- Client sees correct page 2 in correct order

Operator matrix:

| User sort | Query direction | DB operator | DB ORDER BY | After fetch |
|---|---|---|---|---|
| ASC | Forward | > | ASC | No reverse |
| DESC | Forward | < | DESC | No reverse |
| ASC | Backward | < | DESC | Reverse |
| DESC | Backward | > | ASC | Reverse |
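The matrix collapses into a few lines of code — a Python sketch, with the function name plan chosen for illustration:

```python
# Given the user's display sort and the paging direction, derive the SQL
# comparison operator, the query's ORDER BY direction, and whether the
# fetched rows must be reversed before returning them.
def plan(user_sort: str, backward: bool):
    # Backward paging flips the query direction; forward keeps it.
    query_sort = user_sort if not backward else ("desc" if user_sort == "asc" else "asc")
    op = ">" if query_sort == "asc" else "<"
    needs_reverse = backward  # only flipped fetches come back in display-reversed order
    return op, query_sort, needs_reverse
```

Each row of the matrix falls out directly, e.g. `plan("asc", True)` yields `("<", "desc", True)`.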

The limit + 1 trick: Always fetch one extra row. If you get limit + 1 back there is a next page in that direction. Slice it off before returning. Detects hasMore without any COUNT query.
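A sketch of the trick — Python for illustration, with fetch standing in for the keyset query:

```python
# Fetch one extra row; its presence proves there is a next page.
def paginate(fetch, limit: int):
    rows = fetch(limit + 1)         # the keyset query with LIMIT limit + 1
    has_more = len(rows) > limit    # extra row came back -> next page exists
    return rows[:limit], has_more   # slice the sentinel row off before returning

# Five matching rows, page size 3: the 4th row signals has_more.
data, has_more = paginate(lambda n: list(range(5))[:n], limit=3)
```

No COUNT query, no second round trip — hasMore is a byproduct of the fetch itself.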

Response shape

Every page response returns two cursors:

{
  "data": [...],
  "next_cursor": "...",
  "prev_cursor": "...",
  "has_next": true,
  "has_prev": true
}

next_cursor encodes the last row — follow to go forward. prev_cursor encodes the first row — follow to go backward. Client sends whichever cursor it wants to follow. Server detects direction from which token was sent.

EF Core vs Dapper for keyset

EF Core's expression tree system makes dynamic keyset WHERE clauses painful — requires Expression<Func<T, bool>> gymnastics to build the condition dynamically. Doable but ugly.

Raw SQL with Dapper: build the string, add parameters, done. Cleaner, more readable, full control.

Keyset lives entirely on the read path. EF Core for writes where change tracking and unit of work matter, Dapper with raw SQL for reads where you want exact SQL control. No conflict.

CQRS connection

Greg Young's original CQRS was only this: separate the model you use for reads from the model you use for writes. No specific technology mandated. EF Core writes + Dapper reads against the same Postgres DB is legitimate CQRS.

The "CQRS must mean Event Sourcing + projections into a separate read store" interpretation came from the community building on top of the concept — not from the original idea. A separate read store is only justified when query patterns are genuinely incompatible with the normalised schema, or when you need independent read scaling.


What we messed up

"Selectivity goes down as pages go deeper" — said this confidently without being able to back it up when pushed. Selectivity is a real concept but it was not the relevant point. The actual point was that the index removes the sort cost but cannot remove the skip cost. Selectivity refers to how many distinct values a column has relative to total rows — a separate concept that affects whether the query planner chooses to use the index at all. Flagged for a dedicated drill.

ORDER BY syntax slip — wrote ORDER BY created_at, id DESC when intending both columns descending. DESC only applies to the column immediately before it. Correct form is ORDER BY created_at DESC, id DESC. Each column needs its own direction keyword.

Filter encoding confusion — went in circles on whether filters need to be inside the cursor. Kept circling back to the same question after it was answered. The resolution: filters define the result set, cursor defines position within it. They are separate concerns. Encoding filters in the cursor is a tamper-resistance design choice, not a correctness requirement. The correctness requirement is that filter change = cursor reset.

"Limit -20" for backward pagination — reached for this instinctively but it is not valid SQL. The correct mechanism is flipping the ORDER BY direction and reversing the result in code. The instinct was right — go in the opposite direction — the implementation is the flip plus reverse.

Tried id-only as cursor while sorting by amount — WHERE id < 8821 ORDER BY amount DESC. Deflected into "ids aren't always linear" and then "what if ids are guids" — both true but neither is the core reason it breaks. Core reason: ID encodes insertion order, not amount order. The two have no relationship. Even with perfectly sequential IDs the results would be wrong.


Key values and config to remember

| Thing | Value / Pattern |
|---|---|
| Cursor encoding | Base64 JSON of sort column value + id + sort metadata |
| Tie-breaker column | Always id — unique, stable |
| Index column order | Equality → Sort column → Range/tie-breaker |
| Filter change | Always reset cursor to page 1 |
| Sort change | Always reset cursor to page 1 |
| Total count | COUNT(*) — expensive, cache or approximate |
| Public API cursor | Encode sort + direction + filters + id, HMAC sign |
| Internal ERP cursor | Encode sort + direction + id, trust client on filters |
| hasMore detection | Fetch limit + 1, check if result count exceeds limit |
| Backward pagination | Flip ORDER BY direction, reverse result in code |
| EF Core vs Dapper | EF Core for writes, Dapper for reads on keyset queries |

Unanswered questions / things to investigate

Index selectivity — came up twice, never properly understood. What is selectivity, what is cardinality, how does the query planner decide whether to use an index, when does it ignore your index even when one exists. Needs a dedicated drill.

Backward pagination edge cases — what happens on the very first page when there is no prev_cursor? What happens when the result set is smaller than the page size on a backward query? The production code handles these but we didn't walk through them explicitly.

NULL handling in sort columns — search results surfaced NULLS LAST handling in keyset cursors during the production canon search. Not discussed. What happens when your sort column has NULLs — does the cursor break, does the inequality behave correctly.

Covering indexes — mentioned in passing during index discussion. When does the DB serve the entire query from the index without touching the table. Relevant for the LIMIT 20 case where the index could theoretically return all needed columns.

Cursor expiry — search results mentioned cursors can become invalid if records are deleted or if APIs enforce time-based expiry. Not discussed. Should cursors have TTLs in production and how do you handle an expired cursor gracefully.


What's next

  1. Index selectivity drill — fills the gap exposed twice in this session. What selectivity is, how the query planner uses it, when it ignores your index.
  2. API Design pillar — all three pillars have been touched. SQL done, DB Design done. API + Application Design is next: REST resource modelling, idempotency, consistency patterns, versioning.
  3. Backward pagination edge cases — optional but worth a quick pass to complete the keyset picture before interviews.