Twitter Feed — HLD Session Recap


Starting Point

CAP theorem, distributed systems foundations, infrastructure top-down view, BFF, YARP, auth flow covered in previous sessions. Twitter feed introduced as the first student-driven HLD problem. Student had never attempted a feed design problem. Goal was to apply the five-step framework and derive the architecture from first principles rather than memorise a solution.


Capacity Estimation

Student derived all numbers independently.

Assumptions stated upfront:
- 200 million daily active users
- Each user loads feed 50 times per day
- 20 tweets per feed request
- 500 million tweets posted per day
- Average following: 200 people
- Celebrity users exist with 100M+ followers

Derivations:

Feed reads:
200M users * 50 loads/day = 10B reads/day
10B ÷ 86,400 = ~100,000 reads/second

Tweet writes:
500M tweets/day ÷ 86,400 = ~5,000 writes/second

Read/write ratio: 20:1 read dominant

Fan-out volume:
5,000 tweets/sec * 200 avg followers = 1,000,000 feed updates/second

Storage per tweet:
tweet_id    8 bytes
user_id     8 bytes
content     280 bytes
created_at  8 bytes
= ~300 bytes per tweet

Total storage:
300 bytes * 5,000/sec = 1.5MB/sec
1.5MB * 86,400 = ~130GB/day
130GB * 365 = ~47TB/year
47TB * 5 = ~235TB over 5 years → sharding required

What the numbers drove:
- 100,000 reads/sec → cannot serve from DB, cache mandatory
- 1,000,000 fan-out updates/sec → async processing mandatory
- 235TB → sharding required, single node not viable
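The arithmetic above can be sanity-checked in a few lines. A minimal sketch, with the session's stated assumptions as constants (nothing below is measured data):

// Back-of-envelope check of the capacity numbers above (constants = session assumptions).
const long DailyActiveUsers    = 200_000_000;
const long FeedLoadsPerUserDay = 50;
const long TweetsPerDay        = 500_000_000;
const long AvgFollowers        = 200;
const long SecondsPerDay       = 86_400;
const double BytesPerTweet     = 300;

long readsPerSec  = DailyActiveUsers * FeedLoadsPerUserDay / SecondsPerDay;  // ~115K, quoted as ~100K reads/s
long writesPerSec = TweetsPerDay / SecondsPerDay;                            // ~5.8K, quoted as ~5K writes/s
long fanoutPerSec = writesPerSec * AvgFollowers;                             // ~1.2M feed updates/s
double gbPerDay   = BytesPerTweet * TweetsPerDay / 1e9;                      // ~150 GB/day unrounded
double tbFiveYrs  = gbPerDay * 365 * 5 / 1_000;                              // ~270 TB raw (session rounds to ~235 TB via ~130 GB/day)

Console.WriteLine($"{readsPerSec:N0} reads/s | {writesPerSec:N0} writes/s | {fanoutPerSec:N0} fan-out/s");
Console.WriteLine($"{gbPerDay:N0} GB/day | {tbFiveYrs:N0} TB over 5 years");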


The Core Problem — Why DB Joins Don't Work

Student derived this themselves before seeing the solution.

Naive approach — join on read:

GET /feed?userId=123
→ SELECT follows WHERE follower_id = 123    → 200 following IDs
→ SELECT tweets WHERE user_id IN (200 IDs)  → recent tweets
→ sort by created_at DESC
→ return top 20

At 100,000 requests/second:

100,000 requests/sec * 200 user lookups each = 20,000,000 DB lookups/sec

DB collapses. Every feed load is a heavy join across 200 users. Cannot scale.

The insight student derived: pre-compute at write time, serve from cache at read time. The DB join happens once when a tweet is posted — not 100,000 times per second when feeds are loaded.


Fan-Out on Write — The Write Path

Student derived the full write path independently.

User posts tweet
→ Tweet Service
→ generate Snowflake ID
→ INSERT into Tweet DB
→ publish tweet.created to RabbitMQ topic exchange
          ↓                          ↓
  Fan-out Service              TweetCache Service
  → SELECT follower_ids         → SET tweet:{tweetId}
    FROM follows                  value: { content, author,
    WHERE following_id = X          created_at... }
  → for each follower:            TTL: 24hrs
    ZADD feed:{followerId}
    score: Snowflake ID
    member: tweetId

Why a topic exchange, not a fanout exchange: Student caught this independently. A fanout exchange delivers to every bound queue with no discrimination. With a topic exchange, both services bind to tweet.created; future services that don't need tweet events simply don't bind. Clean routing.

Exchange: domain.events (Topic)
Routing key: tweet.created

Fan-out Service     → binds to tweet.created → receives it
TweetCache Service  → binds to tweet.created → receives it
Accounts Service    → not bound              → never sees it
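A minimal sketch of those declarations and bindings with the classic (pre-7.x) RabbitMQ .NET client. The exchange name and routing key follow the diagram above; the queue names, host, and payload are illustrative assumptions:

using RabbitMQ.Client;
using System.Text;

var factory = new ConnectionFactory { HostName = "localhost" };            // assumed host
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// Topic exchange: consumers opt in per routing key instead of receiving everything.
channel.ExchangeDeclare("domain.events", ExchangeType.Topic, durable: true);

// Fan-out Service and TweetCache Service each bind their own queue to tweet.created.
channel.QueueDeclare("fanout-service.tweet-created", durable: true, exclusive: false, autoDelete: false);
channel.QueueBind("fanout-service.tweet-created", "domain.events", "tweet.created");

channel.QueueDeclare("tweetcache-service.tweet-created", durable: true, exclusive: false, autoDelete: false);
channel.QueueBind("tweetcache-service.tweet-created", "domain.events", "tweet.created");

// Accounts Service declares no binding for tweet.created, so it never sees these events.

// Publisher side (Tweet Service): one publish, delivered to both bound queues.
var body = Encoding.UTF8.GetBytes("{ \"tweetId\": 5001, \"authorId\": 123 }");
channel.BasicPublish("domain.events", "tweet.created", null, body);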

Why Redis sorted sets: 100,000 feed reads/second needs sub-millisecond response. Redis sorted set per user — pre-sorted by Snowflake ID score. One Redis command returns top 20 tweet IDs instantly.

key: feed:{userId}
type: sorted set
score: Snowflake ID (time-ordered, unique)
member: tweetId
max size: 800 entries, oldest evicted automatically

DB query at this volume collapses even with indexing. Redis handles hundreds of thousands of operations per second on a single node.
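What the Fan-out Service executes per follower, sketched with StackExchange.Redis. The key shape and the 800-entry cap mirror the structure above; the endpoint and helper name are assumptions:

using StackExchange.Redis;

var redis = await ConnectionMultiplexer.ConnectAsync("localhost:6379");   // assumed endpoint
IDatabase db = redis.GetDatabase();

await AddToFeedAsync(followerId: 123, tweetId: 5001);

async Task AddToFeedAsync(long followerId, long tweetId)
{
    RedisKey key = $"feed:{followerId}";

    // tweet_id is the Snowflake ID, so it doubles as the time-ordered score.
    // Caveat: sorted-set scores are doubles, so full 64-bit IDs lose precision past 2^53;
    // a production system would derive the score (e.g. the timestamp bits) rather than cast.
    await db.SortedSetAddAsync(key, tweetId, (double)tweetId);      // ZADD feed:{followerId}

    // Enforce the 800-entry cap: remove everything below the 800 highest scores.
    await db.SortedSetRemoveRangeByRankAsync(key, 0, -801);         // ZREMRANGEBYRANK
}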


The Celebrity Problem

Student identified this as a problem before being prompted.

Celebrity has 100M followers. Fan-out Service receives one TweetPosted event. Must write to 100M feed sorted sets.

Two problems:

  1. Takes minutes to fan-out to 100M users. Followers open feed — celebrity tweet not there yet.
  2. Queue backlog — while Fan-out Service processes 100M writes for one celebrity tweet, all other tweet events pile up in queue. Regular users' tweets delayed for everyone.

The fix — hybrid push/pull:

Regular users (< 1M followers) — push model. Fan-out on write. Feed pre-computed.

Celebrity users (> 1M followers) — skip fan-out entirely. Pull at read time.

Threshold configurable. Stored as flag on user record.


The Read Path — Hybrid Model

GET /feed?cursor=5001 for user 123

Step 1  regular feed
ZREVRANGEBYSCORE feed:123 (5001 -inf LIMIT 0 50
 tweet IDs from pre-computed feed, score below cursor

Step 2  celebrity list
SMEMBERS celebrity_follows:123  [789, 101]
key: celebrity_follows:{userId}
type: Redis set
value: celebrity IDs this user follows
no TTL, updated on follow/unfollow

Step 3  celebrity tweets
ZREVRANGEBYSCORE tweet:789:latest (5001 -inf LIMIT 0 20
ZREVRANGEBYSCORE tweet:101:latest (5001 -inf LIMIT 0 20
key: tweet:{celebrityId}:latest
type: sorted set, last 800 tweets
score: Snowflake ID
no TTL, size-bounded not time-bounded

Step 4  merge in memory by Snowflake ID descending
Step 5  take top 20
Step 6  fetch tweet details
MGET tweet:5001 tweet:5002 ... tweet:5020

Step 7  return with new cursor = last Snowflake ID

Why no DB involved for hot content: tweet details cached at tweet:{tweetId}. Feed list pre-built in Redis. Celebrity tweet list pre-built in Redis. Three Redis structures, zero DB queries for cache hits.


Redis Structures — Full Reference

feed:{userId}
   sorted set
   score: Snowflake ID
   member: tweetId
   max 800 entries, regular users only
   no TTL

tweet:{tweetId}
   string, serialized tweet details (written with SET, batch-read with MGET)
   { content, authorId, createdAt, likeCount }
   TTL 24hrs, refreshed on access

celebrity_follows:{userId}
   set of celebrity IDs this user follows
   no TTL
   updated on follow/unfollow

tweet:{celebrityId}:latest
   sorted set, last 800 tweets for celebrity
   score: Snowflake ID
   no TTL, self-managing by size
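
celebrity_follows:{userId} is the one structure maintained on the follow/unfollow path rather than the tweet path. A small sketch of keeping it in sync with StackExchange.Redis; the class and method names are illustrative, not from the session:

using StackExchange.Redis;
using System.Threading.Tasks;

static class CelebrityFollowCache
{
    // SADD on following a celebrity; no TTL, matching the reference above.
    public static Task OnFollowAsync(IDatabase db, long followerId, long followingId, bool followingIsCelebrity)
        => followingIsCelebrity
            ? (Task)db.SetAddAsync($"celebrity_follows:{followerId}", followingId)
            : Task.CompletedTask;

    // SREM on unfollow; removing a member that was never added is a harmless no-op.
    public static Task OnUnfollowAsync(IDatabase db, long followerId, long followingId)
        => db.SetRemoveAsync($"celebrity_follows:{followerId}", followingId);
}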

Cursor Pagination — How It Works

Cursor = Snowflake ID of last tweet seen. Applied equally to both sources.

Snowflake ID encodes timestamp in high bits. Globally unique and time-ordered. No two tweets across any source have the same ID. Sort by Snowflake ID descending = globally correct chronological order.
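For reference (the session did not go into ID generation): the classic Twitter Snowflake layout is 41 bits of millisecond timestamp, 10 bits of worker ID and 12 bits of sequence, which is exactly why sorting IDs descending gives reverse-chronological order. An illustrative sketch, using Twitter's original epoch as a placeholder:

// Illustrative Snowflake-style ID: 41-bit millisecond timestamp | 10-bit worker id | 12-bit sequence.
const long CustomEpochMillis = 1288834974657L;   // Twitter's original epoch (2010-11-04); any fixed epoch works

long MakeSnowflake(long unixMillis, long workerId, long sequence)
    => ((unixMillis - CustomEpochMillis) << 22) | ((workerId & 0x3FF) << 12) | (sequence & 0xFFF);

DateTimeOffset SnowflakeToTime(long snowflakeId)
    => DateTimeOffset.FromUnixTimeMilliseconds((snowflakeId >> 22) + CustomEpochMillis);

// High bits are time, so a numerically larger ID is a newer tweet and descending sort is reverse chronological.
long id = MakeSnowflake(DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), workerId: 1, sequence: 0);
Console.WriteLine($"{id} decodes to {SnowflakeToTime(id):u}");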

cursor = 5001

Regular feed:
ZREVRANGEBYSCORE feed:123 (5001 -inf LIMIT 0 50
→ all tweet IDs with score below 5001

Celebrity feed:
ZREVRANGEBYSCORE tweet:789:latest (5001 -inf LIMIT 0 20
→ same cursor, same filter, different sorted set

Merge in application memory:

// regularTweets / celebrityTweets: tweet IDs (longs) returned by the two ZREVRANGEBYSCORE calls above
var merged = regularTweets
    .Concat(celebrityTweets)
    .OrderByDescending(id => id)  // Snowflake ID = time order
    .Take(20)
    .ToList();

var newCursor = merged.Count > 0 ? merged.Last() : cursor;  // empty page → keep the previous cursor

Why this works across sources: Snowflake ID is globally time-ordered. Tweet from regular user and tweet from celebrity both have globally unique Snowflake IDs. Sort together by ID — correct chronological order regardless of source.

The honest tradeoff: in-memory merge is messy. Not perfectly clean architecture. Accepted because the alternative — full read-time joins — collapses at 100,000 reads/second. Feed is best-effort chronological not guaranteed perfect order.


Storage Design

Tweet DB — sharded by tweetId:

tweets
    tweet_id    BIGINT PRIMARY KEY  -- Snowflake ID
    user_id     BIGINT
    content     VARCHAR(280)
    created_at  TIMESTAMPTZ

Sharded by tweetId. Most common read is single tweet lookup by ID — always single shard. If sharded by userId — fetching 20 tweets from 20 users means 20 different shards potentially. Shard by what you query by.

Follow DB — sharded by follower_id:

follows
    follower_id   BIGINT
    following_id  BIGINT
    is_celebrity  BOOLEAN
    PRIMARY KEY (follower_id, following_id)

Most common query — who does user X follow. Filter by follower_id. Shard by follower_id.

Problem — fan-out queries who follows user X. Filter by following_id. Different direction. Cross-shard.

Fix — store follow relationship twice:

follower_following  sharded by follower_id  ← "who do I follow"
following_follower  sharded by following_id ← "who follows me"

Fan-out queries following_follower — always single shard. Feed read queries follower_following — always single shard. Storage doubles. Queries never cross shards. Classic denormalisation for sharding.
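
What the dual write looks like from a follow endpoint. A sketch that assumes Postgres (the schema's TIMESTAMPTZ suggests it) and Dapper for brevity; the shard-naming scheme, shard count and method names are made-up placeholders:

using Dapper;
using Npgsql;

// Hypothetical shard routing: each table is routed by the column its queries filter on.
static string ConnStringFor(string table, long shardKey, int shardCount = 8)
    => $"Host=follows-{table}-shard-{(ulong)shardKey % (uint)shardCount};Database=follows";

await FollowAsync(followerId: 123, followingId: 789, isCelebrity: true);

async Task FollowAsync(long followerId, long followingId, bool isCelebrity)
{
    // "Who do I follow": sharded by follower_id, read by the feed path.
    await using var byFollower = new NpgsqlConnection(ConnStringFor("follower_following", followerId));
    await byFollower.ExecuteAsync(
        "INSERT INTO follower_following (follower_id, following_id, is_celebrity) VALUES (@followerId, @followingId, @isCelebrity)",
        new { followerId, followingId, isCelebrity });

    // "Who follows me": sharded by following_id, read by the Fan-out Service.
    await using var byFollowing = new NpgsqlConnection(ConnStringFor("following_follower", followingId));
    await byFollowing.ExecuteAsync(
        "INSERT INTO following_follower (following_id, follower_id) VALUES (@followingId, @followerId)",
        new { followingId, followerId });
}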


Complete Write Path

POST /tweet { content: "hello" }
→ Tweet Service
→ Snowflake ID generated
→ INSERT into Tweet DB (sharded by tweetId)
→ publish tweet.created to RabbitMQ topic exchange

Fan-out Service                    TweetCache Service
→ query following_follower DB      → SET tweet:{tweetId}
  WHERE following_id = authorId      TTL 24hrs
→ if follower_count > 1M
  → skip fan-out (celebrity)
→ else
  → ZADD feed:{followerId}
    score: Snowflake ID
    for each follower
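
A sketch of the Fan-out Service consumer implementing the branch above. The threshold constant, method signature, and the caller that supplies follower IDs from following_follower are assumptions; the Redis calls mirror the structures defined earlier:

using StackExchange.Redis;
using System.Collections.Generic;

const int CelebrityFollowerThreshold = 1_000_000;

// Invoked once per tweet.created event; followerIds come from the single-shard
// following_follower query (done by the caller in this sketch).
async Task FanOutAsync(IDatabase redis, long tweetId, long authorFollowerCount, IEnumerable<long> followerIds)
{
    if (authorFollowerCount > CelebrityFollowerThreshold)
        return;   // celebrity → skip fan-out; readers pull from tweet:{authorId}:latest instead

    foreach (var followerId in followerIds)
    {
        RedisKey feedKey = $"feed:{followerId}";
        await redis.SortedSetAddAsync(feedKey, tweetId, (double)tweetId);   // ZADD, tweet_id doubles as the score
        await redis.SortedSetRemoveRangeByRankAsync(feedKey, 0, -801);      // keep the newest 800 entries
    }
}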

Complete Read Path

GET /feed?cursor=5001

Feed Service:
 ZREVRANGEBYSCORE feed:123 (5001 -inf LIMIT 0 50   (regular)
 SMEMBERS celebrity_follows:123                    (celebrity IDs)
 ZREVRANGEBYSCORE tweet:X:latest (5001 -inf LIMIT 0 20  (per celebrity)
 merge by Snowflake ID descending
 take top 20
 MGET tweet:id1 tweet:id2 ... tweet:id20           (details)
 return + new cursor
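
The same path sketched with StackExchange.Redis. Key shapes, the 50/20 fetch sizes and the per-celebrity cap follow the steps above; the method signature and connection setup are assumed. SortedSetRangeByScore takes min/max scores plus an Order, and Exclude.Stop drops the cursor value itself:

using StackExchange.Redis;
using System.Collections.Generic;
using System.Linq;

async Task<(List<long> Page, RedisValue[] Details, long NextCursor)> ReadFeedAsync(
    IDatabase db, long userId, double cursor, int pageSize = 20)
{
    // Step 1: pre-computed regular feed, scores strictly below the cursor, newest first.
    var regular = await db.SortedSetRangeByScoreAsync(
        $"feed:{userId}", double.NegativeInfinity, cursor,
        Exclude.Stop, Order.Descending, skip: 0, take: 50);

    // Step 2: celebrities this user follows.
    var celebrityIds = await db.SetMembersAsync($"celebrity_follows:{userId}");

    // Step 3: recent tweets per celebrity; capped so one prolific account can't flood the merge.
    var celebrity = new List<RedisValue>();
    foreach (var celebrityId in celebrityIds)
        celebrity.AddRange(await db.SortedSetRangeByScoreAsync(
            $"tweet:{celebrityId}:latest", double.NegativeInfinity, cursor,
            Exclude.Stop, Order.Descending, skip: 0, take: 20));

    // Steps 4-5: merge in memory by Snowflake ID descending, take one page.
    var page = regular.Concat(celebrity)
                      .Select(v => (long)v)
                      .OrderByDescending(id => id)
                      .Take(pageSize)
                      .ToList();

    // Step 6: batch-fetch tweet details (MGET).
    var details = await db.StringGetAsync(page.Select(id => (RedisKey)$"tweet:{id}").ToArray());

    // Step 7: next cursor = last Snowflake ID on the page (or the old cursor if the page is empty).
    long nextCursor = page.Count > 0 ? page[^1] : (long)cursor;
    return (page, details, nextCursor);
}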

Non-Functional Requirements

Scalability:
- Feed Service stateless — scale horizontally
- Redis Cluster — distribute feed keys across nodes
- Tweet DB sharded by tweetId — distribute writes
- Follow DB sharded by follower_id — distribute lookups
- Fan-out Service scaled independently — add consumers for write spikes

Availability:
- Redis down — fall back to DB for tweet details, feed temporarily unavailable, recovers on restart
- Fan-out Service down — tweets queue in RabbitMQ, feeds catch up on recovery, no data loss
- Tweet DB shard down — tweets on that shard unavailable, promote replica

Consistency:
- Feed is eventually consistent — fan-out async, followers see tweets seconds after posting
- Celebrity tweets eventually consistent — tweet cache has 24hr TTL
- Unfollow doesn't immediately remove tweets from feed — TTL cleans up naturally
- Acceptable — feed is best-effort chronological, not a financial ledger


Key Confusions and Resolutions

Confusion 1 — Why not just join at read time
Student initially thought joins were viable. Resolved by calculating 20M DB lookups/second. Impossible to serve.

Confusion 2 — How hot content is decided in the cache
Initial instinct was that a separate service analyses metrics and decides what to cache. Resolved — LRU handles it automatically through access patterns. Hot tweets stay cached because they keep getting hit. Cold ones evicted.

Confusion 3 — How the cursor works across two data sources
Student correctly identified that regular feed and celebrity feed are different sources with potentially different positions. Resolved — cursor is just a Snowflake ID timestamp. Applies equally to both sorted sets. Merge in memory by Snowflake ID. Globally time-ordered regardless of source.

Confusion 4 — Tie-breaking across sources
Student identified that Snowflake ID tie-breaking within one sorted set doesn't cleanly apply across two sources. Resolved — merge in application memory, sort by Snowflake ID descending. Snowflake IDs are globally unique — no ties possible across any source.

Confusion 5 — How many celebrity tweets to fetch
Student asked how many tweets from tweet:{celebrityId}:latest to include. Resolved — cursor bounds the fetch. ZREVRANGEBYSCORE with the cursor as exclusive upper bound returns only tweets the user hasn't paged through yet. Capped per celebrity to avoid one prolific celebrity flooding the merged result.


Patterns Used

Pattern                               Where used
Pre-compute at write, serve at read   Feed pre-built per user in Redis
Async fan-out via queue               Tweet posted → RabbitMQ → Fan-out Service
Tiered handling for outliers          Regular users push model, celebrities pull model
Denormalise for sharding              Follow table stored twice, different shard keys
Cursor pagination                     Snowflake ID as cursor, applied to all sources
Topic exchange routing                tweet.created routed only to relevant consumers

Unanswered / Needs More Depth

  • Cold start — user has no pre-computed feed (first login, cache evicted)
  • Retweets — how fan-out handles retweets of celebrity content
  • Follow/unfollow consistency — feed showing tweets from unfollowed user until TTL
  • Trending topics — separate problem, not covered

What's Next

  • Notification system design (covered next)
  • Student to drive URL shortener problem end to end
  • Deep dive on consistent hashing for sharding