# URL Shortener — HLD Session Recap

## Starting Point
First HLD problem attempted. The five-step framework was introduced and demonstrated by the tutor end to end. Student observed the full thinking process before attempting independently. Goal was to understand each building block deeply before drilling the framework. Concepts were covered one at a time — capacity estimation, Base62, write path, read path, cache, sharding, rate limiting, custom aliases, non-functionals.
## The Five-Step Framework
Apply to every HLD problem. No exceptions.
1. Clarify requirements — never draw anything first, 2-3 minutes
2. Capacity estimation — numbers drive component decisions, 2-3 minutes
3. High level components — skeleton first, add complexity only where needed
4. Deep dive — interviewer picks one or two areas, 20-30 minutes
5. Non-functional — scalability, availability, consistency
## Clarified Requirements
- Public URL shortener (bit.ly scale)
- Basic click count analytics required
- Custom aliases optional feature
- Read heavy — links created once, clicked many times
## Capacity Estimation
The one number to memorise: 86,400 — seconds in a day.
Mental shortcut for big number division: convert to powers of 10, subtract exponents.
10 billion ÷ 100 thousand
= 10^10 ÷ 10^5
= 10^(10-5)
= 10^5
= 100,000
Powers to memorise:
- thousand = 10^3
- million = 10^6
- billion = 10^9
- trillion = 10^12
Derivations:
```
Writes:
  100M URLs shortened/day ÷ 86,400 = ~1,200 writes/second
  → single primary DB handles this fine

Reads:
  100M links × 10 clicks avg = 1B reads/day
  1B ÷ 86,400 = ~11,500 reads/second
  → DB cannot handle this alone → cache mandatory

Read/write ratio: 10:1 → read dominant

Storage:
  short code + long URL + created_at + user_id = ~500 bytes per record
  100M records/day × 365 × 5 years = 182.5B records
  182.5B × 500 bytes = ~91TB over 5 years
  → single node viable
  → raise TTL/archiving strategy
  → sharding is a future concern, not day one

Short code space:
  62^7 = ~3.5 trillion combinations
  3.5T ÷ 100M per day = ~35,000 days of coverage (~96 years)
  → 7-character Base62 is sufficient
```
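The arithmetic above can be sanity-checked in a few lines (Python used purely as a calculator here):

```python
SECONDS_PER_DAY = 86_400

# Writes: 100M URLs shortened per day
writes_per_sec = 100_000_000 / SECONDS_PER_DAY        # ~1,157 → call it ~1,200
# Reads: 10 clicks per link on average → 1B reads/day
reads_per_sec = 1_000_000_000 / SECONDS_PER_DAY       # ~11,574 → call it ~11,500
# Storage: ~500 bytes per record over 5 years of writes
records = 100_000_000 * 365 * 5                       # 182.5B records
storage_tb = records * 500 / 10**12                   # ~91 TB
# Code space: 7-character Base62
coverage_days = 62**7 / 100_000_000                   # ~35,000 days (~96 years)

assert round(writes_per_sec) == 1_157
assert round(reads_per_sec) == 11_574
assert round(storage_tb) == 91
assert coverage_days > 35_000
```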
What each number drives:
| Number | Decision |
|---|---|
| 1,200 writes/sec | Single primary DB sufficient |
| 11,500 reads/sec | Cache mandatory |
| 10:1 read/write ratio | Read path needs most attention |
| 91TB over 5 years | Single node viable, raise archiving |
| 182B records needed | 7 character Base62 sufficient |
What storage estimation drives specifically: Not just infrastructure — storage number raises the archiving conversation. At 91TB, single node viable. But adding ~18TB/year means raising TTL policy — archive links inactive for 2 years. Keeps active dataset at ~20-30TB. Always state what the storage number means, not just what it is.
## Base62 vs Base64 vs Hash — The Decision
Why not Base64:
Base64 includes + and /. Both break URLs.
+ = space in URL encoding
/ = path separator
URL encoding required — %2B, %2F — making codes ugly and longer. Base64 designed for binary-to-text encoding. Wrong tool for URLs.
Base64URL (RFC 4648) fixes this with - and _ substitution. Safe but 64 characters for no meaningful benefit over Base62.
Why not Hash:
Three problems:
1. Collisions — two different URLs can produce the same first 7 characters of MD5/SHA256. Must detect, handle, retry. Complexity for no benefit.
2. Deterministic — same URL always produces the same code. Two campaign links to the same destination URL get the same short code. Analytics breaks — can't distinguish clicks between campaigns.
3. Always fixed length — always 7 characters whether 100 URLs or 100 billion. No efficiency gain early on.
Why Base62:
- a-z = 26 characters
- A-Z = 26 characters
- 0-9 = 10 characters
- Total: 62, all URL-safe, no encoding needed

62^7 = ~3.5 trillion combinations
## Auto-Increment to Base62 — The Mechanics
Key insight: DB always stores integers. Base62 is application-level encoding on top. Two separate concerns.
Base62 is ordinary positional notation with 62 digits instead of 10. Same repeated-division algorithm as converting decimal to binary.
Alphabet:
- Position 0-9: '0'-'9'
- Position 10-35: 'a'-'z'
- Position 36-61: 'A'-'Z'

Full alphabet: `0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`
Conversion example — ID 1000:
```
1000 ÷ 62 = 16 remainder 8  → Alphabet[8]  = '8'
  16 ÷ 62 = 0  remainder 16 → Alphabet[16] = 'g'
```

Read remainders bottom to top: "g8"

Verify: g = position 16, 8 = position 8

16 × 62^1 + 8 × 62^0 = 992 + 8 = 1000 ✓
Code:
```csharp
private static readonly string Alphabet =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

public static string Encode(long id)
{
    if (id == 0) return "0";
    var result = new StringBuilder();
    while (id > 0)
    {
        result.Insert(0, Alphabet[(int)(id % 62)]);
        id /= 62;
    }
    return result.ToString();
}
```
Why guaranteed unique: DB auto-increment gives unique integers. Encoding a unique integer always gives a unique string. No collision possible.
The enumeration problem: codes are sequential. Someone with code "g8" can guess "g9" exists. Acceptable for public URL shortener. Not acceptable for private document sharing — switch to random Base62.
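If enumeration were unacceptable, the switch to random Base62 is small — a sketch (Python for illustration; retrying on a unique-constraint violation is assumed for the rare collision):

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def random_code(length: int = 7) -> str:
    # secrets, not random — the whole point is that codes are unpredictable
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

code = random_code()
assert len(code) == 7 and all(c in ALPHABET for c in code)
```

The trade-off: uniqueness is no longer guaranteed by construction, so the INSERT must catch the unique-constraint violation and retry with a fresh code.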
## The Write Path — Full Flow
The confusion point — when does the code get saved back to DB:
```
POST /shorten { url: "https://very-long-url.com" }

Step 1 — INSERT URL, get ID back in same round trip
  INSERT INTO urls (url, created_at) VALUES (@url, now()) RETURNING id
  → id = 1000

Step 2 — Encode in application
  Base62.Encode(1000) = "g8"

Step 3 — Store code back
  UPDATE urls SET code = 'g8' WHERE id = 1000

Step 4 — Return "g8" to client
```
Two DB operations. RETURNING id avoids separate SELECT round trip.
Two valid designs:
| Design | Write | Read |
|---|---|---|
| Store code + integer | encode ID, store code | lookup by code directly |
| Store integer only | encode ID, return code | decode code → lookup by ID |
Store code + integer is simpler on read path. No decoding needed.
## The Read Path — Never Decode
The confusion point — student initially thought decoding was needed on GET requests.
You never decode unless you never stored the code column.
Stored code column:

```
GET /g8
  → SELECT url FROM urls WHERE code = 'g8'   ← direct lookup
  → redirect
```

No code column stored:

```
GET /g8
  → decode "g8" → 1000
  → SELECT url FROM urls WHERE id = 1000
  → redirect
```
Decoding just to get the ID when the code is already stored is an unnecessary extra step.
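For the integer-only design, decoding is the exact inverse of encoding — a sketch (Python for illustration; the C# version mirrors it):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def decode(code: str) -> int:
    # Horner's rule: each character shifts the accumulator one base-62 digit left
    n = 0
    for ch in code:
        n = n * 62 + INDEX[ch]
    return n

assert decode("g8") == 1000  # matches the worked example: 16 × 62 + 8
```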
## Cache — LRU and Hot Link Detection
The confusion point — student thought a separate service decides what's hot and manually populates cache.
LRU handles everything automatically through access patterns. You write zero eviction logic.
- Cache miss → populate Redis
- Cache hit → Redis internally refreshes the LRU timestamp
- Cache full → Redis evicts the key with the oldest LRU timestamp automatically
Hot links stay cached — accessed constantly, LRU timestamp keeps refreshing. Cold links evicted — nobody clicks them, LRU timestamp goes stale.
Power law distribution: top 1% of links get ~80% of clicks. Cache the hot links, absorb 80% of read traffic without touching DB.
Cache warming — optional optimisation. Background job pre-loads top N clicked links before TTL expiry. Eliminates cold start problem. Not a core requirement.
Analytics service — entirely separate concern. Records click counts, geography, referrer for user-facing reporting. Has nothing to do with cache management.
- LRU eviction → automatic, Redis internal, zero code
- Cache populate → on cache miss, application code
- Cache warming → optional background job
- Analytics → separate service, user-facing reporting
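The point that LRU needs zero eviction code can be seen with a toy cache — Python's `OrderedDict` standing in for Redis's internal bookkeeping (illustration only):

```python
from collections import OrderedDict

class TinyLru:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # every hit refreshes the LRU position
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the coldest key automatically

cache = TinyLru(2)
cache.put("hot", "url-1")
cache.put("cold", "url-2")
cache.get("hot")            # hot link keeps getting accessed → stays fresh
cache.put("new", "url-3")   # capacity exceeded → "cold" is evicted, not "hot"
assert cache.get("cold") is None and cache.get("hot") == "url-1"
```

The application only ever does get/put; hot-vs-cold falls out of the access pattern.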
## 301 vs 302
| | 301 Permanent | 302 Temporary |
|---|---|---|
| Browser caches | Yes — permanently | No |
| Every click hits server | No | Yes |
| Analytics possible | No | Yes |
| If destination changes | Browser ignores update | Always gets latest |
Analytics required → always 302. Every click must pass through server.
## Analytics Off Critical Path
Never block the redirect on analytics recording.
```
GET /g8
  → cache hit → long URL retrieved
  → publish ClickEvent to queue   ← async, fire and forget, never blocks
  → return 302 immediately

Analytics Service (separate)
  → consumes ClickEvent
  → increments click count
  → stores geography, referrer, timestamp
```
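The fire-and-forget shape can be sketched with an in-process queue standing in for the real message broker (Python, illustrative names only):

```python
import queue
import threading

click_events = queue.Queue()   # stand-in for Kafka/RabbitMQ
click_counts = {}

def analytics_consumer():
    # runs separately — redirect latency never depends on this loop
    while True:
        code = click_events.get()
        click_counts[code] = click_counts.get(code, 0) + 1
        click_events.task_done()

def redirect(code: str, long_url: str) -> tuple[int, str]:
    click_events.put(code)     # fire and forget — non-blocking enqueue
    return 302, long_url       # respond immediately

threading.Thread(target=analytics_consumer, daemon=True).start()
status, url = redirect("g8", "https://very-long-url.com")
click_events.join()            # only this demo waits; the handler never does
assert status == 302 and click_counts["g8"] == 1
```

If the consumer dies, redirects keep working — only the counts lag, which matches the eventual-consistency call made later in these notes.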
## Redis in .NET — Canonical Pattern
IDistributedCache — canonical for simple get/set/remove with TTL. Abstraction over Redis. Swappable with in-memory for tests.
StackExchange.Redis — when Redis-specific features needed: counters, sorted sets, pub/sub, Lua scripts.
Critical — IConnectionMultiplexer is always singleton:
```csharp
// WRONG — new connection per request, destroys performance
using var redis = await ConnectionMultiplexer.ConnectAsync("localhost");

// CORRECT — one connection pool, reused everywhere
builder.Services.AddSingleton<IConnectionMultiplexer>(
    ConnectionMultiplexer.Connect("localhost"));
```
## Cache-Aside Decorator Pattern
Repository owns cache logic. Service has zero cache knowledge. Decorator pattern.
```
Service → IUrlRepository → result
               ↑
   CachedUrlRepository   ← decorator, adds cache
               ↑
   UrlRepository         ← DB only, no cache knowledge
```
```csharp
public class CachedUrlRepository : IUrlRepository
{
    private readonly IUrlRepository _inner;
    private readonly IDistributedCache _cache;

    public CachedUrlRepository(IUrlRepository inner, IDistributedCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public async Task<string?> GetLongUrlAsync(string code)
    {
        var cached = await _cache.GetStringAsync(code);
        if (cached != null) return cached;

        var longUrl = await _inner.GetLongUrlAsync(code);
        if (longUrl == null) return null;

        await _cache.SetStringAsync(code, longUrl, new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24)
        });
        return longUrl;
    }

    public async Task<string> CreateAsync(string longUrl)
    {
        // writes pass straight through — no cache on write
        return await _inner.CreateAsync(longUrl);
    }
}
```
Why no cache on write: link just created, nobody has code yet, cache hit rate is zero. Populate on first read — lazy population. Cache-aside pattern.
Registration:
```csharp
builder.Services.AddScoped<UrlRepository>();
builder.Services.AddScoped<IUrlRepository>(sp =>
    new CachedUrlRepository(
        sp.GetRequiredService<UrlRepository>(),
        sp.GetRequiredService<IDistributedCache>()));
```
## Full Read Path End to End
```
GET /g8
  UrlShortenerService.RedirectAsync("g8")
    ↓
  CachedUrlRepository.GetLongUrlAsync("g8")
    ↓
  Redis hit?
    → yes → publish ClickEvent (async, fire and forget)
          → return 302 redirect
    → no  → UrlRepository.GetLongUrlAsync("g8")
          → SELECT url FROM urls WHERE code = 'g8'
          → found → SET Redis 'g8' = longUrl, TTL 24hrs
                  → publish ClickEvent (async)
                  → return 302 redirect
          → not found → 404
```
## Sharding
What it solves: single node storage and write throughput limits. Read replicas solve read throughput but don't reduce storage or improve write throughput — replicas are full copies, all writes still go to one primary.
When to shard URL shortener: 91TB over 5 years is viable on single node. Shard when writes grow 10x (~12,000/sec) or storage exceeds single node comfort. Not a day-one requirement.
Hash-based sharding:
hash(shortCode) % N = shard index
Even distribution. No hotspots. Standard for content systems without natural range key.
Consistent hashing: minimises redistribution when nodes added or removed. Simple modulo means adding one shard requires moving almost all data. Consistent hashing redistributes only 1/N of data.
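The "only 1/N moves" claim can be checked with a minimal ring sketch (Python; a single point per node for brevity — real implementations add virtual nodes for smoother balance, and MD5 is used here only as a stable hash):

```python
import bisect
import hashlib

def stable_hash(s: str) -> int:
    # never use the language's default string hash for sharding — it isn't
    # stable across processes, so instances would disagree on placement
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((stable_hash(n), n) for n in nodes)
        self.keys = [p for p, _ in self.points]

    def owner(self, key: str) -> str:
        # first node point clockwise from the key's hash, wrapping around
        i = bisect.bisect(self.keys, stable_hash(key)) % len(self.points)
        return self.points[i][1]

before = Ring(["shard-1", "shard-2", "shard-3"])
after = Ring(["shard-1", "shard-2", "shard-3", "shard-4"])

# adding shard-4 only moves keys that now land on shard-4 — all others stay put
for code in (f"code{i}" for i in range(1000)):
    if after.owner(code) != "shard-4":
        assert before.owner(code) == after.owner(code)
```

Contrast with `hash % N`: going from 3 to 4 shards changes the modulus, so almost every key's owner changes.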
Read replicas per shard: each shard is an independent replication unit. Shard 1 primary replicates to Shard 1 replicas only. No knowledge of other shards.
ShardDescriptor in .NET:
```csharp
public class ShardDescriptor
{
    public string WriteConnectionString { get; init; }
    public string ReadConnectionString { get; init; }
}

public class ShardResolver
{
    private readonly ShardDescriptor[] _shards;

    public ShardResolver(ShardDescriptor[] shards) => _shards = shards;

    public ShardDescriptor Resolve(string shardKey)
    {
        // NOTE: string.GetHashCode() is randomised per process in .NET Core —
        // production code needs a stable hash (e.g. MD5, xxHash) so every
        // instance maps the same key to the same shard
        var index = Math.Abs(shardKey.GetHashCode()) % _shards.Length;
        return _shards[index];
    }
}
```
Interview framing:
"At current scale — single primary with read replicas. If writes grow 10x or storage exceeds single node comfort I'd introduce hash-based sharding on the short code using consistent hashing to minimise redistribution when adding nodes."
## Rate Limiting
What it solves: prevents abuse — bots submitting millions of URLs, spam link generation, API hammering.
Where it lives: API Gateway. Services never see rejected traffic.
What to rate limit on:
- IP address — simplest, but corporate NAT means many users share one IP
- API key — more precise, bad actor gets key revoked without affecting others
- User ID — most precise, requires auth
Two main algorithms:
Token Bucket: Each client has a bucket of N tokens. Refills at fixed rate. Each request consumes one token. Empty bucket = rejected. Allows bursting up to capacity. Natural for real users who are sometimes active in bursts.
Sliding Window: Track exactly how many requests in last N seconds. Exceeds limit = rejected. More precise than token bucket. No burst allowance. Strict enforcement.
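A token bucket can be sketched in a few lines (Python for illustration; the timestamp is injected so the behaviour is deterministic — a real limiter reads a monotonic clock and keeps this state in Redis):

```python
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
burst = [bucket.allow(0.0) for _ in range(4)]    # burst of 4 requests at t=0
assert burst == [True, True, True, False]        # bursting allowed up to capacity
assert bucket.allow(1.0) is True                 # one token refilled after 1s
```

A sliding window would instead count requests in the trailing window — stricter, no burst allowance.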
Must use Redis for distributed rate limiting:
```
Any component scaled to multiple instances
  → in-memory rate limiting is useless
  → each instance has an isolated counter
  → client bypasses the limit by hitting a different instance
  → Redis shared counter required
```
This applies even when rate limiting at the gateway — gateway itself scaled to multiple instances behind a load balancer. In-memory fails. Redis required.
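The shared-counter idea can be sketched as a fixed window, with a plain dict standing in for Redis `INCR`/`EXPIRE` (Python, illustration only — a real deployment also relies on Redis making the increment atomic):

```python
shared_counts = {}   # stand-in for Redis — one store shared by every instance

def allow(client_ip: str, window_start: int, limit: int = 100) -> bool:
    # INCR-style shared counter: one key per client per time window
    key = f"{client_ip}:{window_start}"
    shared_counts[key] = shared_counts.get(key, 0) + 1
    return shared_counts[key] <= limit

# two "gateway instances" hitting the same store see one combined count
instance_a = [allow("10.0.0.1", 0, limit=3) for _ in range(2)]
instance_b = [allow("10.0.0.1", 0, limit=3) for _ in range(2)]
assert instance_a == [True, True]
assert instance_b == [True, False]   # fourth request crosses the shared limit
```

With per-instance dicts instead of a shared store, each instance would count only 2 and every request would pass — exactly the bypass described above.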
ASP.NET Core built-in rate limiting (.NET 7+):

```csharp
builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("shorten-limit", o =>
    {
        o.PermitLimit = 100;
        o.Window = TimeSpan.FromMinutes(1);
        o.SegmentsPerWindow = 6;
        o.QueueLimit = 0; // reject immediately, no queuing
    });
    options.RejectionStatusCode = 429;
});
```
Built-in is in-memory. For distributed — use RedisRateLimiting library or handle at NGINX/YARP gateway level.
Why reads are not rate limited: clicks on legitimate links. Real users click. Bots click for valid reasons (link preview, social crawlers). Rate limiting reads breaks legitimate use cases. Rate limit writes only.
## Custom Aliases
User picks their own short code instead of auto-generated.
Happy path:
```
POST /shorten { url: "...", alias: "my-brand" }
  → validate format
  → check reserved list
  → INSERT with code = 'my-brand'
      → unique constraint violation → 409
      → success → return "my-brand"
```
Gotcha 1 — Race condition:
Two users request same alias simultaneously. Both check, both see nothing, both try to insert. One wins, one gets DB error.
Fix — unique constraint on code column. Try the insert, catch constraint violation. One round trip instead of two. Race condition impossible.
```csharp
try
{
    await _db.ExecuteAsync(
        "INSERT INTO urls (code, url) VALUES (@code, @url)",
        new { code = alias, url = longUrl });
}
catch (PostgresException ex) when (ex.SqlState == "23505")
{
    return Conflict("Alias already taken");
}
```
Gotcha 2 — Namespace collision:
User requests alias "g8". That code already exists as auto-generated. Or user reserves "g8" today, auto-increment reaches ID 1000 tomorrow which encodes to "g8".
Initial fix considered — prefix auto-generated codes with "a-". Problem — user can submit "a-g8" as alias, colliding with prefixed auto-generated code "a-g8".
Correct fix — single table, single unique constraint:
```sql
CREATE TABLE urls (
    id       BIGINT PRIMARY KEY,
    code     VARCHAR(30) NOT NULL UNIQUE,  -- covers everything
    url      TEXT NOT NULL,
    is_alias BOOLEAN DEFAULT FALSE
);
```
One table. One unique constraint. Auto-generated codes and aliases live side by side. The constraint prevents any collision regardless of origin.
Separate tables were considered and rejected — they create false safety. User can submit any string as alias including auto-generated patterns. Physical separation without global uniqueness check is insufficient.
Gotcha 3 — Alias validation:
```csharp
private static readonly Regex ValidAlias =
    new Regex("^[a-zA-Z0-9_-]{3,30}$", RegexOptions.Compiled);
```

(Hyphen placed last in the character class so it is read as a literal, not a range.)
Rules: alphanumeric plus hyphen and underscore (URL safe), minimum 3 characters, maximum 30 characters.
Gotcha 4 — Reserved aliases:
```csharp
private static readonly HashSet<string> Reserved = new()
{
    "health", "api", "admin", "callback", "login", "logout"
};
```
User registers alias "health" — now GET /health hits redirect instead of health check endpoint.
Write path with aliases:
```
POST /shorten { url, alias? }

Alias provided:
  → validate format (regex)
  → check reserved list
  → INSERT with alias as code
      → unique constraint violation → 409 Conflict
      → success → return alias

No alias:
  → generate Snowflake ID
  → encode to Base62
  → INSERT with generated code
  → return generated code
```
Read path — no change:
```
GET /my-brand
  → SELECT url FROM urls WHERE code = 'my-brand'
  → found → 302 redirect
```
Identical to auto-generated lookup. Same column, same index, same query.
## Non-Functional Requirements
Scalability:
Read path — stateless services scale horizontally, Redis Cluster distributes cache keys, read replicas handle cache misses.
Write path — 1,200/sec is comfortable on a single primary. If writes grow 10x:
1. Vertical scale — bigger machine
2. PgBouncer — connection pooling, reduces overhead
3. Write batching — buffer writes, flush in batches
4. Sharding — last resort, hash by short code, consistent hashing
Availability:
| Component fails | Impact | Recovery |
|---|---|---|
| Redis | Reads fall through to DB, slower but functional | Cache warms naturally on restart |
| Read replica | Reads fall back to primary, higher load | Add new replica |
| Primary DB | Writes fail, reads from cache + replica | Promote replica, small data loss window |
| Service instance | LB health check removes it, traffic to healthy instances | Auto-scaling replaces |
Consistency:
- Write path: strong — unique constraint on primary, two users cannot get same code
- Read path: eventual — cache TTL 24hrs, stale data possible after URL update
- Explicit invalidation on URL change: `await _cache.RemoveAsync(code)` tightens the staleness window to near zero
- Analytics: eventual — click events async via queue, counts lag by seconds. Acceptable.
- Read replica lag: 50-100ms, acceptable. Newly created links are read immediately from the primary, not a replica.
Handling viral links:
"Viral link cached after first click. Every subsequent click hits Redis — sub-millisecond. DB never sees it. Redis handles hundreds of thousands of reads/second on single node. Cache-aside pattern absorbs viral traffic naturally."
Data loss if primary goes down:
"Replication lag at failure time — typically milliseconds to seconds. Acceptable for URL shortener. Synchronous replication would eliminate loss at cost of higher write latency — not justified for this use case."
## Interview Framing — Key Statements
On sharding:
"At current scale a single primary with read replicas is sufficient. If write throughput or storage hits single node limits I'd introduce hash-based sharding on the short code using consistent hashing to minimise redistribution when scaling out."
On rate limiting:
"Rate limit at the API Gateway on the write path — IP-based for anonymous clients, API key based for registered users with higher limits. Token bucket for burst tolerance, sliding window for strict enforcement. State in Redis so limits are consistent across all service instances regardless of how many gateway instances are running."
On Base62:
"Base64 has URL-unsafe characters. Hashing introduces collision handling complexity for no benefit. Base62 with auto-increment gives guaranteed uniqueness, URL-safe characters, simple implementation. If link privacy is a concern I'd switch to random Base62 — collision probability negligible until billions of records."
On custom aliases:
"Custom aliases use the same code column as auto-generated codes. Uniqueness enforced at DB level via unique constraint — try the insert, catch constraint violation, eliminates race condition. Single table with one unique constraint is the only true guarantee — separate tables create false safety because users can submit any string as alias."
## Gaps Identified During Session
Gap 1 — Storage estimation not connected to decisions. Calculated 91TB but didn't use it. Storage drives the archiving conversation, not just infrastructure. Always state what the number means.

Gap 2 — Base64 URL safety. Didn't know + and / were the problem characters. Now known: Base64URL substitutes - and _, but Base62 is simpler and standard.

Gap 3 — Never decode on read. Initial instinct was to decode the short code on every GET. Resolved — if the code is stored, look up by code directly. Decode only if the code column was never stored.

Gap 4 — When the code gets written back. Didn't see the two-step write flow. INSERT with RETURNING id, then UPDATE SET code. Or a Snowflake ID generator removes the chicken-and-egg entirely.

Gap 5 — LRU is automatic. Thought a separate service manages cache population and eviction. Resolved — LRU handles it entirely through access patterns. Zero eviction code needed.

Gap 6 — Rate limiting across instances. Thought gateway-level rate limiting avoided the distributed-state problem. Resolved — the gateway itself is scaled. Same problem. Redis required regardless of where the limiter lives.

Gap 7 — Separate tables don't prevent alias collision. Proposed separate tables for aliases and generated codes. Resolved — a user can submit any string as an alias, including auto-generated patterns. Only a single table with a unique constraint is a true guarantee.
## Unanswered / Needs More Depth
- Snowflake ID generator mechanics — time component, machine ID, sequence
- Link expiration — user-defined TTL on short codes
- Link analytics dashboard design — reporting DB, event pipeline
- URL validation on write — check destination is reachable, not on malicious domain blocklist
- Geographic analytics — tracking click location, CDN implications
- Custom domain support — user brings their own domain for short links
## What's Next
- Student drives Twitter feed problem end to end
- Consistent hashing in depth — virtual nodes, redistribution mechanics
- Design a rate limiter as a standalone HLD problem
- Design a distributed cache
- Resume finalisation — URL shortener design knowledge translatable to interview answers on caching and sharding