Session Recap: URL Shortener API Design
Starting Point
First problem in the API + Application Design pillar. Goal: practice resource modelling, endpoint design, request/response contracts, and schema design from scratch. No prior context on URL shorteners — had to build the mental model from first principles.
What We Did
Step 1: Built the mental model
Established that sht.lky is the domain. sht.lky/abc123 is the full short URL. abc123 is the short code — generated by the server, stored in DB, returned to client. The domain alone is not the short URL.
Step 2: Designed the endpoints
POST /api/links → create a short link
GET /{code} → redirect to long URL (301)
GET /api/links/{code} → get link details + stats
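Key decision: the redirect lives at the root (/{code}), not under /api/links/{code}. Reason: the short URL is a user-facing artifact that gets shared in tweets and emails and printed on posters, and /api/links/abc123 defeats the purpose of shortening. One caveat on the status code: 301 responses are cacheable by default, so repeat visits served from a browser cache never reach the server and will not show up in access stats. A 302 avoids that at the cost of more requests.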
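The split between the root-level redirect and the /api/* endpoints can be sketched as a small route table. This is a minimal illustration, not a framework recommendation: the handler names and the assumption that codes are alphanumeric are hypothetical and were not decided in the session.

```python
import re

# Hypothetical route table: (HTTP method, path pattern, handler name).
# Assumption: short codes are alphanumeric only.
ROUTES = [
    ("POST", re.compile(r"/api/links"), "create_link"),
    ("GET", re.compile(r"/api/links/(?P<code>[A-Za-z0-9]+)"), "get_link_details"),
    ("GET", re.compile(r"/(?P<code>[A-Za-z0-9]+)"), "redirect"),
]


def resolve(method, path):
    """Return (handler_name, path_params), or (None, {}) if nothing matches."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None, {}
```

One consequence of putting codes at the root: a request for /api would match the redirect route, so "api" has to be excluded from the set of codes the server can generate.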
Step 3: Designed request/response contracts
POST /api/links
Request:
{ "longUrl": "longasslink.com/some/path" }
Response (201):
{
  "urlId": "abc123",
  "longUrl": "longasslink.com/some/path",
  "shortUrl": "sht.lky/abc123",
  "createdAt": "2026-05-02T10:00:00Z"
}
Rule applied: server returns shortUrl ready to use. Client should never construct URLs — that's a server concern.
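As a rough sketch of that rule, the server can assemble the whole response body itself, so the domain never has to be known by the client. The function name and the BASE_URL constant are assumptions made for illustration; the field names and values come from the contract above.

```python
from datetime import datetime, timezone

# Assumption: the domain is held in server-side configuration only.
BASE_URL = "sht.lky"


def build_link_response(code, long_url, created_at):
    """Build the POST /api/links response body with shortUrl fully formed."""
    return {
        "urlId": code,
        "longUrl": long_url,
        "shortUrl": f"{BASE_URL}/{code}",
        # Server-owned timestamp, rendered as UTC in ISO 8601 form.
        "createdAt": created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

If the domain ever changes, only BASE_URL changes; no client has string concatenation to update.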
GET /api/links/{code}
Request: ?startDate=2026-04-01&endDate=2026-05-02
Response (200):
{
  "urlId": "abc123",
  "longUrl": "longasslink.com/some/path",
  "shortUrl": "sht.lky/abc123",
  "createdAt": "2026-05-02T10:00:00Z",
  "metrics": {
    "startDate": "2026-04-01",
    "endDate": "2026-05-02",
    "timesAccessed": 152,
    "breakdown": [
      { "date": "2026-04-01", "hits": 12 },
      { "date": "2026-04-02", "hits": 9 }
    ]
  }
}
Date filters on a GET go in query params, not request body. The breakdown array is what makes the date range actually useful — a single count doesn't need a range.
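A minimal sketch of how the metrics object could be derived from raw access timestamps. Two things here are assumptions, because the session did not settle them: endDate is treated as inclusive, and days with zero hits are left out of the breakdown. The function name is hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def build_metrics(access_times, start_date, end_date):
    """Summarise access timestamps into the metrics object for a date range.

    Assumptions: the range is inclusive at both ends, and days without
    hits do not appear in the breakdown.
    """
    per_day = Counter(
        t.date() for t in access_times if start_date <= t.date() <= end_date
    )
    breakdown = [
        {"date": day.isoformat(), "hits": hits}
        for day, hits in sorted(per_day.items())
    ]
    return {
        "startDate": start_date.isoformat(),
        "endDate": end_date.isoformat(),
        "timesAccessed": sum(per_day.values()),
        "breakdown": breakdown,
    }
```

timesAccessed is always the sum of the breakdown rows, which keeps the two numbers from drifting apart.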
Step 4: Schema design
ORG(id PK, created_at, last_updated)
USER(id PK, org_id FK, role, created_at)
LINKS(id PK, org_id FK, short_url_id UNIQUE, long_url, long_url_hash, created_at, last_updated)
LINK_ACCESS_LOG(id PK, link_id FK, accessed_at, ip_address, user_agent)
Indexes:
LINKS: unique(short_url_id), idx(long_url_hash)
LINK_ACCESS_LOG: idx(link_id, accessed_at)
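The schema above can be tried out with an in-memory SQLite database. This is only a sketch: the column types, the NOT NULL constraints, and the index names are assumptions, since the notes list columns and keys but not types. The UNIQUE constraint on short_url_id already creates its own index in SQLite, so only the two non-unique indexes are declared separately.

```python
import sqlite3

# Column types, NOT NULL constraints, and index names are assumptions.
SCHEMA = """
CREATE TABLE org (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    last_updated TEXT NOT NULL
);
CREATE TABLE "user" (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES org(id),
    role TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE links (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES org(id),
    short_url_id TEXT NOT NULL UNIQUE,
    long_url TEXT NOT NULL,
    long_url_hash TEXT NOT NULL,
    created_at TEXT NOT NULL,
    last_updated TEXT NOT NULL
);
CREATE TABLE link_access_log (
    id INTEGER PRIMARY KEY,
    link_id INTEGER NOT NULL REFERENCES links(id),
    accessed_at TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT
);
CREATE INDEX idx_links_long_url_hash ON links(long_url_hash);
CREATE INDEX idx_access_link_time ON link_access_log(link_id, accessed_at);
"""


def create_schema(conn):
    """Create all four tables and the two secondary indexes."""
    conn.executescript(SCHEMA)
```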
Step 5: Deduplication
Problem: same long URL posted twice creates duplicate short codes.
Solution: hash the long URL, store as long_url_hash, check on insert.
Indexing long_url directly is impractical: it can run to 2000 characters or more, and many databases cap the size of an index key. A SHA-256 hash rendered as hex is always 64 characters: fixed length, fast to compute, and indexable.
On second POST with same URL: return 200 with existing record plus "deduplicated": true flag so client knows it got back an existing resource, not a newly created one.
This is deduplication, not idempotency. Important distinction — covered in the concepts recap.
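The check-on-insert flow can be sketched as follows. A plain dict stands in for the indexed long_url_hash column, and the function names are hypothetical. The sketch assumes URLs are compared as exact strings with no normalisation and ignores org scoping, neither of which was decided in the session.

```python
import hashlib


def url_hash(long_url):
    """Return the SHA-256 digest of the URL as 64 hex characters."""
    return hashlib.sha256(long_url.encode("utf-8")).hexdigest()


def create_or_get_link(links_by_hash, long_url, new_code):
    """Return (status_code, body) for a POST /api/links call.

    links_by_hash is an in-memory stand-in for a lookup on long_url_hash.
    new_code is the short code to use if a new record has to be created.
    """
    digest = url_hash(long_url)
    existing = links_by_hash.get(digest)
    if existing is not None:
        # Same URL seen before: return the existing record, flagged.
        return 200, {**existing, "deduplicated": True}
    record = {"urlId": new_code, "longUrl": long_url}
    links_by_hash[digest] = record
    return 201, record
```

The first call returns 201 with a new record; a second call with the same URL returns 200, the original urlId, and the deduplicated flag.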
Concepts Covered
Resource modelling
URLs identify things (nouns); HTTP verbs identify actions. Bad: POST /createLink. Good: POST /links. The hard case is state transitions: operations such as send or cancel feel like actions but are really state changes on a resource.
Naming resources
/api/links over /api/shortenedurls (unreadable) or /api/urls (too vague). Name should be unambiguous to an API consumer with no context.
URL audience
The redirect endpoint (/{code}) is for end users. /api/* endpoints are for developers. Different audiences warrant different URL design.
createdAt in POST responses
Always include it. The server owns the timestamp — never trust the client to know when a resource was created.
Clients don't construct URLs
Return shortUrl fully formed. If the client has to concatenate your domain + urlId to get the shareable link, you've leaked a server implementation detail to the client.
What We Messed Up
Missed the short code column in LINKS
Initial schema had id (PK) but no short_url_id. The redirect flow (GET /abc123) looks up by short code, not by DB primary key. Two different things. Added short_url_id as a separate column with a unique constraint.
Long URL indexing
Suggested indexing long_url directly. Can't do this — the column can be thousands of characters. Fix: hash it, store the hash, index the hash.
Composite index column order
Query is WHERE link_id = X AND accessed_at BETWEEN A AND B. Index should be idx(link_id, accessed_at) — link_id first to eliminate most rows, then accessed_at for the range scan within the remaining small set. Flipping the order means scanning the full date range across all links before filtering by link.
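The column-order argument can be made concrete by modelling the composite index as a sorted list of (link_id, accessed_at) tuples, which mirrors how such an index orders its keys. This is a toy model for intuition, not how any particular database stores its indexes; the function name and sample data are made up.

```python
from bisect import bisect_left, bisect_right


def range_scan(index, link_id, start, end):
    """Return entries for one link with start <= accessed_at <= end.

    index must be a list of (link_id, accessed_at) tuples in sorted order.
    Because link_id comes first, all rows for one link are contiguous and
    sorted by accessed_at, so two binary searches bound the result.
    """
    lo = bisect_left(index, (link_id, start))
    hi = bisect_right(index, (link_id, end))
    return index[lo:hi]


# Sample data: ISO date strings sort in chronological order.
index = sorted([
    (1, "2026-04-01"),
    (1, "2026-04-02"),
    (1, "2026-05-10"),
    (2, "2026-04-01"),
    (2, "2026-04-03"),
])
april_hits_for_link_1 = range_scan(index, 1, "2026-04-01", "2026-04-30")
```

With the order flipped to (accessed_at, link_id), the rows for link 1 would be interleaved with every other link's rows inside the date range, so no single contiguous slice would answer the query.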
Key Values and Config to Remember
| Item | Value |
|---|---|
| Short code column | short_url_id UNIQUE |
| Hash column | long_url_hash (SHA-256, 64 hex chars) |
| Dedup response status | 200 with "deduplicated": true |
| Redirect status code | 301 |
| Stats query params | startDate, endDate |
| Composite index order | link_id before accessed_at |
Unanswered Questions / Things to Investigate
- How to generate the short code (abc123): random, base62, hash-based? Collision handling?
- GET /api/links: list all links for an org. Not designed (pagination covered separately in concepts recap).
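The session left code generation open. Purely to illustrate the base62 option named above, here is a sketch that encodes a numeric ID as a base62 string. Feeding it the LINKS primary key is an assumption, not a decision from the session, and the alphabet order is arbitrary.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n):
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first.
    return "".join(reversed(digits))
```

Encoding a unique integer avoids collisions by construction but makes codes sequential and guessable; the random and hash-based options trade that for collision handling, which remains an open question here.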
What's Next
This problem is complete. Move to the Hotel Booking problem for more reps on the same skills, or jump to Concepts recap for pagination, idempotency, error contracts, versioning, and rate limiting.