Rate Limiting Is Not Optional: The Patterns Every Production API Needs
Unprotected APIs are vulnerable to abuse, accidental overload, and denial-of-service attacks. Rate limiting is the first line of defense — and most implementations get it wrong in predictable ways.
The CTO of a payments startup once told me about an engineer at their company who wrote a data export feature over a weekend. The feature worked correctly and was reviewed and merged the following Monday. By Wednesday, a customer had found the endpoint, written a script that called it in a loop, and generated a 400 GB data export that brought down the database for 45 minutes. The export endpoint had no rate limiting. The code review hadn't flagged it. Nobody had thought about it.
Rate limiting falls into the category of security controls that engineers consistently underestimate, because its absence usually isn't a vulnerability that shows up in a security scanner. It's a vulnerability that appears when a motivated user or a misconfigured client decides to use your API faster than you planned for.
What Rate Limiting Actually Protects Against
Rate limiting is typically framed as protection against abuse. It also protects against accidents: a client with a bug that makes an API call in a tight loop, a data import that calls your API for every row in a million-row file, a misconfigured retry policy that treats every error as a reason to retry immediately. None of these are malicious, but their impact on your infrastructure is identical to that of a deliberate attack. Rate limiting protects your system from all of them, regardless of the intent behind the traffic.
The Token Bucket Algorithm
The most common, and most commonly flawed, rate limiting approach is the fixed window algorithm: allow N requests per time window, reject everything above the threshold. The flaw is inherent to the design, not the implementation: a client who exhausts their limit at the end of one window and makes the same number of requests at the start of the next window sends 2N requests in a short interval without violating any rate limit check. This is known as the boundary burst problem.
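To make the boundary burst concrete, here is a toy fixed-window counter (N and the window length are made-up numbers for illustration). A client that sends N requests just before a window boundary and N more just after gets 2N requests through in a fraction of a second, and every one of them passes the check:

```python
# Toy fixed-window counter: N requests per window, count resets at the boundary.
# N and WINDOW are illustrative numbers, not from any particular system.
N, WINDOW = 100, 60

counts = {}

def allowed(t):
    """Is a request at time t (in seconds) within the current window's budget?"""
    w = int(t // WINDOW)              # which window this request falls into
    counts[w] = counts.get(w, 0) + 1
    return counts[w] <= N

# 100 requests at t=59.9 (end of window 0) and 100 at t=60.1 (start of
# window 1) all pass: 200 requests land within 0.2 seconds.
```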
The token bucket algorithm solves this: clients accumulate tokens at a fixed rate up to a maximum bucket size, and each request consumes a token. A client who waits can make burst requests up to the bucket size. A client who calls continuously is limited to the refill rate. This more accurately models the intent of rate limiting — allowing occasional bursts while preventing sustained high-rate access — and avoids the boundary burst problem.
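A minimal in-memory sketch of the token bucket, with illustrative names and parameters (a production implementation would keep this state in a shared store such as Redis rather than in process memory):

```python
import time

class TokenBucket:
    """Illustrative token bucket: burst up to `capacity`, sustained rate
    limited to `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity            # maximum burst size
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity              # start full: an idle client may burst
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at the bucket size.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A client with a `TokenBucket(capacity=5, refill_rate=1.0)` can fire five requests back to back, then is held to one request per second until it backs off long enough to refill the bucket.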
Granularity: Per-IP, Per-User, Per-Key
The right rate limiting granularity depends on what you're protecting and who you're protecting it from. Per-IP rate limiting protects against simple scripts and misconfigured clients but is trivially bypassed by anyone with multiple IP addresses. Per-user rate limiting is more robust for authenticated endpoints — it limits the rate at which an authenticated user can access resources, regardless of which IP they're using. Per-API-key rate limiting is appropriate for public APIs where different clients should have different limits based on their tier or intended use.
Most production APIs need multiple layers: per-IP for unauthenticated endpoints (login, password reset, public data), per-user for authenticated resource endpoints, and per-key for API integrations. These aren't alternatives — they're complements that protect against different attack surfaces.
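As a sketch of how the layers compose, the snippet below requires a request to pass every layer that applies to it. All limits, scopes, and names are illustrative, and the per-scope state is an in-memory token bucket; a production system would keep this state in a shared store:

```python
import time
from collections import defaultdict

# (capacity, refill per second) per scope -- made-up numbers for illustration.
LIMITS = {"ip": (20, 1.0), "user": (100, 5.0), "key": (1000, 50.0)}
_buckets = defaultdict(dict)  # scope -> identifier -> (tokens, last_refill)

def _take(scope, ident, now):
    """Consume one token from the bucket for (scope, ident); True if allowed."""
    capacity, rate = LIMITS[scope]
    tokens, last = _buckets[scope].get(ident, (capacity, now))
    tokens = min(capacity, tokens + (now - last) * rate)
    if tokens < 1:
        _buckets[scope][ident] = (tokens, now)
        return False
    _buckets[scope][ident] = (tokens - 1, now)
    return True

def allow_request(ip, user_id=None, api_key=None):
    """A request must pass every layer that applies to it."""
    now = time.monotonic()
    layers = [("ip", ip)]                     # per-IP always applies
    if user_id:
        layers.append(("user", user_id))      # per-user for authenticated traffic
    if api_key:
        layers.append(("key", api_key))       # per-key for API integrations
    return all(_take(scope, ident, now) for scope, ident in layers)
```

Note that `all()` short-circuits, so a request rejected by an earlier layer doesn't consume tokens from later ones; whether you want that behavior is a design choice.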
What Code Review Should Check
A code review checklist for new API endpoints:

- Does this endpoint have rate limiting?
- Is the limit applied at the right granularity (per-IP for public endpoints, per-user for authenticated ones)?
- Is it implemented in a distributed-safe way (a shared store such as Redis, rather than in-memory state that doesn't survive restarts or scale horizontally)?
- Does the response include rate limit headers so clients can implement backoff?
- Does the 429 response include a Retry-After header so clients know when to try again?

These questions take two minutes to check and catch the category of rate limiting omissions that lead to weekend incidents.
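The header items on that checklist can be sketched concretely. The header names below follow the widely used but non-standard X-RateLimit-* convention, and the function is a hypothetical helper built around token-bucket state, not any specific framework's API:

```python
import math

def rate_limit_headers(limit, tokens, refill_rate):
    """Build rate limit headers from bucket state (illustrative helper).

    limit: the advertised request budget
    tokens: tokens currently left in the client's bucket
    refill_rate: tokens accrued per second
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, math.floor(tokens))),
    }
    if tokens < 1:
        # Client is out of budget: report seconds until one full token
        # accrues, rounded up, so a well-behaved client knows when to retry.
        headers["Retry-After"] = str(math.ceil((1 - tokens) / refill_rate))
    return headers
```

A server would attach these headers to every response and return status 429 whenever Retry-After is present, giving clients everything they need to implement backoff instead of hammering the endpoint.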
Try CodeMouse on your next PR
Free AI code review on every pull request. Bring your own API key — no subscription needed.
Install on GitHub — Free