A Traffic-Aware BFF for Next.js: RSC Strategies and API DAGs
Picking BFF and RSC strategies by traffic, modeling API dependencies as DAGs for parallel fetching, and knowing when to extract a separate BFF service from your Next.js app.
Two pages on the same Next.js app can behave nothing alike. A static marketing home renders in 30ms p50 on the edge. A dashboard that waterfalls through nine sequential API calls — auth, profile, preferences, feature flags, billing, notifications, project list, recent activity, suggested actions, each one waiting for the previous to finish before firing — takes 4 seconds. Same App Router, two completely different performance problems.
So the real question is never how do I make Next.js faster. It's what does this particular page actually need — when a BFF layer earns its place, when RSC is the right way to render, and when plain SSR or a static page is enough. Deciding that per page, by how much traffic it sees and how its data dependencies fan out, is what this article is about.
A modern Next.js app is three rendering strategies stitched together: static (CDN-cached HTML), RSC (server-rendered React), and client (interactive JS). A Backend-for-Frontend (BFF) layer sits underneath all three, aggregating downstream APIs into shapes the UI actually needs. The defaults are wrong if you treat every page the same way. Picking strategies based on how much traffic a page sees and how its data dependencies fan out is the entire job.
What a BFF actually buys you
A BFF is a server-side layer between your UI and your backend services. Two things make it worth adding.
- Shape control. Downstream services are designed for their own consumers. Your UI cares about a `ProjectCard` with title, member count, last activity, and a thumbnail. That's three or four service calls. The BFF turns them into one.
- Trust boundary. Tokens stay server-side. The browser sees only what it needs. Cross-cutting concerns like rate limiting and audit logging happen in one place.
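As a rough illustration of shape control, the sketch below composes one `ProjectCard` from three downstream calls. The `ProjectServices` interface, its method names, and the field names are hypothetical stand-ins for whatever your backend exposes, not a real API.

```typescript
// Hypothetical downstream clients; a real app would wrap HTTP or RPC calls here.
interface ProjectServices {
  getProject(id: string): Promise<{ id: string; title: string; thumbnailUrl: string }>
  getMembers(id: string): Promise<string[]>
  getLastActivity(id: string): Promise<{ at: string }>
}

// The shape the UI actually wants: one object per card.
interface ProjectCard {
  id: string
  title: string
  memberCount: number
  lastActivityAt: string
  thumbnailUrl: string
}

// One BFF call fans out to three services in parallel and returns one shape.
async function getProjectCard(
  services: ProjectServices,
  projectId: string,
): Promise<ProjectCard> {
  const [project, members, activity] = await Promise.all([
    services.getProject(projectId),
    services.getMembers(projectId),
    services.getLastActivity(projectId),
  ])
  return {
    id: project.id,
    title: project.title,
    memberCount: members.length,
    lastActivityAt: activity.at,
    thumbnailUrl: project.thumbnailUrl,
  }
}
```

The browser makes one request and never sees the three service contracts behind it, which is also where the trust-boundary benefit comes from.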
A BFF is not a generic API gateway. A gateway routes; a BFF composes. If your team ships endpoints sized to UI screens and ignores reuse, that's a BFF. If you're proxying generic CRUD, that's a gateway, and you don't need a BFF.
Two ways to do BFF in Next.js
There are two viable architectures. Picking one is mostly about whether you have multiple frontends.
| Architecture | When it fits | When it doesn't |
|---|---|---|
| Next.js IS the BFF — RSC + route handlers fetch downstream APIs directly | One web frontend, small-to-medium team, tight Next.js coupling acceptable | iOS, Android, or third-party clients also need shaped data |
| Separate BFF service (Hono / Fastify) that Next.js calls | Multiple frontends sharing a contract, backend org boundary, independent deploy cycle | Adds a hop, more infra, harder ergonomics for single-frontend teams |
For a single-web-frontend team, Next.js IS the BFF is faster to build and gives you streaming RSC, server actions, and route handlers as native primitives. The cost is that your aggregation logic lives in your Next app — refactoring to a separate service later is real work.
For a separate BFF, your aggregation lives in its own service. Next.js becomes a render layer. iOS clients can hit the same BFF and get the same shaped responses. The cost is an extra hop and a duplicated set of tools (your BFF needs its own caching, observability, deploy pipeline).
My default for the Next.js apps I've built is to start with Next-as-BFF and split out a separate service only when a second client appears. Premature splitting is a common mistake.
Sequence diagrams: where the time actually goes
Three scenarios, three different shapes. The diagram below traces a typical request lifecycle for each.
The bottom block is where teams lose the most time. Many BFF implementations resolve dependencies sequentially. The right answer is to model dependencies as a graph, run independent calls in parallel, and only sequence when there's a real data flow.
Modeling API dependencies as a DAG
Most page-level data fetches form a small graph. Some calls require outputs of others; many don't. A directed acyclic graph (DAG) is the natural model.
Take a dashboard. The page needs: user profile, their preferences, their feature flags, their projects, their notifications, project membership counts, recent activity, and a personalized recommendations feed. Some of these depend on others.
A topological sort gives you execution batches:
- Resolve `session`, which produces `userId`.
- In parallel: `profile`, `preferences`, `featureFlags`, `notifications`, `projects`.
- In parallel: `membershipCounts` and `recentActivity` (depend on `projects`); `recommendations` (depends on `profile`, `preferences`, `featureFlags`).
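The batches above can be derived mechanically. Here is a minimal sketch of that grouping, assuming the graph is written as a plain map from node id to the ids it depends on; `DepGraph` and `toBatches` are illustrative names, not part of any library.

```typescript
type DepGraph = Record<string, string[]> // node id -> ids it depends on

// Group nodes into batches: a node lands in the first batch after all its deps.
function toBatches(graph: DepGraph): string[][] {
  const done = new Set<string>()
  const batches: string[][] = []
  let remaining = Object.keys(graph)
  while (remaining.length > 0) {
    const ready = remaining.filter(id => graph[id].every(dep => done.has(dep)))
    if (ready.length === 0) {
      // Nothing can make progress: the leftover nodes form a cycle or name a missing dep.
      throw new Error(`Cycle or unknown dependency among: ${remaining.join(', ')}`)
    }
    batches.push(ready)
    for (const id of ready) done.add(id)
    remaining = remaining.filter(id => !done.has(id))
  }
  return batches
}

const dashboard: DepGraph = {
  session: [],
  profile: ['session'],
  preferences: ['session'],
  featureFlags: ['session'],
  notifications: ['session'],
  projects: ['session'],
  membershipCounts: ['projects'],
  recentActivity: ['projects'],
  recommendations: ['profile', 'preferences', 'featureFlags'],
}

// Three batches: [session], then the five session-dependent calls, then the last three.
const batches = toBatches(dashboard)
```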
The naive sequential implementation does nine waits, about 450ms at a uniform 50ms per call. The DAG-aware implementation waits on three dependency frontiers, about 150ms under the same assumption. The gain is larger when latencies vary: if `projects` takes 200ms while `profile`, `preferences`, and flags each take 10ms, a promise-memoisation scheduler starts `recommendations` at 10ms, not at 200ms. The bound for each node is the slowest of its own dependencies, not the slowest call in the whole wave.
Implementing the DAG resolver
You don't need a graph library. Twenty lines of TypeScript is enough.
type NodeId = string

interface Node<T> {
  id: NodeId
  deps: NodeId[]
  fetch: (resolved: Record<NodeId, unknown>) => Promise<T>
}

/**
 * Promise-memoisation DAG resolver.
 * Each node starts the instant its specific deps resolve,
 * not when the whole batch does (unlike a topological-batch approach).
 */
export function createDagResolver<TOut extends Record<NodeId, unknown>>(
  nodes: Array<Node<unknown>>,
) {
  const cache = new Map<NodeId, Promise<unknown>>()
  const resolved: Record<NodeId, unknown> = {}
  const byId = new Map(nodes.map(n => [n.id, n]))
  // Nodes whose deps are being wired up right now; re-entering one means a cycle.
  const visiting = new Set<NodeId>()

  const schedule = (id: NodeId): Promise<unknown> => {
    if (cache.has(id)) return cache.get(id)!
    if (visiting.has(id)) throw new Error(`Dependency cycle at node: ${id}`)
    const node = byId.get(id)
    if (!node) throw new Error(`Unknown node: ${id}`)
    visiting.add(id)
    // Scheduling deps first reuses their cached promises, so each fetch fires once.
    const p = Promise.all(node.deps.map(schedule))
      .then(() => node.fetch(resolved))
      .then(v => { resolved[id] = v; return v })
    visiting.delete(id)
    cache.set(id, p)
    return p
  }

  return {
    // async, so a bad graph (unknown node, cycle) rejects instead of throwing synchronously
    run: async () => {
      await Promise.all(nodes.map(n => schedule(n.id)))
      return resolved as TOut
    },
  }
}
The key insight is promise memoisation. Each call to `schedule(id)` returns, and caches, a promise that resolves when that node's fetch completes. Because `Promise.all(node.deps.map(schedule))` reuses the cached promise for each dependency, a node starts the instant all its specific dependencies resolve, regardless of what other unrelated nodes are doing. Contrast this with a batch approach that calls `Promise.all` across an entire wave: if `projects` takes 200ms and `profile` takes 10ms, a batch executor makes `recommendations` wait until `projects` finishes even though `recommendations` only depends on `profile`, `preferences`, and flags.
BATCH VS PROMISE-MEMOISATION
A batch DAG groups nodes into waves (topological sort) and awaits each wave with `Promise.all` before the next starts. A promise-memoisation DAG starts each node the moment its specific deps resolve. For a dashboard with mixed latencies (some calls at 10ms, one outlier at 200ms), the memoisation approach lets nodes that don't depend on the outlier finish up to 190ms earlier.
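To make the difference concrete, here is a small calculation of finish times under both schedulers. It is a sketch that models latency as fixed numbers instead of making real calls; `TimedGraph`, `memoisedFinish`, and `batchFinish` are illustrative names, and the 50ms latency for `recs` is an assumed value.

```typescript
interface TimedNode { deps: string[]; latencyMs: number }
type TimedGraph = Record<string, TimedNode>

// Memoised scheduling: a node starts when the slowest of its own deps finishes.
// (No cycle check here; the input is assumed to be a valid DAG.)
function memoisedFinish(graph: TimedGraph): Record<string, number> {
  const finish: Record<string, number> = {}
  const visit = (id: string): number => {
    if (finish[id] !== undefined) return finish[id]
    const start = Math.max(0, ...graph[id].deps.map(dep => visit(dep)))
    return (finish[id] = start + graph[id].latencyMs)
  }
  Object.keys(graph).forEach(id => visit(id))
  return finish
}

// Batch scheduling: a node starts only when the previous wave has fully finished.
function batchFinish(graph: TimedGraph): Record<string, number> {
  const finish: Record<string, number> = {}
  let waveStart = 0
  let remaining = Object.keys(graph)
  while (remaining.length > 0) {
    const wave = remaining.filter(id => graph[id].deps.every(d => d in finish))
    if (wave.length === 0) throw new Error('Cycle or unknown dependency')
    for (const id of wave) finish[id] = waveStart + graph[id].latencyMs
    waveStart = Math.max(...wave.map(id => finish[id]))
    remaining = remaining.filter(id => !(id in finish))
  }
  return finish
}

const timedGraph: TimedGraph = {
  profile: { deps: [], latencyMs: 10 },
  preferences: { deps: [], latencyMs: 10 },
  flags: { deps: [], latencyMs: 10 },
  projects: { deps: [], latencyMs: 200 },
  recs: { deps: ['profile', 'preferences', 'flags'], latencyMs: 50 },
}

const memo = memoisedFinish(timedGraph) // memo.recs is 60: starts at 10ms
const batch = batchFinish(timedGraph)   // batch.recs is 250: waits for projects
```

The gap for `recs` is exactly the time the batch scheduler spends waiting on `projects`, a call it never needed.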
Wired into a dashboard page, the resolver looks like this:
import { createDagResolver } from '@/lib/bff/dag'
import { auth } from '@/auth'
import { cache } from 'react'
// `api`, its types (Project, Profile, Prefs, Flags), and DashboardView are
// app-specific; import them from wherever your project defines them.

// React.cache() deduplicates calls within a single RSC render tree.
// If two components both call getProfile(userId), only one fetch fires.
const getProfile = cache((userId: string) => api.getProfile(userId))
const getPreferences = cache((userId: string) => api.getPreferences(userId))
const getFlags = cache((userId: string) => api.getFlags(userId))

export default async function Dashboard() {
  const session = await auth()
  if (!session) return null
  const userId = session.user.id

  // session is already resolved above, so the root nodes carry no deps.
  const dag = createDagResolver([
    { id: 'profile', deps: [], fetch: () => getProfile(userId) },
    { id: 'preferences', deps: [], fetch: () => getPreferences(userId) },
    { id: 'flags', deps: [], fetch: () => getFlags(userId) },
    { id: 'projects', deps: [], fetch: () => api.getProjects(userId) },
    { id: 'activity', deps: ['projects'], fetch: r => api.getActivity((r.projects as Project[]).map(p => p.id)) },
    { id: 'memberCounts', deps: ['projects'], fetch: r => api.getMemberCounts((r.projects as Project[]).map(p => p.id)) },
    { id: 'recs', deps: ['profile', 'preferences', 'flags'],
      fetch: r => api.getRecommendations({
        profile: r.profile as Profile,
        preferences: r.preferences as Prefs,
        flags: r.flags as Flags,
      }) },
  ])
  const data = await dag.run()
  return <DashboardView data={data} />
}
DON'T REACH FOR A GRAPH LIBRARY
p-graph, dag-builder and friends are real packages. For the page-level case (5–15 nodes, no cross-page reuse), inline the resolver. You'll spend more time wrangling a library's lifecycle than writing the loop, and the loop is something a teammate can read in thirty seconds.
Streaming the BFF response when one slow call would block fast ones
Even with the DAG running every independent call in parallel, `dag.run()` resolves only when the slowest node has finished, so nothing renders until then. If recommendations takes 2 seconds and profile takes 50ms, the user stares at a blank page for 2 seconds.
Two ways out. The first is RSC + Suspense — wrap each piece in <Suspense> and React streams them. The second is the BFF itself streaming its response as NDJSON: as each API call resolves, the server writes a JSON line. The client parses lines as they arrive and updates incrementally. Useful when:
- The caller is not RSC — a client route handler, an iOS app, an edge function, a partner integration. Suspense is React-only; NDJSON works for everyone.
- You want the same streaming behavior across web, iOS, and third-party clients without each reimplementing it.
- The fan-out is wide (8+ calls) and tail latency dominates. NDJSON gives you explicit control over what ships first.
The BFF endpoint stays simple. Open a ReadableStream, fire all calls in parallel, write a JSON line per resolved call, close.
import type { NextRequest } from 'next/server'
import { auth } from '@/auth'

export async function GET(req: NextRequest) {
  const session = await auth()
  if (!session) return new Response('unauthorized', { status: 401 })
  const userId = session.user.id
  const encoder = new TextEncoder()
  const stream = new ReadableStream({
    async start(controller) {
      // One NDJSON line per successful call.
      const send = (key: string, data: unknown) => {
        controller.enqueue(
          encoder.encode(JSON.stringify({ key, data }) + '\n'),
        )
      }
      // One NDJSON line per failed call, so the client can render an error slot.
      const fail = (key: string, error: unknown) => {
        const message = error instanceof Error ? error.message : 'unknown'
        controller.enqueue(
          encoder.encode(JSON.stringify({ key, error: message }) + '\n'),
        )
      }
      // All calls start immediately; each writes its own line when it finishes.
      const calls = [
        api.getProfile(userId).then(d => send('profile', d)).catch(e => fail('profile', e)),
        api.getPreferences(userId).then(d => send('preferences', d)).catch(e => fail('preferences', e)),
        api.getFlags(userId).then(d => send('flags', d)).catch(e => fail('flags', e)),
        api.getRecommendations(userId).then(d => send('recommendations', d)).catch(e => fail('recommendations', e)),
      ]
      await Promise.allSettled(calls)
      controller.close()
    },
  })
  return new Response(stream, {
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-store',
      'X-Content-Type-Options': 'nosniff',
    },
  })
}
Two things to call out in that snippet. The per-call `.catch()` keeps one failure from poisoning the whole stream: the fast calls still ship. And `Promise.allSettled` is the right primitive at the bottom: it waits for everything without short-circuiting on the first rejection.
On the client, parse line-by-line, drop into state by key. Keep a buffer for partial chunks across read() boundaries — TCP doesn't respect your line breaks.
'use client'
import { useEffect, useState } from 'react'

type Chunk =
  | { key: string; data: unknown }
  | { key: string; error: string }

export function useStreamingDashboard() {
  const [data, setData] = useState<Record<string, unknown>>({})
  const [errors, setErrors] = useState<Record<string, string>>({})

  useEffect(() => {
    const ctl = new AbortController()
    void (async () => {
      try {
        const res = await fetch('/api/dashboard', { signal: ctl.signal })
        // A 401 or 5xx body is not NDJSON, so don't try to parse it.
        if (!res.ok || !res.body) return
        const reader = res.body.getReader()
        const decoder = new TextDecoder()
        let buffer = ''
        while (true) {
          const { value, done } = await reader.read()
          if (done) break
          buffer += decoder.decode(value, { stream: true })
          const lines = buffer.split('\n')
          buffer = lines.pop() ?? '' // tail might be partial
          for (const line of lines) {
            if (!line) continue
            const chunk = JSON.parse(line) as Chunk
            if ('error' in chunk) {
              setErrors(p => ({ ...p, [chunk.key]: chunk.error }))
            } else {
              setData(p => ({ ...p, [chunk.key]: chunk.data }))
            }
          }
        }
      } catch (err) {
        if ((err as Error).name !== 'AbortError') console.error(err)
      }
    })()
    // Abort the in-flight stream when the component unmounts.
    return () => ctl.abort()
  }, [])

  return { data, errors }
}
The fast profile lands in 50ms and renders. The recs slot stays in a skeleton state for 2 seconds, then snaps in. Time-to-first-meaningful-content is the time of the fastest call, not the slowest.
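The partial-chunk buffering is the part most worth unit-testing, and it is easier to test outside a hook. A minimal sketch of the same logic as a standalone helper follows; `createNdjsonSplitter` is an illustrative name, not an existing API.

```typescript
// Stateful splitter: feed it decoded text chunks, get back complete parsed lines.
function createNdjsonSplitter<T = unknown>() {
  let buffer = ''
  return {
    push(chunk: string): T[] {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? '' // last piece may be an incomplete line
      return lines
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line) as T)
    },
    // Call once the stream ends to catch a final line with no trailing newline.
    flush(): T[] {
      const rest = buffer.trim()
      buffer = ''
      return rest ? [JSON.parse(rest) as T] : []
    },
  }
}

// Chunk boundaries deliberately fall in the middle of lines.
const splitter = createNdjsonSplitter<{ key: string; data: unknown }>()
const first = splitter.push('{"key":"profile","data":{"na')  // nothing complete yet
const second = splitter.push('me":"Ada"}}\n{"key":"flags",') // profile line completes
const third = splitter.push('"data":{}}\n')                  // flags line completes
```

The hook's read loop could call `push()` with each decoded chunk and `flush()` after `done`, which also covers a server that omits the final newline.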
Combining DAG batches with streaming
The DAG and the stream are not alternatives; they compose. Each node emits its result to the stream the moment it resolves, and each dependent node starts as soon as its own dependencies have landed. Grouped by dependency depth:
- Batch 1 (`session`) resolves first. In the handler below, `auth()` does this before the stream opens, so it gates everything without writing a line.
- Batch 2 (`profile`, `preferences`, `flags`, `projects`, all parallel; `notifications` would join here as a fifth node) → one line each, shipped the instant that call resolves.
- Batch 3 (`memberCounts`, `activity`, `recs`, all parallel) → three more lines, again as each resolves.
import { createDagResolver } from '@/lib/bff/dag'
import { auth } from '@/auth'
import type { NextRequest } from 'next/server'

export async function GET(req: NextRequest) {
  const session = await auth()
  if (!session) return new Response('unauthorized', { status: 401 })
  const userId = session.user.id
  const encoder = new TextEncoder()
  const stream = new ReadableStream({
    async start(controller) {
      const send = (key: string, data: unknown) =>
        controller.enqueue(encoder.encode(JSON.stringify({ key, data }) + '\n'))
      const fail = (key: string, error: unknown) =>
        controller.enqueue(encoder.encode(JSON.stringify({ key, error: String(error) }) + '\n'))

      // Wrap each node's fetch so it streams the moment it resolves.
      // The wrapper forwards `resolved` so dependent nodes can read upstream results.
      const streaming = (key: string, fn: (resolved: Record<string, unknown>) => Promise<unknown>) =>
        async (resolved: Record<string, unknown>) => {
          try {
            const data = await fn(resolved)
            send(key, data)
            return data
          } catch (e) {
            fail(key, e)
            return null // dependents receive null for this key
          }
        }

      const dag = createDagResolver([
        { id: 'profile', deps: [], fetch: streaming('profile', () => api.getProfile(userId)) },
        { id: 'preferences', deps: [], fetch: streaming('preferences', () => api.getPreferences(userId)) },
        { id: 'flags', deps: [], fetch: streaming('flags', () => api.getFlags(userId)) },
        { id: 'projects', deps: [], fetch: streaming('projects', () => api.getProjects(userId)) },
        { id: 'activity', deps: ['projects'], fetch: streaming('activity', r => api.getActivity((r.projects as Project[]).map(p => p.id))) },
        { id: 'memberCounts', deps: ['projects'], fetch: streaming('memberCounts', r => api.getMemberCounts((r.projects as Project[]).map(p => p.id))) },
        { id: 'recs', deps: ['profile', 'preferences', 'flags'],
          fetch: streaming('recs', r => api.getRecommendations({
            profile: r.profile as Profile,
            preferences: r.preferences as Prefs,
            flags: r.flags as Flags,
          })) },
      ])
      await dag.run()
      controller.close()
    },
  })
  return new Response(stream, {
    headers: { 'Content-Type': 'application/x-ndjson', 'Cache-Control': 'no-store' },
  })
}
Each node's fetch wraps the real API call in a `streaming()` helper that writes a line to the response the moment the call resolves. The DAG scheduler ensures `activity` and `memberCounts` don't start until `projects` lands, and `recs` doesn't start until `profile`, `preferences`, and `flags` all land, but nothing waits for anything it doesn't actually depend on. The client sees lines arrive as each API call finishes, not as batches.
This is the architecture I'd ship for any dashboard with more than five calls and meaningful tail latency. The wins compound: parallelization within batches, streaming across them, and on the client each section has independent loading state instead of one global spinner.
DON'T STREAM IF YOU NEED TRANSACTIONAL CONSISTENCY
Streaming means partial views. If three calls succeed and two fail, the user sees a half-built page. For pages where 'all of it' is the contract — checkout summary, legal disclosures, anything where one wrong field is worse than a longer load — return a single response and fail the whole request on any error.
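For those all-or-nothing pages, the aggregation is simpler than the streaming version. A minimal sketch, assuming each piece of data is exposed as a zero-argument async function; `Fetchers` and `fetchAllOrFail` are illustrative names.

```typescript
type Fetchers = Record<string, () => Promise<unknown>>

// All-or-nothing aggregation: resolves with every key, or rejects on the first failure.
async function fetchAllOrFail(fetchers: Fetchers): Promise<Record<string, unknown>> {
  const keys = Object.keys(fetchers)
  // Promise.all rejects as soon as any call rejects, so no partial result escapes.
  const values = await Promise.all(keys.map(key => fetchers[key]()))
  const result: Record<string, unknown> = {}
  keys.forEach((key, i) => { result[key] = values[i] })
  return result
}
```

In a route handler you would catch the rejection and return a single error response, so the client never has to decide what to do with half a checkout summary.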
NDJSON vs Server-Sent Events vs HTTP/2 push
All three carry the same shape (server pushes records over a long-lived response). NDJSON wins for BFF aggregation:
| Transport | Good for | Why I prefer NDJSON for BFF |
|---|---|---|
| NDJSON (chunked HTTP) | Per-page aggregation, same-origin, finite stream | Plain fetch consumes it. No reconnection logic. No special headers. Works in every runtime including edge functions and React Native. |
| Server-Sent Events (text/event-stream) | Server-pushed updates that continue indefinitely (notifications, log tails) | EventSource auto-reconnects, which you don't want for a finite page-load aggregation. Different mental model. |
| HTTP/2 push / WebTransport | Specialized push from server to client | More moving parts than the BFF case warrants. Most useful for real-time apps, not aggregation. |
If the BFF response has a defined end (this page's data, no more), use NDJSON. If it's an open subscription that survives across pages, use SSE.
Picking RSC strategy by traffic
Once you have your APIs and dependencies modeled, the rendering strategy decision becomes mechanical. Two axes: traffic (how many requests/day) and personalization (does response depend on the user).
Top-left (high traffic, low personalization) is where Next.js's static and ISR shine. Generate at build time or first request, serve from CDN. Per-request rendering cost is effectively zero because the CDN serves prebuilt HTML.
Top-right (high traffic, personalized) is where streaming RSC + Suspense earns its keep. Stream the static shell first (nav, layout), then stream personalized chunks as they resolve. The user sees content immediately, even if the recommendations API is slow.
Bottom-left (low traffic, low personalization) — RSC with a broad cache TTL is fine. You're not rendering millions of these per day; the cache hit rate matters less. Pay the cost of fresh data.
Bottom-right (low traffic, high personalization) — server actions or client-side fetches. Settings pages, admin tools. No CDN benefit because every render is unique. No need for streaming because traffic is low. Server actions also let you co-locate the mutation handler with the page.
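The four quadrants reduce to a small lookup. The sketch below is one way to encode it; the `Strategy` labels and the 100,000 requests/day cut-off are assumptions for illustration, so pick a threshold from your own analytics.

```typescript
type Strategy =
  | 'static-isr'               // high traffic, low personalization
  | 'streaming-rsc'            // high traffic, personalized
  | 'cached-rsc'               // low traffic, low personalization
  | 'server-actions-or-client' // low traffic, personalized

interface PageProfile {
  requestsPerDay: number
  personalized: boolean
}

// Assumed boundary between "low" and "high" traffic; tune it per app.
const HIGH_TRAFFIC_THRESHOLD = 100000

function pickStrategy(page: PageProfile): Strategy {
  const highTraffic = page.requestsPerDay >= HIGH_TRAFFIC_THRESHOLD
  if (highTraffic) return page.personalized ? 'streaming-rsc' : 'static-isr'
  return page.personalized ? 'server-actions-or-client' : 'cached-rsc'
}

const home = pickStrategy({ requestsPerDay: 2000000, personalized: false })
const settings = pickStrategy({ requestsPerDay: 300, personalized: true })
```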
Cache configuration cheat sheet
| Page type | Strategy | `fetch()` config |
|---|---|---|
| Marketing / home | Static + revalidate hourly | next: { revalidate: 3600 } |
| Blog post | Static at build, refetch on tag bust | next: { tags: ['post', slug] } |
| Personalized dashboard | RSC, no shared cache | cache: 'no-store' |
| Per-user notifications | Streaming RSC inside Suspense | cache: 'no-store' + <Suspense> |
| Admin tool | Server action, no caching | cache: 'no-store' + form action |
Workflow for adopting this on an existing app
You don't need to redesign the system. The migration is incremental. The diagram below shows the path I've used twice now.
Three concrete steps, in order:
- Run the audit. Pull the last 30 days of analytics. Sort pages by traffic. Look at the top 20 — those probably account for 80% of requests. The bottom 80% need the least optimization.
- Pick one page to migrate first. Not your hardest one. Pick a high-traffic, low-personalization page — your home or your blog index. The wins will be obvious and immediate.
- Build the DAG, then the cache strategy. Don't write any aggregation code until you've drawn the dependency graph. The graph tells you which calls can run in parallel; the traffic tier tells you what to cache.
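The audit step is mostly a sort and a running total. A minimal sketch, assuming you can export per-page request counts from your analytics tool; `PageTraffic` and `pagesCoveringShare` are illustrative names and the numbers are made up.

```typescript
interface PageTraffic { path: string; requests: number }

// Return the busiest pages that together cover `share` of all requests.
function pagesCoveringShare(pages: PageTraffic[], share = 0.8): PageTraffic[] {
  const sorted = [...pages].sort((a, b) => b.requests - a.requests)
  const total = sorted.reduce((sum, p) => sum + p.requests, 0)
  const picked: PageTraffic[] = []
  let covered = 0
  for (const page of sorted) {
    if (total === 0 || covered / total >= share) break
    picked.push(page)
    covered += page.requests
  }
  return picked
}

// Made-up 30-day counts for illustration.
const audit = pagesCoveringShare([
  { path: '/settings', requests: 400 },
  { path: '/', requests: 5000 },
  { path: '/blog', requests: 1500 },
  { path: '/dashboard', requests: 3000 },
  { path: '/admin', requests: 100 },
])
// audit holds '/' and '/dashboard': together they cover 80% of the 10,000 requests.
```

Those are the pages to draw dependency graphs for first; everything below the cut can wait.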
When to extract a separate BFF service
Three signals that the in-Next BFF has hit its ceiling:
- A second client (mobile, third-party, partner) needs the same shaped data.
- Aggregation logic is being duplicated across route handlers and RSC files.
- Your Next.js cold-start cost is meaningfully impacted by aggregation dependencies — heavy SDKs, internal-only client libraries, things that don't belong in a render layer.
If none of those apply, the in-Next BFF is fine. Don't pre-emptively split.
Pitfalls to avoid
Over-caching personalized data
The most painful bug. You add next: { revalidate: 3600 } to a fetch that includes user preferences. The platform caches the response. Next user gets the previous user's data. This has shipped to production at multiple companies, including some you've heard of.
CACHEABLE DATA IS DATA THAT'S SAFE TO SHARE
If a fetch's response is keyed on a user, session, or any other request-scoped value, set cache: 'no-store'. Don't trust your future self to remember which fetches included tokens.
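One way to avoid relying on memory is to route every fetch through a helper that makes the scoping explicit. The guard below is a hypothetical sketch: `cacheInitFor` is an illustrative name, and `CacheInit` only models the subset of Next.js `fetch` options discussed here (`cache`, `next.revalidate`, `next.tags`).

```typescript
interface CacheInit {
  cache?: 'no-store' | 'force-cache'
  next?: { revalidate?: number; tags?: string[] }
}

// Request-scoped fetches always come back as 'no-store'.
// Asking for shared caching on user-scoped data fails loudly instead of leaking.
function cacheInitFor(requestScoped: boolean, init: CacheInit = {}): CacheInit {
  if (!requestScoped) return init
  const wantsSharedCache =
    init.cache === 'force-cache' || init.next?.revalidate !== undefined
  if (wantsSharedCache) {
    throw new Error('Refusing to share-cache a request-scoped fetch')
  }
  return { ...init, cache: 'no-store' }
}

const perUser = cacheInitFor(true)                               // cache: 'no-store'
const shared = cacheInitFor(false, { next: { revalidate: 3600 } }) // passed through unchanged
```

The result is meant to be spread into the options of a `fetch()` call. The guard is deliberately strict: it rejects any `revalidate` value on request-scoped data, including ones that would be harmless.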
Under-caching static data
The opposite mistake. Your home page does five fetches with default settings. The default in Next 15+ is cache: 'no-store'. Now every page request triggers five upstream fetches. Add revalidate or force-cache explicitly, or wrap the helper in unstable_cache().
DAG cycles
Easy to write by accident: profile depends on preferences, preferences include a default profile key, someone wires it backward. The resolver should detect cycles and throw at build/test time, not in production. Add a unit test that resolves your real dashboard graph against stub fetches — it'll catch cycles before deploy.
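A cycle check is cheap to run in a unit test against the graph definition alone, with no fetches involved. The sketch below is a plain depth-first search over `{ id, deps }` pairs; `GraphNode` and `findCycle` are illustrative names.

```typescript
interface GraphNode { id: string; deps: string[] }

// Depth-first search that returns the first cycle found as a path, or null.
function findCycle(nodes: GraphNode[]): string[] | null {
  const deps = new Map<string, string[]>()
  for (const n of nodes) deps.set(n.id, n.deps)
  const state = new Map<string, 'visiting' | 'done'>()
  const path: string[] = []

  const visit = (id: string): string[] | null => {
    if (state.get(id) === 'done') return null
    // Reaching a node that is still on the current path closes a loop.
    if (state.get(id) === 'visiting') return [...path.slice(path.indexOf(id)), id]
    state.set(id, 'visiting')
    path.push(id)
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep)
      if (cycle) return cycle
    }
    path.pop()
    state.set(id, 'done')
    return null
  }

  for (const n of nodes) {
    const cycle = visit(n.id)
    if (cycle) return cycle
  }
  return null
}

// The miswired example from above: profile and preferences depend on each other.
const miswired = findCycle([
  { id: 'profile', deps: ['preferences'] },
  { id: 'preferences', deps: ['profile'] },
])
// miswired is ['profile', 'preferences', 'profile']
```

Returning the path instead of a boolean makes the test failure message point straight at the bad edge.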
Cache invalidation cascades
A user updates their profile. Now their preferences page (which embeds their name), their dashboard, their settings, and any rendered notifications referencing them are all stale. With revalidateTag() you can invalidate by tag — but the cascade should be modeled the same way the DAG is. If profile invalidates, anything that fetched profile data should invalidate too.
export const tags = {
  profile: (userId: string) => [`profile:${userId}`],
  // Downstream caches carry all upstream tags.
  // Busting 'profile:userId' also busts preferences, dashboard, etc.
  preferences: (userId: string) => [
    `preferences:${userId}`,
    ...tags.profile(userId), // inherit profile tags so profile invalidation cascades
  ],
  dashboard: (userId: string) => [
    `dashboard:${userId}`,
    ...tags.profile(userId),
    ...tags.preferences(userId),
  ],
}

// On mutation:
import { revalidateTag } from 'next/cache'

export async function updateProfile(
  userId: string,
  updates: ProfileUpdate,
) {
  await api.updateProfile(userId, updates)
  for (const tag of tags.profile(userId)) {
    revalidateTag(tag)
  }
}
The tag dependencies mirror the DAG. When something at the root mutates, every downstream cache entry busts.
Recap
Three principles drove every decision above:
- Different pages need different rendering strategies. A home page and a dashboard are not the same problem. Pick by traffic and personalization.
- Model API dependencies as a DAG before writing aggregation code. The graph reveals which calls can run in parallel and which can't.
- Co-locate the BFF with Next.js until a second client shows up. Splitting is real engineering work; do it when you have a reason.
Most performance complaints I've heard about Next.js trace back to a missing one of these three. The framework is rarely the bottleneck. The architecture around it is.