Migrating to AI SDK v5: A Story of Tool Streaming, Caching, and Type Safety
We migrated BrainGrid's core AI infrastructure from AI SDK v4 to v5. Here's how it went.
Every developer knows that SDK migrations are like home renovations: they always take longer than expected, uncover hidden problems, and make you question your life choices halfway through. But sometimes, the end result makes it all worthwhile.
This month, we migrated BrainGrid's core AI infrastructure from AI SDK v4.3.16 to v5.0.0-beta.25. We knew it wouldn't be a walk in the park. SDK migrations never are, especially when you're dealing with beta versions. But three features made this migration impossible to ignore:
- Tool streaming - Our users were tired of staring at blank screens while AI agents worked
- Provider options for tool caching - Faster responses and lower costs? Yes please
- Better TypeScript support - Even if it meant touching every file
For context, BrainGrid is an AI-powered platform that helps developers turn messy ideas into crystal-clear specs that AI coding assistants can actually implement. We analyze your codebase, ask the right clarifying questions, and break requirements down into atomic, verifiable, AI-ready tasks. Each task becomes a precise prompt with full context, so your AI IDE (Cursor, Claude Code, etc.) gets it right the first time. Behind the scenes, we're orchestrating multiple specialized agents with dozens of tools—which is why this SDK migration touched everything.
Anyhow, here's how the migration went.
Why We Knew This Would Be Hard (But Did It Anyway)
Let's be honest: nobody migrates to a beta SDK for fun. We had 14 tool definitions, several streaming event handlers, and a complex agent system that our users depend on every day. Breaking any of it wasn't an option.
But our users had a legitimate complaint. When the BrainGrid agent started thinking or composing a requirement, they'd see... nothing. Just a blinking cursor. Was it thinking? Had it crashed? Was it writing War and Peace? Nobody knew until the tool input was finally done.
The agent experience while waiting for a tool call to finish
Meanwhile, there was money on the table. Every tool-enabled request meant sending the full tool definitions to the API. With complex tools, that's thousands of tokens per request. Anthropic's new cache control feature promised to fix this, but it required v5's provider options support.
So we made the call: temporary migration pain for permanent user gains.
The Migration Journey
1. The Great API Rename
The first surprise came immediately. Every single tool definition needed surgery:
```typescript
// Before (v4)
const readWebPageTool = tool({
  description: 'Reads and analyzes web content',
  parameters: z.object({
    url: z.string().url(),
    extractImages: z.boolean().optional(),
  }),
  execute: async args => {
    // Tool logic here
  },
});

// After (v5)
const readWebPageTool = tool({
  description: 'Reads and analyzes web content',
  inputSchema: z.object({ // 👈 renamed from 'parameters'
    url: z.string().url(),
    extractImages: z.boolean().optional(),
  }),
  execute: async args => {
    // Tool logic here
  },
});
```
Not catastrophic, but we had 14 tools across our agent system. That's 14 careful edits, 14 places to potentially break something. In reality, though, this was the easy part.
Then came the tool calls themselves:
```typescript
// Before
if (chunk.type === 'tool-call') {
  const toolArgs = chunk.args; // 👈 'args'
  // Process tool call
}

// After
if (chunk.type === 'tool-call') {
  const toolArgs = chunk.input; // 👈 now 'input'
  // Process tool call
}
```
And don't forget about token limits:
```typescript
// Before
streamText({
  model: anthropic('claude-4-sonnet'),
  maxTokens: 4096, // 👈 'maxTokens'
  // ...
});

// After
streamText({
  model: anthropic('claude-4-sonnet'),
  maxOutputTokens: 4096, // 👈 'maxOutputTokens'
  // ...
});
```
Each change was small, but they added up. Fast.
2. Type System Overhaul
This is where things got interesting. The v5 SDK introduced stricter, more accurate types. Great for catching bugs, painful for migration.
Our entire conversation system was built on the old message types:
```typescript
// Before (v4)
import { Message as AIMessage } from 'ai';

interface Conversation {
  messages: AIMessage[];
}

// After (v5)
import { ModelMessage } from 'ai';

interface Conversation {
  messages: ModelMessage[];
}
```
But that was just the beginning. The new `ModelMessage` type revealed a fundamental assumption in our code: we assumed message content was always a string.
```typescript
// Our token calculator before migration
function calculateTokens(message: AIMessage): number {
  const content = message.content as string; // 🚨 Danger!
  return tokenizer.encode(content).length;
}
```
In v5, message content can be:
- A simple string: `"Hello world"`
- An array of parts: `[{ type: 'text', text: 'Hello' }, { type: 'image', image: '...' }]`
- Complex content objects
Our token calculator would crash on anything but strings. We built a helper to handle all cases:
```typescript
export function extractTextContent(content: unknown): string {
  if (typeof content === 'string') {
    return content;
  }
  if (Array.isArray(content)) {
    return content
      .filter(part => part.type === 'text')
      .map(part => part.text)
      .join(' ');
  }
  if (content && typeof content === 'object' && 'text' in content) {
    // Guard the property type so TypeScript lets us return it as a string
    return typeof content.text === 'string' ? content.text : '';
  }
  return '';
}
```
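With that in place, the token calculator from earlier becomes safe for every content shape. Here's a minimal sketch of the fixed version, assuming the same `tokenizer` as in the v4 snippet above:

```typescript
import { ModelMessage } from 'ai';

// The fixed calculator: every content shape goes through extractTextContent
// instead of an unsafe cast. `tokenizer` is the same encoder assumed in the
// earlier v4 snippet.
function calculateTokens(message: ModelMessage): number {
  const text = extractTextContent(message.content);
  return tokenizer.encode(text).length;
}
```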
But the stricter typing went beyond content handling. The v5 SDK also made `streamText` much stricter about message interfaces. Before, we could accidentally pass malformed message objects and get cryptic runtime errors, including one memorable bug where we were accidentally sending the tool name instead of the expected message content. The old SDK would accept it and produce bizarre, hard-to-debug behavior.
Now, TypeScript catches these interface mismatches at compile time:
```typescript
// This would have failed silently in v4, causing weird runtime bugs
const messages: ModelMessage[] = [
  {
    role: 'assistant',
    content: toolName, // 🚨 TypeScript now catches this mistake
  },
];

// v5 forces us to be explicit and correct
const messages: ModelMessage[] = [
  {
    role: 'assistant',
    content: message.content, // ✅ Proper message content
  },
];
```
The silver lining? This exposed multiple real bugs. We'd been undercounting tokens for complex messages for months, and had subtle message formatting issues that occasionally caused confusing AI responses.
3. Streaming Protocol Redesign
Remember those users staring at blank screens? This is where we fixed that. But first, we had to rewrite how we handled streaming.
Every chunk type changed:
```typescript
// Before (v4)
for await (const chunk of stream) {
  if (chunk.type === 'text-delta') {
    content += chunk.textDelta;
  }
}

// After (v5)
for await (const chunk of stream) {
  if (chunk.type === 'text') {
    content += chunk.text;
  }
}
```
But the real win was tool streaming. Now we could show tool cards the instant an agent started using a tool:
```typescript
// When a tool-call chunk arrives
if (chunk.type === 'tool-call') {
  setTemporaryStreamMessage(prev => [
    ...prev,
    {
      type: 'tool_call',
      tool_call: {
        id: chunk.toolCallId,
        name: chunk.toolName,
        arguments: chunk.input,
        loading: true, // Shows spinner immediately
      },
    },
  ]);
}
```
Users now see a card appear instantly when the agent starts using a tool. No more mystery. No more "is it frozen?" support tickets.
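And when the tool finishes, the matching chunk lets us flip the spinner off. A sketch of the companion handler, assuming v5's `tool-result` chunks carry the renamed `output` field (adjust to your own state shape):

```typescript
// When the matching tool-result chunk arrives, find the card by its
// toolCallId and switch the spinner off.
if (chunk.type === 'tool-result') {
  setTemporaryStreamMessage(prev =>
    prev.map(msg =>
      msg.type === 'tool_call' && msg.tool_call.id === chunk.toolCallId
        ? {
            ...msg,
            tool_call: { ...msg.tool_call, result: chunk.output, loading: false },
          }
        : msg,
    ),
  );
}
```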
4. Control Flow Changes
Here's where we almost shot ourselves in the foot. The old `maxSteps` parameter got a makeover:
```typescript
// Before (v4)
const result = await generateText({
  model: anthropic('claude-4-sonnet'),
  maxSteps: 25,
  // ...
});

// After (v5)
const result = await generateText({
  model: anthropic('claude-4-sonnet'),
  stopWhen: stepCountIs(25),
  // ...
});
```
Looks simple enough. But this change hid a critical shift in behavior. It turns out `stepCountIs(n)` doesn't set a maximum number of steps; it requires the agent to run for exactly `n` steps. What was `maxSteps` now behaved like `minSteps`.
This turned out to be a blessing in disguise. By forcing us to be explicit about the step count, we fixed a subtle issue where agents could occasionally run longer than needed. After adjusting our default step counts, the agents' behavior became smoother and more predictable, which was a great improvement for our complex workflows.
```typescript
// In our BaseAgent class
maxSteps = 5, // 👈 Used to be 25
```
The new `stopWhen` API is actually more powerful. We can now stop on specific conditions:
```typescript
stopWhen: [
  stepCountIs(maxSteps),
  hasToolCall('generate_clarifying_questions'), // Stop when clarification needed
],
```
This feature pleasantly surprised us: we'd previously had to do meticulous prompt engineering to ensure the agent stopped immediately after invoking the `generate_clarifying_questions` tool.
5. Enabling Tool Definition Caching
This was the feature that made our customers happy. With v5's provider options, we could finally use Anthropic's cache control on tool definitions:
```typescript
const readWebPageTool = tool({
  description: 'Reads and analyzes web content',
  inputSchema: z.object({
    url: z.string().url(),
    extractImages: z.boolean().optional(),
  }),
  providerOptions: {
    anthropic: {
      cacheControl: { type: 'ephemeral' }, // 👈 Cache this tool definition
    },
  },
  execute: async args => {
    // Tool logic
  },
});
```
For frequently-used tools, this means the tool definition is cached on Anthropic's servers. Instead of sending thousands of tokens for complex tool definitions on every request, we send them once and they are cached automatically.
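You can also verify the cache is actually being hit. Here's a hedged sketch; we're assuming the Anthropic provider surfaces cache metrics under `providerMetadata.anthropic` as its docs describe, and the exact field names may shift between beta releases:

```typescript
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Illustrative only: field names under providerMetadata.anthropic are
// assumptions and may differ between beta releases.
const result = await generateText({
  model: anthropic('claude-4-sonnet'),
  tools: { read_web_page: readWebPageTool },
  prompt: 'Summarize the key points of https://example.com',
});

// First call should report cache writes; subsequent calls, cache reads.
console.log(result.providerMetadata?.anthropic?.cacheCreationInputTokens);
console.log(result.providerMetadata?.anthropic?.cacheReadInputTokens);
```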
The Results
First and Foremost: Everything Still Works
Let's celebrate the most important achievement: after all these changes, BrainGrid works exactly as it did before. Every agent, every tool, every workflow operates seamlessly. No regressions. No "we'll fix that in v2" compromises.
This might sound like table stakes, but if you've done major migrations, you know it's not. Maintaining 100% compatibility while overhauling the foundation is like changing a car's engine while driving.
But Now It's Better
Here's what our users notice:
- Instant feedback: Tool cards appear the moment an agent starts working. No more guessing.
- Faster responses: Cached tool definitions mean less data to send, faster processing.
- More reliable: The stricter types caught edge cases we didn't know existed.
The agent now shows the tool card right away, so the user knows what's going on.
When multiple tool calls happen, the agent still shows the cards right away
The numbers tell the story:
- Support tickets about "frozen" UI: 0 (down from 1-3 per week)
- Average tool execution time: 17% faster
- API costs from tool definitions: reduced by 7%
- Type-related bugs caught: 7 (including that token calculator)
Lessons for Fellow Engineers
After a couple days of migration work, here's what we learned:
1. Pin Your Beta Versions
"ai": "5.0.0-beta.25" // Not "^5.0.0-beta.25"
Beta versions can have breaking changes between releases. Pin the exact version and upgrade deliberately.
2. Read the Source, Not Just the Docs
The migration guide covered the basics, but real apps have edge cases. When in doubt, read the SDK source code. It's surprisingly readable and answered questions the docs didn't.
3. Test with Production-Like Scenarios
Our unit tests passed. Our integration tests passed. But they all used simple, single-tool scenarios. We could easily have missed the `maxSteps` issue. When AI is in the loop, always run manual tests before shipping to production.
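For us, that now means a smoke test that drives a real multi-step run and asserts on the control flow. A rough sketch (the model name and the `agentTools` import are placeholders for our real setup):

```typescript
import { generateText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { agentTools } from './tools'; // placeholder for our real tool map

// Smoke test: drive a real multi-step, multi-tool run and check that the
// stop condition actually held.
const result = await generateText({
  model: anthropic('claude-4-sonnet'),
  tools: agentTools,
  stopWhen: stepCountIs(5),
  prompt: 'Research and break down a small feature request',
});

// v5 exposes the executed steps, so we can assert on real control flow.
if (result.steps.length > 5) {
  throw new Error(`Expected at most 5 steps, got ${result.steps.length}`);
}
```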
4. Migration Guides Show the Happy Path
Real migrations are messier. Budget time for:
- Edge cases the guide doesn't mention
- Updating related code that depends on the old behavior
- Testing scenarios you forgot existed
- Rolling back if something goes catastrophically wrong
5. Document Everything
We kept a migration log of every change, every surprise, every "wait, why does this work now?" moment. This blog post started as those notes. Your future self will thank you.
Was It Worth It?
Absolutely.
Our users get instant feedback when agents work. Our infrastructure costs dropped noticeably. Our code is more type-safe and maintainable.
Yes, it took a couple of days instead of an afternoon. Yes, we discovered bugs we didn't know existed. Yes, we questioned our sanity around the second day.
But that's engineering. We don't migrate SDKs because it's easy. We do it because our users deserve better, our infrastructure demands it, and sometimes the beta version has exactly what we need.
Just remember to pin your dependencies.
BrainGrid is the AI-powered planning platform that helps developers turn messy thoughts into AI-ready requirements and agent tasks. We migrated to AI SDK v5 so our agents could show their work in real time. See it in action.
Ready to stop babysitting your AI coding assistant?
Join the waitlist for BrainGrid and experience AI-powered planning that makes coding agents actually work.
Join the Waitlist