How to stream chat model responses
All chat models implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. invoke, batch, stream, streamEvents). This guide covers how to use these methods to stream output from chat models.
The default implementation does not support token-by-token streaming. Instead, it returns an AsyncGenerator that yields all model output in a single chunk. It exists to ensure that the model can be swapped in for any other model, since it supports the same standard interface.
The ability to stream the output token-by-token depends on whether the provider has implemented token-by-token streaming support.
You can see which integrations support token-by-token streaming here.
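To make the fallback behavior concrete, here is a minimal sketch of how a default stream could be built on top of invoke. This is not LangChain's actual source: MinimalChatModel, defaultStream, EchoModel, and collectChunks are hypothetical names used only for illustration, and the toy model makes no network calls.

```typescript
// Hypothetical stand-in for a chat model that only knows how to `invoke`.
interface MinimalChatModel {
  invoke(input: string): Promise<string>;
}

// Fallback "stream": wait for the full response, then yield it as one chunk.
async function* defaultStream(
  model: MinimalChatModel,
  input: string
): AsyncGenerator<string> {
  yield await model.invoke(input);
}

// Toy model used only to exercise the sketch; it makes no network calls.
class EchoModel implements MinimalChatModel {
  async invoke(input: string): Promise<string> {
    return `You said: ${input}`;
  }
}

// Drain a stream into an array so the number of chunks can be inspected.
async function collectChunks(
  stream: AsyncIterable<string>
): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return chunks;
}
```

Under these assumptions, draining defaultStream(new EchoModel(), "hi") produces exactly one chunk. The consuming for await loop is the same one you would write for a provider with real token-by-token streaming, which is why the models stay interchangeable.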
Streaming
Below, we use a --- to help visualize the delimiter between tokens.
Pick your chat model:
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
- Groq
- VertexAI
OpenAI
Install dependencies (pick one package manager):
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});
Anthropic
Install dependencies (pick one package manager):
npm i @langchain/anthropic
yarn add @langchain/anthropic
pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});
FireworksAI
Install dependencies (pick one package manager):
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const model = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});
MistralAI
Install dependencies (pick one package manager):
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});
Groq
Install dependencies (pick one package manager):
npm i @langchain/groq
yarn add @langchain/groq
pnpm add @langchain/groq
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";
const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  temperature: 0
});
VertexAI
Install dependencies (pick one package manager):
npm i @langchain/google-vertexai
yarn add @langchain/google-vertexai
pnpm add @langchain/google-vertexai
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";
const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});
const stream = await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
);
for await (const chunk of stream) {
  console.log(`${chunk.content}\n---`);
}
---
Here's
---
a one
---
-
---
verse song about goldfish on
---
the moon:
Verse
---
:
Swimming
---
through the stars
---
,
---
in
---
a cosmic
---
lag
---
oon
---
Little
---
golden
---
scales
---
,
---
reflecting the moon
---
No
---
gravity to
---
hold them,
---
they
---
float with
---
glee
Goldfish
---
astron
---
auts, on a lunar
---
sp
---
ree
---
Bub
---
bles rise
---
like
---
com
---
ets, in the
---
star
---
ry night
---
Their fins like
---
tiny
---
rockets, a
---
w
---
ondrous sight
Who
---
knew
---
these
---
small
---
creatures
---
,
---
could con
---
quer space?
---
Goldfish on the moon,
---
with
---
such
---
fis
---
hy grace
---
---
---
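In practice you often want the full response text as well as the individual chunks. The sketch below accumulates chunk content into one string. It is self-contained and does not call a provider: TextChunk, fakeStream, and collectText are hypothetical names, fakeStream stands in for model.stream(...), and the sketch assumes each chunk's content is a plain string, as in the text output above.

```typescript
// Hypothetical chunk shape: only the `content` field used in the loop above.
interface TextChunk {
  content: string;
}

// Stand-in for `model.stream(...)`: yields a fixed list of chunks, no network.
async function* fakeStream(parts: string[]): AsyncGenerator<TextChunk> {
  for (const content of parts) {
    yield { content };
  }
}

// Concatenate each chunk's content into the full response text.
async function collectText(
  stream: AsyncIterable<TextChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.content;
  }
  return text;
}
```

For example, collectText(fakeStream(["Here's", " a one", "-", "verse song"])) resolves to "Here's a one-verse song". With a real model, the same loop would apply to the stream returned by model.stream(...), as long as the chunk content is a string rather than a list of content blocks.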
Stream events
Chat models also support the standard streamEvents() method to stream more granular events from within chains.
This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser):
const eventStream = await model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v2",
  }
);
const events = [];
for await (const event of eventStream) {
  events.push(event);
}
events.slice(0, 3);
[
{
event: "on_chat_model_start",
data: { input: "Write me a 1 verse song about goldfish on the moon" },
name: "ChatAnthropic",
tags: [],
run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
metadata: {
ls_provider: "openai",
ls_model_name: "claude-3-5-sonnet-20240620",
ls_model_type: "chat",
ls_temperature: 1,
ls_max_tokens: 2048,
ls_stop: undefined
}
},
{
event: "on_chat_model_stream",
run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
name: "ChatAnthropic",
tags: [],
metadata: {
ls_provider: "openai",
ls_model_name: "claude-3-5-sonnet-20240620",
ls_model_type: "chat",
ls_temperature: 1,
ls_max_tokens: 2048,
ls_stop: undefined
},
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "",
additional_kwargs: [Object],
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "",
name: undefined,
additional_kwargs: {
id: "msg_01JaaH9ZUXg7bUnxzktypRak",
type: "message",
role: "assistant",
model: "claude-3-5-sonnet-20240620"
},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
usage_metadata: undefined
}
}
},
{
event: "on_chat_model_stream",
run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
name: "ChatAnthropic",
tags: [],
metadata: {
ls_provider: "openai",
ls_model_name: "claude-3-5-sonnet-20240620",
ls_model_type: "chat",
ls_temperature: 1,
ls_max_tokens: 2048,
ls_stop: undefined
},
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "Here's",
additional_kwargs: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Here's",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
usage_metadata: undefined
}
}
}
]
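Since the event list mixes start events with stream events, a common next step is to keep only the on_chat_model_stream events and read the text from each chunk. The sketch below shows one way to do that. StreamEventLike is a hypothetical, simplified type based only on the fields printed above, not the library's own event type, and sampleEvents is hand-written data mirroring the first three events.

```typescript
// Hypothetical, simplified event shape based on the fields printed above.
interface StreamEventLike {
  event: string;
  name: string;
  data: { chunk?: { content: string }; input?: string };
}

// Keep only the text carried by `on_chat_model_stream` events.
function extractStreamedText(events: StreamEventLike[]): string[] {
  const parts: string[] = [];
  for (const e of events) {
    if (e.event === "on_chat_model_stream" && e.data.chunk) {
      parts.push(e.data.chunk.content);
    }
  }
  return parts;
}

// Hand-written sample mirroring the first three events in the output above.
const sampleEvents: StreamEventLike[] = [
  {
    event: "on_chat_model_start",
    name: "ChatAnthropic",
    data: { input: "Write me a 1 verse song about goldfish on the moon" },
  },
  {
    event: "on_chat_model_stream",
    name: "ChatAnthropic",
    data: { chunk: { content: "" } },
  },
  {
    event: "on_chat_model_stream",
    name: "ChatAnthropic",
    data: { chunk: { content: "Here's" } },
  },
];
```

Calling extractStreamedText(sampleEvents) skips the start event and returns the content of the two stream events, the first of which is an empty string. Filtering on the event name is what lets the same loop work when the stream also contains events from other steps in a larger chain.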
Next steps
You've now seen a few ways you can stream chat model responses.
Next, check out this guide for more on streaming with other LangChain modules.