LLM Providers

Reactive Agents supports multiple LLM providers through a unified LLMService interface. Switch providers with a single line — your agent code stays the same.
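
To illustrate what "your agent code stays the same" means, here is a hypothetical sketch of the kind of interface an `LLMService` exposes, based on the `complete` and `embed` calls shown later on this page. The exact type names in reactive-agents may differ; the point is that agent code depends only on the interface, never on a concrete provider:

```typescript
interface LLMMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface LLMResponse {
  content: string;
  toolCalls?: { name: string; input: unknown }[];
}

// Provider-agnostic surface: every provider implements the same methods.
interface LLMService {
  complete(req: { messages: LLMMessage[] }): Promise<LLMResponse>;
  embed(texts: string[]): Promise<number[][]>;
}

// A trivial in-memory implementation, showing that code written against
// LLMService works regardless of which provider backs it.
class EchoLLM implements LLMService {
  async complete(req: { messages: LLMMessage[] }): Promise<LLMResponse> {
    const last = req.messages[req.messages.length - 1];
    return { content: `echo: ${last.content}` };
  }
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map(() => [0, 0, 0]);
  }
}
```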

| Provider | Models | Tool Calling | Streaming | Embeddings | Prompt Caching |
| --- | --- | --- | --- | --- | --- |
| Anthropic | Claude 3.5 Haiku, Claude Sonnet 4, Claude Opus 4 | Yes | Yes | No (use OpenAI) | Yes |
| OpenAI | GPT-4o, GPT-4o-mini | Yes | Yes | Yes | No |
| Google Gemini | Gemini 2.0 Flash, Gemini 2.5 Pro | Yes | Yes | No | No |
| Ollama | Any locally hosted model | Yes | Yes | Yes | No |
| LiteLLM | 100+ models via LiteLLM proxy | Yes | Yes | No | No |
| Test | Mock provider for testing | No | No | No | No |

Set your API key in .env and specify the provider:

```typescript
import { ReactiveAgents } from "reactive-agents";

// Anthropic
const anthropicAgent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withModel("claude-sonnet-4-20250514")
  .build();

// OpenAI
const openaiAgent = await ReactiveAgents.create()
  .withProvider("openai")
  .withModel("gpt-4o")
  .build();

// Google Gemini
const geminiAgent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.0-flash")
  .build();

// Ollama (local)
const ollamaAgent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("llama3")
  .build();

// LiteLLM proxy (100+ models)
const litellmAgent = await ReactiveAgents.create()
  .withProvider("litellm")
  .withModel("gpt-4o") // any model supported by your LiteLLM proxy
  .build();
```
```sh
# Set the key for your provider
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
OLLAMA_ENDPOINT=http://localhost:11434   # defaults to this
LITELLM_BASE_URL=http://localhost:4000   # LiteLLM proxy endpoint

# Tools (optional)
TAVILY_API_KEY=tvly-...                  # enables built-in web search

# Optional tuning
LLM_DEFAULT_MODEL=claude-sonnet-4-20250514
LLM_DEFAULT_TEMPERATURE=0.7
LLM_MAX_RETRIES=3
LLM_TIMEOUT_MS=30000
```

Reactive Agents ships pre-configured model presets with cost and capability data:

| Preset | Provider | Cost/1M Input | Context Window | Quality |
| --- | --- | --- | --- | --- |
| claude-haiku | Anthropic | $1.00 | 200K | 0.60 |
| claude-sonnet | Anthropic | $3.00 | 200K | 0.85 |
| claude-opus | Anthropic | $15.00 | 1M | 1.00 |
| gpt-4o-mini | OpenAI | $0.15 | 128K | 0.55 |
| gpt-4o | OpenAI | $2.50 | 128K | 0.80 |
| gemini-2.0-flash | Gemini | $0.10 | 1M | 0.75 |
| gemini-2.5-pro | Gemini | $1.25 | 1M | 0.95 |
```typescript
import { ModelPresets } from "@reactive-agents/llm-provider";

const config = ModelPresets["claude-sonnet"];
// { provider: "anthropic", model: "claude-sonnet-4-20250514", costPer1MInput: 3.0, ... }
```
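
The cost figures make presets useful for quick budget estimates. A minimal sketch, assuming only the `costPer1MInput` field shown above (the `ModelPreset` shape and helper name here are illustrative, not part of the package API):

```typescript
// Illustrative preset shape, mirroring the fields shown in the table above.
interface ModelPresetLike {
  provider: string;
  model: string;
  costPer1MInput: number; // USD per 1M input tokens
}

// Estimate the input-side cost of a request in USD.
function estimateInputCostUSD(preset: ModelPresetLike, inputTokens: number): number {
  return (preset.costPer1MInput * inputTokens) / 1_000_000;
}

const claudeSonnet: ModelPresetLike = {
  provider: "anthropic",
  model: "claude-sonnet-4-20250514",
  costPer1MInput: 3.0,
};

// 50K input tokens at $3.00/1M tokens = $0.15
estimateInputCostUSD(claudeSonnet, 50_000);
```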

When tools are enabled, the LLM can request tool calls. Each provider translates tool definitions to its native format automatically:

- Anthropic: Uses the tools parameter with Anthropic’s tool use format
- OpenAI: Uses function calling via the tools array
- Gemini: Uses functionDeclarations in the tools array
- Ollama: Uses the ollama npm SDK with OpenAI-compatible tool format
- LiteLLM: OpenAI-compatible tools array forwarded to the proxy, which handles provider-specific translation
```typescript
const response = await llm.complete({
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log(`Tool: ${call.name}, Input: ${JSON.stringify(call.input)}`);
  }
}
```
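
In practice you do more than log the calls: you execute each one and feed the results back to the model. A hypothetical dispatch loop, assuming a local handler registry (the `handlers` map and result shape are illustrative, not part of the reactive-agents API):

```typescript
type ToolHandler = (input: any) => Promise<string>;

// Illustrative registry mapping tool names to local implementations.
const handlers: Record<string, ToolHandler> = {
  get_weather: async ({ city }) => `Sunny in ${city}`,
};

// Run every tool call the model requested and collect the results.
async function runToolCalls(
  toolCalls: { name: string; input: any }[],
): Promise<{ name: string; result: string }[]> {
  const results: { name: string; result: string }[] = [];
  for (const call of toolCalls) {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    results.push({ name: call.name, result: await handler(call.input) });
  }
  return results;
}
```

The results would then be appended to the conversation as tool-result messages before calling the model again; each provider's wire format for that step differs, which is exactly what the translation layer above hides.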

Anthropic supports prompt caching for static content, reducing costs on repeated calls:

```typescript
import { makeCacheable } from "@reactive-agents/llm-provider";

const message = {
  role: "user" as const,
  content: [
    makeCacheable(largeSystemContext), // Cached across requests
    { type: "text" as const, text: dynamicUserInput },
  ],
};
```

Embeddings are routed through the configured embedding provider (OpenAI or Ollama), regardless of which chat provider you use:

```sh
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
```

```typescript
const vectors = await llm.embed(["text to embed", "another text"]);
// Returns: number[][] (one vector per input text)
```

Embeddings are used by Memory Tier 2 for KNN vector search.
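
Conceptually, that KNN search ranks stored vectors by similarity to a query embedding. A minimal sketch of the idea using cosine similarity (Memory Tier 2's actual index and scoring may differ):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k stored entries most similar to the query embedding.
function knn(query: number[], corpus: { id: string; vec: number[] }[], k: number) {
  return corpus
    .map((e) => ({ id: e.id, score: cosine(query, e.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```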

Parse LLM responses into typed objects with automatic retry on parse failure:

```typescript
import { Schema } from "effect";

const WeatherSchema = Schema.Struct({
  city: Schema.String,
  temperature: Schema.Number,
  conditions: Schema.String,
});

const weather = await llm.completeStructured({
  messages: [{ role: "user", content: "Weather in Tokyo" }],
  outputSchema: WeatherSchema,
  maxParseRetries: 2, // Retries with error feedback on parse failure
});
// weather is fully typed: { city: string, temperature: number, conditions: string }
```

All providers include built-in retry logic with exponential backoff for transient errors and rate limits:

- Rate limit (429): Retried with backoff, tracked as LLMRateLimitError
- Timeout: Configurable per-request, defaults to 30 seconds
- Retries: Configurable, defaults to 3 attempts
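
For intuition, an exponential-backoff schedule with 3 attempts might look like the sketch below. The base delay and growth factor here are illustrative assumptions; the docs above only specify the retry count and timeout defaults, not the backoff parameters:

```typescript
// Compute the wait time before each retry attempt: baseMs * factor^attempt.
function backoffDelays(maxRetries: number, baseMs = 1000, factor = 2): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * Math.pow(factor, attempt));
  }
  return delays;
}

backoffDelays(3); // [1000, 2000, 4000]
```

Real implementations usually add jitter so that many clients hitting the same rate limit do not retry in lockstep.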

Use the test provider for deterministic, offline testing:

```typescript
const agent = await ReactiveAgents.create()
  .withProvider("test")
  .withTestResponses({
    "capital of France": "Paris is the capital of France.",
    "quantum": "Quantum mechanics describes nature at the atomic scale.",
  })
  .build();

const result = await agent.run("What is the capital of France?");
// Always returns: "Paris is the capital of France."
```