# LLM Providers
Reactive Agents supports multiple LLM providers through a unified LLMService interface. Switch providers with a single line — your agent code stays the same.
## Supported Providers

| Provider | Models | Tool Calling | Streaming | Embeddings | Prompt Caching |
|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Haiku, Claude Sonnet 4, Claude Opus 4 | Yes | Yes | No (use OpenAI) | Yes |
| OpenAI | GPT-4o, GPT-4o-mini | Yes | Yes | Yes | No |
| Google Gemini | Gemini 2.0 Flash, Gemini 2.5 Pro | Yes | Yes | No | No |
| Ollama | Any locally hosted model | Yes | Yes | Yes | No |
| LiteLLM | 100+ models via LiteLLM proxy | Yes | Yes | No | No |
| Test | Mock provider for testing | No | No | No | No |
## Configuration

Set your API key in `.env` and specify the provider:
```ts
import { ReactiveAgents } from "reactive-agents";

// Anthropic
const agent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withModel("claude-sonnet-4-20250514")
  .build();

// OpenAI
const agent = await ReactiveAgents.create()
  .withProvider("openai")
  .withModel("gpt-4o")
  .build();

// Google Gemini
const agent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.0-flash")
  .build();

// Ollama (local)
const agent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("llama3")
  .build();

// LiteLLM proxy (100+ models)
const agent = await ReactiveAgents.create()
  .withProvider("litellm")
  .withModel("gpt-4o") // any model supported by your LiteLLM proxy
  .build();
```

## Environment Variables
```bash
# Set the key for your provider
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
OLLAMA_ENDPOINT=http://localhost:11434   # defaults to this
LITELLM_BASE_URL=http://localhost:4000   # LiteLLM proxy endpoint

# Tools (optional)
TAVILY_API_KEY=tvly-...   # enables built-in web search

# Optional tuning
LLM_DEFAULT_MODEL=claude-sonnet-4-20250514
LLM_DEFAULT_TEMPERATURE=0.7
LLM_MAX_RETRIES=3
LLM_TIMEOUT_MS=30000
```

## Model Presets
Pre-configured model presets with cost and capability data:
| Preset | Provider | Cost per 1M Input Tokens | Context Window | Quality (0–1) |
|---|---|---|---|---|
| `claude-haiku` | Anthropic | $1.00 | 200K | 0.60 |
| `claude-sonnet` | Anthropic | $3.00 | 200K | 0.85 |
| `claude-opus` | Anthropic | $15.00 | 1M | 1.00 |
| `gpt-4o-mini` | OpenAI | $0.15 | 128K | 0.55 |
| `gpt-4o` | OpenAI | $2.50 | 128K | 0.80 |
| `gemini-2.0-flash` | Gemini | $0.10 | 1M | 0.75 |
| `gemini-2.5-pro` | Gemini | $1.25 | 1M | 0.95 |
```ts
import { ModelPresets } from "@reactive-agents/llm-provider";

const config = ModelPresets["claude-sonnet"];
// { provider: "anthropic", model: "claude-sonnet-4-20250514", costPer1MInput: 3.0, ... }
```
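Because presets carry cost and quality metadata, you can also pick a model programmatically. A minimal sketch, assuming each preset exposes the `costPer1MInput` field shown above plus a numeric `quality` field matching the table (verify the exact field names in the package types):

```ts
import { ModelPresets } from "@reactive-agents/llm-provider";

// Sketch: cheapest preset that clears a quality threshold.
// `quality` is an assumed field name; `costPer1MInput` appears above.
const candidates = Object.entries(ModelPresets)
  .filter(([, preset]) => preset.quality >= 0.8)
  .sort(([, a], [, b]) => a.costPer1MInput - b.costPer1MInput);

const [name, config] = candidates[0];
console.log(name); // per the table: "gemini-2.5-pro" ($1.25/1M at quality 0.95)
```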
## Tool Calling

When tools are enabled, the LLM can request tool calls. Each provider translates tool definitions to its native format automatically:
- Anthropic: Uses the `tools` parameter with Anthropic’s tool use format
- OpenAI: Uses `function_calling` with a `tools` array
- Gemini: Uses `functionDeclarations` in a `tools` array
- Ollama: Uses the `ollama` npm SDK with the OpenAI-compatible tool format
- LiteLLM: OpenAI-compatible `tools` array forwarded to the proxy, which handles provider-specific translation
```ts
const response = await llm.complete({
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log(`Tool: ${call.name}, Input: ${JSON.stringify(call.input)}`);
  }
}
```

## Prompt Caching (Anthropic)
Anthropic supports prompt caching for static content, reducing costs on repeated calls:
```ts
import { makeCacheable } from "@reactive-agents/llm-provider";

const message = {
  role: "user" as const,
  content: [
    makeCacheable(largeSystemContext), // Cached across requests
    { type: "text" as const, text: dynamicUserInput },
  ],
};
```
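Under the hood this maps onto Anthropic's `cache_control` mechanism. As a reference point (an assumption about what `makeCacheable` produces, not confirmed by this package's docs), Anthropic's native format marks a content block like this:

```ts
// Anthropic's native prompt-caching shape; makeCacheable presumably
// emits an equivalent block. Verify against the package source.
const cachedBlock = {
  type: "text" as const,
  text: largeSystemContext,
  cache_control: { type: "ephemeral" as const },
};
```

Anthropic bills cache reads at a fraction of the normal input-token rate, so savings grow with how often the same static prefix is reused.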
## Embeddings

Embeddings are routed through the configured embedding provider (OpenAI or Ollama), regardless of which chat provider you use:
```bash
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
```

```ts
const vectors = await llm.embed(["text to embed", "another text"]);
// Returns: number[][] (one vector per input text)
```

Embeddings are used by Memory Tier 2 for KNN vector search.
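To illustrate the KNN lookup Memory Tier 2 performs, here is a self-contained cosine-similarity sketch over the vectors `llm.embed` returns (plain TypeScript; no framework API beyond `embed` is assumed):

```ts
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored texts against a query vector (k nearest neighbors).
const [query, ...corpus] = await llm.embed(["search query", "doc one", "doc two"]);
const ranked = corpus
  .map((v, i) => ({ index: i, score: cosine(query, v) }))
  .sort((a, b) => b.score - a.score);
```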
## Structured Output

Parse LLM responses into typed objects with automatic retry on parse failure:
```ts
import { Schema } from "effect";

const WeatherSchema = Schema.Struct({
  city: Schema.String,
  temperature: Schema.Number,
  conditions: Schema.String,
});

const weather = await llm.completeStructured({
  messages: [{ role: "user", content: "Weather in Tokyo" }],
  outputSchema: WeatherSchema,
  maxParseRetries: 2, // Retries with error feedback on parse failure
});
// weather is fully typed: { city: string, temperature: number, conditions: string }
```
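Effect's `Schema` module supports richer shapes than flat structs. For instance, a nested array field (standard `effect` API, shown here as an illustrative extension rather than an example from this framework's docs):

```ts
// A schema with a nested array of structs, built from the same primitives.
const ForecastSchema = Schema.Struct({
  city: Schema.String,
  days: Schema.Array(
    Schema.Struct({
      date: Schema.String,
      high: Schema.Number,
      low: Schema.Number,
    }),
  ),
});
```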
## Automatic Retry and Timeout

All providers include built-in retry logic with exponential backoff for transient errors and rate limits (see the error-handling sketch after this list):
- Rate limit (429): Retried with backoff, tracked as `LLMRateLimitError`
- Timeout: Configurable per request; defaults to 30 seconds
- Retries: Configurable; defaults to 3 attempts
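A minimal handling sketch for the case where all retries are exhausted. The import below assumes `LLMRateLimitError` is exported from `@reactive-agents/llm-provider`; check the package for the actual export location:

```ts
import { LLMRateLimitError } from "@reactive-agents/llm-provider"; // assumed export

try {
  const response = await llm.complete({
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof LLMRateLimitError) {
    // All LLM_MAX_RETRIES attempts hit 429s; surface or queue for later.
    console.warn("Rate limited after all retries");
  } else {
    throw error;
  }
}
```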
## Testing

Use the test provider for deterministic, offline testing:
```ts
const agent = await ReactiveAgents.create()
  .withProvider("test")
  .withTestResponses({
    "capital of France": "Paris is the capital of France.",
    "quantum": "Quantum mechanics describes nature at the atomic scale.",
  })
  .build();

const result = await agent.run("What is the capital of France?");
// Always returns: "Paris is the capital of France."
```
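Because the mock output is deterministic, a unit test can assert on it directly. A sketch using Node's built-in `assert` (any test runner works the same way):

```ts
import assert from "node:assert";

// The test provider maps the matched prompt to a fixed string (see above).
const result = await agent.run("What is the capital of France?");
assert.strictEqual(result, "Paris is the capital of France.");
```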