Configuring Multiple LLMs in Spring AI
Use OpenAI, Anthropic, Ollama, and other providers together in a single application for cost optimization and specialized tasks
1. Why Use Multiple LLMs?
Different AI models excel at different tasks. By configuring multiple LLMs, you can optimize for cost, performance, and capabilities:
Cost Optimization
Use cheaper models for simple tasks, expensive models for complex ones
Latency Control
Fast local models for quick responses, cloud models for quality
Specialization
Code models for programming, vision models for images
Fallback & Redundancy
Switch providers if one is down or rate-limited
Real-World Example
Use GPT-4o for complex reasoning, Claude for long documents, Llama via Ollama for privacy-sensitive local processing, and GPT-3.5-turbo for simple classification tasks.
2. Configuration Setup
Step 1: Add Dependencies for Each Provider
<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Anthropic Claude -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

<!-- Ollama (local models) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
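The starters above omit explicit versions; in a typical setup these are managed by importing the Spring AI BOM. A minimal sketch, assuming the spring-ai.version property points at your chosen release:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>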
Step 2: Configure API Keys

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-3-5-sonnet-20241022
          max-tokens: 4096
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2

Auto-Configuration Note
When multiple starters are present, Spring AI auto-configures a chat model for each provider. Because several ChatModel beans then exist in the context, you must either qualify which one to inject or define named ChatClient beans, as shown in the next section.
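If the auto-configured default ChatClient.Builder complains about multiple ChatModel beans, you can disable it and build every client yourself (as this tutorial does). A minimal sketch using the property Spring AI provides for this:

spring:
  ai:
    chat:
      client:
        enabled: false  # disable the default ChatClient.Builder auto-configuration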
3. Creating Named ChatClients
Define Multiple ChatClient Beans
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class ChatClientConfig {

    @Bean
    @Primary // Default when no qualifier specified
    public ChatClient openAiChatClient(OpenAiChatModel openAiModel) {
        return ChatClient.builder(openAiModel)
                .defaultSystem("You are a helpful assistant powered by GPT-4o.")
                .build();
    }

    @Bean
    public ChatClient claudeChatClient(AnthropicChatModel claudeModel) {
        return ChatClient.builder(claudeModel)
                .defaultSystem("You are Claude, an AI assistant by Anthropic.")
                .build();
    }

    @Bean
    public ChatClient ollamaChatClient(OllamaChatModel ollamaModel) {
        return ChatClient.builder(ollamaModel)
                .defaultSystem("You are a local AI assistant running on Ollama.")
                .build();
    }
}
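Because openAiChatClient is marked @Primary, any unqualified ChatClient injection point receives it. A minimal sketch; DefaultChatService is an illustrative name, not part of the tutorial's code:

@Service
public class DefaultChatService {

    // No @Qualifier, so Spring injects the @Primary bean (openAiChatClient)
    private final ChatClient chatClient;

    public DefaultChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String ask(String question) {
        return chatClient.prompt().user(question).call().content();
    }
}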
Inject Specific ChatClients

One caveat when combining Lombok's @RequiredArgsConstructor with field-level @Qualifier: Lombok does not copy @Qualifier onto the generated constructor parameters, so without extra configuration every field would silently receive the @Primary bean. Opt in via lombok.config as noted in the comment below, or use an explicit constructor as the router in the next section does.

// lombok.config at the project root must contain:
// lombok.copyableAnnotations += org.springframework.beans.factory.annotation.Qualifier

@Service
@RequiredArgsConstructor
public class MultiModelService {

    // Inject by bean name using @Qualifier
    @Qualifier("openAiChatClient")
    private final ChatClient openAiClient;

    @Qualifier("claudeChatClient")
    private final ChatClient claudeClient;

    @Qualifier("ollamaChatClient")
    private final ChatClient ollamaClient;

    public String askOpenAI(String question) {
        return openAiClient.prompt().user(question).call().content();
    }

    public String askClaude(String question) {
        return claudeClient.prompt().user(question).call().content();
    }

    public String askLocal(String question) {
        return ollamaClient.prompt().user(question).call().content();
    }
}
4. Smart Router Pattern
Create a router service that automatically selects the best model based on the task:
@Service
@Slf4j
public class LLMRouter {

    private final ChatClient openAiClient;
    private final ChatClient claudeClient;
    private final ChatClient ollamaClient;

    public LLMRouter(@Qualifier("openAiChatClient") ChatClient openAiClient,
                     @Qualifier("claudeChatClient") ChatClient claudeClient,
                     @Qualifier("ollamaChatClient") ChatClient ollamaClient) {
        this.openAiClient = openAiClient;
        this.claudeClient = claudeClient;
        this.ollamaClient = ollamaClient;
    }

    public String route(String query, TaskType taskType) {
        ChatClient selectedClient = switch (taskType) {
            case CODE_GENERATION -> {
                log.info("Using GPT-4o for code generation");
                yield openAiClient;
            }
            case LONG_DOCUMENT -> {
                log.info("Using Claude for long document (200K context)");
                yield claudeClient;
            }
            case SIMPLE_CLASSIFICATION -> {
                log.info("Using local Ollama for simple task");
                yield ollamaClient;
            }
            case SENSITIVE_DATA -> {
                log.info("Using local Ollama for privacy");
                yield ollamaClient;
            }
            default -> openAiClient;
        };
        return selectedClient.prompt().user(query).call().content();
    }

    public enum TaskType {
        CODE_GENERATION, LONG_DOCUMENT, SIMPLE_CLASSIFICATION, SENSITIVE_DATA, GENERAL
    }
}

Usage Example
@RestController
@RequestMapping("/api/ai")
@RequiredArgsConstructor
public class AIController {

    private final LLMRouter router;

    @PostMapping("/ask")
    public ResponseEntity<String> ask(@RequestParam String query,
                                      @RequestParam(defaultValue = "GENERAL") LLMRouter.TaskType taskType) {
        String response = router.route(query, taskType);
        return ResponseEntity.ok(response);
    }
}

Cost Savings
Routing simple tasks to cheaper or local models can cut API costs dramatically; savings in the range of 60-80% are plausible, though the actual figure depends on your traffic mix and provider pricing.
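If callers cannot be trusted to pick a TaskType, you can layer a cheap heuristic classifier in front of the router. A minimal sketch; AutoRoutingService and its keyword rules are illustrative assumptions, not a prescribed design:

@Service
@RequiredArgsConstructor
public class AutoRoutingService {

    private final LLMRouter router;

    public String classifyAndRoute(String query) {
        return router.route(query, guessTaskType(query));
    }

    // Naive keyword/length heuristics -- tune or replace with a real classifier
    private LLMRouter.TaskType guessTaskType(String query) {
        String q = query.toLowerCase();
        if (q.contains("code") || q.contains("function") || q.contains("refactor")) {
            return LLMRouter.TaskType.CODE_GENERATION;
        }
        if (query.length() > 10_000) {
            return LLMRouter.TaskType.LONG_DOCUMENT;
        }
        return LLMRouter.TaskType.GENERAL;
    }
}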
5. Fallback & Retry Pattern
Implement resilient AI calls with automatic fallback to alternative providers:
@Service
@Slf4j
public class ResilientAIService {

    private final List<ChatClient> clientPriorityList;

    public ResilientAIService(@Qualifier("openAiChatClient") ChatClient openAi,
                              @Qualifier("claudeChatClient") ChatClient claude,
                              @Qualifier("ollamaChatClient") ChatClient ollama) {
        // Priority order: OpenAI -> Claude -> Ollama (local fallback)
        this.clientPriorityList = List.of(openAi, claude, ollama);
    }

    public String callWithFallback(String prompt) {
        for (int i = 0; i < clientPriorityList.size(); i++) {
            ChatClient client = clientPriorityList.get(i);
            try {
                log.info("Attempting provider {} of {}", i + 1, clientPriorityList.size());
                return client.prompt().user(prompt).call().content();
            } catch (Exception e) {
                log.warn("Provider {} failed: {}. Trying next...", i + 1, e.getMessage());
                if (i == clientPriorityList.size() - 1) {
                    throw new RuntimeException("All providers failed", e);
                }
            }
        }
        throw new RuntimeException("No providers available");
    }
}

Production Tip
Consider using Spring Retry or Resilience4j for more sophisticated retry policies with exponential backoff and circuit breakers.
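For example, with resilience4j-spring-boot3 on the classpath, annotations can add a retry and circuit breaker around the primary provider and fall back to the local model. A minimal sketch; the "openai" instance name and its backoff settings are assumptions you would define in application.yml:

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class GuardedAIService {

    private final ChatClient openAiClient;
    private final ChatClient ollamaClient;

    public GuardedAIService(@Qualifier("openAiChatClient") ChatClient openAiClient,
                            @Qualifier("ollamaChatClient") ChatClient ollamaClient) {
        this.openAiClient = openAiClient;
        this.ollamaClient = ollamaClient;
    }

    // Retries per the "openai" config, then the breaker opens;
    // on failure, fallbackToLocal answers from the Ollama client instead.
    @Retry(name = "openai")
    @CircuitBreaker(name = "openai", fallbackMethod = "fallbackToLocal")
    public String ask(String prompt) {
        return openAiClient.prompt().user(prompt).call().content();
    }

    private String fallbackToLocal(String prompt, Throwable t) {
        return ollamaClient.prompt().user(prompt).call().content();
    }
}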
What You've Learned
Multi-LLM Benefits
Cost, latency, and specialization
Configuration
Multiple provider setup in YAML
Named Beans
@Qualifier for specific clients
Router Pattern
Task-based model selection
Fallback Pattern
Resilient multi-provider calls
Cost Optimization
Substantial savings (potentially 60-80%) with smart routing