Spring AI Tutorials
    Tutorial 11

    Configuring Multiple LLMs in Spring AI

    Use OpenAI, Anthropic, Ollama, and other providers together in a single application for cost optimization and specialized tasks

    1. Why Use Multiple LLMs?

    Different AI models excel at different tasks. By configuring multiple LLMs, you can optimize for cost, performance, and capabilities:

    Cost Optimization

    Use cheaper models for simple tasks, expensive models for complex ones

    Latency Control

    Fast local models for quick responses, cloud models for quality

    Specialization

    Code models for programming, vision models for images

    Fallback & Redundancy

    Switch providers if one is down or rate-limited

    Real-World Example

    Use GPT-4o for complex reasoning, Claude for long documents, Llama via Ollama for privacy-sensitive local processing, and GPT-3.5-turbo for simple classification tasks.

    2. Configuration Setup

    Step 1: Add Dependencies for Each Provider

    XML Example

    <!-- OpenAI -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>

    <!-- Anthropic Claude -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
    </dependency>

    <!-- Ollama (local models) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
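    Versions for these starters are easiest to manage through the Spring AI BOM, so the individual starter entries can omit <version> tags. A minimal sketch (the ${spring-ai.version} property is a placeholder — substitute the release you are targeting):

    ```xml
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    ```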

    Step 2: Configure API Keys

    YAML Example

    spring:
      ai:
        openai:
          api-key: ${OPENAI_API_KEY}
          chat:
            options:
              model: gpt-4o
              temperature: 0.7
        anthropic:
          api-key: ${ANTHROPIC_API_KEY}
          chat:
            options:
              model: claude-3-5-sonnet-20241022
              max-tokens: 4096
        ollama:
          base-url: http://localhost:11434
          chat:
            options:
              model: llama3.2

    Auto-Configuration Note

    When multiple starters are on the classpath, Spring AI auto-configures a chat model for each provider. You'll then need to use @Qualifier to choose which one to inject, or define named ChatClient beans yourself.
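    If you build every ChatClient bean yourself (as in the next section), recent Spring AI versions also let you switch off the auto-configured ChatClient.Builder via a property — verify the exact property name against your version's documentation:

    ```yaml
    spring:
      ai:
        chat:
          client:
            enabled: false   # build ChatClient instances manually instead
    ```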

    3. Creating Named ChatClients

    Define Multiple ChatClient Beans

    Java Example

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.openai.OpenAiChatModel;
    import org.springframework.ai.anthropic.AnthropicChatModel;
    import org.springframework.ai.ollama.OllamaChatModel;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.Primary;

    @Configuration
    public class ChatClientConfig {

        @Bean
        @Primary // Default when no qualifier is specified
        public ChatClient openAiChatClient(OpenAiChatModel openAiModel) {
            return ChatClient.builder(openAiModel)
                    .defaultSystem("You are a helpful assistant powered by GPT-4o.")
                    .build();
        }

        @Bean
        public ChatClient claudeChatClient(AnthropicChatModel claudeModel) {
            return ChatClient.builder(claudeModel)
                    .defaultSystem("You are Claude, an AI assistant by Anthropic.")
                    .build();
        }

        @Bean
        public ChatClient ollamaChatClient(OllamaChatModel ollamaModel) {
            return ChatClient.builder(ollamaModel)
                    .defaultSystem("You are a local AI assistant running on Ollama.")
                    .build();
        }
    }

    Inject Specific ChatClients

    Java Example
    @Service
    public class MultiModelService {

        private final ChatClient openAiClient;
        private final ChatClient claudeClient;
        private final ChatClient ollamaClient;

        // Inject by bean name using @Qualifier on the constructor parameters.
        // Note: field-level @Qualifier combined with Lombok's @RequiredArgsConstructor
        // does not work out of the box (Lombok doesn't copy the annotation to the
        // generated constructor unless lombok.copyableAnnotations is configured),
        // so an explicit constructor is the safer choice here.
        public MultiModelService(@Qualifier("openAiChatClient") ChatClient openAiClient,
                                 @Qualifier("claudeChatClient") ChatClient claudeClient,
                                 @Qualifier("ollamaChatClient") ChatClient ollamaClient) {
            this.openAiClient = openAiClient;
            this.claudeClient = claudeClient;
            this.ollamaClient = ollamaClient;
        }

        public String askOpenAI(String question) {
            return openAiClient.prompt().user(question).call().content();
        }

        public String askClaude(String question) {
            return claudeClient.prompt().user(question).call().content();
        }

        public String askLocal(String question) {
            return ollamaClient.prompt().user(question).call().content();
        }
    }

    4. Smart Router Pattern

    Create a router service that automatically selects the best model based on the task:

    Java Example
    @Service
    @Slf4j
    public class LLMRouter {

        private final ChatClient openAiClient;
        private final ChatClient claudeClient;
        private final ChatClient ollamaClient;

        public LLMRouter(@Qualifier("openAiChatClient") ChatClient openAiClient,
                         @Qualifier("claudeChatClient") ChatClient claudeClient,
                         @Qualifier("ollamaChatClient") ChatClient ollamaClient) {
            this.openAiClient = openAiClient;
            this.claudeClient = claudeClient;
            this.ollamaClient = ollamaClient;
        }

        public String route(String query, TaskType taskType) {
            ChatClient selectedClient = switch (taskType) {
                case CODE_GENERATION -> {
                    log.info("Using GPT-4o for code generation");
                    yield openAiClient;
                }
                case LONG_DOCUMENT -> {
                    log.info("Using Claude for long document (200K context)");
                    yield claudeClient;
                }
                case SIMPLE_CLASSIFICATION -> {
                    log.info("Using local Ollama for simple task");
                    yield ollamaClient;
                }
                case SENSITIVE_DATA -> {
                    log.info("Using local Ollama for privacy");
                    yield ollamaClient;
                }
                default -> openAiClient;
            };
            return selectedClient.prompt().user(query).call().content();
        }

        public enum TaskType {
            CODE_GENERATION, LONG_DOCUMENT, SIMPLE_CLASSIFICATION, SENSITIVE_DATA, GENERAL
        }
    }

    Usage Example

    Java Example
    @RestController
    @RequestMapping("/api/ai")
    @RequiredArgsConstructor
    public class AIController {

        private final LLMRouter router;

        @PostMapping("/ask")
        public ResponseEntity<String> ask(@RequestParam String query,
                                          @RequestParam(defaultValue = "GENERAL") TaskType taskType) {
            String response = router.route(query, taskType);
            return ResponseEntity.ok(response);
        }
    }

    Cost Savings

    This pattern can reduce API costs by 60-80% by routing simple tasks to cheaper or local models.

    5. Fallback & Retry Pattern

    Implement resilient AI calls with automatic fallback to alternative providers:

    Java Example
    @Service
    @Slf4j
    public class ResilientAIService {

        private final List<ChatClient> clientPriorityList;

        public ResilientAIService(@Qualifier("openAiChatClient") ChatClient openAi,
                                  @Qualifier("claudeChatClient") ChatClient claude,
                                  @Qualifier("ollamaChatClient") ChatClient ollama) {
            // Priority order: OpenAI -> Claude -> Ollama (local fallback)
            this.clientPriorityList = List.of(openAi, claude, ollama);
        }

        public String callWithFallback(String prompt) {
            for (int i = 0; i < clientPriorityList.size(); i++) {
                ChatClient client = clientPriorityList.get(i);
                try {
                    log.info("Attempting provider {} of {}", i + 1, clientPriorityList.size());
                    return client.prompt().user(prompt).call().content();
                } catch (Exception e) {
                    log.warn("Provider {} failed: {}. Trying next...", i + 1, e.getMessage());
                    if (i == clientPriorityList.size() - 1) {
                        throw new RuntimeException("All providers failed", e);
                    }
                }
            }
            throw new RuntimeException("No providers available");
        }
    }

    Production Tip

    Consider using Spring Retry or Resilience4j for more sophisticated retry policies with exponential backoff and circuit breakers.
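    Before reaching for a library, note that the core idea those libraries add — retry with exponential backoff — fits in a few lines of plain Java. A hedged sketch (callWithBackoff and its parameters are illustrative helpers, not part of Spring AI or any retry library):

    ```java
    import java.util.function.Supplier;

    public class BackoffRetry {

        /**
         * Retries the supplier up to maxAttempts times, doubling the wait
         * between attempts (100ms, 200ms, 400ms, ...). Rethrows the last
         * failure once attempts are exhausted.
         */
        public static <T> T callWithBackoff(Supplier<T> call, int maxAttempts) {
            long waitMillis = 100;
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                    if (attempt < maxAttempts) {
                        try {
                            Thread.sleep(waitMillis);
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt();
                            throw new RuntimeException("Interrupted during backoff", ie);
                        }
                        waitMillis *= 2; // exponential backoff
                    }
                }
            }
            throw last;
        }

        public static void main(String[] args) {
            int[] calls = {0};
            // Simulated flaky provider: fails twice, succeeds on the third attempt.
            String result = callWithBackoff(() -> {
                if (++calls[0] < 3) throw new RuntimeException("transient failure");
                return "ok";
            }, 5);
            System.out.println(result + " after " + calls[0] + " attempts");
        }
    }
    ```

    Resilience4j and Spring Retry layer jitter, metrics, and circuit breaking on top of this same loop, which is why they are the better choice in production.
    
    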

    What You've Learned

    Multi-LLM Benefits

    Cost, latency, and specialization

    Configuration

    Multiple provider setup in YAML

    Named Beans

    @Qualifier for specific clients

    Router Pattern

    Task-based model selection

    Fallback Pattern

    Resilient multi-provider calls

    Cost Optimization

    60-80% savings with smart routing
