OpenAI Integration

    Integrate OpenAI's GPT models into your Spring Boot applications with a unified API, streaming support, and rigorous type safety.

    OpenAI has set the standard for modern AI capabilities with models like GPT-4, GPT-4 Turbo, and GPT-4o. Spring AI provides first-class support for OpenAI, giving Java developers access to chat completions, embeddings, image generation with DALL-E, speech synthesis, and audio transcription—all through a consistent, type-safe API.

    The OpenAI integration handles connection management, automatic retries, rate limiting backoff, and streaming. You write business logic; Spring AI handles the infrastructure. And when you need to switch to Azure OpenAI for enterprise deployment, your code stays the same—only configuration changes.

    Available OpenAI Models

    Chat Models

    • GPT-4o — Latest flagship
    • GPT-4 Turbo — 128k context
    • GPT-4 — 8k/32k context
    • GPT-3.5 Turbo — Budget option

    Other Capabilities

    • DALL-E 3 — Image generation
    • Whisper — Speech to text
    • TTS — Text to speech
    • Vision — Image understanding

    Core Concepts

    Tokens & Context

    LLMs process text in chunks called tokens (≈0.75 words). Each model has a strict context window—GPT-4 Turbo supports 128k tokens, while GPT-4 supports 8k/32k. This limit includes both input and output.

    Chat Roles

    System: Sets behavioral guidelines and persona.
    User: The actual query or input from humans.
    Assistant: The model's response (use for conversation history).
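    For example, all three roles can be supplied explicitly through Spring AI's message types. A minimal sketch, assuming a built ChatClient is in scope (package names follow Spring AI's chat API and may vary slightly across milestone versions):

    import java.util.List;

    import org.springframework.ai.chat.messages.AssistantMessage;
    import org.springframework.ai.chat.messages.Message;
    import org.springframework.ai.chat.messages.SystemMessage;
    import org.springframework.ai.chat.messages.UserMessage;

    // Build a conversation that uses all three roles. The assistant message
    // replays an earlier model answer so the model sees the history.
    List<Message> conversation = List.of(
            new SystemMessage("You are a concise technical assistant."),
            new UserMessage("What is a context window?"),
            new AssistantMessage("It's the token budget shared by input and output."),
            new UserMessage("How large is GPT-4 Turbo's?"));

    String answer = chatClient.prompt()
            .messages(conversation)
            .call()
            .content();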

    Temperature

    Controls randomness (0.0 to 2.0). Use 0.0-0.3 for factual/code tasks, 0.7-1.0 for creative writing. Higher values increase variety but may reduce coherence.

    Understanding these concepts is essential for cost control and output quality. Token limits determine how much context you can provide—for RAG applications, you might use 80% of the context for retrieved documents and 20% for the actual conversation. Temperature dramatically affects output: customer support bots should use low temperature for consistent answers, while brainstorming assistants benefit from higher values.
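    As a rough illustration of that budgeting, the split can be computed up front (a minimal sketch; the 4-characters-per-token ratio is a common heuristic, not an exact tokenizer):

    // Illustrative token budgeting for a RAG prompt against GPT-4 Turbo.
    public class TokenBudget {

        static final int CONTEXT_WINDOW = 128_000; // GPT-4 Turbo
        static final int OUTPUT_RESERVE = 2_000;   // leave room for the answer

        // ~80% of the remaining window for retrieved documents...
        static int documentBudget() {
            return (int) ((CONTEXT_WINDOW - OUTPUT_RESERVE) * 0.8);
        }

        // ...and the remaining ~20% for conversation history and the question.
        static int conversationBudget() {
            return (CONTEXT_WINDOW - OUTPUT_RESERVE) - documentBudget();
        }

        // Crude estimate: roughly 4 characters per token for English text.
        static int estimateTokens(String text) {
            return text.length() / 4;
        }
    }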

    Getting Started

    Configuration

    Add the Spring AI OpenAI starter dependency and configure your API key

    pom.xml
    <!-- Spring AI OpenAI starter -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>

    <!-- Add Spring AI BOM for version management -->
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>1.0.0-M4</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    Application Properties

    Configure your OpenAI connection settings

    application.properties
    # Required: Your OpenAI API key (use environment variable in production)
    spring.ai.openai.api-key=${OPENAI_API_KEY}

    # Model selection (default: gpt-4o)
    spring.ai.openai.chat.options.model=gpt-4o

    # Generation parameters
    spring.ai.openai.chat.options.temperature=0.7
    spring.ai.openai.chat.options.max-tokens=2000
    spring.ai.openai.chat.options.top-p=1.0

    # Optional: Organization ID (for teams)
    spring.ai.openai.organization-id=org-xxxx

    # Optional: Override base URL (for proxies or Azure)
    # spring.ai.openai.base-url=https://your-proxy.com/v1

    Never commit API keys! Use environment variables: export OPENAI_API_KEY=sk-... or Spring's @Value("${OPENAI_API_KEY}") injection.
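    For the @Value approach, a minimal sketch (the holder class is hypothetical; the Spring AI starter already reads the property shown above on its own):

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Component;

    @Component
    public class OpenAiKeyHolder {

        private final String apiKey;

        // Resolves the OPENAI_API_KEY environment variable at startup.
        public OpenAiKeyHolder(@Value("${OPENAI_API_KEY}") String apiKey) {
            this.apiKey = apiKey;
        }

        public String apiKey() {
            return apiKey;
        }
    }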

    Basic Usage

    Chat Service Implementation

    OpenAIChatService.java
    import java.util.List;

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.chat.messages.Message;
    import org.springframework.stereotype.Service;

    import reactor.core.publisher.Flux;

    @Service
    public class OpenAIChatService {

        private final ChatClient chatClient;

        public OpenAIChatService(ChatClient.Builder builder) {
            this.chatClient = builder
                    .defaultSystem("""
                            You are a helpful customer service assistant for TechStore.
                            Be friendly, concise, and helpful. If you don't know something,
                            say so honestly rather than making up information.
                            """)
                    .build();
        }

        // Simple chat - returns the complete response
        public String chat(String userMessage) {
            return chatClient.prompt()
                    .user(userMessage)
                    .call()
                    .content();
        }

        // Streaming - returns tokens as they're generated
        public Flux<String> streamChat(String userMessage) {
            return chatClient.prompt()
                    .user(userMessage)
                    .stream()
                    .content();
        }

        // With conversation history
        public String chatWithHistory(String userMessage, List<Message> history) {
            return chatClient.prompt()
                    .messages(history)
                    .user(userMessage)
                    .call()
                    .content();
        }
    }

    The ChatClient.Builder is injected automatically by Spring AI. Use .defaultSystem() to set a persona that applies to all requests—this is where you define your assistant's behavior, personality, and constraints. The system prompt is crucial for consistent, on-brand responses.
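    Exposing the service over HTTP takes only a thin controller. A minimal sketch (the endpoint path and class name are illustrative):

    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ChatController {

        private final OpenAIChatService chatService;

        public ChatController(OpenAIChatService chatService) {
            this.chatService = chatService;
        }

        // Blocking endpoint: returns the complete answer in one response.
        @GetMapping("/chat")
        public String chat(@RequestParam String message) {
            return chatService.chat(message);
        }
    }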

    Advanced Features

    Streaming with SSE

    Stream responses to your frontend in real-time using Server-Sent Events.

    Streaming Controller
    @GetMapping(value = "/chat/stream",
            produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content();
    }

    Because the endpoint produces text/event-stream, WebFlux frames each emitted element as an SSE data event automatically; there is no need to prepend "data: " to each chunk yourself.

    Structured Output (JSON Mode)

    Parse AI responses directly into Java objects—no regex needed.

    Structured Output
    public record ProductInfo(
            String name,
            String category,
            BigDecimal price,
            List<String> features) {
    }

    public ProductInfo extractProduct(String description) {
        return chatClient.prompt()
                .user("Extract product info: " + description)
                .call()
                .entity(ProductInfo.class);
    }
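    Calling it is a one-liner; the printed values below are illustrative, since model output varies:

    ProductInfo info = extractProduct(
            "The AeroBook 14 is a lightweight ultrabook priced at $999, "
            + "featuring a 14-inch display and 18-hour battery life.");

    System.out.println(info.name());     // e.g. "AeroBook 14"
    System.out.println(info.price());    // e.g. 999.00
    System.out.println(info.features()); // e.g. [14-inch display, 18-hour battery]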

    Custom Model Parameters

    Override default settings per-request for fine-grained control over model behavior.

    Per-Request Options
    import org.springframework.ai.openai.OpenAiChatOptions;

    public String generateCreativeContent(String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .options(OpenAiChatOptions.builder()
                        .model("gpt-4o")
                        .temperature(0.9)      // More creative
                        .maxTokens(2000)       // Longer responses
                        .topP(0.95)            // Nucleus sampling
                        .presencePenalty(0.6)  // Encourage new topics
                        .frequencyPenalty(0.3) // Reduce repetition
                        .build())
                .call()
                .content();
    }

    public String generateCode(String spec) {
        return chatClient.prompt()
                .user(spec)
                .options(OpenAiChatOptions.builder()
                        .model("gpt-4-turbo")
                        .temperature(0.0) // Deterministic
                        .maxTokens(4000)  // Room for code
                        .build())
                .call()
                .content();
    }

    Vision: Image Understanding

    GPT-4o and GPT-4 Vision can analyze images. Pass images as URLs or base64-encoded data.

    Vision API Usage
    public String analyzeImage(String imageUrl, String question) throws MalformedURLException {
        // Media expects a URL (or Resource), and the MIME type should match
        // the actual image format - JPEG here for a .jpg file.
        var media = new Media(MimeTypeUtils.IMAGE_JPEG, URI.create(imageUrl).toURL());
        var userMessage = new UserMessage(question, List.of(media));

        return chatClient.prompt()
                .messages(userMessage)
                .call()
                .content();
    }

    // Example: Analyze a product image
    String analysis = analyzeImage(
            "https://example.com/product.jpg",
            "Describe this product. What are its key features?");

    Pricing & Cost Optimization

    Model           Tier              Price (per 1M input tokens)   Best for
    GPT-3.5 Turbo   Budget-friendly   $0.50                         Simple Q&A, classification, summarization
    GPT-4o          Balanced          $5.00                         Complex tasks, code, multimodal
    GPT-4 Turbo     Maximum context   $10.00                        Long documents, complex reasoning

    Cost optimization tips:
    • Use GPT-3.5 Turbo for simple tasks—it's 10-20x cheaper
    • Set maxTokens to prevent unexpectedly long responses
    • Cache responses for repeated queries with Spring Cache (see the sketch after this list)
    • Use smaller models for initial filtering, GPT-4 for final answers
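    A minimal caching sketch with Spring Cache (assumes caching is enabled with @EnableCaching and a cache named "chat-responses" is configured; the class name is illustrative):

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class CachedChatService {

        private final ChatClient chatClient;

        public CachedChatService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
        }

        // Identical questions are served from the cache instead of the API.
        // Only effective for exact-match repeated queries.
        @Cacheable(cacheNames = "chat-responses", key = "#userMessage")
        public String chat(String userMessage) {
            return chatClient.prompt()
                    .user(userMessage)
                    .call()
                    .content();
        }
    }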

    Error Handling & Rate Limits

    Resilient Service with Retry
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.retry.annotation.Backoff;
    import org.springframework.retry.annotation.Recover;
    import org.springframework.retry.annotation.Retryable;
    import org.springframework.stereotype.Service;
    // OpenAiApiException's package depends on your Spring AI version.

    @Service
    public class ResilientOpenAIService {

        private static final Logger log = LoggerFactory.getLogger(ResilientOpenAIService.class);

        private final ChatClient chatClient;

        public ResilientOpenAIService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
        }

        @Retryable(
                value = {OpenAiApiException.class},
                maxAttempts = 3,
                backoff = @Backoff(delay = 1000, multiplier = 2))
        public String chat(String message) {
            try {
                return chatClient.prompt()
                        .user(message)
                        .call()
                        .content();
            } catch (OpenAiApiException e) {
                if (e.getStatusCode() == 429) {
                    log.warn("Rate limit hit, retry scheduled...");
                    throw e; // Will be retried
                }
                if (e.getStatusCode() == 503) {
                    log.error("OpenAI service unavailable");
                    throw new ServiceUnavailableException("AI service temporarily down");
                }
                throw e;
            }
        }

        @Recover
        public String fallback(OpenAiApiException e, String message) {
            log.error("All retries exhausted for: {}", message);
            return "I'm sorry, our AI service is currently unavailable. Please try again later.";
        }
    }
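    Note that @Retryable and @Recover only take effect once retry support is switched on (assuming spring-retry and spring-boot-starter-aop are on the classpath):

    import org.springframework.context.annotation.Configuration;
    import org.springframework.retry.annotation.EnableRetry;

    @Configuration
    @EnableRetry // activates @Retryable/@Recover processing
    public class RetryConfig {
    }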

    OpenAI enforces rate limits based on your tier—typically requests per minute (RPM) and tokens per minute (TPM). When you hit these limits, the API returns a 429 Too Many Requests error. Spring Retry with exponential backoff handles this gracefully, automatically waiting before retrying.

    Best Practices

    💰 Cost Management

    • Set maxTokens to prevent runaway costs
    • Use GPT-3.5 Turbo for simple tasks (10x cheaper)
    • Cache common queries with Spring Cache
    • Monitor usage via OpenAI dashboard

    🛡️ Security

    • Never expose API keys in frontend code
    • Sanitize user input to prevent prompt injection
    • Implement rate limiting per user
    • Use content moderation for user inputs

    ⚡ Performance

    • Use streaming for better perceived latency
    • Batch similar requests when possible
    • Set appropriate timeouts (30-60s typical)
    • Monitor response times and token usage

    🎯 Quality

    • Write detailed, specific system prompts
    • Include examples in prompts for consistency
    • Use temperature 0 for factual tasks
    • Test prompts with varied inputs

    Start Building with OpenAI

    Now that you understand OpenAI integration, build intelligent chatbots, implement RAG, or explore other AI providers supported by Spring AI.