Adding Memory to Chatbots
Transform your stateless chatbot into a context-aware assistant that remembers user preferences, conversation history, and key information across sessions.
In the previous tutorial, you built a basic chatbot that processes each message independently. But real conversations have context—users expect the AI to remember what was said earlier. When a customer says "I'd like to return the shoes I mentioned," your chatbot needs to know which shoes they're talking about.
Spring AI solves this with the ChatMemory abstraction and Advisors—components that intercept requests and responses to automatically inject conversation history into each prompt. This means the LLM receives not just the current message, but the full context needed to generate coherent, personalized responses.
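The ChatMemory contract itself is small. A store only needs to support three operations, sketched here (omitting the interface's default convenience overloads); the custom Redis implementation later in this tutorial implements exactly these methods:

import java.util.List;
import org.springframework.ai.chat.messages.Message;

// The three operations a ChatMemory store must support.
public interface ChatMemory {
    void add(String conversationId, List<Message> messages);  // append new messages
    List<Message> get(String conversationId, int lastN);      // fetch recent history
    void clear(String conversationId);                        // drop a conversation
}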
Why Conversation Memory Matters
❌ Without Memory: each message is processed in isolation. Asked "I'd like to return the shoes I mentioned," the bot has no idea which shoes, and the user must repeat themselves every turn.

✓ With Memory: the bot sees earlier messages alongside the current one, so it can resolve the reference, recall the user's name and preferences, and keep the conversation coherent across turns.
Choose Your Memory Strategy
Spring AI supports different memory backends. Choose based on your deployment requirements.
In-Memory (Default)
Fast, simple storage that lives in application memory. Perfect for development and testing.
Pros:
- Zero configuration
- Ultra-fast access
- No external dependencies

Cons:
- Lost on restart
- Not shared across instances
- Limited by JVM heap
Redis-Backed
Persistent, distributed storage using Redis. Ideal for production with multiple app instances.
Pros:
- Survives restarts
- Shared across instances
- TTL support

Cons:
- Requires Redis setup
- Network latency
- Additional infrastructure
Summarized Memory
Intelligent memory that summarizes old conversations to stay within token limits.
Pros:
- Unlimited conversation length
- Cost-effective
- Preserves key context

Cons:
- May lose details
- Requires extra LLM calls
- More complex implementation
Configure In-Memory Storage
Let's start with the simplest approach—in-memory storage. This is perfect for development and single-instance deployments. Spring AI's InMemoryChatMemory keeps conversations in an in-memory map, keyed by conversation ID (here, the session ID).
@Configuration
public class ChatConfig {

    @Bean
    public ChatMemory chatMemory() {
        // In-memory storage - lost on restart
        return new InMemoryChatMemory();
    }
}

Create the Memory-Aware Controller
The key to memory is the MessageChatMemoryAdvisor. On every call, this advisor automatically:
- retrieves past messages from storage
- appends them to the prompt
- saves the new exchange after the response

Together, these steps manage conversation context seamlessly.
// Static imports for the advisor parameter keys (from AbstractChatMemoryAdvisor)
import static org.springframework.ai.chat.client.advisor.AbstractChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY;
import static org.springframework.ai.chat.client.advisor.AbstractChatMemoryAdvisor.CHAT_MEMORY_RETRIEVE_SIZE_KEY;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder, ChatMemory chatMemory) {
        this.chatClient = builder
            .defaultSystem("""
                You are a helpful customer support assistant for TechCorp.
                Be friendly, professional, and remember user preferences.
                If the user mentions their name, remember it for future messages.
                """)
            .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
            .build();
    }

    @PostMapping
    public String chat(@RequestBody ChatRequest request,
                       @RequestHeader("X-Session-Id") String sessionId) {
        return chatClient.prompt()
            .user(request.getMessage())
            .advisors(advisor -> advisor
                .param(CHAT_MEMORY_CONVERSATION_ID_KEY, sessionId)
                .param(CHAT_MEMORY_RETRIEVE_SIZE_KEY, 20))
            .call()
            .content();
    }
}

Key Parameters Explained
CHAT_MEMORY_CONVERSATION_ID_KEY: Unique identifier for each conversation. Use session IDs, user IDs, or any string that groups related messages together.
CHAT_MEMORY_RETRIEVE_SIZE_KEY: How many previous messages to include. More context = better understanding, but also higher token costs and latency.
Pro tip: The session ID should come from your authentication layer. For anonymous users, generate a UUID on first visit and store it in a cookie or local storage.
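For the anonymous-user case, a minimal sketch of that cookie flow might look like the following (SessionIdResolver and the cookie name are hypothetical, not part of Spring AI):

import jakarta.servlet.http.Cookie;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.util.UUID;
import org.springframework.stereotype.Component;

// Hypothetical helper: reuse the session cookie if present,
// otherwise mint a UUID and set it for future requests.
@Component
public class SessionIdResolver {

    private static final String COOKIE_NAME = "chat-session-id";

    public String resolve(HttpServletRequest request, HttpServletResponse response) {
        if (request.getCookies() != null) {
            for (Cookie cookie : request.getCookies()) {
                if (COOKIE_NAME.equals(cookie.getName())) {
                    return cookie.getValue();
                }
            }
        }
        String sessionId = UUID.randomUUID().toString();
        Cookie cookie = new Cookie(COOKIE_NAME, sessionId);
        cookie.setHttpOnly(true);
        cookie.setPath("/");
        cookie.setMaxAge(60 * 60 * 24 * 30); // keep for 30 days
        response.addCookie(cookie);
        return sessionId;
    }
}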
Test Your Memory-Enabled Chatbot
Now let's verify that memory is working. Make multiple requests with the same session ID and watch the chatbot maintain context across the conversation.
# Test conversation with memory

# First message
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: user-123" \
  -d '{"message": "Hi! My name is Sarah and I love hiking."}'
# Response: "Hello Sarah! It is great to meet a fellow hiking enthusiast..."

# Second message - same session
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: user-123" \
  -d '{"message": "What outdoor activities would you recommend for me?"}'
# Response: "Since you mentioned loving hiking, Sarah, here are some ideas..."

Persistent Memory with Redis
For production deployments, you'll want memory that survives restarts and scales across multiple instances. Here's how to implement a Redis-backed memory store with automatic expiration.
<!-- pom.xml - Add these dependencies -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- Optional: For persistent memory with Redis -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

@Configuration
public class RedisChatMemoryConfig {

    @Bean
    public ChatMemory chatMemory(RedisTemplate<String, Object> redisTemplate) {
        return new RedisChatMemory(redisTemplate, Duration.ofHours(24));
    }
}

// Custom Redis implementation
public class RedisChatMemory implements ChatMemory {

    private final RedisTemplate<String, Object> redisTemplate;
    private final Duration ttl;
    private static final String KEY_PREFIX = "chat:memory:";

    public RedisChatMemory(RedisTemplate<String, Object> redisTemplate, Duration ttl) {
        this.redisTemplate = redisTemplate;
        this.ttl = ttl;
    }

    @Override
    public void add(String conversationId, List<Message> messages) {
        String key = KEY_PREFIX + conversationId;
        // Read-modify-write: wrap in a transaction or Lua script if
        // concurrent writers to the same conversation are possible
        List<Message> existing = get(conversationId, Integer.MAX_VALUE);
        existing.addAll(messages);
        redisTemplate.opsForValue().set(key, existing, ttl);
    }

    @Override
    public List<Message> get(String conversationId, int lastN) {
        String key = KEY_PREFIX + conversationId;
        @SuppressWarnings("unchecked")
        List<Message> messages = (List<Message>) redisTemplate.opsForValue().get(key);
        if (messages == null) {
            return new ArrayList<>();
        }
        int start = Math.max(0, messages.size() - lastN);
        return new ArrayList<>(messages.subList(start, messages.size()));
    }

    @Override
    public void clear(String conversationId) {
        redisTemplate.delete(KEY_PREFIX + conversationId);
    }
}

TTL (Time-To-Live): Set an appropriate expiration for conversations. 24 hours is common for customer support, while longer periods may be needed for ongoing projects or assistant-style applications.
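One detail the custom store glosses over: the RedisTemplate must be able to serialize Spring AI Message objects. A minimal sketch of such a bean, assuming JSON serialization via Jackson (verify that your Spring AI version's Message classes round-trip cleanly before relying on this):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

@Configuration
public class RedisTemplateConfig {

    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        // Plain string keys, e.g. "chat:memory:user-123"
        template.setKeySerializer(new StringRedisSerializer());
        // JSON values with embedded type info so the stored List<Message>
        // can be read back; assumes the Message implementations are
        // Jackson-serializable
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}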
Managing Token Limits with Windowed Memory
Long conversations can exceed the model's context window (e.g., 128K tokens for GPT-4o). Windowed memory keeps only the N most recent messages, automatically trimming older ones.
@Configuration
public class WindowedMemoryConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
        return builder
            .defaultSystem("You are a helpful assistant.")
            .defaultAdvisors(
                // Windowed memory - only keep the last 10 messages
                // (args: memory store, default conversation ID, window size)
                new MessageChatMemoryAdvisor(chatMemory, "", 10))
            .build();
    }
}

Smart Memory with Conversation Summarization
For the best of both worlds—unlimited conversation length while preserving key context—implement summarization. When conversations get long, older messages are condensed into a summary.
@Service
public class SummarizedMemoryService {

    private final ChatClient chatClient;
    private final ChatMemory chatMemory;
    private static final int SUMMARIZE_THRESHOLD = 20;

    public SummarizedMemoryService(ChatClient chatClient, ChatMemory chatMemory) {
        this.chatClient = chatClient;
        this.chatMemory = chatMemory;
    }

    public String chat(String sessionId, String userMessage) {
        // Check if conversation is getting long
        List<Message> history = chatMemory.get(sessionId, Integer.MAX_VALUE);
        if (history.size() > SUMMARIZE_THRESHOLD) {
            summarizeOldMessages(sessionId, history);
        }
        return chatClient.prompt()
            .user(userMessage)
            .advisors(a -> a.param(CHAT_MEMORY_CONVERSATION_ID_KEY, sessionId))
            .call()
            .content();
    }

    private void summarizeOldMessages(String sessionId, List<Message> messages) {
        // Keep last 5 messages, summarize the rest
        List<Message> toSummarize = messages.subList(0, messages.size() - 5);
        String summary = chatClient.prompt()
            .system("Summarize this conversation in 2-3 sentences, preserving key facts:")
            .user(formatMessages(toSummarize))
            .call()
            .content();
        // Clear and start fresh with summary + recent messages
        chatMemory.clear(sessionId);
        chatMemory.add(sessionId, List.of(
            new SystemMessage("Previous conversation summary: " + summary)));
        chatMemory.add(sessionId, messages.subList(messages.size() - 5, messages.size()));
    }

    private String formatMessages(List<Message> messages) {
        // Render each message as "TYPE: text" for the summarization prompt
        return messages.stream()
            .map(m -> m.getMessageType() + ": " + m.getContent())
            .collect(Collectors.joining("\n"));
    }
}

Cost consideration: Summarization requires an extra LLM call. Only trigger it when necessary (e.g., every 20 messages) to balance cost and quality.
Memory Best Practices
Use meaningful session IDs
Combine user ID + context (e.g., 'user-123-order-support') for better organization and debugging.
Set appropriate TTLs
Expire old conversations to prevent unbounded storage growth and comply with data retention policies.
Handle memory failures gracefully
If Redis is unavailable, fall back to in-memory or stateless mode rather than failing completely.
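A sketch of that fallback, as a wrapper around the Redis store (FallbackChatMemory is a hypothetical name; losing some context beats losing the conversation entirely):

import java.util.List;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.chat.messages.Message;
import org.springframework.dao.DataAccessException;

// Hypothetical wrapper: try Redis first, degrade to process-local
// memory when Redis is unreachable (Spring Data surfaces connection
// problems as DataAccessException subclasses).
public class FallbackChatMemory implements ChatMemory {

    private final ChatMemory primary;                  // e.g. RedisChatMemory
    private final ChatMemory fallback = new InMemoryChatMemory();

    public FallbackChatMemory(ChatMemory primary) {
        this.primary = primary;
    }

    @Override
    public void add(String conversationId, List<Message> messages) {
        try {
            primary.add(conversationId, messages);
        } catch (DataAccessException e) {
            fallback.add(conversationId, messages);    // degraded mode
        }
    }

    @Override
    public List<Message> get(String conversationId, int lastN) {
        try {
            return primary.get(conversationId, lastN);
        } catch (DataAccessException e) {
            return fallback.get(conversationId, lastN);
        }
    }

    @Override
    public void clear(String conversationId) {
        try {
            primary.clear(conversationId);
        } catch (DataAccessException e) {
            fallback.clear(conversationId);
        }
    }
}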
Monitor memory usage
Track conversation lengths and storage size. Alert if conversations grow unexpectedly large.
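One way to get that visibility, sketched with Micrometer (assumes a MeterRegistry bean, e.g. from spring-boot-starter-actuator; the metric name is made up for illustration):

import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.stereotype.Component;

// Hypothetical metrics helper: record how long conversations grow
// so dashboards and alerts can flag runaway sessions.
@Component
public class ChatMemoryMetrics {

    private final DistributionSummary conversationLength;

    public ChatMemoryMetrics(MeterRegistry registry) {
        this.conversationLength = DistributionSummary.builder("chat.memory.conversation.length")
            .description("Stored messages per conversation")
            .register(registry);
    }

    // Call after each exchange, e.g. from the chat controller
    public void record(ChatMemory memory, String conversationId) {
        conversationLength.record(memory.get(conversationId, Integer.MAX_VALUE).size());
    }
}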