    Level 3
    25 min read

    Memory & Conversations

    Build AI applications that remember context

    Spring AI Craft Team
    Updated Dec 2024

    The Memory Problem

    Large Language Models are inherently stateless. Each API call is independent — the model has no memory of previous interactions. This might seem like a limitation, but it's actually by design. Statelessness makes LLMs scalable and predictable, but it creates a challenge when building conversational applications that feel natural and human-like.

    Consider this scenario: a user asks "What's the capital of France?" and the AI responds "Paris." Then the user follows up with "What's the population?" Without memory, the AI has no idea what "the" refers to — it doesn't know you were talking about Paris. Every conversation would require restating all context, making the user experience frustrating and unnatural. Imagine if every time you talked to a friend, they had no recollection of anything you'd ever discussed before — that's what stateless AI feels like.

    The impact on user experience is significant. Users expect AI assistants to behave like human conversationalists — remembering names, preferences, and the thread of discussion. A customer service bot that can't recall the issue a user described two messages ago is essentially useless. An AI coding assistant that forgets what project you're working on after each question becomes tedious. Memory is what transforms a simple Q&A system into a genuine conversational partner.

    LangChain4j solves this with ChatMemory — a mechanism that stores conversation history and automatically includes it with each new request. The result is AI that feels like it truly understands and remembers your conversation, even though the underlying model is stateless. Behind the scenes, LangChain4j is simply prepending the conversation history to each new prompt, but to the user, it appears seamless.

    Without Memory

    User: What's the capital of France?

    AI: Paris is the capital of France.

    User: What's its population?

    AI: I'm not sure what you're referring to. Could you please specify?

    With Memory

    User: What's the capital of France?

    AI: Paris is the capital of France.

    User: What's its population?

    AI: Paris has a population of approximately 2.1 million in the city proper.

    ChatMemory Basics

    LangChain4j provides the ChatMemory interface with several implementations. The simplest is MessageWindowChatMemory, which keeps the last N messages in memory. This is perfect for most conversational applications where you need recent context but don't need to remember everything from hours or days ago.

    The window-based approach is elegant because it naturally handles the token limit problem. LLMs have maximum context lengths (typically 4K to 128K tokens), and you can't send infinite conversation history. By keeping only recent messages, MessageWindowChatMemory ensures you never exceed these limits while maintaining the most relevant context. It's like how humans naturally focus on recent conversation when chatting — you don't need to remember every word from an hour ago to continue a productive discussion.

    Simple In-Memory Chat

    Here's the most basic way to add memory to your chat application. The memory automatically tracks user messages and AI responses, including them in subsequent prompts. Notice how little code is required — LangChain4j's AI Services abstraction handles all the complexity:

    MemoryChatService.java
    import dev.langchain4j.memory.ChatMemory;
    import dev.langchain4j.memory.chat.MessageWindowChatMemory;
    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;
    import dev.langchain4j.service.AiServices;

    public class MemoryChatService {

        interface Assistant {
            String chat(String userMessage);
        }

        public static void main(String[] args) {
            ChatLanguageModel model = OpenAiChatModel.builder()
                    .apiKey(System.getenv("OPENAI_API_KEY"))
                    .modelName("gpt-4o-mini")
                    .build();

            // Create memory that keeps the last 10 messages
            ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);

            // Create an AI service backed by that memory
            Assistant assistant = AiServices.builder(Assistant.class)
                    .chatLanguageModel(model)
                    .chatMemory(memory)
                    .build();

            // Conversation with memory
            System.out.println(assistant.chat("My name is Alice"));
            // "Hello Alice! Nice to meet you."
            System.out.println(assistant.chat("What is my name?"));
            // "Your name is Alice."
        }
    }

    How It Works Under the Hood

    When you call chat(), LangChain4j retrieves all messages from memory, appends your new message, sends everything to the LLM, and then stores both the user message and AI response back into memory. This happens automatically — you just call chat() like normal.
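    To make this concrete, here is a minimal sketch of that cycle written by hand against the lower-level API, without the AI Services abstraction. It is illustrative only, using the LangChain4j 0.x method names; model is the ChatLanguageModel from the previous example:

    // 1. Store the incoming user message
    ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);
    memory.add(UserMessage.from("My name is Alice"));

    // 2. Send the entire remembered history to the model
    Response<AiMessage> reply = model.generate(memory.messages());

    // 3. Store the AI response so the next turn can see it
    memory.add(reply.content());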

    Memory Window Size

    Choosing the right window size is a balance between context and cost. More messages mean better context but higher token usage and slower responses. Start with 10-20 messages and adjust based on your use case:

    MemoryWindowExamples.java
    // Small window: Fast, cheap, limited context
    ChatMemory quickChat = MessageWindowChatMemory.withMaxMessages(5);

    // Medium window: Good balance for most apps
    ChatMemory standardChat = MessageWindowChatMemory.withMaxMessages(20);

    // Large window: Maximum context, higher cost
    ChatMemory deepChat = MessageWindowChatMemory.withMaxMessages(50);

    // Token-based window: More precise control
    ChatMemory tokenBased = TokenWindowChatMemory.builder()
            .maxTokens(4000, new OpenAiTokenizer()) // ~3000 words of context
            .build();

    Multi-User Memory Management

    In real applications, you'll have multiple users chatting simultaneously. Each user needs their own isolated memory — you don't want User A's conversation leaking into User B's chat. This isolation is critical for both privacy and functionality. Imagine a banking chatbot that confused one customer's account details with another's — it would be a disaster.

    LangChain4j handles this with ChatMemoryProvider, a factory that creates or retrieves a memory instance for a given memory ID. Typically this ID corresponds to a user ID, session ID, or conversation thread ID, and each unique ID gets its own isolated memory space, completely separate from all others.

    This pattern also enables powerful features like conversation threading. A single user might have multiple parallel conversations — one about their order, another about a technical question. By using different memory IDs for each thread, you can maintain completely separate contexts for each conversation, just like how messaging apps keep different chat threads independent.

    MultiUserChatService.java
    import dev.langchain4j.memory.chat.ChatMemoryProvider;
    import dev.langchain4j.memory.chat.MessageWindowChatMemory;
    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;
    import dev.langchain4j.service.AiServices;
    import dev.langchain4j.service.MemoryId;
    import dev.langchain4j.service.UserMessage;

    public class MultiUserChatService {

        interface Assistant {
            String chat(@MemoryId String userId, @UserMessage String message);
        }

        public static void main(String[] args) {
            ChatLanguageModel model = OpenAiChatModel.builder()
                    .apiKey(System.getenv("OPENAI_API_KEY"))
                    .modelName("gpt-4o-mini")
                    .build();

            // Memory provider creates separate memory for each user
            ChatMemoryProvider memoryProvider = memoryId -> MessageWindowChatMemory.builder()
                    .id(memoryId)
                    .maxMessages(20)
                    .build();

            Assistant assistant = AiServices.builder(Assistant.class)
                    .chatLanguageModel(model)
                    .chatMemoryProvider(memoryProvider)
                    .build();

            // Each user has isolated memory
            assistant.chat("user-123", "I love hiking");
            assistant.chat("user-456", "I love cooking"); // memories are separate

            assistant.chat("user-123", "What do I love?"); // "You mentioned that you love hiking!"
            assistant.chat("user-456", "What do I love?"); // "You mentioned that you love cooking!"
        }
    }

    Memory Leak Warning

    In-memory storage grows unboundedly if you create new memory IDs without cleanup. In production, use a persistent store (Redis, database) with TTL policies, or implement periodic cleanup of inactive conversations.
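    One way to implement that cleanup is a TTL wrapper around ChatMemoryProvider. The sketch below is a hypothetical in-process version (the class name, 30-day TTL, and window size are illustrative assumptions); in production, Redis key expiration or a scheduled database job plays the same role:

    ExpiringMemoryProvider.java
    import dev.langchain4j.memory.ChatMemory;
    import dev.langchain4j.memory.chat.ChatMemoryProvider;
    import dev.langchain4j.memory.chat.MessageWindowChatMemory;

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ExpiringMemoryProvider implements ChatMemoryProvider {

        private record Entry(ChatMemory memory, Instant lastUsed) {}

        private final Map<Object, Entry> entries = new ConcurrentHashMap<>();
        private final Duration ttl = Duration.ofDays(30);

        @Override
        public ChatMemory get(Object memoryId) {
            // Create on first use, refresh the timestamp on every access
            return entries.compute(memoryId, (id, old) -> new Entry(
                    old != null ? old.memory()
                                : MessageWindowChatMemory.builder().id(id).maxMessages(20).build(),
                    Instant.now()
            )).memory();
        }

        // Call periodically (e.g. from a @Scheduled task) to drop stale conversations
        void evictInactive() {
            Instant cutoff = Instant.now().minus(ttl);
            entries.values().removeIf(e -> e.lastUsed().isBefore(cutoff));
        }
    }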

    Persistent Memory Storage

    In-memory storage is great for development and testing, but production needs persistence: with in-memory storage, every conversation is lost the moment the application restarts. For enterprise applications where continuity matters, that is unacceptable. Users expect to return to a conversation days later and have the AI remember what was discussed.

    LangChain4j integrates with various data stores to persist conversation history across restarts and even across multiple application instances. Whether you're using PostgreSQL, MongoDB, Redis, or any other database, you can implement the simple ChatMemoryStore interface to plug in your preferred storage backend. This flexibility means you can use whatever database your team already knows and operates.

    Persistent memory also enables powerful features like conversation history APIs, analytics on user interactions, and compliance with data retention requirements. You can query past conversations, export chat logs for review, build dashboards showing conversation trends, and even implement features like "continue from where you left off" when users return after days or weeks.

    Custom ChatMemoryStore

    Implement the ChatMemoryStore interface to store messages in any backend. The interface is simple — just three methods for getting, updating, and deleting messages:

    JpaChatMemoryStore.java
    import dev.langchain4j.data.message.ChatMessage;
    import dev.langchain4j.store.memory.chat.ChatMemoryStore;
    import org.springframework.stereotype.Component;

    import java.time.Instant;
    import java.util.List;

    @Component
    public class JpaChatMemoryStore implements ChatMemoryStore {

        // ChatMessageRepository and ChatMessageEntity are your own JPA types
        private final ChatMessageRepository repository;

        public JpaChatMemoryStore(ChatMessageRepository repository) {
            this.repository = repository;
        }

        @Override
        public List<ChatMessage> getMessages(Object memoryId) {
            return repository.findByMemoryIdOrderByCreatedAt(memoryId.toString())
                    .stream()
                    .map(this::toChatMessage) // maps entities back to ChatMessage (omitted here)
                    .toList();
        }

        @Override
        public void updateMessages(Object memoryId, List<ChatMessage> messages) {
            // Replace the stored conversation with the current message list
            repository.deleteByMemoryId(memoryId.toString());
            messages.forEach(msg -> {
                ChatMessageEntity entity = new ChatMessageEntity();
                entity.setMemoryId(memoryId.toString());
                entity.setContent(msg.text());
                entity.setType(msg.type().name());
                entity.setCreatedAt(Instant.now());
                repository.save(entity);
            });
        }

        @Override
        public void deleteMessages(Object memoryId) {
            repository.deleteByMemoryId(memoryId.toString());
        }
    }

    Using the Persistent Store

    PersistentMemoryConfig.java
    import dev.langchain4j.memory.chat.ChatMemoryProvider;
    import dev.langchain4j.memory.chat.MessageWindowChatMemory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class PersistentMemoryConfig {

        @Bean
        public ChatMemoryProvider chatMemoryProvider(JpaChatMemoryStore store) {
            return memoryId -> MessageWindowChatMemory.builder()
                    .id(memoryId)
                    .maxMessages(50)
                    .chatMemoryStore(store) // use the persistent store
                    .build();
        }
    }

    Survives Restarts: Conversations persist across app deployments.

    Scalable: Multiple instances share the same memory.

    Auditable: Query and analyze conversation history.

    Memory Best Practices

    Implementing memory correctly can make or break your AI application's user experience and cost efficiency. Here are battle-tested practices from production deployments that will help you avoid common pitfalls and build robust conversational systems.

    1. Start Small, Scale Up

    Begin with a small message window (10 messages) and increase only if users report context issues. Larger windows mean higher costs and slower responses.

    2. Summarize Long Conversations

    For very long conversations, periodically summarize older messages and store the summary instead of raw messages. This preserves context while reducing token usage.
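    The setup shown above does not do this automatically, but the idea can be sketched as a helper that asks the model itself to compress the older turns. The class name, thresholds, and prompt wording below are illustrative assumptions:

    ConversationCompactor.java
    import dev.langchain4j.data.message.ChatMessage;
    import dev.langchain4j.data.message.SystemMessage;
    import dev.langchain4j.memory.ChatMemory;
    import dev.langchain4j.model.chat.ChatLanguageModel;

    import java.util.ArrayList;
    import java.util.List;

    public class ConversationCompactor {

        // Compress everything except the 10 most recent messages into one summary
        public static void compactIfNeeded(ChatMemory memory, ChatLanguageModel model) {
            List<ChatMessage> history = new ArrayList<>(memory.messages());
            if (history.size() < 30) {
                return; // still short enough to send verbatim
            }
            List<ChatMessage> older = history.subList(0, history.size() - 10);
            List<ChatMessage> recent = history.subList(history.size() - 10, history.size());

            // Ask the model itself to write the summary of the older turns
            String summary = model.generate(
                    "Summarize this conversation, keeping names, preferences, "
                    + "and open questions:\n" + older);

            // Replace the raw history with one summary message plus the recent turns
            memory.clear();
            memory.add(SystemMessage.from("Summary of the earlier conversation: " + summary));
            recent.forEach(memory::add);
        }
    }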

    3. Implement TTL Policies

    Set expiration times on conversation memory. A user who hasn't chatted in 30 days probably doesn't need their old conversation preserved.

    4. Consider Privacy

    Stored conversations may contain sensitive information. Implement proper access controls, encryption, and provide users the ability to delete their chat history.
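    The deleteMessages method from the JpaChatMemoryStore shown earlier is the natural hook for that last point. A hypothetical Spring endpoint (the route and controller name are assumptions) could expose it:

    ConversationPrivacyController.java
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.DeleteMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class ConversationPrivacyController {

        private final JpaChatMemoryStore store;

        ConversationPrivacyController(JpaChatMemoryStore store) {
            this.store = store;
        }

        @DeleteMapping("/users/{userId}/conversation")
        ResponseEntity<Void> deleteConversation(@PathVariable String userId) {
            store.deleteMessages(userId); // removes all persisted messages for this memory ID
            return ResponseEntity.noContent().build();
        }
    }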

    🎉 Level 3 Complete!

    You've mastered memory and conversations! Your AI applications can now maintain context across interactions, handle multiple users, and persist conversations to databases. In the next level, we'll explore Tools and Function Calling — enabling your AI to take actions in the real world.
