Streaming Response in Spring AI ChatClient
Implement real-time streaming responses for better user experience using Flux and Server-Sent Events
1. Why Use Streaming?
When an LLM generates a response, it produces tokens one at a time. Streaming sends these tokens to the client as they're generated, rather than waiting for the complete response.
Without Streaming
User waits 5-30 seconds staring at a loading spinner until the entire response is ready
With Streaming
User sees words appearing in real-time, like watching someone type — feels instant and interactive
Perceived Speed: Response feels immediate
Better UX: ChatGPT-like experience
Early Cancel: Stop if the response is wrong
2. Basic Streaming with ChatClient
Using stream() Instead of call()
Replace .call() with .stream() to get a reactive Flux:
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;
    import reactor.core.publisher.Flux;

    @Service
    public class StreamingChatService {

        private final ChatClient chatClient;

        public StreamingChatService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
        }

        // Non-streaming (blocks until complete)
        public String chat(String message) {
            return chatClient.prompt()
                    .user(message)
                    .call()
                    .content();  // Returns complete String
        }

        // Streaming (emits tokens as they arrive)
        public Flux<String> chatStream(String message) {
            return chatClient.prompt()
                    .user(message)
                    .stream()
                    .content();  // Returns Flux<String>
        }
    }

Flux Explained
Flux<String> is a Reactor type representing 0 to N items emitted over time. Each emitted item is a token (word or word fragment) from the LLM.
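To see the "0 to N items over time" idea without pulling in Reactor or Spring AI, here is a minimal plain-Java sketch. The class, method, and token values are all hypothetical, invented for illustration: a callback receives tokens one at a time, the way a `Flux<String>` subscriber's `onNext` does, and a completion callback fires at the end, like Reactor's `onComplete` signal.

```java
import java.util.List;
import java.util.function.Consumer;

public class TokenStreamDemo {

    // Simulates an LLM emitting tokens one at a time to a subscriber callback,
    // analogous to flux.subscribe(token -> ...) in Reactor (names are hypothetical)
    static void emitTokens(List<String> tokens, Consumer<String> onNext, Runnable onComplete) {
        for (String token : tokens) {
            onNext.accept(token);   // each call is one emission ("onNext")
        }
        onComplete.run();           // stream finished, like Flux's onComplete signal
    }

    public static void main(String[] args) {
        StringBuilder rendered = new StringBuilder();
        emitTokens(
            List.of("Hel", "lo", ", ", "wor", "ld"),
            rendered::append,                   // UI update per token
            () -> System.out.println(rendered)  // prints "Hello, world"
        );
    }
}
```

The real `Flux` adds asynchrony, backpressure, and error signals on top of this, but the consumer-side shape is the same: handle each token as it arrives rather than waiting for the whole string.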
3. REST Endpoint with Server-Sent Events
Create a Streaming Controller
Use MediaType.TEXT_EVENT_STREAM_VALUE for SSE:
    import lombok.RequiredArgsConstructor;
    import org.springframework.http.MediaType;
    import org.springframework.web.bind.annotation.*;
    import reactor.core.publisher.Flux;

    @RestController
    @RequestMapping("/api/chat")
    @RequiredArgsConstructor
    public class StreamingChatController {

        private final StreamingChatService chatService;

        // Standard endpoint (waits for complete response)
        @PostMapping
        public String chat(@RequestBody ChatRequest request) {
            return chatService.chat(request.message());
        }

        // Streaming endpoint using Server-Sent Events
        @PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
        public Flux<String> chatStream(@RequestBody ChatRequest request) {
            return chatService.chatStream(request.message());
        }
    }

    record ChatRequest(String message) {}

Frontend JavaScript Consumer
    // Consume the SSE endpoint with fetch and a ReadableStream
    // (EventSource only supports GET, so we read the response body directly)
    async function streamChat(message) {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message })
      });

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let fullResponse = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        const chunk = decoder.decode(value);
        fullResponse += chunk;
        // Update UI with each chunk
        document.getElementById('response').textContent = fullResponse;
      }
    }

Real-Time Effect
The response text will appear word by word, creating the familiar ChatGPT-style typing effect.
4. Advanced Streaming Patterns
Access Full ChatResponse Metadata
Get token usage, finish reason, and other metadata from streaming responses:
    public Flux<ChatResponse> chatStreamWithMetadata(String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .chatResponse();  // Returns Flux<ChatResponse> with full metadata
    }

    // Process responses with metadata
    public void processStream(String message) {
        chatStreamWithMetadata(message)
                .doOnNext(response -> {
                    // Access content
                    String content = response.getResult().getOutput().getContent();

                    // Access metadata (available on the final chunk)
                    var metadata = response.getMetadata();
                    if (metadata != null) {
                        Long promptTokens = metadata.getUsage().getPromptTokens();
                        Long completionTokens = metadata.getUsage().getGenerationTokens();
                        System.out.println("Tokens used: " + (promptTokens + completionTokens));
                    }
                })
                .subscribe();
    }

Buffering for Word Boundaries
Buffer tokens to emit complete words instead of fragments:
    public Flux<String> streamByWords(String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content()
                .bufferUntil(token -> token.contains(" ") || token.contains("\n"))
                .map(tokens -> String.join("", tokens));
    }

Timeout and Error Handling
    public Flux<String> streamWithTimeout(String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content()
                .timeout(Duration.ofSeconds(60))
                .onErrorResume(TimeoutException.class,
                        e -> Flux.just("[Response timed out]"))
                .onErrorResume(Exception.class,
                        e -> Flux.just("[Error: " + e.getMessage() + "]"));
    }

5. Best Practices
Always Set Timeouts: Prevent hanging connections with an explicit .timeout(...) on the stream.
Handle Client Disconnection: Spring WebFlux cancels the subscription when an SSE client disconnects; use .doOnCancel() to release resources, or .takeUntilOther() to tie the stream to an explicit stop signal.
Backpressure Handling: Use .onBackpressureBuffer() for slow consumers.
Memory with Tools: Chat memory advisors may not work reliably with streaming; test thoroughly.
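The tips above lean on Reactor operators. As a rough plain-JDK analogy (no Reactor dependency; class and token names are hypothetical), a bounded BlockingQueue shows the problem .onBackpressureBuffer(n) addresses: a fast producer is held back once a slow consumer falls a fixed number of items behind, so memory use stays bounded instead of growing with the backlog.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer: the producer cannot run unboundedly ahead of the consumer,
        // which is roughly the role a bounded backpressure buffer plays for a Flux
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(8);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    buffer.put("token-" + i);   // blocks while the buffer is full
                }
                buffer.put("<done>");           // sentinel marking end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        int consumed = 0;
        while (true) {
            String token = buffer.take();       // slow consumer drains at its own pace
            if (token.equals("<done>")) break;
            consumed++;
        }
        producer.join();
        System.out.println("Consumed " + consumed + " tokens without unbounded buffering");
        // prints "Consumed 100 tokens without unbounded buffering"
    }
}
```

Reactor's operator family also offers dropping and latest-only strategies (.onBackpressureDrop(), .onBackpressureLatest()) when blocking the producer is not acceptable; which strategy fits depends on whether every token must reach the client.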
WebSocket Alternative
For bidirectional communication (e.g., cancellation signals), consider WebSocket instead of SSE:
    import org.springframework.stereotype.Controller;
    import org.springframework.web.reactive.socket.WebSocketHandler;
    import org.springframework.web.reactive.socket.WebSocketSession;
    import reactor.core.publisher.Mono;

    @Controller
    public class ChatWebSocketHandler implements WebSocketHandler {

        private final ChatClient chatClient;

        public ChatWebSocketHandler(ChatClient chatClient) {
            this.chatClient = chatClient;
        }

        @Override
        public Mono<Void> handle(WebSocketSession session) {
            return session.receive()
                    .flatMap(message -> {
                        String userMessage = message.getPayloadAsText();
                        return chatClient.prompt()
                                .user(userMessage)
                                .stream()
                                .content()
                                .map(session::textMessage)
                                .as(session::send);
                    })
                    .then();
        }
    }

What You've Learned
Streaming Benefits: Real-time UX, early cancellation
Basic Streaming: .stream() returns Flux<String>
SSE Endpoints: TEXT_EVENT_STREAM media type
Frontend Integration: Fetch API with ReadableStream
Advanced Patterns: Buffering, metadata, timeouts
Best Practices: Error handling, backpressure