Core AI Concepts
Understand the fundamental concepts of AI and how they apply to Spring AI development
Understanding Spring AI Core Concepts
Foundation for building AI-powered applications
This tutorial explains the core concepts used in Spring AI. Reading it carefully will help you understand the principles behind the framework's design. These concepts form the foundation for building intelligent applications with Spring AI.
1. Artificial Intelligence Models
AI models are algorithms designed specifically to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from massive datasets, these models can make predictions, generate text, images, or other outputs, enabling cross-industry applications.
AI models come in various types, each suited to specific scenarios. While ChatGPT and its generative AI capabilities have attracted a large user base with their text input/output capabilities, numerous models and companies offer diverse input/output formats.
Model Types by Input/Output
| Input | Output | Examples |
|---|---|---|
| Text | Text | ChatGPT, Claude, Gemini |
| Text | Image | DALL-E, Midjourney, Stable Diffusion |
| Image | Text | GPT-4 Vision, Claude Vision |
| Text | Audio | OpenAI TTS, ElevenLabs |
| Audio | Text | Whisper, AssemblyAI |
| Text | Numeric (Vector) | Text Embeddings (OpenAI, Cohere) |
Pre-trained Models
What makes models like GPT unique is their pre-trained nature (the "P" in GPT stands for "pre-trained"). This makes AI a general-purpose development tool that doesn't require a deep background in machine learning or model training.
2. Prompts
Prompts are the language-based inputs that guide AI models toward specific outputs. To users familiar with ChatGPT, a prompt may seem like nothing more than the text entered in a dialog box, but it carries more structure than that: in many AI models, the prompt text is not a simple string.
ChatGPT's API allows multiple text inputs to be included in a single prompt, with each input assigned a specific role:
- System Role: used to instruct the model's behavior and set the interaction context
- User Role: typically represents the user's input and questions
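A minimal sketch of how these roles map onto Spring AI's message types, assuming a `ChatModel` or `ChatClient` is configured elsewhere (the message classes live in `org.springframework.ai.chat.messages`):

```java
import java.util.List;

import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;

// Each text input carries an explicit role:
SystemMessage system = new SystemMessage("You are a concise technical assistant.");
UserMessage user = new UserMessage("Explain embeddings in one paragraph.");

// The prompt bundles the role-tagged messages that are sent to the model.
Prompt prompt = new Prompt(List.of(system, user));
```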
Prompt Engineering
The importance of interaction design has given rise to the independent discipline of "Prompt Engineering." Investing time in carefully designing prompts can significantly improve output quality. Research has found that effective prompts like "Take a deep breath and solve this step by step" can dramatically improve model performance.
3. Prompt Templates
Creating effective prompts requires establishing the request context and replacing placeholders in the template with specific values entered by the user. Spring AI uses the OSS library StringTemplate to implement this functionality.
Example Prompt Template:

```
Tell me a {adjective} joke about {content}.
```

In Spring AI, prompt templates can be compared to "views" in the Spring MVC architecture. By providing a model object (usually a java.util.Map) to populate the placeholders, the "rendered" string becomes the prompt content sent to the AI model.
Using Prompt Templates in Spring AI:
```java
PromptTemplate promptTemplate = new PromptTemplate("Tell me a {adjective} joke about {content}.");
Prompt prompt = promptTemplate.create(Map.of("adjective", "funny", "content", "programming"));
String response = chatClient.prompt(prompt).call().content();
```

4. Embedding Vectors
Embedding vectors are numerical representations of text, images, or videos that capture the relationships between pieces of content. They do this by converting content into arrays of floating-point numbers (vectors).
Semantic Space Visualization:

```
"Happy"  → [0.2, 0.8, 0.1, ...]
"Joyful" → [0.25, 0.75, 0.15, ...]
"Sad"    → [0.9, 0.1, 0.8, ...]
```

Similar meanings = similar vectors = closer together in semantic space.
Practical Applications
Embeddings enable text classification, semantic search, and product recommendations by identifying and grouping related concepts based on their "position" in semantic space.
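A rough sketch of what this looks like with Spring AI's EmbeddingModel abstraction (assuming an auto-configured `EmbeddingModel` bean; recent versions return a `float[]` from `embed(String)`, and the cosine-similarity helper below is our own, not a framework API):

```java
// Turn each word into its vector representation.
float[] happy  = embeddingModel.embed("Happy");
float[] joyful = embeddingModel.embed("Joyful");
float[] sad    = embeddingModel.embed("Sad");

// Our own helper: cosine similarity between two vectors (1.0 = same direction).
static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Expect cosine(happy, joyful) > cosine(happy, sad).
```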
5. Tokens
Tokens are the basic units an AI model operates on. The model converts words into tokens on input and converts tokens back into words on output. In English, one token ≈ 0.75 words.
- Tokens = Cost: when using managed AI models, cost is determined by token count, and both inputs and outputs count.
- Context Window: each model has a token limit that restricts how much text can be processed per API call.
Model Context Windows
| Model | Context Window |
|---|---|
| GPT-3.5 | 4K tokens |
| GPT-4 | 8K / 16K / 32K tokens |
| Claude | 100K+ tokens |
| Latest Research | 1M+ tokens |
Reference: Shakespeare's complete works ≈ 900,000 words ≈ 1.2 million tokens
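A quick back-of-the-envelope sketch of that arithmetic (the per-token price below is hypothetical; check your provider's current pricing):

```java
// Rough token estimate from word count (1 token ≈ 0.75 English words).
long words = 900_000;                    // Shakespeare's complete works
long tokens = Math.round(words / 0.75);  // ≈ 1.2 million tokens

// Hypothetical price of $0.50 per million input tokens.
double costUsd = tokens / 1_000_000.0 * 0.50;
System.out.printf("~%d tokens, ~$%.2f to send as input%n", tokens, costUsd);
```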
6. Structured Output
AI model outputs are traditionally returned as java.lang.String, even when requested in JSON format. The result may be a correct JSON string, but it's not a JSON data structure—it's always a string type.
Spring AI Structured Output:
```java
// Define your output structure
record MovieReview(
        String title,
        int rating,
        String summary,
        List<String> pros,
        List<String> cons
) {}

// Spring AI automatically maps the LLM output to your record
MovieReview review = chatClient
        .prompt("Review the movie 'Inception'")
        .call()
        .entity(MovieReview.class);

// Now use review.title(), review.rating(), etc.
```

Spring AI Magic
Spring AI automatically adds instructions to direct the LLM to generate responses that can be mapped to your Java objects, handling the complexity for you.
7. Retrieval Augmented Generation (RAG)
How can we enable AI models to acquire information beyond their training data? GPT's dataset only extends to its training cutoff, so RAG technology has emerged to address the challenge of integrating relevant data into prompts.
RAG Pipeline Overview:

Documents → ETL Process → Vector DB → Semantic Search → AI Response
Three Techniques for Custom Data
1. Fine-tuning
Adjusting model weights. Challenging and resource-intensive for large models.
2. Prompt Stuffing (RAG) ⭐
Embed relevant data into prompts. Spring AI provides full support for this approach, as shown in the sketch after this list.
3. Tool Invocation
Register tools to connect LLMs with external APIs for real-time data.
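A minimal prompt-stuffing sketch, assuming a `VectorStore` bean already populated with document chunks (builder and accessor names follow recent Spring AI releases and have changed between versions, so treat the exact calls as illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

String question = "How do I configure retries?";

// 1. Semantic search: find the chunks most similar to the question.
List<Document> docs = vectorStore.similaritySearch(
        SearchRequest.builder().query(question).topK(3).build());

// 2. Prompt stuffing: embed the retrieved text into the prompt's context.
String context = docs.stream()
        .map(Document::getText)
        .collect(Collectors.joining("\n---\n"));

String answer = chatClient.prompt()
        .system("Answer using only the following context:\n" + context)
        .user(question)
        .call()
        .content();
```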
Document Segmentation Rules
- Split while maintaining semantic boundaries (don't split paragraphs or code mid-way)
- Each fragment should be a small percentage of the AI model's token limit
8. Tool Invocation
Large Language Models (LLMs) are in a fixed state after training, resulting in outdated knowledge and an inability to access or modify external data. The tool invocation mechanism addresses these shortcomings by connecting LLMs to external system APIs.
Tool Call Flow:

1. The model receives tool definitions with names, descriptions, and parameter schemas
2. The model decides to invoke a tool and returns the tool name + parameters
3. The application executes the tool with the provided parameters
4. The results are returned to the model, which generates the final response
Spring AI Tool Example:
```java
@Service
public class WeatherService {

    @Tool(description = "Get current weather for a location")
    public WeatherInfo getWeather(
            @ToolParam(description = "City name") String city,
            @ToolParam(description = "Country code") String country) {
        // Call the weather API and return the result
        return weatherApiClient.getCurrentWeather(city, country);
    }
}
```

9. Evaluating AI Responses
Effectively evaluating the response output of an AI system is crucial for ensuring the accuracy and usability of the final application. Several emerging technologies support this evaluation using pre-trained models themselves.
Evaluation Metrics
- Relevance: does the response address the query?
- Coherence: is the response logically structured?
- Factual Accuracy: is the information correct?
Self-Evaluation Approach
Submit both the user's request and the AI model's response to the model, then ask whether the response matches the provided data. Vector database information can enhance this evaluation process.
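A minimal sketch of that self-evaluation loop using the plain ChatClient API (the yes/no judging prompt is our own convention; Spring AI also ships dedicated evaluator utilities you may prefer):

```java
String question = "What is the capital of France?";
String answer = chatClient.prompt().user(question).call().content();

// Ask the model to judge its own response against the original request.
String verdict = chatClient.prompt()
        .user("""
              Question: %s
              Response: %s
              Does the response correctly and fully answer the question?
              Reply YES or NO with a one-sentence reason.
              """.formatted(question, answer))
        .call()
        .content();
```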
10. LLM Interview Questions & Answers
Common interview questions about Large Language Models and AI concepts to help you prepare for technical discussions.
Q1. What is tokenization, and why is it important in LLMs?
Tokenization is the process of splitting text into smaller units called tokens, which can be words, subwords, or even characters. For instance, the word "tokenization" might be broken down into smaller subwords like "token" and "ization."
This step is crucial because LLMs do not understand raw text directly. Instead, they process sequences of numbers that represent these tokens. Effective tokenization allows models to:
- Handle various languages
- Manage rare words
- Reduce vocabulary size, improving efficiency and performance
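For a concrete feel, a small sketch using the third-party JTokkit library (`com.knuddels.jtokkit`), which implements the BPE tokenizers used by OpenAI models; the API shown is from memory, so verify it against the library's documentation:

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

// "tokenization" is split into subword tokens rather than kept whole.
int count = encoding.countTokens("tokenization");
System.out.println("tokenization -> " + count + " tokens");
```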
Q2. What are LoRA and QLoRA?
LoRA and QLoRA are techniques designed to optimize the fine-tuning of Large Language Models, focusing on reducing memory usage and enhancing efficiency.
- LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning method that introduces new trainable parameters without increasing model size. It works by adding low-rank matrix adaptations to existing layers, allowing significant performance improvements while keeping resource consumption low. Ideal for environments with limited computational resources.
- QLoRA (Quantized LoRA): builds on LoRA by incorporating quantization (4-bit NormalFloat, Double Quantization, Paged Optimizers) to further optimize memory usage. By reducing the precision of model weights (e.g., from 16-bit to 4-bit) while retaining accuracy, QLoRA enables fine-tuning with a minimal memory footprint.
Q3. What is beam search, and how does it differ from greedy decoding?
Beam search is a search algorithm used during text generation to find the most likely sequence of words.
- Greedy Decoding: chooses the single highest-probability word at each step
- Beam Search: explores multiple possible sequences in parallel, maintaining the top k candidates (beams)
Beam search balances between finding high-probability sequences and exploring alternative paths, leading to more coherent and contextually appropriate outputs.
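A toy sketch of the difference, where the `nextTokenProbs` function is a made-up stand-in for the model's next-token distribution:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

record Beam(List<String> tokens, double logProb) {}

// Made-up stand-in for the model: probability of each candidate next token.
static Map<String, Double> nextTokenProbs(List<String> prefix) {
    return Map.of("cat", 0.5, "dog", 0.3, "<end>", 0.2);
}

static List<String> beamSearch(int k, int steps) {
    List<Beam> beams = List.of(new Beam(List.of(), 0.0));
    for (int step = 0; step < steps; step++) {
        List<Beam> candidates = new ArrayList<>();
        for (Beam beam : beams) {
            // Expand every surviving sequence with every candidate token...
            nextTokenProbs(beam.tokens()).forEach((token, p) -> {
                List<String> extended = new ArrayList<>(beam.tokens());
                extended.add(token);
                candidates.add(new Beam(extended, beam.logProb() + Math.log(p)));
            });
        }
        // ...then keep only the k highest-scoring sequences (the "beams").
        candidates.sort(Comparator.comparingDouble(Beam::logProb).reversed());
        beams = candidates.subList(0, Math.min(k, candidates.size()));
    }
    return beams.get(0).tokens();
}

// Greedy decoding is the special case k = 1: only the single
// highest-probability token survives at each step.
```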
Q4. Explain the concept of temperature in LLM text generation.
Temperature is a hyperparameter that controls the randomness of text generation by adjusting the probability distribution over possible next tokens.
- ~0: highly deterministic
- 0.7: balanced
- >1: more diverse/creative
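A self-contained sketch of the mechanics: temperature divides the logits before the softmax, so low values sharpen the distribution and high values flatten it (the logit values here are made up):

```java
// Softmax over logits scaled by 1/temperature.
static double[] softmaxWithTemperature(double[] logits, double temperature) {
    double[] probs = new double[logits.length];
    double sum = 0;
    for (int i = 0; i < logits.length; i++) {
        probs[i] = Math.exp(logits[i] / temperature);
        sum += probs[i];
    }
    for (int i = 0; i < probs.length; i++) {
        probs[i] /= sum;
    }
    return probs;
}

// Made-up logits for three candidate tokens.
double[] logits = {2.0, 1.0, 0.5};
// temperature 0.1 -> nearly one-hot (deterministic choice)
// temperature 1.0 -> moderate spread
// temperature 2.0 -> flatter distribution, more random sampling
```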
Q5. What are Sequence-to-Sequence Models?
Sequence-to-Sequence (Seq2Seq) Models are neural network architectures designed to transform one sequence of data into another. They're used for tasks with variable-length inputs and outputs:
- Machine Translation: English → Spanish
- Text Summarization: Article → Summary
- Chatbots: Query → Response
Q6. What role do embeddings play in LLMs, and how are they initialized?
Embeddings are dense, continuous vector representations of tokens that capture semantic and syntactic information. They map discrete tokens (words or subwords) into a high-dimensional space suitable for neural network input.
Initialization methods:
- Random initialization
- Pretrained vectors such as Word2Vec or GloVe
During training, embeddings are fine-tuned to capture task-specific nuances, enhancing model performance.
Q7. What is Next Sentence Prediction, and how is it useful in language modeling?
Next Sentence Prediction (NSP) is a technique used in training models like BERT. It helps models understand relationships between two sentences—crucial for question answering, dialogue generation, and information retrieval.
Training process:
- 50% of time: Second sentence is the actual next sentence (positive pairs)
- 50% of time: Second sentence is random from corpus (negative pairs)
The model learns to classify whether the second sentence correctly follows the first.
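A small sketch of how such training pairs could be constructed from a corpus (illustrative only; real BERT pretraining does this at a vastly larger scale):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

record SentencePair(String first, String second, boolean isNext) {}

static List<SentencePair> buildNspPairs(List<String> sentences, Random random) {
    List<SentencePair> pairs = new ArrayList<>();
    for (int i = 0; i < sentences.size() - 1; i++) {
        if (random.nextBoolean()) {
            // Positive pair: the actual next sentence (label = true).
            pairs.add(new SentencePair(sentences.get(i), sentences.get(i + 1), true));
        } else {
            // Negative pair: a random sentence from the corpus (label = false).
            String other = sentences.get(random.nextInt(sentences.size()));
            pairs.add(new SentencePair(sentences.get(i), other, false));
        }
    }
    return pairs;
}
```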
Q8. How does prompt engineering influence the output of LLMs?
Prompt engineering involves crafting input prompts to guide an LLM's output effectively. Since LLMs are highly sensitive to input phrasing, a well-designed prompt can significantly influence response quality and relevance.
Key benefits:
- Adding context improves accuracy
- Specific instructions enhance task performance
- Essential for zero-shot and few-shot learning scenarios
Q9. What is overfitting in machine learning, and how can it be prevented?
Overfitting occurs when a model performs well on training data but poorly on unseen data. The model learns noise and outliers, making it too tailored to the training set.
Prevention techniques:
- Regularization: add a penalty to the loss function
- Dropout: randomly deactivate neurons during training
- Data augmentation: expand the training dataset
- Early stopping: stop training when validation loss plateaus
Q10. What are Generative and Discriminative models?
- Generative Models: learn the underlying data distribution and can generate new samples. They model the joint probability P(X,Y). Example: a language model predicting the next word.
- Discriminative Models: learn decision boundaries between classes. They model the conditional probability P(Y|X). Example: sentiment analysis classifying text.
In short: Generative models generate data, discriminative models classify it.
Q11. How is GPT-4 different from GPT-3 in terms of capabilities?
| Feature | GPT-3 | GPT-4 |
|---|---|---|
| Parameters | 175 billion | ~1 trillion (estimated; not officially disclosed) |
| Modality | Text only | Text + Images |
| Context Window | 4,096 tokens | 8K / 32K tokens (~25,000 words) |
| Accuracy | Standard | Improved, less hallucination |
| Languages | Limited multilingual | 26+ languages with higher accuracy |
Key Concepts Summary
- AI Models: pre-trained algorithms for various input/output types
- Prompts & Templates: dynamic prompt generation
- Embeddings: vector representations for semantic search
- Tokens: the basic unit of cost and context limits
- RAG: integrate custom data into prompts
- Tools: connect models to external APIs