Core AI Concepts
Understand the fundamental concepts of AI and how they apply to Spring AI development
Understanding Spring AI Core Concepts
Foundation for building AI-powered applications
This tutorial explains the core concepts used in Spring AI. Reading it carefully will help you understand the principles behind the framework's design. These concepts form the foundation for building intelligent applications with Spring AI.
1. Artificial Intelligence Models
AI models are algorithms designed specifically to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from massive datasets, these models can make predictions, generate text, images, or other outputs, enabling cross-industry applications.
AI models come in various types, each suited to specific scenarios. While ChatGPT and its generative AI capabilities have attracted a large user base with their text input/output capabilities, numerous models and companies offer diverse input/output formats.
Model Types by Input/Output
| Input | Output | Examples |
|---|---|---|
| Text | Text | ChatGPT, Claude, Gemini |
| Text | Image | DALL-E, Midjourney, Stable Diffusion |
| Image | Text | GPT-4 Vision, Claude Vision |
| Text | Audio | OpenAI TTS, ElevenLabs |
| Audio | Text | Whisper, AssemblyAI |
| Text | Numeric (Vector) | Text Embeddings (OpenAI, Cohere) |
Pre-trained Models
What makes models like GPT unique is their pre-trained nature (the "P" in GPT stands for "pre-trained"). This makes AI a general-purpose development tool that doesn't require a deep background in machine learning or model training.
2. Prompts
Prompts are the language-based inputs that guide AI models toward specific outputs. To users familiar with ChatGPT, a prompt may seem like nothing more than the text entered in a dialog box, but it carries more structure than that: in many AI models, the prompt text is not a simple string.
ChatGPT's API allows multiple text inputs to be included in a single prompt, with each input assigned a specific role:
- System Role: used to instruct the model's behavior and set the interaction context
- User Role: typically represents the user's input and questions
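A minimal sketch of how these roles map onto Spring AI's message types, assuming a `ChatModel` or `ChatClient` is configured elsewhere (the message classes live in `org.springframework.ai.chat.messages`):

```java
import java.util.List;

import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;

// Each text input carries an explicit role:
SystemMessage system = new SystemMessage("You are a concise technical assistant.");
UserMessage user = new UserMessage("Explain embeddings in one paragraph.");

// The prompt bundles the role-tagged messages that are sent to the model.
Prompt prompt = new Prompt(List.of(system, user));
```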
Prompt Engineering
The importance of interaction design has given rise to the independent discipline of "Prompt Engineering." Investing time in carefully designing prompts can significantly improve output quality. Research has found that effective prompts like "Take a deep breath and solve this step by step" can dramatically improve model performance.
3. Prompt Templates
Creating effective prompts requires establishing the request context and replacing placeholders in the template with specific values entered by the user. Spring AI uses the OSS library StringTemplate to implement this functionality.
Example Prompt Template:

```
Tell me a {adjective} joke about {content}.
```

In Spring AI, prompt templates can be compared to "views" in the Spring MVC architecture. By providing a model object (usually a java.util.Map) to populate the placeholders, the "rendered" string becomes the prompt content sent to the AI model.
Using Prompt Templates in Spring AI:
```java
PromptTemplate promptTemplate = new PromptTemplate("Tell me a {adjective} joke about {content}.");
Prompt prompt = promptTemplate.create(Map.of("adjective", "funny", "content", "programming"));
String response = chatClient.prompt(prompt).call().content();
```

4. Embedding Vectors
Embedding vectors are numerical representations of text, images, or videos that capture the relationships between pieces of content. They do this by converting content into arrays of floating-point numbers (vectors).
Semantic Space Visualization:

```
"Happy"  → [0.2, 0.8, 0.1, ...]
"Joyful" → [0.25, 0.75, 0.15, ...]
"Sad"    → [0.9, 0.1, 0.8, ...]
```

Similar meanings = similar vectors = closer together in semantic space.
Practical Applications
Embeddings enable text classification, semantic search, and product recommendations by identifying and grouping related concepts based on their "position" in semantic space.
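A rough sketch of what this looks like with Spring AI's EmbeddingModel abstraction (assuming an auto-configured `EmbeddingModel` bean; recent versions return a `float[]` from `embed(String)`, and the cosine-similarity helper below is our own, not a framework API):

```java
// Turn each word into its vector representation.
float[] happy  = embeddingModel.embed("Happy");
float[] joyful = embeddingModel.embed("Joyful");
float[] sad    = embeddingModel.embed("Sad");

// Our own helper: cosine similarity between two vectors (1.0 = same direction).
static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Expect cosine(happy, joyful) > cosine(happy, sad).
```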
5. Tokens
Tokens are the basic units an AI model operates on. The model converts words into tokens on input and converts tokens back into words on output. In English, one token ≈ 0.75 words.
- Tokens = Cost: when using managed AI models, cost is determined by token count, and both inputs and outputs count.
- Context Window: each model has a token limit that restricts how much text can be processed per API call.
Model Context Windows
| Model | Context Window |
|---|---|
| GPT-3.5 | 4K tokens |
| GPT-4 | 8K / 16K / 32K tokens |
| Claude | 100K+ tokens |
| Latest Research | 1M+ tokens |
Reference: Shakespeare's complete works ≈ 900,000 words ≈ 1.2 million tokens
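A quick back-of-the-envelope sketch of that arithmetic (the per-token price below is hypothetical; check your provider's current pricing):

```java
// Rough token estimate from word count (1 token ≈ 0.75 English words).
long words = 900_000;                    // Shakespeare's complete works
long tokens = Math.round(words / 0.75);  // ≈ 1.2 million tokens

// Hypothetical price of $0.50 per million input tokens.
double costUsd = tokens / 1_000_000.0 * 0.50;
System.out.printf("~%d tokens, ~$%.2f to send as input%n", tokens, costUsd);
```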
6. Structured Output
AI model outputs are traditionally returned as java.lang.String, even when requested in JSON format. The result may be a correct JSON string, but it's not a JSON data structure—it's always a string type.
Spring AI Structured Output:
```java
// Define your output structure
record MovieReview(
        String title,
        int rating,
        String summary,
        List<String> pros,
        List<String> cons
) {}

// Spring AI automatically maps the LLM output to your record
MovieReview review = chatClient
        .prompt("Review the movie 'Inception'")
        .call()
        .entity(MovieReview.class);

// Now use review.title(), review.rating(), etc.
```

Spring AI Magic
Spring AI automatically adds instructions to direct the LLM to generate responses that can be mapped to your Java objects, handling the complexity for you.
7. Retrieval Augmented Generation (RAG)
How can we enable AI models to acquire information beyond their training data? GPT's dataset only extends to its training cutoff, so RAG technology has emerged to address the challenge of integrating relevant data into prompts.
RAG Pipeline Overview:

Documents → ETL Process → Vector DB → Semantic Search → AI Response
Three Techniques for Custom Data
1. Fine-tuning
Adjusting model weights. Challenging and resource-intensive for large models.
2. Prompt Stuffing (RAG) ⭐
Embed relevant data into prompts. Spring AI provides full support for this approach, as shown in the sketch after this list.
3. Tool Invocation
Register tools to connect LLMs with external APIs for real-time data.
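A minimal prompt-stuffing sketch, assuming a `VectorStore` bean already populated with document chunks (builder and accessor names follow recent Spring AI releases and have changed between versions, so treat the exact calls as illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

String question = "How do I configure retries?";

// 1. Semantic search: find the chunks most similar to the question.
List<Document> docs = vectorStore.similaritySearch(
        SearchRequest.builder().query(question).topK(3).build());

// 2. Prompt stuffing: embed the retrieved text into the prompt's context.
String context = docs.stream()
        .map(Document::getText)
        .collect(Collectors.joining("\n---\n"));

String answer = chatClient.prompt()
        .system("Answer using only the following context:\n" + context)
        .user(question)
        .call()
        .content();
```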
Document Segmentation Rules
- Split while maintaining semantic boundaries (don't split paragraphs or code mid-way)
- Each fragment should be a small percentage of the AI model's token limit
8. Tool Invocation
Large Language Models (LLMs) are in a fixed state after training, resulting in outdated knowledge and an inability to access or modify external data. The tool invocation mechanism addresses these shortcomings by connecting LLMs to external system APIs.
Tool Call Flow:

1. The model receives tool definitions with names, descriptions, and parameter schemas
2. The model decides to invoke a tool and returns the tool name + parameters
3. The application executes the tool with the provided parameters
4. The results are returned to the model, which generates the final response
Spring AI Tool Example:
```java
@Service
public class WeatherService {

    @Tool(description = "Get current weather for a location")
    public WeatherInfo getWeather(
            @ToolParam(description = "City name") String city,
            @ToolParam(description = "Country code") String country) {
        // Call the weather API and return the result
        return weatherApiClient.getCurrentWeather(city, country);
    }
}
```

9. Evaluating AI Responses
Effectively evaluating the response output of an AI system is crucial for ensuring the accuracy and usability of the final application. Several emerging technologies support this evaluation using pre-trained models themselves.
Evaluation Metrics
- Relevance: does the response address the query?
- Coherence: is the response logically structured?
- Factual Accuracy: is the information correct?
Self-Evaluation Approach
Submit both the user's request and the AI model's response to the model, then ask whether the response matches the provided data. Vector database information can enhance this evaluation process.
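A minimal sketch of that self-evaluation loop using the plain ChatClient API (the yes/no judging prompt is our own convention; Spring AI also ships dedicated evaluator utilities you may prefer):

```java
String question = "What is the capital of France?";
String answer = chatClient.prompt().user(question).call().content();

// Ask the model to judge its own response against the original request.
String verdict = chatClient.prompt()
        .user("""
              Question: %s
              Response: %s
              Does the response correctly and fully answer the question?
              Reply YES or NO with a one-sentence reason.
              """.formatted(question, answer))
        .call()
        .content();
```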
10. LLM Interview Questions & Answers
Common interview questions about Large Language Models and AI concepts to help you prepare for technical discussions.
Q1. What is tokenization, and why is it important in LLMs?
Tokenization is the process of splitting text into smaller units called tokens, which can be words, subwords, or even characters. For instance, the word "tokenization" might be broken down into smaller subwords like "token" and "ization."
This step is crucial because LLMs do not understand raw text directly. Instead, they process sequences of numbers that represent these tokens. Effective tokenization allows models to:
- Handle various languages
- Manage rare words
- Reduce vocabulary size, improving efficiency and performance
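For a concrete feel, a small sketch using the third-party JTokkit library (`com.knuddels.jtokkit`), which implements the BPE tokenizers used by OpenAI models; the API shown is from memory, so verify it against the library's documentation:

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

// "tokenization" is split into subword tokens rather than kept whole.
int count = encoding.countTokens("tokenization");
System.out.println("tokenization -> " + count + " tokens");
```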
Q2. What are LoRA and QLoRA?
LoRA and QLoRA are techniques designed to optimize the fine-tuning of Large Language Models, focusing on reducing memory usage and enhancing efficiency.
- LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning method that introduces new trainable parameters without increasing model size. It works by adding low-rank matrix adaptations to existing layers, allowing significant performance improvements while keeping resource consumption low. Ideal for environments with limited computational resources.
- QLoRA (Quantized LoRA): builds on LoRA by incorporating quantization (4-bit NormalFloat, Double Quantization, Paged Optimizers) to further optimize memory usage. By reducing the precision of model weights (e.g., from 16-bit to 4-bit) while retaining accuracy, QLoRA enables fine-tuning with a minimal memory footprint.
Q3. What is beam search, and how does it differ from greedy decoding?
Beam search is a search algorithm used during text generation to find the most likely sequence of words.
- Greedy Decoding: chooses the single highest-probability word at each step
- Beam Search: explores multiple possible sequences in parallel, maintaining the top k candidates (beams)
Beam search balances between finding high-probability sequences and exploring alternative paths, leading to more coherent and contextually appropriate outputs.
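A toy sketch of the difference, where the `nextTokenProbs` function is a made-up stand-in for the model's next-token distribution:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

record Beam(List<String> tokens, double logProb) {}

// Made-up stand-in for the model: probability of each candidate next token.
static Map<String, Double> nextTokenProbs(List<String> prefix) {
    return Map.of("cat", 0.5, "dog", 0.3, "<end>", 0.2);
}

static List<String> beamSearch(int k, int steps) {
    List<Beam> beams = List.of(new Beam(List.of(), 0.0));
    for (int step = 0; step < steps; step++) {
        List<Beam> candidates = new ArrayList<>();
        for (Beam beam : beams) {
            // Expand every surviving sequence with every candidate token...
            nextTokenProbs(beam.tokens()).forEach((token, p) -> {
                List<String> extended = new ArrayList<>(beam.tokens());
                extended.add(token);
                candidates.add(new Beam(extended, beam.logProb() + Math.log(p)));
            });
        }
        // ...then keep only the k highest-scoring sequences (the "beams").
        candidates.sort(Comparator.comparingDouble(Beam::logProb).reversed());
        beams = candidates.subList(0, Math.min(k, candidates.size()));
    }
    return beams.get(0).tokens();
}

// Greedy decoding is the special case k = 1: only the single
// highest-probability token survives at each step.
```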
Q4. Explain the concept of temperature in LLM text generation.
Temperature is a hyperparameter that controls the randomness of text generation by adjusting the probability distribution over possible next tokens.
- ~0: highly deterministic
- 0.7: balanced
- >1: more diverse/creative
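A self-contained sketch of the mechanics: temperature divides the logits before the softmax, so low values sharpen the distribution and high values flatten it (the logit values here are made up):

```java
// Softmax over logits scaled by 1/temperature.
static double[] softmaxWithTemperature(double[] logits, double temperature) {
    double[] probs = new double[logits.length];
    double sum = 0;
    for (int i = 0; i < logits.length; i++) {
        probs[i] = Math.exp(logits[i] / temperature);
        sum += probs[i];
    }
    for (int i = 0; i < probs.length; i++) {
        probs[i] /= sum;
    }
    return probs;
}

// Made-up logits for three candidate tokens.
double[] logits = {2.0, 1.0, 0.5};
// temperature 0.1 -> nearly one-hot (deterministic choice)
// temperature 1.0 -> moderate spread
// temperature 2.0 -> flatter distribution, more random sampling
```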
Q5. What are Sequence-to-Sequence Models?
Sequence-to-Sequence (Seq2Seq) Models are neural network architectures designed to transform one sequence of data into another. They're used for tasks with variable-length inputs and outputs:
- Machine Translation: English → Spanish
- Text Summarization: Article → Summary
- Chatbots: Query → Response
Q6. What role do embeddings play in LLMs, and how are they initialized?
Embeddings are dense, continuous vector representations of tokens that capture semantic and syntactic information. They map discrete tokens (words or subwords) into a high-dimensional space suitable for neural network input.
Initialization methods:
- Random initialization
- Pretrained vectors such as Word2Vec or GloVe
During training, embeddings are fine-tuned to capture task-specific nuances, enhancing model performance.
Q7. What is Next Sentence Prediction, and how is it useful in language modeling?
Next Sentence Prediction (NSP) is a technique used in training models like BERT. It helps models understand relationships between two sentences—crucial for question answering, dialogue generation, and information retrieval.
Training process:
- 50% of time: Second sentence is the actual next sentence (positive pairs)
- 50% of time: Second sentence is random from corpus (negative pairs)
The model learns to classify whether the second sentence correctly follows the first.
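A small sketch of how such training pairs could be constructed from a corpus (illustrative only; real BERT pretraining does this at a vastly larger scale):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

record SentencePair(String first, String second, boolean isNext) {}

static List<SentencePair> buildNspPairs(List<String> sentences, Random random) {
    List<SentencePair> pairs = new ArrayList<>();
    for (int i = 0; i < sentences.size() - 1; i++) {
        if (random.nextBoolean()) {
            // Positive pair: the actual next sentence (label = true).
            pairs.add(new SentencePair(sentences.get(i), sentences.get(i + 1), true));
        } else {
            // Negative pair: a random sentence from the corpus (label = false).
            String other = sentences.get(random.nextInt(sentences.size()));
            pairs.add(new SentencePair(sentences.get(i), other, false));
        }
    }
    return pairs;
}
```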
Q8. How does prompt engineering influence the output of LLMs?
Prompt engineering involves crafting input prompts to guide an LLM's output effectively. Since LLMs are highly sensitive to input phrasing, a well-designed prompt can significantly influence response quality and relevance.
Key benefits:
- Adding context improves accuracy
- Specific instructions enhance task performance
- Essential for zero-shot and few-shot learning scenarios
Q9. What is overfitting in machine learning, and how can it be prevented?
Overfitting occurs when a model performs well on training data but poorly on unseen data. The model learns noise and outliers, making it too tailored to the training set.
Prevention techniques:
- Regularization: add a penalty to the loss function
- Dropout: randomly deactivate neurons during training
- Data augmentation: expand the training dataset
- Early stopping: stop training when validation loss plateaus
Q10. What are Generative and Discriminative models?
- Generative Models: learn the underlying data distribution and can generate new samples. They model the joint probability P(X,Y). Example: a language model predicting the next word.
- Discriminative Models: learn decision boundaries between classes. They model the conditional probability P(Y|X). Example: sentiment analysis classifying text.
In short: Generative models generate data, discriminative models classify it.
Q11. How is GPT-4 different from GPT-3 in terms of capabilities?
| Feature | GPT-3 | GPT-4 |
|---|---|---|
| Parameters | 175 billion | ~1 trillion (estimated; not officially disclosed) |
| Modality | Text only | Text + Images |
| Context Window | 4,096 tokens | 8K / 32K tokens (~25,000 words) |
| Accuracy | Standard | Improved, less hallucination |
| Languages | Limited multilingual | 26+ languages with higher accuracy |
Key Concepts Summary
- AI Models: pre-trained algorithms for various input/output types
- Prompts & Templates: dynamic prompt generation
- Embeddings: vector representations for semantic search
- Tokens: the basic unit of cost and context limits
- RAG: integrate custom data into prompts
- Tools: connect models to external APIs