Spring AI Tutorials
    Tutorial 01

    Core AI Concepts

    Understand the fundamental concepts of AI and how they apply to Spring AI development

    Understanding Spring AI Core Concepts

    Foundation for building AI-powered applications

    This tutorial explains the core concepts used in Spring AI. Read it carefully: these concepts form the foundation for building intelligent applications with Spring AI.


    1. Artificial Intelligence Models

    AI models are algorithms designed specifically to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from massive datasets, these models can make predictions, generate text, images, or other outputs, enabling cross-industry applications.

    AI models come in various types, each suited to specific scenarios. While ChatGPT and its generative AI capabilities have attracted a large user base with their text input/output capabilities, numerous models and companies offer diverse input/output formats.

    Model Types by Input/Output

    Input   Output            Examples
    Text    Text              ChatGPT, Claude, Gemini
    Text    Image             DALL-E, Midjourney, Stable Diffusion
    Image   Text              GPT-4 Vision, Claude Vision
    Text    Audio             OpenAI TTS, ElevenLabs
    Audio   Text              Whisper, AssemblyAI
    Text    Numeric (Vector)  Text Embeddings (OpenAI, Cohere)

    Pre-trained Models

    What makes models like GPT unique is their pre-trained nature (the "P" in GPT stands for "pre-trained"). This makes AI a general-purpose development tool that doesn't require a deep background in machine learning or model training.

    2. Prompts

    Prompts are the linguistic input foundation that guides AI models to generate specific outputs. For users familiar with ChatGPT, prompts may simply seem like text entered in a dialog box, but their meaning goes far beyond that. In many AI models, prompt text is not a simple string.

    ChatGPT's API allows multiple text inputs to be included in a single prompt, with each input assigned a specific role:

    • System Role: instructs the model's behavior and sets the interaction context
    • User Role: typically carries the user's input and questions

    Prompt Engineering

    The importance of interaction design has given rise to the independent discipline of "Prompt Engineering." Investing time in carefully designing prompts can significantly improve output quality. Research has found that effective prompts like "Take a deep breath and solve this step by step" can dramatically improve model performance.

    3. Prompt Templates

    Creating effective prompts requires establishing the request context and replacing placeholders in the template with specific values entered by the user. Spring AI uses the OSS library StringTemplate to implement this functionality.

    Example Prompt Template:

    Text Example
    Tell me a {adjective} joke about {content}.

    In Spring AI, prompt templates can be compared to "views" in the Spring MVC architecture. By providing model objects (usually java.util.Map) to populate placeholders, the "rendered" string constitutes the prompt content sent to the AI model.

    Using Prompt Templates in Spring AI:

    Java Example
    PromptTemplate promptTemplate = new PromptTemplate("Tell me a {adjective} joke about {content}.");
    Prompt prompt = promptTemplate.create(Map.of("adjective", "funny", "content", "programming"));
    String response = chatClient.prompt(prompt).call().content();

    4. Embedding Vectors

    Embedding vectors are numerical representations of text, images, or video that capture the semantic relationships between pieces of content. They do this by converting content into arrays of floating-point numbers (vectors).

    Semantic Space Visualization

    "Happy"  → [0.2, 0.8, 0.1, ...]
    "Joyful" → [0.25, 0.75, 0.15, ...]
    "Sad"    → [0.9, 0.1, 0.8, ...]

    Similar meanings = similar vectors = closer in semantic space

    Practical Applications

    Embeddings enable text classification, semantic search, and product recommendations by identifying and grouping related concepts based on their "position" in semantic space.
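    The "position" comparison above is typically computed with cosine similarity. Below is a minimal, self-contained Java sketch; the three-dimensional vectors are the toy values from the visualization, not real embedding output (production embeddings have hundreds or thousands of dimensions):

```java
public class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] happy  = {0.2, 0.8, 0.1};
        double[] joyful = {0.25, 0.75, 0.15};
        double[] sad    = {0.9, 0.1, 0.8};

        System.out.printf("happy vs joyful: %.3f%n", cosine(happy, joyful));
        System.out.printf("happy vs sad:    %.3f%n", cosine(happy, sad));
    }
}
```

    Running this shows "happy" and "joyful" scoring close to 1.0 while "happy" and "sad" score far lower, which is exactly the grouping effect semantic search relies on.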

    5. Tokens

    Tokens are the basic units an AI model operates on. The model converts words into tokens on input and converts tokens back into words on output. In English, one token ≈ 0.75 words.

    Tokens = Cost

    When using managed AI models, cost is determined by token count. Both inputs and outputs count.
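    As a back-of-the-envelope illustration of the word-to-token heuristic and token-based pricing, here is a small Java sketch. The price value is purely hypothetical; real per-token prices vary by model and provider, and real tokenizers count tokens exactly rather than estimating from words:

```java
public class TokenCostEstimator {

    // Rough heuristic from the text above: 1 token ≈ 0.75 English words
    static long estimateTokens(String text) {
        String trimmed = text.strip();
        long words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return Math.round(words / 0.75);
    }

    // Hypothetical flat price per million tokens; inputs and outputs both count
    static double estimateCostUsd(long inputTokens, long outputTokens, double pricePerMillionTokens) {
        return (inputTokens + outputTokens) / 1_000_000.0 * pricePerMillionTokens;
    }

    public static void main(String[] args) {
        String prompt = "Tell me a funny joke about programming";
        System.out.println("Estimated tokens: " + estimateTokens(prompt)); // 7 words ≈ 9 tokens
    }
}
```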

    Context Window

    Models have token limits restricting text processed per API call.

    Model Context Windows

    Model            Context Window
    ChatGPT 3.5      4K tokens
    GPT-4            8K / 16K / 32K tokens
    Claude           100K+ tokens
    Latest research  1M+ tokens

    Reference: Shakespeare's complete works ≈ 900,000 words ≈ 1.2 million tokens

    6. Structured Output

    AI model outputs are traditionally returned as java.lang.String, even when requested in JSON format. The result may be a correct JSON string, but it's not a JSON data structure—it's always a string type.

    Spring AI Structured Output:

    Java Example
    // Define your output structure
    record MovieReview(
            String title,
            int rating,
            String summary,
            List<String> pros,
            List<String> cons
    ) {}

    // Spring AI automatically maps the LLM output to your record
    MovieReview review = chatClient
            .prompt("Review the movie 'Inception'")
            .call()
            .entity(MovieReview.class);

    // Now use review.title(), review.rating(), etc.

    Spring AI Magic

    Spring AI automatically adds instructions to direct the LLM to generate responses that can be mapped to your Java objects, handling the complexity for you.

    7. Retrieval Augmented Generation (RAG)

    How can we enable AI models to acquire information beyond their training data? GPT's dataset only extends to its training cutoff, so RAG technology has emerged to address the challenge of integrating relevant data into prompts.

    RAG Pipeline Overview

    Documents → ETL Process → Vector DB → Semantic Search → AI Response

    Three Techniques for Custom Data

    1. Fine-tuning

    Adjusting model weights. Challenging and resource-intensive for large models.

    2. Prompt Stuffing (RAG) ⭐

    Embed relevant data into prompts. Spring AI provides full support for this approach.

    3. Tool Invocation

    Register tools to connect LLMs with external APIs for real-time data.

    Document Segmentation Rules

    • Split while maintaining semantic boundaries (don't split paragraphs or code mid-way)
    • Each fragment should be a small percentage of the AI model's token limit
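    A minimal Java sketch of the splitting rules above: it cuts only on blank lines (paragraph boundaries) and uses a character budget as a rough stand-in for the model's token limit. A production splitter, such as those in Spring AI's document ETL support, is more sophisticated:

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitter {

    // Split on blank lines so every chunk keeps whole paragraphs,
    // packing paragraphs together until a chunk nears maxChars.
    static List<String> split(String document, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            if (current.length() > 0 && current.length() + paragraph.length() > maxChars) {
                chunks.add(current.toString()); // budget exceeded: start a new chunk
                current.setLength(0);
            }
            if (current.length() > 0) current.append("\n\n");
            current.append(paragraph);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
        split(doc, 40).forEach(chunk -> System.out.println("--- chunk ---\n" + chunk));
    }
}
```

    Because cuts only happen between paragraphs, no fragment ever ends mid-sentence or mid-code-block, though a single oversized paragraph would still need further handling in practice.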

    8. Tool Invocation

    Large Language Models (LLMs) are in a fixed state after training, resulting in outdated knowledge and an inability to access or modify external data. The tool invocation mechanism addresses these shortcomings by connecting LLMs to external system APIs.

    Tool Call Flow

    Model receives tool definitions with names, descriptions, and parameter schemas

    Model decides to invoke a tool and returns tool name + parameters

    Application executes the tool with provided parameters

    Results returned to model → Model generates final response

    Spring AI Tool Example:

    Java Example
    @Service
    public class WeatherService {

        @Tool(description = "Get current weather for a location")
        public WeatherInfo getWeather(
                @ToolParam(description = "City name") String city,
                @ToolParam(description = "Country code") String country
        ) {
            // Call the weather API and return the result
            return weatherApiClient.getCurrentWeather(city, country);
        }
    }

    9. Evaluating AI Responses

    Effectively evaluating the response output of an AI system is crucial for ensuring the accuracy and usability of the final application. Several emerging technologies support this evaluation using pre-trained models themselves.

    Evaluation Metrics

    • Relevance: does the response address the query?
    • Coherence: is the response logically structured?
    • Factual Accuracy: is the information correct?

    Self-Evaluation Approach

    Submit both the user's request and the AI model's response to the model, then ask whether the response matches the provided data. Vector database information can enhance this evaluation process.
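    One way to sketch this self-evaluation approach in plain Java is to assemble an evaluation prompt from the user's query, the retrieved context, and the model's answer. The wording below is illustrative only, not a fixed Spring AI template:

```java
public class SelfEvaluationPrompt {

    // Build a prompt that asks the model to judge its own earlier answer
    // against the data retrieved from the vector database.
    static String build(String userQuery, String aiResponse, String retrievedContext) {
        return """
                You are an evaluator. Given the user's query, the supporting context,
                and a candidate response, answer YES if the response is relevant and
                consistent with the context, otherwise answer NO.

                Query: %s
                Context: %s
                Response: %s
                """.formatted(userQuery, retrievedContext, aiResponse);
    }

    public static void main(String[] args) {
        System.out.println(build(
                "What is RAG?",
                "RAG embeds relevant retrieved data into the prompt.",
                "Retrieval Augmented Generation integrates external data into prompts."));
    }
}
```

    The resulting string would then be sent to the model a second time; a YES/NO (or scored) reply gives a coarse but automatable quality signal.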

    10. LLM Interview Questions & Answers

    Common interview questions about Large Language Models and AI concepts to help you prepare for technical discussions.

    Q1. What is tokenization, and why is it important in LLMs?

    Tokenization is the process of splitting text into smaller units called tokens, which can be words, subwords, or even characters. For instance, the word "tokenization" might be broken down into smaller subwords like "token" and "ization."

    This step is crucial because LLMs do not understand raw text directly. Instead, they process sequences of numbers that represent these tokens. Effective tokenization allows models to:

    • Handle various languages
    • Manage rare words
    • Reduce vocabulary size, improving efficiency and performance
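    To make the "token" + "ization" example concrete, here is a toy greedy longest-match tokenizer in Java. Real tokenizers (BPE, WordPiece) learn their vocabularies from data; this sketch simply assumes a tiny hand-made vocabulary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GreedyTokenizer {

    // Greedy longest-match subword tokenization over a given vocabulary
    static List<String> tokenize(String word, Set<String> vocab) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            int end = word.length();
            // Try the longest remaining substring first, shrinking until a vocab hit
            while (end > i && !vocab.contains(word.substring(i, end))) end--;
            if (end == i) {
                tokens.add(word.substring(i, i + 1)); // unknown character: emit as-is
                i++;
            } else {
                tokens.add(word.substring(i, end));
                i = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("token", "ization"); // toy vocabulary
        System.out.println(tokenize("tokenization", vocab)); // [token, ization]
    }
}
```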
    Q2. What are LoRA and QLoRA?

    LoRA and QLoRA are techniques designed to optimize the fine-tuning of Large Language Models, focusing on reducing memory usage and enhancing efficiency.

    LoRA (Low-Rank Adaptation)

    A parameter-efficient fine-tuning method that introduces new trainable parameters without increasing model size. It works by adding low-rank matrix adaptations to existing layers, allowing significant performance improvements while keeping resource consumption low. Ideal for environments with limited computational resources.

    QLoRA (Quantized LoRA)

    Builds on LoRA by incorporating quantization (4-bit Normal Float, Double Quantization, Paged Optimizers) to further optimize memory usage. By reducing precision of model weights (e.g., from 16-bit to 4-bit) while retaining accuracy, QLoRA enables fine-tuning with minimal memory footprint.

    Q3. What is beam search, and how does it differ from greedy decoding?

    Beam search is a search algorithm used during text generation to find the most likely sequence of words.

    Greedy Decoding

    Chooses the single highest-probability word at each step

    Beam Search

    Explores multiple possible sequences in parallel, maintaining top k candidates (beams)

    Beam search balances between finding high-probability sequences and exploring alternative paths, leading to more coherent and contextually appropriate outputs.
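    The difference can be demonstrated on a toy probability table where the greedy choice is a trap: the locally best first token leads to a worse overall sequence. All probabilities below are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class BeamVsGreedy {

    // Toy conditional next-token probabilities: P(next | previous token)
    static final Map<String, Map<String, Double>> MODEL = Map.of(
            "<s>", Map.of("A", 0.4, "B", 0.6),
            "A",   Map.of("X", 0.9, "Y", 0.1),
            "B",   Map.of("X", 0.45, "Y", 0.55));

    record Hypothesis(List<String> tokens, double prob) {}

    // Greedy decoding: always take the single most likely next token
    static Hypothesis greedy(int steps) {
        List<String> tokens = new ArrayList<>();
        String prev = "<s>";
        double prob = 1.0;
        for (int i = 0; i < steps; i++) {
            var next = MODEL.get(prev).entrySet().stream()
                    .max(Map.Entry.comparingByValue()).orElseThrow();
            tokens.add(next.getKey());
            prob *= next.getValue();
            prev = next.getKey();
        }
        return new Hypothesis(tokens, prob);
    }

    // Beam search: keep the k most probable partial sequences at each step
    static Hypothesis beam(int steps, int k) {
        List<Hypothesis> beams = List.of(new Hypothesis(List.of(), 1.0));
        for (int i = 0; i < steps; i++) {
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beams) {
                String prev = h.tokens().isEmpty() ? "<s>" : h.tokens().get(h.tokens().size() - 1);
                for (var e : MODEL.get(prev).entrySet()) {
                    List<String> t = new ArrayList<>(h.tokens());
                    t.add(e.getKey());
                    expanded.add(new Hypothesis(t, h.prob() * e.getValue()));
                }
            }
            expanded.sort(Comparator.comparingDouble(Hypothesis::prob).reversed());
            beams = expanded.subList(0, Math.min(k, expanded.size()));
        }
        return beams.get(0);
    }

    public static void main(String[] args) {
        System.out.println("Greedy: " + greedy(2)); // → [B, Y] (prob 0.6 × 0.55)
        System.out.println("Beam:   " + beam(2, 2)); // → [A, X] (prob 0.4 × 0.9)
    }
}
```

    Greedy decoding commits to "B" (0.6 > 0.4) and ends up with probability 0.33, while beam search with k = 2 keeps "A" alive long enough to find the better sequence "A X" at 0.36.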

    Q4. Explain the concept of temperature in LLM text generation.

    Temperature is a hyperparameter that controls the randomness of text generation by adjusting the probability distribution over possible next tokens.

    • Temperature ≈ 0: highly deterministic
    • Temperature ≈ 0.7: balanced
    • Temperature > 1: more diverse/creative
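    Mechanically, temperature divides the logits before the softmax, so low values sharpen the distribution and high values flatten it. A self-contained Java sketch with made-up logits:

```java
public class TemperatureDemo {

    // Softmax over logits scaled by temperature: p_i ∝ exp(logit_i / T)
    static double[] softmax(double[] logits, double temperature) {
        double[] p = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp(logits[i] / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5}; // toy next-token scores
        System.out.printf("T=0.2 top prob: %.3f%n", softmax(logits, 0.2)[0]); // near 1: deterministic
        System.out.printf("T=1.0 top prob: %.3f%n", softmax(logits, 1.0)[0]);
        System.out.printf("T=2.0 top prob: %.3f%n", softmax(logits, 2.0)[0]); // flatter: more diverse
    }
}
```

    Sampling from the sharpened distribution almost always picks the top token (deterministic feel); sampling from the flattened one spreads probability across alternatives (creative feel).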

    Q5. What are Sequence-to-Sequence Models?

    Sequence-to-Sequence (Seq2Seq) Models are neural network architectures designed to transform one sequence of data into another. They're used for tasks with variable-length inputs and outputs:

    • Machine Translation: English → Spanish
    • Text Summarization: Article → Summary
    • Chatbots: Query → Response

    Q6. What role do embeddings play in LLMs, and how are they initialized?

    Embeddings are dense, continuous vector representations of tokens that capture semantic and syntactic information. They map discrete tokens (words or subwords) into a high-dimensional space suitable for neural network input.

    Initialization methods:

    • Randomly initialized
    • Pretrained vectors like Word2Vec or GloVe

    During training, embeddings are fine-tuned to capture task-specific nuances, enhancing model performance.

    Q7. What is Next Sentence Prediction and how is it useful in language modelling?

    Next Sentence Prediction (NSP) is a technique used in training models like BERT. It helps models understand relationships between two sentences—crucial for question answering, dialogue generation, and information retrieval.

    Training process:

    • 50% of time: Second sentence is the actual next sentence (positive pairs)
    • 50% of time: Second sentence is random from corpus (negative pairs)

    The model learns to classify whether the second sentence correctly follows the first.

    Q8. How does prompt engineering influence the output of LLMs?

    Prompt engineering involves crafting input prompts to guide an LLM's output effectively. Since LLMs are highly sensitive to input phrasing, a well-designed prompt can significantly influence response quality and relevance.

    Key benefits:

    • Adding context improves accuracy
    • Specific instructions enhance task performance
    • Essential for zero-shot and few-shot learning scenarios
    Q9. What is overfitting in machine learning, and how can it be prevented?

    Overfitting occurs when a model performs well on training data but poorly on unseen data. The model learns noise and outliers, making it too tailored to the training set.

    Prevention techniques:

    • Regularization (L1, L2): add a penalty term to the loss function
    • Dropout: randomly deactivate neurons during training
    • Data Augmentation: expand the training dataset
    • Early Stopping: stop training when validation loss plateaus
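    Early stopping, for example, can be sketched in a few lines of Java: track the best validation loss seen so far and stop once it has not improved for a fixed number of epochs (the "patience"):

```java
public class EarlyStopping {

    // Return the epoch at which training should stop: either when validation
    // loss has not improved for `patience` consecutive epochs, or the last epoch.
    static int stopEpoch(double[] valLossPerEpoch, int patience) {
        double best = Double.MAX_VALUE;
        int sinceImprovement = 0;
        for (int epoch = 0; epoch < valLossPerEpoch.length; epoch++) {
            if (valLossPerEpoch[epoch] < best) {
                best = valLossPerEpoch[epoch];
                sinceImprovement = 0;
            } else if (++sinceImprovement >= patience) {
                return epoch; // loss plateaued: stop here
            }
        }
        return valLossPerEpoch.length - 1;
    }

    public static void main(String[] args) {
        double[] valLoss = {1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74};
        System.out.println("Stop at epoch " + stopEpoch(valLoss, 3)); // epoch 5
    }
}
```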

    Q10. What are Generative and Discriminative models?
    Generative Models

    Learn underlying data distribution and generate new samples. Model joint probability P(X,Y). Example: Language models predicting next word.

    Discriminative Models

    Learn decision boundaries between classes. Model conditional probability P(Y|X). Example: Sentiment analysis classifying text.

    In short: Generative models generate data, discriminative models classify it.

    Q11. How is GPT-4 different from GPT-3 in terms of capabilities?

    Feature          GPT-3                  GPT-4
    Parameters       175 billion            ~1 trillion
    Modality         Text only              Text + Images
    Context Window   4,096 tokens           Up to 25,000 tokens
    Accuracy         Standard               Improved, less hallucination
    Languages        Limited multilingual   26+ languages with higher accuracy

    Key Concepts Summary

    • AI Models: pre-trained algorithms for various input/output types
    • Prompts & Templates: dynamic prompt generation
    • Embeddings: vector representations for semantic search
    • Tokens: the basic unit driving cost and context limits
    • RAG: integrate custom data into prompts
    • Tools: connect LLMs to external APIs
