Spring AI Tutorials
    Tutorial 03

    Generative AI & LLMs

    A Friendly Family Get-Together

    Explore the AI landscape from history to embeddings, understanding how LLMs process and generate text

    1
    The Journey to Generative AI

    A Brief History: From Rule-Based Systems to Generative AI

    1950s

    Alan Turing proposes the Turing Test; early rule-based systems emerge

    1980s

    Expert systems dominate; backpropagation revives neural networks

    2012

    AlexNet wins ImageNet; deep learning renaissance begins

    2017

    "Attention Is All You Need" paper introduces Transformer architecture

    2022+

    ChatGPT, GPT-4, Claude, and multimodal AI revolutionize the field

    Key Milestones That Shaped Modern AI

    🎯 The Perceptron (1958)

    First neural network capable of learning, though limited to linear problems.

    🔄 Backpropagation (1986)

    Efficient training algorithm that made deep networks possible.

    📊 Word2Vec (2013)

    Word embeddings that capture semantic relationships.

    🤖 BERT & GPT (2018)

    Transformer-based models that revolutionized NLP.

    AI Evolution Timeline

    1950s

    Rule-Based

    1980s

    Expert Systems

    2012

    Deep Learning

    2017

    Transformers

    2022+

    ChatGPT Era

    2
    Meet the AI Family

    Understanding the hierarchy of AI technologies helps clarify where LLMs fit in the broader landscape.

    🧠 Artificial Intelligence (AI)

    The broadest category: any system that can perform tasks requiring human-like intelligence.

    ⚙️ Machine Learning (ML)

    AI that learns patterns from data without explicit programming.

    🔮 Deep Learning (DL)

    ML using neural networks with multiple layers.

    ✨ Generative AI

    DL models that can create new content (text, images, audio, code).

    Supervised Learning

    Learns from labeled examples (classification, regression)

    Unsupervised Learning

    Finds patterns in unlabeled data (clustering, dimensionality reduction)

    Reinforcement Learning

    Learns by trial and error with rewards (games, robotics)

    3
    Generative AI Models Explained

    Generative AI encompasses various model architectures, each with unique strengths:

    GANs (Generative Adversarial Networks)

    Two networks compete: a generator creates content, a discriminator evaluates it.

    Image Generation, Style Transfer

    VAEs (Variational Autoencoders)

    Encode data into a latent space, then decode to generate new variations.

    Data Compression, Anomaly Detection

    Diffusion Models

    Learn to reverse a noise process, gradually refining random noise into coherent output.

    DALL-E, Stable Diffusion, Midjourney

    Transformers

    Use attention mechanisms to process sequences in parallel, excelling at language tasks.

    GPT, Claude, Gemini, LLaMA

    Multimodal Models

    Modern models like GPT-4V and Gemini can process and generate multiple types of content (text, images, audio, and video) in a unified architecture.

    Generative AI Model Applications

    4
    Large Language Models (LLMs) – The Text Specialists

    What Makes an LLM "Large"?

    Billions of Parameters

    GPT-4 is reported to have ~1.7 trillion parameters (the figure is unconfirmed)

    Deep Architecture

    Dozens to hundreds of transformer layers

    Massive Training Data

    Trained on trillions of tokens from the internet

    Why LLMs Excel at Text

    🎯 Self-Attention Mechanism

    Every token attends to every other token, capturing long-range dependencies.

    📚 Pre-training on Diverse Text

    Learns grammar, facts, reasoning, and even some common sense.

    🔄 Next-Token Prediction

    Simple objective that leads to emergent complex behaviors.
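
    To make that objective concrete, here is a minimal, self-contained sketch (not the inference code of any real model): raw scores (logits) over a toy vocabulary are turned into probabilities with a temperature-scaled softmax, and one token is sampled. The vocabulary and logit values below are invented for illustration.

    import java.util.List;
    import java.util.Random;

    public class NextTokenSampler {

        // Convert raw logits into probabilities with a temperature-scaled softmax.
        static double[] softmax(double[] logits, double temperature) {
            double max = Double.NEGATIVE_INFINITY;
            for (double l : logits) max = Math.max(max, l);
            double[] probs = new double[logits.length];
            double sum = 0.0;
            for (int i = 0; i < logits.length; i++) {
                probs[i] = Math.exp((logits[i] - max) / temperature);
                sum += probs[i];
            }
            for (int i = 0; i < probs.length; i++) probs[i] /= sum;
            return probs;
        }

        // Sample one token index according to the probability distribution.
        static int sample(double[] probs, Random rng) {
            double r = rng.nextDouble(), cumulative = 0.0;
            for (int i = 0; i < probs.length; i++) {
                cumulative += probs[i];
                if (r < cumulative) return i;
            }
            return probs.length - 1;
        }

        public static void main(String[] args) {
            // Toy vocabulary and logits; a real model scores ~100k tokens here.
            List<String> vocab = List.of("cat", "dog", "mat", "ran", "sat");
            double[] logits = {2.1, 1.3, 0.2, -0.5, 3.0};

            double[] probs = softmax(logits, 0.8);   // lower temperature = sharper distribution
            int next = sample(probs, new Random(42));
            System.out.println("Next token: " + vocab.get(next));
        }
    }

    Repeating this loop (append the sampled token, score again, sample again) is all that "text generation" means at the mechanical level.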

    Transformer Architecture (Simplified)

    Output Probabilities
    Softmax
    Linear Layer

    Transformer Block × N

    Feed Forward Network
    Add & Normalize
    Multi-Head Self-Attention
    Add & Normalize
    Token Embedding
    +
    Positional Encoding
    Input Tokens

    5
    Tokens – The Building Blocks of Language Models

    LLMs don't read text character by character or word by word; they process tokens.

    Example Tokenization

    Input: "Hello, world!"

    Tokens: ["Hello", ",", " world", "!"]

    Input: "tokenization"

    Tokens: ["token", "ization"]

    ~4

    Characters per token (average in English)

    ~0.75

    Words per token (rough estimate)

    ~100k

    Vocabulary size (GPT models)
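
    These ratios are only rules of thumb, but they are handy for back-of-the-envelope cost estimates before calling an API. Below is a minimal sketch that applies the ~4-characters-per-token figure above; the exact count always comes from the model's tokenizer (or from the Usage metadata shown later in this section).

    public class TokenEstimator {

        // Rough heuristic: ~4 characters per token for typical English text.
        static int estimateTokens(String text) {
            return (int) Math.ceil(text.length() / 4.0);
        }

        public static void main(String[] args) {
            String prompt = "Explain machine learning in simple terms";
            // Prints 10, while the real tokenizer count for this prompt is closer to 6:
            // the heuristic is deliberately coarse and tends to overestimate short prompts.
            System.out.println("Estimated tokens: " + estimateTokens(prompt));
        }
    }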

    LLM Text Processing Pipeline

    flowchart LR
        A["📝 Raw Text"] --> B["🔤 Tokenizer"]
        B --> C["🔢 Token IDs"]
        C --> D["📊 Embeddings"]
        D --> E["🧠 LLM"]
        E --> F["📈 Logits"]
        F --> G["🎲 Sampling"]
        G --> H["🔤 Decode"]
        H --> I["📝 Output Text"]
        style A fill:#dcfce7
        style B fill:#bbf7d0
        style C fill:#86efac
        style D fill:#4ade80
        style E fill:#22c55e
        style F fill:#4ade80
        style G fill:#86efac
        style H fill:#bbf7d0
        style I fill:#dcfce7
    Using Tokens in Spring AI
    // Token counting is important for API cost estimation
    ChatClient client = ChatClient.builder(chatModel).build();

    String prompt = "Explain machine learning in simple terms";
    // Approximate: 6 tokens for this prompt
    ChatResponse response = client.prompt()
            .user(prompt)
            .call()
            .chatResponse();

    // Access token usage from response metadata
    Usage usage = response.getMetadata().getUsage();
    System.out.println("Prompt tokens: " + usage.getPromptTokens());
    System.out.println("Completion tokens: " + usage.getGenerationTokens());
    System.out.println("Total tokens: " + usage.getTotalTokens());

    6
    Inside an LLM's Vocabulary

    An LLM's vocabulary is its dictionary of all possible tokens it can recognize and generate.

    Tokenization Methods

    BPE (Byte Pair Encoding)

    Used by GPT models. Iteratively merges frequent character pairs (a simplified sketch follows after this list).

    WordPiece

    Used by BERT. Similar to BPE but with likelihood-based merging.

    SentencePiece

    Language-agnostic. Treats text as raw bytes.
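
    To make the BPE idea concrete, here is a deliberately simplified sketch of a single merge step: count adjacent symbol pairs over a toy corpus and fuse the most frequent pair into one symbol. Production tokenizers repeat this thousands of times and also handle bytes, pre-tokenization, and special tokens; none of that is shown here.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BpeMergeSketch {

        // Count how often each adjacent pair of symbols occurs in the corpus.
        static Map<String, Integer> countPairs(List<List<String>> corpus) {
            Map<String, Integer> counts = new HashMap<>();
            for (List<String> word : corpus) {
                for (int i = 0; i < word.size() - 1; i++) {
                    counts.merge(word.get(i) + " " + word.get(i + 1), 1, Integer::sum);
                }
            }
            return counts;
        }

        // Merge every occurrence of the given pair into a single new symbol.
        static void mergePair(List<List<String>> corpus, String first, String second) {
            for (List<String> word : corpus) {
                for (int i = 0; i < word.size() - 1; i++) {
                    if (word.get(i).equals(first) && word.get(i + 1).equals(second)) {
                        word.set(i, first + second);
                        word.remove(i + 1);
                    }
                }
            }
        }

        public static void main(String[] args) {
            // Toy corpus: each word starts as a list of single characters.
            List<List<String>> corpus = new ArrayList<>();
            for (String w : List.of("lower", "lowest", "low")) {
                corpus.add(new ArrayList<>(List.of(w.split(""))));
            }

            // One BPE iteration: find the most frequent adjacent pair and merge it everywhere.
            Map<String, Integer> counts = countPairs(corpus);
            String best = counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue()).get().getKey();
            String[] pair = best.split(" ");
            mergePair(corpus, pair[0], pair[1]);

            System.out.println("Most frequent pair: " + best);
            System.out.println("Corpus after merge: " + corpus);
        }
    }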

    Special Tokens

    [BOS]

    Beginning of sequence marker

    [EOS]

    End of sequence marker

    [PAD]

    Padding for batch processing

    [UNK]

    Unknown token placeholder

    7
    Embeddings & Vector Representations

    Embeddings convert discrete tokens into continuous vectors that capture semantic meaning.

    Why Embeddings Matter

    "king" - "man" + "woman" ≈ "queen"

    "Paris" - "France" + "Italy" ≈ "Rome"

    "good" ↔ "bad" ≈ "happy" ↔ "sad"

    Words in Vector Space

    [2D plot: "king", "man", "queen", and "woman" positioned along two embedding dimensions]

    Similar words cluster together

    Parallel arrows = similar relationships

    Creating Embeddings with Spring AI
    @Autowired
    private EmbeddingModel embeddingModel;

    public List<Double> getEmbedding(String text) {
        // Create embedding request
        EmbeddingRequest request = new EmbeddingRequest(
                List.of(text),
                EmbeddingOptionsBuilder.builder().build());

        // Get embedding response
        EmbeddingResponse response = embeddingModel.call(request);

        // Extract the embedding vector (typically 1536 dimensions for OpenAI)
        return response.getResult().getOutput();
    }

    // Calculate similarity between two texts
    public double cosineSimilarity(String text1, String text2) {
        List<Double> vec1 = getEmbedding(text1);
        List<Double> vec2 = getEmbedding(text2);

        double dotProduct = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (int i = 0; i < vec1.size(); i++) {
            dotProduct += vec1.get(i) * vec2.get(i);
            norm1 += Math.pow(vec1.get(i), 2);
            norm2 += Math.pow(vec2.get(i), 2);
        }
        return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
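
    Building on the getEmbedding helper above, the king/queen analogy can be checked numerically: compute v("king") - v("man") + v("woman") and measure its cosine similarity to v("queen"). This sketch assumes the same class and imports as the code above (plus java.util.ArrayList); real embeddings only approximate such analogies, so expect a high but not perfect score.

    // Compute v("king") - v("man") + v("woman") and compare it to v("queen").
    public double analogyScore() {
        List<Double> king  = getEmbedding("king");
        List<Double> man   = getEmbedding("man");
        List<Double> woman = getEmbedding("woman");
        List<Double> queen = getEmbedding("queen");

        List<Double> combined = new ArrayList<>();
        for (int i = 0; i < king.size(); i++) {
            combined.add(king.get(i) - man.get(i) + woman.get(i));
        }

        // Cosine similarity between the combined vector and "queen";
        // values close to 1.0 indicate the analogy roughly holds.
        double dot = 0.0, n1 = 0.0, n2 = 0.0;
        for (int i = 0; i < combined.size(); i++) {
            dot += combined.get(i) * queen.get(i);
            n1  += Math.pow(combined.get(i), 2);
            n2  += Math.pow(queen.get(i), 2);
        }
        return dot / (Math.sqrt(n1) * Math.sqrt(n2));
    }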

    8
    How Embeddings Are Created

    The Embedding Process

    Token

    "hello"

    Token ID

    15339

    Embedding Vector

    [0.12, -0.45, ...]

    Embedding Matrix

    A learnable lookup table of size [vocabulary_size × embedding_dim]

    Vocabulary: 100,000 tokens

    Embedding dim: 1,536

    Parameters: ~153.6M
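
    The lookup itself is nothing more than row indexing. Here is a minimal sketch with a tiny, randomly initialized matrix; in a real model the matrix is 100,000 × 1,536 and its values are learned during training, and the token ID below is made up.

    import java.util.Random;

    public class EmbeddingLookup {

        public static void main(String[] args) {
            int vocabularySize = 1_000;   // toy size; GPT-style models use ~100,000
            int embeddingDim = 8;         // toy size; production models use 1,536 or more

            // The embedding matrix: one learnable row per token in the vocabulary.
            // Random here; in a real model these values come from training.
            double[][] embeddingMatrix = new double[vocabularySize][embeddingDim];
            Random rng = new Random(42);
            for (double[] row : embeddingMatrix) {
                for (int j = 0; j < embeddingDim; j++) {
                    row[j] = rng.nextGaussian() * 0.02;
                }
            }

            // "Embedding" a token is a simple row lookup by its ID.
            int tokenId = 339;   // hypothetical ID for some token
            double[] vector = embeddingMatrix[tokenId];

            System.out.println("Parameters in this toy table: " + (vocabularySize * embeddingDim));
            System.out.println("First values of the embedding: "
                    + vector[0] + ", " + vector[1] + ", " + vector[2]);
        }
    }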

    Learned Through Training

    Embeddings are optimized during training to capture semantic relationships

    • Similar words have similar vectors
    • Relationships are preserved
    • Context influences meaning

    Embedding Matrix Lookup

    flowchart LR
        subgraph Input
            T1["Token: 'hello'"]
            T2["Token: 'world'"]
        end
        subgraph Lookup["Vocabulary Lookup"]
            ID1["ID: 15339"]
            ID2["ID: 8922"]
        end
        subgraph Matrix["Embedding Matrix 100K × 1536"]
            R1["Row 15339"]
            R2["Row 8922"]
        end
        subgraph Output["Embedding Vectors"]
            V1["[0.12, -0.45, ...]"]
            V2["[-0.23, 0.67, ...]"]
        end
        T1 --> ID1
        T2 --> ID2
        ID1 --> R1
        ID2 --> R2
        R1 --> V1
        R2 --> V2
        style Input fill:#fff7ed
        style Lookup fill:#ffedd5
        style Matrix fill:#fed7aa
        style Output fill:#fdba74

    9
    Understanding Word Order

    Unlike RNNs, Transformers process all tokens in parallel, so they need a way to know word order.

    Why Order Matters

    "The dog bit the man"

    Different meaning from...

    "The man bit the dog"

    Same words, different meaning!

    Positional Encoding

    Positional encodings are added to token embeddings to inject position information:

    Final Embedding = Token Embedding + Positional Encoding

    // Sinusoidal encoding (original Transformer)

    PE(pos, 2i) = sin(pos / 10000^(2i/d))

    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
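
    A direct translation of these two formulas into code, computing the encoding vector for a single position. The dimension is kept small here for readability; real models use the embedding dimension (e.g. 1,536).

    public class SinusoidalPositionalEncoding {

        // PE(pos, 2i)   = sin(pos / 10000^(2i/d))
        // PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
        static double[] encode(int position, int dim) {
            double[] pe = new double[dim];
            for (int i = 0; i < dim / 2; i++) {
                double angle = position / Math.pow(10_000, (2.0 * i) / dim);
                pe[2 * i] = Math.sin(angle);
                pe[2 * i + 1] = Math.cos(angle);
            }
            return pe;
        }

        public static void main(String[] args) {
            int dim = 16;   // toy dimension for readability
            // Each position produces a distinct pattern, which is added to the token embedding.
            System.out.println(java.util.Arrays.toString(encode(0, dim)));
            System.out.println(java.util.Arrays.toString(encode(1, dim)));
        }
    }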

    Absolute Positional Encoding

    Each position gets a unique vector. Used in original Transformer, GPT-2.

    Relative/Rotary Encoding (RoPE)

    Encodes relative distances. Used in LLaMA, modern models.

    10
    Understanding Context

    The attention mechanism is the secret sauce that allows LLMs to understand context.

    Self-Attention in Action

    In the sentence: "The cat sat on the mat because it was tired"

    Self-attention allows "it" to strongly attend to "cat", understanding they refer to the same entity.

    Attention weights when processing "it":

    The      0.05
    cat      0.45
    sat      0.03
    on       0.02
    the      0.02
    mat      0.08
    because  0.05
    it       0.20
    was      0.05
    tired    0.05

    Higher values indicate stronger attention: "it" attends most strongly to "cat" (0.45).
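
    Weights like these come from a scaled dot-product between a query vector and each key vector, followed by a softmax. Here is a minimal sketch with made-up 4-dimensional vectors; real models use learned query/key projections and many attention heads in parallel.

    import java.util.Arrays;

    public class AttentionSketch {

        // Scaled dot-product attention weights for one query against all keys:
        // softmax(q . k_j / sqrt(d)) over the tokens j.
        static double[] attentionWeights(double[] query, double[][] keys) {
            double[] scores = new double[keys.length];
            double scale = Math.sqrt(query.length);
            for (int j = 0; j < keys.length; j++) {
                double dot = 0.0;
                for (int i = 0; i < query.length; i++) {
                    dot += query[i] * keys[j][i];
                }
                scores[j] = dot / scale;
            }
            // Softmax turns scores into weights that sum to 1.
            double max = Arrays.stream(scores).max().orElse(0.0);
            double sum = 0.0;
            for (int j = 0; j < scores.length; j++) {
                scores[j] = Math.exp(scores[j] - max);
                sum += scores[j];
            }
            for (int j = 0; j < scores.length; j++) {
                scores[j] /= sum;
            }
            return scores;
        }

        public static void main(String[] args) {
            // Made-up 4-dimensional query for "it" and keys for three other tokens.
            double[] queryIt = {0.9, 0.1, 0.4, 0.2};
            double[][] keys = {
                {0.8, 0.2, 0.5, 0.1},   // "cat"  (similar direction, so highest weight)
                {0.1, 0.9, 0.0, 0.3},   // "mat"
                {0.0, 0.1, 0.2, 0.9}    // "sat"
            };
            System.out.println(Arrays.toString(attentionWeights(queryIt, keys)));
        }
    }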

    Context Window Sizes

    GPT-3.5: 4K - 16K tokens
    GPT-4: 8K - 128K tokens
    Claude 3: 200K tokens
    Gemini 1.5 Pro: 1M tokens

    Why Context Size Matters

    Longer documents can be processed at once

    Better understanding of complex topics

    Maintain conversation history longer

    More effective RAG implementations

    Your AI Journey Roadmap

    AI History & Evolution

    From rule-based systems to generative AI

    AI, ML, DL Relationships

    Understanding the AI family tree

    Generative Models

    GANs, VAEs, Diffusion, Transformers

    LLM Architecture

    Why LLMs excel at text generation

    Tokenization

    Tokens, vocabularies, and encoding

    Embeddings & Context

    Vector representations and attention
