Spring AI Tutorials
    Tutorial 03

    Generative AI & LLMs

    A Friendly Family Get-Together

    Explore the AI landscape from history to embeddings, understanding how LLMs process and generate text

    1
    The Journey to Generative AI

    A Brief History: From Rule-Based Systems to Generative AI

    1950s

    Alan Turing proposes the Turing Test; early rule-based systems emerge

    1980s

    Expert systems dominate; backpropagation revives neural networks

    2012

    AlexNet wins ImageNet; deep learning renaissance begins

    2017

    "Attention Is All You Need" paper introduces Transformer architecture

    2022+

    ChatGPT, GPT-4, Claude, and multimodal AI revolutionize the field

    Key Milestones That Shaped Modern AI

    🎯 The Perceptron (1958)

    First neural network capable of learning, though limited to linear problems.

    🔄 Backpropagation (1986)

    Efficient training algorithm that made deep networks possible.

    📊 Word2Vec (2013)

    Word embeddings that capture semantic relationships.

    🤖 BERT & GPT (2018)

    Transformer-based models that revolutionized NLP.

    AI Evolution Timeline

    1950s

    Rule-Based

    1980s

    Expert Systems

    2012

    Deep Learning

    2017

    Transformers

    2022+

    ChatGPT Era

    2
    Meet the AI Family

    Understanding the hierarchy of AI technologies helps clarify where LLMs fit in the broader landscape.

    🧠 Artificial Intelligence (AI)

    The broadest category: any system that can perform tasks requiring human-like intelligence.

    ⚙️ Machine Learning (ML)

    AI that learns patterns from data without explicit programming.

    🔮 Deep Learning (DL)

    ML using neural networks with multiple layers.

    ✨ Generative AI

    DL models that can create new content (text, images, audio, code).

    Supervised Learning

    Learns from labeled examples (classification, regression)

    Unsupervised Learning

    Finds patterns in unlabeled data (clustering, dimensionality reduction)

    Reinforcement Learning

    Learns by trial and error with rewards (games, robotics)

    3
    Generative AI Models Explained

    Generative AI encompasses various model architectures, each with unique strengths:

    GANs (Generative Adversarial Networks)

    Two networks compete: a generator creates content, a discriminator evaluates it.

    Image Generation, Style Transfer

    VAEs (Variational Autoencoders)

    Encode data into a latent space, then decode to generate new variations.

    Data Compression, Anomaly Detection

    Diffusion Models

    Learn to reverse a noise process, gradually refining random noise into coherent output.

    DALL-E, Stable Diffusion, Midjourney

    Transformers

    Use attention mechanisms to process sequences in parallel, excelling at language tasks.

    GPT, Claude, Gemini, LLaMA

    Multimodal Models

    Modern models like GPT-4V and Gemini can process and generate multiple types of content (text, images, audio, and video) in a unified architecture.

    Generative AI Model Applications

    4
    Large Language Models (LLMs) – The Text Specialists

    What Makes an LLM "Large"?

    Billions of Parameters

    GPT-4 is reported to have ~1.7 trillion parameters (the figure is unconfirmed)

    Deep Architecture

    Dozens to hundreds of transformer layers

    Massive Training Data

    Trained on trillions of tokens from the internet

    Why LLMs Excel at Text

    🎯 Self-Attention Mechanism

    Every token attends to every other token, capturing long-range dependencies.

    📚 Pre-training on Diverse Text

    Learns grammar, facts, reasoning, and even some common sense.

    🔄 Next-Token Prediction

    Simple objective that leads to emergent complex behaviors.
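
    To make that objective concrete, here is a minimal, self-contained sketch (not the inference code of any real model): raw scores (logits) over a toy vocabulary are turned into probabilities with a temperature-scaled softmax, and one token is sampled. The vocabulary and logit values below are invented for illustration.

    import java.util.List;
    import java.util.Random;

    public class NextTokenSampler {

        // Convert raw logits into probabilities with a temperature-scaled softmax.
        static double[] softmax(double[] logits, double temperature) {
            double max = Double.NEGATIVE_INFINITY;
            for (double l : logits) max = Math.max(max, l);
            double[] probs = new double[logits.length];
            double sum = 0.0;
            for (int i = 0; i < logits.length; i++) {
                probs[i] = Math.exp((logits[i] - max) / temperature);
                sum += probs[i];
            }
            for (int i = 0; i < probs.length; i++) probs[i] /= sum;
            return probs;
        }

        // Sample one token index according to the probability distribution.
        static int sample(double[] probs, Random rng) {
            double r = rng.nextDouble(), cumulative = 0.0;
            for (int i = 0; i < probs.length; i++) {
                cumulative += probs[i];
                if (r < cumulative) return i;
            }
            return probs.length - 1;
        }

        public static void main(String[] args) {
            // Toy vocabulary and logits; a real model scores ~100k tokens here.
            List<String> vocab = List.of("cat", "dog", "mat", "ran", "sat");
            double[] logits = {2.1, 1.3, 0.2, -0.5, 3.0};

            double[] probs = softmax(logits, 0.8);   // lower temperature = sharper distribution
            int next = sample(probs, new Random(42));
            System.out.println("Next token: " + vocab.get(next));
        }
    }

    Repeating this loop (append the sampled token, score again, sample again) is all that "text generation" means at the mechanical level.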

    Transformer Architecture (Simplified)

    Output Probabilities
    Softmax
    Linear Layer

    Transformer Block × N

    Feed Forward Network
    Add & Normalize
    Multi-Head Self-Attention
    Add & Normalize
    Token Embedding
    +
    Positional Encoding
    Input Tokens

    5
    Tokens – The Building Blocks of Language Models

    LLMs don't read text character by character or word by word; they process tokens.

    Example Tokenization

    Input: "Hello, world!"

    Tokens: ["Hello", ",", " world", "!"]

    Input: "tokenization"

    Tokens: ["token", "ization"]

    ~4

    Characters per token (average in English)

    ~0.75

    Words per token (rough estimate)

    ~100k

    Vocabulary size (GPT models)
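
    These ratios are only rules of thumb, but they are handy for back-of-the-envelope cost estimates before calling an API. Below is a minimal sketch that applies the ~4-characters-per-token figure above; the exact count always comes from the model's tokenizer (or from the Usage metadata shown later in this section).

    public class TokenEstimator {

        // Rough heuristic: ~4 characters per token for typical English text.
        static int estimateTokens(String text) {
            return (int) Math.ceil(text.length() / 4.0);
        }

        public static void main(String[] args) {
            String prompt = "Explain machine learning in simple terms";
            // Prints 10, while the real tokenizer count for this prompt is closer to 6:
            // the heuristic is deliberately coarse and tends to overestimate short prompts.
            System.out.println("Estimated tokens: " + estimateTokens(prompt));
        }
    }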

    LLM Text Processing Pipeline

    flowchart LR
        A["📝 Raw Text"] --> B["🔤 Tokenizer"]
        B --> C["🔢 Token IDs"]
        C --> D["📊 Embeddings"]
        D --> E["🧠 LLM"]
        E --> F["📈 Logits"]
        F --> G["🎲 Sampling"]
        G --> H["🔤 Decode"]
        H --> I["📝 Output Text"]
        style A fill:#dcfce7
        style B fill:#bbf7d0
        style C fill:#86efac
        style D fill:#4ade80
        style E fill:#22c55e
        style F fill:#4ade80
        style G fill:#86efac
        style H fill:#bbf7d0
        style I fill:#dcfce7
    Using Tokens in Spring AI
    // Token counting is important for API cost estimation
    ChatClient client = ChatClient.builder(chatModel).build();

    String prompt = "Explain machine learning in simple terms";
    // Approximate: 6 tokens for this prompt
    ChatResponse response = client.prompt()
            .user(prompt)
            .call()
            .chatResponse();

    // Access token usage from response metadata
    Usage usage = response.getMetadata().getUsage();
    System.out.println("Prompt tokens: " + usage.getPromptTokens());
    System.out.println("Completion tokens: " + usage.getGenerationTokens());
    System.out.println("Total tokens: " + usage.getTotalTokens());

    6
    Inside an LLM's Vocabulary

    An LLM's vocabulary is its dictionary of all possible tokens it can recognize and generate.

    Tokenization Methods

    BPE (Byte Pair Encoding)

    Used by GPT models. Iteratively merges frequent character pairs (a simplified sketch follows after this list).

    WordPiece

    Used by BERT. Similar to BPE but with likelihood-based merging.

    SentencePiece

    Language-agnostic. Treats text as raw bytes.
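
    To make the BPE idea concrete, here is a deliberately simplified sketch of a single merge step: count adjacent symbol pairs over a toy corpus and fuse the most frequent pair into one symbol. Production tokenizers repeat this thousands of times and also handle bytes, pre-tokenization, and special tokens; none of that is shown here.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BpeMergeSketch {

        // Count how often each adjacent pair of symbols occurs in the corpus.
        static Map<String, Integer> countPairs(List<List<String>> corpus) {
            Map<String, Integer> counts = new HashMap<>();
            for (List<String> word : corpus) {
                for (int i = 0; i < word.size() - 1; i++) {
                    counts.merge(word.get(i) + " " + word.get(i + 1), 1, Integer::sum);
                }
            }
            return counts;
        }

        // Merge every occurrence of the given pair into a single new symbol.
        static void mergePair(List<List<String>> corpus, String first, String second) {
            for (List<String> word : corpus) {
                for (int i = 0; i < word.size() - 1; i++) {
                    if (word.get(i).equals(first) && word.get(i + 1).equals(second)) {
                        word.set(i, first + second);
                        word.remove(i + 1);
                    }
                }
            }
        }

        public static void main(String[] args) {
            // Toy corpus: each word starts as a list of single characters.
            List<List<String>> corpus = new ArrayList<>();
            for (String w : List.of("lower", "lowest", "low")) {
                corpus.add(new ArrayList<>(List.of(w.split(""))));
            }

            // One BPE iteration: find the most frequent adjacent pair and merge it everywhere.
            Map<String, Integer> counts = countPairs(corpus);
            String best = counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue()).get().getKey();
            String[] pair = best.split(" ");
            mergePair(corpus, pair[0], pair[1]);

            System.out.println("Most frequent pair: " + best);
            System.out.println("Corpus after merge: " + corpus);
        }
    }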

    Special Tokens

    [BOS]

    Beginning of sequence marker

    [EOS]

    End of sequence marker

    [PAD]

    Padding for batch processing

    [UNK]

    Unknown token placeholder

    7
    Embeddings & Vector Representations

    Embeddings convert discrete tokens into continuous vectors that capture semantic meaning.

    Why Embeddings Matter

    "king" - "man" + "woman" ≈ "queen"

    "Paris" - "France" + "Italy" ≈ "Rome"

    "good" ↔ "bad" ≈ "happy" ↔ "sad"

    Words in Vector Space

    [2D plot: "king", "man", "queen", and "woman" positioned along two embedding dimensions]

    Similar words cluster together

    Parallel arrows = similar relationships

    Creating Embeddings with Spring AI
    @Autowired
    private EmbeddingModel embeddingModel;

    public List<Double> getEmbedding(String text) {
        // Create embedding request
        EmbeddingRequest request = new EmbeddingRequest(
                List.of(text),
                EmbeddingOptionsBuilder.builder().build());

        // Get embedding response
        EmbeddingResponse response = embeddingModel.call(request);

        // Extract the embedding vector (typically 1536 dimensions for OpenAI)
        return response.getResult().getOutput();
    }

    // Calculate similarity between two texts
    public double cosineSimilarity(String text1, String text2) {
        List<Double> vec1 = getEmbedding(text1);
        List<Double> vec2 = getEmbedding(text2);

        double dotProduct = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (int i = 0; i < vec1.size(); i++) {
            dotProduct += vec1.get(i) * vec2.get(i);
            norm1 += Math.pow(vec1.get(i), 2);
            norm2 += Math.pow(vec2.get(i), 2);
        }
        return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
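
    Building on the getEmbedding helper above, the king/queen analogy can be checked numerically: compute v("king") - v("man") + v("woman") and measure its cosine similarity to v("queen"). This sketch assumes the same class and imports as the code above (plus java.util.ArrayList); real embeddings only approximate such analogies, so expect a high but not perfect score.

    // Compute v("king") - v("man") + v("woman") and compare it to v("queen").
    public double analogyScore() {
        List<Double> king  = getEmbedding("king");
        List<Double> man   = getEmbedding("man");
        List<Double> woman = getEmbedding("woman");
        List<Double> queen = getEmbedding("queen");

        List<Double> combined = new ArrayList<>();
        for (int i = 0; i < king.size(); i++) {
            combined.add(king.get(i) - man.get(i) + woman.get(i));
        }

        // Cosine similarity between the combined vector and "queen";
        // values close to 1.0 indicate the analogy roughly holds.
        double dot = 0.0, n1 = 0.0, n2 = 0.0;
        for (int i = 0; i < combined.size(); i++) {
            dot += combined.get(i) * queen.get(i);
            n1  += Math.pow(combined.get(i), 2);
            n2  += Math.pow(queen.get(i), 2);
        }
        return dot / (Math.sqrt(n1) * Math.sqrt(n2));
    }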

    8
    How Embeddings Are Created

    The Embedding Process

    Token

    "hello"

    Token ID

    15339

    Embedding Vector

    [0.12, -0.45, ...]

    Embedding Matrix

    A learnable lookup table of size [vocabulary_size × embedding_dim]

    Vocabulary: 100,000 tokens

    Embedding dim: 1,536

    Parameters: ~153.6M
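
    The lookup itself is nothing more than row indexing. Here is a minimal sketch with a tiny, randomly initialized matrix; in a real model the matrix is 100,000 × 1,536 and its values are learned during training, and the token ID below is made up.

    import java.util.Random;

    public class EmbeddingLookup {

        public static void main(String[] args) {
            int vocabularySize = 1_000;   // toy size; GPT-style models use ~100,000
            int embeddingDim = 8;         // toy size; production models use 1,536 or more

            // The embedding matrix: one learnable row per token in the vocabulary.
            // Random here; in a real model these values come from training.
            double[][] embeddingMatrix = new double[vocabularySize][embeddingDim];
            Random rng = new Random(42);
            for (double[] row : embeddingMatrix) {
                for (int j = 0; j < embeddingDim; j++) {
                    row[j] = rng.nextGaussian() * 0.02;
                }
            }

            // "Embedding" a token is a simple row lookup by its ID.
            int tokenId = 339;   // hypothetical ID for some token
            double[] vector = embeddingMatrix[tokenId];

            System.out.println("Parameters in this toy table: " + (vocabularySize * embeddingDim));
            System.out.println("First values of the embedding: "
                    + vector[0] + ", " + vector[1] + ", " + vector[2]);
        }
    }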

    Learned Through Training

    Embeddings are optimized during training to capture semantic relationships

    • Similar words have similar vectors
    • Relationships are preserved
    • Context influences meaning

    Embedding Matrix Lookup

    flowchart LR
        subgraph Input
            T1["Token: 'hello'"]
            T2["Token: 'world'"]
        end
        subgraph Lookup["Vocabulary Lookup"]
            ID1["ID: 15339"]
            ID2["ID: 8922"]
        end
        subgraph Matrix["Embedding Matrix 100K × 1536"]
            R1["Row 15339"]
            R2["Row 8922"]
        end
        subgraph Output["Embedding Vectors"]
            V1["[0.12, -0.45, ...]"]
            V2["[-0.23, 0.67, ...]"]
        end
        T1 --> ID1
        T2 --> ID2
        ID1 --> R1
        ID2 --> R2
        R1 --> V1
        R2 --> V2
        style Input fill:#fff7ed
        style Lookup fill:#ffedd5
        style Matrix fill:#fed7aa
        style Output fill:#fdba74

    9
    Understanding Word Order

    Unlike RNNs, Transformers process all tokens in parallel, so they need a way to know word order.

    Why Order Matters

    "The dog bit the man"

    Different meaning from...

    "The man bit the dog"

    Same words, different meaning!

    Positional Encoding

    Positional encodings are added to token embeddings to inject position information:

    Final Embedding = Token Embedding + Positional Encoding

    // Sinusoidal encoding (original Transformer)

    PE(pos, 2i) = sin(pos / 10000^(2i/d))

    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
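
    A direct translation of these two formulas into code, computing the encoding vector for a single position. The dimension is kept small here for readability; real models use the embedding dimension (e.g. 1,536).

    public class SinusoidalPositionalEncoding {

        // PE(pos, 2i)   = sin(pos / 10000^(2i/d))
        // PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
        static double[] encode(int position, int dim) {
            double[] pe = new double[dim];
            for (int i = 0; i < dim / 2; i++) {
                double angle = position / Math.pow(10_000, (2.0 * i) / dim);
                pe[2 * i] = Math.sin(angle);
                pe[2 * i + 1] = Math.cos(angle);
            }
            return pe;
        }

        public static void main(String[] args) {
            int dim = 16;   // toy dimension for readability
            // Each position produces a distinct pattern, which is added to the token embedding.
            System.out.println(java.util.Arrays.toString(encode(0, dim)));
            System.out.println(java.util.Arrays.toString(encode(1, dim)));
        }
    }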

    Absolute Positional Encoding

    Each position gets a unique vector. Used in original Transformer, GPT-2.

    Relative/Rotary Encoding (RoPE)

    Encodes relative distances. Used in LLaMA, modern models.

    10
    Understanding Context

    The attention mechanism is the secret sauce that allows LLMs to understand context.

    Self-Attention in Action

    In the sentence: "The cat sat on the mat because it was tired"

    Self-attention allows "it" to strongly attend to "cat", understanding they refer to the same entity.

    Attention weights when processing "it":

    The      0.05
    cat      0.45
    sat      0.03
    on       0.02
    the      0.02
    mat      0.08
    because  0.05
    it       0.20
    was      0.05
    tired    0.05

    Higher values indicate stronger attention: "it" attends most strongly to "cat" (0.45).
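
    Weights like these come from a scaled dot-product between a query vector and each key vector, followed by a softmax. Here is a minimal sketch with made-up 4-dimensional vectors; real models use learned query/key projections and many attention heads in parallel.

    import java.util.Arrays;

    public class AttentionSketch {

        // Scaled dot-product attention weights for one query against all keys:
        // softmax(q . k_j / sqrt(d)) over the tokens j.
        static double[] attentionWeights(double[] query, double[][] keys) {
            double[] scores = new double[keys.length];
            double scale = Math.sqrt(query.length);
            for (int j = 0; j < keys.length; j++) {
                double dot = 0.0;
                for (int i = 0; i < query.length; i++) {
                    dot += query[i] * keys[j][i];
                }
                scores[j] = dot / scale;
            }
            // Softmax turns scores into weights that sum to 1.
            double max = Arrays.stream(scores).max().orElse(0.0);
            double sum = 0.0;
            for (int j = 0; j < scores.length; j++) {
                scores[j] = Math.exp(scores[j] - max);
                sum += scores[j];
            }
            for (int j = 0; j < scores.length; j++) {
                scores[j] /= sum;
            }
            return scores;
        }

        public static void main(String[] args) {
            // Made-up 4-dimensional query for "it" and keys for three other tokens.
            double[] queryIt = {0.9, 0.1, 0.4, 0.2};
            double[][] keys = {
                {0.8, 0.2, 0.5, 0.1},   // "cat"  (similar direction, so highest weight)
                {0.1, 0.9, 0.0, 0.3},   // "mat"
                {0.0, 0.1, 0.2, 0.9}    // "sat"
            };
            System.out.println(Arrays.toString(attentionWeights(queryIt, keys)));
        }
    }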

    Context Window Sizes

    GPT-3.5: 4K - 16K tokens
    GPT-4: 8K - 128K tokens
    Claude 3: 200K tokens
    Gemini 1.5 Pro: 1M tokens

    Why Context Size Matters

    Longer documents can be processed at once

    Better understanding of complex topics

    Maintain conversation history longer

    More effective RAG implementations

    Your AI Journey Roadmap

    AI History & Evolution

    From rule-based systems to generative AI

    AI, ML, DL Relationships

    Understanding the AI family tree

    Generative Models

    GANs, VAEs, Diffusion, Transformers

    LLM Architecture

    Why LLMs excel at text generation

    Tokenization

    Tokens, vocabularies, and encoding

    Embeddings & Context

    Vector representations and attention
