Generative AI & LLMs
A Friendly Family Get-Together
Explore the AI landscape from history to embeddings, understanding how LLMs process and generate text
1. The Journey to Generative AI
A Brief History: From Rule-Based Systems to Generative AI
1950: Alan Turing proposes the Turing Test; early rule-based systems emerge
1980s: Expert systems dominate; backpropagation revives neural networks
2012: AlexNet wins ImageNet; the deep learning renaissance begins
2017: The "Attention Is All You Need" paper introduces the Transformer architecture
2022+: ChatGPT, GPT-4, Claude, and multimodal AI revolutionize the field
Key Milestones That Shaped Modern AI
The Perceptron (1958)
First neural network capable of learning, though limited to linearly separable problems.
Backpropagation (1986)
Efficient training algorithm that made deep networks possible.
Word2Vec (2013)
Word embeddings that capture semantic relationships.
BERT & GPT (2018)
Transformer-based models that revolutionized NLP.
AI Evolution Timeline
1950s Rule-Based → 1980s Expert Systems → 2012 Deep Learning → 2017 Transformers → 2022+ ChatGPT Era
2. Meet the AI Family
Understanding the hierarchy of AI technologies helps clarify where LLMs fit in the broader landscape.
Artificial Intelligence (AI)
The broadest category: any system that can perform tasks requiring human-like intelligence.
Machine Learning (ML)
AI that learns patterns from data without explicit programming.
Deep Learning (DL)
ML using neural networks with multiple layers.
Generative AI
DL models that can create new content (text, images, audio, code).
Supervised Learning
Learns from labeled examples (classification, regression)
Unsupervised Learning
Finds patterns in unlabeled data (clustering, dimensionality reduction)
Reinforcement Learning
Learns by trial and error with rewards (games, robotics)
3. Generative AI Models Explained
Generative AI encompasses various model architectures, each with unique strengths:
GANs (Generative Adversarial Networks)
Two networks compete: a generator creates content, a discriminator evaluates it.
VAEs (Variational Autoencoders)
Encode data into a latent space, then decode to generate new variations.
Diffusion Models
Learn to reverse a noise process, gradually refining random noise into coherent output.
Transformers
Use attention mechanisms to process sequences in parallel, excelling at language tasks.
Multimodal Models
Modern models like GPT-4V and Gemini can process and generate multiple types of content (text, images, audio, and video) in a unified architecture.
Generative AI Model Applications
4. Large Language Models (LLMs): The Text Specialists
What Makes an LLM "Large"?
Billions of Parameters
GPT-4 is rumored to have on the order of 1.7 trillion parameters (OpenAI has not disclosed the actual figure)
Deep Architecture
Dozens to hundreds of transformer layers
Massive Training Data
Trained on trillions of tokens from the internet
Why LLMs Excel at Text
Self-Attention Mechanism
Every token attends to every other token, capturing long-range dependencies.
Pre-training on Diverse Text
Learns grammar, facts, reasoning, and even some common sense.
Next-Token Prediction
A simple objective that leads to emergent complex behaviors (a toy decoding loop is sketched below).
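To make the idea concrete, here is a toy greedy-decoding loop in plain Java. The LIKELY_NEXT bigram table is a hypothetical stand-in for a real model's forward pass; it exists only to show the "score every token, pick the best, append, repeat" cycle, not how an actual LLM computes scores.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy illustration of next-token prediction with greedy decoding.
// A real LLM scores every vocabulary entry with a neural network;
// here a hard-coded bigram table stands in for that forward pass.
public class GreedyDecodingSketch {

    // Hypothetical "model": which token tends to follow the previous one.
    private static final Map<String, String> LIKELY_NEXT = Map.of(
            "hello", "world",
            "world", "!",
            "!", "<eos>");

    // Returns one score per vocabulary entry, given the current context.
    static double[] scores(List<String> context, List<String> vocabulary) {
        String last = context.get(context.size() - 1);
        double[] s = new double[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            s[i] = vocabulary.get(i).equals(LIKELY_NEXT.get(last)) ? 1.0 : 0.0;
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("hello", "world", "!", "<eos>");
        List<String> context = new ArrayList<>(List.of("hello"));

        // Greedy decoding: repeatedly append the highest-scoring token.
        while (!context.get(context.size() - 1).equals("<eos>")) {
            double[] s = scores(context, vocabulary);
            int best = 0;
            for (int i = 1; i < s.length; i++) {
                if (s[i] > s[best]) best = i;
            }
            context.add(vocabulary.get(best));
        }
        System.out.println(context); // [hello, world, !, <eos>]
    }
}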
Transformer Architecture (Simplified)
Transformer Block × N
5. Tokens: The Building Blocks of Language Models
LLMs don't read text character by character or word by word; they process tokens.
Example Tokenization
Input: "Hello, world!"
Tokens: Hello,world!
Input: "tokenization"
Tokens: tokenization
~4 characters per token (average in English)
~0.75 words per token (rough estimate)
~100k vocabulary size (GPT models)
(A rough client-side estimator based on these figures is sketched below.)
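The "~4 characters per token" rule of thumb can be turned into a simple estimator. This is a heuristic only; the exact count depends on the tokenizer, so treat the Usage metadata returned by the API (next snippet) as the authoritative number.

// Rough token estimate based on the ~4 characters-per-token average above.
// Heuristic only: real counts come from the model's tokenizer.
public final class TokenEstimator {

    private static final double CHARS_PER_TOKEN = 4.0; // English-text approximation

    public static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        return (int) Math.ceil(text.length() / CHARS_PER_TOKEN);
    }

    public static void main(String[] args) {
        String prompt = "Explain machine learning in simple terms";
        System.out.println("Estimated tokens: " + estimateTokens(prompt)); // prints 10
    }
}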
LLM Text Processing Pipeline
// Token counting is important for API cost estimation
ChatClient client = ChatClient.builder(chatModel).build();

String prompt = "Explain machine learning in simple terms";
// Approximate: 6 tokens for this prompt

ChatResponse response = client.prompt()
        .user(prompt)
        .call()
        .chatResponse();

// Access token usage from response metadata
Usage usage = response.getMetadata().getUsage();
System.out.println("Prompt tokens: " + usage.getPromptTokens());
System.out.println("Completion tokens: " + usage.getGenerationTokens());
System.out.println("Total tokens: " + usage.getTotalTokens());

6. Inside an LLM's Vocabulary
An LLM's vocabulary is its dictionary of all possible tokens it can recognize and generate.
Tokenization Methods
BPE (Byte Pair Encoding)
Used by GPT models. Iteratively merges the most frequent character pairs (a minimal single-merge sketch follows this list).
WordPiece
Used by BERT. Similar to BPE but with likelihood-based merging.
SentencePiece
Language-agnostic. Operates directly on the raw text stream, treating whitespace like any other symbol, with no language-specific pre-tokenization.
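To make BPE less abstract, here is a minimal sketch of a single merge step over three toy words. Real tokenizers learn tens of thousands of merges from a large corpus and record them as the vocabulary; the words and the chosen pair below are purely illustrative.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One BPE merge step: count adjacent symbol pairs across the corpus
// and merge the most frequent pair into a single new symbol.
public class BpeMergeSketch {

    public static void main(String[] args) {
        // Each word starts as a sequence of single-character symbols.
        List<List<String>> words = new ArrayList<>();
        for (String w : List.of("low", "lower", "lowest")) {
            List<String> symbols = new ArrayList<>();
            for (char c : w.toCharArray()) {
                symbols.add(String.valueOf(c));
            }
            words.add(symbols);
        }

        // Count every adjacent pair of symbols.
        Map<String, Integer> pairCounts = new HashMap<>();
        for (List<String> word : words) {
            for (int i = 0; i + 1 < word.size(); i++) {
                pairCounts.merge(word.get(i) + " " + word.get(i + 1), 1, Integer::sum);
            }
        }

        // Pick the most frequent pair ("l o" and "o w" both occur 3 times here).
        String bestPair = Collections.max(pairCounts.entrySet(), Map.Entry.comparingByValue()).getKey();
        String[] parts = bestPair.split(" ");

        // Merge that pair into one symbol wherever it occurs.
        for (List<String> word : words) {
            for (int i = 0; i + 1 < word.size(); i++) {
                if (word.get(i).equals(parts[0]) && word.get(i + 1).equals(parts[1])) {
                    word.set(i, parts[0] + parts[1]);
                    word.remove(i + 1);
                }
            }
        }
        System.out.println("Merged: " + bestPair);
        System.out.println(words); // e.g. [[lo, w], [lo, w, e, r], [lo, w, e, s, t]]
    }
}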
Special Tokens
[BOS]: Beginning-of-sequence marker
[EOS]: End-of-sequence marker
[PAD]: Padding for batch processing
[UNK]: Unknown-token placeholder
7. Embeddings & Vector Representations
Embeddings convert discrete tokens into continuous vectors that capture semantic meaning.
Why Embeddings Matter
"king" - "man" + "woman"
โ
"queen"
"Paris" - "France" + "Italy"
โ
"Rome"
"good" โ "bad"
โ
"happy" โ "sad"
Words in Vector Space
Similar words cluster together
Parallel arrows = similar relationships
@Autowired
private EmbeddingModel embeddingModel;

public List<Double> getEmbedding(String text) {
    // Create embedding request
    EmbeddingRequest request = new EmbeddingRequest(List.of(text),
            EmbeddingOptionsBuilder.builder().build());
    // Get embedding response
    EmbeddingResponse response = embeddingModel.call(request);
    // Extract the embedding vector (typically 1536 dimensions for OpenAI)
    return response.getResult().getOutput();
}

// Calculate similarity between two texts
public double cosineSimilarity(String text1, String text2) {
    List<Double> vec1 = getEmbedding(text1);
    List<Double> vec2 = getEmbedding(text2);
    double dotProduct = 0.0, norm1 = 0.0, norm2 = 0.0;
    for (int i = 0; i < vec1.size(); i++) {
        dotProduct += vec1.get(i) * vec2.get(i);
        norm1 += Math.pow(vec1.get(i), 2);
        norm2 += Math.pow(vec2.get(i), 2);
    }
    return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
}

8. How Embeddings Are Created
The Embedding Process
Token
"hello"
Token ID
15339
Embedding Vector
[0.12, -0.45, ...]
Embedding Matrix
A learnable lookup table of size [vocabulary_size × embedding_dim]
Vocabulary: 100,000 tokens
Embedding dim: 1,536
Parameters: ~153.6M
Learned Through Training
Embeddings are optimized during training to capture semantic relationships
- Similar words have similar vectors
- Relationships are preserved
- Context influences meaning
Embedding Matrix Lookup
100K ร 1536"] R1["Row 15339"] R2["Row 8922"] end subgraph Output["Embedding Vectors"] V1["[0.12, -0.45, ...]"] V2["[-0.23, 0.67, ...]"] end T1 --> ID1 T2 --> ID2 ID1 --> R1 ID2 --> R2 R1 --> V1 R2 --> V2 style Input fill:#fff7ed style Lookup fill:#ffedd5 style Matrix fill:#fed7aa style Output fill:#fdba74
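A minimal lookup, assuming a tiny hand-written matrix and made-up token IDs (a real embedding matrix is learned during training and far larger):

import java.util.Arrays;
import java.util.Map;

// Embedding lookup is just row selection: token ID -> row of the matrix.
// Dimensions and values here are invented for illustration.
public class EmbeddingLookupSketch {

    public static void main(String[] args) {
        // Hypothetical vocabulary mapping (real IDs come from the tokenizer).
        Map<String, Integer> tokenToId = Map.of("hello", 2, "world", 3);

        // Toy embedding matrix: one row per token ID, 4 dimensions per row.
        double[][] embeddingMatrix = {
                {0.01, 0.02, 0.03, 0.04},   // row 0
                {0.11, 0.12, 0.13, 0.14},   // row 1
                {0.12, -0.45, 0.33, 0.08},  // row 2 -> "hello"
                {-0.23, 0.67, -0.10, 0.41}, // row 3 -> "world"
        };

        int id = tokenToId.get("hello");
        double[] vector = embeddingMatrix[id]; // the "lookup" is simply indexing a row
        System.out.println(Arrays.toString(vector));
    }
}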
9. Understanding Word Order
Unlike RNNs, Transformers process all tokens in parallel, so they need a way to know word order.
Why Order Matters
"The dog bit the man"
Different meaning from...
"The man bit the dog"
Same words, different meaning!
Positional Encoding
Positional encodings are added to token embeddings to inject position information:
Final Embedding = Token Embedding + Positional Encoding
// Sinusoidal encoding (original Transformer)
PE(pos, 2i) = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
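The formulas translate directly into code. The sketch below assumes an even embedding dimension d and exists only to show the computation; frameworks that implement the original Transformer compute these values (or learn positional parameters) internally.

// Computes the sinusoidal positional encoding for one position.
// d is the embedding dimension (assumed even), pos is the token position.
public class SinusoidalPositionalEncoding {

    static double[] encode(int pos, int d) {
        double[] pe = new double[d];
        for (int i = 0; i < d / 2; i++) {
            double angle = pos / Math.pow(10000.0, (2.0 * i) / d);
            pe[2 * i] = Math.sin(angle);     // PE(pos, 2i)
            pe[2 * i + 1] = Math.cos(angle); // PE(pos, 2i+1)
        }
        return pe;
    }

    public static void main(String[] args) {
        // Encoding for position 3 in an 8-dimensional embedding space;
        // this vector is added element-wise to the token embedding at position 3.
        System.out.println(java.util.Arrays.toString(encode(3, 8)));
    }
}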
Absolute Positional Encoding
Each position gets a unique vector. Used in original Transformer, GPT-2.
Relative/Rotary Encoding (RoPE)
Encodes relative distances. Used in LLaMA, modern models.
10. Understanding Context
The attention mechanism is the secret sauce that allows LLMs to understand context.
Self-Attention in Action
In the sentence: "The cat sat on the mat because it was tired"
Self-attention allows "it" to strongly attend to "cat", understanding they refer to the same entity.
Attention weights when processing "it":
Higher bars = stronger attention. "it" strongly attends to "cat" (0.45)
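Weights like these come from a softmax over scaled dot products between a query vector and the key vectors of the other tokens. The sketch below uses tiny made-up vectors for "it", "cat", "mat", and "tired" purely to show the mechanics; real models use hundreds of dimensions and many attention heads.

import java.util.Arrays;

// Scaled dot-product attention weights for a single query:
// score_j = (q . k_j) / sqrt(d_k), then softmax over the scores.
public class AttentionWeightsSketch {

    static double[] attentionWeights(double[] query, double[][] keys) {
        int dK = query.length;
        double[] scores = new double[keys.length];
        for (int j = 0; j < keys.length; j++) {
            double dot = 0.0;
            for (int i = 0; i < dK; i++) {
                dot += query[i] * keys[j][i];
            }
            scores[j] = dot / Math.sqrt(dK); // scaled dot product
        }
        // Softmax turns scores into weights that sum to 1.
        double max = Arrays.stream(scores).max().orElse(0.0);
        double sum = 0.0;
        double[] weights = new double[scores.length];
        for (int j = 0; j < scores.length; j++) {
            weights[j] = Math.exp(scores[j] - max);
            sum += weights[j];
        }
        for (int j = 0; j < weights.length; j++) {
            weights[j] /= sum;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] queryForIt = {0.9, 0.1, 0.3}; // made-up query vector for "it"
        double[][] keys = {
                {0.8, 0.2, 0.4}, // "cat"
                {0.1, 0.9, 0.0}, // "mat"
                {0.0, 0.1, 0.9}, // "tired"
        };
        // Highest weight lands on "cat" for these toy vectors.
        System.out.println(Arrays.toString(attentionWeights(queryForIt, keys)));
    }
}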
Context Window Sizes
Why Context Size Matters
Longer documents can be processed at once
Better understanding of complex topics
Maintain conversation history longer
More effective RAG implementations (a simple context-fit check is sketched below)
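One practical consequence: before sending a long document, compare a rough token estimate against the model's context window. The limit and the characters-per-token figure below are example values, not any specific provider's numbers.

// Checks whether an estimated prompt plus a reserved response budget
// fits inside an (example) context window.
public class ContextWindowCheck {

    private static final int CONTEXT_WINDOW_TOKENS = 128_000; // example limit
    private static final double CHARS_PER_TOKEN = 4.0;        // rough English average

    static boolean fitsInContext(String document, int reservedForResponse) {
        int estimatedPromptTokens = (int) Math.ceil(document.length() / CHARS_PER_TOKEN);
        return estimatedPromptTokens + reservedForResponse <= CONTEXT_WINDOW_TOKENS;
    }

    public static void main(String[] args) {
        String document = "..."; // imagine a long report loaded from disk
        // Reserve roughly 1,000 tokens for the model's answer.
        System.out.println("Fits in context: " + fitsInContext(document, 1_000));
    }
}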
Your AI Journey Roadmap
AI History & Evolution
From rule-based systems to generative AI
AI, ML, DL Relationships
Understanding the AI family tree
Generative Models
GANs, VAEs, Diffusion, Transformers
LLM Architecture
Why LLMs excel at text generation
Tokenization
Tokens, vocabularies, and encoding
Embeddings & Context
Vector representations and attention