Transformers
- RNNs: A neuron feedback loop makes them useful for sequential word/token prediction.
- Carrying giant hidden-state vectors at every step can lose information.
- "Attention Is All You Need": a hidden state for each step (token).
- RNNs are still sequential, so they can't be parallelized.
- Attention scores: Computed from the dot product of Query and Key; the softmaxed scores then weight the Value vectors (see the sketch after this list).
- Replaces the RNN's recurrence with attention plus feed-forward neural networks.
- Processes entire input at once
- Attention provides context
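As a concrete reference for the score computation above, here is a minimal NumPy sketch of scaled dot-product attention. The 4-token, 8-dimensional shapes are arbitrary toy values chosen for illustration, not anything prescribed by the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # dot product of each Query with every Key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax turns scores into weights
    return weights @ V                                 # weighted sum of the Value vectors

# Toy example: 4 tokens, 8-dimensional Q/K/V vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8): one context-aware vector per token
```

Each output row is a mix of the Value vectors, weighted by how well that token's Query matches every Key, which is what "attention provides context" means in practice.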
Transformer Components
- Encoder: Processes the input sequence and generates a set of hidden states that represent the input information. It consists of multiple layers, each containing self-attention and feed-forward neural networks.
- Decoder: Generates the target sequence from the source sequence, one token at a time.
- Token Embedding: Captures semantic relationships but not positions.
- Self-Attention:
- Each token gets Query, Key, and Value vectors (produced by learned projection matrices).
- Uses softmax to normalize the scores into attention weights.
- Masking prevents a token from attending to future tokens (see the causal-mask sketch after this list).
- Applications: Chat, question answering, text classification, NER, summarization, translation, code generation.
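The masking bullet is easiest to see in code. A NumPy sketch under the same toy assumptions as before: positions after the current token get a score of -inf, so the softmax assigns them zero weight.

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Self-attention where each token may only attend to itself and earlier tokens."""
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal = future positions
    scores = np.where(future, -np.inf, scores)           # -inf scores become 0 after the softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out, weights = causal_self_attention(Q, K, V)
print(np.round(weights, 2))   # upper triangle is all zeros: no attention paid to future tokens
```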
Generative Pretrained Transformers (GPT)
- LLM Concepts:
- Tokens: Numerical representations of words or subword pieces.
- Embeddings: Math representation (vector) encoding meaning of a token.
- Top P: Probability threshold for token inclusion (nucleus sampling).
- Top K: Alternative mechanism that limits sampling to the K most likely candidates.
- Temperature: Level of randomness (see the sampling sketch after this list).
- Context window: Number of tokens an LLM can process at once.
- Max tokens: Max input/output tokens.
- Transfer Learning / Fine-Tuning:
- Adds new layers on top of a pretrained model (see the fine-tuning sketch after this list).
- Can be trained on data such as your emails to mimic your replies.
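A toy sketch of how Temperature, Top K, and Top P interact when picking the next token from raw logits. The five-token vocabulary and logit values are made up, and real decoders differ in details such as the order in which the filters are applied.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=0):
    """Toy next-token sampler: temperature scaling, then optional Top-K / Top-P filtering."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float) / temperature      # higher temperature -> flatter, more random
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                        # softmax over the vocabulary

    if top_k is not None:                                       # Top K: keep only the K most likely tokens
        kth_largest = np.sort(probs)[-top_k]
        probs = np.where(probs >= kth_largest, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:                                       # Top P: smallest set with cumulative prob >= top_p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return rng.choice(len(probs), p=probs)

# Made-up logits for a 5-token vocabulary.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0], temperature=0.7, top_k=3, top_p=0.9))
```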
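A hedged PyTorch sketch of the "add layers on top" idea: freeze a stand-in pretrained base and train only a small new head. The layer sizes and the 3-class email-reply task are placeholders for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained transformer body (in practice, loaded from a checkpoint).
pretrained_base = nn.Sequential(
    nn.Embedding(num_embeddings=10_000, embedding_dim=256),            # token embeddings
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
)

# Freeze the base: transfer learning updates only the new layers on top.
for param in pretrained_base.parameters():
    param.requires_grad = False

# New task-specific head, e.g. classifying an email into one of 3 reply types (placeholder task).
head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 3))

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)               # only the head's weights are trained

tokens = torch.randint(0, 10_000, (8, 32))                             # fake batch: 8 sequences of 32 token ids
features = pretrained_base(tokens).mean(dim=1)                         # pool token states into one vector each
logits = head(features)                                                # (8, 3) class scores
```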
Bedrock
API for interacting with serverless generative AI Foundation models.
- IAM required (not root).
- bedrock: Manage, deploy, and train models.
- bedrock-runtime: Perform inference (see the inference sketch at the end of this section).
- bedrock-agent: API for building and managing agents and knowledge bases.
- Fine-tuning: Bake data into the model so it doesn't have to be supplied in prompts.
- Continued Pre-Training: Uses unlabeled data to familiarize the model with new information.
- Retrieval-Augmented Generation (RAG):
- Queries external databases (e.g., vector search; see the retrieval sketch at the end of this section).
- Reduces hallucinations.
- Doesn't train a model.
- Faster and cheaper but adds more tokens.
- LLM Agents / Agentic AI:
- Gives tools to the LLM.
- PT (Provisioned Throughput): Increases the token processing rate.
- Uses an InvokeAgent request with an agent alias ID (see the agent sketch at the end of this section).
- Action Groups define the available tools.
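The retrieval half of RAG boils down to embedding the query and ranking stored documents by vector similarity. A minimal sketch with made-up 3-dimensional embeddings; a real setup would use an embedding model and a vector store (e.g., OpenSearch) to produce and index these vectors.

```python
import numpy as np

# Pretend document embeddings from an embedding model (values are made up for illustration).
docs = ["refund policy", "shipping times", "warranty terms"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
])

def retrieve(query_vector, k=1):
    """Rank documents by cosine similarity to the query embedding."""
    sims = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query_vector = np.array([0.85, 0.15, 0.05])   # pretend embedding of "How do I get my money back?"
context = retrieve(query_vector)
prompt = f"Answer using only this context: {context}\n\nQuestion: How do I get my money back?"
print(prompt)   # retrieved text is prepended to the prompt; no model weights are changed
```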
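A minimal boto3 sketch of inference through bedrock-runtime using the Converse API. The model ID is only an example (use one enabled in your account and region), and configured AWS credentials with Bedrock IAM permissions are assumed.

```python
import boto3

# bedrock-runtime is the inference client; "bedrock" manages models, "bedrock-agent" manages agents/KBs.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize what a context window is in one sentence."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.5, "topP": 0.9},
)

print(response["output"]["message"]["content"][0]["text"])
```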
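A sketch of invoking a deployed agent through bedrock-agent-runtime. The agent ID and alias ID are placeholders, and the streaming-response handling is simplified.

```python
import uuid
import boto3

agents_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agents_runtime.invoke_agent(
    agentId="AGENT_ID",               # placeholder
    agentAliasId="AGENT_ALIAS_ID",    # placeholder: agents are invoked through an alias
    sessionId=str(uuid.uuid4()),      # ties multi-turn requests together
    inputText="Check the order status for order 1234.",
)

# The completion comes back as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```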
GuardRails
- Content filtering for prompts and responses
- Works with text foundation models
- Word filtering
- Topic filtering
- Profanities
- PII removal (or masking)
- Contextual Grounding Check
- Helps prevent hallucination
- Measures “grounding” (how similar the response is to the contextual data received)
- And relevance (of response to the query)
- Can be incorporated into agents and knowledge bases
- May configure the “blocked message” response
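A hedged sketch of attaching a guardrail at inference time via the Converse API's guardrailConfig. The guardrail ID/version and model ID are placeholders, and the filters, denied topics, PII handling, and blocked message are assumed to have been configured separately in Bedrock.

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",        # example model ID
    messages=[{"role": "user", "content": [{"text": "What is my colleague's home address?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "GUARDRAIL_ID",               # placeholder: created in the Bedrock console/API
        "guardrailVersion": "1",
    },
)

# If the prompt or response trips a filter (topic, word, PII, grounding), the configured
# blocked message is returned instead of the model output.
print(response["output"]["message"]["content"][0]["text"])
```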
Foundation Models
Overview
Foundation models are large-scale pre-trained AI models designed to handle a variety of tasks such as natural language processing, image generation, and code completion. These models are typically fine-tuned for specific applications. SageMaker JumpStart enables quick deployment of foundation models within SageMaker notebooks.
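A hedged sketch of that JumpStart flow using the SageMaker Python SDK. The model ID is just an example from the JumpStart catalog, and an execution role, service quotas, possible EULA acceptance, and endpoint charges are assumed.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Example JumpStart model ID; browse the JumpStart catalog for current IDs.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

# Deploys a real-time inference endpoint (this provisions billable infrastructure).
predictor = model.deploy()

# Payload format depends on the model; text-generation models typically accept an "inputs" field.
print(predictor.predict({"inputs": "Explain what a foundation model is in one sentence."}))

predictor.delete_endpoint()   # clean up when finished
```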
Notable Models
- GPT-n (OpenAI/Microsoft): Advanced language models for text generation and reasoning.
- BERT (Google): Transformer-based model for NLP tasks like sentiment analysis and question answering.
- DALL·E (OpenAI/Microsoft): AI model for generating images from text prompts.
- LLaMA (Meta): Large-scale language model designed for efficiency and research applications.
- Jurassic-2 (AI21 Labs, available on Bedrock): Multilingual foundation model supporting multiple languages.
- Claude (Anthropic): AI assistant optimized for safe and transparent conversational AI.
- Stable Diffusion (Stability AI): Open-source model for generating images from text prompts.
- Amazon Titan: AWS's in-house foundation model family for various AI applications.