Transformers
- RNNs: A neuron feedback loop makes them useful for sequential word/token prediction.
- Carrying giant hidden-state vectors at every step can lose information.
- "Attention Is All You Need": a hidden state for each step (token).
- RNNs are still sequential, so they can't be parallelized.
- Attention scores: Computed from the dot product of Query and Key; the softmaxed scores then weight the Value vectors (see the sketch after this list).
- Replaces the RNN's recurrence with attention plus feed-forward neural networks.
- Processes entire input at once
- Attention provides context
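As a concrete reference for the score computation above, here is a minimal NumPy sketch of scaled dot-product attention. The 4-token, 8-dimensional shapes are arbitrary toy values chosen for illustration, not anything prescribed by the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # dot product of each Query with every Key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax turns scores into weights
    return weights @ V                                 # weighted sum of the Value vectors

# Toy example: 4 tokens, 8-dimensional Q/K/V vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8): one context-aware vector per token
```

Each output row is a mix of the Value vectors, weighted by how well that token's Query matches every Key, which is what "attention provides context" means in practice.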
Transformer Components
- Encoder: Processes the input sequence and generates a set of hidden states that represent the input information. It consists of multiple layers, each containing self-attention and feed-forward neural networks.
- Decoder: Generates the target sequence from the source sequence, one token at a time.
- Token Embedding: Captures semantic relationships but not positions.
- Self-Attention:
- Each token gets Query, Key, and Value vectors (produced by learned projection matrices).
- Uses softmax to normalize the scores into attention weights.
- Masking prevents a token from attending to future tokens (see the causal-mask sketch after this list).
- Applications: Chat, question answering, text classification, NER, summarization, translation, code generation.
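The masking bullet is easiest to see in code. A NumPy sketch under the same toy assumptions as before: positions after the current token get a score of -inf, so the softmax assigns them zero weight.

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Self-attention where each token may only attend to itself and earlier tokens."""
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal = future positions
    scores = np.where(future, -np.inf, scores)           # -inf scores become 0 after the softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out, weights = causal_self_attention(Q, K, V)
print(np.round(weights, 2))   # upper triangle is all zeros: no attention paid to future tokens
```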
Generative Pretrained Transformers (GPT)
- LLM Concepts:
- Tokens: Numerical representations of words or subword pieces.
- Embeddings: Math representation (vector) encoding meaning of a token.
- Top P: Probability threshold for token inclusion (nucleus sampling).
- Top K: Alternative mechanism that limits sampling to the K most likely candidates.
- Temperature: Level of randomness (see the sampling sketch after this list).
- Context window: Number of tokens an LLM can process at once.
- Max tokens: Max input/output tokens.
- Transfer Learning / Fine-Tuning:
- Adds new layers on top of a pretrained model (see the fine-tuning sketch after this list).
- Can be trained on data such as your emails to mimic your replies.
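A toy sketch of how Temperature, Top K, and Top P interact when picking the next token from raw logits. The five-token vocabulary and logit values are made up, and real decoders differ in details such as the order in which the filters are applied.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=0):
    """Toy next-token sampler: temperature scaling, then optional Top-K / Top-P filtering."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float) / temperature      # higher temperature -> flatter, more random
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                        # softmax over the vocabulary

    if top_k is not None:                                       # Top K: keep only the K most likely tokens
        kth_largest = np.sort(probs)[-top_k]
        probs = np.where(probs >= kth_largest, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:                                       # Top P: smallest set with cumulative prob >= top_p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return rng.choice(len(probs), p=probs)

# Made-up logits for a 5-token vocabulary.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0], temperature=0.7, top_k=3, top_p=0.9))
```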
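A hedged PyTorch sketch of the "add layers on top" idea: freeze a stand-in pretrained base and train only a small new head. The layer sizes and the 3-class email-reply task are placeholders for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained transformer body (in practice, loaded from a checkpoint).
pretrained_base = nn.Sequential(
    nn.Embedding(num_embeddings=10_000, embedding_dim=256),            # token embeddings
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
)

# Freeze the base: transfer learning updates only the new layers on top.
for param in pretrained_base.parameters():
    param.requires_grad = False

# New task-specific head, e.g. classifying an email into one of 3 reply types (placeholder task).
head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 3))

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)               # only the head's weights are trained

tokens = torch.randint(0, 10_000, (8, 32))                             # fake batch: 8 sequences of 32 token ids
features = pretrained_base(tokens).mean(dim=1)                         # pool token states into one vector each
logits = head(features)                                                # (8, 3) class scores
```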
Bedrock
API for interacting with serverless generative AI Foundation models.
- IAM required (not root).
- bedrock: Manage, deploy, and train models.
- bedrock-runtime: Perform inference (see the inference sketch at the end of this section).
- bedrock-agent: API for building and managing agents and knowledge bases.
- Fine-tuning: Bake data into the model so it doesn't have to be supplied in prompts.
- Continued Pre-Training: Uses unlabeled data to familiarize the model with new information.
- Retrieval-Augmented Generation (RAG):
- Queries external databases (e.g., vector search; see the retrieval sketch at the end of this section).
- Reduces hallucinations.
- Doesn't train a model.
- Faster and cheaper but adds more tokens.
- LLM Agents / Agentic AI:
- Gives tools to the LLM.
- PT (Provisioned Throughput): Increases the token processing rate.
- Uses an InvokeAgent request with an agent alias ID (see the agent sketch at the end of this section).
- Action Groups define the available tools.
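The retrieval half of RAG boils down to embedding the query and ranking stored documents by vector similarity. A minimal sketch with made-up 3-dimensional embeddings; a real setup would use an embedding model and a vector store (e.g., OpenSearch) to produce and index these vectors.

```python
import numpy as np

# Pretend document embeddings from an embedding model (values are made up for illustration).
docs = ["refund policy", "shipping times", "warranty terms"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
])

def retrieve(query_vector, k=1):
    """Rank documents by cosine similarity to the query embedding."""
    sims = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query_vector = np.array([0.85, 0.15, 0.05])   # pretend embedding of "How do I get my money back?"
context = retrieve(query_vector)
prompt = f"Answer using only this context: {context}\n\nQuestion: How do I get my money back?"
print(prompt)   # retrieved text is prepended to the prompt; no model weights are changed
```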
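A minimal boto3 sketch of inference through bedrock-runtime using the Converse API. The model ID is only an example (use one enabled in your account and region), and configured AWS credentials with Bedrock IAM permissions are assumed.

```python
import boto3

# bedrock-runtime is the inference client; "bedrock" manages models, "bedrock-agent" manages agents/KBs.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize what a context window is in one sentence."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.5, "topP": 0.9},
)

print(response["output"]["message"]["content"][0]["text"])
```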
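A sketch of invoking a deployed agent through bedrock-agent-runtime. The agent ID and alias ID are placeholders, and the streaming-response handling is simplified.

```python
import uuid
import boto3

agents_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agents_runtime.invoke_agent(
    agentId="AGENT_ID",               # placeholder
    agentAliasId="AGENT_ALIAS_ID",    # placeholder: agents are invoked through an alias
    sessionId=str(uuid.uuid4()),      # ties multi-turn requests together
    inputText="Check the order status for order 1234.",
)

# The completion comes back as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```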
GuardRails
- Content filtering for prompts and responses
- Works with text foundation models
- Word filtering
- Topic filtering
- Profanities
- PII removal (or masking)
- Contextual Grounding Check
- Helps prevent hallucination
- Measures “grounding” (how similar the response is to the contextual data received)
- And relevance (of response to the query)
- Can be incorporated into agents and knowledge bases
- May configure the “blocked message” response
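A hedged sketch of attaching a guardrail at inference time via the Converse API's guardrailConfig. The guardrail ID/version and model ID are placeholders, and the filters, denied topics, PII handling, and blocked message are assumed to have been configured separately in Bedrock.

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",        # example model ID
    messages=[{"role": "user", "content": [{"text": "What is my colleague's home address?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "GUARDRAIL_ID",               # placeholder: created in the Bedrock console/API
        "guardrailVersion": "1",
    },
)

# If the prompt or response trips a filter (topic, word, PII, grounding), the configured
# blocked message is returned instead of the model output.
print(response["output"]["message"]["content"][0]["text"])
```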
Foundation Models
Overview
Foundation models are large-scale pre-trained AI models designed to handle a variety of tasks such as natural language processing, image generation, and code completion. These models are typically fine-tuned for specific applications. SageMaker JumpStart enables quick deployment of foundation models within SageMaker notebooks.
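A hedged sketch of that JumpStart flow using the SageMaker Python SDK. The model ID is just an example from the JumpStart catalog, and an execution role, service quotas, possible EULA acceptance, and endpoint charges are assumed.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Example JumpStart model ID; browse the JumpStart catalog for current IDs.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

# Deploys a real-time inference endpoint (this provisions billable infrastructure).
predictor = model.deploy()

# Payload format depends on the model; text-generation models typically accept an "inputs" field.
print(predictor.predict({"inputs": "Explain what a foundation model is in one sentence."}))

predictor.delete_endpoint()   # clean up when finished
```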
Notable Models
- GPT-n (OpenAI/Microsoft): Advanced language models for text generation and reasoning.
- BERT (Google): Transformer-based model for NLP tasks like sentiment analysis and question answering.
- DALL·E (OpenAI/Microsoft): AI model for generating images from text prompts.
- LLaMA (Meta): Large-scale language model designed for efficiency and research applications.
- Jurassic-2 (AI21 Labs, available on Bedrock): Multilingual foundation model supporting multiple languages.
- Claude (Anthropic): AI assistant optimized for safe and transparent conversational AI.
- Stable Diffusion (Stability AI): Open-source model for generating images from text prompts.
- Amazon Titan: AWS's in-house foundation model family for various AI applications.