Transformers

  • RNNs use a feedback loop over a hidden state, which is useful for sequential word/token prediction.
  • Compressing everything seen so far into one giant vector at every step can lose information.
  • "Attention Is All You Need" (the Transformer paper): keep a hidden state for each step (token) instead of one compressed state.
  • RNNs are still sequential, so they can't be parallelized.
  • Attention scores: computed from the dot product of Query and Key vectors; the normalized scores then weight the Value vectors (formula after this list).
  • Replaces the RNN recurrence with feed-forward neural networks.
  • Processes the entire input at once.
  • Attention provides context.
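
The attention score computation above is standard scaled dot-product attention; written out (with d_k the key dimension), it is:

  \text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V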

Transformer Components

  • Encoder: The encoder processes the input sequence and generates a set of hidden states that represent the input information. It consists of multiple layers, each containing self-attention and feed-forward neural networks.
  • Decoder: Generates the target sequence token by token, attending to the encoder output and to previously generated tokens.
  • Token Embedding: Captures semantic relationships but not positions; positional encodings are added to supply word order.
  • Self-Attention (sketched in code after this list):
    • Each token is projected into Query, Key, and Value vectors via learned weight matrices.
    • Uses softmax to normalize the attention scores into weights.
    • Masking (in the decoder) prevents attending to future tokens.
  • Applications: Chat, question answering, text classification, NER, summarization, translation, code generation.
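
A minimal sketch of masked scaled dot-product self-attention in NumPy. The projection matrices Wq/Wk/Wv, the dimensions, and the toy input are illustrative placeholders, not taken from any particular model:

  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)      # numerical stability
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def self_attention(tokens, Wq, Wk, Wv, causal=True):
      """tokens: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections."""
      Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
      d_k = K.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)              # dot product of Queries and Keys
      if causal:                                   # masking: no attending to future tokens
          future = np.triu(np.ones_like(scores, dtype=bool), k=1)
          scores = np.where(future, -1e9, scores)
      weights = softmax(scores, axis=-1)           # normalize scores per query token
      return weights @ V                           # weighted sum of Value vectors

  # Toy usage: 4 tokens, d_model = d_k = 8 (arbitrary sizes for illustration)
  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 8))
  Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
  print(self_attention(x, Wq, Wk, Wv).shape)       # -> (4, 8)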

Generative Pretrained Transformers (GPT)

  • LLM Concepts:
    • Tokens: Chunks of text (words or subwords) mapped to the integer IDs the model actually processes.
    • Embeddings: Mathematical representation (vector) encoding the meaning of a token.
    • Top P (nucleus sampling): Sample only from the smallest set of tokens whose cumulative probability reaches P.
    • Top K: Alternate mechanism that samples only from the K most probable candidate tokens.
    • Temperature: Level of randomness; scales the logits so higher values flatten the distribution (sampling sketch after this list).
    • Context window: Number of tokens an LLM can process at once.
    • Max tokens: Upper limit on input/output tokens for a request.
  • Transfer Learning / Fine-Tuning:
    • Reuses a pre-trained model, e.g., by adding task-specific layers on top or further training existing weights.
    • Can train on data such as past emails so the model mimics their style in new replies.
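
A minimal sketch of how temperature, Top K, and Top P interact when picking the next token; the toy logits and vocabulary size are made up for illustration:

  import numpy as np

  def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
      """Pick one token index from raw logits using common decoding settings."""
      rng = rng or np.random.default_rng()
      logits = np.asarray(logits, dtype=float) / max(temperature, 1e-6)  # temperature scales logits
      probs = np.exp(logits - logits.max())
      probs /= probs.sum()

      order = np.argsort(probs)[::-1]              # most probable candidates first
      if top_k is not None:                        # Top K: keep only the K best candidates
          order = order[:top_k]
      if top_p is not None:                        # Top P: smallest set whose cumulative prob >= P
          cumulative = np.cumsum(probs[order])
          order = order[:np.searchsorted(cumulative, top_p) + 1]

      kept = probs[order] / probs[order].sum()     # renormalize over kept candidates
      return int(rng.choice(order, p=kept))

  # Toy usage with a 5-token "vocabulary"
  logits = [2.0, 1.5, 0.3, -1.0, -2.0]
  print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9))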

Bedrock

A serverless AWS API for interacting with generative AI foundation models.

  • IAM user or role required (not the root account).
  • bedrock: Manage, deploy, and customize (train) models.
  • bedrock-runtime: Perform inference (invocation sketch after this list).
  • bedrock-agent / bedrock-agent-runtime: Build and invoke agents and knowledge bases.
  • Fine-tuning: Trains your data into the model so it doesn't have to be supplied in every prompt.
  • Continued Pre-Training: Uses unlabeled data to familiarize the model with new information.
  • Retrieval-Augmented Generation (RAG):
    • Queries external databases (e.g., vector search).
    • Reduces hallucinations.
    • Doesn't train a model.
    • Faster and cheaper than fine-tuning, but the retrieved context adds more tokens to each prompt.
  • LLM Agents / Agentic AI:
    • Gives tools to LLM.
    • Provisioned Throughput (PT): Purchases dedicated model capacity for a guaranteed token-processing rate.
    • Uses InvokeAgent request with an alias ID.
    • Action Groups define available tools.
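
A minimal sketch of performing inference with bedrock-runtime (Converse API) and calling an agent with bedrock-agent-runtime via boto3. The region, model ID, agent IDs, and session ID are placeholders, and response shapes can vary by model, so treat the field access as illustrative:

  import boto3

  # Placeholders -- substitute your own region, model ID, and agent IDs.
  runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

  # Inference through the Converse API (bedrock-runtime).
  response = runtime.converse(
      modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
      messages=[{"role": "user", "content": [{"text": "Summarize what RAG is."}]}],
      inferenceConfig={"maxTokens": 256, "temperature": 0.5, "topP": 0.9},
  )
  print(response["output"]["message"]["content"][0]["text"])

  # Calling an agent (bedrock-agent-runtime) uses InvokeAgent with an alias ID.
  agents = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
  stream = agents.invoke_agent(
      agentId="AGENT_ID",             # placeholder
      agentAliasId="AGENT_ALIAS_ID",  # placeholder
      sessionId="demo-session-1",
      inputText="Give me a one-line status update.",
  )
  for event in stream["completion"]:                      # agent responses are streamed
      if "chunk" in event:
          print(event["chunk"]["bytes"].decode("utf-8"), end="")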

Guardrails

  • Content filtering for prompts and responses
  • Works with text foundation models
  • Word filtering
  • Topic filtering
  • Profanities
  • PII removal (or masking)
  • Contextual Grounding Check:
    • Helps prevent hallucinations.
    • Measures “grounding” (how similar the response is to the contextual data received).
    • Measures relevance (of the response to the query).
  • Can be incorporated into agents and knowledge bases.
  • The “blocked message” response is configurable (sketch after this list).
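
A minimal sketch of attaching an existing guardrail at inference time with the Converse API. The guardrail ID/version and model ID are placeholders, and the guardrailConfig field names are my best understanding of the boto3 API, so treat them as assumptions:

  import boto3

  runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

  response = runtime.converse(
      modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
      messages=[{"role": "user", "content": [{"text": "Tell me about my account."}]}],
      # Attach a previously created guardrail (ID/version are placeholders).
      guardrailConfig={
          "guardrailIdentifier": "GUARDRAIL_ID",
          "guardrailVersion": "1",
      },
  )

  # If the guardrail intervenes, the configured "blocked message" is returned
  # instead of the model output, and stopReason indicates the intervention.
  print(response["output"]["message"]["content"][0]["text"])
  print(response.get("stopReason"))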

Foundation Models

Overview

Foundation models are large-scale pre-trained AI models designed to handle a variety of tasks such as natural language processing, image generation, and code completion. These models are typically fine-tuned for specific applications. SageMaker JumpStart enables quick deployment of foundation models within SageMaker notebooks (deployment sketch below).
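
A minimal sketch of deploying a JumpStart foundation model to an endpoint with the SageMaker Python SDK; the model ID and the request payload shape are illustrative and differ per model:

  from sagemaker.jumpstart.model import JumpStartModel

  # Example JumpStart model ID -- browse the JumpStart catalog for current IDs.
  model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

  # Deploys a real-time SageMaker endpoint (billed until deleted).
  predictor = model.deploy()

  # Payload keys differ per model family; this shape is illustrative.
  output = predictor.predict({"inputs": "Explain transformers in one sentence."})
  print(output)

  predictor.delete_endpoint()   # clean up when finished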

Notable Models

  • GPT-n (OpenAI, Microsoft) – Advanced language models for text generation and reasoning.
  • BERT (Google) – Transformer-based model for NLP tasks like sentiment analysis and question answering.
  • DALL·E (OpenAI, Microsoft) – AI model for generating images from text prompts.
  • LLaMA (Meta) – Large-scale language model designed for efficiency and research applications.
  • Jurassic-2 (AI21 Labs) – Multilingual foundation model supporting multiple languages; available on Amazon Bedrock.
  • Claude (Anthropic) – AI assistant optimized for safe and transparent conversational AI.
  • Stable Diffusion (Stability AI) – Open-source model for generating images from text prompts.
  • Amazon Titan – AWS’s in-house foundation model family for various AI applications.