For a hands-on learning experience to develop LLM applications, join our LLM Bootcamp today.
The first 4 seats get a 20% discount, so hurry!

Large Language Models Bootcamp

NEW

5 DAYS · 40 HOURS · IN-PERSON / ONLINE

Comprehensive, hands-on curriculum to give you a headstart in building Large Language Model Applications.
LLM Bootcamp | Data Science Dojo
$4999

20% OFF

Ratings | LLM Bootcamp | Data Science Dojo 4.95 · 640+ reviews

$3999

Includes:

  • Software subscriptions and cloud services credit up to USD $500.
  • Breakfast, lunch and beverages daily.
  • Course material and access to coding labs, Jupyter notebooks, and hundreds of learning resources for 1 year.

INSTRUCTORS AND GUEST SPEAKERS

Learn From Industry Leaders

COURSE OVERVIEW

Learn to Build and Deploy Custom LLM Applications

Pre-trained large language models like ChatGPT offer impressive capabilities, but they cannot be used in scenarios where the underlying data is proprietary and requires industry-specific knowledge. Businesses are rushing to build custom LLM applications that offer enhanced performance, control, customization, and, most importantly, competitive advantage. This bootcamp offers a comprehensive introduction to building a ChatGPT-style assistant on your own data. By the end of the bootcamp, you will be capable of building LLM-powered applications on any dataset of your choice.

In collaboration with

Who Is This Course For

Anyone

Anyone interested in getting a headstart through hands-on experience with building LLM applications.

Data professionals

Data professionals who want to supercharge their data skills using cutting-edge generative AI tools and techniques.

Product leaders

Product leaders at enterprises or startups seeking to leverage LLMs to enhance their products, processes and services.

Curriculum Highlights

Generative AI and LLM Fundamentals

A comprehensive introduction to the fundamentals of generative AI, foundation models, and large language models

Canonical Architectures of LLM Applications

An in-depth understanding of various LLM-powered application architectures and their relative tradeoffs

Embeddings and Vector Databases

Hands-on experience with vector databases and vector embeddings

Prompt Engineering

Practical experience with writing effective prompts for your LLM applications

Orchestration Frameworks: LangChain

Practical experience with orchestration frameworks like LangChain

Customizing Large Language Models

Practical experience with fine-tuning, parameter-efficient tuning, and retrieval-augmented approaches

Building An End-to-End Custom LLM Application

A custom LLM application created on selected datasets


Technologies and Tools

Testimonials

Our customers and partners love us!
Video testimonials from:

  • Omar Smith
  • Roger Campbell
  • Tariq Hook
  • Ali Abuharb
  • Dave Horton
  • Francisco Morales
  • Shakeeb Syed
  • Yashwant Reddy
  • Sahar Nesaei
  • Florian Klonek
  • Maryam Bagher
  • Kshitij Singh
  • Jared Miller
  • Victor Green
  • Ken Btler
  • Abrar Bhuiyan
  • Erika Davis
  • Aishwariya Raman
  • Luis Armando
  • Amity Fox
  • David Martins
  • Ed Wiley

Earn a Verified Certificate of Completion

In association with
UNM's continuing education | Data Science Dojo

Earn a Large Language Models certificate in association with the University of New Mexico Continuing Education, verifying your skills. Step into the market with a proven and trusted skillset.

Course Syllabus

DAY 1 - LLM Fundamentals

Understanding the LLM Ecosystem

In this module, we will understand the common use cases of large language models and the fundamental building blocks of such applications. Learners will be introduced to the following topics:

  • Large language models and foundation models
  • Prompts and prompt engineering
  • Context window and token limits
  • Embeddings and vector databases
  • Build custom LLM applications by:
    • Training a new model from scratch
    • Fine-tuning foundation LLMs
    • In-context learning
  • Canonical architecture for an end-to-end LLM application

Adoption Challenges and Risks

In this module, we will explore the primary challenges and risks associated with adopting generative AI technologies. Learners will be introduced to the following topics at a very high level without going into the technical details:

  • Misaligned behavior of AI systems 
  • Handling complex datasets 
  • Limitations due to context length 
  • Managing cost and latency 
  • Addressing prompt brittleness 
  • Ensuring security in AI applications 
  • Achieving reproducibility
  • Evaluating AI performance and outcomes

Evolution of Embedding

In this module, we will be reviewing how embeddings have evolved from the simplest one-hot encoding approach to more recent semantic embedding approaches. The module will go over the following topics:

  • Review of classical techniques
    • Binary/one-hot, bag-of-words (BoW), and TF-IDF techniques for vectorization
    • Capturing local context with n-grams and associated challenges
  • Semantic encoding techniques
    • Overview of Word2Vec and dense word embeddings
    • Application of Word2Vec in text analytics and NLP tasks
  • Text embeddings
    • Word and sentence embeddings
  • Text similarity measures
    • Dot product, cosine similarity, inner product
  • Hands-on Exercise
    • Creating TF-IDF embeddings on a document corpus
    • Calculating similarity between sentences using cosine similarity and dot product
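The exercise above can be sketched in plain Python with no external libraries. The tiny corpus below is invented for illustration; a real exercise would use a proper document collection.

```python
import math
from collections import Counter

def tf_idf_vectors(corpus):
    """Compute TF-IDF vectors for a list of documents (whitespace-tokenized)."""
    docs = [doc.lower().split() for doc in corpus]
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # idf = log(N / document frequency) + 1, so common words are down-weighted
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "embeddings capture meaning",
]
_, vecs = tf_idf_vectors(corpus)
print(cosine_similarity(vecs[0], vecs[1]))  # overlapping words -> similarity > 0
print(cosine_similarity(vecs[0], vecs[2]))  # no shared words -> 0.0
```

In practice the bootcamp labs would likely use a library such as scikit-learn for this; the point here is only that TF-IDF and cosine similarity are a few lines of arithmetic.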

DAY 2 - Vector Databases and Prompt Engineering

Attention Mechanism and Transformers

Dive into the world of large language models, discovering the potent mix of text embeddings, attention mechanisms, and the game-changing transformer model architecture.

  • Attention mechanism and transformer models
    • Encoder-decoder architecture
    • Transformer networks: tokenization, embedding, positional encoding, and transformer blocks
    • Attention mechanism
    • Self-attention
    • Multi-head attention
    • Transformer models
  • Supplementary hands-on exercises
    • Understanding attention mechanisms and attention scoring functions

Vector Databases

Learn about efficient vector storage and retrieval with vector database, indexing techniques, retrieval methods, and hands-on exercises.

  • Overview
    • Rationale for vector databases
    • Importance of vector databases in LLMs
    • Popular vector databases
  • Different types of search
    • Vector search, text search, hybrid search
  • Indexing techniques
    • Product Quantization (PQ), Locality Sensitive Hashing (LSH) and Hierarchical Navigable Small World (HNSW)
  • Retrieval techniques
    • Cosine Similarity, Nearest Neighbor Search
  • Advanced Retrieval Augmented Generation techniques
    • Limitations of embeddings and similarity in semantic search
    • Query transformation for better retrieval
    • Relevance scoring in hybrid search using Reciprocal Rank Fusion (RRF)
    • Using auto-cut feature to remove irrelevant results dynamically
    • Improving search relevance by using language understanding to re-rank search results
  • Challenges using vector databases in production
    • Scaling optimization
    • Reliability optimization
    • Cost optimization
  • Hands-on Exercise
    • Learn how to perform similarity searches with vectors as input.
    • Learn how to perform queries using vector similarity searches with embedding models and vectors.
    • Learn how to combine the results of a vector search and a keyword (BM25F) search using hybrid search approach.
    • Learn how to use multi-tenancy features for the efficient and secure management of data across multiple users or tenants.
    • Learn how to compress vectors using product quantization to reduce memory footprint.
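Reciprocal Rank Fusion, mentioned above for hybrid search, scores each document by summing 1/(k + rank) over every result list it appears in. A minimal sketch (k=60 is the commonly used constant; the document IDs and rankings are invented):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # ranked by vector similarity
keyword_hits = ["doc1", "doc9", "doc3"]  # ranked by keyword (BM25) match
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# -> ['doc1', 'doc3', 'doc9', 'doc7']
```

doc1 wins because it sits near the top of both lists; RRF needs only ranks, not comparable scores, which is why it is popular for combining vector and keyword search.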

Semantic Search

Understand how semantic search overcomes the fundamental limitation of lexical search, i.e., the lack of semantics. Learn how to use embeddings and similarity to build a semantic search model.

  • Understanding and Implementing Semantic Search
    • Introduction and importance of semantic search
    • Distinguishing semantic search from lexical search
    • Semantic search using text embeddings
  • Exploring Advanced Concepts and Techniques in Semantic Search
    • Multilingual search
    • Limitations of embeddings and similarity in semantic search
    • Improving semantic search beyond embeddings and similarity

DAY 3 - Fine-tuning LLMs

Prompt Engineering

Unleash your creativity and efficiency with prompt engineering. Seamlessly prompt models, control outputs, and generate captivating content across various domains and tasks.

  • Prompt Design and Engineering
    • Crafting Instructions for Effective Prompting
    • Utilizing Examples to Guide Model Behavior
  • Innovative Use Case Development
    • Tailoring Prompts to Goals, Tasks, and Domains
    • Practical Examples:
      • Summarizing Complex Reports
      • Extracting Sentiment and Key Topics from Texts
  • Understanding and Mitigating Prompt Engineering Risks
    • Identifying Common Risks: Prompt Injection, Prompt Leaking, Jailbreaking
    • Best Practices for Secure Prompt Engineering
  • Advanced Prompting Techniques
    • Enhancing Performance with Few-Shot and Chain-of-Thought (CoT) Prompting
    • Exploring Program-aided Language Models (PAL) and ReAct Methods
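Few-shot prompting, listed above, simply prepends labeled examples to the new input so the model can infer the task. A toy sketch of assembling such a prompt as a string (the sentiment task, examples, and labels are invented for illustration):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")  # leave the answer for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as Positive or Negative.",
    [("I loved this bootcamp!", "Positive"),
     ("The session was a waste of time.", "Negative")],
    "The labs were incredibly helpful.",
)
print(prompt)
```

The same pattern generalizes to summarization or extraction tasks by changing the instruction and example labels.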

LLM Fine Tuning

In-depth discussion on fine-tuning of large language models through theoretical discussions, exploring rationale, limitations, and Parameter Efficient Fine Tuning.

  • Fine Tuning Foundation LLMs
    • Transfer learning and Fine-tuning
    • Different fine-tuning techniques
    • Limitations for fine-tuning
    • Parameter-efficient fine-tuning in depth
      • Quantization of LLMs
      • Low-Rank Adaptation (LoRA) and QLoRA
    • Fine-tuning vs. RAG: when to use each, plus risks and limitations
  • Hands-on Exercise:
    • In-Class: Instruction fine-tuning, deploying, and evaluating a LLaMA2-7B 4-bit quantized model
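LoRA's core idea is to freeze a weight matrix W (d × k) and train only a low-rank update BA, so r·(d + k) parameters are learned instead of d·k. A back-of-the-envelope sketch of the savings (the dimensions below are illustrative, not the actual LLaMA-2 shapes):

```python
def lora_trainable_params(d, k, r):
    """Full fine-tuning trains d*k weights; LoRA trains B (d x r) and A (r x k)."""
    full = d * k
    lora = d * r + r * k
    return full, lora

full, lora = lora_trainable_params(d=4096, k=4096, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# -> full: 16,777,216  lora: 65,536  ratio: 0.3906%
```

This is why LoRA (and its quantized variant QLoRA) makes fine-tuning a 7B model feasible on a single GPU: only a fraction of a percent of the weights per adapted matrix receive gradients.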

LangChain

Build LLM Apps using LangChain. Learn about LangChain's key components such as models, prompts, parsers, memory, chains, and Question-Answering.

  • Introduction to LangChain:
    • Why do we need an orchestration tool for LLM application development?
    • What is LangChain?
    • Different components of LangChain
  • Why are orchestration frameworks needed?
    • Eliminate the need for foundation model retraining
    • Overcoming token limits
    • Connectors for data sources
  • Interface with any LLM using model I/O
    • Model I/O overview
    • Components of model I/O: Language models, chat models, prompts, example selectors, and output parsers
    • Overview of prompts, prompt templates, and example selectors
    • Different types of models: language, chat, and embedding models
    • Structuring language model responses using various types of output parsers
  • Connecting external data with LLM application with retrieval
    • Retrieval overview
    • Why retrieval is needed and how it works with LangChain
    • Components of retrieval: Document loaders, text splitters, vector stores, and retrievers
    • Loading public, private, structured, and unstructured data with document loaders
    • Transforming documents to fewer chunks and extracting metadata using document transformers
    • Embedding and vector stores for converting documents into vectors and for efficient storage and retrieval
    • Optimizing retrieval using different retrieval techniques available in LangChain
  • Creating complex LLM workflows with chains
    • Chains overview
    • Various foundational chain types: LLM, router, sequential, and transformation
    • Summarizing large documents using different document chains like stuff, refine, and map-reduce
  • Retain context and refer to past interactions with the memory component
    • How memory can empower AI applications
    • Different types of memories: simple buffer memory, conversation summarization, and vector-store-backed memory
    • Overcoming token limit by using memory based on summarization of past conversations
    • Utilize vector stores for memory
  • Dynamic decision-making with LLMs using agents
    • Agents overview
    • Components of agents: Tools, toolkits, prompt, and memory
    • Different types of agents: Self-ask with search, ReAct, JSON chat, structured chat
    • Working with agents using LangGraph
  • Monitoring and logging using callbacks
    • Monitoring LLM application using callbacks
    • Understanding how callbacks work with different events
  • Hands-on exercise
    • Interface with any LLM using model I/O
    • Building RAG application with retrieval
    • Creating complex LLM workflows with chains
    • Adding memory to LLM-based application
    • Harnessing dynamic decision-making using agents
    • Supplementary exercises: additional coding exercises on LangChain components (model I/O, memory, chains, and agents)
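LangChain's actual API is covered in the labs, but the core idea of a chain (a prompt template piped into a model, whose output can feed the next step) can be sketched in plain Python with a stubbed model call. Everything below is a toy stand-in, not LangChain code:

```python
class FakeLLM:
    """Stand-in for a real model call; echoes a canned completion."""
    def invoke(self, prompt):
        return f"[model answer to: {prompt!r}]"

class PromptTemplate:
    """Fills named slots in a template string."""
    def __init__(self, template):
        self.template = template
    def format(self, **kwargs):
        return self.template.format(**kwargs)

class SimpleChain:
    """Compose a template and a model: inputs -> prompt -> LLM -> output."""
    def __init__(self, template, llm):
        self.template, self.llm = template, llm
    def run(self, **kwargs):
        return self.llm.invoke(self.template.format(**kwargs))

chain = SimpleChain(PromptTemplate("Summarize in one line: {text}"), FakeLLM())
print(chain.run(text="LangChain provides prompts, models, chains, memory, and agents."))
```

Real chains add output parsers, memory, and retrieval around the same prompt-to-model pipeline; the orchestration framework's job is wiring those pieces together.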

DAY 4 - LangChain for LLM Application Development

Multi-Agent Applications

Use LLMs to make decisions about what to do next. Enable these decisions with tools. In this module, we’ll talk about agents. We’ll learn what they are, how they work, and how to use them within the LangChain library to superpower our LLMs.

  • Agents and Tools
  • Agent Types
    • Conversational agents
    • OpenAI functions agents
    • ReAct agents
    • Plan and execute agents
  • Hands-on Exercise: Create and execute some of the following agents
    • Excel agent
    • JSON agent
    • Python Pandas agent
    • Document comparison agent
    • Power BI agent
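An agent repeatedly asks the model what to do next, runs the chosen tool, and feeds the observation back until the model answers. A toy loop with a scripted stand-in for the model and a single calculator tool (the `Action:`/`Final Answer:` format here is invented for illustration, loosely modeled on ReAct):

```python
def calculator(expression):
    """The one tool this toy agent can call; restricted eval for arithmetic only."""
    return str(eval(expression, {"__builtins__": {}}, {}))

def scripted_model(history):
    """Stand-in for an LLM policy: call the tool once, then finish."""
    if not any(line.startswith("Observation:") for line in history):
        return "Action: calculator[2 + 3 * 4]"
    return "Final Answer: 14"

def run_agent(model, tools, max_steps=5):
    """Loop: ask the model, execute tool calls, append observations."""
    history = []
    for _ in range(max_steps):
        step = model(history)
        history.append(step)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            history.append(f"Observation: {observation}")
    return "gave up"

print(run_agent(scripted_model, {"calculator": calculator}))  # -> 14
```

Replacing `scripted_model` with a real LLM call and adding more tools is essentially what the agent frameworks above do, with sturdier parsing and error handling.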

Advanced RAG

In this module, we'll explore the challenges in developing RAG-based enterprise-level Large Language Model (LLM) applications. We will discuss the following:

  • Basic RAG pipeline and limitations of the naïve approach
  • Indexing
    • Chunking size optimization
    • Embedding Models
  • Querying - Challenges
    • Large Document Slices
    • Query Ambiguity
  • Query - Optimizations
    • Multi-Query Retrieval
    • Multi-Step Retrieval
    • Step-Back Prompting
    • Query Transformations
  • Retrieval - Challenges
    • Inefficient Retrieval of Large Documents
    • Lack of Conversation Context
    • Complex Retrieval from Multiple Sources
  • Retrieval - Optimizations
    • Hybrid Search and Meta-data integration
    • Sentence window retrieval
    • Parent-child chunk retrieval
    • Hierarchical Index Retrieval
    • Hypothetical Document embeddings (HyDE)
  • Generation - Challenges
    • Information Overload
    • Insufficient Context Window
    • Chaotic Contexts
    • Hallucination
    • Inaccurate Responses
  • Generation - Optimizations
    • Information Compression
    • Thread of Thought (ThoT)
    • Generator Fine-tuning
    • Adapter methods
    • Chain of Note (CoN)
    • Expert Prompting
  • Access control and governance
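Sentence-window retrieval, listed above, matches the query against individual sentences but returns each hit together with its neighbors, so the generator sees more context than the match itself. A toy sketch using word overlap in place of embedding similarity (the document and query are invented):

```python
def sentence_window_retrieve(sentences, query, window=1):
    """Score each sentence by word overlap with the query (a stand-in for
    embedding similarity), then return the best hit plus its neighbors."""
    q = set(query.lower().split())
    scores = [len(q & set(s.lower().split())) for s in sentences]
    best = max(range(len(sentences)), key=scores.__getitem__)
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

doc = [
    "The bootcamp runs for five days.",
    "Day three covers fine-tuning and LangChain.",
    "LoRA trains a small low-rank update instead of all weights.",
    "Day five is a hands-on project.",
]
print(sentence_window_retrieve(doc, "How does LoRA work?"))
```

Only the LoRA sentence matched, but the returned passage includes the sentences on either side; parent-child chunk retrieval generalizes the same idea by indexing small chunks while returning their larger parent sections.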

LLM Evaluation

Dive into large language model (LLM) evaluation, examining its importance, common failure modes, benchmark datasets, and key metrics such as BLEU and ROUGE, along with the RAGAS framework. Apply these insights in a hands-on summarization exercise.

  • Introduction to LLM evaluation
    • What is evaluation and why is it important for LLMs?
    • Overview of common mistakes made by LLMs
    • A brief introduction to benchmark datasets and metrics
    • Common LLM evaluation tasks
  • Benchmark datasets
    • Explore datasets for different tasks including natural language understanding, reasoning, knowledge retrieval, etc.
    • Learn about different datasets such as MMLU, HELM, and BBH.
  • Evaluation metrics
    • Explain commonly used automatic metrics (BLEU, ROUGE, BERTScore)
    • Compare strengths and weaknesses of different metrics
    • Discuss the role of human evaluation and techniques (Likert scale)
  • RAGAS
    • Introduction and basic workflow
    • Evaluation metrics
      • Faithfulness
      • Context precision
      • Answer relevancy
      • Context recall
    • Detailed workflow stages
    • Practical Applications
      • Summarization
      • Open-domain QA
      • Fact-checking
  • Hands-on exercise
    • Evaluating LLM summarization using metrics like ROUGE, METEOR, and BERTScore
    • Evaluation using G-Eval
    • Evaluation of an end-to-end RAG pipeline with RAGAS
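ROUGE-1, used in the exercise above, is just unigram overlap between a candidate summary and a reference. A minimal precision/recall/F1 sketch (the sentence pair is a standard toy example, not bootcamp data):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1: clipped unigram overlap. Returns (precision, recall, f1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each match clipped to its ref count
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = rouge_1("the cat sat on the mat", "the cat is on the mat")
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # 5 of 6 unigrams overlap
```

ROUGE-2 and ROUGE-L extend the same counting to bigrams and longest common subsequences; libraries such as `rouge-score` handle stemming and those variants.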

DAY 5 - Project: Build A Custom LLM Application On Your Own Data

LLM Bootcamp Project: Build A Multi-Agent LLM Application

On the last day of the LLM bootcamp, the learners will apply the concepts and techniques learned during the bootcamp to build an LLM application. Learners will choose to implement one of the following:

  • Basic Chatbot: A simple chatbot designed to answer general queries.
  • Chatbot Agent: An advanced agent that integrates with your data to provide more tailored responses.
  • Chat with Your Data: Allows users to upload documents (e.g., PDFs) and interact with the content through queries.

Attendees will receive the following:

  • Comprehensive Datasets: Access a vast collection of documents from a variety of industries to support your project's data needs and ensure robust functionality.
  • Step-by-Step Implementation Guides: Detailed instructions that guide you through each phase of your project, from initial setup to final deployment.
  • Ready-to-Use Code Templates: Utilize code templates available in Data Science Dojo's sandbox environments to streamline the development process and get your application up and running quickly.
  • Cloud-Based Resources: Gain exclusive access to powerful cloud resources, including your own OpenAI key, facilitating the hassle-free deployment of your application on platforms like Streamlit.

At the culmination of the bootcamp, you will have a fully operational LLM application deployed on a public cloud platform, such as Streamlit. This deployment process includes setting up a continuous integration and continuous deployment (CI/CD) pipeline to ensure that your application can be updated and maintained effortlessly. By the end of the bootcamp, you'll be equipped not only with a finished project but also with the knowledge and skills to deploy and scale applications in real-world scenarios.

Course Schedule

Daily schedule: 9 am - 5 pm PT | Breakfast, lunch and beverages | Breakout sessions and in-class activities