For a hands-on learning experience to develop LLM applications, join our LLM Bootcamp today.
First 6 seats get an early bird discount of 30%! So hurry up!

In the rapidly evolving world of artificial intelligence and large language models, developers are constantly seeking ways to create more flexible, powerful, and intuitive AI agents.

While LangChain has been a game-changer in this space, allowing for the creation of complex chains and agents, there’s been a growing need for even more sophisticated control over agent runtimes.

Enter LangGraph, a cutting-edge module built on top of LangChain that’s set to revolutionize how we design and implement AI workflows.

In this blog, we present a detailed LangGraph tutorial on building a chatbot, revolutionizing AI agent workflows.

 

llm bootcamp banner

 

 Understanding LangGraph

LangGraph is an extension of the LangChain ecosystem that introduces a novel approach to creating AI agent runtimes. At its core, LangGraph allows developers to represent complex workflows as cyclical graphs, providing a more intuitive and flexible way to design agent behaviors.

The primary motivation behind LangGraph is to address the limitations of traditional directed acyclic graphs (DAGs) in representing AI workflows. While DAGs are excellent for linear processes, they fall short when it comes to implementing the kind of iterative, decision-based flows that advanced AI agents often require.

 

Explore the difference between LangChain and LlamaIndex

 

LangGraph solves this by enabling the creation of workflows with cycles, where an AI can revisit previous steps, make decisions, and adapt its behavior based on intermediate results. This is particularly useful in scenarios where an agent might need to refine its approach or gather additional information before proceeding.

Key Components of LangGraph

To effectively use LangGraph, it’s crucial to understand its fundamental components:

 

LangChain tutorial

 

Nodes

Nodes in LangGraph represent individual functions or tools that your AI agent can use. These can be anything from API calls to complex reasoning tasks performed by language models. Each node is a discrete step in your workflow that processes input and produces output.

Edges

Edges connect the nodes in your graph, defining the flow of information and control. LangGraph supports two types of edges: 

  • Simple Edges: These are straightforward connections between nodes, indicating that the output of one node should be passed as input to the next. 
  • Conditional Edges: These are more complex connections that allow for dynamic routing based on the output of a node. This is where LangGraph truly shines, enabling adaptive workflows.

 

Read about LangChain agents and their use for time series analysis

 

State

State is the information that can be passed between nodes in a whole graph. If you want to keep track of specific information during the workflow then you can use state. 

There are 2 types of graphs which you can make in LangGraph: 

  • Basic Graph: The basic graph will only pass the output of the first node to the next node because it can’t contain states. 
  • Stateful Graph: This graph can contain a state which will be passed between nodes and you can access this state at any node.

 

How generative AI and LLMs work

 

LangGraph Tutorial Using a  Simple Example: Build a Basic Chatbot

We’ll create a simple chatbot using LangGraph. This chatbot will respond directly to user messages. Though simple, it will illustrate the core concepts of building with LangGraph. By the end of this section, you will have a built rudimentary chatbot.

Start by creating a StateGraph. A StateGraph object defines the structure of our chatbot as a state machine. We’ll add nodes to represent the LLM and functions our chatbot can call and edges to specify how the bot should transition between these functions.

 

Explore this guide to building LLM chatbots

 

 

 

So now our graph knows two things: 

  1. Every node we define will receive the current State as input and return a value that updates that state. 
  2. messages will be appended to the current list, rather than directly overwritten. This is communicated via the prebuilt add_messages function in the Annotated syntax. 

Next, add a chatbot node. Nodes represent units of work. They are typically regular Python functions.

 

 

Notice how the chatbot node function takes the current State as input and returns a dictionary containing an updated messages list under the key “messages”. This is the basic pattern for all LangGraph node functions. 

The add_messages function in our State will append the LLM’s response messages to whatever messages are already in the state. 

Next, add an entry point. This tells our graph where to start its work each time we run it.

 

 

Similarly, set a finish point. This instructs the graph “Any time this node is run, you can exit.”

 

 

Finally, we’ll want to be able to run our graph. To do so, call “compile()” on the graph builder. This creates a “CompiledGraph” we can use invoke on our state.

 

 
You can visualize the graph using the get_graph method and one of the “draw” methods, like draw_ascii or draw_png. The draw methods each require additional dependencies.

 

 

LangGraph - AI agent workflows

 

Now let’s run the chatbot!

Tip: You can exit the chat loop at any time by typing “quit”, “exit”, or “q”.

 

 

Advanced LangGraph Techniques

LangGraph’s true potential is realized when dealing with more complex scenarios. Here are some advanced techniques: 

  1. Multi-step reasoning: Create graphs where the AI can make multiple decisions, backtrack, or explore different paths based on intermediate results.
  2. Tool integration: Seamlessly incorporate various external tools and APIs into your workflow, allowing the AI to gather and process diverse information.
  3. Human-in-the-loop workflows: Design graphs that can pause execution and wait for human input at critical decision points.
  4. Dynamic graph modification: Alter the structure of the graph at runtime based on the AI’s decisions or external factors.

 

Learn how to build custom Q&A chatbots

 

Real-World Applications

LangGraph’s flexibility makes it suitable for a wide range of applications: 

  1. Customer Service Bots: Create intelligent chatbots that can handle complex queries, access multiple knowledge bases, and escalate to human operators when necessary.
  2. Research Assistants: Develop AI agents that can perform literature reviews, synthesize information from multiple sources, and generate comprehensive reports.
  3. Automated Troubleshooting: Build expert systems that can diagnose and solve technical problems by following complex decision trees and accessing various diagnostic tools.
  4. Content Creation Pipelines: Design workflows for AI-assisted content creation, including research, writing, editing, and publishing steps.

 

Explore the list of top AI content generators

 

Conclusion

LangGraph represents a significant leap forward in the design and implementation of AI agent workflows. Enabling cyclical, state-aware graphs, opens up new possibilities for creating more intelligent, adaptive, and powerful AI systems.

As the field of AI continues to evolve, tools like LangGraph will play a crucial role in shaping the next generation of AI applications.

 

Explore a hands-on curriculum that helps you build custom LLM applications!

 

Whether you’re building simple chatbots or complex AI-powered systems, LangGraph provides the flexibility and power to bring your ideas to life. As we continue to explore the potential of this tool, we can expect to see even more innovative and sophisticated AI applications emerging in the near future.

To boost your knowledge further, join our webinar and learn how to Build Smarter Multi-Agent AI Applications with LangGraph.

LangGraph 101 event banner - langgraph tutorial

Transformers have revolutionized natural language processing with their use of self-attention mechanisms. In this post, we will study the key components of transformers to understand how they have become the basis of the state of the art in different tasks.  

 

Introduction: Attention is all you need 

The Transformer architecture was first introduced in the 2017 paper “Attention is All You Need” by researchers at Google. Unlike previous sequence models such as RNNs, Transformer relies entirely on self-attention to model dependencies in sequential data like text.   

 

Large language models knowledge test

 

Remarkably, this simple change led to major improvements in machine translation quality over existing methods. Since then, Transformers have been applied successfully to diverse NLP tasks like text generation, summarization, and question-answering. Their versatility has even led to applications in computer vision 

 

Large language model bootcamp

 

But what exactly is self-attention and why is it so effective? Let’s explore this. 

The limitations of Recurrent Neural Networks – RNNs   

Recurrent neural networks (RNNs) used to be the dominant approach for modeling sequences. An RNN processes textual data incrementally, maintaining a “memory” of the previous context. For example, to predict the next word in a sentence, an RNN model would incorporate information about all the preceding words.  

However, RNNs have certain limitations. They process data sequentially, making parallelization difficult. More critically, they struggle to learn long-range dependencies because the information gets diluted over many steps of time. Attention mechanisms were proposed to mitigate this issue. 

Learn to build LLM applications

Why use a transformer model?  

The transformer architecture has enabled the development of new models that can be trained on large datasets and significantly outperform recurrent neural networks like LSTMs. These new models are utilized for tasks like sequence classification, question answering, language modeling, named entity recognition, summarization, and translation.  

Let’s examine the key components of transformers to understand how they have become the foundation for state-of-the-art performance on different NLP tasks.  

Transformer design  

A transformer consists of an encoder and a decoder. The encoder’s role is to encode the inputs (i.e. sentences) into a state, often containing multiple tensors. This state is then passed to the decoder to generate the outputs.

In machine translation, the encoder converts a source sentence, e.g. “Hello world“, into a state, such as a vector, that captures its semantic meaning.

The decoder then utilizes this state to produce the translated target sentence, e.g. “Bonjour le monde.” Both the encoder and decoder primarily employ Multi-Head Attention and Feedforward Networks, which are the focus of this article.   

 

Transformer model architecture

Key transformer components  

1. Input embedding  

Embedding aims to create a vector representation of words where words with similar meanings will be close in terms of Euclidean distance. For instance, the words “bathroom” and “shower” are related to the same concept, so their word vectors are close in Euclidean space as they convey similar meanings.  

For the encoder, the authors opted for an embedding size of 512 (i.e. each word is represented by a 512-dimensional vector).  

  

  Input embedding

 

2. Positional encoding  

The position of a word plays a crucial role in understanding the sequence we want to model.  

Therefore, we add positional information about the word’s location in the sequence to its vector. The authors used the following sinusoidal. 

Position encoding   

 

We will explain positional encoding in more detail with an example.  

  Position encoding example

  

We note the position of each word in the sequence.  

We define dmodel = 512, which represents the size of the embedding vector of each word (i.e. the vector dimension). We can now rewrite the two positional encoding equations as:  

 

two positional encoding equations

 


We can see that the wavelength (i.e. frequency) λt decreases as the dimension increases, this forms a progression along the wave from 2pi to 10000.2pi.  

  

  wavelength

 

In this model, the absolute positional information of a word in a sequence is added directly to its initial vector. For this, the positional encoding must have the same size dmodel as the initial word vector.  


3.
Attention mechanism  

  • Scaled Dot-Product Attention  

  Scaled Dot-Product Attention

  

Let’s explain the attention mechanism. The key goal of attention is to estimate the relative relevance of the keywords compared to the query word for the same entity. For this, the attention mechanism takes a query vector Q representing a word, the keys K comprising all other words in the sentence, and values V representing the word vectors.  

In our case, V = Q (for the two self-attention layers). In other words, the attention mechanism provides the significance of a word in a given sentence.  

 

attention mechanism

  

When we compute the normalized dot product between the query and the keys, we get a tensor that represents the relative importance of each other word for the query. To go deeper into mathematics, we can try to understand why the authors used a dot product to calculate the relation between two words.  

 

Get Started with Generative AI                                    

 

A word is represented by a vector in Euclidian space, in this case, a vector of size 512.   

When computing the dot product between Q and KT, we calculate the product between Q’s orthogonal projection onto K. In other words, we estimate the alignment between the query and keyword vectors, returning a weight for each word in the sentence.  

We then normalize by dk to counteract large Q and K magnitudes which can push the softmax function into regions with tiny gradients. The softmax function regularizes the terms and rescales them between 0 and 1 (i.e., converts the dot product to a probability distribution), with the goal of normalizing all weights between 0 and 1.  

  softmax function

Finally, we multiply the weights (i.e., importance) by the values V to reduce irrelevant words and focus on the most significant words.  

    

Attention mechanism (2)

 

 

  • Multi-Head Attention  

  Multi-Head Attention

 

The key idea is that attention is applied multiple times in parallel on different projections of the input queries, keys, and values. This allows the model to learn different types of dependencies between the input words.  

  

The input queries (Q), keys (K), and values (V) are each linearly projected h times into smaller subspaces. For example, h=8 times into 64-dimensional spaces.  

Attention is then applied in each of these h projected subspaces in parallel, yielding h different attention outputs.  

 

Attention mechanism and transformers - LLM
Attention mechanism and transformers – LLM Bootcamp Data Science Dojo

 

These h outputs are concatenated and linearly projected again to get the final values. The projections allow the model to focus on different positional and semantic relationships between words since each projected subspace captures different information.  

Doing this in parallel (multi-head) instead of sequentially improves efficiency.  

The projection matrices are learned during training to discover the most useful projections. So, in summary, multi-head attention applies the attention mechanism in multiple parallel subspaces to learn different types of dependencies between words in an efficient way.  

  

Let’s dive into the mechanics of encoder-decoder architecture.  

Transformer model architecture   

 

In this section, we’ll explain how the encoder and decoder work together to translate an English sentence into a French one, step by step.  

1. Encoder  

Encoder

  • Convert a sequence of tokens to a sequence of vectors by using embeddings.    

Positional encoding

 

 

  • Add position information in each word vector.  

 

The key advantage of recurrent neural networks is their knack for understanding relationships between sequences and remembering information. On the other hand, Transformers employ positional encoding to factor in where words are in a sequence.  

  • Apply Multi-Head Attention  

Apply Multi-Head Attention   

  • Use Feed Forward Network  

 

2. Decoder  

  • Utilize embeddings to transform a French sentence into vectors.   

  decoder French

 

  • Add positional details within each word vector.    

Positional encoding French   

  • Apply Multi-Head Attention  

  Apply Multi-Head Attention French

 

  • Apply Feed Forward Network  

 

  • Apply Multi-Head Attention to the encoder output.  

Multi-Head Attention - encoder output
 

We can observe that the Transformer combines the encoder’s output with the decoder’s input. This enables it to discern the relationship between the vectors that encode the English and French sentences.  

  • Apply the Feed Forward Network again.  
  • Compute the probability for the next word by using linear + SoftMax block. The decoder returns the highest probability as the next word at the output.  

  Linear and SoftMax block

In our case, the next word after “Je” is “suis” 

 

Final thoughts 

The transformer model outperforms all the models on different benchmarks also there was no difference seen between the translation provided by the algorithm and by humans.   

Transformers are a major advance in NLP, they exceed RNN by having a lower training cost allowing to train models on larger corpora. Even today, transformers remain the basis of state-of-the-art models such as BERT, Roberta, XLNET, and GPT.  

 

 

References: 

https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf 

https://github.com/hkproj/transformer-from-scratch-notes 

http://jalammar.github.io/illustrated-transformer/ 

Large language models (LLMs) like GPT-3 and GPT-4. revolutionized the landscape of NLP. These models have laid a strong foundation for creating powerful, scalable applications. However, the potential of these models isaffected by the quality of the prompt. This highlights the importance of prompt engineering.

Furthermore, real-world NLP applications often require more complexity than a single ChatGPT session can provide. This is where LangChain comes into play! 

 

 

Get more information on Large Language models and its applications and tools by clicking below:


Large language model bootcamp

 

Harrison Chase’s brainchild, LangChain, is a Python library designed to help you leverage the power of LLMs to build custom NLP applications. As of May 2023, this game-changing library has already garnered almost 40,000 stars on GitHub. 

LangChain

 

Interested in learning about Large Language Models and building custom ChatGPT like applications for your business? Click below

Learn More                  

 

This comprehensive beginner’s guide provides a thorough introduction to LangChain, offering a detailed exploration of its core features. It walks you through the process of building a basic application using LangChain and shares valuable tips and industry best practices to make the most of this powerful framework. Whether you’re new to Language Learning Models (LLMs) or looking for a more efficient way to develop language generation applications, this guide serves as a valuable resource to help you leverage the capabilities of LLMs with LangChain. 

Overview of LangChain modules 

These modules are essential for any application using the Language Model (LLM).

 

LangChain offers standardized and adaptable interfaces for each module. Additionally, LangChain provides external integrations and even ready-made implementations for seamless usage. Let’s delve deeper into these modules. 

Overview of LangChain Modules
Overview of LangChain Modules

LLM

LLM is the fundamental component of LangChain. It is essentially a wrapper around a large language model that helps use the functionality and capability of a specific large language model. 

Chains

As stated earlier, LLM (Language Model) serves as the fundamental unit within LangChain. However, in line with the “LangChain” concept, it offers the ability to link together multiple LLM calls to address specific objectives. 

For instance, you may have a need to retrieve data from a specific URL, summarize the retrieved text, and utilize the resulting summary to answer questions. 

On the other hand, chains can also be simpler in nature. For instance, you might want to gather user input, construct a prompt using that input, and generate a response based on the constructed prompt. 

 

Large language model bootcamp

 

Prompts 

Prompts have become a popular modeling approach in programming. It simplifies prompt creation and management with specialized classes and functions, including the essential PromptTemplate. 

 

Document loaders and Utils 

LangChain’s Document Loaders and Utils modules simplify data access and computation. Document loaders convert diverse data sources into text for processing, while the utils module offers interactive system sessions and code snippets for mathematical computations. 

Vector stores 

The widely used index type involves generating numerical embeddings for each document using an embedding model. These embeddings, along with the associated documents, are stored in a vector store. This vector store enables efficient retrieval of relevant documents based on their embeddings. 

Agents

LangChain offers a flexible approach for tasks where the sequence of language model calls is not deterministic. Its “Agents” can act based on user input and previous responses. The library also integrates with vector databases and has memory capabilities to retain the state between calls, enabling more advanced interactions. 

 

Building our App 

Now that we’ve gained an understanding of LangChain, let’s build a PDF Q/A Bot app using LangChain and OpenAI. Let me first show you the architecture diagram for our app and then we will start with our app creation. 

 

QA Chatbot Architecture
QA Chatbot Architecture

 

Below is an example code that demonstrates the architecture of a PDF Q&A chatbot. This code utilizes the OpenAI language model for natural language processing, the FAISS database for efficient similarity search, PyPDF2 for reading PDF files, and Streamlit for creating a web application interface.

 

The chatbot leverages LangChain’s Conversational Retrieval Chain to find the most relevant answer from a document based on the user’s question. This integrated setup enables an interactive and accurate question-answering experience for the users. 

Importing necessary libraries 

Import Statements: These lines import the necessary libraries and functions required to run the application. 

  • PyPDF2: Python library used to read and manipulate PDF files. 
  • langchain: a framework for developing applications powered by language models. 
  • streamlit: A Python library used to create web applications quickly. 
Importing necessary libraries
Importing necessary libraries

If the LangChain and OpenAI are not installed already, you first need to run the following commands in the terminal. 

Install LangChain

 

Setting openAI API key 

You will replace the placeholder with your OpenAI API key which you can access from OpenAI API. The above line sets the OpenAI API key, which you need to use OpenAI’s language models. 

Setting OpenAI API Key

Streamlit UI 

These lines of code create the web interface using Streamlit. The user is prompted to upload a PDF file.

Streamlit UI
Streamlit UI

Reading the PDF file 

If a file has been uploaded, this block reads the PDF file, extracts the text from each page, and concatenates it into a single string. 

Reading the PDF File
Reading the PDF File

Text splitting 

Language Models are often limited by the amount of text that you can pass to them. Therefore, it is necessary to split them up into smaller chunks. It provides several utilities for doing so. 

Text Splitting 
Text Splitting

Using a Text Splitter can also help improve the results from vector store searches, as eg. smaller chunks may sometimes be more likely to match a query. Here we are splitting the text into 1k tokens with 200 tokens overlap. 

Embeddings 

Here, the OpenAIEmbeddings function is used to download embeddings, which are vector representations of the text data. These embeddings are then used with FAISS to create an efficient search index from the chunks of text.  

Embeddings
Embeddings

Creating conversational retrieval chain 

The chains developed are modular components that can be easily reused and connected. They consist of predefined sequences of actions encapsulated in a single line of code. With these chains, there’s no need to explicitly call the GPT model or define prompt properties. This specific chain allows you to engage in conversation while referencing documents and retains a history of interactions. 

Creating Conversational Retrieval Chain
Creating Conversational Retrieval Chain

Streamlit for generating responses and displaying in the App 

This block prepares a response that includes the generated answer and the source documents and displays it on the web interface. 

Streamlit for Generating Responses and Displaying in the App
Streamlit for Generating Responses and Displaying in the App

Let’s run our App 

QA Chatbot
QA Chatbot

Here we uploaded a PDF, asked a question, and got our required answer with the source document. See, that is how the magic of LangChain works.  

You can find the code for this app on my GitHub repository LangChain-Custom-PDF-Chatbot.

Build your own conversational AI applications 

Concluding the journey! Mastering LangChain for creating a basic Q&A application has been a success. I trust you have acquired a fundamental comprehension of LangChain’s potential. Now, take the initiative to delve into LangChain further and construct even more captivating applications. Enjoy the coding adventure.

 

Learn to build custom large language model applications today!