Large language models (LLMs) have taken the world by storm with their ability to understand and generate human-like text. These AI marvels can analyze massive amounts of data, answer your questions in comprehensive detail, and even create different creative text formats, like poems, code, scripts, musical pieces, emails, letters, etc.

It’s like having a conversation with a computer that feels almost like talking to a real person!

However, LLMs on their own exist within a self-contained world of text. They can’t directly interact with external systems or perform actions in the real world. This is where LLM agents come in and play a transformative role.


LLM agents act as powerful intermediaries, bridging the gap between the LLM’s internal world and the vast external world of data and applications. They essentially empower LLMs to become more versatile and take action on their behalf. Think of an LLM agent as a personal assistant for your LLM, fetching information and completing tasks based on your instructions.

For instance, you might ask an LLM, “What are the next available flights to New York from Toronto?” The LLM can understand and process the question but cannot search the web directly; it relies on its training data.

An LLM agent can step in, retrieve the data from a website, and provide the available list of flights to the LLM. The LLM can then present you with the answer in a clear and concise way.


Role of LLM agents at a glance


By combining LLMs with agents, we unlock a new level of capability and versatility. In the following sections, we’ll dive deeper into the benefits of using LLM agents and explore how they are revolutionizing various applications.

Benefits and Use-cases of LLM Agents

Let’s explore in detail the transformative benefits of LLM agents and how they empower LLMs to become even more powerful.

Enhanced Functionality: Beyond Text Processing

LLMs excel at understanding and manipulating text, but they lack the ability to directly access and interact with external systems. An LLM agent bridges this gap by allowing the LLM to leverage external tools and data sources.

Imagine you ask an LLM, “What is the weather forecast for Seattle this weekend?” The LLM can understand the question but cannot directly access weather data. An LLM agent can step in, retrieve the forecast from a weather API, and provide the LLM with the information it needs to respond accurately.

This empowers LLMs to perform tasks that were previously impossible, like: 

  • Accessing and processing data from databases and APIs 
  • Executing code 
  • Interacting with web services 

Increased Versatility: A Wider Range of Applications

By unlocking the ability to interact with the external world, LLM agents significantly expand the range of applications for LLMs. Here are just a few examples: 

  • Data Analysis and Processing: LLMs can be used to analyze data from various sources, such as financial reports, social media posts, and scientific papers. LLM agents can help them extract key insights, identify trends, and answer complex questions. 
  • Content Generation and Automation: LLMs can be empowered to create different kinds of content, like articles, social media posts, or marketing copy. LLM agents can assist them by searching for relevant information, gathering data, and ensuring factual accuracy. 
  • Custom Tools and Applications: Developers can leverage LLM agents to build custom tools that combine the power of LLMs with external functionalities. Imagine a tool that allows an LLM to write and execute Python code, search for information online, and generate creative text formats based on user input. 


Improved Performance: Context and Information for Better Answers

LLM agents don’t just expand what LLMs can do; they also improve how they do it. By giving LLMs access to relevant context and information, LLM agents can significantly enhance the quality of their responses: 

  • More Accurate Responses: When an LLM agent retrieves data from external sources, the LLM can generate more accurate and informative answers to user queries. 
  • Enhanced Reasoning: LLM agents can facilitate a back-and-forth exchange between the LLM and external systems, allowing the LLM to reason through problems and arrive at well-supported conclusions. 
  • Reduced Bias: By incorporating information from diverse sources, LLM agents can mitigate potential biases present in the LLM’s training data, leading to fairer and more objective responses. 

Enhanced Efficiency: Automating Tasks and Saving Time

LLM agents can automate repetitive tasks that would otherwise require human intervention. This frees up human experts to focus on more complex problems and strategic initiatives. Here are some examples: 

  • Data Extraction and Summarization: LLM agents can automatically extract relevant data from documents and reports, saving users time and effort. 
  • Research and Information Gathering: LLM agents can be used to search for information online, compile relevant data points, and present them to the LLM for analysis. 
  • Content Creation Workflows: LLM agents can streamline content creation workflows by automating tasks like data gathering, formatting, and initial drafts. 

In conclusion, LLM agents are a game-changer, transforming LLMs from powerful text processors to versatile tools that can interact with the real world. By unlocking enhanced functionality, increased versatility, improved performance, and enhanced efficiency, LLM agents pave the way for a new wave of innovative applications across various domains.

In the next section, we’ll explore how LangChain, a framework for building LLM applications, can be used to implement LLM agents and unlock their full potential.


Overview of an autonomous LLM agent system


Implementing LLM Agents with LangChain 

Now, let’s explore how LangChain, a framework specifically designed for building LLM applications, empowers us to implement LLM agents. 

What is LangChain?

LangChain is a powerful toolkit that simplifies the process of building and deploying LLM applications. It provides a structured environment where you can connect your LLM with various tools and functionalities, enabling it to perform actions beyond basic text processing. Think of LangChain as a Lego set for building intelligent applications powered by LLMs.



Implementing LLM Agents with LangChain: A Step-by-Step Guide

Let’s break down the process of implementing LLM agents with LangChain into manageable steps: 

Setting Up the Base LLM

The foundation of your LLM agent is the LLM itself. You can choose either an open-source model like Llama 2 or Mixtral, or a proprietary model from a provider such as OpenAI (GPT) or Cohere. 

Defining the Tools

Identify the external functionalities your LLM agent will need. These tools could be: 

  • APIs: Services that provide programmatic access to data or functionalities (e.g., weather API, stock market API) 
  • Databases: Collections of structured data your LLM can access and query (e.g., customer database, product database) 
  • Web Search Tools: Tools that allow your LLM to search the web for relevant information (e.g., DuckDuckGo, Serper API) 
  • Coding Tools: Tools that allow your LLM to write and execute actual code (e.g., Python REPL Tool)


Defining the tools of an AI-powered LLM agent


LangChain’s documentation provides a comprehensive list of tools and toolkits that you can integrate into your agent. You can also define your own custom tool, such as a calculator tool.
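To make the idea of a custom tool concrete, here is a minimal, framework-independent sketch of a calculator tool. The `SimpleTool` class and the `calculate` helper are hypothetical names chosen for illustration; they are not LangChain's API, only a stand-in for the general shape of a tool (a name, a description the LLM can read, and a function to call).

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable

# Arithmetic operators the calculator is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def _evaluate(node: ast.AST):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.operand))
    # Anything else (function calls, names, attribute access) is rejected.
    raise ValueError("unsupported expression")


def calculate(expression: str) -> str:
    """Evaluate a plain arithmetic expression and return the result as text."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a framework tool: name, description, function."""
    name: str
    description: str
    func: Callable[[str], str]


calculator_tool = SimpleTool(
    name="calculator",
    description="Evaluates arithmetic expressions such as '2 * (3 + 4)'.",
    func=calculate,
)
```

The expression is parsed rather than passed to `eval`, so the tool only ever performs arithmetic, which matters when the input comes from a model rather than a trusted user.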

Creating an Agent

This is the brain of your LLM agent, responsible for communication and coordination. The agent understands the user’s needs, selects the appropriate tool based on the task, and interprets the retrieved information for response generation. 

Defining the Interaction Flow

Establish a clear sequence for how the LLM, agent, and tools interact. This flow typically involves: 

  • The agent receives a user query 
  • The agent analyzes the query and identifies the necessary tools 
  • The agent passes the relevant parameters to the chosen tool(s) 
  • The LLM processes the retrieved information from the tools
  • The agent formulates a response based on the retrieved information 
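The flow above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the two tools return canned strings instead of calling real services, and `choose_tool` uses keyword matching as a stand-in for the routing decision a real agent would delegate to the LLM.

```python
from typing import Callable, Dict, Tuple


def lookup_weather(city: str) -> str:
    """Hypothetical tool: returns canned data instead of calling a weather API."""
    return f"Sunny in {city}"


def search_flights(route: str) -> str:
    """Hypothetical tool: returns canned data instead of searching the web."""
    return f"2 flights found for {route}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "weather": lookup_weather,
    "flights": search_flights,
}


def choose_tool(query: str) -> Tuple[str, str]:
    """Stand-in for the LLM's routing decision: pick a tool and its input.

    A real agent would ask the LLM to make this choice; keyword matching
    is used here only to keep the sketch runnable.
    """
    if "weather" in query.lower():
        return "weather", "Seattle"
    return "flights", "Toronto to New York"


def run_agent(query: str) -> str:
    # Receive the query, analyze it, and identify the tool to use.
    tool_name, tool_input = choose_tool(query)
    # Pass the parameters to the chosen tool and collect its output.
    observation = TOOLS[tool_name](tool_input)
    # A real agent would hand the observation back to the LLM to phrase
    # the final answer; here it is formatted directly.
    return f"[{tool_name}] {observation}"
```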

Integration with LangChain

LangChain provides the platform for connecting all the components. You’ll integrate your LLM and chosen tools within LangChain, creating an agent that can interact with the external environment. 

Testing and Refining

Once everything is set up, it’s time to test your LLM agent! Put it through various scenarios to ensure it functions as expected. Based on the results, refine the agent’s logic and interactions to improve its accuracy and performance. 

By following these steps and leveraging LangChain’s capabilities, you can build versatile LLM agents that unlock the true potential of LLMs.


LangChain Implementation of an LLM Agent with tools

In the next section, we’ll delve into a practical example, walking you through a Python Notebook that implements a LangChain-based LLM agent with retrieval (RAG) and web search tools. OpenAI’s GPT-4 has been used as the LLM of choice here. This will provide you with a hands-on understanding of the concepts discussed here. 

The agent has been equipped with two tools: 

  1. A retrieval tool that can be used to fetch information from a vector store of Data Science Dojo blogs on the topic of RAG. LangChain’s PyPDFLoader is used to load and chunk the PDF blog text, OpenAI embeddings are used to embed the chunks of data, and the Weaviate client is used for indexing and storage of data. 
  2. A web search tool that can be used to query the web and bring up-to-date and relevant search results based on the user’s question. Google Serper API is used here as the search wrapper; you can also use DuckDuckGo search or the Tavily API. 

Below is a diagram depicting the agent flow:


LangChain implementation of an LLM agent with tools


Let’s now start going through the code step-by-step. 

Installing Libraries

Let’s start by downloading all the necessary libraries that we’ll need. This includes libraries for handling language models, API clients, and document processing.


Importing and Setting API Keys

Now, we’ll ensure our environment has access to the necessary API keys for OpenAI and Serper by importing them and setting them as environment variables. 


Documents Preprocessing: Mounting Google Drive and Loading Documents

Let’s connect to Google Drive and load the relevant documents. I’ve stored PDFs of various Data Science Dojo blogs related to RAG, which we’ll use for our tool. The following are the links to the blogs I used: 



Extracting Text from PDFs

Using the PyPDFLoader from LangChain, we’ll extract text from each PDF by breaking it down into individual pages. This lets us process and index the pages separately. 


Embedding and Indexing through Weaviate: Embedding Text Chunks

Now we’ll use the Weaviate client to turn our text chunks into embeddings using OpenAI’s embedding model. This prepares our text for efficient querying and retrieval.


Setting Up the Retriever

With our documents embedded, let’s set up the retriever which will be crucial for fetching relevant information based on user queries.


Defining Tools: Retrieval and Search Tools Setup

Next, we define two key tools: one for retrieving information from our indexed blogs, and another for performing web searches for queries that extend beyond our local data.


Adding Tools to the List

We then add both tools to our tool list, ensuring our agent can access these during its operations.


Setting up the Agent: Creating the Prompt Template

Let’s create a prompt template that guides our agent on how to handle different types of queries using the tools we’ve set up. 


Initializing the LLM with GPT-4

For the best performance, I used GPT-4 as the LLM of choice as GPT-3.5 seemed to struggle with routing to tools correctly and would go back and forth between the two tools needlessly.


Creating and Configuring the Agent

With the tools and prompt template ready, let’s construct the agent. This agent will use our predefined LLM and tools to handle user queries.



Invoking the Agent: Agent Response to a RAG-related Query

Let’s put our agent to the test by asking a question about RAG and observing how it uses the tools to generate an answer.


Agent Response to an Unrelated Query

Now, let’s see how our agent handles a question that’s not about RAG. This will demonstrate the utility of our web search tool.



That’s all for the implementation of an LLM Agent through LangChain. You can find the full code here.


This is, of course, a very basic use case, but it is a starting point. There is a great deal you can do with agents, and LangChain has several cookbooks that you can check out. The best way to get acquainted with any technology is to get your hands dirty and actually use it.

I’d encourage you to look up further tutorials and notebooks using agents and try building something yourself. Why not try delegating a task that you find irksome? Perhaps an agent can take that burden off your shoulders!

LLM agents: A building block for LLM applications

To sum it up, LLM agents are a crucial element for building LLM applications. As you navigate through the process, make sure to consider the role and assistance they have to offer.


April 29, 2024

Large language models (LLMs), such as OpenAI’s GPT-4, are swiftly evolving from mere text generators into autonomous, goal-oriented entities displaying intricate reasoning abilities. This crucial shift carries the potential to revolutionize the manner in which humans connect with AI, ushering us into a new frontier.

This blog breaks down how these agents work and illustrates the role they play in LangChain. 


Working of the agents 

Our exploration into the realm of LLM agents begins with understanding the key elements of their structure, namely the LLM core, the Prompt Recipe, the Interface and Interaction, and Memory. The LLM core forms the fundamental scaffold of an LLM agent. It is a neural network trained on a large dataset, serving as the primary source of the agent’s abilities in text comprehension and generation. 

The functionality of these agents heavily relies on prompt engineering. Prompt recipes are carefully crafted sets of instructions that shape the agent’s behaviors, knowledge, goals, and persona and embed them in prompts. 


langchain agents



The agent’s interaction with the outer world is dictated by its user interface, which could vary from command-line, graphical, to conversational interfaces. In the case of fully autonomous agents, prompts are programmatically received from other systems or agents.

Another crucial aspect of their structure is the inclusion of memory, which can be categorized into short-term and long-term. While the former helps the agent be aware of recent actions and conversation histories, the latter works in conjunction with an external database to recall information from the past. 
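The distinction between the two kinds of memory can be illustrated with a small sketch. The `AgentMemory` class below is hypothetical: a bounded buffer plays the role of short-term memory, and an in-process dictionary stands in for the external database that would back long-term memory in a real agent.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class AgentMemory:
    """Hypothetical memory: a bounded recent-turns buffer plus a keyed long-term store."""

    def __init__(self, short_term_size: int = 3) -> None:
        # Short-term memory: only the most recent turns are kept.
        self.short_term: Deque[Tuple[str, str]] = deque(maxlen=short_term_size)
        # Long-term memory: a dict stands in for an external database.
        self.long_term: Dict[str, str] = {}

    def add_turn(self, role: str, text: str) -> None:
        """Record a conversation turn; the oldest turn is dropped when full."""
        self.short_term.append((role, text))

    def remember(self, key: str, fact: str) -> None:
        """Persist a fact so it survives beyond the recent-turns window."""
        self.long_term[key] = fact

    def recall(self, key: str) -> str:
        """Look up a stored fact, returning an empty string if it is unknown."""
        return self.long_term.get(key, "")

    def recent_turns(self) -> List[Tuple[str, str]]:
        return list(self.short_term)
```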


Ingredients involved in agent creation 

Creating robust and capable LLM agents demands integrating the core LLM with additional components for knowledge, memory, interfaces, and tools.



The LLM forms the foundation, while three key elements are required to allow these agents to understand instructions, demonstrate essential skills, and collaborate with humans: the underlying LLM architecture itself, effective prompt engineering, and the agent’s interface. 



Tools are functions that an agent can invoke. There are two important design considerations around tools: 

  • Giving the agent access to the right tools 
  • Describing the tools in a way that is most helpful to the agent 

Without thinking through both, you won’t be able to build a working agent. If you don’t give the agent access to a correct set of tools, it will never be able to accomplish the objectives you give it. If you don’t describe the tools well, the agent won’t know how to use them properly. Some of the vital tools a working agent needs are:


  1. SerpAPI: LangChain provides a wrapper around the SerpAPI search APIs. Installation and setup take two steps:
  • Install the requirements with pip install google-search-results 
  • Get a SerpAPI API key and set it as an environment variable (SERPAPI_API_KEY) 

You can also easily load this wrapper as a tool (to use with an agent). You can do this with:



  2. Math tool: The llm-math tool wraps an LLM to do math operations. It can be loaded into the agent tools like: 

  3. Python-REPL tool: Allows agents to execute Python code. To load this tool, you can use: 


Working of agents in LangChain: Exploring the dynamics





The Python REPL action allows the agent to execute the input code and return the response. 


The impact of agents: 

A noteworthy advantage of LLM agents is their potential to exhibit self-initiated behaviors ranging from purely reactive to highly proactive. This can be harnessed to create versatile AI partners capable of comprehending natural language prompts and collaborating with human oversight. 


LLM agents leverage LLMs’ innate linguistic abilities to understand instructions, context, and goals. They operate autonomously or semi-autonomously based on human prompts, and they harness a suite of tools such as calculators, APIs, and search engines to complete assigned tasks, making logical connections to work towards conclusions and solutions to problems. Here are a few of the services in which LangChain agents are widely used:





Facilitating language services 

Agents play a critical role in delivering language services such as translation, interpretation, and linguistic analysis. Ultimately, this process steers the actions of the agent through the encoding of personas, instructions, and permissions within meticulously constructed prompts.

Users effectively steer the agent by offering interactive cues following the AI’s responses. Thoughtfully designed prompts facilitate a smooth collaboration between humans and AI. Their expertise ensures accurate and efficient communication across diverse languages. 



Quality assurance and validation 

Ensuring the accuracy and quality of language-related services is a core responsibility. Agents verify translations, validate linguistic data, and maintain high standards to meet user expectations. Agents can manage relatively self-contained workflows with human oversight.

Agents use internal validation to verify the accuracy and coherence of their generated content. They also undergo rigorous testing against various datasets and scenarios. These tests validate the agent’s ability to comprehend queries, generate accurate responses, and handle diverse inputs. 


Types of agents 

Agents use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning a response to the user. Here are the agents available in LangChain.  

Zero-Shot ReAct: This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool. Below is how we can set up this Agent: 
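As a framework-independent illustration of description-based tool selection, the sketch below builds a prompt from tool descriptions and parses the model's chosen action from its reply. The prompt wording, the `Action:` / `Action Input:` reply format, and the function names are assumptions made for this example; they are not LangChain's internal prompt or parser.

```python
import re
from typing import Dict, Optional, Tuple

# Each tool is known to the model only through its description.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "search": "Look up current information on the web.",
    "calculator": "Evaluate arithmetic expressions.",
}


def build_prompt(question: str, tools: Dict[str, str]) -> str:
    """List every tool with its description so the model can pick one."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    return (
        "Answer the question using the tools below.\n"
        f"{tool_lines}\n"
        "Reply with:\nAction: <tool name>\nAction Input: <input>\n"
        f"Question: {question}"
    )


def parse_action(llm_output: str) -> Optional[Tuple[str, str]]:
    """Extract (tool name, tool input) from the reply, or None if absent."""
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

Because the model sees nothing but the description text, a vague or missing description leaves it with no basis for choosing that tool, which is why this agent type requires one for every tool.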




Let’s invoke this agent and check that it works: 




This will invoke the agent. 

Structured-Input ReAct: The structured tool chat agent is capable of using multi-input tools. Older agents are configured to specify an action input as a single string, but this agent can use a tool’s argument schema to create a structured action input. This is useful for more complex tool usage, like precisely navigating around a browser. Here is how one can set up the structured-input ReAct agent:
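To show what a structured action input buys you, independent of any framework, the sketch below binds a dictionary of arguments to a multi-input tool's signature before calling it. The `click_element` tool and the `call_with_structured_input` helper are hypothetical; a real agent would derive the schema from the tool definition and let the LLM produce the dictionary.

```python
import inspect
from typing import Any, Callable, Dict


def click_element(page_url: str, selector: str, double_click: bool = False) -> str:
    """Hypothetical multi-input browser tool; it only describes the action."""
    clicks = "double-click" if double_click else "click"
    return f"{clicks} {selector} on {page_url}"


def call_with_structured_input(tool: Callable[..., str], action_input: Dict[str, Any]) -> str:
    """Bind a dict of arguments to the tool's signature before calling it."""
    signature = inspect.signature(tool)
    # bind() raises TypeError if a required argument is missing or unknown.
    bound = signature.bind(**action_input)
    bound.apply_defaults()
    return tool(*bound.args, **bound.kwargs)
```

A single-string action input would force the tool to parse its own arguments out of free text; with a schema, malformed inputs are rejected before the tool runs.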




The further necessary imports required are:




Setting up parameters:



Creating the agent:




Improving performance of an agent 

Enhancing the capabilities of LLM agents necessitates a multi-faceted approach. Firstly, it is essential to keep refining the art and science of prompt engineering, which is a key component in directing these systems securely and efficiently. As prompt engineering improves, so do the competencies of LLM agents, allowing them to venture into new spheres of AI assistance.

Secondly, integrating additional components can expand agents’ reasoning and expertise. These components include knowledge banks for updating domain-specific vocabularies, lookup tools for data gathering, and memory enhancement for retaining interactions.

Thus, increasing the autonomous capabilities of agents requires more than just improved prompts; they also need access to knowledge bases, memory, and reasoning tools.

Lastly, it is vital to maintain a clear iterative prompt cycle, which is key to facilitating natural conversations between users and LLM agents. Repeated cycling allows the LLM agent to converge on solutions, reveal deeper insights, and maintain topic focus within an ongoing conversation. 



The advent of large language model agents marks a turning point in the AI domain. With increasing advances in the field, these agents are strengthening their footing as autonomous, proactive entities capable of reasoning and executing tasks effectively.

The application and impact of Large Language Model agents are vast and game-changing, from conversational chatbots to workflow automation. The potential challenges or obstacles include ensuring the consistency and relevance of the information the agent processes, and the caution with which personal or sensitive data should be treated. The promising future outlook of these agents is the potentially increased level of automated and efficient interaction humans can have with AI. 

December 20, 2023

In this blog, we are enhancing our Large Language Model (LLM) experience by adopting the Retrieval-Augmented Generation (RAG) approach!

We’ll explore the fundamental architecture of RAG conceptually and delve deeper by implementing it through the LangChain orchestration framework and leveraging an open-source model from Hugging Face for both question answering and text embedding. 

So, let’s get started! 

Common hallucinations in large language models  

The most common problem faced by state-of-the-art LLMs is that they produce inaccurate or hallucinated responses. This mostly occurs when prompted with information not present in their training set, despite being trained on extensive data.


This discrepancy between the general knowledge embedded in the LLM’s weights and newer information can be bridged using RAG. The solution provided by RAG eliminates the need for computationally intensive and expertise-dependent fine-tuning, offering a more flexible approach to adapting to evolving information.


Read more about: AI hallucinations and risks associated with large language models




AI hallucinations

What is RAG? 

Retrieval-Augmented Generation involves enhancing the output of Large Language Models (LLMs) by providing them with additional information from an external knowledge source.


This method aims to improve the accuracy and contextuality of LLM-generated responses while minimizing factual inaccuracies. RAG empowers language models to sidestep the need for retraining, facilitating access to the most up-to-date information to produce trustworthy outputs through retrieval-based generation. 

Architecture of RAG approach

Retrieval augmented generation (RAG) - Elevate your large language models experience

Figure from LangChain documentation

Prerequisites for code implementation 

  1. Hugging Face account and Llama 2 model access:
  • Create a Hugging Face account (free sign-up available) to access open-source Llama 2 and embedding models. 
  • Request access to Llama 2 models using this form (access is typically granted within a few hours). 
  • After gaining access to Llama 2 models, please proceed to the provided link, select the checkbox to indicate your agreement to the information, and then click ‘Submit’. 

2. Google Colab account:

  • Create a Google account if you don’t already have one. 
  • Use Google Colab for code execution. 

3. Google Colab environment setup: 

  • In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4 for faster execution of code. 

4. Library and dependency installation: 

  • Install necessary libraries and dependencies using the following command: 


5. Authentication with HuggingFace: 

  • Integrate your Hugging Face token into Colab’s environment:



  • When prompted, enter your Hugging Face token obtained from the “Access Token” tab in your Hugging Face settings. 


Step 1: Document Loading 

Loading a document refers to the process of retrieving and storing data as documents in memory from a specified source. This process is typically facilitated by document loaders, which provide a “load” method for accessing and loading documents into the memory. 

LangChain has a number of document loaders. In this example, we will use the “WebBaseLoader” class from the “langchain.document_loaders” module to load content from a specific web page.



The code extracts content from the target web page. BeautifulSoup (`bs4`) is employed for HTML parsing, focusing on elements with the classes “post-content”, “post-title”, and “post-header”. The loaded content is stored in the variable `docs`. 



Step 2: Document transformation – Splitting/chunking document 

After loading the data, it can be transformed to fit the application’s requirements or to extract relevant portions. This involves splitting lengthy documents into smaller chunks that are compatible with the model and produce accurate and clear results. LangChain offers various text splitters; in this implementation, we chose the “RecursiveCharacterTextSplitter” for generic text processing.



The code breaks documents into chunks of 1000 characters with a 200-character overlap. This chunking is employed for embedding and vector storage, enabling more focused retrieval of relevant content during runtime. The recursive splitter ensures chunks maintain contextual integrity by using common separators, like new lines, until the desired chunk size is achieved. 
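To see what a 1000-character chunk size with a 200-character overlap means in practice, here is a simplified sketch of fixed-size chunking. It is deliberately not the recursive splitter: it cuts purely by character position and ignores separators such as new lines, and the `chunk_text` function is a hypothetical helper written for this illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks where neighbours share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults, each chunk begins 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That repetition keeps a sentence that straddles a boundary retrievable from at least one chunk.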

Step 3: Storage in vector database 

After extracting text chunks, we store and index them for future searches using the RAG application. A common approach involves embedding the content of each split and storing these embeddings in a vector store. 

When searching, we embed the search query and perform a similarity search to identify stored splits with embeddings most similar to the query embedding. Cosine similarity, which measures the cosine of the angle between two embeddings, is a simple similarity measure. 
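The similarity search itself is simple enough to sketch in plain Python. The tiny two-dimensional vectors and the dictionary-based store below are made up for illustration; real embeddings have hundreds of dimensions, and a vector store would use an index rather than scoring every entry.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored items whose embeddings are most similar to the query."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```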

Using the Chroma vector store and the open-source “HuggingFaceEmbeddings” in LangChain, we can embed and store all document splits in a single command. 

Text embedding: 

Text embedding converts textual data into numerical vectors that capture the semantic meaning of the text. This enables efficient identification of similar text pieces. The conversion is done by an embedding model, a variant of language models designed specifically for this purpose. 

LangChain’s Embeddings class facilitates interaction with various text embedding models. While any model can be used, we opted for “HuggingFaceEmbeddings”. 




This code initializes an instance of the HuggingFaceEmbeddings class, configuring it with the open-source pre-trained model “sentence-transformers/all-MiniLM-l6-v2”. This gives us a text embedding model for converting textual data into numerical vectors. 


Vector Stores: 

Vector stores are specialized databases designed to efficiently store and search for high-dimensional vectors, such as text embeddings. They enable the retrieval of the most similar embedding vectors based on a given query vector. LangChain integrates with various vector stores, and we are using the “Chroma” vector store for this task.



This code utilizes the Chroma class to create a vector store (vectorstore) from the previously split documents (splits) using the specified embeddings (embeddings). The Chroma vector store facilitates efficient storage and retrieval of document vectors for further processing. 

Step 4: Retrieval of text chunks 

After storing the data, preparing the LLM model, and constructing the pipeline, we need to retrieve the data. Retrievers serve as interfaces that return documents based on a query. 

Retrievers cannot store documents; they can only retrieve them. Vector stores form the foundation of retrievers. LangChain offers a variety of retriever algorithms; here is the one we implement. 



Step 5: Generation of answer with RAG approach 

Preparing the LLM Model: 

In the context of Retrieval Augmented Generation (RAG), the LLM plays a crucial role in generating comprehensive and informative responses to user queries. By leveraging its ability to process and understand natural language, the LLM can effectively combine retrieved documents with the given query to produce insightful and relevant outputs.


These lines import the necessary libraries for handling pre-trained models and tokenization. The specific model “meta-llama/Llama-2-7b-chat-hf” is chosen for its question-answering capabilities.




This code defines a transformer pipeline, which encapsulates the pre-trained HuggingFace model and its associated configuration. It specifies the task as “text-generation” and sets various parameters to optimize the pipeline’s performance. 



This line creates a LangChain pipeline (HuggingFacePipeline) that wraps the transformer pipeline. The model_kwargs parameter adjusts the model’s “temperature” to control its creativity and randomness. 
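Temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below shows that general mechanism in plain Python; the function name and the example logits are made up for illustration, and it does not reproduce the internals of any particular model or library.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled)
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]
```

A low temperature sharpens the distribution so the top-scoring token dominates, giving more deterministic output; a high temperature flattens it, so less likely tokens are sampled more often.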

Retrieval QA Chain: 

To combine question-answering with a retrieval step, we employ the RetrievalQA chain, which utilizes a language model and a vector database as a retriever. By default, we process all data in a single batch and set the chain type to “stuff” when interacting with the language model. 






This code initializes a RetrievalQA instance by specifying a chain type (“stuff”), a HuggingFacePipeline (llm), and a retriever (retriever, initialized earlier in the code from the vector store). The return_source_documents parameter is set to True to include source documents in the output, enhancing contextual information retrieval.
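Conceptually, the “stuff” chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea only; the `build_stuff_prompt` helper and the prompt wording are hypothetical and are not the template LangChain actually uses.

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Place every retrieved chunk into one prompt, followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because every chunk goes into one prompt, this approach only works while the combined text fits within the model's context window.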

Finally, we call this QA chain with the specific question we want to ask.



The result will be: 



We can print source documents to see which document chunks the model used to generate the answer to this specific query.





As an example, this output shows only 2 of the 4 document chunks that were retrieved to answer the question. 


In conclusion, by embracing the Retrieval-Augmented Generation (RAG) approach, we have elevated our Large Language Model (LLM) experience to new heights.

Through a deep dive into the conceptual foundations of RAG and a practical implementation using the LangChain orchestration framework, coupled with the power of an open-source model from Hugging Face, we have enhanced the question-answering capabilities of LLMs.

This journey exemplifies the seamless integration of innovative technologies to optimize LLM capabilities, paving the way for a more efficient and powerful language processing experience. Cheers to the exciting possibilities that arise from combining innovative approaches with open-source resources! 

December 6, 2023

In this blog, we delve into Large Language Model Evaluation and Tracing with LangSmith, emphasizing their pivotal role in ensuring application reliability and performance.

You’ll learn to set up LangSmith, connect it with LangChain, and master the process of precise tracing and evaluation, equipping you with the tools to optimize your Large Language Model applications and bring them to production. Discover the key to unlock your model’s full potential.

LLM evaluation and tracing with LangSmith


Whether you’re an experienced developer or just starting your journey, LangSmith’s private beta provides a valuable tool for your toolkit. 

Understanding the significance of evaluation and tracing is key to improving Large Language Model applications, ensuring the reliability, correctness, and performance of your models. This is a critical step in the development process, particularly if you’re working towards bringing your LLM application to production. 

LangSmith and LangChain in LLM application

For developers and AI enthusiasts working with Large Language Models (LLMs), LangChain and LangSmith stand as key pillars.

LangChain simplifies the integration of powerful LLMs into applications, streamlining data access, and offering flexibility through concepts like “Chains” and “Agents.” It bridges the gap between these models and external data sources, enabling the creation of robust natural language processing applications.

LangSmith, developed by LangChain, takes LLM application development to the next level. It aids in debugging, monitoring, and evaluating LLM-based applications, with features like logging runs, visualizing components, and facilitating collaboration. It ensures the reliability and efficiency of your LLM applications.

These two tools together form a dynamic duo, unleashing the true potential of large language models in application development. In the upcoming sections, we’ll delve deeper into the mechanics, showcasing how they can elevate your LLM projects to new heights.


Quick start to LangSmith


Please note that LangSmith is currently in a private beta phase, so we’ll show you how to join the waitlist. Once LangSmith releases new invites, you’ll be at the forefront of this innovative platform. 

Sign up for an account here

welcome to LangSmith


Configuring LangSmith with LangChain 

Configuring LangSmith alongside LangChain is a straightforward procedure. It merely involves a few simple steps to establish LangSmith and start utilizing it for tracing and evaluation. 


To initiate your journey, follow the sequential steps provided below: 

  • Begin by creating a LangSmith account, as outlined in the prerequisites 
  • In your working folder, create a .env file containing the essential environment variables. It starts out with placeholders, which you will replace in the subsequent steps: 



  • Substitute the placeholder <your-openai-api-key> with your OpenAI API key obtained from OpenAI. 
  • For the LangChain API key, navigate to the settings page on LangSmith, generate a key, and replace the placeholder. 


LangSmith-Create API key- 1


  • Return to the home page and create a project with a suitable name. Subsequently, copy the project name and update the placeholder. 


LangSmith - Project 2

  • Install LangChain and any other necessary dependencies with the following command: 




  • Execute the provided example code to initiate the process: 



  • After running the code, return to the LangSmith home page, and access the project you just created. 

Getting started with LangSmith 3  


  • Within the “Traces” section, you will find the run that was recently executed. Click on it to access detailed trace information. 

Getting started with LangSmith 4

Congratulations, your initial run is now visible and traceable within LangSmith! 
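If a run does not show up, a common cause is an environment variable that is missing or still holds its placeholder. The sketch below is one way to check for that in plain Python. The variable names follow LangSmith's documented naming at the time of writing and are an assumption here, since the original .env listing is not reproduced; `missing_settings` is a hypothetical helper, not part of LangChain or LangSmith.

```python
import os

# Assumed variable names; adjust them to match your own .env file.
REQUIRED = [
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
]


def missing_settings(env=None):
    """Return required names that are unset, empty, or still '<placeholder>' values."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED:
        value = env.get(name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


# Example with an explicit dictionary instead of the real environment.
example_env = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_PROJECT": "my-first-project",
    "OPENAI_API_KEY": "<your-openai-api-key>",
}
problems = missing_settings(example_env)
```

Calling `missing_settings()` with no argument inspects the current process environment, which is useful after the .env file has been loaded.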

Scenario # 01: LLM Tracing 

What is a trace? 

A ‘Run’ signifies a single instance of a task or operation within your LLM application. This could be a single call to an LLM, a chain, or an agent. 



A ‘Trace’ encompasses an arrangement of runs structured in a hierarchical or interconnected manner. The highest-level run in a trace, known as the ‘Root Run,’ is the one directly triggered by the user or application. The root run is designated with an execution order of 1, indicating the order in which it was initiated within the trace when considered as a sequence. 
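The relationship between runs and traces can be pictured as a small tree. The sketch below is a simplified model in plain Python and not LangSmith's actual schema; the `Run` and `Trace` classes and their fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """One unit of work: a single LLM, chain, tool, or agent call."""
    name: str
    run_type: str
    execution_order: int
    children: List["Run"] = field(default_factory=list)


class Trace:
    """A tree of runs; the first run started is the root (execution order 1)."""

    def __init__(self):
        self.root: Optional[Run] = None
        self._counter = 0

    def start_run(self, name, run_type, parent=None):
        self._counter += 1
        run = Run(name, run_type, self._counter)
        if parent is None:
            if self.root is not None:
                raise ValueError("a trace has exactly one root run")
            self.root = run
        else:
            parent.children.append(run)
        return run

    def walk(self, run=None, depth=0):
        """Yield (depth, run) pairs in depth-first order."""
        run = self.root if run is None else run
        if run is None:
            return
        yield depth, run
        for child in run.children:
            yield from self.walk(child, depth + 1)


trace = Trace()
chain = trace.start_run("chain", "chain")
trace.start_run("first LLM call", "llm", parent=chain)
trace.start_run("second LLM call", "llm", parent=chain)
outline = [(depth, run.name, run.execution_order) for depth, run in trace.walk()]
```

Here the chain is the root run, and the two LLM calls are child runs nested beneath it.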

Examples of traces 

We’ve already examined a straightforward LLM Call trace, where we observed the input provided to the large language model and the resulting output. In this uncomplicated case, a single run was evident, devoid of any hierarchical or multiple run structures.  

Now, let’s delve further by tracing a LangChain chain and an agent to uncover deeper insights into their operations. 

Trace a sequential chain 

In this instance, we explore the tracing of a sequential chain within LangChain, a foundational chain of this platform. Sequential chains enable the connection of multiple chains, creating complex pipelines for specific scenarios. Detailed information on this can be found here. 

Let’s run this example of a sequential chain and see what we get in the trace. 




Upon executing the code for this sequential chain and returning to our project, a new trace, ‘SimpleSequentialChain,’ becomes visible. 


LangSmith - ChatOpenAI 5   


Upon examination, this trace reveals a collection of LLM calls, featuring two distinct LLM call runs within its hierarchy. 


LangSmith - Sequential Chain 6


The trace makes the execution order apparent: in our example, the first run extracts the title and constructs a synopsis, as displayed in the provided screenshot. 

LangSmith - ChatOpenAI 7

Subsequently, the second run uses the synopsis, which is the output of the first run, to generate a review. 



LangSmith - ChatOpenAI 8

This meticulous tracing mechanism grants us the ability to inspect intermediate results, the messages transmitted to the LLM, and the outputs at each step, all while offering insights into token counts and latency measures. Furthermore, the option to filter traces based on various parameters adds an additional layer of customization and control. 
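To illustrate the kind of information such a trace captures, here is a plain-Python sketch of a two-step sequential pipeline that records each step's input, output, and latency. The two step functions are stand-ins for LLM calls, and `run_sequential` and the sample title are hypothetical; none of this is LangChain or LangSmith code.

```python
import time


def run_sequential(steps, initial_input):
    """Run steps in order, feeding each output into the next and logging each run."""
    records = []
    value = initial_input
    for order, (name, func) in enumerate(steps, start=1):
        started = time.perf_counter()
        output = func(value)
        records.append({
            "order": order,
            "name": name,
            "input": value,
            "output": output,
            "latency_s": time.perf_counter() - started,
        })
        value = output
    return value, records


# Stand-ins for the two LLM calls: a synopsis writer, then a reviewer.
def write_synopsis(title):
    return f"Synopsis of '{title}'"


def write_review(synopsis):
    return f"Review based on: {synopsis}"


final, records = run_sequential(
    [("synopsis", write_synopsis), ("review", write_review)],
    "Tragedy at Sunset on the Beach",
)
```

Each record plays the role of one run in the trace: the second step's input is exactly the first step's output, which is what the trace view lets you verify.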


Evaluate and trace with LangSmith: Mastering LLM optimization



Trace an agent 

In this segment, we trace an agent’s inner workings using LangSmith. If you are keen to delve deeper into the world of agents, you’ll find comprehensive documentation in LangChain.

To provide a brief overview, we’ve engineered a ZeroShotAgent, equipping it with tools like DuckDuckGo search and paraphrasing capabilities. The agent interacts with user queries, employing these tools in a ReAct (Reason + Act) manner to generate a response. 

Here is the code for the agent: 




By tracing the agent’s actions, we gain insights into the sequence and tools utilized by the agent, as well as the intermediate outputs it produces. This tracing capability proves invaluable for agent design and debugging, allowing us to identify and resolve errors efficiently.


LangSmith - Agent executor 9


The trace reveals that the agent starts with an LLM call, then calls the DuckDuckGo Results JSON search tool, engages the paraphraser, and finally executes two additional LLM calls to generate responses, which in our case are the suggested blog topics. 
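The reason-then-act pattern behind such a trace can be sketched without any LLM at all. In the plain-Python example below, a scripted policy stands in for the model's reasoning and two toy functions stand in for the search and paraphrasing tools. All names are hypothetical, and this is not the ZeroShotAgent implementation.

```python
def run_agent(policy, tools, question, max_steps=5):
    """Alternate between a reasoning step (policy) and an action (tool call)."""
    scratchpad = []  # (tool_name, tool_input, observation) tuples
    for _ in range(max_steps):
        decision = policy(question, scratchpad)
        if decision["action"] == "finish":
            return decision["input"], scratchpad
        tool = tools[decision["action"]]
        observation = tool(decision["input"])
        scratchpad.append((decision["action"], decision["input"], observation))
    raise RuntimeError("agent did not finish within max_steps")


# Toy tools standing in for web search and a paraphraser.
tools = {
    "search": lambda query: f"results for {query}",
    "paraphrase": lambda text: text.upper(),
}


def scripted_policy(question, scratchpad):
    """Stand-in for the LLM: search first, then paraphrase, then finish."""
    if not scratchpad:
        return {"action": "search", "input": question}
    if len(scratchpad) == 1:
        return {"action": "paraphrase", "input": scratchpad[-1][2]}
    return {"action": "finish", "input": scratchpad[-1][2]}


answer, steps = run_agent(scripted_policy, tools, "llm blog topics")
```

The `steps` list is the equivalent of the tool calls shown in the trace, in the order the agent made them.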

These traces underscore the critical role tracing plays in debugging and designing effective LLM applications. It’s important to note that all this information is meticulously logged in LangSmith, offering a treasure trove of insights for various applications, which we’ll briefly explore in subsequent sections. 


Sharing your trace 

LangSmith simplifies the process of sharing the logged runs. This feature facilitates easy publishing and replication of your work. For example, if you encounter a bug or unexpected output under specific conditions, you can share it with your team or create an issue on LangChain for collaborative troubleshooting.

By simply clicking the share option located at the top right corner of the page, you can effortlessly distribute your run for analysis and resolution. 


LangSmith - Agent executor 10


LangSmith Run shared 11  


Scenario # 02: Testing and evaluation 

Why is testing and evaluation essential for LLMs? 

The development of high-quality, production-grade Large Language Model (LLM) applications is a complex task fraught with challenges, including: 

  • Non-deterministic Outputs: LLMs operate probabilistically, often yielding different outputs for the same input prompt. This unpredictability can persist even with a temperature setting of 0, as model weights are not static over time. 
  • API Opacity: Models underpinning APIs undergo changes and updates, making it imperative to assess their evolving behavior. 
  • Security Concerns: LLMs are susceptible to prompt injections, posing potential security risks. 
  • Latency Requirements: Many applications demand swift response times. 

These challenges underscore the critical need for rigorous testing and evaluation in the development of LLM applications. 

Step-by-step LLM evaluation process 

1. Define an LLM chain 

Begin by defining an LLM and creating a simple LLM chain aimed at generating concise responses to specific queries. This LLM will serve as the subject of evaluation and testing. 



2. Create a dataset 

Generate a compact dataset comprising question-and-answer pairs related to computer science abbreviations and terms. This data set, containing both questions and their corresponding answers, will be used to evaluate and test the model. 




After executing the code, navigate to LangSmith. Within the “Datasets & Testing” section, you’ll find the dataset you’ve created. By expanding it under “examples,” you’ll encounter the six specific examples you’ve defined for evaluation.  



LangSmith - Datasets and testing 13  


3. Evaluation 

For our evaluations, we’ll make use of the LangChain evaluator, specifically focusing on the ‘Correctness: QA evaluation.’ QA evaluators play a vital role in assessing the accuracy of responses to user queries, especially when you have a dataset with reference labels or context documents. Our approach incorporates all three QA evaluators: 

  • “context_qa”: This evaluator directs the LLM chain to utilize reference “context” (supplied through example outputs) to ascertain correctness. 
  • “qa”: It prompts an LLMChain to directly appraise a response as either “correct” or “incorrect,” based on the reference answer. 
  • “cot_qa”: This evaluator closely resembles “context_qa” but introduces a chain of thought “reasoning” before delivering a final verdict. This approach generally leads to responses that align more closely with human judgments, albeit with a slightly increased token and runtime cost. 
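The general idea behind these LLM-graded evaluators can be sketched in plain Python: build a grading prompt from the question, the reference answer, and the model's answer, then map the judge's reply to a score. The prompt wording, the function names, the two sample questions, and the `toy_judge` (a string comparison standing in for a real LLM judge) are all assumptions, not LangChain's evaluator code.

```python
def build_grading_prompt(question, reference, prediction):
    """Assemble the text a judge model would be asked to grade."""
    return (
        "You are grading an answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Submitted answer: {prediction}\n"
        "Reply with CORRECT or INCORRECT."
    )


def parse_verdict(reply):
    """Map the judge's reply to 1 (CORRECT), 0 (INCORRECT), or None if unclear."""
    tokens = [t.strip(".,:;!\"'") for t in reply.upper().split()]
    for token in reversed(tokens):  # the verdict usually comes last
        if token == "CORRECT":
            return 1
        if token == "INCORRECT":
            return 0
    return None


def evaluate(examples, predict, judge):
    """Score each example by asking the judge to compare prediction and reference."""
    scores = []
    for ex in examples:
        prompt = build_grading_prompt(ex["question"], ex["answer"], predict(ex["question"]))
        scores.append(parse_verdict(judge(prompt)))
    return scores


def toy_judge(prompt):
    """Stand-in for an LLM judge: case-insensitive comparison of the two answers."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    same = fields["Reference answer"].lower() == fields["Submitted answer"].lower()
    return "CORRECT" if same else "INCORRECT"


# Hypothetical examples and a hypothetical model under test.
examples = [
    {"question": "What does CPU stand for?", "answer": "Central Processing Unit"},
    {"question": "What does RAM stand for?", "answer": "Random Access Memory"},
]
known = {"What does CPU stand for?": "central processing unit"}
scores = evaluate(examples, lambda q: known.get(q, "unknown"), toy_judge)
```

Scanning the reply from the end means a chain-of-thought style response, where reasoning precedes the verdict, is still parsed correctly.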

Below is the code to kick start the evaluation on the dataset. 




4. Reviewing evaluation outcomes 

Upon completing the evaluation, LangSmith provides a platform to examine the results. Navigate to the “Datasets & Testing” section, select the dataset used for the evaluation, and access “Test Runs.” You’ll find the designated Test Run Name and feedback from the evaluator. 

By clicking on the Test Run Name, you can delve deeper, inspect feedback for individual examples, and view side-by-side comparisons. Clicking on any reference example reveals detailed information. 


LangSmith traces 14


For instance, the first example received a perfect score of 1 from all three evaluators. The generated and expected outputs are presented side by side, accompanied by feedback and comments from the evaluator. 



LangSmith - Run 15





However, in a different example, one evaluator issued a score of 1, while the other two scored it as 0. Upon closer examination, it becomes apparent that there is a disparity between the generated and expected outputs. 

LangSmith Run - 16

LLM chain LangSmith - 17



The “cot_qa” evaluator assigned a score of 1, and further exploration of the comments reveals that, although the generated output was correct, discrepancies in the dataset contextually influenced the evaluation. It’s worth noting that the “cot_qa” evaluator spotted this, demonstrating its ability to notice context-related subtleties that other evaluators might miss. 

Run - LangSmith 18


Varied evaluation choices (Delve deeper) 

The evaluator showcased in the previous example is but one of several available within LangSmith. Each option serves specific purposes and holds its unique value. For a detailed understanding of each evaluator’s specific functions and to explore illustrative examples, we encourage you to explore LangChain Evaluators where in-depth coverage of these available options is provided. 


Implement the power of tracing and evaluation with LangSmith 

In summary, our journey through LangSmith has underscored the critical importance of evaluating and tracing Large Language Model applications. These processes are the cornerstone of reliability and high performance, ensuring that your models meet rigorous standards. 

With LangSmith, we’ve explored the power of precise tracing and evaluation, empowering you to optimize your models confidently. As you continue your exploration, remember that your LLM applications hold limitless potential, and LangSmith is your guiding light on this path of discovery. Thank you for joining us on this transformative journey through the world of LLM Evaluation and Tracing with LangSmith. 

October 7, 2023

In the dynamic realm of language models and data-driven apps, efficient orchestration frameworks are key. Explore LangChain and LlamaIndex, which simplify LLM-app interactions.

Large language models (LLMs) are becoming increasingly popular for a variety of tasks, such as natural language understanding, question answering, and text generation. However, LLMs can be complex and difficult to use, which is where orchestration frameworks come in.

Orchestration frameworks provide a way to manage and control LLMs. They can help to simplify the development and deployment of LLM-based applications, and they can also help to improve the performance and reliability of these applications.

There are a number of orchestration frameworks available, two of the most popular being LangChain and LlamaIndex.

LangChain and Orchestration Frameworks

LangChain is an open-source orchestration framework that is designed to be easy to use and scalable. It provides a number of features that make it well-suited for managing LLMs, such as:

  • A simple API that makes it easy to interact with LLMs
  • A distributed architecture that can scale to handle large numbers of LLMs
  • A variety of features for managing LLMs, such as load balancing, fault tolerance, and security

LlamaIndex is another open-source orchestration framework that is designed for managing LLMs. It provides a number of features that are similar to LangChain, such as:

  • A simple API
  • A distributed architecture
  • A variety of features for managing LLMs

However, LlamaIndex also has some unique features that make it well-suited for certain applications, such as:

  • The ability to query LLMs in a distributed manner
  • The ability to index LLMs so that they can be searched more efficiently

Both LangChain and LlamaIndex are powerful orchestration frameworks that can be used to manage LLMs. The best framework for a particular application will depend on the specific requirements of that application.

In addition to LangChain and LlamaIndex, a number of other orchestration frameworks are available. These frameworks offer a variety of features and capabilities, so it is important to choose the one that best meets the needs of your application.

LangChain and Orchestration Frameworks – Source: The New Stack

LlamaIndex and LangChain: Orchestrating LLMs


The venture capital firm Andreessen Horowitz (a16z) identifies both LlamaIndex and LangChain as orchestration frameworks that abstract away the complexities of prompt chaining, enabling seamless data querying and management between applications and LLMs. This orchestration process encompasses interactions with external APIs, retrieval of contextual data from vector databases, and maintaining memory across multiple LLM calls.

LlamaIndex: A data framework for the future

LlamaIndex distinguishes itself by offering a unique approach to combining custom data with LLMs through in-context learning, without the need for fine-tuning. It defines itself as a “simple, flexible data framework for connecting custom data sources to large language models.” Moreover, it accommodates a wide range of data types, making it an inclusive solution for diverse data needs.

Continuous evolution: LlamaIndex 0.7.0

LlamaIndex is a dynamic and evolving framework. Its creator, Jerry Liu, recently released version 0.7.0, which focuses on enhancing modularity and customizability to facilitate the development of LLM applications that leverage your data effectively. This release underscores the commitment to providing developers with tools to architect data structures for LLM applications.

The LlamaIndex Ecosystem: LlamaHub

At the core of LlamaIndex lies LlamaHub, a data ingestion platform that plays a pivotal role in getting started with the framework. LlamaHub offers a library of data loaders and readers, making data ingestion a seamless process. Notably, LlamaHub is not exclusive to LlamaIndex; it can also be integrated with LangChain, expanding its utility.



Navigating the LlamaIndex workflow

Users of LlamaIndex typically follow a structured workflow:

  1. Parsing Documents into Nodes
  2. Constructing an Index (from Nodes or Documents)
  3. Optional Advanced Step: Building Indices on Top of Other Indices
  4. Querying the Index

The querying aspect involves interactions with an LLM, where a “query” serves as an input. While this process can be complex, it forms the foundation of LlamaIndex’s functionality.

In essence, LlamaIndex empowers users to feed pertinent information into an LLM prompt selectively. Instead of overwhelming the LLM with all custom data, LlamaIndex allows users to extract relevant information for each query, streamlining the process.


Power of LlamaIndex and LangChain

LlamaIndex seamlessly integrates with LangChain, offering users flexibility in data retrieval and query management. It extends the functionality of data loaders by treating them as LangChain Tools and providing Tool abstractions to use LlamaIndex’s query engine alongside a LangChain agent.

Real-world applications: Context-augmented chatbots

LlamaIndex and LangChain join forces to create context-rich chatbots. Learn how these frameworks can be leveraged to build chatbots that provide enhanced contextual responses.

This comprehensive exploration unveils the potential of LlamaIndex, offering insights into its evolution, features, and practical applications.

Why are orchestration frameworks needed?

Data orchestration frameworks are essential for building applications on enterprise data because they help to:

  • Eliminate the need for foundation model retraining: Foundation models are large language models that are trained on massive datasets of text and code. They can be used to perform a variety of tasks, such as generating text, translating languages, and answering questions. However, foundation models can be expensive to train and retrain. Orchestration frameworks can help to reduce the need for retraining by allowing you to reuse trained models across multiple applications.


  • Overcome token limits: Foundation models often have token limits, which restrict the number of words or tokens that can be processed in a single request. Orchestration frameworks can help to overcome token limits by breaking down large tasks into smaller subtasks that can be processed separately.

  • Provide connectors for data sources: Orchestration frameworks typically provide connectors for a variety of data sources, such as databases, cloud storage, and APIs. This makes it easy to connect your data pipeline to the data sources that you need.

  • Reduce boilerplate code: Orchestration frameworks can help to reduce boilerplate code by providing a variety of pre-built components for common tasks, such as data extraction, transformation, and loading. This allows you to focus on the business logic of your application.
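As one concrete illustration of the token-limit point above, the sketch below splits a long text into overlapping chunks that each stay under a budget. It is plain Python and counts whitespace-separated words as a rough stand-in for tokens, which is an assumption; real tokenizers count differently, and `split_into_chunks` is a hypothetical helper rather than a framework function.

```python
def split_into_chunks(text, max_tokens, overlap=0):
    """Split text into word-based chunks of at most max_tokens words each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be between 0 and max_tokens - 1")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


pieces = split_into_chunks("a b c d e f g", max_tokens=3, overlap=1)
```

A small overlap keeps some shared context between neighboring chunks, so a sentence cut at a boundary is still partly visible in both.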

Popular orchestration frameworks

There are a number of popular orchestration frameworks available, including:

  • Prefect is an open-source orchestration framework that is written in Python. It is known for its ease of use and flexibility.

  • Airflow is an open-source orchestration framework that is written in Python. It is widely used in the enterprise and is known for its scalability and reliability.

  • Luigi is an open-source orchestration framework that is written in Python. It is known for its simplicity and performance.

  • Dagster is an open-source orchestration framework that is written in Python. It is known for its extensibility and modularity.




Choosing the right orchestration framework

When choosing an orchestration framework, there are a number of factors to consider, such as:

  1. Ease of use: The framework should be easy to use and learn, even for users with no prior experience with orchestration.
  2. Flexibility: The framework should be flexible enough to support a wide range of data pipelines and workflows.
  3. Scalability: The framework should be able to scale to meet the needs of your organization, even as your data volumes and processing requirements grow.
  4. Reliability: The framework should be reliable and stable, with minimal downtime.
  5. Community support: The framework should have a large and active community of users and contributors.


Orchestration frameworks are essential for building applications on enterprise data. They can help to eliminate the need for foundation model retraining, overcome token limits, connect to data sources, and reduce boilerplate code. When choosing an orchestration framework, consider factors such as ease of use, flexibility, scalability, reliability, and community support.

September 14, 2023

Before we understand LlamaIndex, let’s step back a bit. Imagine a futuristic landscape where machines possess an extraordinary ability to understand and produce human-like text effortlessly. LLMs have made this vision a reality. Armed with a vast ocean of training data, these marvels of innovation have become the crown jewels of the tech world.

There is no denying that LLMs (Large Language Models) are currently the talk of the town! Trained on massive datasets, they are revolutionizing text generation and reasoning and have been making waves across the tech world.

One particular LLM has emerged as a true superstar. Back in November 2022, ChatGPT, an LLM developed by OpenAI, attracted a staggering one million users within 5 days of its beta launch.

Source: Chart: ChatGPT Sprints to One Million Users | Statista  

When researchers and developers saw these numbers, they started thinking about how best to feed or augment these LLMs with private data. They considered several solutions:

  1. Fine-tune your own LLM. You adapt an existing LLM by training it on your data. However, this is very costly and time-consuming.
  2. Combine all the documents into a single large prompt. This might be possible now that some models accept prompts of up to 100k tokens. However, this approach could result in slower processing times and higher computational costs.
  3. Selectively provide relevant information in the LLM prompt. Instead of inputting all the data, choose the useful bits for each query.

Option 3 appears to be both relevant and feasible, but it requires the development of a specialized toolkit. Recognizing this need, efforts have already begun to create the necessary tools.

Introducing LlamaIndex

Recently, a toolkit known as LangChain was launched for building applications using LLMs. LlamaIndex is built on top of LangChain to provide a central interface to connect your LLMs with external data.

Key Components of LlamaIndex:

The key components of LlamaIndex are as follows:

  • Data Connectors: The data connector, known as the Reader, collects data from various sources and formats, converting it into a straightforward document format with textual content and basic metadata.
  • Data Index: It is a data structure facilitating efficient retrieval of pertinent information in response to user queries. At a broad level, Indices are constructed using Documents and serve as the foundation for Query Engines and Chat Engines, enabling seamless interactions and question-and-answer capabilities based on the underlying data. Internally, Indices store data within Node objects, which represent segments of the original documents.
  • Retrievers: Retrievers play a crucial role in obtaining the most pertinent information based on user queries or chat messages. They can be constructed based on Indices or as standalone components and serve as a fundamental element in Query Engines and Chat Engines for retrieving contextually relevant data.
  • Query Engines: A query engine is a versatile interface that enables users to pose questions regarding their data. By accepting natural language queries, the query engine provides comprehensive and informative responses.
  • Chat Engines: A chat engine serves as an advanced interface for engaging in interactive conversations with your data, allowing for multiple exchanges instead of a single question-and-answer format. Similar to ChatGPT but enhanced with access to a knowledge base, the chat engine maintains a contextual understanding by retaining the conversation history and can provide answers that consider the relevant past context.

Difference between query engine and chat engine:

It is important to note that there is a significant distinction between a query engine and a chat engine. Although they may appear similar at first glance, they serve different purposes:

A query engine operates as an independent system that handles individual questions over the data without maintaining a record of the conversation history.

On the other hand, a chat engine is designed to keep track of the entire conversation history, allowing users to query both the data and previous responses. This functionality resembles ChatGPT, where the chat engine leverages the context of past exchanges to provide more comprehensive and contextually relevant answers.
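The distinction comes down to whether state is kept between calls. The plain-Python sketch below shows the difference with two small classes. They are simplified stand-ins and not LlamaIndex's own classes; `toy_answer` is a hypothetical placeholder for the retrieval and LLM call.

```python
class QueryEngine:
    """Stateless: every question is answered on its own."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn

    def query(self, question):
        return self._answer_fn(question, history=[])


class ChatEngine:
    """Stateful: earlier turns are passed along with each new message."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn
        self.history = []

    def chat(self, message):
        reply = self._answer_fn(message, history=list(self.history))
        self.history.append((message, reply))
        return reply


def toy_answer(question, history):
    """Placeholder for retrieval plus an LLM call; reports how much history it saw."""
    return f"answer to '{question}' (seen {len(history)} earlier turns)"


query_engine = QueryEngine(toy_answer)
chat_engine = ChatEngine(toy_answer)
first = chat_engine.chat("What is a node?")
second = chat_engine.chat("And how is it indexed?")
```

The query engine always answers with an empty history, while the chat engine's second reply is produced with the first exchange available as context.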

  • Customization: LlamaIndex offers customization options where you can modify the default settings, such as the utilization of OpenAI’s text-davinci-003 model. Users have the flexibility to customize the underlying language model (LLM) and other settings used in LlamaIndex, with support for various integrations and LangChain’s LLM modules.
  • Analysis: LlamaIndex offers a diverse range of analysis tools for examining indices and queries. These tools include features for analyzing token usage and associated costs. Additionally, LlamaIndex provides a Playground module, which presents a visual interface for analyzing token usage across different index structures and evaluating performance metrics.
  • Structured Outputs: LlamaIndex offers an assortment of modules that empower language models (LLMs) to generate structured outputs. These modules are available at various levels of abstraction, providing flexibility and versatility in producing organized and formatted results.
  • Evaluation: LlamaIndex provides essential modules for assessing the quality of both document retrieval and response synthesis. These modules enable the evaluation of “hallucination,” which refers to situations where the generated response does not align with the retrieved sources. A hallucination occurs when the model generates an answer without effectively grounding it in the given contextual information from the prompt.
  • Integrations: LlamaIndex offers a wide array of integrations with various toolsets and storage providers. These integrations encompass features such as utilizing vector stores, integrating with ChatGPT plugins, compatibility with LangChain, and the capability to trace with Graphsignal. These integrations enhance the functionality and versatility of LlamaIndex by allowing seamless interaction with different tools and platforms.
  • Callbacks: LlamaIndex offers a callback feature that assists in debugging, tracking, and tracing the internal operations of the library. The callback manager allows for the addition of multiple callbacks as required. These callbacks not only log event-related data but also track the duration and frequency of each event occurrence. Moreover, a trace map of events is recorded, providing valuable information that callbacks can utilize in a manner that best suits their specific needs.
  • Storage: LlamaIndex offers a user-friendly interface that simplifies the process of ingesting, indexing, and querying external data. By abstracting away complexities, LlamaIndex allows users to query their data with just a few lines of code. Behind the scenes, LlamaIndex provides the flexibility to customize storage components for different purposes. This includes document stores for storing ingested documents (represented as Node objects), index stores for storing index metadata, and vector stores for storing embedding vectors. The document and index stores utilize a shared key-value store abstraction, providing a common framework for efficient storage and retrieval of data.

Now that we have explored the key components of LlamaIndex, let’s delve into its operational mechanisms and understand how it functions.

How LlamaIndex Works:

To begin, the first step is to import the documents into LlamaIndex, which provides various pre-existing readers for sources like databases, Discord, Slack, Google Sheets, Notion, and the one we will utilize today, the Simple Directory Reader, among others.

You can find more readers on Llama Hub.

Once the documents are loaded, LlamaIndex proceeds to parse them into nodes, which are essentially segments of text. Subsequently, an index is constructed to enable quick retrieval of relevant data when querying the documents. The index can be stored in different formats, but we will opt for a Vector Store as it is typically the most useful when querying text documents without specific limitations.
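The parse, index, and retrieve flow described above can be sketched in plain Python. This is not LlamaIndex code: word counts stand in for learned embeddings, and `parse_into_nodes`, `embed`, `cosine`, and `ToyVectorIndex` are hypothetical names used only for illustration.

```python
import math
from collections import Counter


def parse_into_nodes(document, chunk_size=20):
    """Split a document into nodes of at most chunk_size words."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: lowercase word counts (a real system uses an embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Stores one vector per node and returns the nodes most similar to a query."""

    def __init__(self, nodes):
        self.entries = [(node, embed(node)) for node in nodes]

    def retrieve(self, query, top_k=1):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [node for node, _ in ranked[:top_k]]


document = "Llamas live in the Andes mountains. Penguins live near the cold southern ocean."
nodes = parse_into_nodes(document, chunk_size=6)
index = ToyVectorIndex(nodes)
best = index.retrieve("Where do penguins live?", top_k=1)
```

Only the best-matching node would then be placed in the LLM prompt, which is what keeps each request small.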

LlamaIndex is built upon LangChain, which serves as the foundational framework for a wide range of LLM applications. While LangChain provides the fundamental building blocks, LlamaIndex is specifically designed to streamline the workflow described above.

Here is an example code showcasing the utilization of the SimpleDirectoryReader data loader in LlamaIndex, along with the integration of the OpenAI language model for natural language processing.

Installing the necessary libraries required to run the code.

Importing openai library and setting the secret API (Application Programming Interface) key.

Importing the SimpleDirectoryReader class from llama_index library and loading the data from it.

Importing SimpleNodeParser class from llama_index and parsing the documents into nodes – basically in chunks of text.

Importing the VectorStoreIndex class from llama_index to create an index from the chunks of text, so that each time a query is placed only the relevant data is sent to OpenAI. In short, this keeps costs down.


LlamaIndex, built on top of LangChain, offers a powerful toolkit for integrating external data with LLMs. By parsing documents into nodes, constructing an efficient index, and selectively querying relevant information, LlamaIndex enables cost-effective exploration of text data.

The provided code example demonstrates the utilization of LlamaIndex’s data loader and query engine, showcasing its potential for next-generation text exploration. For the notebook of the above code, refer to the source code available here.

July 10, 2023

Large language models (LLMs) like GPT-3 and GPT-4 have revolutionized the landscape of NLP. These models have laid a strong foundation for creating powerful, scalable applications. However, the potential of these models is affected by the quality of the prompt. This highlights the importance of prompt engineering.



Furthermore, real-world NLP applications often require more complexity than a single ChatGPT session can provide. This is where LangChain comes into play! 




Harrison Chase’s brainchild, LangChain, is a Python library designed to help you leverage the power of LLMs to build custom NLP applications. As of May 2023, this game-changing library has already garnered almost 40,000 stars on GitHub. 




This beginner’s guide introduces LangChain and explores its core features. It walks you through the process of building a basic application using LangChain and shares tips and industry best practices to make the most of this framework. Whether you’re new to large language models (LLMs) or looking for a more efficient way to develop language generation applications, this guide will help you leverage the capabilities of LLMs with LangChain.

Overview of LangChain modules 

These modules are essential for any application built on a large language model (LLM).


LangChain offers standardized and adaptable interfaces for each module. Additionally, LangChain provides external integrations and even ready-made implementations for seamless usage. Let’s delve deeper into these modules. 

Overview of LangChain Modules


LLM

The LLM is the fundamental component of LangChain. It is essentially a wrapper around a specific large language model that exposes that model’s functionality through a standard interface.


Chains

As stated earlier, the LLM is the fundamental unit within LangChain. However, as the name “LangChain” suggests, the library also lets you link multiple LLM calls together to achieve a specific objective.

For instance, you may have a need to retrieve data from a specific URL, summarize the retrieved text, and utilize the resulting summary to answer questions. 

On the other hand, chains can also be simpler in nature. For instance, you might want to gather user input, construct a prompt using that input, and generate a response based on the constructed prompt. 
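The idea of linking calls can be sketched in a few lines of plain Python. This is an illustration of the concept only, not LangChain’s chain classes: fake_llm is a stand-in that echoes its prompt so the example runs offline, and make_step and run_chain are hypothetical helper names.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt."""
    return f"[model output for: {prompt}]"


def make_step(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Build one link of the chain: fill the template, then call the model."""
    def step(text: str) -> str:
        return llm(template.format(input=text))
    return step


def run_chain(steps: List[Callable[[str], str]], text: str) -> str:
    """Feed the output of each step into the next one."""
    for step in steps:
        text = step(text)
    return text


# Two links: summarize the text, then answer using that summary.
summarize = make_step("Summarize this text: {input}", fake_llm)
answer = make_step("Using this summary, answer the user's question: {input}", fake_llm)

result = run_chain([summarize, answer], "LangChain links LLM calls together.")
print(result)
```

A single-step chain, such as building a prompt from user input and generating one response, is the same pattern with only one entry in the list.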


Prompts

Prompts have become a popular way of programming models. LangChain simplifies prompt creation and management with specialized classes and functions, including the essential PromptTemplate.
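To show what a prompt template does, here is a small stand-in built on Python’s standard string formatting. SimplePromptTemplate is a hypothetical class written for this illustration and is not LangChain’s PromptTemplate; it only demonstrates the idea of a reusable prompt with named variables that are checked before use.

```python
from string import Formatter
from typing import List


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template with named variables."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect the {placeholders} that appear in the template text.
        self.input_variables: List[str] = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        """Fill in the placeholders, failing loudly if any are missing."""
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"missing prompt variables: {missing}")
        return self.template.format(**values)


template = SimplePromptTemplate(
    "Answer the question about {topic} in a {tone} tone: {question}"
)
prompt = template.format(
    topic="LangChain", tone="friendly", question="What is a chain?"
)
print(prompt)
```

Keeping the template separate from the values makes it easy to reuse the same wording across many inputs.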


Document loaders and Utils 

LangChain’s Document Loaders and Utils modules simplify data access and computation. Document loaders convert diverse data sources into text for processing, while the utils module offers interactive system sessions and code snippets for mathematical computations. 

Vector stores 

The most widely used index type involves generating numerical embeddings for each document using an embedding model. These embeddings, along with the associated documents, are stored in a vector store. This vector store enables efficient retrieval of relevant documents based on their embeddings.
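The mechanics can be sketched without any external library. In the example below, embed is a toy word-count embedding and ToyVectorStore is a hypothetical class; both are assumptions made for illustration. A real application would call an embedding model and use a dedicated vector store, but the store-then-rank-by-similarity flow is the same.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: a sparse word-count vector instead of a learned one."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (embedding, document) pairs and ranks them against a query."""

    def __init__(self) -> None:
        self._items: List[Tuple[Dict[str, float], str]] = []

    def add(self, document: str) -> None:
        self._items.append((embed(document), document))

    def search(self, query: str, k: int = 1) -> List[str]:
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [document for _, document in ranked[:k]]


store = ToyVectorStore()
store.add("FAISS builds an index for similarity search.")
store.add("Streamlit renders a web interface in Python.")
store.add("PyPDF2 extracts text from PDF pages.")

best = store.search("How do I extract text from a PDF?", k=1)
print(best)
```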


Agents

LangChain offers a flexible approach for tasks where the sequence of language model calls is not known in advance. Its “Agents” decide what to do next based on user input and previous responses. The library also integrates with vector databases and has memory capabilities to retain state between calls, enabling more advanced interactions.
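A stripped-down sketch of this behavior follows. It is not LangChain’s agent API: ToyAgent, the two tools, and choose_tool are hypothetical names, and choose_tool uses a fixed rule where a real agent would ask the LLM which tool to call. The point is that the tool is picked at run time from the input, and that the agent keeps a memory of what it did.

```python
from typing import Callable, Dict, List


def calculator(expression: str) -> str:
    """Tool: add whitespace-separated integers, e.g. '2 3 4'."""
    return str(sum(int(token) for token in expression.split()))


def word_count(text: str) -> str:
    """Tool: count the words in the input."""
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "word_count": word_count,
}


def is_integer(token: str) -> bool:
    try:
        int(token)
        return True
    except ValueError:
        return False


def choose_tool(user_input: str) -> str:
    """Stand-in for the model's decision about which tool to use."""
    tokens = user_input.split()
    if tokens and all(is_integer(token) for token in tokens):
        return "calculator"
    return "word_count"


class ToyAgent:
    """Picks a tool per input and remembers each step it took."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def run(self, user_input: str) -> str:
        tool_name = choose_tool(user_input)
        result = TOOLS[tool_name](user_input)
        self.memory.append(f"{tool_name}({user_input!r}) -> {result}")
        return result


agent = ToyAgent()
first = agent.run("2 3 4")
second = agent.run("how many words are here")
print(first, second, agent.memory)
```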


Building our App 

Now that we’ve gained an understanding of LangChain, let’s build a PDF Q/A Bot app using LangChain and OpenAI. Let me first show you the architecture diagram for our app and then we will start with our app creation. 


QA Chatbot Architecture


The example code below implements a PDF Q&A chatbot. It uses an OpenAI language model for natural language processing, the FAISS library for efficient similarity search, PyPDF2 for reading PDF files, and Streamlit for the web application interface.


The chatbot leverages LangChain’s Conversational Retrieval Chain to find the most relevant answer from a document based on the user’s question. This integrated setup enables an interactive and accurate question-answering experience for the users. 

Importing necessary libraries 

Import Statements: These lines import the necessary libraries and functions required to run the application. 

  • PyPDF2: Python library used to read and manipulate PDF files. 
  • langchain: a framework for developing applications powered by language models. 
  • streamlit: A Python library used to create web applications quickly. 
Importing necessary libraries

If LangChain and OpenAI are not already installed, first run the following commands in the terminal.

Install LangChain


Setting OpenAI API key

Replace the placeholder with your own OpenAI API key, which you can obtain from OpenAI. The key is required to use OpenAI’s language models.

Setting OpenAI API Key

Streamlit UI 

These lines of code create the web interface using Streamlit. The user is prompted to upload a PDF file.

Streamlit UI

Reading the PDF file 

If a file has been uploaded, this block reads the PDF file, extracts the text from each page, and concatenates it into a single string. 

Reading the PDF File

Text splitting 

Language models are limited in the amount of text you can pass to them at once. It is therefore necessary to split long texts into smaller chunks. LangChain provides several utilities for doing so.

Text Splitting 

Using a text splitter can also improve the results of vector store searches, since smaller chunks may sometimes be more likely to match a query. Here we split the text into chunks of 1,000 tokens with a 200-token overlap.
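The splitting scheme described above can be sketched as a sliding window. This is an illustration, not LangChain’s text splitter: split_text is a hypothetical function, and it treats whitespace-separated words as “tokens”, which is an assumption, since a real tokenizer counts differently.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into chunks of chunk_size words that overlap by chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


# 2,500 placeholder words stand in for the text extracted from a PDF.
text = " ".join(f"w{i}" for i in range(2500))
chunks = split_text(text, chunk_size=1000, chunk_overlap=200)

print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

The overlap means the last 200 words of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.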


Here, OpenAIEmbeddings is used to generate embeddings, which are vector representations of the text data. These embeddings are then used with FAISS to create an efficient search index from the chunks of text.


Creating conversational retrieval chain 

The chains developed are modular components that can be easily reused and connected. They consist of predefined sequences of actions encapsulated in a single line of code. With these chains, there’s no need to explicitly call the GPT model or define prompt properties. This specific chain allows you to engage in conversation while referencing documents and retains a history of interactions. 
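To illustrate what such a chain does internally, here is a library-free sketch. ToyConversationalChain, keyword_retriever, and fake_llm are hypothetical names created for this example, and they do not reproduce LangChain’s Conversational Retrieval Chain. The sketch shows the three ingredients named above: retrieve relevant documents, include the chat history in the prompt, and return the answer together with its sources.

```python
from typing import Callable, Dict, List, Tuple

CHUNKS = [
    "FAISS builds a similarity search index over embeddings.",
    "Streamlit renders the web interface.",
]

seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; records the prompt so it can be inspected."""
    seen_prompts.append(prompt)
    return f"(answer generated from {len(prompt)} prompt characters)"


def keyword_retriever(question: str) -> List[str]:
    """Toy retriever: return chunks whose first word appears in the question."""
    return [c for c in CHUNKS if c.split()[0].lower() in question.lower()]


class ToyConversationalChain:
    """Retrieves context, adds the chat history, and remembers each exchange."""

    def __init__(self, retriever: Callable[[str], List[str]], llm: Callable[[str], str]) -> None:
        self.retriever = retriever
        self.llm = llm
        self.chat_history: List[Tuple[str, str]] = []

    def __call__(self, question: str) -> Dict[str, object]:
        sources = self.retriever(question)
        history = "\n".join(f"User: {q}\nBot: {a}" for q, a in self.chat_history)
        context = "\n".join(sources)
        prompt = f"History:\n{history}\n\nContext:\n{context}\n\nQuestion: {question}"
        answer = self.llm(prompt)
        self.chat_history.append((question, answer))
        return {"answer": answer, "source_documents": sources}


chain = ToyConversationalChain(keyword_retriever, fake_llm)
first = chain("What does FAISS do?")
second = chain("And Streamlit?")
print(second["source_documents"])
```

The second call’s prompt contains the first question and answer, which is how the chain keeps follow-up questions in context.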

Creating Conversational Retrieval Chain

Streamlit for generating responses and displaying in the App 

This block prepares a response that includes the generated answer and the source documents and displays it on the web interface. 

Streamlit for Generating Responses and Displaying in the App

Let’s run our App 

QA Chatbot

Here we uploaded a PDF, asked a question, and got our required answer with the source document. See, that is how the magic of LangChain works.  

You can find the code for this app on my GitHub repository LangChain-Custom-PDF-Chatbot.

Build your own conversational AI applications 

Concluding the journey! Mastering LangChain for creating a basic Q&A application has been a success. I trust you have acquired a fundamental comprehension of LangChain’s potential. Now, take the initiative to delve into LangChain further and construct even more captivating applications. Enjoy the coding adventure.


May 22, 2023

