For a hands-on learning experience to develop LLM applications, join our LLM Bootcamp today.
First 4 seats get an early bird discount of 30%! So hurry up!
As the world becomes more interconnected and data-driven, the demand for real-time applications has never been higher. Artificial intelligence (AI) and natural language processing (NLP) technologies are evolving rapidly to manage live data streams.
They power everything from chatbots and predictive analytics to dynamic content creation and personalized recommendations. Moreover, LangChain is a robust framework that simplifies the development of advanced, real-time AI applications.
In this blog, we’ll explore the concept of streaming Langchain, how to set it up, and why it’s essential for building responsive AI systems that react instantly to user input and real-time data.
What is Streaming Langchain?
In the context of Langchain, streaming refers to the continuous and real-time processing of data as it is received, rather than processing data in large batches at scheduled intervals. This approach is essential for applications that require immediate, context-aware responses or real-time insights.
Streaming enables developers to build applications that react dynamically to ever-changing inputs.For example, Langchain can be used to stream live data such as real-time queries from users, sensor data, financial market movements, or even continuous social media posts.
Unlike batch processing systems, which require collecting data over a period of time before generating output, streaming allows applications to process data instantly as it arrives, ensuring up-to-the-minute responses and analyses.
By leveraging Langchain’s streaming functionality, developers can build systems for:
Real-time Chatbots: AI-powered chatbots that can continuously process user input and deliver immediate, contextually relevant responses without delay.
Live Data Analysis: Applications that can analyze and act on continuously flowing data, such as financial market updates, weather reports, or social media feeds, in real-time.
Interactive Experiences: Dynamic, real-time interactions in gaming, virtual assistants, or customer service applications, where the system provides instant feedback and adapts to user queries as they happen.
Thus, it empowers developers to build dynamic, real-time applications capable of instant processing and adaptive interactions. LangChain’s streaming functionality ensures timely, context-aware responses, enabling smarter and more responsive systems, positioning LangChain as an invaluable tool for building innovative AI solutions.
Why does Streaming Matter in Langchain?
Traditional batch processing workflows often introduce delays in response time. In many modern AI applications, where user interaction is central, this delay can hinder performance. Streaming in Langchain allows for instant feedback as it processes data in real-time, ensuring that applications are more interactive and efficient.
Here’s why streaming is particularly important in Langchain:
Lower Latency
Streaming drastically reduces the time it takes to process incoming data. In real-time applications, such as a customer service chatbot or live data monitoring system, reducing latency is crucial for providing quick, on-demand responses. With Langchain, you can process data as it arrives, minimizing delays and ensuring faster interactions.
Continuous Learning
Real-time data streams allow AI models to adapt and evolve as new data becomes available. This ability to continuously learn means that Langchain-powered systems can better respond to emerging trends, shifts in user behavior, or changing market conditions.
This is especially useful for applications like recommendation engines or predictive analytics systems, where the model must adjust to new patterns over time.
Whether it’s engaging with customers, analyzing live events, or responding to user queries, streaming enables more natural, responsive interactions. This capability is particularly valuable in customer service applications, virtual assistants, or interactive digital experiences where users expect instant, contextually aware responses.
Scalability in Dynamic Environments
Langchain’s streaming functionality is well-suited for applications that need to scale and handle large volumes of data in real-time. Whether you’re processing high-frequency data streams or managing a growing number of concurrent user interactions, streaming ensures your system can handle the increased load without compromising performance.
Hence, streaming LangChain ensures scalable performance, handling large data volumes and concurrent interactions efficiently. Let’s dig deeper into setting up the streaming process.
How to Set Up Streaming in Langchain?
Setting up streaming in Langchain is straightforward and designed to seamlessly integrate real-time data processing into your AI models. Langchain provides two main APIs for streaming outputs in real-time, making it easy to handle dynamic, real-time workflows.
These APIs are supported by any component that implements the RunnableInterface, including Large Language Models (LLMs) and LangGraph workflows.
sync stream and async astream: Stream outputs from individual Runnables (like a chatbot model) as they are generated or stream entire workflows created with LangGraph.
async astream_events: This API provides access to custom events and intermediate outputs from LLM applications built with LCEL (Langchain Expression Language).
Here’s a basic example that implements streaming on the LLM response:
Prerequisite:
Install Python: Make sure you have installed Python 3.8 or later
Install Langchain: Ensure that Langchain is installed in your Python environment. You can install it by pip install langchain_community
Install OpenAi: This is optional and required only in case you want to use OpenAi API
Setting up LLM for streaming:
Begin by importing the required libraries
Set up your OpenAI API key (if you wish to use an OpenAI API)
Make sure the model you want to use supports streaming. Import your model with the “streaming” attribute set to “True”.
Create a function to stream the responses chunk by chunk using the LangChain stream()
Finally, use the function by invoking it on a query/prompt for streaming.
Challenges and Considerations in Streaming Langchain
While Langchain’s streaming capabilities offer powerful features, it’s essential to be aware of a few challenges when implementing real-time data processing.
Below are a few challenges and considerations to highlight when streaming LangChain:
Performance
Streaming real-time data can place significant demands on system resources. To ensure smooth operation, it’s critical to optimize your infrastructure, especially when handling high data throughput. Efficient resource management will help you avoid overloading your servers and ensure consistent performance.
Latency
While streaming promises real-time processing, it can introduce latency, particularly with large or complex data streams. To reduce delays, you may need to fine-tune your data pipeline, optimize processing algorithms, and leverage techniques like batching and caching for better responsiveness.
Error Handling
Real-time streaming data can occasionally experience interruptions or incomplete data, which can affect the stability of your application. Implementing robust error-handling mechanisms is vital to ensure that your AI agents can recover gracefully from disruptions, providing a smooth experience even in the face of network or data issues.
Read more about design patterns for AI agents in LLMs
Summing It Up
Streaming with Langchain opens exciting new possibilities for building dynamic, real-time AI applications. Whether you are developing intelligent chatbots, analyzing live data, or creating interactive user experiences, Langchain’s streaming capabilities empower you to build more responsive and adaptive LLM systems.
The ability to process and react to data in real-time gives you a significant edge in creating smarter applications that can evolve as they interact with users or other data sources.
As Langchain continues to evolve, we can expect even more robust tools to handle streaming data efficiently. Future updates may include advanced integrations with various streaming services, enhanced memory management, and better scalability for large-scale, high-performance applications.
If you’re ready to explore the world of real-time data processing and leverage Langchain’s streaming power, now is the time to dive in and start creating next-gen AI solutions.
RESTful APIs (Application Programming Interfaces) are an integral part of modern web services, and yet as the popularity of large language models (LLMs) increases, we have not seen enough APIs being made accessible to users at the scale that LLMs can enable.
Imagine verbally telling your computer, “Get me weather data for Seattle” and have it magically retrieve the correct and latest information from a trusted API. With LangChain, a Requests Toolkit, and a ReAct agent, talking to your API with natural language is easier than ever.
This blog post will walk you through the process of setting up and utilizing the Requests Toolkit with LangChain in Python. The key steps of the process include acquiring OpenAPI specifications for your selected API, selecting tools, and creating and invoking a LangGraph-based ReAct agent.
Pre-Requisites
To get started you’ll need to install LangChain and LangGraph. While installing LangChain you will also end up installing the Requests Toolkit which comes bundled with the community-developed set of LangChain toolkits. Before you can use LangChain to interact with an API, you need to obtain the OpenAPI specification for your API.
This spec provides details about the available endpoints, request methods, and data formats. Most modern APIs use OpenAPI (formerly Swagger) specifications, which are often available in JSON or YAML format. For this example, we will just be using the JSON Placeholder API.
It is recommended you familiarize yourself a little with the API yourself by sending a few sample queries to the API using Postman or otherwise.
To get started we’ll first import the relevant LangChain classes.
Then you can select the HTTP tools from the requests Toolkit.These tools include RequestsGetTool, RequestsPostTool, RequestsPatchTool, and so on. One for each of the 5 HTTP requests that you can make to a RESTful API.
Since some of these requests can lead to dangerous irreversible changes, like the deletion of critical data, we have had to actively pass the allow_dangerous_requestsparameter to enable these. The requests wrapper parameters include any authentication headers or otherwise that the API may require.
You can find more details about necessary headers in your API documentation. For the JSON Placeholder API, we’re good to go without any authentication headers.
Just to stay safe we’ll also only choose to use the POST and GET tools, which we can select by simply choosing the first 2 elements of the tools list.
Import API Specifications
Next up, we’ll get the file for our API specifications and import them into the JsonSpec format from the Langchain community.
While the JSON Placeholder API spec is small, certain API specs can be massive, and you may benefit from adjusting the max_value_length in your code accordingly. Find the JSON Placeholder spec here.
Setup ReAct Agent
A ReAct agent in LangChain is a specialized tool that combines reasoning and action. It uses a combination of a large language model’s ability to “reason” through natural language with the capability to execute actions based on that reasoning. And when it gets the results of its actions it can react to them (pun intended) and choose the next appropriate action.
We’ll get started with a simple ReAct agent pre-provided within LangGraph.
The create_react_agent prebuilt function generates a LangGraph agent which prompted by the user query starts interactions with the AI agent and keeps on looping between tools as long as every AI agent call generates a tool request (i.e. requires a tool to be used).
Typically, the AI agent will end the process with the responses from tools (API requests in our case) containing the response to the user’s query.
Invoking your ReAct Agent
Once your ReAct agent is set up, you can invoke it to perform API requests. This is a simple step.
eventsis a Python generator object which you can invoke step by step in a for-loop, as it executes the next step in its process, every time the loop completes one iteration.
You can also receive the response more simply to be passed onto another API or interface by storing the final result from the LLM call into a single variable this way:
Conclusion
Using LangChain’s Requests toolkit to execute API requests with natural language opens up new possibilities for interacting with data. By understanding your API spec, carefully selecting tools, and leveraging a ReAct agent, you can streamline how you interact with APIs, making data access and manipulation more intuitive and efficient.
I have managed to test this functionality with a variety of other APIs and approaches. While other approaches like OpenAPI toolkit, Gorilla, RestGPT, and API chains exist, the Requests Toolkit leveraging a LangGraph-based ReAct agent seems to be the most effective, and reliable way to integrate natural language processing with API interactions.
In my usage, it has worked for various APIs including but not limited to APIs from Slack, ClinicalTrials.gov, TMDB, and OpenAI. Feel free to initiate discussions below and share your experiences with other APIs.
Applications powered by large language models (LLMs) are revolutionizing the way businesses operate, from automating customer service to enhancing data analysis. In today’s fast-paced technological landscape, staying ahead means leveraging these powerful tools to their full potential.
For instance, a global e-commerce company striving to provide exceptional customer support around the clock can implement LangChain to develop an intelligent chatbot. It will ensure seamless integration of the business’s internal knowledge base and external data sources.
As a result, the enterprise can build a chatbot capable of understanding and responding to customer inquiries with context-aware, accurate information, significantly reducing response times and enhancing customer satisfaction.
LangChain stands out by simplifying the development and deployment of LLM-powered applications, making it easier for businesses to integrate advanced AI capabilities into their processes.
In this blog, we will explore what is LangChain, its key features, benefits, and practical use cases. We will also delve into related tools like LlamaIndex, LangGraph, and LangSmith to provide a comprehensive understanding of this powerful framework.
What is LangChain?
LangChain is an innovative open-source framework crafted for developing powerful applications using LLMs. These advanced AI systems, trained on massive datasets, can produce human-like text with remarkable accuracy.
It makes it easier to create LLM-driven applications by providing a comprehensive toolkit that simplifies the integration and enhances the functionality of these sophisticated models.
LangChain was launched by Harrison Chase and Ankush Gola in October 2022. It has gained popularity among developers and AI enthusiasts for its robust features and ease of use.
Its initial goal was to link LLMs with external data sources, enabling the development of context-aware, reasoning applications. Over time, LangChain has advanced into a useful toolkit for building LLM-powered applications.
By integrating LLMs with real-time data and external knowledge bases, LangChain empowers businesses to create more sophisticated and responsive AI applications, driving innovation and improving service delivery across various sectors.
What are the Features of LangChain?
LangChain is revolutionizing the development of AI applications with its comprehensive suite of features. From modular components that simplify complex tasks to advanced prompt engineering and seamless integration with external data sources, LangChain offers everything developers need to build powerful, intelligent applications.
1. Modular Components
LangChain stands out with its modular design, making it easier for developers to build applications.
Imagine having a box of LEGO bricks, each representing a different function or tool. With LangChain, these bricks are modular components, allowing you to snap them together to create sophisticated applications without needing to write everything from scratch.
For example, if you’re building a chatbot, you can combine modules for natural language processing (NLP), data retrieval, and user interaction. This modularity ensures that you can easily add, remove, or swap out components as your application’s needs change.
Ease of Experimentation
This modular design makes the development an enjoyable and flexible process. The LangChain framework is designed to facilitate easy experimentation and prototyping.
For instance, if you’re uncertain which language model will give you the best results, LangChain allows you to quickly swap between different models without rewriting your entire codebase. This ease of experimentation is useful in AI development where rapid iteration and testing are crucial.
Thus, by breaking down complex tasks into smaller, manageable components and offering an environment conducive to experimentation, LangChain empowers developers to create innovative, high-quality applications efficiently.
2. Integration with External Data Sources
LangChain excels in integrating with external data sources, creating context-aware applications that are both intelligent and responsive. Let’s dive into how this works and why it’s beneficial.
Data Access
The framework is designed to support extensive data access from external sources. Whether you’re dealing with file storage services like Dropbox, Google Drive, and Microsoft OneDrive, or fetching information from web content such as YouTube and PubMed, LangChain has you covered.
It also connects effortlessly with collaboration tools like Airtable, Trello, Figma, and Notion, as well as databases including Pandas, MongoDB, and Microsoft databases. All you need to do is configure the necessary connections. LangChain takes care of data retrieval and providing accurate responses.
Rich Context-Aware Responses
Data access is not the only focal point, it is also about enhancing the response quality using the context of information from external sources. When your application can tap into a wealth of external data, it can provide answers that are not only accurate but also contextually relevant.
By enabling rich and context-aware responses, LangChain ensures that applications are informative, highly relevant, and useful to their users. This capability transforms simple data retrieval tasks into powerful, intelligent interactions, making LangChain an invaluable tool for developers across various industries.
For instance, a healthcare application could integrate patient data from a secure database with the latest medical research. When a doctor inquires about treatment options, the application provides suggestions based on the patient’s history and the most recent studies, ensuring that the doctor has the best possible information.
3. Prompt Engineering
Prompt engineering is one of the coolest aspects of working with LangChain. It’s all about crafting the right instructions to get the best possible responses from LLMs. Let’s unpack this with two key elements: advanced prompt engineering and the use of prompt templates.
Advanced Prompt Engineering
LangChain takes prompt engineering to the next level by providing robust support for creating and refining prompts. It helps you fine-tune the questions or commands you give to your LLMs to get the most accurate and relevant responses, ensuring your prompts are clear, concise, and tailored to the specific task at hand.
For example, if you’re developing a customer service chatbot, you can create prompts that guide the LLM to provide helpful and empathetic responses. You might start with a simple prompt like, “How can I assist you today?” and then refine it to be more specific based on the types of queries your customers commonly have.
LangChain makes it easy to continuously tweak and improve these prompts until they are just right.
Bust some major myths about prompt engineering here
Prompt Templates
Prompt templates are pre-built structures that you can use to consistently format your prompts. Instead of crafting each prompt from scratch, you can use a template that includes all the necessary elements and just fill in the blanks.
For instance, if you frequently need your LLM to generate fun facts about different animals, you could create a prompt template like, “Tell me an {adjective} fact about {animal}.”
When you want to use it, you simply plug in the specifics: “Tell me an interesting fact about zebras.” This ensures that your prompts are always well-structured and ready to go, without the hassle of constant rewriting.
These templates are especially handy because they can be shared and reused across different projects, making your workflow much more efficient. LangChain’s prompt templates also integrate smoothly with other components, allowing you to build complex applications with ease.
Whether you’re a seasoned developer or just starting out, these tools make it easier to harness the full power of LLMs.
4. Retrieval Augmented Generation (RAG)
RAG combines the power of retrieving relevant information from external sources with the generative capabilities of large language models (LLMs). Let’s explore why this is so important and how LangChain makes it all possible.
RAG Workflows
RAG is a technique that helps LLMs fetch relevant information from external databases or documents to ground their responses in reality. This reduces the chances of “hallucinations” – those moments when the AI just makes things up – and improves the overall accuracy of its responses.
Imagine you’re using an AI assistant to get the latest financial market analysis. Without RAG, the AI might rely solely on outdated training data, potentially giving you incorrect or irrelevant information. But with RAG, the AI can pull in the most recent market reports and data, ensuring that its analysis is accurate and up-to-date.
integrating various document sources, databases, and APIs to retrieve the latest information
uses advanced search algorithms to query the external data sources
processing of retrieved information and its incorporation into the LLM’s generative process
Hence, when you ask the AI a question, it doesn’t just rely on what it already “knows” but also brings in fresh, relevant data to inform its response. It transforms simple AI responses into well-informed, trustworthy interactions, enhancing the overall user experience.
5. Memory Capabilities
LangChain excels at handling memory, allowing AI to remember previous conversations. This is crucial for maintaining context and ensuring relevant and coherent responses over multiple interactions. The conversation history is retained by recalling recent exchanges or summarizing past interactions.
It makes the interactions with AI more natural and engaging. This makes LangChain particularly useful for customer support chatbots, enhancing user satisfaction by maintaining context over multiple interactions.
6. Deployment and Monitoring
With the integration of LangSmith and LangServe, the LangChain framework has the potential to assist you in the deployment and monitoring of AI applications.
LangSmith is essential for debugging, testing, and monitoring LangChain applications through a unified platform for inspecting chains, tracking performance, and continuously optimizing applications. It allows you to catch issues early and ensure smooth operation.
Meanwhile, LangServe simplifies deployment by turning any LangChain application into a REST API, facilitating integration with other systems and platforms and ensuring accessibility and scalability.
Collectively, these features make LangChain a useful tool to build and develop AI applications using LLMs.
Benefits of Using LangChain
LangChain offers a multitude of benefits that make it an invaluable tool for developers working with large language models (LLMs). Let’s dive into some of these key advantages and understand how they can transform your AI projects.
Enhanced Language Understanding and Generation
LangChain enhances language understanding and generation by integrating various models, allowing developers to leverage the strengths of each. It leads to improved language processing, resulting in applications that can comprehend and generate human-like language in a natural and meaningful manner.
Customization and Flexibility
LangChain’s modular structure allows developers to mix and match building blocks to create tailored solutions for a wide range of applications.
Whether developing a simple FAQ bot or a complex system integrating multiple data sources, LangChain’s components can be easily added, removed, or replaced, ensuring the application can evolve over time without requiring a complete overhaul, thus saving time and resources.
Streamlined Development Process
It streamlines the development process by simplifying the chaining of various components, offering pre-built modules for common tasks like data retrieval, natural language processing, and user interaction.
This reduces the complexity of building AI applications from scratch, allowing developers to focus on higher-level design and logic. This chaining construct not only accelerates development but also makes the codebase more manageable and less prone to errors.
Improved Efficiency and Accuracy
The framework enhances efficiency and accuracy in language tasks by combining multiple components, such as using a retrieval module to fetch relevant data and a language model to generate responses based on that data. Moreover, the ability to fine-tune each component further boosts overall performance, making LangChain-powered applications highly efficient and reliable.
Versatility Across Sectors
LangChain is a versatile framework that can be used across different fields like content creation, customer service, and data analytics. It can generate high-quality content and social media posts, power intelligent chatbots, and assist in extracting insights from large datasets to predict trends. Thus, it can meet diverse business needs and drive innovation across industries.
These benefits make LangChain a powerful tool for developing advanced AI applications. Whether you are a developer, a product manager, or a business leader, leveraging LangChain can significantly elevate your AI projects and help you achieve your goals more effectively.
Supporting Frameworks in the LangChain Ecosystem
Different frameworks support the LangChain system to harness the full potential of the toolkit. Among these are LangGraph, LangSmith, and LangServe, each one offering unique functionalities. Here’s a quick overview of their place in the LangChain ecosystem.
LangServe: Deploys runnables and chains as REST APIs, enabling scalable, real-time integrations for LangChain-based applications.
LangGraph: Extends LangChain by enabling the creation of complex, multi-agent workflows, allowing for more sophisticated and dynamic agent interactions.
LangSmith: Complements LangChain by offering tools for debugging, testing, evaluating, and monitoring, ensuring that LLM applications are robust and perform reliably in production.
Now let’s explore each tool and its characteristics.
LangServe
It is a component of the LangChain framework that is designed to convert LangChain runnables and chains into REST APIs. This makes applications easy to deploy and access for real-time interactions and integrations.
By handling the deployment aspect, LangServe allows developers to focus on optimizing their applications without worrying about the complexities of making them production-ready. It also assists in deploying applications as accessible APIs.
This integration capability is particularly beneficial for creating robust, real-time AI solutions that can be easily incorporated into existing infrastructures, enhancing the overall utility and reach of LangChain-based applications.
LangGraph
It is a framework that works with the LangChain ecosystem to enable workflows to revisit previous steps and adapt based on new information, assisting in the design of complex multi-agent systems. By allowing developers to use cyclical graphs, it brings a level of sophistication and adaptability that’s hard to achieve with traditional methods.
LangGraph offers built-in state persistence and real-time streaming, allowing developers to capture and inspect the state of an agent at any specific point, facilitating debugging and ensuring traceability. It enables human intervention in agent workflows for the approval, modification, or rerouting of actions planned by agents.
LangGraph’s advanced features make it ideal for building sophisticated AI workflows where multiple agents need to collaborate dynamically, like in customer service bots, research assistants, and content creation pipelines.
LangSmith
It is a developer platform that integrates with LangChain to create a unified development environment, simplifying the management and optimization of your LLM applications. It offers everything you need to debug, test, evaluate, and monitor your AI applications, ensuring they run smoothly in production.
LangSmith is particularly beneficial for teams looking to enhance the accuracy, performance, and reliability of their AI applications by providing a structured approach to development and deployment.
For a quick review, below is a table summarizing the unique features of each component and other characteristics.
Addressing the LlamaIndex vs LangChain Debate
LlamaIndex and LangChain are two important frameworks for deploying AI applications. Let’s take a comparative lens to compare the two tools across key aspects to understand their unique strengths and applications.
Focused Approach vs. Flexibility
LlamaIndex is designed for search and retrieval applications. Its simplified interface allows straightforward interactions with LLMs for efficient document retrieval. LlamaIndex excels in handling large datasets with high accuracy and speed, making it ideal for tasks like semantic search and summarization.
LangChain, on the other hand, offers a comprehensive and modular framework for building diverse LLM-powered applications. Its flexible and extensible structure supports a variety of data sources and services. LangChain includes tools like Model I/O, retrieval systems, chains, and memory systems for granular control over LLM integration. This makes LangChain particularly suitable for constructing more complex, context-aware applications.
Use Cases and Integrations
LlamaIndex is suitable for use cases that require efficient data indexing and retrieval. Its engines connect multiple data sources with LLMs, enhancing data interaction and accessibility. It also supports data agents that manage both “read” and “write” operations, automate data management tasks, and integrate with various external service APIs.
Whereas, LangChain excels in extensive customization and multimodal integration. It supports a wide range of data connectors for effortless data ingestion and offers tools for building sophisticated applications like context-aware query engines. Its flexibility supports the creation of intricate workflows and optimized performance for specific needs, making it a versatile choice for various LLM applications.
Performance and Optimization
LlamaIndex is optimized for high throughput and fast processing, ensuring quick and accurate search results. Its design focuses on maximizing efficiency in data indexing and retrieval, making it a robust choice for applications with significant data processing demands.
Meanwhile, with features like chains, agents, and RAG, LangChain allows developers to fine-tune components and optimize performance for specific tasks. This ensures that applications built with LangChain can efficiently handle complex queries and provide customized results.
Hence, the choice between these two frameworks is dependent on your specific project needs. While LlamaIndex is the go-to framework for applications that require efficient data indexing and retrieval, LangChain stands out for its flexibility and ability to build complex, context-aware applications with extensive customization options.
Both frameworks offer unique strengths, and understanding these can help developers align their needs with the right tool, leading to the construction of more efficient, powerful, and accurate LLM-powered applications.
Let’s look at some examples and use cases of LangChain in today’s digital world.
Customer Service
Advanced chatbots and virtual assistants can manage everything from basic FAQs to complex problem-solving. By integrating LangChain with LLMs like OpenAI’s GPT-4, businesses can develop chatbots that maintain context, offering personalized and accurate responses.
This improves customer experience and reduces the workload on human representatives. With AI handling routine inquiries, human agents can focus on complex issues that require a personal touch, enhancing efficiency and satisfaction in customer service operations.
Healthcare
It automates repetitive administrative tasks like scheduling appointments, managing medical records, and processing insurance claims. This automation streamlines operations, ensuring healthcare providers deliver timely and accurate services to patients.
Several companies have successfully implemented LangChain to enhance their operations and achieve remarkable results. Some notable examples include:
Retool
The company leveraged LangSmith to improve the accuracy and performance of its fine-tuned models. As a result, Retool delivered a better product and introduced new AI features to their users much faster than traditional methods would have allowed. It highlights that LangChain’s suite of tools can speed up the development process while ensuring high-quality outcomes.
Elastic AI Assistant
They used both LangChain and LangSmith to accelerate development and enhance the quality of their AI-powered products. The integration allowed Elastic AI Assistant to manage complex workflows and deliver a superior product experience to their customers highlighting the impact of LangChain in real-world applications to streamline operations and optimize performance.
Hence, by providing a structured approach to development and deployment, LangChain ensures that companies can build, run, and manage sophisticated AI applications, leading to improved operational efficiency and customer satisfaction.
Frequently Asked Questions (FAQs)
Q1: How does it help in developing AI applications?
LangChain provides a set of tools and components that help integrate LLMs with other data sources and computation tools, making it easier to build sophisticated AI applications like chatbots, content generators, and data retrieval systems.
Q2: Can LangChain be used with different LLMs and tools?
Absolutely! LangChain is designed to be model-agnostic as it can work with various LLMs such as OpenAI’s GPT models, Google’s Flan-T5, and others. It also integrates with a wide range of tools and services, including vector databases, APIs, and external data sources.
Q3: How can I get started with LangChain?
Getting started with LangChain is easy. You can install it via pip or conda and access comprehensive documentation, tutorials, and examples on its official GitHub page. Whether you’re a beginner or an advanced developer, LangChain provides all the resources you need to build your first LLM-powered application.
Q4: Where can I find more resources and community support for LangChain?
You can find more resources, including detailed documentation, how-to guides, and community support, on the LangChain GitHub page and official website. Joining the LangChain Discord community is also a great way to connect with other developers, share ideas, and get help with your projects.
Feel free to explore LangChain and start building your own LLM-powered applications today! The possibilities are endless, and the community is here to support you every step of the way.
To start your learning journey, join our LLM bootcamp today for a deeper dive into LangChain and LLM applications!
In the rapidly evolving world of artificial intelligence and large language models, developers are constantly seeking ways to create more flexible, powerful, and intuitive AI agents.
While LangChain has been a game-changer in this space, allowing for the creation of complex chains and agents, there’s been a growing need for even more sophisticated control over agent runtimes.
Enter LangGraph, a cutting-edge module built on top of LangChain that’s set to revolutionize how we design and implement AI workflows.
In this blog, we present a detailed LangGraph tutorial on building a chatbot, revolutionizing AI agent workflows.
Understanding LangGraph
LangGraph is an extension of the LangChain ecosystem that introduces a novel approach to creating AI agent runtimes. At its core, LangGraph allows developers to represent complex workflows as cyclical graphs, providing a more intuitive and flexible way to design agent behaviors.
The primary motivation behind LangGraph is to address the limitations of traditional directed acyclic graphs (DAGs) in representing AI workflows. While DAGs are excellent for linear processes, they fall short when it comes to implementing the kind of iterative, decision-based flows that advanced AI agents often require.
LangGraph solves this by enabling the creation of workflows with cycles, where an AI can revisit previous steps, make decisions, and adapt its behavior based on intermediate results. This is particularly useful in scenarios where an agent might need to refine its approach or gather additional information before proceeding.
Key Components of LangGraph
To effectively use LangGraph, it’s crucial to understand its fundamental components:
Nodes
Nodes in LangGraph represent individual functions or tools that your AI agent can use. These can be anything from API calls to complex reasoning tasks performed by language models. Each node is a discrete step in your workflow that processes input and produces output.
Edges
Edges connect the nodes in your graph, defining the flow of information and control. LangGraph supports two types of edges:
Simple Edges: These are straightforward connections between nodes, indicating that the output of one node should be passed as input to the next.
Conditional Edges: These are more complex connections that allow for dynamic routing based on the output of a node. This is where LangGraph truly shines, enabling adaptive workflows.
State is the information that can be passed between nodes in a whole graph. If you want to keep track of specific information during the workflow then you can use state.
There are 2 types of graphs which you can make in LangGraph:
Basic Graph: The basic graph will only pass the output of the first node to the next node because it can’t contain states.
Stateful Graph: This graph can contain a state which will be passed between nodes and you can access this state at any node.
LangGraph Tutorial Using a Simple Example: Build a Basic Chatbot
We’ll create a simple chatbot using LangGraph. This chatbot will respond directly to user messages. Though simple, it will illustrate the core concepts of building with LangGraph. By the end of this section, you will have a built rudimentary chatbot.
Start by creating a StateGraph. A StateGraph object defines the structure of our chatbot as a state machine. We’ll add nodes to represent the LLM and functions our chatbot can call and edges to specify how the bot should transition between these functions.
Every node we define will receive the current State as input and return a value that updates that state.
messages will be appended to the current list, rather than directly overwritten. This is communicated via the prebuilt add_messages function in the Annotated syntax.
Next, add a chatbot node. Nodes represent units of work. They are typically regular Python functions.
Notice how the chatbot node function takes the current State as input and returns a dictionary containing an updated messages list under the key “messages”. This is the basic pattern for all LangGraph node functions.
The add_messages function in our State will append the LLM’s response messages to whatever messages are already in the state.
Next, add an entry point. This tells our graph where to start its work each time we run it.
Similarly, set a finish point. This instructs the graph “Any time this node is run, you can exit.”
Finally, we’ll want to be able to run our graph. To do so, call “compile()” on the graph builder. This creates a “CompiledGraph” we can use invoke on our state.
You can visualize the graph using the get_graph method and one of the “draw” methods, like draw_ascii or draw_png. The draw methods each require additional dependencies.
Now let’s run the chatbot!
Tip: You can exit the chat loop at any time by typing “quit”, “exit”, or “q”.
Advanced LangGraph Techniques
LangGraph’s true potential is realized when dealing with more complex scenarios. Here are some advanced techniques:
Multi-step reasoning: Create graphs where the AI can make multiple decisions, backtrack, or explore different paths based on intermediate results.
Tool integration: Seamlessly incorporate various external tools and APIs into your workflow, allowing the AI to gather and process diverse information.
Human-in-the-loop workflows: Design graphs that can pause execution and wait for human input at critical decision points.
Dynamic graph modification: Alter the structure of the graph at runtime based on the AI’s decisions or external factors.
LangGraph’s flexibility makes it suitable for a wide range of applications:
Customer Service Bots: Create intelligent chatbots that can handle complex queries, access multiple knowledge bases, and escalate to human operators when necessary.
Research Assistants: Develop AI agents that can perform literature reviews, synthesize information from multiple sources, and generate comprehensive reports.
Automated Troubleshooting: Build expert systems that can diagnose and solve technical problems by following complex decision trees and accessing various diagnostic tools.
Content Creation Pipelines: Design workflows for AI-assisted content creation, including research, writing, editing, and publishing steps.
LangGraph represents a significant leap forward in the design and implementation of AI agent workflows. Enabling cyclical, state-aware graphs, opens up new possibilities for creating more intelligent, adaptive, and powerful AI systems.
As the field of AI continues to evolve, tools like LangGraph will play a crucial role in shaping the next generation of AI applications.
Whether you’re building simple chatbots or complex AI-powered systems, LangGraph provides the flexibility and power to bring your ideas to life. As we continue to explore the potential of this tool, we can expect to see even more innovative and sophisticated AI applications emerging in the near future.
Large language models (LLMs) have taken the world by storm with their ability to understand and generate human-like text. These AI marvels can analyze massive amounts of data, answer your questions in comprehensive detail, and even create different creative text formats, like poems, code, scripts, musical pieces, emails, letters, etc.
It’s like having a conversation with a computer that feels almost like talking to a real person!
However, LLMs on their own exist within a self-contained world of text. They can’t directly interact with external systems or perform actions in the real world. This is where LLM agents come in and play a transformative role.
LLM agents act as powerful intermediaries, bridging the gap between the LLM’s internal world and the vast external world of data and applications. They essentially empower LLMs to become more versatile and take action on their behalf. Think of an LLM agent as a personal assistant for your LLM, fetching information and completing tasks based on your instructions.
For instance, you might ask an LLM, “What are the next available flights to New York from Toronto?” The LLM can access and process information but cannot directly search the web – it is reliant on its training data.
An LLM agent can step in, retrieve the data from a website, and provide the available list of flights to the LLM. The LLM can then present you with the answer in a clear and concise way.
By combining LLMs with agents, we unlock a new level of capability and versatility. In the following sections, we’ll dive deeper into the benefits of using LLM agents and explore how they are revolutionizing various applications.
Benefits and Use-cases of LLM Agents
Let’s explore in detail the transformative benefits of LLM agents and how they empower LLMs to become even more powerful.
Enhanced Functionality: Beyond Text Processing
LLMs excel at understanding and manipulating text, but they lack the ability to directly access and interact with external systems. An LLM agent bridges this gap by allowing the LLM to leverage external tools and data sources.
Imagine you ask an LLM, “What is the weather forecast for Seattle this weekend?” The LLM can understand the question but cannot directly access weather data. An LLM agent can step in, retrieve the forecast from a weather API, and provide the LLM with the information it needs to respond accurately.
This empowers LLMs to perform tasks that were previously impossible, like:
Accessing and processing data from databases and APIs
Executing code
Interacting with web services
Increased Versatility: A Wider Range of Applications
By unlocking the ability to interact with the external world, LLM agents significantly expand the range of applications for LLMs. Here are just a few examples:
Data Analysis and Processing: LLMs can be used to analyze data from various sources, such as financial reports, social media posts, and scientific papers. LLM agents can help them extract key insights, identify trends, and answer complex questions.
Content Generation and Automation: LLMs can be empowered to create different kinds of content, like articles, social media posts, or marketing copy. LLM agents can assist them by searching for relevant information, gathering data, and ensuring factual accuracy.
Custom Tools and Applications: Developers can leverage LLM agents to build custom tools that combine the power of LLMs with external functionalities. Imagine a tool that allows an LLM to write and execute Python code, search for information online, and generate creative text formats based on user input.
Improved Performance: Context and Information for Better Answers
LLM agents don’t just expand what LLMs can do, they also improve how they do it. By providing LLMs with access to relevant context and information, LLM agents can significantly enhance the quality of their responses:
More Accurate Responses: When an LLM agent retrieves data from external sources, the LLM can generate more accurate and informative answers to user queries.
Enhanced Reasoning: LLM agents can facilitate a back-and-forth exchange between the LLM and external systems, allowing the LLM to reason through problems and arrive at well-supported conclusions.
Reduced Bias: By incorporating information from diverse sources, LLM agents can mitigate potential biases present in the LLM’s training data, leading to fairer and more objective responses.
Enhanced Efficiency: Automating Tasks and Saving Time
LLM agents can automate repetitive tasks that would otherwise require human intervention. This frees up human experts to focus on more complex problems and strategic initiatives. Here are some examples:
Data Extraction and Summarization: LLM agents can automatically extract relevant data from documents and reports, saving users time and effort.
Research and Information Gathering: LLM agents can be used to search for information online, compile relevant data points, and present them to the LLM for analysis.
Content Creation Workflows: LLM agents can streamline content creation workflows by automating tasks like data gathering, formatting, and initial drafts.
In conclusion, LLM agents are a game-changer, transforming LLMs from powerful text processors to versatile tools that can interact with the real world. By unlocking enhanced functionality, increased versatility, improved performance, and enhanced efficiency, LLM agents pave the way for a new wave of innovative applications across various domains.
In the next section, we’ll explore how LangChain, a framework for building LLM applications, can be used to implement LLM agents and unlock their full potential.
Implementing LLM Agents with LangChain
Now, let’s explore how LangChain, a framework specifically designed for building LLM applications, empowers us to implement LLM agents.
What is LangChain?
LangChain is a powerful toolkit that simplifies the process of building and deploying LLM applications. It provides a structured environment where you can connect your LLM with various tools and functionalities, enabling it to perform actions beyond basic text processing. Think of LangChain as a Lego set for building intelligent applications powered by LLMs.
Implementing LLM Agents with LangChain: A Step-by-Step Guide
Let’s break down the process of implementing LLM agents with LangChain into manageable steps:
Setting Up the Base LLM
The foundation of your LLM agent is the LLM itself. You can either choose an open-source model like Llama2 or Mixtral, or a proprietary model like OpenAI’s GPT or Cohere.
Defining the Tools
Identify the external functionalities your LLM agent will need. These tools could be:
APIs: Services that provide programmatic access to data or functionalities (e.g., weather API, stock market API)
Databases: Collections of structured data your LLM can access and query (e.g., customer database, product database)
Web Search Tools: Tools that allow your LLM to search the web for relevant information (e.g., duckduckgo, serper API)
Coding Tools: Tools that allow your LLM to write and execute actual code (e.g., Python REPL Tool)
You can check out LangChain’s documentation to find a comprehensive list of tools and toolkits provided by LangChain that you can easily integrate into your agent, or you can easily define your own custom tool such as a calculator tool.
Creating an Agent
This is the brain of your LLM agent, responsible for communication and coordination. The agent understands the user’s needs, selects the appropriate tool based on the task, and interprets the retrieved information for response generation.
Defining the Interaction Flow
Establish a clear sequence for how the LLM, agent, and tools interact. This flow typically involves:
Receiving a user query
The agent analyzes the query and identifies the necessary tools
The agent passes in the relevant parameters to the chosen tool(s)
The LLM processes the retrieved information from the tools
The agent formulates a response based on the retrieved information
Integration with LangChain
LangChain provides the platform for connecting all the components. You’ll integrate your LLMand chosen tools within LangChain, creating an agent that can interact with the external environment.
Testing and Refining
Once everything is set up, it’s time to test your LLM agent! Put it through various scenarios to ensure it functions as expected. Based on the results, refine the agent’s logic and interactions to improve its accuracy and performance.
By following these steps and leveraging LangChain’s capabilities, you can build versatile LLM agents that unlock the true potential of LLMs.
LangChain Implementation of an LLM Agent with tools
In the next section, we’ll delve into a practical example, walking you through a Python Notebook that implements a LangChain-based LLM agent with retrieval (RAG) and web search tools. OpenAI’s GPT-4 has been used as the LLM of choice here. This will provide you with a hands-on understanding of the concepts discussed here.
The agent has been equipped with two tools:
A retrieval tool that can be used to fetch information from a vector store of Data Science Dojo blogs on the topic of RAG. LangChain’s PyPDFLoader is used to load and chunk the PDF blog text, OpenAI embeddings are used to embed the chunks of data, and Weaviate client is used for indexing and storage of data.
A web search tool that can be used to query the web and bring up-to-date and relevant search results based on the user’s question. Google Serper API is used here as the search wrapper – you can also use duckduckgo search or Tavily API.
Below is a diagram depicting the agent flow:
Let’s now start going through the code step-by-step.
Installing Libraries
Let’s start by downloading all the necessary libraries that we’ll need. This includes libraries for handling language models, API clients, and document processing.
Importing and Setting API Keys
Now, we’ll ensure our environment has access to the necessary API keys for OpenAI and Serper by importing them and setting them as environment variables.
Documents Preprocessing: Mounting Google Drive and Loading Documents
Let’s connect to Google Drive and load the relevant documents. I‘ve stored PDFs of various Data Science Dojo blogs related to RAG, which we’ll use for our tool.Following are the links to the blogs I have used:
Using the PyPDFLoader from Langchain, we’ll extract text from each PDF by breaking them down into individual pages. This helps in processing and indexing them separately.
Embedding and Indexing through Weaviate: Embedding Text Chunks
Now we’ll use Weaviate client to turn our text chunks into embeddings using OpenAI’s embedding model. This prepares our text for efficient querying and retrieval.
Setting Up the Retriever
With our documents embedded, let’s set up the retriever which will be crucial for fetching relevant information based on user queries.
Defining Tools: Retrieval and Search Tools Setup
Next, we define two key tools: one for retrieving information from our indexed blogs, and another for performing web searches for queries that extend beyond our local data.
Adding Tools to the List
We then add both tools to our tool list, ensuring our agent can access these during its operations.
Setting up the Agent: Creating the Prompt Template
Let’s create a prompt template that guides our agent on how to handle different types of queries using the tools we’ve set up.
Initializing the LLM with GPT-4
For the best performance, I used GPT-4 as the LLM of choice as GPT-3.5 seemed to struggle with routing to tools correctly and would go back and forth between the two tools needlessly.
Creating and Configuring the Agent
With the tools and prompt template ready, let’s construct the agent. This agent will use our predefined LLM and tools to handle user queries.
Invoking the Agent: Agent Response to a RAG-related Query
Let’s put our agent to the test by asking a question about RAG and observing how it uses the tools to generate an answer.
Agent Response to an Unrelated Query
Now, let’s see how our agent handles a question that’s not about RAG. This will demonstrate the utility of our web search tool.
That’s all for the implementation of an LLM Agent through LangChain. You can find the full code here.
This is, of course, a very basic use case but it is a starting point. There is a myriad of stuff you can do using agents and LangChain has several cookbooks that you can check out. The best way to get acquainted with any technology is to actually get your hands dirty and use the technology in some way.
I’d encourage you to look up further tutorials and notebooks using agents and try building something yourself. Why not try delegating a task to an agent that you yourself find irksome – perhaps an agent can take off its burden from your shoulders!
LLM agents: A building block for LLM applications
To sum it up, LLM agents are a crucial element for building LLM applications. As you navigate through the process, make sure to consider the role and assistance they have to offer.
Large language models (LLMs), such as OpenAI’s GPT-4, are swiftly metamorphosing from mere text generators into autonomous, goal-oriented entities displaying intricate reasoning abilities. This crucial shift carries the potential to revolutionize the manner in which humans connect with AI, ushering us into a new frontier.
This blog will break down the working of these agents, illustrating the impact they impart on what is known as the ‘Lang Chain’.
Working of the agents
Our exploration into the realm of LLM agents begins with understanding the key elements of their structure, namely the LLM core, the Prompt Recipe, the Interface and Interaction, and Memory. The LLM core forms the fundamental scaffold of an LLM agent. It is a neural network trained on a large dataset, serving as the primary source of the agent’s abilities in text comprehension and generation.
The functionality of these agents heavily relies on prompt engineering. Prompt recipes are carefully crafted sets of instructions that shape the agent’s behaviors, knowledge, goals, and persona and embed them in prompts.
The agent’s interaction with the outer world is dictated by its user interface, which could vary from command-line, graphical, to conversational interfaces. In the case of fully autonomous agents, prompts are programmatically received from other systems or agents.
Another crucial aspect of their structure is the inclusion of memory, which can be categorized into short-term and long-term. While the former helps the agent be aware of recent actions and conversation histories, the latter works in conjunction with an external database to recall information from the past.
Creating robust and capable LLM agents demands integrating the core LLM with additional components for knowledge, memory, interfaces, and tools.
The LLM forms the foundation, while three key elements are required to allow these agents to understand instructions, demonstrate essential skills, and collaborate with humans: the underlying LLM architecture itself, effective prompt engineering, and the agent’s interface.
Tools
Tools are functions that an agent can invoke. There are two important design considerations around tools:
Giving the agent access to the right tools
Describing the tools in a way that is most helpful to the agent
Without thinking through both, you won’t be able to build a working agent. If you don’t give the agent access to a correct set of tools, it will never be able to accomplish the objectives you give it. If you don’t describe the tools well, the agent won’t know how to use them properly. Some of the vital tools a working agent needs are:
1. SerpAPI: This page covers how to use the SerpAPI search APIs within Lang Chain. It is broken into two parts: installation and setup, and then references to the specific SerpAPI wrapper. Here are the details for its installation and setup:
Install requirements with pip install google-search-results
Get a SerpAPI API key and either set it as an environment variable (SERPAPI_API_KEY)
You can also easily load this wrapper as a tool (to use with an agent). You can do this with:
2. Math-tool: The llm-math tool wraps an LLM to do math operations. It can be loaded into the agent tools like:
Python-REPL tool: Allows agents to execute Python code. To load this tool, you can use:
The action of python REPL allows agent to execute the input code and provide the response.
The impact of agents:
A noteworthy advantage of LLM agents is their potential to exhibit self-initiated behaviors ranging from purely reactive to highly proactive. This can be harnessed to create versatile AI partners capable of comprehending natural language prompts and collaborating with human oversight.
LLM agents leverage LLMs innate linguistic abilities to understand instructions, context, and goals, operate autonomously and semi-autonomously based on human prompts, and harness a suite of tools such as calculators, APIs, and search engines to complete assigned tasks, making logical connections to work towards conclusions and solutions to problems. Here are few of the services that are highly dominated by the use of Lang Chain agents:
Facilitating language services
Agents play a critical role in delivering language services such as translation, interpretation, and linguistic analysis. Ultimately, this process steers the actions of the agent through the encoding of personas, instructions, and permissions within meticulously constructed prompts.
Users effectively steer the agent by offering interactive cues following the AI’s responses. Thoughtfully designed prompts facilitate a smooth collaboration between humans and AI. Their expertise ensures accurate and efficient communication across diverse languages.
Quality assurance and validation
Ensuring the accuracy and quality of language-related services is a core responsibility. Agents verify translations, validate linguistic data, and maintain high standards to meet user expectations. Agents can manage relatively self-contained workflows with human oversight.
Use internal validation to verify the accuracy and coherence of their generated content. Agents undergo rigorous testing against various datasets and scenarios. These tests validate the agent’s ability to comprehend queries, generate accurate responses, and handle diverse inputs.
Types of agents
Agents use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning a response to the user. Here are the agents available in Lang Chain.
Zero-Shot ReAct: This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool. Below is how we can set up this Agent:
Let’s invoke this agent and check if it’s working in chain
This will invoke the agent.
Structured-Input ReAct: The structured tool chat agent is capable of using multi-input tools. Older agents are configured to specify an action input as a single string, but this agent can use a tool’s argument schema to create a structured action input. This is useful for more complex tool usage, like precisely navigating around a browser. Here is how one can setup the React agent:
The further necessary imports required are:
Setting up parameters:
Creating the agent:
Improving performance of an agent
Enhancing the capabilities of agents in Large Language Models (LLMs) necessitates a multi-faceted approach. Firstly, it is essential to keep refining the art and science of prompt engineering, which is a key component in directing these systems securely and efficiently. As prompt engineering improves, so does the competencies of LLM agents, allowing them to venture into new spheres of AI assistance.
Secondly, integrating additional components can expand agents’ reasoning and expertise. These components include knowledge banks for updating domain-specific vocabularies, lookup tools for data gathering, and memory enhancement for retaining interactions.
Thus, increasing the autonomous capabilities of agents requires more than just improved prompts; they also need access to knowledge bases, memory, and reasoning tools.
Lastly, it is vital to maintain a clear iterative prompt cycle, which is key to facilitating natural conversations between users and LLM agents. Repeated cycling allows the LLM agent to converge on solutions, reveal deeper insights, and maintain topic focus within an ongoing conversation.
Conclusion
The advent of large language model agents marks a turning point in the AI domain. With increasing advances in the field, these agents are strengthening their footing as autonomous, proactive entities capable of reasoning and executing tasks effectively.
The application and impact of Large Language Model agents are vast and game-changing, from conversational chatbots to workflow automation. The potential challenges or obstacles include ensuring the consistency and relevance of the information the agent processes, and the caution with which personal or sensitive data should be treated. The promising future outlook of these agents is the potentially increased level of automated and efficient interaction humans can have with AI.
In this blog, we are enhancing our Language Model (LLM) experience by adopting the Retrieval-Augmented Generation (RAG) approach! Let’s explore RAG in LLM for enhanced results!
We’ll explore the fundamental architecture of RAG conceptually and delve deeper by implementing it through the LangChain orchestration framework and leveraging an open-source model from Hugging Face for both question-answering and text embedding.
So, let’s get started!
Common Hallucinations in Large Language Models
The most common problem faced by state-of-the-art LLMs is that they produce inaccurate or hallucinated responses. This mostly occurs when prompted with information not present in their training set, despite being trained on extensive data.
This discrepancy between the general knowledge embedded in the LLM’s weights and newer information can be bridged using RAG. The solution provided by RAG eliminates the need for computationally intensive and expertise-dependent fine-tuning, offering a more flexible approach to adapting to evolving information.
Retrieval Augmented Generation involves enhancing the output of Large Language Models (LLMs) by providing them with additional information from an external knowledge source.
Explore LLM context augmentation techniques like RAG and fine-tuning in detail with out podcast now!
This method aims to improve the accuracy and contextuality of LLM-generated responses while minimizing factual inaccuracies. RAG empowers language models to sidestep the need for retraining, facilitating access to the most up-to-date information to produce trustworthy outputs through retrieval-based generation.
The Architecture of RAG Approach
Figure from Lang chain documentation
Prerequisites for Code Implementation
1. HuggingFace Account and LLAMA2 Model Access:
Create a Hugging Face account (free sign-up available) to access open-source Llama 2 and embedding models.
Request access to LLAMA2 models using this form (access is typically granted within a few hours).
After gaining access to Llama 2 models, please proceed to the provided link, select the checkbox to indicate your agreement to the information, and then click ‘Submit’.
2. Google Colab Account:
Create a Google account if you don’t already have one.
In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4 for faster execution of code.
4. Library and Dependency Installation:
Install necessary libraries and dependencies using the following command:
5. Authentication with HuggingFace:
Integrate your Hugging Face token into Colab’s environment:
When prompted, enter your Hugging Face token obtained from the “Access Token” tab in your Hugging Face settings.
A 5-Step Guide to Implement RAG in LLM
Step 1: Document Loading
Loading a document refers to the process of retrieving and storing data as documents in memory from a specified source. This process is typically facilitated by document loaders, which provide a “load” method for accessing and loading documents into the memory.
Lang chain has a number of document loaders in this example we will be using the “WebBaseLoader” class from the “langchain.document_loaders” module to load content from a specific web page.
The code extracts content from the web page “https://lilianweng.github.io/posts/2023-06-23-agent/“. BeautifulSoup (`bs4`) is employed for HTML parsing, focusing on elements with the classes “post-content”, “post-title”, and “post-header.” The loaded content is stored in the variable `docs`.
After loading the data, it can be transformed to fit the application’s requirements or to extract relevant portions. This involves splitting lengthy documents into smaller chunks that are compatible with the model and produce accurate and clear results.
LangChain offers various text splitters, in this implementation we chose the “RecursiveCharacterTextSplitter” for generic text processing.
The code breaks documents into chunks of 1000 characters with a 200-character overlap. This chunking is employed for embedding and vector storage, enabling more focused retrieval of relevant content during runtime.
The recursive splitter ensures chunks maintain contextual integrity by using common separators, like new lines until the desired chunk size is achieved.
Step 3: Storage in Vector Database
After extracting text chunks, we store and index them for future searches using the RAG application. A common approach involves embedding the content of each split and storing these embeddings in a vector store.
When searching, we embed the search query and perform a similarity search to identify stored splits with embeddings most similar to the query embedding. Cosine similarity, which measures the angle between embeddings, is a simple similarity measure.
Using the Chroma vector store and open source “HuggingFaceEmbeddings” in the Langchain, we can embed and store all document splits in a single command.
Text Embedding:
Text embedding converts textual data into numerical vectors that capture the semantic meaning of the text. This enables efficient identification of similar text pieces. An embedding model is a variant of Language Models (LLMs) specifically designed for this purpose.
LangChain’s Embeddings class facilitates interaction with various text embedding models. While any model can be used, we opted for “HuggingFaceEmbeddings”.
This code initializes an instance of the HuggingFaceEmbeddings class, configuring it with an open-source pre-trained model located at “sentence-transformers/all-MiniLM-l6-v2“. By doing this text embedding is created for converting textual data into numerical vectors.
Vector Stores:
Vector stores are specialized databases designed to efficiently store and search for high-dimensional vectors, such as text embeddings. They enable the retrieval of the most similar embedding vectors based on a given query vector. LangChain integrates with various vector stores, and we are using the “Chroma” vector store for this task.
This code utilizes the Chroma class to create a vector store (vectorstore) from the previously split documents (splits) using the specified embeddings (embeddings). The Chroma vector store facilitates efficient storage and retrieval of document vectors for further processing.
Step 4: Retrieval of Text Chunks
After storing the data, preparing the LLM model, and constructing the pipeline, we need to retrieve the data. Retrievers serve as interfaces that return documents based on a query.
Retrievers cannot store documents; they can only retrieve them. Vector stores form the foundation of retrievers. LangChain offers a variety of retriever algorithms, here is the one we implement.
Step 5: Generation of Answer with RAG Approach
Preparing the LLM Model:
In the context of Retrieval Augmented Generation (RAG), an LLM model plays a crucial role in generating comprehensive and informative responses to user queries. By leveraging its ability to process and understand natural language, the LLM model can effectively combine retrieved documents with the given query to produce insightful and relevant outputs.
These lines import the necessary libraries for handling pre-trained models and tokenization. The specific model “meta-llama/Llama-2-7b-chat-hf” is chosen for its question-answering capabilities.
This code defines a transformer pipeline, which encapsulates the pre-trained HuggingFacemodel and its associated configuration. It specifies the task as “text-generation” and sets various parameters to optimize the pipeline’s performance.
This line creates a Lang Chain pipeline (HuggingFace Pipeline) that wraps the transformer pipeline. The model_kwargs parameter adjusts the model’s “temperature” to control its creativity and randomness.
Retrieval QA Chain:
To combine question-answering with a retrieval step, we employ the RetrievalQA chain, which utilizes a language model and a vector database as a retriever. By default, we process all data in a single batch and set the chain type to “stuff” when interacting with the language model.
This code initializes a RetrievalQA instance by specifying a chain type (“stuff”), a HuggingFacePipeline (llm), and a retriever (retriever-initialize previously in the code from vectorstore). The return_source_documents parameter is set to True to include source documents in the output, enhancing contextual information retrieval.
Finally, we call this QA chain with the specific question we want to ask.
The result will be:
We can print source documents to see which document chunks the model used to generate the answer to this specific query.
In this output, only 2 out of 4 document contents are shown as an example, that were retrieved to answer the specific question.
Conclusion
In conclusion, by embracing the Retrieval-Augmented Generation (RAG) approach, we have elevated our Language Model (LLM) experience to new heights.
Through a deep dive into the conceptual foundations of RAG and practical implementation using the Lang Chain orchestration framework, coupled with the power of an open-source model from Hugging Face, we have enhanced the question-answering capabilities of LLMs.
This journey exemplifies the seamless integration of innovative technologies to optimize LLM capabilities, paving the way for a more efficient and powerful language processing experience. Cheers to the exciting possibilities that arise from combining innovative approaches with open-source resources!
In this blog, we delve into Large Language Model Evaluation and Tracing with LangSmith, emphasizing their pivotal role in ensuring application reliability and performance.
You’ll learn to set up LangSmith, connect it with LangChain, and master the process of precise tracing and evaluation, equipping you with the tools to optimize your Large Language Model applications and bring them to production. Discover the key to unlock your model’s full potential.
Whether you’re an experienced developer or just starting your journey, LangSmith’s private beta provides a valuable tool for your toolkit.
Understanding the significance of evaluation and tracing is key to improving Large Language Model applications, ensuring the reliability, correctness, and performance of your models. This is a critical step in the development process, particularly if you’re working towards bringing your LLM application to production.
LangSmith and LangChain in LLM application
In working on Large Language Models (LLMs), LangChain and LangSmith stand as key pillars for developers and AI enthusiasts.
LangChain simplifies the integration of powerful LLMs into applications, streamlining data access, and offering flexibility through concepts like “Chains” and “Agents.” It bridges the gap between these models and external data sources, enabling the creation of robust natural language processing applications.
LangSmith, developed by LangChain, takes LLM application development to the next level. It aids in debugging, monitoring, and evaluating LLM-based applications, with features like logging runs, visualizing components, and facilitating collaboration. It ensures the reliability and efficiency of your LLM applications.
These two tools together form a dynamic duo, unleashing the true potential of large language models in application development. In the upcoming sections, we’ll delve deeper into the mechanics, showcasing how they can elevate your LLM projects to new heights.
Quick start to LangSmith
Prerequisites
Please note that LangSmith is currently in a private beta phase, so we’ll show you how to join the waitlist. Once LangSmith releases new invites, you’ll be at the forefront of this innovative platform.
Configuring LangSmith alongside LangChain is a straightforward procedure. It merely involves a few simple steps to establish LangSmith and start utilizing it for tracing and evaluation.
To initiate your journey, follow the sequential steps provided below:
Begin by creating a LangSmith account, as outlined in the prerequisites
In your working folder, create .env file containing essential environment variables. Although initial placeholders are provided, these will be replaced in subsequent steps:
Substitute the placeholder <your-openai-api-key> with your OpenAI API key obtained from OpenAI.
For the LangChain API key, navigate to the settings page on LangSmith, generate the key, and replace the placeholder.
Return to the home page and create a project with a suitable name. Subsequently, copy the project name and update the placeholder.
Install it and any other necessary dependencies with the following command:
Execute the provided example code to initiate the process:
After running the code, return to the LangSmith home page, and access the project you just created.
Within the “Traces” section, you will find the run that was recently executed. Click on it to access detailed trace information.
Congratulations, your initial run is now visible and traceable within LangSmith!
Scenario # 01: LLM Tracing
What is a trace?
A ‘Run’ signifies a solitary instance of a task or operation within your LLM application. This could be anything from a single call to an LLM, chain, or agent.
A ‘Trace’ encompasses an arrangement of runs structured in a hierarchical or interconnected manner. The highest-level run in a trace, known as the ‘Root Run,’ is the one directly triggered by the user or application. The root run is designated with an execution order of 1, indicating the order in which it was initiated within the trace when considered as a sequence.
Examples of traces
We’ve already examined a straightforward LLM Call trace, where we observed the input provided to the large language model and the resulting output. In this uncomplicated case, a single run was evident, devoid of any hierarchical or multiple run structures.
Now, let’s delve further by tracing the LangChain chain and agent to uncover deeper insights into their operations.
Trace a sequential chain
In this instance, we explore the tracing of a sequential chain within LangChain, a foundational chain of this platform. Sequential chains enable the connection of multiple chains, creating complex pipelines for specific scenarios. Detailed information on this can be found here.
Let’s run this example of a sequential chain and see what we get in the trace.
Upon executing the code for this sequential chain and returning to our project, a new trace, ‘SimpleSequentialChain,’ becomes visible.
Upon examination, this trace reveals a collection of LLM calls, featuring two distinct LLM call runs within its hierarchy.
This delineation of execution order becomes apparent; in our example, the initial run entails extracting a title and constructing a synopsis, as displayed in the provided screenshot.
Subsequently, the second run utilizes the synopsis and the output from the first run to generate a review.
This meticulous tracing mechanism grants us the ability to inspect intermediate results, the messages transmitted to the LLM, and the outputs at each step, all while offering insights into token counts and latency measures. Furthermore, the option to filter traces based on various parameters adds an additional layer of customization and control.
Trace an agent
In this segment, we embark on a journey to trace an agent’s inner workings using LangSmith. For those keen to delve deeper into the world of agents, you’ll find comprehensive documentation in LangChain.
To provide a brief overview, we’ve engineered a ZeroShotAgent, equipping it with tools like DuckDuckGo search and paraphrasing capabilities. The agent interacts with user queries, employing these tools in a ReAct(Reason + Act) manner to generate a response.
Here is the code for the agent:
By tracing the agent’s actions, we gain insights into the sequence and tools utilized by the agent, as well as the intermediate outputs it produces. This tracing capability proves invaluable for agent design and debugging, allowing us to identify and resolve errors efficiently.
The trace reveals that the agent initiates with an LLM call, proceeds to search for DuckDuckGo Results Json, engages the paraphraser, and subsequently executes two additional LLM calls to generate responses, which in our case are the suggested blog topics.
These traces underscore the critical role tracing plays in debugging and designing effective LLM applications. It’s important to note that all this information is meticulously logged in LangSmith, offering a treasure trove of insights for various applications, which we’ll briefly explore in subsequent sections.
Sharing your trace
LangSmith simplifies the process of sharing the logged runs. This feature facilitates easy publishing and replication of your work. For example, if you encounter a bug or unexpected output under specific conditions, you can share it with your team or create an issue on LangChain for collaborative troubleshooting.
By simply clicking the share option located at the top right corner of the page, you can effortlessly distribute your run for analysis and resolution
Scenario # 02: Testing and evaluation
Why is testing and evaluation essential for LLMs?
The development of high-quality, production-grade Large Language Model (LLM) applications is a complex task fraught with challenges, including:
Non-deterministic Outputs: LLM models operate probabilistically, often yielding varying outputs for the same input prompt. This unpredictability persists even when utilizing a temperature setting of 0, as model weights are not static over time.
API Opacity: Models underpinning APIs undergo changes and updates, making it imperative to assess their evolving behavior.
Security Concerns: LLMs are susceptible to prompt injections, posing potential security risks.
Latency Requirements: Many applications demand swift response times.
These challenges underscore the critical need for rigorous testing and evaluation in the development of LLM applications.
Step-by-step LLM evaluation process
1. Define an LLM chain
Begin by defining an LLM and creating a simple LLM chain aimed at generating concise responses to specific queries. This LLM will serve as the subject of evaluation and testing.
2. Create a dataset
Generate a compact dataset comprising question-and-answer pairs related to computer science abbreviations and terms. This data set, containing both questions and their corresponding answers, will be used to evaluate and test the model.
After executing the code, navigate to LangSmith. Within the “Datasets & Testing” section, you’ll find the dataset you’ve created. By expanding it under “examples,” you’ll encounter the six specific examples you’ve defined for evaluation.
3. Evaluation
For our evaluations, we’ll make use of the LangChain evaluator, specifically focusing on the ‘Correctness: QA evaluation.’ QA evaluators play a vital role in assessing the accuracy of responses to user queries, especially when you have a dataset with reference labels or context documents. Our approach incorporates all three QA evaluators:
“context_qa”: This evaluator directs the LLM chain to utilize reference “context” (supplied through example outputs) to ascertain correctness.
“qa”: It prompts an LLMChain to directly appraise a response as either “correct” or “incorrect,” based on the reference answer.
“cot_qa”: This evaluator closely resembles “context_qa” but introduces a chain of thought “reasoning” before delivering a final verdict. This approach generally leads to responses that align more closely with human judgments, albeit with a slightly increased token and runtime cost.
Below is the code to kick-start the evaluation of the dataset.
4. Reviewing evaluation outcomes
Upon completing the evaluation, LangSmith provides a platform to examine the results. Navigate to the “Dataset & Testing” section, select the dataset used for the evaluation, and access “Test Runs.” You’ll find the designated Test Run Name and feedback from the evaluator.
By clicking on the Test Run Name, you can delve deeper, inspect feedback for individual examples, and view side-by-side comparisons. Clicking on any reference example reveals detailed information.
For instance, the first example received a perfect score of 1 from all three evaluators. The generated and expected outputs are presented side by side, accompanied by feedback and comments from the evaluator.
However, in a different example, one evaluator issued a score of 1, while the other two scored it as 0. Upon closer examination, it becomes apparent that there exists a disparity between the generated and expected outputs
The “cot-qa” evaluator assigned a score of 1, and further exploration of the comments reveals that, although the generated output was correct, discrepancies in the dataset contextually influenced the evaluation. It’s worth noting that the “cot-qa” evaluator spotted this, demonstrating its ability to notice context-related subtleties that other evaluators might miss.
Varied evaluation choices (Delve deeper)
The evaluator showcased in the previous example is but one of several available within LangSmith. Each option serves specific purposes and holds its unique value. For a detailed understanding of each evaluator’s specific functions and to explore illustrative examples, we encourage you to explore LangChain Evaluators where in-depth coverage of these available options is provided.
Implement the power of tracing and evaluation with LangSmith
In summary, our journey through LangSmith has underscored the critical importance of evaluating and tracing Large Language Model applications. These processes are the cornerstone of reliability and high performance, ensuring that your models meet rigorous standards.
With LangSmith, we’ve explored the power of precise tracing and evaluation, empowering you to optimize your models confidently. As you continue your exploration, remember that your LLM applications hold limitless potential, and LangSmith is your guiding light on this path of discovery.
Thank you for joining us on this transformative journey through the world of LLM Evaluation and Tracing with LangSmith.
In the dynamic realm of language models and data-driven apps, efficient orchestration frameworks are key. Explore LangChain and Llama Index, simplifying LLM-app interactions.
Large language models (LLMs) are becoming increasingly popular for a variety of tasks, such as natural language understanding, question answering, and text generation. However, LLMs can be complex and difficult to use, which is where orchestration frameworks come in.
Orchestration frameworks provide a way to manage and control LLMs. They can help to simplify the development and deployment of LLM-based applications, and they can also help to improve the performance and reliability of these applications.
There are a number of orchestration frameworks available, two of the most popular being LangChain and Llama Index.
LangChain and Orchestration Frameworks
LangChain is an open-source orchestration framework that is designed to be easy to use and scalable. It provides a number of features that make it well-suited for managing LLMs, such as:
A simple API that makes it easy to interact with LLMs
A distributed architecture that can scale to handle large numbers of LLMs
A variety of features for managing LLMs, such as load balancing, fault tolerance, and security
Llama Index is another open-source orchestration framework that is designed for managing LLMs. It provides a number of features that are similar to LangChain, such as:
A simple API
A distributed architecture
A variety of features for managing LLMs
However, Llama Index also has some unique features that make it well-suited for certain applications, such as:
The ability to query LLMs in a distributed manner
The ability to index LLMs so that they can be searched more efficiently
Both LangChain and Llama Index are powerful orchestration frameworks that can be used to manage LLMs. The best framework for a particular application will depend on the specific requirements of that application.
In addition to LangChain and Llama Index, there are a number of other orchestration frameworks available, such as Bard, Megatron, Megatron-Turing NLG and OpenAI Five. These frameworks offer a variety of features and capabilities, so it is important to choose the one that best meets the needs of your application.
LlamaIndex and LangChain: Orchestrating LLMs
The venture capital firm Andreessen Horowitz (a16z) identifies both LlamaIndex and LangChain as orchestration frameworks that abstract away the complexities of prompt chaining, enabling seamless data querying and management between applications and LLMs. This orchestration process encompasses interactions with external APIs, retrieval of contextual data from vector databases, and maintaining memory across multiple LLM calls.
LlamaIndex: A data framework for the future
LlamaIndex distinguishes itself by offering a unique approach to combining custom data with LLMs, all without the need for fine-tuning or in-context learning. It defines itself as a “simple, flexible data framework for connecting custom data sources to large language models.” Moreover, it accommodates a wide range of data types, making it an inclusive solution for diverse data needs.
Continuous evolution: LlamaIndex 0.7.0
LlamaIndex is a dynamic and evolving framework. Its creator, Jerry Liu, recently released version 0.7.0, which focuses on enhancing modularity and customizability to facilitate the development of LLM applications that leverage your data effectively. This release underscores the commitment to providing developers with tools to architect data structures for LLM applications.
The LlamaIndex Ecosystem: LlamaHub
At the core of LlamaIndex lies LlamaHub, a data ingestion platform that plays a pivotal role in getting started with the framework. LlamaHub offers a library of data loaders and readers, making data ingestion a seamless process. Notably, LlamaHub is not exclusive to LlamaIndex; it can also be integrated with LangChain, expanding its utility.
Navigating the LlamaIndex workflow
Users of LlamaIndex typically follow a structured workflow:
Parsing Documents into Nodes
Constructing an Index (from Nodes or Documents)
Optional Advanced Step: Building Indices on Top of Other Indices
Querying the Index
The querying aspect involves interactions with an LLM, where a “query” serves as an input. While this process can be complex, it forms the foundation of LlamaIndex’s functionality.
In essence, LlamaIndex empowers users to feed pertinent information into an LLM prompt selectively. Instead of overwhelming the LLM with all custom data, LlamaIndex allows users to extract relevant information for each query, streamlining the process.
Power of LlamaIndex and LangChain
LlamaIndex seamlessly integrates with LangChain, offering users flexibility in data retrieval and query management. It extends the functionality of data loaders by treating them as LangChain Tools and providing Tool abstractions to use LlamaIndex’s query engine alongside a LangChain agent.
LlamaIndex and LangChain join forces to create context-rich chatbots. Learn how these frameworks can be leveraged to build chatbots that provide enhanced contextual responses.
This comprehensive exploration unveils the potential of LlamaIndex, offering insights into its evolution, features, and practical applications.
Why are orchestration frameworks needed?
Data orchestration frameworks are essential for building applications on enterprise data because they help to:
Eliminate the need for foundation model retraining: Foundation models are large language models that are trained on massive datasets of text and code. They can be used to perform a variety of tasks, such as generating text, translating languages, and answering questions. However, foundation models can be expensive to train and retrain. Orchestration frameworks can help to reduce the need for retraining by allowing you to reuse trained models across multiple applications.
Overcome token limits: Foundation models often have token limits, which restrict the number of words or tokens that can be processed in a single request. Orchestration frameworks can help to overcome token limits by breaking down large tasks into smaller subtasks that can be processed separately.
Provide connectors for data sources: Orchestration frameworks typically provide connectors for a variety of data sources, such as databases, cloud storage, and APIs. This makes it easy to connect your data pipeline to the data sources that you need.
Reduce boilerplate code: Orchestration frameworks can help to reduce boilerplate code by providing a variety of pre-built components for common tasks, such as data extraction, transformation, and loading. This allows you to focus on the business logic of your application.
Popular orchestration frameworks
There are a number of popular orchestration frameworks available, including:
Prefect is an open-source orchestration framework that is written in Python. It is known for its ease of use and flexibility.
Airflow is an open-source orchestration framework that is written in Python. It is widely used in the enterprise and is known for its scalability and reliability.
Luigi is an open-source orchestration framework that is written in Python. It is known for its simplicity and performance.
Dagster is an open-source orchestration framework that is written in Python. It is known for its extensibility and modularity.
When choosing an orchestration framework, there are a number of factors to consider, such as:
Ease of use: The framework should be easy to use and learn, even for users with no prior experience with orchestration.
Flexibility: The framework should be flexible enough to support a wide range of data pipelines and workflows.
Scalability: The framework should be able to scale to meet the needs of your organization, even as your data volumes and processing requirements grow.
Reliability: The framework should be reliable and stable, with minimal downtime.
Community support: The framework should have a large and active community of users and contributors.
Conclusion
Orchestration frameworks are essential for building applications on enterprise data. They can help to eliminate the need for foundation model retraining, overcome token limits, connect to data sources, and reduce boilerplate code. When choosing an orchestration framework, consider factors such as ease of use, flexibility, scalability, reliability, and community support.
Before we understand LlamaIndex, let’s step back a bit. Imagine a futuristic landscape where machines possess an extraordinary ability to understand and produce human-like text effortlessly. LLMs have made this vision a reality. Armed with a vast ocean of training data, these marvels of innovation have become the crown jewels of the tech world.
There is no denying that LLMs (Large Language Models) are currently the talk of the town! From revolutionizing text generation and reasoning, LLMs are trained on massive datasets and have been making waves in the tech vicinity.
One particular LLM has emerged as a true superstar. Back in November 2022, ChatGPT, an LLM developed by OpenAI, attracted a staggering one million users within 5 days of its beta launch.
When researchers and developers saw these stats they started thinking on how we can best feed/augment these LLMs with our own private data. They started thinking about different solutions.
Finetune your own LLM. You adapt an existing LLM by training your data. But, this is very costly and time-consuming.
Combining all the documents into a single large prompt for an LLM might be possible now with the increased token limit of 100k for models. However, this approach could result in slower processing times and higher computational costs.
Instead of inputting all the data, selectively provide relevant information to the LLM prompt. Choose the useful bits for each query instead of including everything.
Option 3 appears to be both relevant and feasible, but it requires the development of a specialized toolkit. Recognizing this need, efforts have already begun to create the necessary tools.
Introducing LlamaIndex
Recently a toolkit was launched for building applications using LLM, known as Langchain. LlamaIndex is built on top of Langchain to provide a central interface to connect your LLMs with external data.
Key Components of LlamaIndex:
The key components of LlamaIndex are as follows
Data Connectors: The data connector, known as the Reader, collects data from various sources and formats, converting it into a straightforward document format with textual content and basic metadata.
Data Index: It is a data structure facilitating efficient retrieval of pertinent information in response to user queries. At a broad level, Indices are constructed using Documents and serve as the foundation for Query Engines and Chat Engines, enabling seamless interactions and question-and-answer capabilities based on the underlying data. Internally, Indices store data within Node objects, which represent segments of the original documents.
Retrievers: Retrievers play a crucial role in obtaining the most pertinent information based on user queries or chat messages. They can be constructed based on Indices or as standalone components and serve as a fundamental element in Query Engines and Chat Engines for retrieving contextually relevant data.
Query Engines: A query engine is a versatile interface that enables users to pose questions regarding their data. By accepting natural language queries, the query engine provides comprehensive and informative responses.
Chat Engines: A chat engine serves as an advanced interface for engaging in interactive conversations with your data, allowing for multiple exchanges instead of a single question-and-answer format. Similar to ChatGPT but enhanced with access to a knowledge base, the chat engine maintains a contextual understanding by retaining the conversation history and can provide answers that consider the relevant past context.
Difference between query engine and chat engine:
It is important to note that there is a significant distinction between a query engine and a chat engine. Although they may appear similar at first glance, they serve different purposes:
A query engine operates as an independent system that handles individual questions over the data without maintaining a record of the conversation history.
On the other hand, a chat engine is designed to keep track of the entire conversation history, allowing users to query both the data and previous responses. This functionality resembles ChatGPT, where the chat engine leverages the context of past exchanges to provide more comprehensive and contextually relevant answers
Customization: LlamaIndex offers customization options where you can modify the default settings, such as the utilization of OpenAI’s text-davinci-003 model. Users have the flexibility to customize the underlying language model (LLM) and other settings used in LlamaIndex, with support for various integrations and LangChain’s LLM modules.
Analysis: LlamaIndex offers a diverse range of analysis tools for examining indices and queries. These tools include features for analyzing token usage and associated costs. Additionally, LlamaIndex provides a Playground module, which presents a visual interface for analyzing token usage across different index structures and evaluating performance metrics.
Structured Outputs: LlamaIndex offers an assortment of modules that empower language models (LLMs) to generate structured outputs. These modules are available at various levels of abstraction, providing flexibility and versatility in producing organized and formatted results.
Evaluation: LlamaIndex provides essential modules for assessing the quality of both document retrieval and response synthesis. These modules enable the evaluation of “hallucination,” which refers to situations where the generated response does not align with the retrieved sources. A hallucination occurs when the model generates an answer without effectively grounding it in the given contextual information from the prompt.
Integrations: LlamaIndex offers a wide array of integrations with various toolsets and storage providers. These integrations encompass features such as utilizing vector stores, integrating with ChatGPT plugins, compatibility with Langchain, and the capability to trace with Graphsignal. These integrations enhance the functionality and versatility of LlamaIndex by allowing seamless interaction with different tools and platforms.
Callbacks: LlamaIndex offers a callback feature that assists in debugging, tracking, and tracing the internal operations of the library. The callback manager allows for the addition of multiple callbacks as required. These callbacks not only log event-related data but also track the duration and frequency of each event occurrence. Moreover, a trace map of events is recorded, providing valuable information that callbacks can utilize in a manner that best suits their specific needs.
Storage: LlamaIndex offers a user-friendly interface that simplifies the process of ingesting, indexing, and querying external data. By abstracting away complexities, LlamaIndex allows users to query their data with just a few lines of code. Behind the scenes, LlamaIndex provides the flexibility to customize storage components for different purposes. This includes document stores for storing ingested documents (represented as Node objects), index stores for storing index metadata, and vector stores for storing embedding vectors.The document and index stores utilize a shared key-value store abstraction, providing a common framework for efficient storage and retrieval of data
Now that we have explored the key components of LlamaIndex, let’s delve into its operational mechanisms and understand how it functions.
How Llama-Index Works:
To begin, the first step is to import the documents into LlamaIndex, which provides various pre-existing readers for sources like databases, Discord, Slack, Google Sheets, Notion, and the one we will utilize today, the Simple Directory Reader, among others.[Text Wrapping Break][Text Wrapping Break]You can check for more here: Llama Hub (llama-hub-ui.vercel.app)
Once the documents are loaded, LlamaIndex proceeds to parse them into nodes, which are essentially segments of text. Subsequently, an index is constructed to enable quick retrieval of relevant data when querying the documents. The index can be stored in different formats, but we will opt for a Vector Store as it is typically the most useful when querying text documents without specific limitations.
LlamaIndex is built upon LangChain, which serves as the foundational framework for a wide range of LLM applications. While LangChain provides the fundamental building blocks, LlamaIndex is specifically designed to streamline the workflow described above.
Here is an example code showcasing the utilization of the SimpleDirectoryReader data loader in LlamaIndex, along with the integration of the OpenAI language model for natural language processing.
Installing the necessary libraries required to run the code.
Importing openai library and setting the secret API (Application Programming Interface) key.
Importing the SimpleDirectoryReader class from llama_index library and loading the data from it.
Importing SimpleNodeParser class from llama_index and parsing the documents into nodes – basically in chunks of text.
Importing VectorStoreIndex class from llama_index to create index from the chunks of text so that each time when a query is placed only relevant data is sent to OpenAI. In short, for the sake of cost effectiveness.
Conclusion:
LlamaIndex, built on top of Langchain, offers a powerful toolkit for integrating external data with LLMs. By parsing documents into nodes, constructing an efficient index, and selectively querying relevant information, LlamaIndex enables cost-effective exploration of text data.
The provided code example demonstrates the utilization of LlamaIndex’s data loader and query engine, showcasing its potential for next-generation text exploration. For the notebook of the above code, refer to the source code available here.
Large language models (LLMs) like GPT-3 and GPT-4. revolutionized the landscape of NLP. These models have laid a strong foundation for creating powerful, scalable applications. However, the potential of these models isaffected by the quality of the prompt. This highlights the importance of prompt engineering.
Furthermore, real-world NLP applications often require more complexity than a single ChatGPT session can provide. This is where LangChain comes into play!
Get more information on Large Language models and its applications and tools by clicking below:
Harrison Chase’s brainchild, LangChain, is a Python library designed to help you leverage the power of LLMs to build custom NLP applications. As of May 2023, this game-changing library has already garnered almost 40,000 stars on GitHub.
Interested in learning about Large Language Models and building custom ChatGPT like applications for your business? Click below
This comprehensive beginner’s guide provides a thorough introduction to LangChain, offering a detailed exploration of its core features. It walks you through the process of building a basic application using LangChain and shares valuable tips and industry best practices to make the most of this powerful framework. Whether you’re new to Language Learning Models (LLMs) or looking for a more efficient way to develop language generation applications, this guide serves as a valuable resource to help you leverage the capabilities of LLMs with LangChain.
Overview of LangChain modules
These modules are essential for any application using the Language Model (LLM).
LangChain offers standardized and adaptable interfaces for each module. Additionally, LangChain provides external integrations and even ready-made implementations for seamless usage. Let’s delve deeper into these modules.
LLM
LLM is the fundamental component of LangChain. It is essentially a wrapper around a large language model that helps use the functionality and capability of a specific large language model.
Chains
As stated earlier, LLM (Language Model) serves as the fundamental unit within LangChain. However, in line with the “LangChain” concept, it offers the ability to link together multiple LLM calls to address specific objectives.
For instance, you may have a need to retrieve data from a specific URL, summarize the retrieved text, and utilize the resulting summary to answer questions.
On the other hand, chains can also be simpler in nature. For instance, you might want to gather user input, construct a prompt using that input, and generate a response based on the constructed prompt.
Prompts
Prompts have become a popular modeling approach in programming. It simplifies prompt creation and management with specialized classes and functions, including the essential PromptTemplate.
Document loaders and Utils
LangChain’s Document Loaders and Utils modules simplify data access and computation. Document loaders convert diverse data sources into text for processing, while the utils module offers interactive system sessions and code snippets for mathematical computations.
Vector stores
The widely used index type involves generating numerical embeddings for each document using an embedding model. These embeddings, along with the associated documents, are stored in a vector store. This vector store enables efficient retrieval of relevant documents based on their embeddings.
Agents
LangChain offers a flexible approach for tasks where the sequence of language model calls is not deterministic. Its “Agents” can act based on user input and previous responses. The library also integrates with vector databases and has memory capabilities to retain the state between calls, enabling more advanced interactions.
Building our App
Now that we’ve gained an understanding of LangChain, let’s build a PDF Q/A Bot app using LangChain and OpenAI.Let me first show you the architecture diagram for our app and then we will start with our app creation.
Below is an example code that demonstrates the architecture of a PDF Q&A chatbot. This code utilizes the OpenAI language model for natural language processing, the FAISS database for efficient similarity search, PyPDF2 for reading PDF files, and Streamlit for creating a web application interface.
The chatbot leverages LangChain’s Conversational Retrieval Chain to find the most relevant answer from a document based on the user’s question. This integrated setup enables an interactive and accurate question-answering experience for the users.
Importing necessary libraries
Import Statements: These lines import the necessary libraries and functions required to run the application.
PyPDF2: Python library used to read and manipulate PDF files.
langchain: a framework for developing applications powered by language models.
streamlit: A Python library used to create web applications quickly.
If the LangChain and OpenAI are not installed already, you first need to run the following commands in the terminal.
Setting openAI API key
You will replace the placeholder with your OpenAI API key which you can access from OpenAI API. The above line sets the OpenAI API key, which you need to use OpenAI’s language models.
Streamlit UI
These lines of code create the web interface using Streamlit. The user is prompted to upload a PDF file.
Reading the PDF file
If a file has been uploaded, this block reads the PDF file, extracts the text from each page, and concatenates it into a single string.
Text splitting
Language Models are often limited by the amount of text that you can pass to them. Therefore, it is necessary to split them up into smaller chunks. It provides several utilities for doing so.
Using a Text Splitter can also help improve the results from vector store searches, as eg. smaller chunks may sometimes be more likely to match a query. Here we are splitting the text into 1k tokens with 200 tokens overlap.
Embeddings
Here, the OpenAIEmbeddings function is used to download embeddings, which are vector representations of the text data. These embeddings are then used with FAISS to create an efficient search index from the chunks of text.
Creating conversational retrieval chain
The chains developed are modular components that can be easily reused and connected. They consist of predefined sequences of actions encapsulated in a single line of code. With these chains, there’s no need to explicitly call the GPT model or define prompt properties. This specific chain allows you to engage in conversation while referencing documents and retains a history of interactions.
Streamlit for generating responses and displaying in the App
This block prepares a response that includes the generated answer and the source documents and displays it on the web interface.
Let’s run our App
Here we uploaded a PDF, asked a question, and got our required answer with the source document. See, that is how the magic of LangChain works.
Concluding the journey! Mastering LangChain for creating a basic Q&A application has been a success. I trust you have acquired a fundamental comprehension of LangChain’s potential. Now, take the initiative to delve into LangChain further and construct even more captivating applications. Enjoy the coding adventure.