Unleashing the power of LangChain: A comprehensive guide to building custom Q&A chatbots 
Syed Hyder Ali Zaidi
| May 22, 2023

The NLP landscape has been revolutionized by the advent of large language models (LLMs) like GPT-3 and GPT-4. These models have laid a strong foundation for creating powerful, scalable applications. However, the potential of these models is greatly influenced by the quality of the prompt, highlighting the importance of prompt engineering. Furthermore, real-world NLP applications often require more complexity than a single ChatGPT session can provide. This is where LangChain comes into play! 

Harrison Chase’s brainchild, LangChain, is a Python library designed to help you leverage the power of LLMs to build custom NLP applications. As of May 2023, this game-changing library has already garnered almost 40,000 stars on GitHub. 



This beginner’s guide introduces LangChain and explores its core features. It walks you through building a basic application with LangChain and shares tips and best practices for making the most of the framework. Whether you’re new to large language models (LLMs) or looking for a more efficient way to develop language generation applications, this guide will help you leverage the capabilities of LLMs with LangChain.

Overview of LangChain Modules 

These modules serve as fundamental abstractions that form the foundation of any application powered by a large language model (LLM). LangChain offers standardized and adaptable interfaces for each module, along with external integrations and ready-made implementations. Let’s delve deeper into these modules.

Overview of LangChain Modules


The LLM is the fundamental component of LangChain. It is essentially a wrapper around a large language model that exposes the functionality and capabilities of that specific model.


As stated earlier, the LLM serves as the fundamental unit within LangChain. However, in line with the “LangChain” name, the library lets you link together multiple LLM calls to address specific objectives.

For instance, you may have a need to retrieve data from a specific URL, summarize the retrieved text, and utilize the resulting summary to answer questions. 

On the other hand, chains can also be simpler in nature. For instance, you might want to gather user input, construct a prompt using that input, and generate a response based on the constructed prompt. 
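The simpler kind of chain described above can be sketched in plain Python. This is a minimal illustration and not LangChain's API: build_prompt, fake_llm, and run_chain are hypothetical names, and fake_llm is only a stand-in for a real model call.

```python
from typing import Callable

PROMPT_TEMPLATE = "Answer the following question concisely.\nQuestion: {question}\nAnswer:"


def build_prompt(user_input: str) -> str:
    """Step 1: construct a prompt from the raw user input."""
    return PROMPT_TEMPLATE.format(question=user_input.strip())


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: a real model would go here)."""
    return f"[model response to {len(prompt)} prompt characters]"


def run_chain(user_input: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Step 2: send the constructed prompt to the model and return its response."""
    prompt = build_prompt(user_input)
    return llm(prompt)


if __name__ == "__main__":
    print(run_chain("What is LangChain?"))
```

Because the model is passed in as a plain callable, the same two steps work unchanged when a real LLM wrapper replaces the stand-in.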


Prompting has become a popular way of programming language models. LangChain simplifies prompt creation and management with specialized classes and functions, including the essential PromptTemplate.

Document Loaders and Utils: 

LangChain’s Document Loaders and Utils modules simplify data access and computation. Document loaders convert diverse data sources into text for processing, while the utils module offers interactive system sessions and code snippets for mathematical computations. 


A widely used index type involves generating numerical embeddings for each document using an Embedding Model. These embeddings, along with the associated documents, are stored in a vectorstore. This vectorstore enables efficient retrieval of relevant documents based on their embeddings.
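To make the idea concrete, here is a toy in-memory vector store. It is a sketch under loud assumptions: embed is a hypothetical bag-of-words stand-in for a real embedding model, and ToyVectorStore is not the API of LangChain or FAISS.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical embedding: word counts instead of a learned dense vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Stores (embedding, document) pairs and retrieves documents by similarity."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Counter, str]] = []

    def add(self, document: str) -> None:
        self._entries.append((embed(document), document))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k documents whose embeddings are closest to the query's."""
        query_vec = embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query_vec, entry[0]),
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


store = ToyVectorStore()
store.add("LangChain chains link multiple LLM calls together")
store.add("FAISS performs efficient similarity search over vectors")
store.add("Streamlit builds web applications quickly")
print(store.search("similarity search over vectors"))
```

A real vectorstore replaces the linear scan in search with an approximate nearest-neighbor index, which is what makes retrieval efficient at scale.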


LangChain offers a flexible approach for tasks where the sequence of language model calls is not deterministic. Its “Agents” can act based on user input and previous responses. The library also integrates with vector databases and has memory capabilities to retain the state between calls, enabling more advanced interactions. 

Building Our App 

Now that we’ve gained an understanding of LangChain, let’s build a PDF Q/A Bot app using LangChain and OpenAI. Let me first show you the architecture diagram for our app and then we will start with our app creation. 

QA Chatbot Architecture

Below is example code that demonstrates the architecture of a PDF Q&A chatbot. The code uses an OpenAI language model for natural language processing, a FAISS index for efficient similarity search, PyPDF2 for reading PDF files, and Streamlit for the web application interface. The chatbot uses LangChain’s Conversational Retrieval Chain to find the most relevant answer in a document based on the user’s question. This integrated setup enables an interactive and accurate question-answering experience for users.


Importing necessary libraries 

Import Statements: These lines import the necessary libraries and functions required to run the application. 

  • PyPDF2: Python library used to read and manipulate PDF files. 
  • langchain: A framework for developing applications powered by language models. 
  • streamlit: A Python library used to create web applications quickly. 
Importing necessary libraries

If LangChain and OpenAI are not already installed, you first need to run the following commands in the terminal.

Install LangChain

Setting OpenAI API Key 

Replace the placeholder with your own OpenAI API key, which you can get from OpenAI. This step sets the key that the app needs in order to use OpenAI’s language models.
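One common way to set the key, sketched here as an assumption and not as the article's exact snippet, is through the OPENAI_API_KEY environment variable, which the OpenAI Python client reads. The value below is a placeholder.

```python
import os

# Placeholder only: replace with your own key, and avoid committing real keys to source control.
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
```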

Setting OpenAI API Key

Streamlit UI 

These lines of code create the web interface using Streamlit. The user is prompted to upload a PDF file.

Streamlit UI

Reading the PDF File 

If a file has been uploaded, this block reads the PDF file, extracts the text from each page, and concatenates it into a single string. 
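The page-by-page extraction described above might look roughly like the sketch below. The helper names concat_page_text and read_pdf_text, and the FakePage class used for the demonstration, are hypothetical. The PyPDF2 import happens inside read_pdf_text so that the concatenation logic can be tried without the library installed.

```python
from typing import Iterable


def concat_page_text(pages: Iterable) -> str:
    """Concatenate the text of each page into a single string.

    Each page is expected to offer an extract_text() method, as PyPDF2 page
    objects do. Pages with no extractable text contribute an empty string.
    """
    return "".join(page.extract_text() or "" for page in pages)


def read_pdf_text(pdf_file) -> str:
    """Read a PDF (a path or a binary file-like object) and return all of its text."""
    from PyPDF2 import PdfReader  # imported lazily so the helper above works without PyPDF2

    reader = PdfReader(pdf_file)
    return concat_page_text(reader.pages)


class FakePage:
    """Hypothetical stand-in for a PDF page, used only to demonstrate the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


sample_text = concat_page_text([FakePage("Page one. "), FakePage(None), FakePage("Page two.")])
print(sample_text)
```

In the app, the object returned by Streamlit's file uploader can be passed to read_pdf_text in place of a file path.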

Reading the PDF File

Text Splitting 

Language models are often limited by the amount of text that you can pass to them. Therefore, it is necessary to split long documents into smaller chunks. LangChain provides several utilities for doing so.

Text Splitting 

Using a Text Splitter can also help improve the results from vector store searches, since smaller chunks may sometimes be more likely to match a query. Here we split the text into chunks of 1k tokens with a 200-token overlap.
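The idea of overlapping chunks can be shown with a small sliding-window splitter. This is a sketch and not LangChain's text splitter: split_with_overlap is a hypothetical helper, and it treats whitespace-separated words as tokens, which is an assumption made for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size tokens, sharing overlap tokens.

    Tokens here are whitespace-separated words (an assumption for illustration).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
demo_chunks = split_with_overlap(words, chunk_size=4, overlap=2)
print(demo_chunks)
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.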


Here, OpenAIEmbeddings is used to obtain embeddings, which are vector representations of the text data. These embeddings are then used with FAISS to create an efficient search index from the chunks of text.


Creating Conversational Retrieval Chain 

The chains developed are modular components that can be easily reused and connected. They consist of predefined sequences of actions encapsulated in a single line of code. With these chains, there’s no need to explicitly call the GPT model or define prompt properties. This specific chain allows you to engage in conversation while referencing documents and retains a history of interactions. 
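What such a chain does conceptually (retrieve documents, answer, remember the exchange) can be sketched in plain Python. This toy class is not LangChain's implementation: ToyConversationalRetrievalChain, keyword_retrieve, and the stand-in model are all hypothetical, and the result keys are chosen only for illustration.

```python
from typing import Callable, List, Tuple

DOCS = [
    "LangChain was created by Harrison Chase.",
    "FAISS is a similarity search library.",
]


def keyword_retrieve(question: str) -> List[str]:
    """Hypothetical retriever: return documents sharing at least one word with the question."""
    words = set(question.lower().strip("?.! ").split())
    return [d for d in DOCS if words & set(d.lower().strip(".").split())]


class ToyConversationalRetrievalChain:
    """Toy illustration: retrieve documents, call a model, and remember the exchange."""

    def __init__(self, retrieve: Callable[[str], List[str]], llm: Callable[[str], str]) -> None:
        self.retrieve = retrieve
        self.llm = llm
        self.chat_history: List[Tuple[str, str]] = []

    def ask(self, question: str) -> dict:
        sources = self.retrieve(question)
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.chat_history)
        context = " ".join(sources)
        prompt = f"Chat history:\n{history}\n\nContext:\n{context}\n\nQuestion: {question}"
        answer = self.llm(prompt)
        self.chat_history.append((question, answer))  # retained for the next turn
        return {"answer": answer, "source_documents": sources}


chain = ToyConversationalRetrievalChain(
    retrieve=keyword_retrieve,
    llm=lambda prompt: f"(answer based on {len(prompt)} prompt characters)",  # stand-in model
)
first = chain.ask("Who created LangChain?")
follow_up = chain.ask("What is FAISS?")
print(first["source_documents"], len(chain.chat_history))
```

The second call sees the first question and answer in its prompt, which is the sense in which the chain retains a history of interactions.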

Creating Conversational Retrieval Chain

Streamlit for Generating Responses and Displaying in the App 

This block prepares a response that includes the generated answer and the source documents and displays it on the web interface. 

Streamlit for Generating Responses and Displaying in the App

Let’s Run Our App 

QA Chatbot

Here we uploaded a PDF, asked a question, and got our required answer with the source document. See, that is how the magic of LangChain works.  

You can find the code for this app on my GitHub repository LangChain-Custom-PDF-Chatbot.

Wrapping Up 

Concluding the journey! Mastering LangChain for creating a basic Q&A application has been a success. I trust you have acquired a fundamental comprehension of LangChain’s potential. Now, take the initiative to delve into LangChain further and construct even more captivating applications. Enjoy the coding adventure.


Sabrina Dominguez
| May 15, 2018

Do you know what can be done with your telecom data? Who decides how it should be used?

Telecommunications isn’t going anywhere. In fact, your telecom data is becoming even more important than ever.

From the first smoke signals to current, cutting-edge smartphones, the objective of telecommunications has remained the same:

Telecom transmits data across distances farther than the human voice can carry.

Telecommunications (or telecom), as an industry with data ingrained into its very DNA, has benefited a great deal from the advent of modern data science. Here are 7 ways that telecommunications companies (otherwise known as telcos) are making the most of your telecom data through machine learning.

1: Aiding in infrastructure repair


Even as communication becomes more decentralized, signal towers remain an unfortunate remnant of an analog past in telecommunications. Companies can’t exactly send their in-house software engineers to climb up the towers and routinely check on the infrastructure. This task still requires field workers to carry out routine inspections, even if no problem visibly exists. AT&T is looking to change that through machine learning models that will analyze video footage captured by drones. The company can then passively detect potential risks, allowing human workers to fix structural issues before they affect customers. Read more about AT&T’s drones here.

2: Email management and lead identification


Mass email marketing is a vital asset of the modern corporation, but even as the sending process becomes more automated, someone is still required to sift through the responses and interpret the interests and questions from potential customers.

To make your life easier, you could instead offload that task to AI. In 2016, CenturyLink began using its automated assistant “Angie” to handle 30,000 monthly emails. Of these, 99% could be properly interpreted without handing them off to a human manager. Imagine how much time the human manager would save, without having to sift through that telecom data.

The company behind Angie, California-based tech developer Conversica, advertises machine learning models as a way to identify promising leads from the dense noise of email communication, which enables telcos to efficiently redirect their marketing follow-up efforts to the right representatives.

3: Rise of the chat bots


Dealing with chat bots can be a frustrating (or hilarious) experience. The generally negative perception that precedes them hasn’t slowed down bot implementation on the customer service side of most telecom companies. Spectrum and AT&T are among the corporations that utilize chat bots at some level of their customer service pipeline, and others are quickly following suit. As the algorithms behind these programs grow more nuanced, human customer service, which brings its own set of frustrations, is beginning to be reduced or phased out.

4: Working with language

The advancement of natural language processing has made interacting with technology easier than ever. Telcos like DISH and Comcast have made use of this branch of artificial intelligence to improve the user interface of their products. One example of this is allowing customers to navigate channels and save shows as “favorites” using only their natural speech. Visually impaired customers can make use of vocal relay features to hear titles and time-slots read back to them in response to spoken commands, widening the user base of the company.

5: Content customization


If you’re a Netflix user, I’m sure you’ve seen the “Recommended for you” and “Because you watched (insert show title)” recommendations. They used to be embarrassingly bad, but these suggestions have noticeably improved over the years.

Netflix has succeeded partly on the back of its recommendation engine, which tailors displayed content based on user behavior (in other words, your telecom data). Comcast is making moves towards a similar system, utilizing machine vision algorithms and user metadata to craft a personalized experience for the customer.

As companies begin to create increasingly precise user profiles, we are approaching the point of your telco knowing more about your behavior than you do, solely from the telecom data you put out. This can have a lot of advantages. One of the more obvious ones is being introduced to a new favorite show.

6: Variable data caps

Nobody likes data caps that restrict them, but paying for data you’re not actually using is nearly as bad. Some telecom companies are moving towards a system that calculates data caps based on user behavior and adjusts the price accordingly, in an effort to be as fair as possible. Whether or not you think corporations will use tiered pricing in a reasonable way depends on your opinion of said corporations. On paper, big data may be able to determine what kind of data consumer you are and adjust your data restrictions to fit your specific needs. This could potentially save you hundreds of dollars a year.

For as long as data could be extracted from phone calls, the telecommunications industry has been collecting your telecom data. “Call detail records” (CDRs) are a treasure trove of user information.

CDRs are accompanied by metadata which includes parameters such as the numbers of both speakers on the call, the route the call took to connect, any faulty conditions the call experienced, and more. Machine learning models are already working to translate CDRs into valuable insights on improving call quality and customer interactions.

It’s important to note that phone companies aren’t the only ones making use of this specific data. Since this metadata contains limited personal information, the Supreme Court ruled that it does not fall under the 4th Amendment, and as such, CDRs are used by law enforcement almost as much as by telcos.


Sabrina Dominguez: Sabrina holds a B.S. in Business Administration with a specialization in Marketing Management from Central Washington University. She has a passion for Search engine optimization and marketing.

James Kennedy: James holds a B.A. in Biology with a Creative Writing minor from Whitman College. He is a lifelong writer with a curiosity for the sciences.

This is the first part in a series identifying the practical uses of data science in various industries. Stay tuned for the second part, which will cover data in the healthcare sector.


Usman Shahid
| April 8, 2020

In the first part of this introductory series to chatbots, we talk about what this revolutionary technology is and why it has suddenly become so popular.

It took less than 24 hours of interaction with humans for an innocent, self-learning AI chatbot to turn into a chaotic, racist Nazi.

In March 2016, Microsoft unveiled Tay, a Twitter-based, friendly, self-learning chatbot modeled to behave like a teenage girl. The AI chatbot was supposed to be an experiment in “conversational understanding”, as described by Microsoft. The bot was designed to learn from interacting with people online through casual conversation, slowly developing its personality.

What Microsoft didn’t consider, however, was the effect of negative inputs on Tay’s learning. Tay started off by declaring “humans are super cool” and that it was “a nice person”. Unfortunately, the conversations didn’t stay casual for too long.

In less than 24 hours Tay was tweeting racist, sexist and extremely inflammatory remarks after learning from all sorts of misogynistic, racist garbage tweeted at it by internet trolls.

This entire experiment, despite becoming a proper PR disaster for Microsoft, proved to be an excellent study into the inherently negative human bias and its effect on self-learning Artificial Intelligence.

So, what are Chatbots?

A chatbot is specialized software that allows conversational interaction between a computer and a human. Modern chatbots are versatile enough to carry out complete conversations with their human users and even carry out tasks given during conversations.

Having become mainstream because of personal assistants from the likes of Google, Amazon, and Apple, chatbots have become a vital part of our everyday lives whether we realize it or not.

Why the sudden popularity?

The use of chatbots has skyrocketed recently. They have found a strong foothold in almost every task that requires text-based public dealing. They have become so critical in the customer support industry, for example, that almost 25% of all customer service operations are expected to use them by 2020.

Use of Chatbots among Service Organizations (Source)

This is mainly because people have all but moved on to chat as the primary mode of communication. Couple that with the huge number of conversational platforms (Skype, WhatsApp, Slack, Kik, etc.) available, and it makes complete sense to use AI and the cloud to connect with people.

At the other end of the support chain, businesses love chatbots because they’re available 24×7, have near-immediate response times and are very easy to scale without the huge human resource bill that normally comes with having a decent customer support operations team.

Outside of business environments, smart virtual assistants dominate almost every aspect of modern life. We depend on these smart assistants for everything, from controlling our smart homes to helping us manage our day-to-day tasks. They have slowly become a vital part of our lives, and their usefulness will only increase as they keep becoming smarter.

Types of Chatbots

Chatbots can be broadly classified into two different types:

Rule-Based Chatbots

The very first bots to see the light of day, rule-based chatbots relied on pattern-matching methodologies to ‘guess’ appropriate responses from an existing database. These bots started with the release of ELIZA in 1966 and continued till around 2001 with the release of SmarterChild developed by ActiveBuddy.

Welcome screen of ELIZA

The simplest rule-based chatbots have one-to-one tables of inputs and their responses. These bots are extremely limited and can only respond to queries if they are an exact match with the inputs defined in their database. This means the conversation can only follow a number of predefined flows. In a lot of cases, the chatbot doesn’t even allow users to type in queries, relying instead on preset inputs that the bot understands.

This doesn’t necessarily limit their use though. Rule-based Chatbots are widely used in modern businesses for customer support tasks. A Customer Support Chatbot has an extremely limited job description. A customer support chatbot for a bank, for example, would need to answer some operational queries about the bank (timings, branch locations) and complete some basic tasks (authenticate users, block stolen credit cards, activate new credit cards, register complaints).

In almost all of these cases, the conversation would follow a pattern. The flow of conversation, once defined, would stay mostly the same for a majority of the users. The small number of customers who need more specialized support could be forwarded to a human agent.
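The one-to-one lookup table described above, applied to the bank example, can be sketched in a few lines of Python. The questions, answers, and the name rule_based_reply are hypothetical, chosen only to illustrate exact-match behavior and the hand-off to a human agent.

```python
# One-to-one table of known inputs and their canned responses (illustrative data).
RESPONSES = {
    "what are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
    "where is the nearest branch?": "Please share your city and I will look it up.",
}
FALLBACK = "Sorry, I did not understand that. Let me connect you to a human agent."


def rule_based_reply(user_input: str) -> str:
    """Look the normalized input up in the table; anything else gets the fallback."""
    return RESPONSES.get(user_input.strip().lower(), FALLBACK)


print(rule_based_reply("What are your opening hours?"))
print(rule_based_reply("When do you open?"))
```

The second query means the same thing as the first but is not an exact match, so it falls through to the fallback. That is the limitation the text describes.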

A lot of modern customer-facing chatbots are AI-based Chatbots that use Retrieval based Models (which we’ll be discussing below). They are primarily rule-based but employ some form of Artificial Intelligence (AI) to help them understand the flow of human conversations.

AI-based chatbots

These are a relatively newer class of chatbots, having come out after the proliferation of artificial intelligence in recent years. These bots (like Microsoft’s Tay) learn by being trained on conversational datasets, instead of having hard-coded rules like their rule-based kin.

AI-based chatbots are based on complex machine learning models that enable them to self-learn. These types of chatbots can be broadly classified into two main types depending on the types of models they use.

1.    Retrieval-based models

As their name suggests, Chatbots using retrieval-based models are provided with a database of answers and are trained to retrieve the most relevant answer based on the input question. These bots already have a provided list of responses and are trained to rank each response based on the input/question. They cannot generate their own answers but with an extensive database of answers and proper training, they can be very productive and useful.
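The ranking step can be illustrated with a toy scorer. Real retrieval-based bots use trained models to rank responses. The sketch below swaps in plain word overlap (Jaccard similarity) as a stand-in, and all names and data in it are hypothetical.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words, ignoring basic punctuation."""
    return set(text.lower().replace("?", " ").replace(".", " ").replace(",", " ").split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_answers(question: str, qa_pairs: Dict[str, str]) -> List[Tuple[float, str]]:
    """Score every stored answer by how similar its stored question is to the input."""
    q_tokens = tokenize(question)
    scored = [
        (jaccard(q_tokens, tokenize(known_q)), answer)
        for known_q, answer in qa_pairs.items()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


QA_PAIRS = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How do I track my order?": "Open 'My orders' and select the order to see its status.",
}
best_score, best_answer = rank_answers("I need to reset my password", QA_PAIRS)[0]
print(best_score, best_answer)
```

Unlike an exact-match table, the bot can now respond to a rephrased question, but it still only returns answers that already exist in its database.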

Usually easier to develop and customize, retrieval-based chatbots are mostly used in customer support and feedback applications where the conversation is limited to a topic (either a product, a service, or an entity).

2.     Generative models

Generative models, unlike Retrieval based models, can generate their own responses by analyzing the input word by word to understand the query. These models are more ‘human’ during their interactions but at the same time also more prone to errors as they need to build sentence responses themselves.

Chatbots based on Generative Models are quite complex to build and are usually overkill for customer-facing applications. They are mostly used in applications where conversations are expected to be general and not limited to a specific topic. Take Google Assistant as an example. The Assistant is an always-listening chatbot that can answer questions, tell jokes and carry out very ‘human’ conversations. The one thing it can’t do? Provide customer support for Google products.

Google Assistant

Modern Virtual Assistants are a very good example of AI-based Chatbots

History of chatbots

History of Chatbots Infographic

Modern chatbots: Where are they used?

Customer services

The use of chatbots has been growing exponentially in the Customer Services industry. The chatbot market is projected to grow from $2.6 billion in 2019 to $9.4 billion by 2024. This really isn’t surprising when you look at the immense benefits chatbots bring to businesses. According to a study by IBM, chatbots can reduce customer service costs by up to 30%. Couple that with customers being open to interacting with bots for support and purchases, and it’s a win-win scenario for both parties involved.

In the customer support industry, chatbots are mostly used to automate redundant queries that would normally be handled by a human agent. Businesses are also starting to use them to automate order booking. The most successful example is Pizza Hut’s automated ordering platform.


Despite not being a substitute for healthcare professionals, chatbots are gaining popularity in the healthcare industry. They are mostly used as self-care assistants, helping patients manage their medications and track and monitor their fitness.

Financial assistants

Financial chatbots usually come bundled with apps from leading banks. Once linked to your bank account, they can extend the functionality of the app by providing you with a conversational (text or voice) interface to your bank. Besides these, there are quite a few financial assistant chatbots available. These are able to track your expenses, budget your resources and help you manage your finances. Charlie is a very good example of a financial assistant. The chatbot is designed to help you budget your expenses and track your finances so that you end up saving more.


Chatbots have become a very popular way of interacting with the modern smart home. These bots are, at their core, complex. They don’t need to have contextual awareness but need to be trained properly to be able to extract an actionable command from an input statement. This is not always an easy task as the chatbot is required to understand the flow of natural language. Modern virtual assistants (such as the Google Assistant, Alexa, Siri) handle these tasks quite well and have become the de facto standard to provide a voice or text-based interface to a smart home.

Tools for building intelligent chatbots

Building a chatbot as powerful as the virtual assistants from Google and Amazon is an almost impossible task. These companies have been able to achieve this feat after spending years and billions of dollars in research, something that not everyone with a use for a chatbot can afford.

Luckily, almost every player in the tech market (including Google and Amazon) allows businesses to buy their technology platforms to design customized chatbots for their own use. These platforms have pre-trained language models and easy-to-use interfaces that make it extremely easy for new users to set up and deploy customized chatbots in no time. If that wasn’t good enough, almost all of these platforms allow businesses to push their custom chatbot apps to the Google Assistant or Amazon Alexa and have them instantly be available to millions of new users.

The most popular of these platforms are:

1.     Google DialogFlow

2.     Amazon Lex

3.     IBM Watson

4.     Microsoft Azure Bot

Coming up next

Now that we’re familiar with the basics of chatbots, we’ll be going into more detail about how to build them. In the second blog of the series, we’ll be talking about how to create a simple Rule-based chatbot in Python. Stay tuned!
