GPT-3.5 and other large language models (LLMs) have transformed natural language processing (NLP). Trained on massive datasets, LLMs can generate text that is both coherent and relevant to the context, making them invaluable for a wide range of applications.
Learning about LLMs is essential in today’s fast-changing technological landscape. These models are at the forefront of AI and NLP research, and understanding their capabilities and limitations can empower people in diverse fields.
This blog lists the steps and several tutorials that can help you get started with large language models. From understanding how LLMs work to building your own ChatGPT, this roadmap covers it all.
Step 1: Understand the applications of large language models
Building a large language model application on custom data can improve your business in a number of ways, because LLMs can be tailored to your specific needs. For example, you could train a custom LLM on your customer data to improve your customer service experience.
The talk below gives an overview of real-world applications of large language models and how these models can assist with routine and business activities.
Step 2: Introduction to fundamentals and architectures of LLM applications
Applications like Bard, ChatGPT, Midjourney, and DALL-E have made their way into tasks like content generation and summarization. However, many tasks present inherent challenges that require a deeper understanding of trade-offs such as latency, accuracy, and consistency of responses.
Any serious application of LLMs requires an understanding of the nuances of how LLMs work, including embeddings, vector databases, retrieval-augmented generation (RAG), orchestration frameworks, and more.
This talk will introduce you to the fundamentals of large language models and their emerging application architectures. It is perfect for anyone who wants to learn more about large language models and how to use them to build real-world applications.
Step 3: Understanding vector similarity search
Traditional keyword-based methods have their limitations, leaving us searching for a better way. But what if we could use deep learning to revolutionize search?
Imagine representing data as vectors, where the distance between vectors reflects similarity, and using Vector Similarity Search algorithms to search billions of vectors in milliseconds. It’s the future of search, and it can transform text, multimedia, images, recommendations, and more.
The challenge in search today is indexing billions of entries, which makes it vital to learn about vector similarity search. The talk below will help you learn how to incorporate vector search and vector databases into your own applications to harness deep learning insights at scale.
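To make the idea concrete, here is a minimal, self-contained sketch of vector similarity search: each item is represented as a vector, and the closest items to a query are found with cosine similarity. The toy vectors and labels are purely illustrative; production systems use learned embeddings and an approximate-nearest-neighbour index such as FAISS or a vector database.

import numpy as np

# Toy "embeddings": one vector per document (illustrative values only)
doc_vectors = np.array([
    [0.9, 0.1, 0.0],   # "refund policy"
    [0.1, 0.8, 0.1],   # "shipping times"
    [0.0, 0.2, 0.9],   # "product warranty"
])
labels = ["refund policy", "shipping times", "product warranty"]

def cosine_top_k(query_vec, vectors, k=2):
    # Normalise, then rank by cosine similarity (dot product of unit vectors)
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    top = np.argsort(-scores)[:k]
    return [(labels[i], float(scores[i])) for i in top]

query = np.array([0.85, 0.15, 0.05])  # pretend this encodes "money back"
print(cosine_top_k(query, doc_vectors))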
Step 4: Explore the power of embedding with vector search
The total amount of digital data generated worldwide is increasing at a rapid rate. Simultaneously, approximately 80% (and growing) of this newly generated data is unstructured data—data that does not conform to a table- or object-based model.
Examples of unstructured data include text, images, protein structures, geospatial information, and IoT data streams. Despite this, the vast majority of companies and organizations do not have a way of storing and analyzing these increasingly large quantities of unstructured data.
Embeddings, which are high-dimensional, dense vectors that represent the semantic content of unstructured data, can remedy this issue. That makes them important to learn about.
The talk below will provide a high-level overview of embeddings, discuss best practices around embedding generation and usage, build two systems (semantic text search and reverse image search), and show how to put the application into production using Milvus.
Step 5: Discover the key challenges in building LLM applications
As enterprises move beyond ChatGPT, Bard, and ‘demo applications’ of large language models, product leaders and engineers are running into challenges. The magical experience we observe on content generation and summarization tasks using ChatGPT is not replicated on custom LLM applications built on enterprise data.
Enterprise LLM applications are easy to imagine and build a demo of, but challenging to turn into real business applications. The complexity of datasets, training costs, token usage costs, response latency, context limits, prompt fragility, and repeatability are some of the problems faced during product development.
Delve deeper into these challenges with the talk below:
Step 6: Building Your Own ChatGPT
Learn how to build your own ChatGPT or a custom large language model using AI frameworks like LlamaIndex, LangChain, and more. Here are a few talks that can help you get started:
Step 7: Learn about Retrieval Augmented Generation (RAG)
Learn the common design patterns for LLM applications, especially the retrieval-augmented generation (RAG) framework: what RAG is and how it works, how to use vector databases and knowledge graphs to enhance LLM performance, and how to prioritize and implement LLM applications in your business.
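As a minimal illustration of the RAG loop (retrieve relevant context, then generate an answer grounded in it), here is a sketch that reuses the cosine-similarity idea from Step 3. The bag-of-characters embedder and the lambda standing in for the LLM are placeholders for whatever embedding model and LLM you actually use.

import numpy as np

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "All products include a one-year limited warranty.",
]

def embed(text):
    # Stand-in embedder: a bag-of-characters vector. Real systems use a learned
    # embedding model (e.g. sentence-transformers or an embeddings API) here.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(question, docs, k=2):
    # Rank documents by cosine similarity to the question embedding
    q = embed(question)
    return sorted(docs, key=lambda d: -float(embed(d) @ q))[:k]

def answer_with_rag(question, docs, llm):
    # Build a grounded prompt from the retrieved context and ask the LLM
    context = "\n".join(retrieve(question, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

# `llm` is any callable mapping a prompt string to a completion string;
# the lambda below just echoes part of the prompt so the sketch runs on its own.
print(answer_with_rag("How long do refunds take?", documents, llm=lambda p: p[:200]))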
The discussion below will not only inspire organizational leaders to reimagine their data strategies in the face of LLMs and generative AI but also empower technical architects and engineers with practical insights and methodologies.
Step 8: Understanding AI observability
AI observability is the ability to monitor and understand the behavior of AI systems. It is essential for responsible AI, as it helps to ensure that AI systems are safe, reliable, and aligned with human values.
The talk below will discuss the importance of AI observability for responsible AI and offer fresh insights for technical architects, engineers, and organizational leaders seeking to leverage Large Language Model applications and generative AI through AI observability.
Step 9: Prevent large language model hallucinations
It is important to evaluate user interactions to monitor prompts and responses, configure acceptable limits to flag things like malicious prompts, toxic responses, LLM hallucinations, and jailbreak attempts, and set up monitors and alerts to help prevent undesirable behaviour. Tools like WhyLabs and Hugging Face play a vital role here.
The talk below uses Hugging Face and LangKit to effectively monitor machine learning models and LLMs like GPT from OpenAI. The session will equip you with the knowledge and skills to use LangKit with Hugging Face models.
Step 10: Learn to fine-tune LLMs
Fine-tuning GPT-3.5 Turbo allows you to customize the model to your specific use case, improving performance on specialized tasks, achieving top-tier performance, enhancing steerability, and ensuring consistent output formatting. It is important to understand what fine-tuning is, why it matters for GPT-3.5 Turbo, how to fine-tune GPT-3.5 Turbo for specific use cases, and some of the best practices for doing so.
Whether you're a data scientist, machine learning engineer, or business user, the talk below will teach you everything you need to know about fine-tuning GPT-3.5 Turbo to achieve your goals and about using a fine-tuned GPT-3.5 Turbo model to solve a real-world problem.
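For orientation, here is a minimal sketch of kicking off a GPT-3.5 Turbo fine-tuning job with the OpenAI Python SDK. The file name and the training example shown in the comment are placeholders; the JSONL chat format is what the fine-tuning endpoint expects.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# training_data.jsonl (placeholder path) holds one chat example per line, e.g.:
# {"messages": [{"role": "system", "content": "You are a support agent."},
#               {"role": "user", "content": "Where is my order?"},
#               {"role": "assistant", "content": "Let me check that for you..."}]}
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)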
Step 11: Become a ChatGPT prompting expert
Learn advanced ChatGPT prompting techniques essential to upgrading your prompt engineering experience. Use ChatGPT prompts in all formats, from freeform to structured, to get the most out of large language models. Explore the latest research on prompting and discover advanced techniques like chain-of-thought, tree-of-thought, and skeleton prompts.
Explore scientific principles of research for data-driven prompt design and master prompt engineering to create effective prompts in all formats.
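As a small illustration of one of these techniques, the snippet below sends a chain-of-thought style prompt through the OpenAI chat API; the model name and the arithmetic question are placeholders for your own use case.

from openai import OpenAI

client = OpenAI()

# Chain-of-thought prompting: ask the model to reason step by step before answering
prompt = (
    "A bakery sold 14 cakes on Monday and twice as many on Tuesday. "
    "How many cakes did it sell in total? Let's think step by step."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)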
Start mastering LLMs for tasks that can ease up your business activities.
To learn more about large language models, check out this playlist; from tutorials to crash courses, it is your one-stop learning spot for LLMs and generative AI.
Converse with Your Data: Chatting with CSV Files Using Open-Source Tools
Explore a step-by-step journey in crafting dynamic chatbot experiences tailored to your CSV data using Gradio, LLAMA2, and Hugging Face on Google Colab.
“When diving into the world of Language Model usage, one often encounters barriers such as the necessity for a paid API or the need for a robust computing system when working with open-source Language Models (LLMs).
Eager to overcome these constraints, I embarked on a journey to develop a Gradio App using open-source tools completely.
Harnessing the power of the free Colab T4 GPU and an open-source LLM, this blog will guide you through the process, empowering you to effortlessly chat with your own CSV data, breaking free from the traditional limitations associated with LLMs.”
Prerequisites
A Hugging Face account to access open-source Llama 2 and embedding models (free sign up available if you don’t have one).
Access to LLAMA2 models, obtainable through this form (access is typically granted within a few hours).
A Google account for using Google Colab.
Once you have been granted access to Llama 2 models, visit the following link, select the checkbox shown in the image below, and hit ‘Submit’.
Setting up Google Colab environment
If you are running on Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4. Our code will require ~15 GB of GPU RAM.
Installing necessary libraries and dependencies
The following snippet streamlines the installation process, ensuring that all necessary components are readily available for our project.
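The exact package list is not reproduced in this post, so here is a minimal install cell assuming the stack used below (Transformers, Accelerate, LangChain, sentence-transformers, FAISS, and Gradio); pin versions as needed.

!pip install -q transformers accelerate langchain sentence-transformers faiss-gpu gradio  # faiss-cpu also works if the GPU wheel is unavailable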
Authenticating with Hugging Face
To integrate your Hugging Face token into Colab’s environment, follow these steps.
Execute the following code in a Colab cell:
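!huggingface-cli login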
After running the cell, a prompt will appear, requesting your Hugging Face token.
Obtain your Hugging Face token by navigating to the Hugging Face settings. Look for the “Access Token” tab, where you can easily copy your token.
Import relevant libraries
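The notebook relies on the following LangChain, Transformers, FAISS, and Gradio imports (reproduced here with spacing fixed; gradio is imported as gr to match the interface code later):

from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import transformers
import torch
import gradio as gr
import textwrap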
Initializing the Hugging Face pipeline
The first thing we need to do is initialize a text-generation pipeline with Hugging Face transformers. The pipeline requires a few components that we must initialize first:
An LLM, in this case it will be meta-llama/Llama-2-7b-chat-hf.
The respective tokenizer for the model.
We then initialize the model and move it to our CUDA-enabled GPU. On Colab, downloading and initializing the model can take 2-5 minutes.
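A minimal initialization sketch follows; loading the weights in float16 keeps the 7B model within the T4's ~15 GB of memory, and the generation settings are illustrative rather than taken from the original notebook.

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the T4 GPU
    device_map="auto",          # place the model on the available GPU
)

pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
)

llm = HuggingFacePipeline(pipeline=pipe)  # LangChain wrapper used by the QA chain later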
Load Hugging Face open-source embedding models
Embeddings are crucial for language models because they transform words or tokens into numerical vectors, enabling the model to understand and process them mathematically. In the context of LLMs, they matter for the reasons below (a loading sketch follows this list):
Semantic Representation: Embeddings encode semantic relationships, placing similar words close in vector space for the model to understand nuanced language context.
Numerical Input for Models: Transforming words into numerical vectors, embeddings provide a mathematical foundation for neural networks, ensuring effective processing within the model.
Dimensionality Reduction: Embeddings condense high-dimensional word representations, enhancing computational efficiency while preserving essential linguistic features.
Transfer Learning: Pre-trained embeddings capture general language patterns, facilitating knowledge transfer to specific tasks, boosting model performance on diverse datasets.
Contextual Information: Embeddings, considering adjacent words, capture contextual nuances, enabling Language Models to generate coherent and contextually relevant language.
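The original post does not name the embedding model, so the sketch below assumes the widely used sentence-transformers/all-MiniLM-L6-v2 checkpoint loaded through LangChain's HuggingFaceEmbeddings wrapper:

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # assumed model; swap in your own
    model_kwargs={"device": "cuda"},
)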
Load CSV data using LangChain CSV loader
The LangChain CSV loader loads CSV data with one document per row. For this demo, we are using an employee sample data CSV file uploaded to Colab's environment.
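A loading sketch, assuming the file was uploaded as employee_sample_data.csv (the actual file name in your Colab session may differ):

loader = CSVLoader(file_path="employee_sample_data.csv", encoding="utf-8")
documents = loader.load()  # one Document per CSV row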
Creating vectorstore
For this demonstration, we are going to use the FAISS vector store. Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search sets of vectors of any size, up to ones that may not fit in RAM, along with supporting code for evaluation and parameter tuning.
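Building the index from the loaded documents and the embedding model is a one-liner; the variable names follow the snippets above:

vectorstore = FAISS.from_documents(documents, embeddings)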
Initializing retrieval QA chain and testing sample query
We are now going to use LangChain's RetrievalQA chain, which combines the vector store with a question-answering chain to answer questions over the data.
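A sketch of the chain setup and the sample salary query described next; the "stuff" chain type and the exact query wording are assumptions consistent with the 500-character wrapping mentioned below.

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                     # put retrieved rows directly into the prompt
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)

result = qa_chain({"query": "What is the Annual Salary of Sophie Silva?"})
print(textwrap.fill(result["result"], width=500))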
The above code utilizes the RetrievalQA module to answer a specific query about the annual salary of Sophie Silva, including the retrieval of source documents. The result is then formatted for better readability by wrapping the text to a maximum width of 500 characters.
Building a Gradio App
Now we are going to merge the above code snippets to create a Gradio application.
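Below is a sketch of the complete app, reconstructed from the component descriptions that follow; the component labels, the pandas preview, and the file-path handling are assumptions.

import pandas as pd

def dataset_change(file):
    # Triggered when a new CSV is uploaded: rebuild the FAISS index and preview 5 rows
    global vectorstore
    path = file if isinstance(file, str) else file.name  # handle both Gradio file conventions
    documents = CSVLoader(file_path=path, encoding="utf-8").load()
    vectorstore = FAISS.from_documents(documents, embeddings)
    return pd.read_csv(path).head(5)

def main(dataset, question):
    # The uploaded file is received to match the interface wiring; retrieval uses the
    # index built in dataset_change. Answer the question and wrap the text for display.
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )
    result = qa_chain({"query": question})
    return textwrap.fill(result["result"], width=500)

with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            data = gr.File(label="Upload CSV file")
            qs = gr.Textbox(label="Ask a question about your data")
            submit_btn = gr.Button("Submit")
        with gr.Column():
            answer = gr.Textbox(label="Answer")
    with gr.Row():
        dataframe = gr.Dataframe(label="First 5 rows of the dataset")
    submit_btn.click(main, inputs=[data, qs], outputs=[answer])
    data.change(fn=dataset_change, inputs=data, outputs=[dataframe])
    gr.Examples([["What is the Annual Salary of Theodore Dinh?"], ["What is the Department of Parker James?"]], inputs=[qs])

demo.launch(debug=True)

The above code sets up a Gradio interface on Colab for a question-answering application using LLAMA2 and FAISS. Here's a brief overview: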
Function Definitions:
main: Takes a dataset and a question as input, initializes a RetrievalQA chain, retrieves the answer, and formats it for display.
dataset_change: Changes in the dataset trigger this function, loading the dataset, creating a FAISS vector store, and returning the first 5 rows of the dataset.
Gradio Interface Setup:
with gr.Blocks() as demo: Initializes a Gradio interface block.
with gr.Row(): and with gr.Column():: Defines the layout of the interface with file input, text input for the question, a button to submit the question, and a text box to display the answer.
with gr.Row(): and dataframe = gr.Dataframe(): Includes a row for displaying the first 5 rows of the dataset.
submit_btn.click(main, inputs=[data,qs], outputs=[answer]): Associates the main function with the click event of the submit button, taking inputs from the file and question input and updating the answer text box.
data.change(fn=dataset_change,inputs=data,outputs=[dataframe]): Calls the dataset_change function when the dataset changes, updating the dataframe display accordingly.
gr.Examples([[“What is the Annual Salary of Theodore Dinh?”], [“What is the Department of Parker James?”]], inputs=[qs]): Provides example questions for users to input.
Launching the Gradio Interface:
demo.launch(debug=True): Launches the Gradio interface in debug mode.
In summary, this code creates a user-friendly Gradio interface for interacting with a question-answering system. Users can input a CSV dataset, ask questions about the data, and receive answers displayed in real-time. The interface also showcases a sample dataset and questions for user guidance.
Output
Attached below are some screenshots of the app and the responses of the LLM. The process kicks off by uploading a CSV file, which is then passed through the embedding model to generate embeddings. Once this is done, the first 5 rows of the file are displayed for preview. The user can then input a question and hit ‘Submit’ to generate an answer.
Conclusion
In conclusion, this blog has demonstrated the empowerment of language models through the integration of LLAMA2, Gradio, and Hugging Face on Google Colab.
By overcoming the limitations of paid APIs and compute-intensive open-source models, we’ve successfully created a dynamic Gradio app for personalized interactions with CSV data. Leveraging LangChain question-answering chains and Hugging Face’s model integration, this hands-on guide enables users to build chatbots that comprehend and respond to their own datasets.
As technology evolves, this blog encourages readers to explore, experiment, and continue pushing the boundaries of what can be achieved in the realm of natural language processing.
Launching the Gradio Interface:
demo.launch(debug=True): Launches the Gradio interface in debug mode.
In summary, this code creates a user-friendly Gradio interface for interacting with a question-answering system. Users can input a CSV dataset, ask questions about the data, and receive answers displayed in real-time. The interface also showcases a sample dataset and questions for user guidance.
OUTPUT
Attached below are some screenshots of the app and the responses of LLM. The process kicks off by uploading a csv file, which is then passed through the embeddings model to generate embeddings. Once this process is done the first 5 rows of the file are displayed for preview. Now the user can input the question and Hit ‘Submit’ to generate answer.
CONCLUSION
In conclusion, this blog has demonstrated the empowerment of language models through the integration of LLAMA2, Gradio, and Hugging Face on Google Colab. By overcoming the limitations of paid APIs and compute-intensive open-source models, we’ve successfully created a dynamic Gradio app for personalized interactions with CSV data. Leveraging LangChain question-answering chains and Hugging Face’s model integration, this hands-on guide enables users to build chatbots that comprehend and respond to their own datasets.
As technology evolves, this blog encourages readers to explore, experiment, and continue pushing the boundaries of what can be achieved in the realm of natural language processing.
Converse with Your Data: Chatting with CSV Files Using Open-Source Tools
Explore a step-by-step journey in crafting dynamic chatbot experiences tailored to your CSV data using Gradio, LLAMA2, and Hugging Face on Google Colab
“When diving into the world of Language Model usage, one often encounters barriers such as the necessity for a paid API or the need for a robust computing system when working with open-source Language Models (LLMs). Eager to overcome these constraints, I embarked on a journey to develop a Gradio App using open-source tools completely. Harnessing the power of the free Colab T4 GPU and an open-source LLM, this blog will guide you through the process, empowering you to effortlessly chat with your own CSV data, breaking free from the traditional limitations associated with LLMs.”
PREREQUISITES
A Hugging Face account to access open-source Llama 2 and embedding models (free sign up available if you don’t have one).
Access to LLAMA2 models, obtainable through this form (access is typically granted within a few hours).
A Google account for using Google Colab.
Once you have been granted access to Llama 2 models visit the following link and select the checkbox shown in the image below and hit ‘Submit’.
SETTING UP GOOGLE COLAB ENVIRONMENT
If running on Google Colab you go to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4. Our code will require ~15GB of GPU RAM.
INSTALLING NECESSARY LIBRARIES AND DEPENDENCIES
The following snippet streamlines the installation process, ensuring that all necessary components are readily available for our project
To integrate your Hugging Face token into Colab’s environment, follow these steps.
Execute the following code in a Colab cell:
!huggingface–cli login
After running the cell, a prompt will appear, requesting your Hugging Face token.
Obtain your Hugging Face token by navigating to the Hugging Face settings. Look for the “Access Token” tab, where you can easily copy your token.
IMPORTING RELEVANT LIBRARIES
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
importtransformers
importtorch
importgradio
importtextwrap
INITIALIZING THE HUGGING FACE PIPELINE
The first thing we need to do is initialize a text-generation pipeline with Hugging Face transformers. The Pipeline requires three things that we must initialize first, those are:
An LLM, in this case it will be meta-llama/Llama-2-7b-chat-hf.
The respective tokenizer for the model.
We initialize the model and move it to our CUDA-enabled GPU. Using Colab this can take 2-5 minutes to download and initialize the model.
Embeddings are crucial for Language Model (LM) because they transform words or tokens into numerical vectors, enabling the model to understand and process them mathematically. In the context of LLMs:
Semantic Representation: Embeddings encode semantic relationships, placing similar words close in vector space for the model to understand nuanced language context.
Numerical Input for Models: Transforming words into numerical vectors, embeddings provide a mathematical foundation for neural networks, ensuring effective processing within the model.
Dimensionality Reduction: Embeddings condense high-dimensional word representations, enhancing computational efficiency while preserving essential linguistic features.
Transfer Learning: Pre-trained embeddings capture general language patterns, facilitating knowledge transfer to specific tasks, boosting model performance on diverse datasets.
Contextual Information: Embeddings, considering adjacent words, capture contextual nuances, enabling Language Models to generate coherent and contextually relevant language.
LangChain CSV loader loads csv data with a single row per document. For this demo we are using employee sample data csv file which is uploaded in colab’s environment.
For this demonstration, we are going to use FAISS vectorstore. Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
The above code utilizes the RetrievalQA module to answer a specific query about the annual salary of Sophie Silva, including the retrieval of source documents. The result is then formatted for better readability by wrapping the text to a maximum width of 500 characters.
BUILDING A GRADIO APP
Now we are going to merge the above code snippets to create a gradio application
gr.Examples([[“What is the Annual Salary of Theodore Dinh?”], [“What is the Department of Parker James?”]], inputs=[qs])
demo.launch(debug=True)
The above code sets up a Gradio interface on Colab for a question-answering application using LLAMA2 and FAISS. Here’s a brief overview:
Function Definitions:
main: Takes a dataset and a question as input, initializes a RetrievalQA chain, retrieves the answer, and formats it for display.
dataset_change: Changes in the dataset trigger this function, loading the dataset, creating a FAISS vector store, and returning the first 5 rows of the dataset.
Gradio Interface Setup:
with gr.Blocks() as demo: Initializes a Gradio interface block.
with gr.Row(): and with gr.Column():: Defines the layout of the interface with file input, text input for the question, a button to submit the question, and a text box to display the answer.
with gr.Row(): and dataframe = gr.Dataframe(): Includes a row for displaying the first 5 rows of the dataset.
submit_btn.click(main, inputs=[data,qs], outputs=[answer]): Associates the main function with the click event of the submit button, taking inputs from the file and question input and updating the answer text box.
data.change(fn=dataset_change,inputs=data,outputs=[dataframe]): Calls the dataset_change function when the dataset changes, updating the dataframe display accordingly.
gr.Examples([[“What is the Annual Salary of Theodore Dinh?”], [“What is the Department of Parker James?”]], inputs=[qs]): Provides example questions for users to input.
Launching the Gradio Interface:
demo.launch(debug=True): Launches the Gradio interface in debug mode.
In summary, this code creates a user-friendly Gradio interface for interacting with a question-answering system. Users can input a CSV dataset, ask questions about the data, and receive answers displayed in real-time. The interface also showcases a sample dataset and questions for user guidance.
OUTPUT
Attached below are some screenshots of the app and the responses of LLM. The process kicks off by uploading a csv file, which is then passed through the embeddings model to generate embeddings. Once this process is done the first 5 rows of the file are displayed for preview. Now the user can input the question and Hit ‘Submit’ to generate answer.
CONCLUSION
In conclusion, this blog has demonstrated the empowerment of language models through the integration of LLAMA2, Gradio, and Hugging Face on Google Colab. By overcoming the limitations of paid APIs and compute-intensive open-source models, we’ve successfully created a dynamic Gradio app for personalized interactions with CSV data. Leveraging LangChain question-answering chains and Hugging Face’s model integration, this hands-on guide enables users to build chatbots that comprehend and respond to their own datasets.
As technology evolves, this blog encourages readers to explore, experiment, and continue pushing the boundaries of what can be achieved in the realm of natural language processing.
Converse with Your Data: Chatting with CSV Files Using Open-Source Tools
Explore a step-by-step journey in crafting dynamic chatbot experiences tailored to your CSV data using Gradio, LLAMA2, and Hugging Face on Google Colab
“When diving into the world of Language Model usage, one often encounters barriers such as the necessity for a paid API or the need for a robust computing system when working with open-source Language Models (LLMs). Eager to overcome these constraints, I embarked on a journey to develop a Gradio App using open-source tools completely. Harnessing the power of the free Colab T4 GPU and an open-source LLM, this blog will guide you through the process, empowering you to effortlessly chat with your own CSV data, breaking free from the traditional limitations associated with LLMs.”
PREREQUISITES
A Hugging Face account to access open-source Llama 2 and embedding models (free sign up available if you don’t have one).
Access to LLAMA2 models, obtainable through this form (access is typically granted within a few hours).
A Google account for using Google Colab.
Once you have been granted access to Llama 2 models visit the following link and select the checkbox shown in the image below and hit ‘Submit’.
SETTING UP GOOGLE COLAB ENVIRONMENT
If running on Google Colab you go to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4. Our code will require ~15GB of GPU RAM.
INSTALLING NECESSARY LIBRARIES AND DEPENDENCIES
The following snippet streamlines the installation process, ensuring that all necessary components are readily available for our project
To integrate your Hugging Face token into Colab’s environment, follow these steps.
Execute the following code in a Colab cell:
!huggingface–cli login
After running the cell, a prompt will appear, requesting your Hugging Face token.
Obtain your Hugging Face token by navigating to the Hugging Face settings. Look for the “Access Token” tab, where you can easily copy your token.
IMPORTING RELEVANT LIBRARIES
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
importtransformers
importtorch
importgradio
importtextwrap
INITIALIZING THE HUGGING FACE PIPELINE
The first thing we need to do is initialize a text-generation pipeline with Hugging Face transformers. The Pipeline requires three things that we must initialize first, those are:
An LLM, in this case it will be meta-llama/Llama-2-7b-chat-hf.
The respective tokenizer for the model.
We initialize the model and move it to our CUDA-enabled GPU. Using Colab this can take 2-5 minutes to download and initialize the model.