Learn to build large language model applications: vector databases, langchain, fine tuning and prompt engineering. Learn more

Explore a step-by-step journey in crafting dynamic chatbot experiences tailored to your CSV data using Gradio, LLAMA2, and Hugging Face on Google Colab.  

“When diving into the world of Language Model usage, one often encounters barriers such as the necessity for a paid API or the need for a robust computing system when working with open-source Language Models (LLMs).

Eager to overcome these constraints, I embarked on a journey to develop a Gradio App using open-source tools completely.



Learn to build LLM applications


Harnessing the power of the free Colab T4 GPU and an open-source LLM, this blog will guide you through the process, empowering you to effortlessly chat with your own CSV data, breaking free from the traditional limitations associated with LLMs.” 


  • A Hugging Face account to access open-source Llama 2 and embedding models (free sign up available if you don’t have one). 
  • Access to LLAMA2 models, obtainable through this form (access is typically granted within a few hours). 
  • A Google account for using Google Colab. 


Once you have been granted access to Llama 2 models visit the following link and select the checkbox shown in the image below and hit ‘Submit’. 


Huggingface 1



Setting up Google Colab environment 

If running on Google Colab you go to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4. Our code will require ~15GB of GPU RAM. 


Google colab - 2 


Installing necessary libraries and dependencies

The following snippet streamlines the installation process, ensuring that all necessary components are readily available for our project 



Authenticating with HuggingFace

To integrate your Hugging Face token into Colab’s environment, follow these steps. 

  • Execute the following code in a Colab cell: 



  • After running the cell, a prompt will appear, requesting your Hugging Face token. 
  • Obtain your Hugging Face token by navigating to the Hugging Face settings. Look for the “Access Token” tab, where you can easily copy your token. 


Import relevant libraries



Initializing the HuggingFace pipeline

The first thing we need to do is initialize a text-generation pipeline with Hugging Face transformers. The Pipeline requires three things that we must initialize first, those are: 

  • An LLM, in this case it will be meta-llama/Llama-2-7b-chat-hf. 
  • The respective tokenizer for the model. 


Large language model bootcamp


We initialize the model and move it to our CUDA-enabled GPU. Using Colab this can take 2-5 minutes to download and initialize the model. 



Load HuggingFace open-source embeddings models

Embeddings are crucial for Language Model (LM) because they transform words or tokens into numerical vectors, enabling the model to understand and process them mathematically. In the context of LLMs: 

  • Semantic Representation: Embeddings encode semantic relationships, placing similar words close in vector space for the model to understand nuanced language context. 
  • Numerical Input for Models: Transforming words into numerical vectors, embeddings provide a mathematical foundation for neural networks, ensuring effective processing within the model. 
  • Dimensionality Reduction: Embeddings condense high-dimensional word representations, enhancing computational efficiency while preserving essential linguistic features. 
  • Transfer Learning: Pre-trained embeddings capture general language patterns, facilitating knowledge transfer to specific tasks, boosting model performance on diverse datasets. 
  • Contextual Information: Embeddings, considering adjacent words, capture contextual nuances, enabling Language Models to generate coherent and contextually relevant language. 




Load CSV data using LangChain CSV loader 

LangChain CSV loader loads csv data with a single row per document. For this demo we are using employee sample data csv file which is uploaded in colab’s environment. 




Creating vectorstore

For this demonstration, we are going to use FAISS vectorstore. Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. 



Initializing retrieval QA chain and testing sample query

We are now going to use Retrieval QA chain of LangChain which combines vector store with a question answering chain to do question answering. 

The above code utilizes the RetrievalQA module to answer a specific query about the annual salary of Sophie Silva, including the retrieval of source documents. The result is then formatted for better readability by wrapping the text to a maximum width of 500 characters. 



Building a Gradio App 

Now we are going to merge the above code snippets to create a gradio application 



Function Definitions: 

  • main: Takes a dataset and a question as input, initializes a RetrievalQA chain, retrieves the answer, and formats it for display. 
  • dataset_change: Changes in the dataset trigger this function, loading the dataset, creating a FAISS vector store, and returning the first 5 rows of the dataset. 


Gradio Interface Setup: 

  • with gr.Blocks() as demo: Initializes a Gradio interface block. 
  • with gr.Row(): and with gr.Column():: Defines the layout of the interface with file input, text input for the question, a button to submit the question, and a text box to display the answer. 
  • with gr.Row(): and dataframe = gr.Dataframe(): Includes a row for displaying the first 5 rows of the dataset. 
  • submit_btn.click(main, inputs=[data,qs], outputs=[answer]): Associates the main function with the click event of the submit button, taking inputs from the file and question input and updating the answer text box. 
  • data.change(fn=dataset_change,inputs=data,outputs=[dataframe]): Calls the dataset_change function when the dataset changes, updating the dataframe display accordingly. 
  • gr.Examples([[“What is the Annual Salary of Theodore Dinh?”], [“What is the Department of Parker James?”]], inputs=[qs]): Provides example questions for users to input. 


Launching the Gradio Interface: 

  • demo.launch(debug=True): Launches the Gradio interface in debug mode. 


In summary, this code creates a user-friendly Gradio interface for interacting with a question-answering system. Users can input a CSV dataset, ask questions about the data, and receive answers displayed in real-time. The interface also showcases a sample dataset and questions for user guidance. 



Attached below are some screenshots of the app and the responses of LLM. The process kicks off by uploading a csv file, which is then passed through the embeddings model to generate embeddings. Once this process is done the first 5 rows of the file are displayed for preview. Now the user can input the question and Hit ‘Submit’ to generate answer. 


LLM output LLM output LLM output LLM output LLM output LLM output



In conclusion, this blog has demonstrated the empowerment of language models through the integration of LLAMA2, Gradio, and Hugging Face on Google Colab.

By overcoming the limitations of paid APIs and compute-intensive open-source models, we’ve successfully created a dynamic Gradio app for personalized interactions with CSV data. Leveraging LangChain question-answering chains and Hugging Face’s model integration, this hands-on guide enables users to build chatbots that comprehend and respond to their own datasets. 

As technology evolves, this blog encourages readers to explore, experiment, and continue pushing the boundaries of what can be achieved in the realm of natural language processing.