
Explore a step-by-step journey in crafting dynamic chatbot experiences tailored to your CSV data using Gradio, LLAMA2, and Hugging Face on Google Colab.  

“When diving into the world of large language models (LLMs), one often runs into barriers such as the need for a paid API or, when working with open-source LLMs, the need for powerful hardware.

Eager to overcome these constraints, I set out to build a Gradio app entirely with open-source tools.

 

 

Harnessing the power of the free Colab T4 GPU and an open-source LLM, this blog will guide you through the process, empowering you to effortlessly chat with your own CSV data, breaking free from the traditional limitations associated with LLMs.” 

Prerequisites 

  • A Hugging Face account to access open-source Llama 2 and embedding models (free sign up available if you don’t have one). 
  • Access to LLAMA2 models, obtainable through this form (access is typically granted within a few hours). 
  • A Google account for using Google Colab. 

 

Once you have been granted access to the Llama 2 models, visit the following link, select the checkbox shown in the image below, and click ‘Submit’. 

 

Huggingface 1

 

 

Setting up Google Colab environment 

If you are running on Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4. The code requires about 15 GB of GPU RAM. 

 

Google colab - 2  

 

Installing necessary libraries and dependencies

The following snippet installs all the components needed for this project. 

 

 

Authenticating with HuggingFace

To integrate your Hugging Face token into Colab’s environment, follow these steps. 

  • Execute the following code in a Colab cell: 

 

 

  • After you run the cell, a prompt appears asking for your Hugging Face token. 
  • To get your token, open your Hugging Face settings and go to the “Access Tokens” tab, where you can copy it. 

 

Import relevant libraries

 

 

Initializing the HuggingFace pipeline

The first thing we need to do is initialize a text-generation pipeline with Hugging Face transformers. The pipeline requires the following components, which we must initialize first: 

  • An LLM, in this case it will be meta-llama/Llama-2-7b-chat-hf. 
  • The respective tokenizer for the model. 

 


 

We initialize the model and move it to our CUDA-enabled GPU. On Colab, downloading and initializing the model can take 2 to 5 minutes. 

  

 

Load HuggingFace open-source embeddings models

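In this app, an embedding model converts each CSV row and each user question into a vector so that the rows most relevant to a question can be retrieved. More generally, embeddings are crucial for language models (LMs) because they transform words or tokens into numerical vectors, enabling the model to understand and process them mathematically. In the context of LLMs: 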

  • Semantic Representation: Embeddings encode semantic relationships, placing similar words close in vector space for the model to understand nuanced language context. 
  • Numerical Input for Models: Transforming words into numerical vectors, embeddings provide a mathematical foundation for neural networks, ensuring effective processing within the model. 
  • Dimensionality Reduction: Embeddings condense high-dimensional word representations, enhancing computational efficiency while preserving essential linguistic features. 
  • Transfer Learning: Pre-trained embeddings capture general language patterns, facilitating knowledge transfer to specific tasks, boosting model performance on diverse datasets. 
  • Contextual Information: Embeddings, considering adjacent words, capture contextual nuances, enabling Language Models to generate coherent and contextually relevant language. 
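To make “similar words close in vector space” concrete, the sketch below compares a few vectors using cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors are made-up toy values for illustration, not output from the embedding model used in this tutorial; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: "salary" and "wage" are deliberately placed close together.
toy_embeddings = {
    "salary": [0.9, 0.1, 0.0],
    "wage": [0.8, 0.2, 0.1],
    "department": [0.1, 0.9, 0.3],
}

query = toy_embeddings["salary"]
for word, vector in toy_embeddings.items():
    print(f"{word:>10}: {cosine_similarity(query, vector):.3f}")
```

A retriever ranks stored vectors by a score like this one, which is why a question about salaries tends to pull back rows that mention salaries.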

 

 

 

Load CSV data using LangChain CSV loader 

LangChain's CSV loader loads CSV data with one row per document. For this demo we use a sample employee data CSV file that has been uploaded to the Colab environment. 
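The “one row per document” idea can be sketched with the standard library alone. This is not LangChain's code: the Document class and rows_to_documents function below are hypothetical stand-ins, and the “column: value” text layout with source/row metadata is our assumption of roughly what LangChain's CSV loader produces, so check the LangChain documentation for your version. The sample rows are invented.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import TextIO

@dataclass
class Document:
    """Hypothetical stand-in for a loader document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def rows_to_documents(csv_file: TextIO, source: str) -> list[Document]:
    """Turn every CSV row into one Document made of 'column: value' lines."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(csv_file)):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(Document(content, {"source": source, "row": row_number}))
    return documents

# Invented sample data; the tutorial uses a real employee CSV file.
sample = io.StringIO(
    "Full Name,Department,Annual Salary\n"
    "Jane Doe,IT,85000\n"
    "John Roe,Finance,72000\n"
)
docs = rows_to_documents(sample, source="employees.csv")
print(len(docs))
print(docs[0].page_content)
```

Keeping each row as its own document means a retrieved chunk always holds one complete employee record, with the column names attached to every value.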

 

 

 

Creating vectorstore

For this demonstration, we use a FAISS vector store. Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. 
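Conceptually, a vector store answers one question: which stored vectors lie closest to the query vector? The pure-Python sketch below answers it by brute force, comparing the query with every stored vector. It illustrates the idea only and does not use FAISS; FAISS performs this exact search much faster with its flat indexes, and also offers approximate indexes that trade a little accuracy for speed on very large collections. The function names and vectors are hypothetical.

```python
import heapq
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(stored: list[list[float]], query: list[float], k: int) -> list[tuple[float, int]]:
    """Exact k-nearest-neighbour search: score every stored vector, keep the k closest.

    Returns (distance, position) pairs, closest first.
    """
    scored = ((l2_distance(vector, query), position) for position, vector in enumerate(stored))
    return heapq.nsmallest(k, scored)

# Hypothetical 2-D "embeddings" of four stored CSV rows.
stored_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
for distance, position in nearest_neighbors(stored_vectors, [0.2, 0.1], k=2):
    print(f"row {position}: distance {distance:.3f}")
```

The cost of this approach grows linearly with the number of stored vectors, which is exactly the work FAISS is designed to speed up.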

 

 

Initializing retrieval QA chain and testing sample query

We now use LangChain's RetrievalQA chain, which combines the vector store with a question-answering chain: it retrieves the rows most relevant to a question and passes them to the LLM as context.

The above code uses RetrievalQA to answer a specific query about the annual salary of Sophie Silva and also returns the source documents. The result is then formatted for readability by wrapping the text at a maximum line width of 500 characters. 
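The core of a retrieval QA step can be sketched without any libraries: the retrieved rows are pasted (“stuffed”) into a single prompt together with the question, and the model's answer is wrapped for display. The prompt wording, the function names, and the sample rows below are hypothetical and differ from LangChain's own templates; textwrap.fill is the standard-library way to wrap text at a fixed width.

```python
import textwrap

# Hypothetical prompt wording; LangChain's built-in template is different.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_stuffed_prompt(question: str, retrieved_rows: list[str]) -> str:
    """Join every retrieved row into one context block for the LLM."""
    context = "\n\n".join(retrieved_rows)
    return PROMPT_TEMPLATE.format(context=context, question=question)

def format_answer(answer: str, width: int = 500) -> str:
    """Wrap the model's answer so that no line is longer than `width` characters."""
    return textwrap.fill(answer, width=width)

# Hypothetical retrieved rows for hypothetical employees.
rows = ["Full Name: Jane Doe\nDepartment: IT", "Full Name: John Roe\nDepartment: Finance"]
print(build_stuffed_prompt("What is the Department of Jane Doe?", rows))
```

Stuffing is the simplest strategy and works well when only a few short rows are retrieved; very long contexts can exceed the model's context window.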

 

 

Building a Gradio App 

Now we combine the code snippets above into a Gradio application. 

 

 

Function Definitions: 

  • main: Takes a dataset and a question as input, initializes a RetrievalQA chain, retrieves the answer, and formats it for display. 
  • dataset_change: Changes in the dataset trigger this function, loading the dataset, creating a FAISS vector store, and returning the first 5 rows of the dataset. 

 

Gradio Interface Setup: 

  • with gr.Blocks() as demo: Initializes a Gradio interface block. 
  • with gr.Row() and with gr.Column(): Defines the layout of the interface, with a file input, a text input for the question, a button to submit the question, and a text box to display the answer. 
  • with gr.Row() and dataframe = gr.Dataframe(): Includes a row for displaying the first 5 rows of the dataset. 
  • submit_btn.click(main, inputs=[data,qs], outputs=[answer]): Associates the main function with the click event of the submit button, taking inputs from the file and question input and updating the answer text box. 
  • data.change(fn=dataset_change,inputs=data,outputs=[dataframe]): Calls the dataset_change function when the dataset changes, updating the dataframe display accordingly. 
  • gr.Examples([["What is the Annual Salary of Theodore Dinh?"], ["What is the Department of Parker James?"]], inputs=[qs]): Provides example questions for users to input. 

 

Launching the Gradio Interface: 

  • demo.launch(debug=True): Launches the Gradio interface in debug mode. 

 

In summary, this code creates a user-friendly Gradio interface for interacting with a question-answering system. Users can input a CSV dataset, ask questions about the data, and receive answers displayed in real-time. The interface also showcases a sample dataset and questions for user guidance. 

 

Output 

Below are some screenshots of the app and the LLM's responses. The process starts with uploading a CSV file, whose contents are passed through the embedding model to generate embeddings. Once this is done, the first 5 rows of the file are displayed as a preview. The user can then type a question and click ‘Submit’ to generate an answer. 

 

LLM output

 

Conclusion 

In conclusion, this blog has demonstrated how Llama 2, Gradio, and Hugging Face can be integrated on Google Colab to put a language model to work on your own data.

By overcoming the limitations of paid APIs and compute-intensive open-source models, we’ve successfully created a dynamic Gradio app for personalized interactions with CSV data. Leveraging LangChain question-answering chains and Hugging Face’s model integration, this hands-on guide enables users to build chatbots that comprehend and respond to their own datasets. 

As technology evolves, this blog encourages readers to explore, experiment, and continue pushing the boundaries of what can be achieved in the realm of natural language processing.

 

In this step-by-step guide, learn how to deploy a web app for Gradio on Azure with Docker. This blog covers everything from Azure Container Registry to Azure Web Apps, with a step-by-step tutorial for beginners.

I was searching for ways to deploy a Gradio application on Azure, but there wasn’t much information to be found online. After some digging, I realized that I could use Docker to deploy custom Python web applications, which was perfect since I had neither the time nor the expertise to go through the “code” option on Azure. 

The process of deploying a web app begins by creating a Docker image, which contains all of the application’s code and its dependencies. This allows the application to be packaged and pushed to the Azure Container Registry, where it can be stored until needed. From there, it can be deployed to the Azure App Service, where it is run as a container and can be managed from the Azure Portal. In this portal, users can adjust the settings of their app, as well as grant access to roles and services when needed. 

Once everything is set up and the necessary permissions have been granted, the web app should run properly on Azure. Deploying a web app on Azure using Docker is an easy and efficient way to create and deploy applications, and can be a great solution for those who lack the coding skills to build a web app from scratch!

Comprehensive overview of creating a web app for Gradio

Gradio application 

Gradio is a Python library that allows users to create interactive demos and share them with others. It provides a high-level abstraction through the Interface class, while the Blocks API is used for designing web applications.

Blocks provide features like multiple data flows and demos, control over where components appear on the page, handling complex data flows, and the ability to update properties and visibility of components based on user interaction. With Gradio, users can create a web application that allows their users to interact with their machine learning model, API, or data science workflow. 

The two primary files in a Gradio Application are:

  1. App.py: This file contains the source code for the application.
  2. Requirements.txt: This file lists the Python libraries required for the source code to function properly.

Docker 

Docker is an open-source platform for automating the deployment, scaling, and management of applications as containers. It uses a container-based approach to package software, which enables applications to be isolated from each other, making it easier to deploy, run, and manage them in a variety of environments. 

A Docker container is a lightweight, standalone, and executable software package that includes everything needed to run a specific application, including the code, runtime, system tools, libraries, and settings. Containers are isolated from each other and the host operating system, making them ideal for deploying microservices and applications that have multiple components or dependencies. 

Docker also provides a centralized way to manage containers and share images, making it easier to collaborate on application development, testing, and deployment. With its growing ecosystem and user-friendly tools, Docker has become a popular choice for developers, system administrators, and organizations of all sizes. 

Azure Container Registry 

Azure Container Registry (ACR) is a fully managed, private Docker registry service provided by Microsoft as part of its Azure cloud platform. It allows you to store, manage, and deploy Docker containers in a secure and scalable way, making it an important tool for modern application development and deployment. 

With ACR, you can store your own custom images and use them in your applications, as well as manage and control access to them with role-based access control. Additionally, ACR integrates with other Azure services, such as Azure Kubernetes Service (AKS) and Azure DevOps, making it easy to deploy containers to production environments and manage the entire application lifecycle. 

ACR also provides features such as image signing and scanning, which helps ensure the security and compliance of your containers. You can also store multiple versions of images, allowing you to roll back to a previous version if necessary. 

Azure Web App 

Azure Web Apps is a fully managed platform for building, deploying, and scaling web applications and services. It is part of the Azure App Service, which is a collection of integrated services for building, deploying, and scaling modern web and mobile applications. 

With Azure Web Apps, you can host web applications written in a variety of programming languages, such as .NET, Java, PHP, Node.js, and Python. The platform automatically manages the infrastructure, including server resources, security, and availability, so that you can focus on writing code and delivering value to your customers. 

Azure Web Apps supports a variety of deployment options, including direct Git deployment, continuous integration and deployment with Visual Studio Team Services or GitHub, and deployment from Docker containers. It also provides built-in features such as custom domains, SSL certificates, and automatic scaling, making it easy to deliver high-performing, secure, and scalable web applications. 

A step-by-step guide to deploying a Gradio application on Azure using Docker

This guide assumes a foundational understanding of Azure and that Docker is installed on your desktop. Refer to the Mac, Windows, or Linux getting started instructions for Docker. 

Step 1: Create an Azure Container Registry resource 

Go to Azure Marketplace, search for ‘container registry’ and hit ‘Create’. 


Under the “Basics” tab, complete the required information and leave the other settings as the default. Then, click “Review + Create.” 

Web App for Gradio Step 1A

 

Step 2: Create a Web App resource in Azure 

In Azure Marketplace, search for “Web App”, select the appropriate resource as depicted in the image, and then click “Create”. 


 

Under the “Basics” tab, complete the required information, choose the appropriate pricing plan, and leave the other settings as the default. Then, click “Review + Create.”  

Web App for Gradio Step 2B

 

Web App for Gradio Step 2C

 

Upon completion of all deployments, the following three resources will be in your resource group. 

Web App for Gradio Step 2D

Step 3: Create a folder containing the “App.py” file and its corresponding “requirements.txt” file 

To begin, we will utilize an emotion detector application, the model for which can be found at https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion. 

APP.PY 

REQUIREMENTS.TXT 

Step 4: Launch Visual Studio Code and open the folder


Step 5: Launch Docker Desktop to start Docker. 


Step 6: Create a Dockerfile 

A Dockerfile is a script that contains instructions to build a Docker image. This file automates the process of setting up an environment, installing dependencies, copying files, and defining how to run the application. With a Dockerfile, developers can easily package their application and its dependencies into a Docker image, which can then be run as a container on any host with Docker installed. This makes it easy to distribute and run the application consistently in different environments. The following contents should be utilized in the Dockerfile: 

DOCKERFILE 


Step 7: Build and run a local Docker image 

Run the following commands in the VS Code terminal. 

1. docker build -t demo-gradio-app .

  • The “docker build” command builds a Docker image from a Dockerfile. 
  • The “-t demo-gradio-app” option names the image; you can optionally add a tag in the “name:tag” format. 
  • The final “.” specifies the build context, which is the current directory where the Dockerfile is located.

 

2. docker run -it -d --name my-app -p 7000:7000 demo-gradio-app 

  • The “docker run” command starts a new container based on a specified image. 
  • The “-it” option keeps standard input open (-i) and allocates a pseudo-terminal (-t) for the container. 
  • The “-d” option runs the container in the background (detached mode). 
  • The “--name my-app” option assigns a name to the container for easier management. 
  • The “-p 7000:7000” option maps a port on the host to a port inside the container, in this case mapping the host’s port 7000 to the container’s port 7000. 
  • “demo-gradio-app” is the name of the image to be used for the container. 

This command starts a new container named “my-app” from the “demo-gradio-app” image in the background, with port 7000 on the host mapped to port 7000 in the container. 

Web App for Gradio Step 7A

 

Web App for Gradio Step 7B

 

To view your local app, go to the Containers tab in Docker Desktop and click the link under Port. 

Web App for Gradio Step 7C

Step 8: Tag & Push the Image to Azure Container Registry 

First, enable ‘Admin user’ from the ‘Access Keys’ tab in Azure Container Registry. 


 

Log in to your container registry with the following command. The login server, username, and password are available on the ‘Access Keys’ tab from the previous step. 

docker login gradioappdemos.azurecr.io

Web App for Gradio Step 8B

 

Tag the image for uploading to your registry using the following command. 

 

docker tag demo-gradio-app gradioappdemos.azurecr.io/demo-gradio-app 

  • The command “docker tag demo-gradio-app gradioappdemos.azurecr.io/demo-gradio-app” is used to tag a Docker image. 
  • “docker tag” is the command used to create a new tag for a Docker image. 
  • “demo-gradio-app” is the source image name that you want to tag. 
  • “gradioappdemos.azurecr.io/demo-gradio-app” is the new image name with a repository name and optionally a tag in the “repository:tag” format. 
  • This command will create a new tag “gradioappdemos.azurecr.io/demo-gradio-app” for the “demo-gradio-app” image. This new tag can be used to reference the image in future Docker commands. 
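The naming rule behind these commands is mechanical: a registry image name is the registry's login server, a slash, the repository name, and an optional “:tag”. The hypothetical helper below only assembles that string; it can be handy when scripting pushes of versioned tags such as v1 instead of relying on the default.

```python
def image_reference(registry: str, repository: str, tag: str = "latest") -> str:
    """Build the full image name that `docker tag` and `docker push` use for a registry.

    Docker falls back to the "latest" tag when a name has no ":tag" part,
    so the untagged names in the commands above resolve to ":latest".
    """
    return f"{registry}/{repository}:{tag}"

print(image_reference("gradioappdemos.azurecr.io", "demo-gradio-app"))
# prints gradioappdemos.azurecr.io/demo-gradio-app:latest
print(image_reference("gradioappdemos.azurecr.io", "demo-gradio-app", "v1"))
# prints gradioappdemos.azurecr.io/demo-gradio-app:v1
```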

Push the image to your registry. 

docker push gradioappdemos.azurecr.io/demo-gradio-app 

  • “docker push” is the command used to upload a Docker image to a registry. 
  • “gradioappdemos.azurecr.io/demo-gradio-app” is the name of the image with the repository name and tag to be pushed. 
  • This command will push the Docker image “gradioappdemos.azurecr.io/demo-gradio-app” to the registry specified by the repository name. The registry is typically a place where Docker images are stored and distributed to others. 
Web App for Gradio Step 8C

 

In the Repository tab, you can observe the image that has been pushed. 

Web App for Gradio Step 8D

Step 9: Configure the Web App 

Under the ‘Deployment Center’ tab, fill in the registry settings then hit ‘Save’. 


 

In the Configuration tab, create a new application setting for the website port (WEBSITES_PORT) with the value 7000, matching the port specified in the app.py file, and then click ‘Save’. 
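Inside a container, the app must listen on all network interfaces (0.0.0.0) rather than Gradio's default of 127.0.0.1, and on the same port that App Service forwards traffic to. The sketch below assumes the application setting is named WEBSITES_PORT and that App Service exposes it to the container as an environment variable; server_settings is a hypothetical helper, while server_name and server_port are parameters of Gradio's launch().

```python
import os

def server_settings(default_port: int = 7000) -> dict:
    """Keyword arguments for Gradio's launch() when running inside a container.

    Reads the port from the WEBSITES_PORT environment variable (assumed to be
    set by App Service) and falls back to 7000 for local `docker run` tests.
    """
    port = int(os.environ.get("WEBSITES_PORT", default_port))
    # "0.0.0.0" accepts connections forwarded into the container;
    # 127.0.0.1 would only accept connections from inside the container itself.
    return {"server_name": "0.0.0.0", "server_port": port}

print(server_settings())
# In app.py this could be used as: demo.launch(**server_settings())
```

Falling back to 7000 keeps the local `docker run -p 7000:7000` test from Step 7 working without any extra configuration.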

Web App for Gradio Step 9B
Web App for Gradio Step 9C

 

Web App for Gradio Step 9D

 


Web App for Gradio Step 9E

 

After the image has been pulled and the container has started, you can find the web app URL on the Overview page. 

 

Web App for Gradio Step 9F

 

Web App for Gradio Step 9G

Step 10: Pushing Image to Docker Hub (Optional) 

Here are the steps to push a local Docker image to Docker Hub: 

  • Log in to your Docker Hub account using the following command: 

docker login

  • Tag the local image using the following command, replacing [username] with your Docker Hub username and [image_name] with the desired image name: 

docker tag [image_name] [username]/[image_name]

  • Push the image to Docker Hub using the following command: 

docker push [username]/[image_name] 

  • Verify that the image is now available in your Docker Hub repository by visiting https://hub.docker.com/ and checking your repositories. 
Web App for Gradio Step 10A

 

Web App for Gradio Step 10B

Wrapping it up

In conclusion, deploying a web application using Docker on Azure is an easy and efficient way to create and deploy applications. This method is suitable for those who lack the necessary coding skills to create a web app from scratch. Docker is an open-source platform for automating the deployment, scaling, and management of applications, as containers.

Azure Container Registry is a fully managed, private Docker registry service provided by Microsoft as part of its Azure cloud platform. Azure Web Apps is a fully managed platform for building, deploying, and scaling web applications and services. By following the step-by-step guide provided in this article, users can deploy a Gradio application on Azure using Docker.

Google OR-Tools is a software suite for optimization and constraint programming. It includes several optimization algorithms such as linear programming, mixed-integer programming, and constraint programming. These algorithms can be used to solve a wide range of problems, including scheduling problems, such as nurse scheduling.


A hands-on guide to collect and store twitter data for timeseries analysis 

“A couple of weeks back, I was working on a project in which I had to scrape tweets from Twitter, store them in a CSV file, and plot some graphs for time series analysis. I applied for access to the Twitter developer API, but unfortunately my request was not granted. So I started searching for Python libraries that would let me scrape tweets without the official Twitter API.

To my amazement, there were several libraries through which you can scrape tweets easily but for my project I found ‘Snscrape’ to be the best library, which met my requirements!” 

What is SNScrape? 

snscrape is a scraper for social networking services (SNS). It scrapes things like user profiles, hashtags, or searches and returns the discovered items, such as the relevant posts. 

 

Install Snscrape 

Snscrape requires Python 3.8 or higher. Its Python package dependencies are installed automatically when you install it. You can install it with either of the following commands: 

  • pip3 install snscrape 

  • pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git (Development Version) 

 

For this tutorial we use the development version of snscrape. Paste the second command into a command prompt (cmd), and make sure Git is installed on your system. 

 

Code walkthrough for scraping

Before starting, make sure you have the following Python libraries installed: 

  • Pandas 
  • Numpy 
  • Snscrape 
  • Tqdm 
  • Seaborn 
  • Matplotlib 

Importing Relevant Libraries 

To run the scraping program, first import these libraries: 

import pandas as pd 

import numpy as np 

import snscrape.modules.twitter as sntwitter 

import datetime 

from tqdm.notebook import tqdm_notebook 

import seaborn as sns 

import matplotlib.pyplot as plt 

sns.set_theme(style="whitegrid") 

 

 

Taking User Input 

To scrape tweets, you can provide filters such as the username, start date, or end date. We take the following user inputs, which are then used to build the snscrape query: 

  • Text: The query to be matched. (Optional) 
  • Username: A specific Twitter username, without the @. (Optional) 
  • Since: Start Date in this format yyyy-mm-dd. (Optional) 
  • Until: End Date in this format yyyy-mm-dd. (Optional) 
  • Count: Max number of tweets to retrieve. (Required) 
  • Retweet: Include or Exclude Retweets. (Required) 
  • Replies: Include or Exclude Replies. (Required) 

 

For this tutorial we used the following inputs: 

text = input('Enter query text to be matched (or leave it blank by pressing enter)') 

username = input('Enter specific username(s) from a twitter account without @ (or leave it blank by pressing enter): ') 

since = input('Enter startdate in this format yyyy-mm-dd (or leave it blank by pressing enter): ') 

until = input('Enter enddate in this format yyyy-mm-dd (or leave it blank by pressing enter): ') 

count = int(input('Enter max number of tweets or enter -1 to retrieve all possible tweets: ')) 

retweet = input('Exclude Retweets? (y/n): ') 

replies = input('Exclude Replies? (y/n): ') 
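The input() calls above accept any text, so a typo such as 2022/09/27 or “Y” only surfaces later as a failed or unexpected search. A small validation sketch follows; validate_date and normalize_yes_no are hypothetical helpers, and normalize_yes_no returns the lowercase 'y' or 'n' that the search function compares against, so it can be dropped in without other changes.

```python
import datetime

def validate_date(value: str) -> str:
    """Return the value unchanged if it is blank or a valid yyyy-mm-dd date.

    Raises ValueError for anything else, e.g. '2022/09/27'.
    """
    value = value.strip()
    if value:
        datetime.datetime.strptime(value, "%Y-%m-%d")  # raises ValueError on bad input
    return value

def normalize_yes_no(value: str) -> str:
    """Normalize a y/n answer to 'y' or 'n'; reject anything else."""
    answer = value.strip().lower()
    if answer not in ("y", "n"):
        raise ValueError(f"expected 'y' or 'n', got {value!r}")
    return answer

# Possible usage with the prompts above:
# since = validate_date(input('Enter startdate in this format yyyy-mm-dd: '))
# retweet = normalize_yes_no(input('Exclude Retweets? (y/n): '))
```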

 

Which fields can we scrape? 

Here is the list of fields which we can scrape using Snscrape Library. 

  • url: str 
  • date: datetime.datetime 
  • rawContent: str 
  • renderedContent: str 
  • id: int 
  • user: 'User' 
  • replyCount: int 
  • retweetCount: int 
  • likeCount: int 
  • quoteCount: int 
  • conversationId: int 
  • lang: str 
  • source: str 
  • sourceUrl: typing.Optional[str] = None 
  • sourceLabel: typing.Optional[str] = None 
  • links: typing.Optional[typing.List['TextLink']] = None 
  • media: typing.Optional[typing.List['Medium']] = None 
  • retweetedTweet: typing.Optional['Tweet'] = None 
  • quotedTweet: typing.Optional['Tweet'] = None 
  • inReplyToTweetId: typing.Optional[int] = None 
  • inReplyToUser: typing.Optional['User'] = None 
  • mentionedUsers: typing.Optional[typing.List['User']] = None 
  • coordinates: typing.Optional['Coordinates'] = None 
  • place: typing.Optional['Place'] = None 
  • hashtags: typing.Optional[typing.List[str]] = None 
  • cashtags: typing.Optional[typing.List[str]] = None 
  • card: typing.Optional['Card'] = None 

 

For this tutorial we will not scrape all the fields but a few relevant fields from the above list. 

The search function

Next, we define a search function that takes the following inputs as arguments and builds a query string to pass to snscrape's TwitterSearchScraper: 

  • Text 
  • Username 
  • Since 
  • Until 
  • Retweet 
  • Replies 

 

def search(text,username,since,until,retweet,replies): 

    global filename 

    q = text 

    if username!='': 

        q += f" from:{username}"     

    if until=='': 

        until = datetime.datetime.strftime(datetime.date.today(), '%Y-%m-%d') 

    q += f" until:{until}" 

    if since=='': 

        since = datetime.datetime.strftime(datetime.datetime.strptime(until, '%Y-%m-%d') -  

                                           datetime.timedelta(days=7), '%Y-%m-%d') 

    q += f" since:{since}" 

    if retweet == 'y': 

        q += f" exclude:retweets" 

    if replies == 'y': 

        q += f" exclude:replies" 

    if username!='' and text!='': 

        filename = f"{since}_{until}_{username}_{text}.csv" 

    elif username!="": 

        filename = f"{since}_{until}_{username}.csv" 

    else: 

        filename = f"{since}_{until}_{text}.csv" 

    print(filename) 

    return q 

 

Here we build the query string from a series of conditions. For example, if until (the end date) is empty, we set it to the current date before appending it to the query string; if since (the start date) is empty, we set it to seven days before the end date. Along with the query string, we build a filename string that will be used to name our CSV file. 
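To check what the search function produces without scraping anything, here is a variant of the same logic with two changes of ours: it returns the filename instead of setting a global variable, and it accepts today's date as a parameter so the output is reproducible. build_query is a hypothetical name, not part of snscrape.

```python
import datetime

def build_query(text, username, since, until, retweet, replies, today=None):
    """Return (query, filename) following the same rules as search() above."""
    today = today or datetime.date.today()
    q = text
    if username:
        q += f" from:{username}"
    if not until:
        until = today.strftime("%Y-%m-%d")  # default end date: today
    q += f" until:{until}"
    if not since:
        # default start date: seven days before the end date
        end = datetime.datetime.strptime(until, "%Y-%m-%d")
        since = (end - datetime.timedelta(days=7)).strftime("%Y-%m-%d")
    q += f" since:{since}"
    if retweet == "y":
        q += " exclude:retweets"
    if replies == "y":
        q += " exclude:replies"
    if username and text:
        filename = f"{since}_{until}_{username}_{text}.csv"
    elif username:
        filename = f"{since}_{until}_{username}.csv"
    else:
        filename = f"{since}_{until}_{text}.csv"
    return q, filename

query, filename = build_query("", "DataScienceDojo", "", "", "y", "y",
                              today=datetime.date(2022, 9, 27))
print(repr(query))
# prints ' from:DataScienceDojo until:2022-09-27 since:2022-09-20 exclude:retweets exclude:replies'
print(filename)
# prints 2022-09-20_2022-09-27_DataScienceDojo.csv
```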

 

 

Calling the Search Function and creating Dataframe 

 

q = search(text,username,since,until,retweet,replies) 

# Creating list to append tweet data  

tweets_list1 = [] 

 

# Using TwitterSearchScraper to scrape data and append tweets to list 

if count == -1: 

    for i,tweet in enumerate(tqdm_notebook(sntwitter.TwitterSearchScraper(q).get_items())): 

        tweets_list1.append([tweet.date, tweet.id, tweet.rawContent, tweet.user.username,tweet.lang, 

        tweet.hashtags,tweet.replyCount,tweet.retweetCount, tweet.likeCount,tweet.quoteCount,tweet.media]) 

else: 

    with tqdm_notebook(total=count) as pbar: 

        for i,tweet in enumerate(sntwitter.TwitterSearchScraper(q).get_items()): # iterate over the matching tweets 

            if i>=count: #number of tweets you want to scrape 

                break 

            tweets_list1.append([tweet.date, tweet.id, tweet.rawContent, tweet.user.username,tweet.lang,tweet.hashtags,tweet.replyCount, 

                                tweet.retweetCount,tweet.likeCount,tweet.quoteCount,tweet.media]) 

            pbar.update(1) 

# Creating a dataframe from the tweets list above  

tweets_df1 = pd.DataFrame(tweets_list1, columns=['DateTime', 'TweetId', 'Text', 'Username','Language', 

                                'Hashtags','ReplyCount','RetweetCount','LikeCount','QuoteCount','Media']) 

 

 

 

In this snippet we call the search function and store the query string in the variable ‘q’. Next, we define an empty list for the tweet data. If count is -1, the for loop iterates over every tweet the scraper returns.

The TwitterSearchScraper constructor takes the query string as an argument, and its get_items() method yields the matching tweets one at a time. Inside the for loop we append the scraped fields to tweets_list1. If a count is given, we break out of the loop once that many tweets have been collected. Finally, we create a pandas dataframe from this list, specifying the column names. 
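The two branches of the loop differ only in whether the stream of tweets is capped. A compact alternative sketch uses itertools.islice, where a stop value of None means “no limit”. fake_get_items is a hypothetical stand-in for TwitterSearchScraper(q).get_items() so the example runs offline, and take is a hypothetical helper name.

```python
import itertools
from typing import Iterable, Iterator

def fake_get_items() -> Iterator[dict]:
    """Hypothetical stand-in for a scraper's lazy, possibly endless stream of items."""
    for tweet_id in itertools.count(1):
        yield {"id": tweet_id, "rawContent": f"tweet number {tweet_id}"}

def take(items: Iterable, count: int) -> list:
    """Collect every item when count == -1, otherwise at most `count` items."""
    limit = None if count == -1 else count
    return list(itertools.islice(items, limit))

print([tweet["id"] for tweet in take(fake_get_items(), 3)])
```

Because islice stops pulling from the generator once the limit is reached, the scraper never fetches more pages than needed, just like the manual break in the original loop.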

 

tweets_df1.sort_values(by='DateTime',ascending=False) 
Data frame created using the pandas library

 

Data Preprocessing

Before saving the data frame in a csv file, we will first process the data, so that we can easily perform analysis on it. 

 

 

Data Description 

tweets_df1.info() 
Data frame created using the pandas library

 

Data Transformation 

Now we will add more columns to facilitate timeseries analysis 

tweets_df1['Hour'] = tweets_df1['DateTime'].dt.hour 

tweets_df1['Year'] = tweets_df1['DateTime'].dt.year   

tweets_df1['Month'] = tweets_df1['DateTime'].dt.month 

tweets_df1['MonthName'] = tweets_df1['DateTime'].dt.month_name() 

tweets_df1['MonthDay'] = tweets_df1['DateTime'].dt.day 

tweets_df1['DayName'] = tweets_df1['DateTime'].dt.day_name() 

tweets_df1['Week'] = tweets_df1['DateTime'].dt.isocalendar().week 
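One subtlety in these columns: Week follows the ISO 8601 calendar, so tweets from the first days of January can fall in week 52 or 53 of the previous year while Year already shows the new year. The standard-library sketch below derives the same fields for a single timestamp and shows the effect for 1 January 2021; calendar_features and its extra IsoYear field are hypothetical additions of ours.

```python
import datetime

def calendar_features(ts: datetime.datetime) -> dict:
    """Standard-library version of the derived columns above for one timestamp."""
    iso_year, iso_week, _ = ts.isocalendar()
    return {
        "Hour": ts.hour,
        "Year": ts.year,
        "Month": ts.month,
        "MonthName": ts.strftime("%B"),
        "MonthDay": ts.day,
        "DayName": ts.strftime("%A"),
        "Week": iso_week,
        "IsoYear": iso_year,  # extra field: the year that the ISO week belongs to
    }

print(calendar_features(datetime.datetime(2021, 1, 1, 9, 30)))
```

If weekly counts near New Year look odd, grouping by (IsoYear, Week) instead of (Year, Week) keeps each ISO week in one bucket.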

 

The DateTime column contains both the date and the time, so it is better to split them into separate columns. 

tweets_df1['Date'] = [d.date() for d in tweets_df1['DateTime']] 

tweets_df1['Time'] = [d.time() for d in tweets_df1['DateTime']] 

 

After splitting we will drop the DateTime column. 

tweets_df1.drop('DateTime',axis=1,inplace=True) 

tweets_df1 

 

Now that our data is prepared, we save the dataframe as a CSV file using df.to_csv(), which takes the filename as an input parameter; index=False prevents the row index from being written as an extra column. 

tweets_df1.to_csv(f"{filename}",index=False)

Visualizing timeseries data using barplot, lineplot, histplot and kdeplot 

It is time to visualize our prepared data so that we can find useful insights. First, we load the saved CSV into a dataframe using pandas' read_csv() function, which takes the filename as an input parameter. 

tweets = pd.read_csv("2018-01-01_2022-09-27_DataScienceDojo.csv") 

tweets 

 

Data frame created using the pandas library
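One detail to watch after the round trip through CSV: read_csv infers numeric columns such as Year and Hour, but the Date and Time columns come back as plain strings unless you parse them (for example with read_csv's parse_dates parameter). The sketch below reproduces this with the standard csv module and converts the strings back with date.fromisoformat and time.fromisoformat; the sample row is made up.

```python
import csv
import io
from datetime import date, time

# Write one row the way a CSV export serialises Date, Time and Hour values
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Date", "Time", "Hour"])
writer.writerow([date(2022, 9, 27), time(19, 30), 19])

# Reading it back yields plain strings for every field
buf.seek(0)
row = next(csv.DictReader(buf))
print(type(row["Date"]).__name__, row["Date"])  # str 2022-09-27

# Convert the strings back to typed values explicitly
parsed_date = date.fromisoformat(row["Date"])
parsed_time = time.fromisoformat(row["Time"])
hour = int(row["Hour"])
```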

 

Count by Year 

Seaborn's countplot function lets us plot the number of tweets per year. 

f, ax = plt.subplots(figsize=(15, 10)) 

sns.countplot(x= tweets['Year']) 

for p in ax.patches: 

    ax.annotate(int(p.get_height()), (p.get_x()+0.05, p.get_height()+20), fontsize = 12) 

 
Plot count of tweets – Bar graph

 

plt.figure(figsize=(15, 8)) 

 

ax=plt.subplot(221) 

sns.lineplot(tweets.Year.value_counts()) 

ax.set_xlabel("Year") 

ax.set_ylabel('Count') 

plt.xticks(np.arange(2018,2023,1)) 

 

plt.subplot(222) 

sns.histplot(x=tweets.Year, stat='count', binwidth=1, kde=True, discrete=True) 

plt.xticks(np.arange(2018,2023,1)) 

plt.grid() 

 

plt.subplot(223) 

sns.kdeplot(x=tweets.Year,fill=True) 

plt.xticks(np.arange(2018,2023,1)) 

plt.grid() 

 

plt.subplot(224) 

sns.kdeplot(x=tweets.Year,fill=True,bw_adjust=3) 

plt.xticks(np.arange(2018,2023,1)) 

plt.grid() 

 

plt.tight_layout() 

plt.show() 

 

Plot count of tweets – per year

 

Count by Month 

We will follow the same steps for count by month, by week, by day of month and by hour. 

 

f, ax = plt.subplots(figsize=(15, 10)) 

sns.countplot(x= tweets['Month']) 

for p in ax.patches: 

    ax.annotate(int(p.get_height()), (p.get_x()+0.05, p.get_height()+20), fontsize = 12) 

 
Monthly Tweet counts – chart

 

plt.figure(figsize=(15, 8)) 

 

ax=plt.subplot(221) 

sns.lineplot(tweets.Month.value_counts()) 

ax.set_xlabel("Month") 

ax.set_ylabel('Count') 

plt.xticks(np.arange(1,13,1)) 

 

plt.subplot(222) 

sns.histplot(x=tweets.Month, stat='count', binwidth=1, kde=True, discrete=True) 

plt.xticks(np.arange(1,13,1)) 

plt.grid() 

 

plt.subplot(223) 

sns.kdeplot(x=tweets.Month,fill=True) 

plt.xticks(np.arange(1,13,1)) 

plt.grid() 

 

plt.subplot(224) 

sns.kdeplot(x=tweets.Month,fill=True,bw_adjust=3) 

plt.xticks(np.arange(1,13,1)) 

plt.grid() 

 

plt.tight_layout() 

plt.show() 

 

Monthly tweets count chart

 

 

Count by Week 

f, ax = plt.subplots(figsize=(15, 10)) 

sns.countplot(x= tweets['Week']) 

for p in ax.patches: 

    ax.annotate(int(p.get_height()), (p.get_x()+0.005, p.get_height()+5), fontsize = 10) 

 

Weekly tweets count chart

 

 

plt.figure(figsize=(15, 8)) 

 

ax=plt.subplot(221) 

sns.lineplot(tweets.Week.value_counts()) 

ax.set_xlabel("Week") 

ax.set_ylabel('Count') 

 

plt.subplot(222) 

sns.histplot(x=tweets.Week, stat='count', binwidth=1, kde=True, discrete=True) 

plt.grid() 

 

plt.subplot(223) 

sns.kdeplot(x=tweets.Week,fill=True) 

plt.grid() 

 

plt.subplot(224) 

sns.kdeplot(x=tweets.Week,fill=True,bw_adjust=3) 

plt.grid() 

 

plt.tight_layout() 

plt.show()  

 

Weekly tweets count charts

 

 

Count by Day of Month 

f, ax = plt.subplots(figsize=(15, 10)) 

sns.countplot(x= tweets['MonthDay']) 

for p in ax.patches: 

    ax.annotate(int(p.get_height()), (p.get_x()+0.05, p.get_height()+5), fontsize = 12) 

 

 

Daily tweets count chart
plt.figure(figsize=(15, 8)) 

 

ax=plt.subplot(221) 

sns.lineplot(tweets.MonthDay.value_counts()) 

ax.set_xlabel("MonthDay") 

ax.set_ylabel('Count') 

 

plt.subplot(222) 

sns.histplot(x=tweets.MonthDay, stat='count', binwidth=1, kde=True, discrete=True) 

plt.grid() 

 

plt.subplot(223) 

sns.kdeplot(x=tweets.MonthDay,fill=True) 

plt.grid() 

 

plt.subplot(224) 

sns.kdeplot(x=tweets.MonthDay,fill=True,bw_adjust=3) 

plt.grid() 

 

plt.tight_layout() 

plt.show() 

 

 
Daily tweets count charts

 

 

 

 

 

 

 

Count by Hour 

f, ax = plt.subplots(figsize=(15, 10)) 

sns.countplot(x= tweets['Hour']) 

for p in ax.patches: 

    ax.annotate(int(p.get_height()), (p.get_x()+0.05, p.get_height()+20), fontsize = 12) 
Hourly tweets count chart

 

 

plt.figure(figsize=(15, 8)) 

 

ax=plt.subplot(221) 

sns.lineplot(tweets.Hour.value_counts()) 

ax.set_xlabel("Hour") 

ax.set_ylabel('Count') 

plt.xticks(np.arange(0,24,1)) 

 

plt.subplot(222) 

sns.histplot(x=tweets.Hour, stat='count', binwidth=1, kde=True, discrete=True) 

plt.xticks(np.arange(0,24,1)) 

plt.grid() 

 

plt.subplot(223) 

sns.kdeplot(x=tweets.Hour,fill=True) 

plt.xticks(np.arange(0,24,1)) 

plt.grid() 

 

plt.subplot(224) 

sns.kdeplot(x=tweets.Hour,fill=True,bw_adjust=3) 

#plt.xticks(np.arange(0,24,1)) 

plt.grid() 

 

plt.tight_layout() 

plt.show() 

 

Hourly tweets count charts

 

 

Conclusion 

From the time-series visualizations above, we can see that this account tweets most between 7 pm and 9 pm, while from 4 am to 1 pm the Twitter handle is quiet. Most of the tweets were posted in August, and the handle was not very active before 2021.  
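The peak hours read off the charts can also be checked numerically. Here is a minimal sketch using collections.Counter, assuming hours is a list such as tweets['Hour'].tolist(); the sample values are hypothetical, not the account's data. In pandas, tweets['Hour'].value_counts().head(3) gives the equivalent counts. Note that the hours are in whatever timezone the scraped timestamps use.

```python
from collections import Counter


def peak_hours(hours, top=3):
    # Return the `top` most frequent hours as (hour, count) pairs, highest count first
    return Counter(hours).most_common(top)


sample_hours = [19, 20, 19, 8, 21, 19, 20, 13]  # hypothetical values
print(peak_hours(sample_hours, top=2))  # [(19, 3), (20, 2)]
```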

In conclusion, we saw how to easily scrape tweets with snscrape without using the Twitter API. We then transformed the scraped data, stored it in a CSV file, and used that file for time-series visualization and analysis. We appreciate you following along with this hands-on guide, and we hope it makes it easy for you to get started on your next data science project. 


In this tutorial, you will learn how to create an attractive voice-controlled Python chatbot application with a small amount of code. We will first build a good-looking user interface with Python's built-in Tkinter library and then write a few small functions that carry out the chatbot's tasks. 

 

Here is a sneak peek of what we are going to create. 

 

Voice-controlled chatbot built in Python – Data Science Dojo

Before kicking off, you should have a basic idea of web scraping; if not, read the following article on Python web scraping. 

 

PRO-TIP: Join our 5-day instructor-led Python for Data Science training to deepen your learning.

 

Prerequisites for building a voice-controlled Python chatbot

Make sure you are using Python 3.8+ and that the following libraries are installed: 

  • pyttsx3 (a text-to-speech conversion library for Python) 
  • SpeechRecognition (a library for performing speech recognition) 
  • Requests (lets you send HTTP requests from Python) 
  • bs4 (Beautiful Soup, installed as beautifulsoup4, a library for scraping information from web pages) 
  • PyAudio (lets Python play and record audio; SpeechRecognition's Microphone class needs it) 

 

If you still run into installation or compatibility errors, try installing the following versions, which have been tested with this application: 

 

  • Python 3.10 
  • pyttsx3==2.90 
  • SpeechRecognition==3.8.1 
  • requests==2.28.1
  • beautifulsoup4==4.11.1 
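A small sketch to confirm which of the tested versions are actually installed, using importlib.metadata from the standard library (available since Python 3.8, matching the requirement above). The PINNED dictionary simply restates the list; PyAudio is left out because no tested version is given for it.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as tested with this application
PINNED = {
    "pyttsx3": "2.90",
    "SpeechRecognition": "3.8.1",
    "requests": "2.28.1",
    "beautifulsoup4": "4.11.1",
}


def check_versions(pinned):
    # Map each package to (wanted, installed); installed is None if the package is missing
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in check_versions(PINNED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```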

 

Now that everything is set up, it is time to get started. Open a new .py file, name it VoiceChatbot.py, and import the following libraries at the top of the file. 

 

from tkinter import *
import time
import datetime
import pyttsx3
import speech_recognition as sr
from threading import Thread
import requests
from bs4 import BeautifulSoup 

 

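The code is divided into a GUI section, built with Python's Tkinter library, and 7 functions: main_window, wishme, speak, transition, web_scraping, takecommand and shut_down. We start by declaring some global variables and initializing the text-to-speech engine and Tkinter, and then we create the windows and frames of the user interface. In the final VoiceChatbot.py, place the function definitions above the if __name__ == "__main__": block, because the block refers to them (for example, command=takecommand) when it runs. 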

 

The user interface 

This part of the code initializes the global variables and the text-to-speech engine, creates the root window, loads the images, and then displays the different frames. The program starts when the user clicks the first window, which shows the background image. 

 

if __name__ == "__main__": 

    #Global Variables 
    loading = None
    query = None
    flag = True
    flag2 = True

    #initializing text to speech and setting properties 
    engine = pyttsx3.init() 
    voices = engine.getProperty('voices') 
    # Windows voice selection: voices[1] is the second installed voice; use voices[0] if only one is installed
    engine.setProperty('voice', voices[1].id) 
    rate = engine.getProperty('rate') 
    # Speak slightly slower than the default rate
    engine.setProperty('rate', rate-10) 

 

    #creating root window first: Tkinter cannot create a PhotoImage before Tk() exists
    root=Tk() 
    root.title("Intelligent Chatbot") 
    root.geometry('1360x690+-5+0')
    root.configure(background='white') 

    #loading images 
    img1= PhotoImage(file='chatbot-image.png') 
    img2= PhotoImage(file='button-green.png') 
    img3= PhotoImage(file='icon.png') 
    img4= PhotoImage(file='terminal.png') 
    background_image=PhotoImage(file="last.png") 
    front_image = PhotoImage(file="front2.png") 

 

#Placing frame on root window and placing widgets on the frame 

    f = Frame(root,width = 1360, height = 690) 
    f.place(x=0,y=0) 
    f.tkraise() 

 

#first window which acts as a button containing the background image 

    okVar = IntVar() 
    btnOK = Button(f, image=front_image,command=lambda: okVar.set(1)) 
    btnOK.place(x=0,y=0) 
    f.wait_variable(okVar) 
    f.destroy()     
    background_label = Label(root, image=background_image) 
    background_label.place(x=0, y=0) 

 

#Frame that displays gif image 

    frames = [PhotoImage(file='chatgif.gif',format = 'gif -index %i' %(i)) for i in range(20)] 
    canvas = Canvas(root, width = 800, height = 596) 
    canvas.place(x=10,y=10) 
    canvas.create_image(0, 0, image=img1, anchor=NW) 

 

#Question button which calls the takecommand function 

    question_button = Button(root,image=img2, bd=0, command=takecommand) 
    question_button.place(x=200,y=625) 

 

#Right Terminal with vertical scroll 

    frame=Frame(root,width=500,height=596) 
    frame.place(x=825,y=10) 
    canvas2=Canvas(frame,bg='#FFFFFF',width=500,height=596,scrollregion=(0,0,500,900)) 
    vbar=Scrollbar(frame,orient=VERTICAL) 
    vbar.pack(side=RIGHT,fill=Y) 
    vbar.config(command=canvas2.yview) 
    canvas2.config(width=500,height=596, background="black") 
    canvas2.config(yscrollcommand=vbar.set) 
    canvas2.pack(side=LEFT,expand=True,fill=BOTH) 
    canvas2.create_image(0,0, image=img4, anchor="nw") 
    task = Thread(target=main_window, daemon=True)  # daemon: the process can exit if the window is closed directly 
    task.start() 
    root.mainloop() 

 

The main window function 

This is the first function, and it runs inside a thread. It first calls the wishme function to greet the user. It then repeatedly checks whether the query variable has been set. If it has, it checks its contents: if the query contains shutdown, quit, stop or goodbye, it calls the shut_down function and the program exits; otherwise, it passes the query to the web_scraping function and resets query to None. 

 

def main_window(): 
    global query 
    wishme() 
    while True: 
        if query is not None: 
            if 'shutdown' in query or 'quit' in query or 'stop' in query or 'goodbye' in query: 
                shut_down() 
                break 
            else: 
                web_scraping(query) 
                query = None 
        time.sleep(0.1)  # avoid spinning the CPU while waiting for the next question 
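main_window polls the global query variable. A common alternative is a queue.Queue: takecommand would put() the recognized text, and main_window would block on get() until a question arrives, so no polling loop is needed. This is a minimal sketch with hypothetical names (worker, is_stop_command, STOP_WORDS); the real web_scraping and shut_down calls are replaced by entries in a list so it runs without a microphone or window. Like the original, it uses substring matching, so "stop" also matches words such as "nonstop".

```python
import queue
import threading

STOP_WORDS = ("shutdown", "quit", "stop", "goodbye")


def is_stop_command(text):
    # Same keyword test as main_window, written with any()
    return any(word in text for word in STOP_WORDS)


def worker(questions, handled):
    # Blocks on get() instead of repeatedly checking a global variable
    while True:
        text = questions.get()
        if is_stop_command(text):
            handled.append("shut_down")
            break
        handled.append(f"web_scraping({text!r})")


questions = queue.Queue()
handled = []
t = threading.Thread(target=worker, args=(questions, handled))
t.start()
questions.put("what is python")
questions.put("goodbye")
t.join(timeout=5)
print(handled)
```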

 

The wish me function 

This function checks the current time, greets the user according to the hour of the day, and writes the greeting on the canvas. The greeting text is passed to the speak function, and the transition function runs at the same time to animate the bot image while the bot is speaking. Both functions run in their own threads so that speech and animation happen simultaneously. 

 

def wishme(): 
    hour = datetime.datetime.now().hour 
    if 0 <= hour < 12: 
        text = "Good Morning sir. I am Jarvis. How can I Serve you?" 
    elif 12 <= hour < 18: 
        text = "Good Afternoon sir. I am Jarvis. How can I Serve you?" 
    else: 
        text = "Good Evening sir. I am Jarvis. How can I Serve you?" 
    canvas2.create_text(10,10,anchor =NW , text=text,font=('Candara Light', -25,'bold italic'), fill="white",width=350) 
    p1=Thread(target=speak,args=(text,)) 
    p1.start() 
    p2 = Thread(target=transition) 
    p2.start() 
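Separating the choice of greeting from the Tkinter and pyttsx3 side effects makes it easy to check the hour boundaries used by wishme. Here is a sketch with a hypothetical helper, greeting_for, using the same ranges: before 12 is morning, 12 to 17 is afternoon, and 18 onward is evening.

```python
import datetime


def greeting_for(hour):
    # Same hour boundaries as wishme()
    if 0 <= hour < 12:
        part = "Morning"
    elif 12 <= hour < 18:
        part = "Afternoon"
    else:
        part = "Evening"
    return f"Good {part} sir. I am Jarvis. How can I serve you?"


print(greeting_for(datetime.datetime.now().hour))
```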

 

The speak function 

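This function converts text to speech using the pyttsx3 engine. When speaking finishes, it sets the global flag to False, which tells the transition function to stop animating. 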

def speak(text): 
    global flag 
    engine.say(text) 
    engine.runAndWait() 
    flag=False 

 

The transition function 

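The transition function animates the bot image by looping over the GIF frames and drawing each one on the canvas. The frames variable holds a list of PhotoImage objects, one per GIF frame, in order. As soon as speak sets flag to False, transition redraws the static image, resets flag to True and returns.  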

 

def transition(): 
    global img1 
    global flag 
    global flag2 
    global frames 
    global canvas 
    local_flag = False 
    for k in range(0,5000): 
        for frame in frames: 
            if flag == False: 
                canvas.create_image(0, 0, image=img1, anchor=NW) 
                canvas.update() 
                flag = True 
                return 
            else: 
                canvas.create_image(0, 0, image=frame, anchor=NW) 
                canvas.update() 
                time.sleep(0.1) 
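speak and transition coordinate through the shared global flag, which is reset in two places. The standard library's threading.Event is designed for this kind of one-way signal. Here is a minimal sketch, assuming hypothetical speak_stub and animate functions: the animation cycles through frame names until speaking finishes, then shows the idle image. In the real application the appended frame names would be canvas.create_image calls; Tkinter is generally safest to call from the thread running mainloop, so drawing is often scheduled there with root.after.

```python
import itertools
import threading
import time


def speak_stub(text, done):
    # Stand-in for speak(): pretend to talk, then signal completion
    time.sleep(0.3)
    done.set()


def animate(frames, done, shown, delay=0.1):
    # Cycle through frames until speaking is finished, then show the idle frame
    for frame in itertools.cycle(frames):
        if done.is_set():
            break
        shown.append(frame)
        time.sleep(delay)
    shown.append("idle")


done = threading.Event()
shown = []
speaker = threading.Thread(target=speak_stub, args=("hello", done))
animator = threading.Thread(target=animate, args=(["f0", "f1", "f2"], done, shown))
speaker.start()
animator.start()
speaker.join()
animator.join()
print(shown[-1])  # idle
```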

 

The web scraping function 

This function is the heart of the application. The user's question is searched on Google using Python's requests library, and the BeautifulSoup library parses the HTML of the results page and looks for an answer in four particular divs. These class names come from Google's markup at the time of writing and change over time, so they may need updating. If the page contains none of the four divs, the function looks for a Wikipedia link among the results and uses the first non-empty paragraph of that article; if that also fails, the bot apologizes.  

 

from urllib.parse import quote_plus

def web_scraping(qs): 
    global flag2 
    global loading 
    # quote_plus encodes spaces and special characters in the question
    URL = 'https://www.google.com/search?q=' + quote_plus(qs) 
    print(URL) 
    page = requests.get(URL) 
    soup = BeautifulSoup(page.content, 'html.parser') 
    # Google answer-box classes at the time of writing; these change often
    div0 = soup.find_all('div', class_="kvKEAb") 
    div1 = soup.find_all("div", class_="Ap5OSd") 
    div2 = soup.find_all("div", class_="nGphre") 
    div3 = soup.find_all("div", class_="BNeawe iBp4i AP7Wnd") 

    # Collect the target URLs of the result links (/url?q=<target>&sa=U...)
    all_links = [] 
    for link in soup.find_all("a"): 
        link_href = link.get('href') 
        if link_href and "url?q=" in link_href and "webcache" not in link_href: 
            all_links.append(link_href.split("?q=")[1].split("&sa=U")[0]) 

    # Remember the first Wikipedia result as a fallback source
    wiki = None 
    for link in all_links: 
        if 'https://en.wikipedia.org/wiki/' in link: 
            wiki = link 
            break

    answer = "Sorry. I could not find the desired results"
    if len(div0) != 0: 
        answer = div0[0].text 
    elif len(div1) != 0: 
        answer = div1[0].text + "\n" + div1[0].find_next_sibling("div").text 
    elif len(div2) != 0: 
        answer = div2[0].find_next("span").text + "\n" + div2[0].find_next("div", class_="kCrYT").text 
    elif len(div3) > 1: 
        answer = div3[1].text 
    elif wiki is not None: 
        page2 = requests.get(wiki) 
        soup = BeautifulSoup(page2.text, 'html.parser') 
        title = soup.select("#firstHeading")[0].text
        # Use the first non-empty paragraph of the article
        for para in soup.select("p"): 
            if para.text.strip(): 
                answer = title + "\n" + para.text 
                break 
    canvas2.create_text(10, 225, anchor=NW, text=answer, font=('Candara Light', -25, 'bold italic'), fill="white", width=350) 
    flag2 = False 
    loading.destroy()
    p1 = Thread(target=speak, args=(answer,)) 
    p1.start() 
    p2 = Thread(target=transition) 
    p2.start() 
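The two URL-handling steps in web_scraping, building the search URL and pulling the target out of each result link, can be sketched with urllib.parse: quote_plus encodes spaces and characters such as ? and &, and parse_qs reads the q parameter out of a result link and percent-decodes it. The helper names are hypothetical, and the /url?q=...&sa=U link shape is the one the original split logic assumes; Google's markup may differ.

```python
from urllib.parse import parse_qs, quote_plus, urlparse


def search_url(question):
    # Encode the question instead of concatenating raw text into the URL
    return "https://www.google.com/search?q=" + quote_plus(question)


def target_from_redirect(href):
    # Result links look like /url?q=<target>&sa=U...; return the decoded q parameter
    if not href or not href.startswith("/url?"):
        return None
    targets = parse_qs(urlparse(href).query).get("q")
    return targets[0] if targets else None


print(search_url("who is ada lovelace?"))
print(target_from_redirect("/url?q=https://en.wikipedia.org/wiki/Ada_Lovelace&sa=U&ved=abc"))
```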

 

The take command function 

This function runs when the user clicks the green button to ask a question. The SpeechRecognition library waits up to 5 seconds for the user to start speaking, records a phrase of at most 5 seconds, and converts the audio to text with the Google Web Speech API through recognize_google(). 

 

def takecommand(): 
    global loading 
    global flag 
    global query 
    if flag2 == False: 
        canvas2.delete("all") 
        canvas2.create_image(0, 0, image=img4, anchor="nw")  
    speak("I am listening.") 
    flag = True 
    r = sr.Recognizer() 
    r.dynamic_energy_threshold = True 
    r.dynamic_energy_adjustment_ratio = 1.5 
    try: 
        with sr.Microphone() as source: 
            print("Listening...") 
            # Raises sr.WaitTimeoutError if no speech starts within 5 seconds
            audio = r.listen(source, timeout=5, phrase_time_limit=5) 
        print("Recognizing..") 
        text = r.recognize_google(audio, language='en-in').lower() 
        print(f"User said: {text}\n") 
        canvas2.create_text(490, 120, anchor=NE, justify=RIGHT, text=text, font=('fixedsys', -30), fill="white", width=350) 
        loading = Label(root, image=img3, bd=0) 
        loading.place(x=900, y=622) 
        # Set the global query last, so main_window only sees the finished lowercase text
        # and the loading label already exists when web_scraping destroys it
        query = text 
    except Exception as e: 
        print(e) 
        speak("Say that again please") 

 

The shutdown function 

This function says goodbye to the user and destroys the root window to exit the program. 

def shut_down(): 
    p1 = Thread(target=speak, args=("Shutting down. Thank you for using our service. Take care, goodbye.",)) 
    p1.start() 
    p2 = Thread(target=transition) 
    p2.start() 
    time.sleep(7) 
    root.destroy()
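shut_down waits a fixed time.sleep(7) before destroying the window, which assumes the farewell always takes less than 7 seconds. Joining the speech thread waits exactly as long as speaking takes, with an optional upper bound; in shut_down that would be p1.join(timeout=10) before root.destroy(). A minimal sketch with a hypothetical speak_stub in place of speak:

```python
import threading
import time


def speak_stub(text, log):
    # Stand-in for speak(): takes a variable amount of time
    time.sleep(0.2)
    log.append(f"said: {text}")


log = []
speaker = threading.Thread(target=speak_stub, args=("Good bye.", log))
speaker.start()
speaker.join(timeout=10)  # wait for the farewell to finish, but never longer than 10 s
log.append("destroy window")
print(log)
```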

 

Conclusion 

It is time to wrap up. I hope you enjoyed building this little application. This is the power of Python: you can create small, attractive applications quickly with very little code. Keep following us for more cool Python projects! 

 
