In this tutorial, you will learn how to build an attractive voice-controlled Python chatbot application with a small amount of code. We will first create a good-looking user interface with Python's built-in Tkinter library and then write a handful of small functions that do the actual work.

Here is a sneak peek of what we are going to create.

Voice controlled chatbot using coding in Python – Data Science Dojo

Before kicking off, it helps to have a basic idea of web scraping. If you do not, read an introductory article on Python web scraping first.

Prerequisites for building a voice Python chatbot

Make sure that you are using Python 3.8+ and that the following libraries are installed:

pyttsx3 (a text-to-speech conversion library for Python)
SpeechRecognition (a library for performing speech recognition)
requests (lets you send HTTP requests from Python)
beautifulsoup4, imported as bs4 (Beautiful Soup is a library for scraping information from web pages)
PyAudio (lets you play and record audio from Python; SpeechRecognition needs it for microphone input)

If you still run into installation or incompatibility errors, try installing the specific versions listed below, which were tested and worked with the application at the time of writing.

Python 3.10
pyttsx3==2.90
SpeechRecognition==3.8.1
requests==2.28.1
beautifulsoup4==4.11.1
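
To see which versions are actually installed before debugging an incompatibility, a small check along the following lines can help. This is a minimal sketch, not part of the original application: the helper name installed_versions is hypothetical, and the distribution names are assumed to match the list above.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed from the list above)
REQUIRED = ["pyttsx3", "SpeechRecognition", "requests", "beautifulsoup4", "PyAudio"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

Any library reported as not installed can then be installed with pip, pinned to the version listed above if needed.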

Now that everything is set up, it is time to get started. Create a new file named VoiceChatbot.py and import the following libraries at the top of the file.

from tkinter import *
import time
import datetime
import pyttsx3
import speech_recognition as sr
from threading import Thread
import requests
from bs4 import BeautifulSoup

The code is divided into the GUI section, which uses Python's Tkinter library, and 7 functions. We will start by declaring some global variables and initializing the text-to-speech engine and Tkinter. Then we will create the windows and frames of the user interface.

The user interface

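This part of the code loads the images, initializes the global variables and the text-to-speech engine, and creates a root window that displays the different frames. The program starts when the user clicks the first window bearing the background image. In the finished file, place this block at the bottom, below the function definitions, because it refers to the takecommand and main_window functions.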

if __name__ == "__main__":

#Global variables

    loading = None
    query = None
    flag = True
    flag2 = True

#Initializing text to speech and setting properties

    engine = pyttsx3.init()  # default driver (SAPI5 on Windows)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)  # assumes at least two voices are installed
    rate = engine.getProperty('rate')
    engine.setProperty('rate', rate-10)

#creating root window (it must exist before any PhotoImage is created)

    root=Tk()
    root.title("Intelligent Chatbot")
    root.geometry('1360x690+-5+0')
    root.configure(background='white')

#loading images

    img1= PhotoImage(file='chatbot-image.png')
    img2= PhotoImage(file='button-green.png')
    img3= PhotoImage(file='icon.png')
    img4= PhotoImage(file='terminal.png')
    background_image=PhotoImage(file="last.png")
    front_image = PhotoImage(file="front2.png")

#Placing frame on root window and placing widgets on the frame

    f = Frame(root,width = 1360, height = 690) 
    f.place(x=0,y=0) 
    f.tkraise()

#first window which acts as a button containing the background image

    okVar = IntVar() 
    btnOK = Button(f, image=front_image,command=lambda: okVar.set(1)) 
    btnOK.place(x=0,y=0) 
    f.wait_variable(okVar) 
    f.destroy()     
    background_label = Label(root, image=background_image) 
    background_label.place(x=0, y=0)

#Frame that displays gif image

    frames = [PhotoImage(file='chatgif.gif',format = 'gif -index %i' %(i)) for i in range(20)] 
    canvas = Canvas(root, width = 800, height = 596) 
    canvas.place(x=10,y=10) 
    canvas.create_image(0, 0, image=img1, anchor=NW)

#Question button which calls ‘takecommand’ function

    question_button = Button(root,image=img2, bd=0, command=takecommand) 
    question_button.place(x=200,y=625)

#Right Terminal with vertical scroll

    frame=Frame(root,width=500,height=596) 
    frame.place(x=825,y=10) 
    canvas2=Canvas(frame,bg='#FFFFFF',width=500,height=596,scrollregion=(0,0,500,900)) 
    vbar=Scrollbar(frame,orient=VERTICAL) 
    vbar.pack(side=RIGHT,fill=Y) 
    vbar.config(command=canvas2.yview) 
    canvas2.config(width=500,height=596, background="black") 
    canvas2.config(yscrollcommand=vbar.set) 
    canvas2.pack(side=LEFT,expand=True,fill=BOTH) 
    canvas2.create_image(0,0, image=img4, anchor="nw") 
    task = Thread(target=main_window, daemon=True)  # daemon, so closing the window also ends this thread
    task.start()
    root.mainloop()

The main window function

This is the first function to run, and it runs inside its own thread. It first calls the wishme function to greet the user. It then loops, checking whether the query variable is empty. If query is not empty, it inspects its contents: if the query contains the word shutdown, quit, stop, or goodbye, it calls the shut_down function and the program exits; otherwise, it calls the web_scraping function and then resets query.

def main_window():
    global query
    wishme()
    while True:
        if query is not None:
            if 'shutdown' in query or 'quit' in query or 'stop' in query or 'goodbye' in query:
                shut_down()
                break
            else:
                web_scraping(query)
                query = None
        time.sleep(0.1)  # avoid a busy loop while waiting for the next query
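
The exit check above can also be written as a small helper, which keeps the list of exit words in one place. This is a minimal sketch: the names EXIT_WORDS and is_exit_command are hypothetical and do not appear in the original code.

```python
# Words that end the session, taken from the condition in main_window
EXIT_WORDS = ("shutdown", "quit", "stop", "goodbye")


def is_exit_command(query):
    """Return True if the query contains any of the exit words."""
    if not query:
        return False
    text = query.lower()
    return any(word in text for word in EXIT_WORDS)
```

Like the original condition, this is a substring match, so a query such as "unstoppable" would also count as an exit command. With the helper in place, the condition in main_window could be reduced to a single call such as is_exit_command(query).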

The wish me function

This function checks the current time, greets the user according to the hour of the day, and writes the greeting on the canvas. The greeting text is passed to the speak function, and the transition function is started at the same time to animate the bot image while the bot is speaking. The two functions run in separate threads so that speech and animation happen together.

def wishme(): 
    hour = datetime.datetime.now().hour 
    if 0 <= hour < 12: 
        text = "Good Morning sir. I am Jarvis. How can I Serve you?" 
    elif 12 <= hour < 18: 
        text = "Good Afternoon sir. I am Jarvis. How can I Serve you?" 
    else: 
        text = "Good Evening sir. I am Jarvis. How can I Serve you?" 
    canvas2.create_text(10,10,anchor =NW , text=text,font=('Candara Light', -25,'bold italic'), fill="white",width=350) 
    p1=Thread(target=speak,args=(text,)) 
    p1.start() 
    p2 = Thread(target=transition) 
    p2.start()
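
The time-of-day logic can be pulled out of wishme into a pure function, which makes the boundaries at 12 and 18 easy to check without a GUI or a speech engine. This is a minimal sketch, assuming a hypothetical helper named greeting_for_hour that is not part of the original application.

```python
import datetime


def greeting_for_hour(hour):
    """Return the greeting used by wishme for an hour in the range 0-23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 12:
        return "Good Morning"
    if hour < 18:
        return "Good Afternoon"
    return "Good Evening"


if __name__ == "__main__":
    # Build the same sentence that wishme speaks
    hour = datetime.datetime.now().hour
    print(f"{greeting_for_hour(hour)} sir. I am Jarvis. How can I Serve you?")
```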

The speak function

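This function converts text to speech using the pyttsx3 engine. When the speech has finished, it sets flag to False, which tells the transition function to stop the animation.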

def speak(text): 
    global flag 
    engine.say(text) 
    engine.runAndWait() 
    flag=False

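The transition function

The transition function creates the GIF animation effect by looping over the frames and drawing them on the canvas one after another. The frames variable holds the list of PhotoImage objects, one per GIF frame, in order. As soon as flag becomes False (the speech has finished), the function restores the still image, resets flag, and returns.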

def transition():
    global flag
    # Cycle through the GIF frames until speak() signals completion through flag
    for k in range(0,5000):
        for frame in frames:
            if not flag:
                # Speech has finished: restore the still image and reset the flag
                canvas.create_image(0, 0, image=img1, anchor=NW)
                canvas.update()
                flag = True
                return
            else:
                canvas.create_image(0, 0, image=frame, anchor=NW)
                canvas.update()
                time.sleep(0.1)
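
The application coordinates speech and animation through a shared global flag. An alternative is threading.Event, which is designed for signalling between threads. The following is a minimal sketch of the same animation loop, not the original implementation: the function animate and its draw callback are hypothetical, with draw standing in for the canvas.create_image call.

```python
import itertools
import threading
import time


def animate(frames, draw, stop_event, delay=0.1, max_steps=None):
    """Draw frames in a cycle until stop_event is set; return how many were drawn."""
    drawn = 0
    for frame in itertools.cycle(frames):
        if stop_event.is_set():
            break
        if max_steps is not None and drawn >= max_steps:
            break
        draw(frame)
        drawn += 1
        time.sleep(delay)
    return drawn


if __name__ == "__main__":
    stop = threading.Event()
    # Simulate the speech finishing after half a second
    threading.Timer(0.5, stop.set).start()
    count = animate(["frame0", "frame1", "frame2"], print, stop)
    print(f"drew {count} frames")
```

In this design, the speaking thread would call stop.set() when it is done instead of assigning to flag.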

The web scraping function

This function is the heart of the application. The question asked by the user is searched on Google using Python's requests library. The Beautiful Soup library parses the HTML content of the results page, and the function looks for an answer in four particular divs. If the page contains none of the four divs, the function looks for a Wikipedia link among the results and uses the first non-empty paragraph of that article. If that also fails, the bot apologizes. The div class names come from Google's generated markup and may change over time, in which case the selectors need to be updated.

def web_scraping(qs): 
    global flag2 
    global loading 
    URL = 'https://www.google.com/search'
    page = requests.get(URL, params={'q': qs})  # requests URL-encodes the question
    print(page.url)
    soup = BeautifulSoup(page.content, 'html.parser') 
    div0 = soup.find_all('div',class_="kvKEAb") 
    div1 = soup.find_all("div", class_="Ap5OSd") 
    div2 = soup.find_all("div", class_="nGphre") 
    div3  = soup.find_all("div", class_="BNeawe iBp4i AP7Wnd") 

    links = soup.find_all("a")
    all_links = []
    for link in links:
        link_href = link.get('href')
        # Skip anchors that have no href attribute
        if link_href and "url?q=" in link_href and "webcache" not in link_href:
            all_links.append(link_href.split("?q=")[1].split("&sa=U")[0])

    flag= False 
    for link in all_links: 
       if 'https://en.wikipedia.org/wiki/' in link: 
           wiki = link 
           flag = True 
           break
    if len(div0)!=0:
        answer = div0[0].text
    elif len(div1) != 0:
        answer = div1[0].text+"\n"+div1[0].find_next_sibling("div").text
    elif len(div2) != 0:
        answer = div2[0].find_next("span").text+"\n"+div2[0].find_next("div",class_="kCrYT").text
    elif len(div3) > 1:
        # The answer is read from the second matching div, so at least two are needed
        answer = div3[1].text
    elif flag==True:
        # Fall back to the first non-empty paragraph of the Wikipedia article
        answer = "Sorry. I could not find the desired results"
        page2 = requests.get(wiki)
        soup = BeautifulSoup(page2.text, 'html.parser')
        title = soup.select("#firstHeading")[0].text
        paragraphs = soup.select("p")
        for para in paragraphs:
            if bool(para.text.strip()):
                answer = title + "\n" + para.text
                break
    else:
        answer = "Sorry. I could not find the desired results"
    canvas2.create_text(10, 225, anchor=NW, text=answer, font=('Candara Light', -25,'bold italic'),fill="white", width=350) 
    flag2 = False
    if loading is not None:  # the loading icon may not have been created yet
        loading.destroy()
    p1=Thread(target=speak,args=(answer,)) 
    p1.start() 
    p2 = Thread(target=transition) 
    p2.start()
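
The URL handling in web_scraping relies on string concatenation and split. The standard library's urllib.parse can do the same work while taking care of encoding. This is a minimal sketch under the assumption, taken from the code above, that result links have the form /url?q=TARGET&sa=U...; the function names build_search_url and extract_target_url are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_search_url(question):
    """Return a Google search URL with the question safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": question})


def extract_target_url(href):
    """Return the destination of a '/url?q=...' result link, or None otherwise."""
    if not href:
        return None
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


if __name__ == "__main__":
    print(build_search_url("who is ada lovelace"))
    print(extract_target_url("/url?q=https://en.wikipedia.org/wiki/Ada_Lovelace&sa=U"))
```

Because parse_qs separates the query parameters itself, the helper does not depend on &sa=U being the first parameter after the target.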

The take command function

This function is invoked when the user clicks the green button to ask a question. The SpeechRecognition library waits up to 5 seconds for speech to start, records up to 5 seconds of audio, and converts the audio to text using Google's speech recognition service through the recognize_google method.

def takecommand(): 
    global loading 
    global flag 
    global flag2 
    global canvas2 
    global query 
    global img4 
    if flag2 == False: 
        canvas2.delete("all") 
        canvas2.create_image(0,0, image=img4, anchor="nw")  
    speak("I am listening.") 
    flag= True 
    r = sr.Recognizer() 
    r.dynamic_energy_threshold = True 
    r.dynamic_energy_adjustment_ratio = 1.5 
    #r.energy_threshold = 4000 
    try:
        with sr.Microphone() as source:
            print("Listening...")
            #r.pause_threshold = 1
            # listen() raises sr.WaitTimeoutError if no speech starts within 5 seconds
            audio = r.listen(source,timeout=5,phrase_time_limit=5)
            #audio = r.listen(source)

        print("Recognizing..")
        text = r.recognize_google(audio, language='en-in')
        print(f"user Said :{text}\n")
        text = text.lower()
        canvas2.create_text(490, 120, anchor=NE, justify = RIGHT ,text=text, font=('fixedsys', -30),fill="white", width=350)
        loading = Label(root, image=img3, bd=0)
        loading.place(x=900, y=622)
        # Publish the query last, so main_window only sees it once the loading icon exists
        query = text

    except Exception as e:
        # Covers timeouts, unrecognized speech, and network errors
        print(e)
        speak("Say that again please")

The shutdown function

This function says goodbye to the user and destroys the root window in order to exit the program.

def shut_down():
    p1=Thread(target=speak,args=("Shutting down. Thank you for using our service. Take care, goodbye.",))
    p1.start()
    p2 = Thread(target=transition)
    p2.start()
    time.sleep(7)
    root.destroy()
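
The fixed time.sleep(7) is a guess at how long the farewell takes to speak. Joining the speaking thread instead waits for exactly as long as the speech lasts. This is a minimal sketch of that idea, not the original implementation: the function farewell_then_close is hypothetical, and its speak and close parameters stand in for the speak function and root.destroy.

```python
import threading


def farewell_then_close(speak, close, message="Shutting down. Goodbye."):
    """Speak the farewell in a worker thread, wait for it to finish, then close."""
    speaker = threading.Thread(target=speak, args=(message,))
    speaker.start()
    speaker.join()  # blocks until the speech has actually finished
    close()


if __name__ == "__main__":
    # Stand-ins for the real text-to-speech call and window teardown
    farewell_then_close(print, lambda: print("window closed"))
```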

Conclusion

It is time to wrap up. I hope you enjoyed our little application. This is the power of Python: you can create small, attractive applications in no time with very little code. Keep following us for more cool Python projects!

Syed Umair Hasan

Create a voice controlled python chatbot using web scraping