Natural Language

Natural Language Processing – Tasks and techniques
Muhammad Fahad Alam
| November 7, 2022

This blog discusses the different tasks and techniques used in natural language processing. We will be using python code to demo what and how each task works. We will also discuss why these tasks and techniques are essential for natural language processing. 

 

Introduction

According to a survey, only 32 percent of the business data is put to work, and 68 percent goes unleveraged. Most data are often unstructured. According to estimations, 80 to 90 percent of business data is unstructured, and so are emails, reports, social media posts, websites, and documents. Using NLP techniques, it became possible for machines to manage and analyze unstructured data accurately and quickly.  

Computers can now understand, manipulate, and interpret human language. Businesses use NLP to improve customer experience, listen to customer feedback, and find market gaps. Almost 50% of companies today use NLP applications, and 25% plan to do so in 12 months.   

The future of customer care is NLP. Customers prefer mobile messaging and chatbots over the legacy voice channel. It is four times more accurate. According to the IBM market survey, 52% of global IT professionals reported using or planning to use NLP to improve customer experience. Chatbots can resolve 80% of routine tasks and customer questions with a 90% success rate by 2022. Estimates show that using NLP in chatbots will save companies USD 8 billion annually.     

The NLP market was at 3 billion US dollars in 2017 and is predicted to rise to 43 billion US dollars in 2025, around 14 times higher. 

 

Natural Language Processing (NLP)  

Natural language processing is a branch of artificial intelligence that enables computers to analyze, understand, and drive meaning from a human language using machine learning and respond to it. NLP combines computational linguistics with artificial intelligence and machine learning to create an intelligent system capable of understanding and responding to text or voice data the same way humans do. 

 

NLP analyzes the syntax and semantics of the text to understand the meaning and structure of human language. Then it transforms this linguistic knowledge into a machine-learning algorithm to solve real-world problems and perform specific tasks.   

Natural language is challenging to comprehend, which makes NLP a challenging task. Mastering a language is easy for humans, but implementing NLP becomes difficult for machines because of the ambiguity and imprecision of natural language. 

 

NLP requires syntactic and semantic analysis to convert human language into a machine-readable form that can be processed and interpreted. 

 

Syntactic analysis  

Syntactic analysis is the process of analyzing language with its formal grammatical rules. It is also known as syntax analysis or parsing formal grammatical rules applied to a group of words but not a single word. After verifying the correct syntax, it takes text data as input and creates a structural input representation. It creates a parse tree. A syntactically correct sentence does not necessarily make sense. It needs to be semantically correct to make sense.   

 

Semantic analysis  

Semantic analysis is the process of figuring out the meaning of the text. It enables computers to interpret the words by analyzing sentence structure and the relationship between individual words of the sentence. Because of language’s ambiguous and polysemic nature, semantic analysis is a particularly challenging area of NLP. It analyzes the sentence structure, word interaction, and other aspects to discover the meaning and topic of the text.  

 

NLP tasks and techniques: 

Before proceeding further, ensure you run the below code block to install all the dependencies. 

 

!pip install -U spacy 

!python -m spacy download en 

!pip install nltk 

!pip install prettytable 

Here are some everyday tasks performed in syntactic and semantic analysis:  

 

Tokenization  

Tokenization is a common task in NLP. It separates natural language text into smaller units called tokens. For example, in Sentence tokenization paragraph separates into sentences, and word tokenization splits the words of a sentence.  

 

The code below shows an example of word tokenization using spaCy.   

 

Code:  

import spacy 

nlp = spacy.load("en_core_web_sm") 

doc = nlp("Data Science Dojo is the leading platform providing data science training.") 

for token in doc: 

    print(token.text) 

 

Output: 

 

Data 

Science 

Dojo 

is 

the 

leading 

platform 

providing 

data 

science 

training 

. 

Part-of-speech tagging  

Part of speech or grammatical tagging labels each word as an appropriate part of speech based on its definition and context. POS tagging helps create a parse tree that helps understand word relationships. It also helps in Named Entity Recognition, as most named entities are nouns, making it easier to identify them. 

In the code below, we use pos_ attribute of the token to get the part of speech for the universal pos tag set.   

 

Code:  

import spacy 

from prettytable import PrettyTable 

table = PrettyTable(['Token', 'Part of speech', 'Tag']) 

nlp = spacy.load("en_core_web_sm") 

doc = nlp("Data Science Dojo is the leading platform providing data science training.") 

for token in doc: 

  table.add_row([token.text, token.pos_, token.tag_]) 

print(table) 

 

Output:    

Part of speech tag
Part of speech tag

Demo: 

Try it yourself with this Analyze Text Demo. 

Analyze Text
Analyze Text

 

Dependency and Consistency parsing  

Dependency parsing is how grammatical structure in a sentence is analyzed to find out the related word and their relationship. Each relationship has one head and one dependent. Then, a label based on the nature of dependency is assigned between the head and the dependent.  

Consistency parsing is a process by which phrase structure grammar is identified to visualize the entire syntactic structure.   

In the code below, we created a dependency tree using the displacy visualizer of spacy.  

 

Code:  

 

import spacy 

nlp = spacy.load("en_core_web_sm") 

doc = nlp("Data Science Dojo is the leading platform providing data science training.")         

spacy.displacy.render(doc, style="dep") 

 

Output:  

 output

 

Demo:  

Try it yourself with this Analyze Text Demo. 

 

Lemmatization and stemming  

We use inflected forms of the word when we speak or write. These inflected forms are created by adding prefixes or suffixes to the root form. In the process of lemmatization and stemming, we are grouping similar inflected forms of a word into a single root word. In this way, we link all the words with the same meaning as a single word, which is simpler to analyze by the computer.  

 

The word’s root form in lemmatization is lemma, and in stemming is a stem. Lemmatization and stemming do the same task of grouping inflected forms, but they are different. Lemmatization considers the word and its context in the sentence, while stemming only considers the single word. So, we consider POS tags in lemmatization but not in stemming. That is why lemma is an actual dictionary word, but stem might not be.  

Now we are applying lemmatization using spacy.   

Code:    

 

import spacy 

nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner']) 

doc = nlp("Data Science Dojo is the leading platform providing data science training.") 

lemmatized = [token.lemma_ for token in doc] 

print("Original: \n", doc) 

print("\nAfter Lemmatization: \n", " ".join(lemmatized)) 

 

Output:   

Original 

 Data Science Dojo is the leading platform providing data science training. 

After Lemmatization:  

 Data Science Dojo is the lead platform to provide datum science training.  

 

Unfortunately, spacy does not contain any function for stemming.  

Let us use Porter Stemmer from nltk to see how stemming works.  

 

Code: 

import nltk 

nltk.download('punkt') 

from nltk.stem import PorterStemmer 

from nltk.tokenize import word_tokenize   

ps = PorterStemmer() 

sentence = "Data Science Dojo is the leading platform providing data science training." 

words = word_tokenize(sentence) 

stemmed = [ps.stem(token) for token in words]  

print("Original: \n", " ".join(words)) 

print("\nAfter Stemming: \n", " ".join(stemmed)) 

 

Output:    

Original:  

 Data Science Dojo is the leading platform providing data science training . 

After Stemming:  

 data scienc dojo is the lead platform provid data scienc train . 

 

Stop word removal  

Stop words are the frequent words that are used in any natural language. However, they are not particularly useful for text analysis and NLP tasks. Therefore, we remove them, as they do not play any role in defining the meaning of the text.   

 

Code: 

 

import spacy 

nlp = spacy.load("en_core_web_sm") 

doc = nlp("Data Science Dojo is the leading platform providing data science training.") 

token_list = [ token.text for token in doc ] 

filtered_sentence = [ word for word in token_list if nlp.vocab[word].is_stop == False ]  

print("Tokens:\n",token_list) 

print("\nAfter stop word removal:\n", filtered_sentence)    

 

Output: 

 

Tokens: 

['Data', 'Science', 'Dojo', 'is', 'the', 'leading', 'platform', 'providing', 'data', 'science', 'training', '.'] 

 

After stop word removal: 

['Data', 'Science', 'Dojo', 'leading', 'platform', 'providing', 'data', 'science', 'training', '.'] 

 

Demo: 

Try it yourself with this Cleanse Stop Words Demo. 

Cleanse Stop Word Demo
Cleanse Stop Word Demo

 

Named entity recognition  

Named entity recognition is an NLP technique that extracts named entities from the text and categorizes them into semantic types like organization, people, quantity, percentage, location, time, etc. Identifying named entities helps identify the critical element in the text, which can help sort the unstructured data and find valuable information.   

 

Code: 

 

import spacy 

from prettytable import PrettyTable 

nlp = spacy.load("en_core_web_sm") 

doc = nlp("Data Science Dojo was founded in 2013 but it was a free Meetup group long before the official launch. With the aim to bring the knowledge of data science to everyone, we started hosting short Bootcamps with the most comprehensive curriculum. In 2019, the University of New Mexico (UNM) added our Data Science Bootcamp to their continuing education department. Since then, we've launched various other trainings such as Python for Data Science, Data Science for Managers and Business Leaders. So far, we have provided our services to more than 10,000 individuals and over 2000 organizations.") 

table = PrettyTable(["Entity", "Start Position", "End Position", "Label"]) 

for ent in doc.ents: 

    table.add_row([ent.text, ent.start_char, ent.end_char, ent.label_]) 

print(table) 

spacy.displacy.render(doc, style="ent") 

 

Output:   

 

Named Entity
Named Entity

Visualization:   

 

Named Entity Visual
Named Entity Visual

 

Demo: 

Try it yourself with this Text Entity Extractor Demo. 

 

Text Entity Extractor Demo
Text Entity Extractor Demo

 

Sentiment analysis 

Sentiment analysis, also referred to as opinion mining, uses natural language processing to find and extract sentiments from the text. It determines whether the data is positive, negative, or neutral.  

 

Some of the real-world applications of sentiment analysis are:  

  • Customer support  
  • Customer feedback  
  • Brand monitoring  
  • Product analysis  
  • Market research  

 

Demo: 

Try it yourself with this Opinion Mining Demo. 

 

Opinion Mining Demo
Opinion Mining Demo

Conclusion:  

We have discussed natural language processing and what common tasks it performs in natural language processing. Then, we saw how we can perform different functions in spacy and nltk and why they are essential in natural language processing.   

Full Code Available 

 We know about the different tasks and techniques we perform in natural language processing, but we have yet to discuss the applications of natural language processing. For that, you can follow this blog. 

Read more about: 

Blog: NLP Applications

 

Upgrade your data science skillset with our Python for Data Science and Data Science Bootcamp training!  

 

Applications of Natural Language Processing 
Fahad Alam
| September 8, 2022

This blog will discuss the different Natural Language Processing applications. We will see the applications and what problems they solve in our daily life. 

 Introduction   

One of the essential things in the life of a human being is communication. We need to communicate with other human beings to deliver information, express our emotions, present ideas, and much more. The key to communication is language. We need a common language to communicate, which both ends of the conversation can understand. Doing this is possible for humans, but it might seem a bit difficult if we talk about communicating with a computer system or the computer system communicating with us. 

But we have a solution for that, Artificial Intelligence, or more specifically, a branch of Artificial Intelligence known as Natural Language Processing (NLP). Natural Language Processing enables the computer system to understand and comprehend information the same way humans do. It helps the computer system understand the literal meaning and recognize the sentiments, tone, opinions, thoughts, and other components that construct a proper conversation. 

Natural Language Processing (NLP)
Applications of Natural Language Processing

After making the computer understand human language, a question arises in our minds, how can we utilize this ability of a computer to benefit humankind? 

Natural Language Processing Applications: 

Let’s answer this question by going over some Natural Language Processing applications and understanding how they decrease our workload and help us complete many time-taking tasks more quickly and efficiently. 

1. Email filtering 

Email is a part of our everyday life. Whether it is related to work or studies or many other things, we find ourselves plunged into the pile of emails. We receive all kinds of emails from various sources; some are work-related or from our dream school or university, while others are spam or promotional emails. Here Natural Language Processing comes to work. It identifies and filters incoming emails into “important” or “spam” and places them into their respective designations.

2. Language translation 

There are as many languages in this world as there are cultures, but not everyone understands all these languages. As our world is now a global village owing to the dawn of technology, we need to communicate with other people who speak a language that might be foreign to us. Natural Language processing helps us by translating the language with all its sentiments.  

3. Smart assistants 

In today’s world, every new day brings in a new smart device, making this world smarter and smarter by the day. And this advancement is not just limited to machines. We have advanced enough technology to have smart assistants, such as Siri, Alexa, and Cortana. We can talk to them like we talk to normal human beings, and they even respond to us in the same way.

All of this is possible because of Natural Language Processing. It helps the computer system understand our language by breaking it into parts of speech, root stem, and other linguistic features. It not only helps them understand the language but also in processing its meaning and sentiments and answering back in the same way humans do. 

 4. Document analysis 

Another one of NLP’s applications is document analysis. Companies, colleges, schools, and other such places are always filled to the brim with data, which needs to be sorted out properly, maintained, and searched for. All this could be done using NLP. It not only searches a keyword but also categorizes it according to the instructions and saves us from the long and hectic work of searching for a single person’s information from a pile of files. It is not only limited to this but also helps its user to inform decision-making on claims and risk management. 

5. Online searches 

In this world full of challenges and puzzles, we must constantly find our way by getting the required information from available sources. One of the most extensive information sources is the internet. We type what we want to search and checkmate! We have got what we wanted. But have you ever thought about how you get these results even when you do not know the exact keywords you need to search for the needed information? Well, the answer is obvious.

It is again Natural Language Processing. It helps search engines understand what is asked of them by comprehending the literal meaning of words and the intent behind writing that word, hence giving us the results, we want. 

 6. Predictive text 

A similar application to online searches is predictive text. It is something we use whenever we type anything on our smartphones. Whenever we type a few letters on the screen, the keyboard gives us suggestions about what that word might be and when we have written a few words, it starts suggesting what the next word could be. These predictive texts might be a little off in the beginning.

Still, as time passes, it gets trained according to our texts and starts to suggest the next word correctly even when we have not written a single letter of the next word. All this is done using NLP by making our smartphones intelligent enough to suggest words and learn from our texting habits. 

7. Automatic summarization 

With the increasing inventions and innovations, data has also increased. This increase in data has also expanded the scope of data processing. Still, manual data processing is time taking and is prone to error. NLP has a solution for that, too, it can not only summarize the meaning of information, but it can also understand the emotional meaning hidden in the information. Thus, making the summarization process quick and impeccable. 

 8. Sentiment analysis 

The daily conversations, the posted content and comments, book, restaurant, and product reviews, hence almost all the conversations and texts are full of emotions. Understanding these emotions is as important as understanding the word-to-word meaning. We as humans can interpret emotional sentiments in writings and conversations, but with the help of natural language processing, computer systems can also understand the sentiments of a text along with its literal meaning. 

 9. Chatbots  

With the increase in technology, everything has been digitalized, from studying to shopping, booking tickets, and customer service. Instead of waiting a long time to get some short and instant answers, the chatbot replies instantly and accurately. NLP gives these chatbots conversational capabilities, which help them respond appropriately to the customer’s needs instead of just bare-bones replies.

Chatbots also help in places where human power is less or is not available round the clock. Chatbots operating on NLP also have emotional intelligence, which helps them understand the customer’s emotional sentiments and respond to them effectively. 

 10. Social media monitoring   

Nowadays, every other person has a social media account where they share their thoughts, likes, dislikes, experiences, etc., which tells a lot about the individuals. We do not only find information about individuals but also about the products and services. The relevant companies can process this data to get information about their products and services to improve or amend them. NLP comes into play here. It enables the computer system to understand unstructured social media data, analyze it and produce the required results in a valuable form for companies.

Conclusion: 

We now understand that NLP has many applications, spreading its wings in almost every field. Help decrease manual labor and do the tasks accurately and efficiently. 

5 original R programming books to upskill natural language processing
Dave Langer
| April 4, 2017

Natural Language Processing is a key Data Science skill. Learn how to expand your R programming knowledge with Text Analytics.

It is my firm conviction that Natural Language Processing/Text Analytics is a must-have skill for any practicing Data Scientist.

From analyzing customer feedback in NSAT surveys to scraping Microsoft’s internal job postings for analyzing popular technical skills, to segmenting customers via textual features, I have universally found that Text Analytics is a wildly useful skill.

Sources to learn R programming

Not surprisingly, I am often asked by students of our Data Science Bootcamp, folks that I mentor on Data Science and my LinkedIn contacts about the subject of Text Analytics. The good news is that there are many great resources for the R programmer to learn Text Analytics.

What follows is a practical curriculum where the only required knowledge is basic R programming skills. I have read all of the books referenced below and can attest that studying the curriculum will have you master Text Analytics in no time!

Text Analytics with R for Students of Literature

Text Analytics with R for Students of Literature
Book cover of Text Analytics with R for Students of Literature by Matthew L. Jockers

is quite simply the best, most straightforward introduction to working with text that I have found. Professor Jockers illustrates many of the fundamentals using out of the box R programming. This book provides a great foundation for anyone looking to get started in Text Analytics with R.

Taming Text

Taming Text
Book cover of Taming Text by Grant, Thomas, and Andrew

is the next stop on the Text Analytics journey. While this book is primarily written for Java programmers, there is a lot of theory that is immensely useful for R programmers learning to work with text. Additionally, the book covers the OpenNLP Java library which is available to R programmers via the excellent openNLP package.

R Logo
R programming logo

The CRAN NLP Task View illustrates the wide-ranging Text Analytics support for the R programmer. Unfortunately, it also illustrates that the landscape is fractured as well. However, a couple of packages are worthy of study. The tm package is often the go-to Text Analytics package for R programmers. However, the new quanteda package shows a lot of promise. Lastly, the excellent openNLP package deserves a second callout.

Introduction to Information Retrieval for Text Analytics

Introduction to Information Retrieval for Text Analytics
Book cover of Introduction to Information Retrieval for Text Analytics by Christopher, Prabhakar, and Hinrich

while focused primarily on the problem of search, nevertheless, contains a wealth of theory and understanding (e.g., the Vector Space Model) to take the R programmer to the next level. The text is language agnostic, is quite excellent, and free!

Top-Books-on-Natural-Language-Processing-with-Python
Top-Books-on-Natural-Language-Processing_with-Python

While the Natural Language Toolkit (NLTK) is Python-based, the book on the subject of NLP is a wealth of goodness to the R programmer. I put this resource last in the list as learning the above conceptual material and R packages provides the necessary background to translate some of the concepts (e.g., chunking) into the R context. Awesome stuff and free to boot!

There you have it, a practical curriculum for the R programmer to ramp into Text Analytics. Don’t hesitate to reach out if you have any questions or comments – I monitor my blog almost continually.

Until next time, happy data sleuthing!

Watch our video tutorials on text analytics.

Related Topics

Up for a Weekly Dose of Data Science?

Subscribe to our weekly newsletter & stay up-to-date with current data science news, blogs, and resources.