5 original R programming books to upskill natural language processing

June 15, 2022

Natural Language Processing is a key Data Science skill. Learn how to expand your knowledge with R programming books on Text Analytics.

It is my firm conviction that Natural Language Processing/Text Analytics is a must-have skill for any practicing Data Scientist.

From analyzing customer feedback in NSAT surveys, to scraping Microsoft’s internal job postings to identify popular technical skills, to segmenting customers via textual features, I have consistently found Text Analytics to be a wildly useful skill.

R programming books – Sources to learn from

Not surprisingly, students of our Data Science Bootcamp, folks I mentor on Data Science, and my LinkedIn contacts often ask me about Text Analytics. The good news is that there are many great resources for the R programmer to learn Text Analytics.

What follows is a practical curriculum where the only required knowledge is basic R programming skills. I have read all of the books referenced below and can attest that studying the curriculum will have you mastering Text Analytics in no time!

Text Analytics with R for Students of Literature

Text Analytics with R for Students of Literature by Matthew L. Jockers is quite simply the best, most straightforward introduction to working with text that I have found. Professor Jockers illustrates many of the fundamentals using out-of-the-box R programming. This book provides a great foundation for anyone looking to get started in Text Analytics with R.
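To give a flavor of what "out-of-the-box R" means here, the following is a minimal sketch of a word-frequency count using only base R functions. The sample sentence and the variable names are illustrative assumptions, not material taken from the book.

```r
# Assumption: a short in-line sample stands in for the text of a full novel.
text <- "Call me Ishmael. Some years ago, never mind how long precisely, I thought I would sail about a little and see the watery part of the world."

# Lowercase the text, then split it on runs of non-word characters.
words <- unlist(strsplit(tolower(text), "\\W+"))
words <- words[words != ""]  # drop any empty strings produced by the split

# Count each distinct word and sort from most to least frequent.
freqs <- sort(table(words), decreasing = TRUE)

# Show the five most frequent words.
head(freqs, 5)
```

Swapping the sample string for text read from a file (for example with readLines) should be all that is needed to try this on a longer document.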

Taming Text

Taming Text by Grant S. Ingersoll, Thomas S. Morton, and Andrew L. Farris is the next stop on the Text Analytics journey. While this book is primarily written for Java programmers, it contains a lot of theory that is immensely useful for R programmers learning to work with text. Additionally, the book covers the OpenNLP Java library, which is available to R programmers via the excellent openNLP package.

The CRAN NLP Task View illustrates the wide-ranging Text Analytics support available to the R programmer. Unfortunately, it also shows that the landscape is fragmented. Still, a couple of packages are worthy of study. The tm package is often the go-to Text Analytics package for R programmers, and the newer quanteda package shows a lot of promise. Lastly, the excellent openNLP package deserves a second callout.
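As a rough illustration of the tm workflow, here is a small sketch that builds a document-term matrix from two made-up feedback comments. It assumes the tm package is installed (install.packages("tm")); the sample comments and variable names are hypothetical.

```r
library(tm)

# Assumption: two invented customer comments used as toy input.
docs <- c("The service was great.",
          "Great product, slow shipping.")

# Wrap the character vector in a corpus object.
corpus <- VCorpus(VectorSource(docs))

# Standard cleanup: lowercase, strip punctuation, drop English stop words.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# One row per document, one column per remaining term.
dtm <- DocumentTermMatrix(corpus)
as.matrix(dtm)
```

The resulting matrix is the usual starting point for downstream work such as clustering or classification on textual features.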

Introduction to Information Retrieval for Text Analytics

Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, while focused primarily on the problem of search, nevertheless contains a wealth of theory and understanding (e.g., the Vector Space Model) to take the R programmer to the next level. The text is language agnostic, quite excellent, and free!
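To make the Vector Space Model a little more concrete, here is a minimal base R sketch that represents three toy documents as tf-idf vectors and compares them with cosine similarity. The documents, the whitespace tokenization, and the unsmoothed idf weighting, log(N / df), are simplifying assumptions for illustration; other weighting variants are common.

```r
# Assumption: three tiny, already-lowercased documents used as toy data.
docs <- c(d1 = "r makes text analytics fun",
          d2 = "text analytics with r",
          d3 = "search engines rank documents")

# Tokenize on single spaces and collect the vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term.
tf <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))

# Inverse document frequency: log(N / number of documents containing the term).
idf <- log(nrow(tf) / colSums(tf > 0))

# Scale each term column by its idf weight.
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between two document vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_12 <- cosine(tfidf["d1", ], tfidf["d2", ])  # documents sharing several terms
sim_13 <- cosine(tfidf["d1", ], tfidf["d3", ])  # no shared terms, so exactly 0
```

Documents d1 and d2 share terms and so get a positive similarity, while d1 and d3 have no terms in common and score zero, which is the intuition behind ranking documents against a query.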

Natural Language Processing with Python

While the Natural Language Toolkit (NLTK) is Python-based, the NLTK book is a wealth of useful material for the R programmer. I put this resource last in the list because learning the above conceptual material and R packages provides the necessary background to translate some of the concepts (e.g., chunking) into the R context. Awesome stuff, and free to boot!
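As one example of translating a concept into R, here is a toy sketch of regular-expression noun-phrase chunking in base R. It assumes the tokens are already part-of-speech tagged (the Penn Treebank style tags below were assigned by hand for illustration; in practice a tagger would supply them), and the chunk pattern, an optional determiner followed by any adjectives and one or more nouns, is a deliberately simple assumption.

```r
# Assumption: a hand-tagged sentence; a real pipeline would use a POS tagger.
tokens <- c("the", "little", "yellow", "dog", "barked", "at", "the", "cat")
tags   <- c("DT",  "JJ",     "JJ",     "NN",  "VBD",    "IN", "DT",  "NN")

# Encode each tag as one character so a regex can run over the tag sequence.
code    <- c(DT = "D", JJ = "J", NN = "N")
symbols <- ifelse(tags %in% names(code), unname(code[tags]), "O")
tag_string <- paste(symbols, collapse = "")

# Noun-phrase pattern: optional determiner, any adjectives, one or more nouns.
m      <- gregexpr("D?J*N+", tag_string)[[1]]
starts <- as.integer(m)
lens   <- attr(m, "match.length")

# Map each match back to the tokens it covers.
chunks <- mapply(function(s, l) paste(tokens[s:(s + l - 1)], collapse = " "),
                 starts, lens)
chunks
```

Because each tag maps to exactly one character, match positions in the tag string line up with token positions, which keeps the mapping back to words trivial.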

There you have it, a practical curriculum for the R programmer to ramp into Text Analytics. Don’t hesitate to reach out if you have any questions or comments – I monitor my blog almost continually.

Until next time, happy data sleuthing!

Written by Dave Langer
