Automated Web Scraping Using rvest in R

Agenda

Many blogs and tutorials teach you how to scrape data from a set of web pages once, and then you’re done. But one-off scraping falls short for applications that depend on recent or frequently updated content, such as sentiment analysis on timely commentary, capturing changing events, or analyzing trends in real time. Scraping historical data once can be a fun academic exercise, but it does not help when the analysis needs fresh data.
Suppose you want to tap into news sources to analyze political events that change by the hour, along with people’s comments on those events. You could summarize the key discussions and debates in the comments, rate the overall sentiment of the comments, find the key themes in the headlines, see how events and commentary change over time, and more. To do this, you need a collection of recent political events or news items scraped every hour.

What you’ll learn

  • How to scrape the web automatically and periodically using rvest, so you can analyze frequently updated data
  • How to write standard web scraping commands in R using rvest, filter timely data, analyze or summarize key information in the text, and send an email alert with the results of your analysis
  • How to set up a script to run every hour, so that text is scraped and analyzed periodically to capture changing events and commentary or analyze trends in real time
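
The scrape-and-filter steps in the list above might be sketched as follows. This is a minimal illustration under stated assumptions, not the webinar's own code: the HTML snippet, the headlines, the `article`/`h2`/`time` selectors, and the fixed `now` timestamp are all made up for the example. A real script would read a live page with `read_html()` and use `Sys.time()` instead.

```r
library(rvest)

# Hypothetical markup standing in for a news page.
# A real script would call read_html() on the news site's URL instead.
page <- minimal_html('
  <article><h2>Budget vote delayed</h2><time datetime="2024-05-01T09:40:00Z"></time></article>
  <article><h2>Minister resigns</h2><time datetime="2024-05-01T07:05:00Z"></time></article>
')

# Extract one headline and one timestamp per <article> node.
articles <- html_elements(page, "article")
news <- data.frame(
  headline = html_text2(html_element(articles, "h2")),
  published = as.POSIXct(
    html_attr(html_element(articles, "time"), "datetime"),
    format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Keep only items published within the 60 minutes before `now`.
# `now` is fixed here (an assumption) so the example is reproducible.
now <- as.POSIXct("2024-05-01 10:00:00", tz = "UTC")
recent <- news[difftime(now, news$published, units = "mins") <= 60, ]
print(recent$headline)
```

To run a script like this every hour, one option on Linux or macOS is a cron entry such as `0 * * * * Rscript scrape_news.R`, where `scrape_news.R` is a hypothetical file name; on Windows, Task Scheduler can play the same role.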
Rebecca Merrett
Rebecca holds a bachelor’s degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for game development and has written for tech publications.

Resources

Refer to this repository for R code, scripts, and supplemental materials.