fbpx
Learn to build large language model applications: vector databases, langchain, fine tuning and prompt engineering. Learn more

Facebook

Manthan Koolwal
Manthan Koolwal
| February 27

These days social platforms are quite popular. Websites like YouTube, Facebook, Instagram, etc. are used widely by billions of people.  These websites have a lot of data that can be used for sentiment analysis against any incident, election prediction, result prediction of any big event, etc. If you have this data, you can analyze the risk of any decision.

In this post, we are going to web-scrape public Facebook pages using Python and Selenium. We will also discuss the libraries and tools required for the process. So, if you’re interested in web scraping and data analysis, keep reading!

Facebook scraping with Python

Read more about web scraping with Python and BeautifulSoup and kickstart your analysis today.   

What do we need before writing the code? 

We will use Python 3.x for this tutorial, and I am assuming that you have already installed it on your machine. Other than that, we need to install two III-party libraries BeautifulSoup and Selenium. 

  • BeautifulSoup — This will help us parse raw HTML and extract the data we need. It is also known as BS4. 
  • Selenium — It will help us render JavaScript websites. 
  • We also need chromium to render websites using Selenium API. You can download it from here. 

 

Before installing these libraries, you have to create a folder where you will keep the python script. 

Now, create a python file inside this folder. You can use any name and then finally, install these libraries. 

What will we extract from a Facebook page? 

We are going to scrape addresses, phone numbers, and emails from our target page. 

First, we are going to extract the raw HTML using Selenium from the Facebook page and then we are going to use. find() and .find_all() methods of BS4 to parse this data out of the raw HTML. Chromium will be used in coordination with Selenium to load the website. 

Read about: How to scrape Twitter data without Twitter API using SNScrape. 

Let’s start scraping  

Let’s first write a small code to see if everything works fine for us. 

Let’s understand the above code step by step. 

  • We have imported all the libraries that we installed earlier. We have also imported the time library. It will be used for the driver to wait a little more before closing the chromium driver. 
  • Then we declared the PATH of our chromium driver. This is the path where you have kept the chromedriver. 
  • One empty list and an object to store data. 
  • target_url holds the page we are going to scrape. 
  • Then using .Chrome() method we are going to create an instance for website rendering. 
  • Then using .get() method of Selenium API we are going to open the target page. 
  • .sleep() method will pause the script for two seconds. 
  • Then using .page_source we collect all the raw HTML of the page. 
  • .close() method will close down the chrome instance. 

 

Once you run this code it will open a chrome instance, then it will open the target page and then after waiting for two seconds the chrome instance will be closed. For the first time, the chrome instance will open a little slow but after two or three times it will work faster. 

Once you inspect the page you will find that the intro section, contact detail section, and photo gallery section all have the same class names

with a div. But since for this tutorial, our main focus is on contact details therefore we will focus on the second div tag. 

Let’s find this element using the .find() method provided by the BS4 API. 

We have created a parse tree using BeautifulSoup and now we are going to extract crucial data from it. 

Using .find_all() method we are searching for all the div tags with class


and then we selected the second element from the list.
 

Now, here is a catch. Every element in this list has the same class and tag. So, we have to use regular expressions in order to find the information we need to extract. 

Let’s find all of these element tags and then later we will use a for loop to iterate over each of these elements to identify which element is what. 

Here is how we will identify the address, number, and email. 

  • The address can be identified if the text contains more than two commas. 
  • The number can be identified if the text contains more than two dash(-). 
  • Email can be identified if the text contains “@” in it. 

We ran a for loop on allDetails variable. Then we are one by one identifying which element is what. Then finally if they satisfy the if condition we are storing it in the object o. 

In the end, you can append the object o in the list l and print it. 

Once you run this code you will find this result. 

Complete Code 

We can make further changes to this code to scrape more information from the page. But for now, the code will look like this. 

Conclusion 

Today we scraped the Facebook page to collect emails for lead generation. Now, this is just an example of scraping a single page. If you have thousands of pages, then we can use the Pandas library to store all the data in a CSV file. I leave this task for you as homework. 

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on your social media. 

Data Science Dojo
Ayesha Saleem
| September 13

In today’s blog, we will try to understand the working behind social media algorithms and focus on the top 6 social media platforms. Algorithms are a part of machine learning which has also become a key area to measure success of digital marketing; these are written by coders to learn human actions. It specifies the behavior of data by using a mathematical set of rules 

According to the latest data for 2022, users worldwide spend 147 minutes, on average every day on social media. The use of social media is booming with every passing day. We get hooked up on the content of our interest. But you cannot deny that it is often surprising to experience the content we just discussed with our friends or family.  

Social Media algorithms

Social media algorithms sort posts on a user’s feed based on their interest rather than the publishing time. Every content creator desires to get the maximum impressions on their social media postings or their marketing campaigns. That’s where the need to develop quality content comes in. Social media users only experience the content that the algorithms figure out to be most relevant for them.  

1. Insights into Facebook algorithm 

Facebook

Facebook had 2.934 billion monthly active users in July 2022.  

Anna Stepanov, Head of Facebook App Integrity said “News Feed uses personalized ranking, which considers thousands of unique signals to understand what’s most meaningful to you. Our aim isn’t to keep you scrolling on Facebook for hours on end, but to give you an enjoyable experience that you want to return to.” 

On Facebook, which means that the average reach for an organic post is down over 5 percent while the engagement rate is just 0.25 percent which drops to 0.08 percent if you have over 100k followers. 

Facebook’s algorithm is not static, it has evolved over the years with the objective to keep its users engaged with the platform. In 2022, Facebook adopted the idea of showing stories to users instead of news, like before. So, what we see on Facebook is no longer a newsfeed but “feed” only. 

Further, it works mainly on 3 ranking signals: 

  • Interactivity:

The more you interact with the posts from one of your friends or family members, Facebook is going to show you their activities relatively more on your feed.  

  • Interest:

If you like content about cars or automobiles, there’s a high chance Facebook algorithm will push relevant posts to your feed. This happens because we search, like, interact or spend most of our time seeing the content we like.  

  • Impressions:

Viral or popular content becomes a part of everyone’s Facebook. That’s because the Facebook algorithm promotes content that is in general liked by its users. So, you’re also more likely to see what’s everyone talking about today.  

2. How does YouTube algorithm work 

Youtube

There are 2.1 billion monthly active YouTube users worldwide. When you open YouTube, you see multiple streaming options. YouTube says that in 2022, homepages and suggested videos are usually the top sources of traffic for most channels. 

The broad selection is narrowed on the user homepage on the basis of two main types of ranking signals.  

  • Performance:

When a video is uploaded on YouTube, the algorithm evaluates it on the basis of a few key metrics: 

  • Click-through rate 
  • Average view duration 
  • Average percentage viewed 
  • Likes and dislikes 
  • Viewer surveys 

If a video gains good viewership and engagement by the regular followers of the channel, then the YouTube algorithm will offer that video to more users on YouTube.  

  • Personalization:

The second-ranking signal for YouTube is personalization. In case you love watching DIY videos, YouTube algorithm processes to keep you hooked on the platform by suggesting interesting DIY videos to you.  

Personalization works based on a user’s watch history or the channels you subscribed to lately. It tracks your past behavior and figures out your most preferred streaming options.  

Lastly, you must not forget that YouTube acts as a search engine too. So, what you type in the search bar plays a major role in shortlisting the top videos for you.  

3. Instagram algorithm explained  

Instagram

In July 2022, Instagram reached 1.440 billion users around the world according to the global advertising audience reach numbers.  

The main content on Instagram revolves around posts, stories, and reels. Instagram CEO Adam Mosseri said, “We want to make the most of your time, and we believe that using technology [the Instagram algorithm] to personalize your experience is the best way to do that.” 

Let’s shed some light to the Instagram’s top 3 ranking factors for year 2022: 

  • Interactivity:

Every account holder or influencer on Instagram runs after followers. Because that’s the core to getting your content viewed by the users. To get something on our Instagram feed we need to follow other accounts. As much as our interaction with someone’s content occurs, we will be able to see more of their postings.  

  • Interest:

This ranking factor has more influence on reels feed and explore page. The more you show interest in watching a specific type of content and tap on it, the more of that category will be shown to you. And it’s not essential to follow someone to see their postings on reels and explore the page. 

  • Information:

How relevant is the content uploaded on Instagram? This highlights the value of content posted by anyone. If people are talking about it, engaging with it, and sharing it on their stories, you are also going to see it on your feed. 

4. Guide to Pinterest algorithm 

Pinterest

Being the 15th most active social media platform, Pinterest had 433 million monthly active users in July 2022.  

Pinterest is popular amongst audiences who are more likely interested in home décor, aesthetics, food, and style inspirations. This platform carries a slightly different purpose of use than the above-mentioned social media platforms. Therefore, the algorithm works with distinct ranking factors for Pinterest.  

Pinterest algorithm promotes pins having: 

  • High-quality images and visually appealing designs  
  • Proper use of keywords in the pin descriptions so that pins come up in search results. 
  • Increased activity on Pinterest and engagement with other users. 

Needless to mention, the algorithm weighs more for the pins that are similar to a user’s past pins and search activities. 

5. Working process behind LinkedIn algorithm  

LinkedIn

There are 849.6 million users with LinkedIn in July 2022. LinkedIn is a platform for professionals. People use it to build their social networks and have the right connections that can help them succeed in their careers.  

To maintain the authenticity and relevance of connections for professionals, the LinkedIn algorithm processes billions of posts per day to keep the platform valuable for its users. LinkedIn’s ranking factors are mainly these: 

  • Spam:

LinkedIn considers post as spam if it contains a lot of links, has multiple grammatical errors, and consists of bad vocabulary. Also, avoid using hashtags like #comment, #like, or #follow can flag the system, too. 

  • Low-quality posts:

There are billions of posts uploaded on LinkedIn every day. The algorithm works to filter out the best for users to engage with. Low-quality posts are not spam but they lack value as compared to other posts. It is evaluated based on the engagement a post receives. 

  • High-quality content:

You wonder what’s the criteria to create high-quality posts on LinkedIn? Here are some tips to remember: 

Easy to read posts 

Encourages responses with a question 

Uses three or fewer hashtags 

Incorporates strong keywords 

Tag responsive people to the post 

Moreover, LinkedIn appreciates consistency in posts, so it’s recommended to keep your followers engaged not only with informative posts but also conversing with users in the comments section.  

6. A sneak peek at the TikTok algorithm 

TikTok

TikTok will have 750 million monthly users worldwide in 2022. In the past couple of years, this social media platform has gained popularity for all the right reasons. The TikTok algorithm is considered as a recommendation system for its users.  

We have found one great explanation of TikTok “For You” page algorithm by the platform itself: 

“A stream of videos curated to your interests, making it easy to find content and creators you love … powered by a recommendation system that delivers content to each user that is likely to be of interest to that particular user.” 

Key ranking factors for the TikTok algorithm are: 

  • User interactions:

This factor is like the Instagram algorithm, but mainly concerns the following actions of users: 

Which accounts do you follow 

Comments you’ve posted 

Videos you’ve reported as inappropriate 

Longer videos you watch all the way to the end (aka video completion rate) 

Content you create on your own account 

Creators you’ve hidden 

Videos you’ve liked or shared on the app 

Videos you’ve added to your favorites 

Videos you’ve marked as “Not Interested” 

Interests you’ve expressed by interacting with organic content and ads 

  • Video information: 

Videos with missing information, incorrect captions, titles, and tags are buried under hundreds of videos being uploaded on TikTok every minute. On the discover tab, your video information signals tend to seek for: 

Captions 

Sounds 

Hashtags* 

Effects 

Trending topics

  • TikTok account settings:

TikTok algorithm optimizes the audience for your video based on the options you selected while creating your account. Some of the device and account settings that decide audience for your videos are: 

Language preference 

Country setting (you may be more likely to see content from people in your own country) 

Type of mobile device 

Categories of interest you selected as a new user 

Social media algorithms relation with content quality 

Apart from all the key ranking factors for each platform, we discussed in this blog, one thing remains ascertain for all i.e., maintain content quality. Every social media platform is algorithm bsed which means it only filters out the best quality content for visitors. 

No matter which platform you focus on growing your business or your social network, it highly relies on the meaningful content you provide your connections.  

If we missed your favorite social media platform, don’t worry, let us know in the comments and we will share its algorithm in the next blog.  

Related Topics

Statistics
Resources
Programming
Machine Learning
LLM
Generative AI
Data Visualization
Data Security
Data Science
Data Engineering
Data Analytics
Computer Vision
Career
Artificial Intelligence