
algorithms

Seif Sekalala
| February 13

Simplify complex modern life with problem-solving tools. Digital tech created an abundance of tools, but a simple set can solve everything.

In last week’s post, DS-Dojo introduced our readers to this blog-series’ three focus areas, namely: 1) software development, 2) project-management, and 3) data science. This week, we continue that metaphorical (learning) journey with a fun fact. Better yet, a riddle. What do ALL jobs have in common?

One can (correctly) argue that essentially, all jobs require the worker in question to accomplish one basic or vital goal: solve (a) problem(s). And indeed, one can earnestly argue that the three interdisciplinary fields of this series (software-development, project-management, and data science) are iconic vis-a-vis their problem-solving characteristics. 

 

Advanced problem-solving tools for a (post-) modern world

One of the paradoxes of our (post-)modern era is this fact: our lives have become so much easier, safer, and much more enjoyable, thanks to digital technology. And yet simultaneously, our lives have gotten so complicated, with an overwhelming glut of technological tools at our disposal. 

 

And I suppose one can view this as a “rich person-problem,” akin to a kid in a candy store, indeed. In any case, here is the good news: “as luck would have it,” we can utilize a simple (set of) tool(s), with which we can both solve problems expansively, and/or simplify our lives as needed. 

 

To the rescue (!): Google, checklists, algorithms and data structures, and project-management

Incidentally, a Google search using search terms related to the topic at hand suggests a consensus vis-a-vis best practices for solving problems, and/or simplifying our lives.

Ultimately, we can use two or three vital tools: 1) [either] a simple checklist, 2) [or,] the interdisciplinary field of project-management, and 3) algorithms and data structures.

 

Here’s a fun question for you, dear reader: can you think of a tool that can simplify both simple and complex tasks such as i) grocery shopping, ii) surgery, and iii) safely flying an airplane? If you answered, “a checklist,” you’re correct. 

 

But for more complicated problems, the interdisciplinary field of project management might be useful–i.e., via the 12 (project-management) elements introduced in last week’s post. To recap, those twelve elements (e.g. as defined by Belinda Goodrich, 2021) are: 

  • Project life cycle, 
  • Integration, 
  • Scope, 
  • Schedule, 
  • Cost, 
  • Quality, 
  • Resources, 
  • Communications, 
  • Risk, 
  • Procurement, 
  • Stakeholders, and 
  • Professional responsibility / ethics. 

 

In addition to the mindful use of the above twelve elements, our Google-search might reveal that various authors suggest some vital algorithms for data science. For instance, in the table below, we juxtapose four authors’ professional opinions with DS-Dojo’s curriculum.

 

What problem-solving tools the next digital age has to offer

Thanks to Moore’s law (e.g., as described in the relevant Wikipedia article about Moore’s law) and other factors, the digital age will keep producing hardware and software tools that are both wondrous and overwhelming (e.g., IoT, Web 3.0, metaverse, quantum-computing, etc.).

In this blog post, DS-Dojo provides a potential remedy to our readers vis-a-vis finding easier solutions to our world’s problems, and the avoidance of that “spoilt for choice” dilemma.

By using checklists and tools derived from the three interdisciplinary fields of this blog series, we can solve our world’s ever-growing/evolving problems, and/or simplify our lives as needed.

 

Sample Overview of Data-Science Dojo’s Curriculum:

  • Weeks 1 to 3: Introduction to Quantitative Data-Analysis
  • Weeks 4 to 8: Classification
  • Week 9: Applications of Classification
  • Week 10: Special Topic: Text Analysis Fundamentals
  • Week 11: Unsupervised Learning
  • Weeks 12 and 13: Regression
  • Weeks 14 to 16: More Applications of Previously-Learned Concepts
VS.

Tech-Vidvan’s “Top 10”:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Trees
  4. Naive Bayes
  5. K-Nearest Neighbors
  6. Support Vector Machine
  7. K-Means Clustering
  8. Principal Component Analysis
  9. Neural Networks
  10. Random Forests
P. Zheng’s “Guide to Data Structures and Algorithms,” Parts 1 and 2:

  1. Big O Notation
  2. Search
  3. Sort
     i. Quicksort
     ii. Mergesort
  4. Stack
  5. Queue
  6. Array
  7. Hash Table
  8. Graph
  9. Tree (e.g., Decision Tree)
  10. Breadth-First Search
  11. Depth-First Search
  12. Dijkstra’s Algorithm

Disha Ganguli’s Top 10

  1. Linear Regression  
  2. Logistic Regression  
  3. Decision Trees  
  4. ID3 Algorithm  
  5. CART Algorithm  
  6. Naïve Bayes  
  7. K-nearest neighbors (KNN) 
  8. Support vector machine (SVM) 
  9. K-means clustering 
  10. PCA Algorithm
Data-Quest’s Top 10:

5 Supervised Learning Techniques: 

1) Linear Regression 

2) Logistic Regression

3) CART 

4) Naïve Bayes 

5) KNN

3 Unsupervised Learning Techniques

6) Apriori

7) K-means 

8) PCA

2 Ensembling Techniques

9) Bagging with Random Forests 

10) Boosting with XGBoost.

Data Science Dojo
Ayesha Saleem
| September 13

In today’s blog, we will try to understand how social media algorithms work, focusing on the top 6 social media platforms. Algorithms are a part of machine learning, which has also become a key area for measuring the success of digital marketing; they are written by coders to learn from human actions. An algorithm specifies how data is handled by using a set of mathematical rules.

According to the latest data for 2022, users worldwide spend an average of 147 minutes every day on social media. The use of social media is booming with every passing day. We get hooked on the content that interests us. But you cannot deny that it is often surprising to come across content about something we just discussed with our friends or family.

Social Media algorithms

Social media algorithms sort posts on a user’s feed based on their interest rather than the publishing time. Every content creator desires to get the maximum impressions on their social media postings or their marketing campaigns. That’s where the need to develop quality content comes in. Social media users only experience the content that the algorithms figure out to be most relevant for them.  
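To make the contrast concrete, here is a minimal Python sketch of the two orderings described above. The `Post` fields and the `interest_score` values are hypothetical and purely illustrative; real platforms derive relevance from many signals that are not public.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    published_hour: int    # hypothetical: hour of the day the post went live
    interest_score: float  # hypothetical: predicted relevance to this user (0 to 1)


feed = [
    Post("Morning news recap", published_hour=8, interest_score=0.20),
    Post("Friend's vacation photos", published_hour=10, interest_score=0.90),
    Post("Car review video", published_hour=12, interest_score=0.60),
]


def chronological(posts):
    """Newest first: the ordering based on publishing time."""
    return sorted(posts, key=lambda p: p.published_hour, reverse=True)


def by_interest(posts):
    """Most relevant first: the ordering an interest-based ranking would produce."""
    return sorted(posts, key=lambda p: p.interest_score, reverse=True)


print([p.title for p in chronological(feed)])
print([p.title for p in by_interest(feed)])
```

In this toy feed, the most recent post is not the one the user sees first once the feed is sorted by interest.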

1. Insights into Facebook algorithm 

Facebook had 2.934 billion monthly active users in July 2022.  

Anna Stepanov, Head of Facebook App Integrity said “News Feed uses personalized ranking, which considers thousands of unique signals to understand what’s most meaningful to you. Our aim isn’t to keep you scrolling on Facebook for hours on end, but to give you an enjoyable experience that you want to return to.” 

On Facebook, the average reach for an organic post is down over 5 percent, while the engagement rate is just 0.25 percent, which drops to 0.08 percent if you have over 100k followers. 

Facebook’s algorithm is not static; it has evolved over the years with the objective of keeping its users engaged with the platform. In 2022, Facebook adopted the idea of showing stories to users instead of news, as it had before. So, what we see on Facebook is no longer a “newsfeed” but only a “feed.” 

Further, it works mainly on 3 ranking signals: 

  • Interactivity:

The more you interact with the posts from one of your friends or family members, the more Facebook is going to show you their activities on your feed.  

  • Interest:

If you like content about cars or automobiles, there’s a high chance the Facebook algorithm will push relevant posts to your feed. This happens because we search for, like, interact with, or spend most of our time seeing the content we like.  

  • Impressions:

Viral or popular content becomes a part of everyone’s Facebook. That’s because the Facebook algorithm promotes content that is generally liked by its users. So, you’re also more likely to see what everyone is talking about today.  

2. How does YouTube algorithm work 

There are 2.1 billion monthly active YouTube users worldwide. When you open YouTube, you see multiple streaming options. YouTube says that in 2022, homepages and suggested videos are usually the top sources of traffic for most channels. 

The broad selection is narrowed on the user homepage on the basis of two main types of ranking signals.  

  • Performance:

When a video is uploaded on YouTube, the algorithm evaluates it on the basis of a few key metrics: 

  • Click-through rate 
  • Average view duration 
  • Average percentage viewed 
  • Likes and dislikes 
  • Viewer surveys 

If a video gains good viewership and engagement by the regular followers of the channel, then the YouTube algorithm will offer that video to more users on YouTube.  
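The first three performance metrics listed above are simple ratios. The sketch below computes them using their commonly used definitions; the sample numbers are hypothetical, and how YouTube actually weighs these metrics against each other is not public.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Share of impressions (times the thumbnail was shown) that led to a click."""
    return clicks / impressions if impressions else 0.0


def average_view_duration(total_watch_seconds: float, views: int) -> float:
    """Total watch time divided by the number of views, in seconds."""
    return total_watch_seconds / views if views else 0.0


def average_percentage_viewed(avg_duration_seconds: float, video_length_seconds: float) -> float:
    """How much of the video an average viewer watches, as a percentage."""
    if not video_length_seconds:
        return 0.0
    return 100.0 * avg_duration_seconds / video_length_seconds


# Hypothetical numbers for a single 5-minute video.
impressions, clicks, views = 10_000, 500, 500
total_watch_seconds = 60_000
video_length_seconds = 300

ctr = click_through_rate(clicks, impressions)
avd = average_view_duration(total_watch_seconds, views)
apv = average_percentage_viewed(avd, video_length_seconds)

print(f"CTR: {ctr:.1%}, average view duration: {avd:.0f}s, average percentage viewed: {apv:.0f}%")
```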

  • Personalization:

The second ranking signal for YouTube is personalization. If you love watching DIY videos, the YouTube algorithm works to keep you hooked on the platform by suggesting interesting DIY videos to you.  

Personalization works based on your watch history or the channels you subscribed to lately. It tracks your past behavior and figures out your most preferred streaming options.  

Lastly, you must not forget that YouTube acts as a search engine too. So, what you type in the search bar plays a major role in shortlisting the top videos for you.  

3. Instagram algorithm explained  

In July 2022, Instagram reached 1.440 billion users around the world according to the global advertising audience reach numbers.  

The main content on Instagram revolves around posts, stories, and reels. Instagram CEO Adam Mosseri said, “We want to make the most of your time, and we believe that using technology [the Instagram algorithm] to personalize your experience is the best way to do that.” 

Let’s shed some light on Instagram’s top 3 ranking factors for 2022: 

  • Interactivity:

Every account holder or influencer on Instagram runs after followers, because that’s the core of getting your content viewed by users. To get something on our Instagram feed, we need to follow other accounts. The more we interact with someone’s content, the more of their postings we will see.  

  • Interest:

This ranking factor has more influence on the reels feed and the explore page. The more you show interest in watching a specific type of content and tap on it, the more of that category will be shown to you. And it’s not essential to follow someone to see their postings on reels and the explore page. 

  • Information:

How relevant is the content uploaded on Instagram? This highlights the value of content posted by anyone. If people are talking about it, engaging with it, and sharing it on their stories, you are also going to see it on your feed. 

4. Guide to Pinterest algorithm 

Being the 15th most active social media platform, Pinterest had 433 million monthly active users in July 2022.  

Pinterest is popular amongst audiences who are more likely interested in home décor, aesthetics, food, and style inspirations. This platform carries a slightly different purpose of use than the above-mentioned social media platforms. Therefore, the algorithm works with distinct ranking factors for Pinterest.  

Pinterest algorithm promotes pins having: 

  • High-quality images and visually appealing designs  
  • Proper use of keywords in the pin descriptions so that pins come up in search results. 
  • Increased activity on Pinterest and engagement with other users. 

Needless to mention, the algorithm weighs more for the pins that are similar to a user’s past pins and search activities. 
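One simple way to picture “similar to a user’s past pins” is keyword overlap. The sketch below ranks candidate pins by Jaccard similarity between their keywords and the keywords of a user’s past pins. This is an illustrative assumption, not Pinterest’s actual method, and all pin names and keywords are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the overlap divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical keywords collected from a user's past pins and searches.
past_pins = {"kitchen", "decor", "minimalist", "wood"}

# Hypothetical candidate pins with the keywords from their descriptions.
candidates = {
    "Minimalist wood kitchen": {"kitchen", "minimalist", "wood"},
    "Street style outfits": {"fashion", "street", "outfit"},
    "Rustic living room decor": {"decor", "rustic", "wood"},
}

# Rank candidates from most to least similar to the user's history.
ranked = sorted(
    candidates,
    key=lambda name: jaccard(past_pins, candidates[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {jaccard(past_pins, candidates[name]):.2f}")
```

The sketch also shows why keywords in pin descriptions matter: a pin with no descriptive keywords has nothing to match against.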

5. Working process behind LinkedIn algorithm  

LinkedIn had 849.6 million users in July 2022. LinkedIn is a platform for professionals. People use it to build their social networks and make the right connections that can help them succeed in their careers.  

To maintain the authenticity and relevance of connections for professionals, the LinkedIn algorithm processes billions of posts per day to keep the platform valuable for its users. LinkedIn’s ranking factors are mainly these: 

  • Spam:

LinkedIn considers a post spam if it contains a lot of links, has multiple grammatical errors, and uses bad vocabulary. Also, avoid hashtags like #comment, #like, or #follow, as these can flag the system, too. 

  • Low-quality posts:

There are billions of posts uploaded on LinkedIn every day. The algorithm works to filter out the best for users to engage with. Low-quality posts are not spam, but they lack value compared to other posts. Post quality is evaluated based on the engagement a post receives. 

  • High-quality content:

Wondering what the criteria are for creating high-quality posts on LinkedIn? Here are some tips to remember. A high-quality post: 

  • Is easy to read 
  • Encourages responses with a question 
  • Uses three or fewer hashtags 
  • Incorporates strong keywords 
  • Tags responsive people in the post 

Moreover, LinkedIn appreciates consistency in posts, so it’s recommended to keep your followers engaged not only with informative posts but also conversing with users in the comments section.  
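Some of the spam signals and quality tips above can be turned into a rough pre-publishing check. The sketch below is only a self-review helper under stated assumptions: the `MAX_LINKS` threshold is hypothetical, the function name `review_post` is made up, and LinkedIn’s real classifier is not public. It does not attempt to check grammar or vocabulary.

```python
import re

ENGAGEMENT_BAIT = {"#comment", "#like", "#follow"}  # hashtags named in the spam section
MAX_LINKS = 2     # hypothetical threshold for "a lot of links"
MAX_HASHTAGS = 3  # from the tip "three or fewer hashtags"

LINK_PATTERN = r"https?://\S+"


def review_post(text: str) -> list:
    """Return a list of warnings for a draft post; an empty list means no issues found."""
    warnings = []
    links = re.findall(LINK_PATTERN, text)
    # Strip links first so URL fragments or query strings are not
    # mistaken for hashtags or questions.
    body = re.sub(LINK_PATTERN, "", text)
    hashtags = [tag.lower() for tag in re.findall(r"#\w+", body)]

    if len(links) > MAX_LINKS:
        warnings.append("too many links")
    if ENGAGEMENT_BAIT & set(hashtags):
        warnings.append("engagement-bait hashtag")
    if len(hashtags) > MAX_HASHTAGS:
        warnings.append("more than three hashtags")
    if "?" not in body:
        warnings.append("no question to encourage responses")
    return warnings


draft = "What is the best way to learn data science? #datascience #career"
print(review_post(draft))
```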

6. A sneak peek at the TikTok algorithm 

TikTok will have 750 million monthly users worldwide in 2022. In the past couple of years, this social media platform has gained popularity for all the right reasons. The TikTok algorithm is considered a recommendation system for its users.  

We have found one great explanation of TikTok’s “For You” page algorithm by the platform itself: 

“A stream of videos curated to your interests, making it easy to find content and creators you love … powered by a recommendation system that delivers content to each user that is likely to be of interest to that particular user.” 

Key ranking factors for the TikTok algorithm are: 

  • User interactions:

This factor is like the Instagram algorithm, but mainly concerns the following actions of users: 

  • Accounts you follow 
  • Comments you’ve posted 
  • Videos you’ve reported as inappropriate 
  • Longer videos you watch all the way to the end (aka video completion rate) 
  • Content you create on your own account 
  • Creators you’ve hidden 
  • Videos you’ve liked or shared on the app 
  • Videos you’ve added to your favorites 
  • Videos you’ve marked as “Not Interested” 
  • Interests you’ve expressed by interacting with organic content and ads 

  • Video information: 

Videos with missing information or incorrect captions, titles, and tags are buried under the hundreds of videos being uploaded on TikTok every minute. On the Discover tab, the video information signals tend to include: 

  • Captions 
  • Sounds 
  • Hashtags 
  • Effects 
  • Trending topics

  • TikTok account settings:

The TikTok algorithm optimizes the audience for your video based on the options you selected while creating your account. Some of the device and account settings that decide the audience for your videos are: 

  • Language preference 
  • Country setting (you may be more likely to see content from people in your own country) 
  • Type of mobile device 
  • Categories of interest you selected as a new user 

Social media algorithms’ relation with content quality 

Apart from all the key ranking factors for each platform that we discussed in this blog, one thing remains certain for all of them: maintain content quality. Every social media platform is algorithm-based, which means it surfaces only the best-quality content for visitors. 

No matter which platform you focus on to grow your business or your social network, success relies heavily on the meaningful content you provide to your connections.  

If we missed your favorite social media platform, don’t worry, let us know in the comments and we will share its algorithm in the next blog.  

Data Science Dojo
Angela Baltes
| March 17

Angela Baltes completed Data Science Dojo’s bootcamp program at the University of New Mexico. Here’s her reflection on the course.

Deciding to participate in Data Science Dojo’s “A Hands-on Introduction to Data Science” bootcamp was simple, as I have been a consumer of bootcamps for several years and have found that my success with them varies.

In my prior self-paced learning, I found that there were concepts that I simply did not understand well, or perhaps were not explicitly stated in whatever course I was taking.

I wanted to experience an immersive in-person bootcamp with the hopes that practical examples and in-person interactions would be helpful in understanding and retaining the material. Not to mention, I was able to network with others who are interested in this field.

Data Science  is taught by Raja Iqbal, CEO and Chief Data Scientist. He is a talented presenter, and I appreciated his style of teaching the material. He was accompanied by Arham Akheel, who assisted Raja in helping students and provided us with machine learning demonstrations.

The two complemented one another very well. Please check out Data Science Dojo’s website and check their schedule, for they may be coming to a city near you!

5-day Data Science Bootcamp

The bootcamp was offered in Albuquerque, New Mexico for 3 days instead of the prior 5-day bootcamp. From what I understand, we were the first cohort to try this format.

Day 1

On this first day, we spent some time looking into data exploration, and how to approach data problems. We discussed things as a group, and I enjoyed the energy from class.

We discussed that a model is only as good as the data provided to it: garbage in, garbage out. Data is the new oil and is the most valuable asset a company can have; however, we, as data scientists, need to tap into that resource by refining it and getting the most value from it.

One thing that I have personally struggled with is learning how to ask the right questions and evaluate business impact, and this course was extremely helpful for that. It is our job to ask questions.

Many times, in the past, I was given a task, and I simply began to hammer away without questions asked. In data science, feature engineering and data exploration are the most important tasks, as these activities help to further define and evaluate if this is a worthy endeavor for a company.

Day 2

On this day, we began to delve into machine learning algorithms, more specifically, supervised learning. I found this valuable, as I myself have the most experience with and understanding of supervised learning. We stressed again that, before building a model, one should ask, “What is the intended use of this model?” as that is pertinent information in determining which features to use and in what format to provide the model to the stakeholders who will use it.

We analyzed the Titanic dataset in detail and discussed what features to include in our decision tree model. We also discussed entropy, stopping criteria, and splitting. Our homework assignment was to submit our Titanic model to our leaderboard. I did not place very high, lol.
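For readers who want to see what entropy and splitting mean in practice, here is a minimal sketch of how a decision tree can score a candidate split with entropy and information gain. The labels below are toy values (1 = survived, 0 = did not survive), not the real Titanic data, and this is not the exact code used in the bootcamp.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy labels only: four survivors and four non-survivors.
parent = [1, 1, 1, 1, 0, 0, 0, 0]

split_a = [[1, 1, 1, 1], [0, 0, 0, 0]]  # separates the classes perfectly
split_b = [[1, 1, 0, 0], [1, 1, 0, 0]]  # each group is as mixed as the parent

print(f"parent entropy: {entropy(parent):.2f}")
print(f"gain of split A: {information_gain(parent, split_a):.2f}")
print(f"gain of split B: {information_gain(parent, split_b):.2f}")
```

A tree-building procedure would prefer split A, and a stopping criterion could halt further splitting once the best available gain is close to zero.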

Day 3

On the last day of the bootcamp, we discussed the pitfalls in machine learning, such as overfitting and underfitting, and understood the bias/variance tradeoff. I have read about this topic to the point of nausea in other settings, but this truly helped me to understand it. Seeing practical examples helped me put this in context.

What was interesting and new to me was discussing how to properly evaluate a model, as it is not always about the accuracy; sometimes (depending upon the problem and domain), it is about the precision or recall! We then spent a great deal of time on hyperparameter tuning, and then on how to deploy our machine learning model as a web service, which was way too cool.
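To illustrate why accuracy alone can mislead, here is a small sketch that computes accuracy, precision, and recall from scratch on a made-up, imbalanced example. The labels and predictions are hypothetical and are not results from the bootcamp.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of everything predicted positive, how much really was positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of everything that really was positive, how much the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced problem: only 2 of 10 cases are positive,
# and the model finds just one of them.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"accuracy:  {accuracy(tp, fp, fn, tn):.2f}")
print(f"precision: {precision(tp, fp):.2f}")
print(f"recall:    {recall(tp, fn):.2f}")
```

Here the model looks strong on accuracy while missing half of the positive cases, which matters a great deal in domains where a missed positive is costly.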

What I’ve Learned

I did not completely understand how to tune hyperparameters and how to properly evaluate the performance of a model before the bootcamp. Now I understand why this is necessary and how to carry out this task. We bridged the gap between data science and business value in this course, and that was the foundation going forward.

What I learned is that it is not always about the accuracy of a model, and that one should align the business needs with precision or recall, depending upon the domain and the problem one is looking to solve.

I have learned why it is important for the data scientist to ask questions, and not just questions in general, but the right questions, and how the most important tasks before building a model are data exploration, data discovery, and feature engineering. We need to understand the business impact and how this model will add value.

For me, this was paramount. Too many times do we focus on wanting cool models to say we are involved in machine learning rather than focusing on the business need.

I have learned how to use Microsoft tools to build and deploy a model as a web service. I found the ease and simplicity of this to be amazing and something I would like to continue to explore.

The Pros

  • The in-person class setting was helpful in order to understand and connect to the topics at hand. For those who have taken online bootcamps with varying success, you may also appreciate being able to interact with the instructor and other students.
  • The breadth of material covered was impressive. I appreciated that we covered the most important topics in machine learning and addressed common mistakes. We dedicated some of the day to hyperparameter tuning when a model is not performing optimally.
  • We addressed the proper mindset to have for data analysis. How to ask the right questions, and not be afraid to ask questions!
  • Raja and Arham have great chemistry as team members and are fantastic instructors.

The Cons

  • The condensed format was rather overwhelming. This material isn’t truly suited for a 3-day setting. We really only scratched the surface. This cannot be truly helped, but it was worth mentioning.
  • This course is not for those who are new to programming and/or data science. Although we did use Microsoft Azure for machine learning, there is an assumption that the student has some familiarity with programming and data science concepts. You will likely get more out of this course if you have some prior knowledge.

Conclusion

I highly recommend this bootcamp for those who would like to increase their knowledge in data science. This experience was valuable for me so that I can bridge the gap between theory and implementation. From this point on, more learning will be required, but this gave me the boost in the right direction. Cheers!


This review was originally published on Angela Baltes’ personal blog.
