
Kaggle competition

Data Science Dojo
Nathan Piccini
| January 23

What are Kaggle competitions? I didn’t know, so I looked it up. Start by reading what I learned, then find an active list of Kaggle competitions.

First of all, what’s Kaggle?

Until a few months ago, I didn’t know the answer to that question. If you don’t either, that’s okay; we’re going to answer it together. But first, you need a little background on this data science network.

Kaggle was founded in 2010 on the idea that data scientists need a place to come together and collaborate on projects. It has since grown into a network of more than 1,000,000 registered users and a safe place for data science learning, sharing, and competition.

Harnessing the human competitive spirit, Kaggle created a platform where organizations host data science competitions. These competitions have fueled new methodologies and techniques in data science and given organizations new insights from the data they provided.

Being the competitive person I am, I was first drawn in by the competition aspect, and it made me want to learn the intricacies of a Kaggle competition.

How Kaggle works

While combing through the Kaggle website and other informative articles, I found that Kaggle competitions follow three basic steps.

  1. Preparation: Each Kaggle competition has a host, and each host has to prepare and provide data. When providing data, the host has the opportunity to give additional information such as a description, evaluation method, timeline, and prize for winning.
    Figure: Preparation of a Kaggle competition, showing the details on the PUBG competition’s description page.
  2. Experimentation: By this point, you’ve had your morning coffee, you’ve read everything in the overview 500 times, and you’re ready to win 1st place. Now is the time to experiment, submit, and learn. There are three ways to upload your work:
    • Kaggle Kernels
    • Manual Uploads
    • Kaggle API

    If you don’t want anyone to know exactly what you’re doing, upload your experiments manually or through the Kaggle API. Public Kaggle Kernels let competitors share what they’ve accomplished and get feedback from their peers. Kernels will give you ideas for how to tackle the data, and I suggest going through some of the popular ones.

    Figure: Kaggle Kernels from the PUBG competition.

  3. Results: Every Kaggle competition has a public and a private leaderboard. Be warned: the two can look VERY different. The public leaderboard is scored on a small percentage of the test data, chosen by the host. Although it gives you a good idea of where you stand, it does not always reflect who will win and lose.

    The private leaderboard is what really matters. Revealed only at the end of the competition, it is scored on the rest of the test data, usually the larger share, and ultimately decides the winners and losers.
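To see why the two leaderboards can disagree, here is a small Python simulation. It is only a sketch under assumptions: the labels and the two submissions are randomly generated, and the 30% public split and the accuracy metric are hypothetical choices, since each host picks its own split and evaluation method. Two submissions of roughly equal quality can trade places between the public and private scores purely because of which rows landed in each subset.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split_public_private(n_rows, public_fraction, seed=0):
    """Randomly assign test-row indices to a public and a private subset.

    The host fixes this split once; competitors never learn which rows are which.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * public_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


def leaderboard_scores(y_true, y_pred, public_idx, private_idx):
    """Score one submission separately on the public and private subsets."""
    public = accuracy([y_true[i] for i in public_idx], [y_pred[i] for i in public_idx])
    private = accuracy([y_true[i] for i in private_idx], [y_pred[i] for i in private_idx])
    return public, private


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical hidden labels for a 1,000-row test set.
    y_true = [rng.randint(0, 1) for _ in range(1000)]
    # Two hypothetical submissions, each correct on roughly 80% of rows.
    sub_a = [y if rng.random() < 0.8 else 1 - y for y in y_true]
    sub_b = [y if rng.random() < 0.8 else 1 - y for y in y_true]
    public_idx, private_idx = split_public_private(len(y_true), public_fraction=0.3)
    for name, sub in [("A", sub_a), ("B", sub_b)]:
        pub, priv = leaderboard_scores(y_true, sub, public_idx, private_idx)
        print(f"Submission {name}: public={pub:.3f} private={priv:.3f}")
```

Because the public subset is small, its score is noisier than the private one, which is why chasing small public-leaderboard gains can backfire.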

Figure: The private and public leaderboards of a Kaggle competition.

If you would like to dive deeper into the different competition types, formats, and datasets Kaggle offers, take a look at Kaggle’s Help and Documentation.

Active Kaggle competitions

[Updated May 6, 2019]

Kaggle competitions accept experiments only for a limited time. This list does not reflect how much time is left to enter or how difficult each posted dataset is. One way to gauge difficulty is to look at the prize.

Typically, the larger the prize, the more difficult or advanced the problem. You can also look at the competition type. Kaggle’s four categories, with Kaggle’s own descriptions, are below.

  1. Featured: “These are full-scale machine learning challenges which pose difficult, generally commercially-purposed prediction problems.”
  2. Research: “Research competitions feature problems which are more experimental than featured competition problems.”
  3. Getting Started: “These are semi-permanent competitions that are meant to be used by new users just getting their foot in the door in the field of machine learning.”
  4. Playground: “These are competitions which often provide relatively simple machine learning tasks, and are similarly targeted at newcomers or Kagglers interested in practicing a new type of problem in a lower-stakes setting.”

I will try my best to keep this list as up-to-date as possible. Unfortunately, I’m not spending all my time on Kaggle’s website. So if you see something has ended, or a new competition has been added, please leave a comment below. Thanks and have fun!

Learn more about Kaggle

Data Science Dojo
Phuc Duong
| November 17

You can build a data science portfolio through Kaggle competitions. Kaggle has positioned itself as the premier platform for learning data science.

Kaggle is a crowdsourcing platform where companies post real-world data science problems in the hope of solving them. Kaggle opens these competitions to the public and ranks participants against one another. When large companies need help with their data science challenges, they turn to Kaggle and its community.

The famous $1 million Netflix Prize (2006 to 2009) was run by Netflix itself, before Kaggle existed, but it followed the same crowdsourced competition model. Companies that have used Kaggle competitions to address business problems include:

Allstate, Bosch, State Farm, Red Hat, Facebook, Expedia, Home Depot, Yelp, Airbnb, Walmart, and Liberty Mutual.


The future of hiring in data science

To be hired as an artist or an architect, you need to present a portfolio that showcases years of your work. Programmers use a GitHub or Stack Overflow account to showcase their contributions. Kaggle is positioning itself to fill the same role for every data profession, which adds another hoop to the litmus test for being hired as a data scientist.

Kaggle has spent the last half-decade positioning itself as the premier platform for hiring, recruiting, and screening data science talent. Companies and recruiters get a transparent record of your performance: a paper trail of your successes and failures and, most importantly, your growth, all tracked in a quantifiable manner.

Employers want to see whether a candidate has been battle-tested in data science through these competitions.

For you as an individual, this is a chance to compete on Kaggle against people like you from around the world, and to prove to employers and recruiters that you have hands-on experience under your belt and are worthy of the title of data scientist.

How do I compete? Where do I start?

The best way to start is to dive right in. Be warned that you are competing against people from all over the world, so your first submission scores will likely be demoralizing. Don’t give up, though: each failed attempt will shape you into a better data scientist. Failing in Kaggle competitions is not only acceptable but desirable, because employers want to see your growth and, better yet, your potential to overcome challenges.

  1. Create an account
  2. Pick a competition
  3. Understand the data science problem
  4. Build models and continually tune to improve
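Step 4 boils down to a loop: score a trivial baseline, then check whether a real model beats it on data it has not seen. Here is a minimal Python sketch of that loop using k-fold cross-validation. Everything in it is hypothetical: the toy data, the one-feature threshold model, and the function names are made up for illustration. In practice you would likely use a library such as scikit-learn and shuffle the rows before splitting.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous, nearly equal folds.

    Real competition data should be shuffled before this step.
    """
    fold_size, remainder = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def fit_majority(train_x, train_y):
    """Baseline: always predict the most common training label (ignores train_x)."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


def fit_threshold(train_x, train_y):
    """Predict 1 when the feature is >= a cutoff learned from the training fold.

    Tries every observed feature value as the cutoff and keeps the first one
    with the highest training accuracy.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(train_x)):
        correct = sum(int(x >= cutoff) == y for x, y in zip(train_x, train_y))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return lambda x: int(x >= best_cutoff)


def cross_val_accuracy(xs, ys, fit, k=5):
    """Average held-out accuracy of the model built by `fit` over k folds."""
    fold_scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        correct = sum(model(xs[i]) == ys[i] for i in fold)
        fold_scores.append(correct / len(fold))
    return sum(fold_scores) / len(fold_scores)


if __name__ == "__main__":
    # Hypothetical toy data: one numeric feature and a binary target.
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    for name, fit in [("majority baseline", fit_majority), ("threshold model", fit_threshold)]:
        print(f"{name}: CV accuracy = {cross_val_accuracy(xs, ys, fit):.2f}")
```

On this toy data the baseline averages 0.10 and the threshold model 0.90. The models themselves are not the point; the habit is, namely comparing every new idea against the same held-out folds before spending a submission on it.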

We have posted a few tutorial videos below to get you started.

How do you submit to Kaggle?
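Besides uploading a file by hand on the competition page, you can submit from a script with Kaggle’s official command-line tool (installed with `pip install kaggle` and authenticated with an API token saved to `~/.kaggle/kaggle.json`). The Python sketch below writes a submission CSV and builds the `kaggle competitions submit` call. The competition slug `example-competition`, the `id`/`target` columns, and the predictions are hypothetical; every competition defines its own format in its sample submission file. By default the sketch only prints the command (a dry run).

```python
import csv
import subprocess
from pathlib import Path


def write_submission(path, header, rows):
    """Write predictions to a CSV file in the layout the competition expects."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def build_submit_command(competition, file_path, message):
    """Build the CLI call: kaggle competitions submit <competition> -f <file> -m <message>."""
    return ["kaggle", "competitions", "submit", competition,
            "-f", str(file_path), "-m", message]


def submit(competition, file_path, message, dry_run=True):
    """Print the command in dry-run mode; otherwise invoke the Kaggle CLI."""
    cmd = build_submit_command(competition, file_path, message)
    if dry_run:
        print("Would run:", " ".join(cmd))
    else:
        # Needs the kaggle package installed and a valid API token.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    path = Path("submission.csv")
    # Hypothetical columns and predictions; copy the real layout from the
    # competition's sample submission file.
    write_submission(path, ["id", "target"], [[1, 0], [2, 1], [3, 1]])
    submit("example-competition", path, "first baseline")
```

Writing a short message with each submission pays off later, when you are trying to match a leaderboard score to the experiment that produced it.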

 

How do you build an initial model for a Kaggle competition in R?

 

How do you improve your model further by building a predictive model for missing values in R?
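The idea behind predictive imputation is to treat a column with gaps as a target: fit a model on the rows where it is known, then predict it for the rows where it is missing. The video works in R; below is a minimal Python sketch of the same idea using a one-variable least-squares line. The column names `x` and `y` and the toy rows are hypothetical, and a real competition would usually call for more predictors and a stronger model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Raises ZeroDivisionError if all xs are identical (no slope to estimate).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_by_regression(rows, predictor, missing_col):
    """Return copies of `rows` with None values in `missing_col` predicted from
    `predictor`, using a line fitted on the rows where `missing_col` is known.
    The predictor column is assumed to have no gaps."""
    complete = [r for r in rows if r[missing_col] is not None]
    intercept, slope = fit_line([r[predictor] for r in complete],
                                [r[missing_col] for r in complete])
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row[missing_col] is None:
            row[missing_col] = intercept + slope * row[predictor]
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Hypothetical toy rows: "y" is missing for the last one.
    rows = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": 3, "y": 7}, {"x": 4, "y": None}]
    for row in impute_by_regression(rows, predictor="x", missing_col="y"):
        print(row)
```

Compared with filling every gap with the column mean, this keeps the relationship between the two columns. Here the known rows lie exactly on y = 2x + 1, so the missing value for x = 4 is filled with 9.0.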

 

Competing in Kaggle using Azure Machine Learning Studio

 

Kaggle and Data Science Dojo

Data Science Dojo hosts a Virtual Data Science and Data Engineering bootcamp to expose people to the entire breadth of data science. Bootcamp attendees are required to participate in a Kaggle capstone project that spans the length of the course.

The goal of the project is for each attendee to take what they learned during the day and apply it to their Kaggle model. At the end of the bootcamp, the top 3 performers receive a prize.

 
