Kaggle

Arham Noman
| September 28, 2022

For a 21st-century professional, proven analytical skills are increasingly important. Companies all over the world have started to push their data scientists to participate in leading data science competitions, and businesses now expect employees in every department to develop analytical skills.

One of the best ways to prove that you have a strong grip on analytics and data science is to take part in reputable competitions that test these skills and show your employer that you have the required skill set.

There are many events these days for data science professionals, so it can get overwhelming trying to figure out which ones are worth your time. If you are not sure where to begin, or which ones to take part in, here are a few notable ones to help you get started. 

 

Data Science Competitions
Participating in data science competitions – Data Science Dojo

 

1. Kaggle 

Kaggle is the most popular platform for practicing data science skills. It hosts many popular datasets and regularly runs competitions in which anyone can build machine learning models on a given dataset and compete against others working on the same data.

You can learn more about Kaggle competitions on our blog here: Insightful Kaggle competitions and data science portfolios | Data Science Dojo 

 

 

2. IBM Call for Code 

The IBM Call for Code competition asks for contributions across several areas to solve real-world challenges. In 2022, there are four areas where you can get involved and build solutions: the Global Challenge, open source projects, racial justice, and deployments.

You can find out more on the Call for Code page here: Call for Code | Tech for Good | IBM Developer

 

3. MachineHack

MachineHack is a community that hosts competitions and hackathons for data science and AI enthusiasts. A wide variety of challenges is available across the data science pipeline, from machine learning to data visualization. You can also win cash prizes in some of the challenges.

 

4. DataCamp

DataCamp runs weekly competitions on its website, and each event has a cash prize. You can submit your own solutions and vote on the best solutions from other participants.

 

5. DrivenData

DrivenData provides a platform for data scientists who want to make a social impact with their work. The challenges on the platform focus on solving social issues through data science.

These challenges include things like predicting public health risks at restaurants, identifying endangered species in images, and matching students to schools where they are likely to succeed. The winning code earns a prize and is published under an open-source license for others to benefit from as well.

 

Are you excited to participate in data science competitions?

All of the above-mentioned data science events give you hands-on practice with data science skills. They offer learners a platform for improving their problem-solving skills and proving their abilities in a competitive market.

Not only does participating in these competitions help you stand out, but it also lets you brainstorm innovative ideas for the future.

Nathan Piccini
| January 23, 2019

What’s a Kaggle Competition? I didn’t know, so I looked it up. Get started by reading what I learned, and find an active list of competitions. 

First of all, what’s Kaggle?

Until a few months ago I didn’t know the answer to that question. If you don’t either, that’s okay; we’re going to answer it together. But first, you need to know a little background information about this data science network.

Kaggle was founded in 2010 with the idea that data scientists need a place to come together and collaborate on projects. This has transformed into a network with more than 1,000,000 registered users, and has created a safe place for data science learning, sharing, and competition.

Using the human competitive spirit, Kaggle created a platform for organizations to host competitions that have fueled new methodology and techniques in data science, and given organizations new insights from the data they provided.

Read more:

Kick-off with Kaggle competitions to learn data science skills

Being the competitive person I am, the competition aspect is what originally caught my eye, and gave me the desire to learn about the intricacies of a Kaggle Competition.

How a Kaggle competition works

While combing through the Kaggle website and other informative articles, I found there are three basic steps in Kaggle Competitions.

  1. Preparation: Each Kaggle competition has a host, and each host has to prepare and provide data. When providing data, the host has the opportunity to give additional information such as a description, evaluation method, timeline, and prize for winning.

pubg kaggle competition description

  2. Experimentation: At this time, you’ve had your morning coffee, you’ve read all the information in the overview 500 times, and you’re ready to win 1st place. Now is the time to experiment, submit, and learn. There are three ways to upload your work:

  • Kaggle Kernels
  • Manual Uploads
  • Kaggle API

If you don’t want anyone to really know what you’re doing, you should upload your experiments manually or by using the Kaggle API. Kaggle Kernels are a way for competitors to share what they’ve accomplished and get feedback from their peers. Kernels will give you ideas as to how to conquer the data, and I suggest you go through some of the popular ones.

kaggle kernels from pubg competitions

 

  3. Results: In every Kaggle competition, there are public and private leaderboards. Be warned, the leaderboards are VERY different. The public leaderboard is based on a small percentage of the test data decided by the host. Although it gives you a good idea, it does not always reflect who will win and lose. The private leaderboard is what really matters. Not calculated until the end of the competition, this leaderboard is based on a larger proportion of data and, ultimately, decides the winners and losers.

public leaderboard for kaggle
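
To make the public/private distinction in the Results step more concrete, here is a minimal sketch in Python. It is an illustration only, not how Kaggle actually scores submissions: the 20% public fraction, the accuracy metric, the random split, and the made-up labels and predictions are all assumptions, and the helper names `accuracy` and `split_leaderboards` are hypothetical.

```python
import random


def accuracy(y_true, y_pred, indices):
    """Fraction of correct predictions on the given subset of test rows."""
    correct = sum(1 for i in indices if y_true[i] == y_pred[i])
    return correct / len(indices)


def split_leaderboards(n_rows, public_fraction, seed=0):
    """Randomly assign each test row to the public or the private subset."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_public = int(n_rows * public_fraction)
    return indices[:n_public], indices[n_public:]


# Hypothetical hidden test labels and one competitor's predictions (made-up data).
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(1000)]
y_pred = [label if rng.random() < 0.8 else 1 - label for label in y_true]

# Assumption: the host scores 20% of the rows publicly and the rest privately.
public_idx, private_idx = split_leaderboards(len(y_true), public_fraction=0.2)
public_score = accuracy(y_true, y_pred, public_idx)
private_score = accuracy(y_true, y_pred, private_idx)
print(f"public: {public_score:.3f}  private: {private_score:.3f}")
```

The same set of predictions generally gets slightly different scores on the two subsets, which is why a model tuned too closely to the public leaderboard can drop in the final ranking.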

If you would like to dive deep into the different competition types, formats, and datasets offered by Kaggle, take a look at Kaggle’s Help and Documentation.

Active Kaggle competitions

[Updated May 6, 2019]

Kaggle competitions give you a limited amount of time to enter your experiments. This list does not represent the amount of time left to enter or the level of difficulty associated with posted datasets. One way to gauge the level of difficulty is to look at the prize. Typically, the larger the prize, the more difficult or advanced the problem is. You can also look at the type of competition. The four categories and Kaggle’s descriptions of them are listed below.

  1. Featured: “These are full-scale machine learning challenges which pose difficult, generally commercially-purposed prediction problems.”
  2. Research: “Research competitions feature problems which are more experimental than featured competition problems.”
  3. Getting Started: “These are semi-permanent competitions that are meant to be used by new users just getting their foot in the door in the field of machine learning.”
  4. Playground: “These are competitions which often provide relatively simple machine learning tasks, and are similarly targeted at newcomers or Kagglers interested in practicing a new type of problem in a lower-stakes setting.”

I will try my best to keep this list as up-to-date as possible. Unfortunately, I’m not spending all my time on Kaggle’s website. So if you see something has ended, or a new competition has been added, please leave a comment below. Thanks and have fun!

Know more about Kaggle competitions

Rahim Rasool
| July 1, 2019

Kaggle Days Dubai is an event to improve your data science skillset. Here’s what you can expect to learn from the grandmasters.

Anyone interested in analytics or machine learning is probably aware of Kaggle. Kaggle is the world’s largest community of data scientists, and it lets companies host prize-money competitions for data scientists around the world. This has also made it the largest online competition platform. More recently, Kaggle has started to organize offline meetups around the globe.

One such initiative is Kaggle Days. So far, four Kaggle Days events have been organized in cities around the world, the most recent being in Dubai. The format is a 2-day event: presentations, practical workshops, and brainstorming sessions on the first day, followed by an offline competition on the second.

As a machine learning enthusiast with intermediate experience in the field, I found participating in a Kaggle-hosted competition and teaming up with a Kaggle Grandmaster to compete against other grandmasters enjoyable in its own right. I couldn’t reach the top ranks in the competition, but competing and networking with the dozens of grandmasters and other enthusiasts present during the 2-day event boosted my learning and abilities.

I wanted to make the best use of this opportunity, learn as much as I could, and ask the grandmasters the right questions to draw on their wisdom and learn the best ways to approach any data science problem. It was heartwarming to discover how supportive they were as they shared tricks and advice for reaching the top positions in data science competitions and improving the performance of any machine learning project. In this blog, I’d like to share the insights I gathered during my conversations and the noteworthy points I recorded during their presentations.

Strengthen your basic knowledge of Kaggle

My primary mentor during the offline competition was Yauhen Babakhin. Yauhen is a data scientist at H2O.ai and has worked in a range of domains including e-commerce, gaming, and banking, specializing in NLP-related problems. He is an inspiring personality and one of the youngest Kaggle Grandmasters, and fortunately he was the person I got to network with the most. His profile dispelled my misconception that only someone with a doctoral degree can achieve the prestige of being a Grandmaster.

During our conversations, the most significant advice from Yauhen was to strengthen our basic knowledge and build an intuition for various machine learning concepts and algorithms. You do not need to go extremely deep into these concepts or be especially knowledgeable to begin with. As he said, “start learning a few important learning models but get to know how they work!” It is ideal to start with the basics and extend your knowledge along the way by building experience through competitions, especially the ones hosted on Kaggle. For most questions, Yauhen suggests, you just need to know what to search for on Google. That skill alone will get you through most problems, even if you have limited experience relative to your competitors.

Kaggle competition day 2

 

Furthermore, Yauhen emphasized how Kaggle played a leading role in sharpening his skills. He stressed that challenges pushed him to perform better and learn more. It was such challenges that prompted him to learn beyond his current knowledge and explore areas outside his specialization, such as computer vision, said the winner of the $100,000 TGS Salt identification challenge. He suggested that we use the same approach to accelerate our career growth.

Through this conversation, I learned the importance of going broad. Yauhen encouraged us to pick competitions that cover a broad range of problems and various aspects of data science, but he also suggested keeping that breadth aligned with our career pursuits and asking ourselves whether we really need to target something we are never going to use. Lastly, the Grandmaster, who is in his late 20s, wanted us to practice with deep learning models, as they allow us to target a broad set of problems, and to study the best approaches used by previous winners and combine them in our projects or competition submissions. These approaches can be found in blogs, kernels, and forum discussions.

Remain persistent

My next detailed interaction was with Abhishek Thakur. The conversation prompted me to ask as many questions as I could, as every suggestion Abhishek gave seemed wise and encouraging. One of the rare holders of two Kaggle Grandmaster titles, in competitions and in discussions, Abhishek is the chief data scientist at boost.ai and once attained the 3rd rank in global competitions on Kaggle. What made his profile even more convincing was his accelerated growth from novice to grandmaster within a year and a half. He started his machine learning career from scratch, and he started it on Kaggle itself. Having begun at the lowest rank in competitions, Abhishek was adamant that Kaggle can be the one platform to rely on to catapult your growth in such a short period of time.

Abhishek speaking at Kaggle

 

However, as Abhishek repeatedly said, it all required persistence. Even after being placed in the bottom ranks initially, he carried on, and he described persistence as the key to his success. When asked about the tools that led to the gold medals in his recent competitions, Thakur put heavy emphasis on feature engineering. He insisted that this step is the most important of all in distinguishing the winner. Similarly, he suggested that a thorough exploratory data analysis can help you find those magical features that produce winning results.

Like other Grandmasters who have attained massive success in this domain, Abhishek also emphasized improving your personal profile through Kaggle. Not only does it offer a distinct and fast-paced learning experience, as it did for all the grandmasters at the event, but it is also recognized across industries by major employers who value these rankings. Abhishek explained how it brought him numerous lucrative job offers over time.

Start instantly with competitions

During the first day, I attended Pavel Pleskov’s workshop on ‘Building The Ultimate Binary Classification Pipeline’. Based in Russia, Pavel currently works for an NLP startup, PointAPI, and was once ranked number 2 among Kagglers globally. The workshop was fantastic, but the conversations during and after it intrigued me the most, as they mostly consisted of tips for beginners.

Having quit his profitable business to compete on Kaggle, Pavel insisted on the ‘do what you love’ strategy, as it leads to more life satisfaction and profit. He told us how he started with some of the most popular online courses on machine learning but found them lacking in practical skills and homework, a gap he filled using Kaggle. He strongly recommended that beginners not put off Kaggle contests or wait until they finish their courses, but start right away. According to him, practical experience on Kaggle is more important than any course assignment.

Another noteworthy tip from Pavel concerned how to win such competitions. Many students approach Kaggle as an academic problem, start creating fancy architectures, and ultimately do not score well, whereas Pavel approaches a problem with a business mindset. He increased his probability of success by leveraging resources, such as bringing people onto his team who had hardware like a GPU, or merging his team with another to improve the overall score.

Kaggle competition day 2

When asked how to balance time spent building theoretical knowledge against time spent generating new ideas, Pavel advised looking at forum threads on Kaggle. They can show you how much theoretical knowledge you are missing compared with your competitors. Pavel is an avid user of LightGBM and CatBoost models, which he says have given him superior rankings in competitions. He also suggested the fast.ai library, which, despite receiving many critical reviews, he has found flexible and useful and usually keeps in consideration.

Hunt for ideas and rework them

Because of the limited time during the 2-day event, I heard less from another young grandmaster from Russia, Pavel Ostyakov, who coincidentally shares a first name with his fellow Russian grandmaster. Remarkably, Pavel was still an undergraduate student at the time and had been working for Yandex and Samsung AI for the past couple of years.


He brought a distinct set of advice that can prove extremely useful when you are targeting gold in competitions. He emphasized writing clean code that can be reused in the future and allows easy collaboration with teammates, a practice that is usually overlooked and later causes trouble for participants. He also insisted on reading as many forums on Kaggle as you can, not just those related to your current competition but those from other competitions as well, since most of them are similar. Apart from searching for workable solutions, Pavel suggested also looking for ideas that failed. As he recommended, you should try using (and reworking) those failed ideas, as there is a chance they may work.

Pavel also pointed out that reading research papers and implementing their solutions can increase your chances of surpassing other competitors. Throughout, he stressed the mindset that anyone can achieve gold in a competition, even with limited experience relative to others.

Experiment with diverse strategies

Other noteworthy tips and ideas that I collected while mingling with grandmasters and attending their presentations included those from Gilberto Titericz (Giba), the grandmaster from Brazil with 45 gold medals! When I spoke with Giba personally, he repeatedly used the keyword ‘experiment’ and insisted that it is always important to experiment with new strategies, methods, and parameters. This is one simple, although tedious, way to learn quickly and get great results.

Training session of Kaggle

Giba also proposed that, to attain top performance, you must build models using different viewpoints of the data. This diversity can come from feature engineering, from varying the training algorithms, or from using different transformations, so you should explore all the possibilities. Furthermore, Giba suggested that fitting a model with default hyperparameters is good enough to start a competition and establish a benchmark score to improve on. Regarding teaming up, he repeated that diversity is the key here as well, and that choosing someone who thinks like you is not a good move.

A great piece of advice from Giba was to blend models. Combining models can improve the performance of the final solution, especially if the models’ predictions have low correlation with one another. A blend can be something as simple as a weighted average. For instance, non-linear models like Gradient Boosting Machines blend very well with neural-network-based models.

Blending Models
Blending models suggested by Giba
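
As a rough illustration of the weighted-average blend described above, here is a minimal sketch in plain Python. The two prediction lists are made-up numbers standing in for a gradient boosting model and a neural network, the equal weights are an arbitrary assumption, and the helpers `pearson`, `blend`, and `mse` are hypothetical names written for this example, not anything Giba presented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of predictions."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions from two hypothetical models.
y_true = [1.0, 0.0, 1.0, 1.0, 0.0]
gbm_pred = [0.9, 0.3, 0.6, 0.8, 0.1]  # stand-in for a gradient boosting model
nn_pred = [0.7, 0.1, 0.9, 0.6, 0.4]   # stand-in for a neural network

# Lower correlation between models usually means more to gain from blending.
print("correlation:", round(pearson(gbm_pred, nn_pred), 3))

blended = blend([gbm_pred, nn_pred], weights=[0.5, 0.5])
for name, preds in [("gbm", gbm_pred), ("nn", nn_pred), ("blend", blended)]:
    print(name, "mse:", round(mse(y_true, preds), 4))
```

In practice the blend weights would be chosen on a validation set rather than fixed by hand, and a blend is not guaranteed to beat its best component.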

Conclusion

Considering the key takeaways from these grandmasters’ suggestions and observing the way they competed during the offline competition, I noted that beginners in data science should put their effort into trying as many different methodologies as they can. Moreover, the recommendations above stress the significance of taking part in online competitions no matter how much knowledge or experience you have.

I also noted that most of the experienced data scientists were fond of ensemble techniques, and that one of the most prominent methods they used was creating new features out of existing ones. In fact, this is what the winners of the offline competition cited as their strategy for success. In conclusion, meetups like these let you interact with the top minds in the field and gain a great deal within a short period of time, as I fortunately did.

Data Science Dojo
| October 21, 2019

In 2019, Data Science Dojo sponsored Kaggle Days Tokyo, taking place from December 11 to 12.

Kaggle Days will give Data Science Dojo a platform to continue giving back to the data science community.

kaggle days tokyo social announcement
Kaggle Days Tokyo Registrations Open (Source)

Kaggle Days is a conference created by Kaggle and LogicAI for Kagglers to meet offline. It’s the “first global series of offline events for seasoned data scientists and Kagglers” as written on the Kaggle Days website.

These days take place all over the world, including current and past events in China, Dubai, San Francisco, Paris, Tokyo, and Warsaw. Attendees meet Grandmasters, win prizes, and compete in offline events.

They also have the opportunity to learn from seasoned professionals who are there to help grow the community.

Raja Iqbal, Chief Data Scientist and CEO at Data Science Dojo, is one of the seasoned professionals looking to help the community grow.

“We have 10 Meetup groups spread across the globe, but we’ve never been as far east as Tokyo. The closest we get is Singapore.” Raja said while counting on his fingers. “I just can’t wait to meet more people from a different part of the world who are excited to learn data science.”

What to expect at Kaggle Days Tokyo

In Tokyo, attendees can expect to network, learn, compete, and earn prizes, as at many conferences. Kaggle Grandmasters, Masters, and data science experts will be there to give presentations, talk shop, and network with everyone in attendance.

Data Science Dojo will be there to give a 90-minute workshop as well as network, hire, and learn from top Kagglers. The topic of DSD’s workshop has been narrowed down to two possibilities:

  • The Art of Building Machine Learning Models for Large Scale Machine Learning
  • Feature Engineering for Real-World Machine Learning Problems

Kaggle CTO, Ben Hamner, is the Keynote Speaker giving a talk titled Leveling-up Kaggle Competitions. Other talks from presenters include:

  • Computer Vision with Keras – Dimitris Katsios, ML Engineer at LPIXEL
  • Joining NN Competitions (for beginners) – Tomohiro Takesako, Competitions Master
  • My Journey to Grandmaster – Jin Zhan, Competitions Grandmaster
  • Intro to BigQuery ML for Kagglers – Polong Lin, Developer Advocate at Google

Two Kaggle Competitions team members will also be giving talks: Julia Elliott (Competitions Team Lead) and Walter Reade (Data Scientist).

Presentations are tentative and subject to change. This will be updated when the full agenda has been announced.

About Data Science Dojo

Data Science Dojo offers a top-rated, 5-day, in-person Data Science Bootcamp around the world. During the course, students learn everything from predictive analytics and ensemble methods to recommender systems and the fundamentals of big data engineering.

Raja and his team of instructors have trained more than 4,000 individuals from nearly 1,000 different companies. Attendees come from diverse backgrounds, including software development, management consulting, medicine, education, project management, public service, finance, not-for-profit, mining, oil and gas, and more.

Helpful links

Data Science Dojo meetup groups

Nathan Piccini
| December 27, 2019

Data Science Dojo sponsored Kaggle Days Tokyo. Here’s an overview of what Kaggle Days are and what to expect.

Overview

Kaggle is an online learning platform for data science and machine learning. It uses competitions to help its users (called Kagglers) practice and grow their data science skillset with publicly available datasets.

Kaggle Days are events that take place around the world. They started as a partnership between LogicAI and Kaggle to bring Kagglers together offline. Competitions, seminars, workshops, and networking opportunities are available to participants. These events take place as one-off local events (Meetups) as well as multiday global events (conferences).

Kaggle Days Tokyo – Agenda

The global event in Tokyo is taking place this December 11-12. Registration closed within days of opening, which shows how popular these events are among participants. The agenda is jam-packed with exciting talks and tutorials from Kaggle Grandmasters and data science professionals, and I’d like to highlight a few.

kaggle days tokyo brochure
Kaggle Days Tokyo – Schedule

Raja Iqbal – Tutorial on model validation and parameter tuning

Raja Iqbal is the CEO, Chief Data Scientist, and Lead Instructor at Data Science Dojo. He has an MS from Stanford and a Ph.D. from Tulane University. He spent more than 6 years at Microsoft Bing and Bing Ads working on various data science and machine learning research projects. Below is a description, given by Raja, of his workshop:

“Cross-validation is a popular technique for model validation and parameter tuning. In this tutorial, we will discuss other model validation and parameter techniques in scenarios where k-fold cross validation may not be the best choice. We will also discuss some parametric and non-parametric statistical tests for comparing models.”
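
For readers who have not met k-fold cross-validation, the technique named in the description above, the following is a minimal Python sketch of the general idea. It is a generic illustration and not material from the workshop: the helpers `k_fold_indices` and `cross_validate`, the toy data, and the mean-predictor "model" are all hypothetical, and the folds are contiguous and unshuffled for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_validate(xs, ys, k, fit, score):
    """Average the validation score of a model refit on each of the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Toy 'model': ignore the inputs and always predict the training mean."""
    return sum(train_y) / len(train_y)


def mean_absolute_error(model, val_x, val_y):
    """Score the toy model on a validation fold (lower is better)."""
    return sum(abs(y - model) for y in val_y) / len(val_y)


# Made-up data: ten samples with a simple linear target.
xs = list(range(10))
ys = [2.0 * x for x in xs]

mean_score = cross_validate(xs, ys, k=3, fit=fit_mean, score=mean_absolute_error)
print("mean validation MAE over 3 folds:", round(mean_score, 3))
```

Each sample is used for validation exactly once. Ordered data like this toy example is one situation where plain contiguous folds can give a misleading estimate, which hints at why alternatives to basic k-fold are sometimes preferred.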

Why should you attend? 

Modern machine learning is about gathering the right data, feature engineering, validation, and parameter tuning. Not understanding the concepts or using the techniques correctly renders machine learning useless.

Date: 12/11/19

Time: 10:15 am – 11:45 am

Location: 27F Hanabi Room

Jin Zhan – My journey to grandmaster: Success and failure

Becoming a Kaggle Grandmaster (GM) is no small accomplishment. It takes years of practice to obtain this impressive title. Jin Zhan has multiple years of experience in data science and machine learning, as well as Hadoop. Currently, Zhan is a Data Scientist at Fast Retailing, where he focuses on demand forecasting, recommender systems, and customer comment analysis.

Why should you attend?

The original reason I chose this out of the bunch was that Jin is going to talk about his failures before becoming a grandmaster. Talking about our failures is often difficult, but we can learn more from them than from our successes. After (admittedly) combing through his LinkedIn profile, I found Zhan to be the perfect picture of success on Kaggle. His experience doesn’t come from one place and his education comes from multiple sources. Besides, Zhan’s a Grandmaster. What other reason to attend do you need?

Date: 12/11/19

Time: 4:35 pm – 4:45 pm

Location: 27F Matsuri Room

Kaggle competition

During a Kaggle competition, typically the only help or mentoring you receive is from your teammates or through Kaggle Kernels. At the competition in Tokyo (as well as the other global events) mentors will be available to help you along the way.

The mentors for the competition in Tokyo include:

  • Ryuji Sakata – Kaggle Grandmaster and Data Scientist/Researcher at Panasonic Corporation
  • Walter Reade – Data Scientist on Kaggle Competitions Team
  • Dimitry Gordeev – Kaggle Grandmaster and Data Scientist at UNIQA
  • Pawel Jankiewicz – Kaggle Grandmaster and Owner/Founder at LogicAI
  • Jin Zhan – Kaggle Grandmaster and Data Scientist at Fast Retailing

You should feel compelled to pick their brains as much as you can. All of these people are successful and established data scientists with extensive knowledge of Kaggle competitions. Get as much out of them as you can.

Date: 12/12/19

Time: 10:30 am – 6:30 pm

If you’re a Kaggler who missed out on joining a global Kaggle Days event, keep an eye on their schedule. You can also join a local event and get to know your local Kagglers!
