
data science skills

What are Data Scientists? What job skills should they possess? Learn about the essential data scientist skills and their roles.

How do we distinguish a genuine data scientist from a dressed-up business analyst, BI, or other related roles?

Truth be told, the industry does not have a standard definition of a data scientist. You have probably heard jokes like “A data scientist is a data analyst living in Silicon Valley”. Just for fun, take a look at the below cartoon that demonstrates this.

Finding an “effective” data scientist is difficult. Finding people to play the role of a data scientist can be equally difficult. Note the use of “effective” here. I use this word to highlight the fact that there could be people who possess some of these data science skills yet may not be the best fit for a data science role. The irony is that even people looking to hire data scientists might not fully understand data science. There are still job advertisements in the market that describe traditional data analyst and business analyst roles while labeling them “Data Scientist” positions.

Instead of giving a list of data science skills with bullet points, I will highlight the differences between some of the data-related roles.

Consider the following scenario:

Shop-Mart and Bulk-Mart are two competitors in the retail setting. Someone high up in the management chain asks this question: “How many Shop-Mart customers also go to Bulk-Mart?” Replace Shop-Mart and Bulk-Mart with Walmart, Target, Safeway, or any retail outlets that you know of. The question might be of interest to the management of one of these stores or even a third party. The third party could possibly be a market research or consumer behavior company interested in gathering actionable insights about consumer behavior.

How professionals in different data-related roles will approach the problem:

Traditional BI/Reporting professional:

The BI professional generates reports from structured data using SQL and some kind of reporting services (SSRS for example) and sends the data back to management. Management asks more questions based on the data that was sent, and the cycle continues. Insights about the data are most likely not included in the reports. A person in this role will be experienced mostly in database-related skills.
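
As a rough illustration of the kind of SQL report described above, here is a minimal sketch that answers the Shop-Mart/Bulk-Mart question against an in-memory SQLite database. The `visits` table, its columns, and the customer IDs are all hypothetical; a real reporting setup would query the company's own warehouse.

```python
import sqlite3

# Hypothetical data: one row per store visit (table and column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (customer_id TEXT, store TEXT);
INSERT INTO visits VALUES
  ('c001', 'Shop-Mart'), ('c002', 'Shop-Mart'), ('c003', 'Shop-Mart'),
  ('c003', 'Bulk-Mart'), ('c004', 'Bulk-Mart'), ('c002', 'Bulk-Mart'),
  ('c002', 'Shop-Mart');
""")

# INTERSECT keeps only the distinct customer IDs that appear in both result sets.
query = """
SELECT COUNT(*) FROM (
    SELECT customer_id FROM visits WHERE store = 'Shop-Mart'
    INTERSECT
    SELECT customer_id FROM visits WHERE store = 'Bulk-Mart'
)
"""
shared_customers = conn.execute(query).fetchone()[0]
print(f"Customers who visit both stores: {shared_customers}")
conn.close()
```

The report stops at the count, which is exactly the limitation the paragraph points out: the number goes back to management without any interpretation.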

Data analyst:

In addition to doing what a BI professional does, a data analyst will also keep other factors like seasonality, segmentation, and visualization in mind. What if certain trends in shopping behavior are tied to seasonality?

What if the trends are different across gender, demographics, geography, or product category? A data analyst will slice and dice the data to understand and annotate the report. Aside from database skills, a data analyst will have an understanding of some of the common visualization tools.
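
To make "slice and dice" a little more concrete, the sketch below computes the share of customers who shop at both stores, broken down by season. The records and the `overlap_rate_by_segment` helper are made up for illustration; the same pattern would apply to gender, geography, or product category.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, season, shops_at_both_stores)
records = [
    ("c001", "winter", True),
    ("c002", "winter", False),
    ("c003", "winter", True),
    ("c004", "summer", False),
    ("c005", "summer", False),
    ("c006", "summer", True),
    ("c007", "summer", False),
]

def overlap_rate_by_segment(rows):
    """Return, per segment, the fraction of customers who shop at both stores."""
    totals = defaultdict(int)
    overlaps = defaultdict(int)
    for _, segment, shops_at_both in rows:
        totals[segment] += 1
        overlaps[segment] += int(shops_at_both)
    return {segment: overlaps[segment] / totals[segment] for segment in totals}

rates = overlap_rate_by_segment(records)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.0%} of customers shop at both stores")
```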

Business analyst:

A business analyst possesses the skills of a BI professional and a data analyst, plus they have domain knowledge and an understanding of the business. A business analyst may also have some basic skills in forecasting.

Data mining or big data engineer:

A data miner does the job of the data analyst, possibly from unstructured data if needed, plus possesses MapReduce and other big data skills. An understanding of common issues in running jobs on large-scale data and debugging of MapReduce jobs is needed.
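
For readers who have not seen MapReduce before, the following single-machine sketch imitates its three stages (map, shuffle, reduce) on the retail question. It is only a teaching aid under assumed inputs: the log format and function names are hypothetical, and a real job would run on a cluster framework rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical transaction log lines in the form "customer_id,store".
log_lines = [
    "c001,Shop-Mart", "c002,Bulk-Mart", "c001,Bulk-Mart",
    "c003,Shop-Mart", "c002,Bulk-Mart", "c003,Shop-Mart",
]

def map_phase(line):
    """Map: emit a (customer_id, store) pair for each log line."""
    customer_id, store = line.split(",")
    yield customer_id, store

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(customer_id, stores):
    """Reduce: collapse each customer's visits into the set of stores visited."""
    return customer_id, set(stores)

mapped = (pair for line in log_lines for pair in map_phase(line))
reduced = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
both = sorted(c for c, stores in reduced.items() if {"Shop-Mart", "Bulk-Mart"} <= stores)
print(both)
```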

Statistician (a traditional one):

A statistician pulls data from a database or obtains it from any of the roles mentioned above and performs statistical analysis. This person ensures the quality of data and correctness of the conclusions by using standard practices like choosing the right sample size, confidence level, level of significance, type of test, and so on.
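
As one example of "choosing the right sample size", here is a small sketch of the standard formula for estimating a proportion, n = z² · p(1 − p) / e², where e is the margin of error. The function name and default values are illustrative choices, not something prescribed in the article.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Minimum sample size to estimate a proportion within +/- margin_of_error.

    p=0.5 is the conservative default because it maximizes p * (1 - p).
    """
    # Two-sided critical value of the standard normal for the given confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

n = sample_size_for_proportion(margin_of_error=0.05)
print(n)
```

With a 5% margin of error at 95% confidence, this gives the familiar figure of 385 respondents.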

In the past, statisticians did not typically come from a computer science background, which is needed for writing code to implement statistical models. The situation has changed: statistics students now graduate with strong programming skills and a decent foundation in computer science. This enables them to perform tasks that earlier statisticians were not traditionally trained for.

Program/Project manager:

The program or project manager looks at all the data provided by the professionals mentioned so far, aligns these findings with the business, and influences the leadership to take appropriate action. This person possesses communication and presentation skills that can influence others without authority.

Ironically, a PM is influencing business decisions using the data and insights provided by others. If the person does not have a knack for understanding data, chances are that they will not be able to influence others to make the best decisions.

Now, putting it all together

The rise of online services has brought a paradigm shift in the software development life cycle and business iteration over successive features and products. Having a different data puller, analyst, statistician, and project manager is just not possible anymore. Now the mantra is: ship, experiment, and learn; adapt; ship, experiment, and learn. This situation has resulted in the birth of a new role: a data scientist.

Together, the qualities described above make up the skill set a data scientist needs. In addition, a data scientist should have rapid prototyping and programming, machine learning, visualization, and hacking skills.

Domain knowledge and soft skills are as important as technical skills

The importance of domain knowledge and soft skills, like communication and influencing without authority, is severely underestimated by both hiring managers and aspiring data scientists. Insights without domain knowledge can potentially mislead the consumers of these insights. Correct insights without the ability to influence decision-making are just as bad as having no insights.

All of what I have said above is based on my own tenure as a data scientist at a major search engine and later with the advertising platform within the same company. I learned that sometimes people asking the question may not understand what they want to know. This sounds preposterous yet it happens way too often.

Very often a bozo will start digging into something that is not related to the issue at hand just to prove that he/she is relevant. A data scientist encounters such HIPPOs (Highly Paid Person’s Opinions) that are somewhat unrelated to the problem and are very often a big distraction from the problem at hand.

Data scientist skills must include the ability to manage situations such as people asking irrelevant, distracting questions that are outside the scope of the task at hand. This is hard, especially in situations where the person asking the question is several levels up the corporate ladder and is known to have an ego. It is a data scientist’s responsibility to manage up and around while presenting and communicating insights.

Suggested skills a data science expert should possess

Curiosity about data and passion for the domain

If you are not passionate about the domain or business, and if you are not curious about data, then it is unlikely that you will succeed in a data scientist role. If you are working with an online retailer, you should be hungry to crunch and munch from the smorgasbord (of data, of course) to know more. If your curiosity does not keep you awake, no skill in the world can help you succeed.

Soft skills:

Communication and influencing without authority are necessary skills. Understand the minimum action that has the maximum impact. Too many findings are as bad as no findings at all.  The ability to scoop information out of partners and customers, even from the unwilling ones, is extremely important. The data you are looking for may not be sitting in one single place. You may have to beg, borrow, steal, and do whatever it takes to get the data.

Being a good storyteller is also something that helps. Sometimes the insights obtained from data are counter-intuitive. If you’re not a good storyteller, it will be difficult to convince your audience.

Math/Theory

Machine Learning algorithms, statistics, and probability 101 are fundamental to data science. This includes understanding probability distributions, linear regression, statistical inference, hypothesis testing, and confidence intervals. Learning optimization, such as gradient descent, would be the icing on the cake.
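
To show how two of these topics fit together, here is a minimal sketch of gradient descent fitting a one-variable linear regression by minimizing mean squared error. The data, learning rate, and step count are assumptions picked so the toy example converges; they are not recommendations for real problems.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

In practice you would reach for a library routine, but writing the loop once makes it clear what the optimizer is doing.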

Computer science/programming

You should know at least one scripting language (I prefer Python), or a statistical tool such as R. There are plenty of resources to get started. Data Science Dojo provides numerous free tutorials on getting started with Python and R. You can also learn the basics of programming from sites like Codecademy and LearnPython.

It is necessary to possess decent algorithms and data structures skills in order to write code that can analyze a lot of data efficiently. You may not be a production code developer, but you should be able to write decent code.

Database management and SQL skills are also helpful, as this is where you will be fetching your data to build models. It also doesn’t hurt to understand Microsoft Excel or another spreadsheet software.

Big data and distributed systems

You need to understand basic MapReduce concepts, Hadoop and the Hadoop Distributed File System (HDFS), and at least one language like Hive or Pig. Some companies have their own proprietary implementations of these languages.
Knowledge of tools like Mahout and any of the XaaS, like Azure and AWS, would be helpful. Once again, big companies have their own XaaS, so you may be working on variants of any of these.

Visualization

Possess the ability to create simple yet elegant and meaningful visualizations. Personally, R packages like ggplot, lattice, and others have helped me in most cases, but there are other packages that you can use. In some cases, you might want to use D3.

A visualization of data scientist skills:

[Figure: Data scientist skills]

Where are data scientists in the big data pipeline?

Below is a visualization of the big data pipeline, the associated technologies, and the regions of operation. In general, the depiction of where the data scientist belongs in this pipeline is largely correct, but there is one caveat. A data scientist must be comfortable diving into the “Collect” and “Store” territories if needed. Usually, data scientists work on transformed data and beyond. However, in scenarios where the business cannot afford to wait for the transformation process to finish, a data scientist should be able to work on raw data to gather insights.

[Figure: Big data technologies, platforms, and products]

June 14, 2022

Data Science Dojo hosted a data science interview AMA. It was a great opportunity to learn about data science career options, job roles, and skills.

As automated systems replace traditional business processes, a huge amount of data is being generated. Companies all around the world, from publishing houses to health care, are trying to unlock the value of data. Consequently, data science is becoming one of the most sought-after fields for young professionals all over the world.

Recently, Data Science Dojo hosted a data science interview AMA. It was a fantastic opportunity to learn about data science career options and job roles. The online seminar also included a Q/A session where data science enthusiasts asked the panelists many questions about data science interviews and careers.

The online seminar was presented by Data Scientist and Lead Instructor Rebecca Merrett. She holds a post-graduate diploma in Mathematics and Statistics from the University of Southern Queensland. Co-hosting the webinar was Data Scientist Tarun Shrivas, a seasoned professional in Marketing Research and Analytics. He holds a master’s degree in Business Analytics from Seattle University (Seattle, WA).

The webinar begins with a presentation on how to best prepare yourself for the data science industry. The discussion includes the diverse types of Data Scientists, data science interviews, job roles, commonly used tools, and how to go about building your portfolio. The presentation concludes with about 60 minutes of Q&A. You can watch the video below or continue reading.

* 0:00:00 – Introduction

* 0:01:41 – About Rebecca

* 0:02:22 – About Tarun

* 0:02:50 – Rebecca’s Presentation

* 0:22:23 – Q/A

Entering the field of data science

The presenters talk about how there’s no single right way to build a foundation in data science. You can attend a university or a data science bootcamp, seek independent mentoring, or even take free online courses. Some of these paths will take more effort than others, but one thing is evident: you MUST have a strong understanding of mathematical and statistical concepts.

Types of data science interviews and expectations

Throughout the presentation, emphasis was given to understanding the types of interview questions, job roles, and how candidates can best capitalize on their skillset.

The interviewer is expecting candidates to have knowledge about database tools and the skills required to read, retrieve, and make sense of the available data. A working knowledge of SQL queries is always helpful as well.

A Data Scientist role also requires candidates to have a fundamental understanding of the following:

  • Conditional probability
  • Bayes theorem
  • Normal and Binomial distributions
  • Central limit theorem
  • Linear Regression

Does this cover everything you should know? No, but these are some of the core subjects in data science. If you’re applying for a role involving product management and analytics, then experience with A/B testing will most likely need to be demonstrated.

Roles available

As we know, data science is a vast field, so it’s understandable that there are a variety of job functions available. Following are the three main types of data science roles:

The ‘All-Rounder’ Data Scientist

The Data Scientist is expected to build predictive models which include processing and cleaning data, isolating key features, and collecting new features. Data Scientists should be familiar with big data and machine learning concepts and should be able to drive business decisions.

The ‘Business Facing’ Data Analyst

The Data Analyst is expected to visualize and segment data in a way that can help a business gather actionable insights. A Data Analyst uses data to understand a key problem, opportunity, or trend that can be utilized in decision making. Data Analysts should be able to transform and manipulate large data sets, produce visualizations, and track web analytics.

The ‘Geeky’ Data Engineer

The Data Engineer is dedicated to deploying analytic solutions in the real world through front-end applications. A Data Engineer should be able to set up the infrastructure for large amounts of data and possess strong software engineering skills.

Example Questions

Here are some examples of data science interview questions and answers presented at the end of the AMA to give candidates an idea of what to expect and how to best prepare for the interview.

Math & Stats

Example Question: Students’ academic scores follow a normal distribution with a mean of 18 and a standard deviation of 6. What proportion of students have scored between 18 and 24?

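To solve this, you should be familiar with the z-score for a normal distribution, which expresses how far a value lies from the mean in units of the standard deviation: z = (x - mean) / standard deviation. Here, a score of 18 corresponds to z = 0 and a score of 24 to z = 1, so the answer is the area under the standard normal curve between 0 and 1, which is about 34%.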
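
If you want to check this kind of answer yourself, the sketch below works through the example question using only Python's standard library. The variable names are arbitrary; the mean and standard deviation come from the question.

```python
from statistics import NormalDist

# Scores are assumed to follow a normal distribution with mean 18 and standard deviation 6.
scores = NormalDist(mu=18, sigma=6)

# z-scores of the two bounds: distance from the mean in standard deviations.
z_low = (18 - scores.mean) / scores.stdev
z_high = (24 - scores.mean) / scores.stdev

# Proportion of students between 18 and 24 = difference of the cumulative probabilities.
proportion = scores.cdf(24) - scores.cdf(18)
print(f"z from {z_low:.1f} to {z_high:.1f}, proportion = {proportion:.4f}")
```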

Product & Metrics

A company has created a web page to promote a product and encourage signups. One version of the page includes the call-to-action “Find out more”; the other version uses “Learn more about us!”. Before going ahead with the second call-to-action, what action would you take to ensure this is the right choice in terms of user signups?

This is a typical A/B test question. You will need to conduct an A/B test with both versions of the page. One audience group will be exposed to version 1 and the other to version 2 so that we can ascertain which version of the page leads to more signups.

The important thing is to keep your end goal in mind. If the end goal is the number of signups, then you would prefer the version that leads to a higher proportion of signups even if that page does not get a lot of traffic.
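
One common way to judge whether the difference in signup proportions is more than noise is a two-proportion z-test. The sketch below is a minimal version of that test; the visitor and signup counts are invented for illustration, and the webinar did not prescribe a specific test.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(signups_a, visitors_a, signups_b, visitors_b):
    """Two-sided z-test for a difference between two signup proportions."""
    p_a = signups_a / visitors_a
    p_b = signups_b / visitors_b
    # Pooled proportion under the null hypothesis that both versions convert equally.
    pooled = (signups_a + signups_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: version 1 ("Find out more") vs. version 2 ("Learn more about us!").
z, p_value = two_proportion_z_test(signups_a=120, visitors_a=2000,
                                   signups_b=165, visitors_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed difference in signup proportion is unlikely to be due to chance alone, which supports choosing the better-converting version.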

Commonly used tools

The most used tools by data scientists are discussed so that the audience may become familiar with them to build their portfolio.

Here is the list of the most used tools by Data Scientists:

  • R
  • Python
  • Apache Hadoop
  • MapReduce
  • NoSQL Databases
  • Cloud Computing
  • D3
  • Apache Pig
  • Tableau
  • iPython Notebooks
  • GitHub

R and Python have the most extensive sets of libraries and tools to help automate everyday tasks. If you’re a Data Engineer, you’re more likely to work with Hadoop, MapReduce, and Spark, while as a Data Analyst you would frequently use interactive data visualization tools such as Tableau.

Resume tips

The resume is often the first impression your potential employer receives. Therefore, it’s important to carefully design your resume. In the webinar, resume structure and design are discussed in detail.

Structure

You should highlight your strong selling points first. This could be one of your interesting projects which is relevant to the employer. Organizing your resume in the most optimal manner is important to communicate your strong selling points and relevant content.

Design

Keep your resume interesting and to the point. Avoid having multiple pages and lengthy content. Your resume should include contact information and hyperlinks to your projects. It’s a great idea to share content like your website, LinkedIn profile, and other portfolio resources on your resume.

Experience

If you have job experience, the important thing is to focus on the results you achieved rather than the actions you took. You want the hiring team to perceive you as results-driven. Be sure to list your experiences in chronological order.

Here are some tips to make your resume stand out:

  • Start bullet points with action verbs where possible.
  • Quantify or state the results of your action where possible.
  • Include Data Science projects and publications.
  • Highlight your business acumen skills.
  • Customize your resume based on the type of job role.
  • Use resume analyzers such as VMock and Jobscan.
  • Check out this data scientist resume guide

What NOT to do in a data science interview?

Tarun and Rebecca explained what not to do in an interview, drawing on their own experience interviewing data science candidates. The most important thing is to provide clear examples of your experience with data and statistical analysis; without them, your chances of landing the job may suffer. You should provide clear examples of each component of a project you worked on: the specific problem you solved, the outcomes of your effort, and the other activities you were involved in.

Here are a few other things to avoid in an interview:

  • Not giving concrete examples of experience with data and statistical analysis.
  • Lack of business acumen.
  • Purely academic or research background.
  • Not asking the right questions.
  • Being too serious. Try your best to make it a pleasant experience for your interviewer.
  • Lack of knowledge about the company.
  • Poor communication skills.
  • Talking in clichés (“I’m a team player”, “I’m a perfectionist”).

Most of these tips apply to candidates applying for a variety of roles. Having knowledge about the company, being practical, building your project portfolio, and improving your communication skills are relevant for most job roles today.

Questions and answers

Attendees posted a few of the questions before the webinar while some of the live questions were also answered. The audience seemed very interested in finding out about data science education and foundation requirements and how to enter the field as a fresh graduate with limited experience.

Q: How to handle LinkedIn invitations from strangers and how to respond to a recruiter reaching out?

The best way to respond to recruiters is to take time composing the reply. You want to present yourself as very interested in the company and their business. You also need to be appreciative of the fact that the recruiter is reaching out to you. You can talk about their products and services, a project they are working on, or any new development which may require new hiring. Present yourself as a potential problem solver for their business.

Q: What are some of the important questions to ask during the data science interview?

You can ask about what kind of data they are working with. The company could be working with highly problematic data and that’s the reason they are hiring an expert. They could be having a data modeling or data management problem. So, it’s a good idea to find out what data problems they are facing. This will give you insight into your day-to-day activities and the job role.

Q: How should I answer “What are you expecting from this role?”

This question is another way of asking how the company fits into your overall career plan. Here you want to explain your current position: maybe you are just entering the field of data science, or switching careers or companies. You need to justify why you’re choosing this company and the role.

Q: Would sharing new ideas about the company with the interviewers be a good sign?

It is good to share new ideas, but keep in mind that first, you need to understand the problem they are having. To propose a solution, you need to have a good understanding of the problem.

Q: How can I answer questions about the most important metrics for an ad marketing campaign?

To answer these questions, it is important to have an end goal in mind. Ask yourself what the company is trying to achieve at the end of the day with the help of this metric. For example, if the company is using the number of clicks on a webpage without considering the end goal of signups then this will not give them a clear picture of the campaign’s success. If one of the pages has a 60% click rate but zero signups while the other has only a 20% click rate but a 90% signup rate, then, in this case, the latter would be considered more successful. So, when answering questions about marketing metrics, please keep in mind the end goal.
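
The arithmetic behind this comparison can be sketched in a few lines. The code below assumes that "signup rate" means the share of people who clicked and then signed up, so signups per visitor is the product of the two rates; that reading, and the function name, are assumptions rather than something stated in the webinar.

```python
def signups_per_visitor(click_rate, signup_rate_after_click):
    """End-to-end conversion: share of all visitors who end up signing up."""
    return click_rate * signup_rate_after_click

# Rates taken from the example in the answer above.
page_one = signups_per_visitor(click_rate=0.60, signup_rate_after_click=0.0)
page_two = signups_per_visitor(click_rate=0.20, signup_rate_after_click=0.90)

print(f"Page one: {page_one:.0%} of visitors sign up")
print(f"Page two: {page_two:.0%} of visitors sign up")
```

Judged by clicks alone, page one looks three times better; judged by the end goal, page two wins with roughly 18% of visitors signing up versus none.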

Q: What is the best thing I can do while in college to land a job in data science after graduating?

The best thing to do is gain experience, and one of the best ways to gain experience is through community projects. Look for charitable organizations or community organizations that might not have a big budget to hire someone but are willing to have volunteers lead them in the right direction.

For example, an environmental organization is looking to collect donations. They have data about different potential cities to set up donation drives. You could conduct population & demographic analysis to find out about the best cities for setting up the donation drives.

Q: There is a lot of competition for entry-level data science jobs. How do you stand out?

Yes, it is challenging especially if you’re talking about the Indian sub-continent. If we talk about the US, then the scenario is different. The number of jobs is abundant compared to the supply of talent, but there’s also another challenge of having the right skill set and experience. Companies are looking to hire individuals with particular skill sets. So it is important to keep improving your skills and gain experience to be able to compete. Having skills other than that of data science can also help to differentiate you from the competition. Try to learn about other business functions to create a more holistic profile.

Q: What are the things that data scientists should keep in mind when searching for their first job?

Sometimes it’s better to go with smaller companies, as they can provide you with more valuable experience. You could really make an impact working for a smaller company, as only a few people are running the data science projects. While most of the competition is looking to get into tech giants, it might be a good idea to start your career with a smaller company where competition is less, and more opportunities are available to learn and grow.

Q: Do I need Master’s/Ph.D. or an advanced degree to get into data science?

It’s not necessary to get an advanced degree to start your career in data science. As with most technical fields, it is important to have a good foundation, which you can get from your bachelor’s or some other degree. But an advanced degree does not always guarantee you the best job. It’s equally important to gain experience with community projects, internships, or trainee opportunities. Having a Ph.D. means you have become an excellent researcher and are experienced in working on exceedingly difficult problems. This sometimes means opportunities available for advanced degree holders may be limited.

Q: Where can you practice machine learning?

Going to hackathons is an effective way to practice your machine learning skills in a comfortable setting. It’s also a good environment for guidance and feedback to improve your machine learning skills. You can also start practicing on Kaggle.

Q: What kind of portfolio is required to get into an entry-level Data Science job?

Working on your foundation is particularly important for entry-level data science jobs. Having a good foundation in mathematics and statistics is required. Being able to understand the metrics and business problems is also required for most data science roles. An understanding of linear algebra, conditional probability, Bayes theorem, and measures of central tendency is necessary. Having a strong foundation helps you with the tools of data science and with performing analysis. Your portfolio should showcase an understanding of the core concepts and familiarity with some of the commonly used tools.

Q: How to transition from one career to another? For example, from cloud computing development environment to data science or from marketing and automation to data Science or from software engineering to data science.

There are always some transferable skills. Digital marketing, for example, involves a lot of analytics and relies on data science.

If you’re looking at the big production systems, there are many components of software engineering involved. So being skilled in software engineering and data science would be a great advantage. For cloud computing, you can deploy your models if the company is big enough for the heavy-duty infrastructure. You need to find a role where your skills are transferable.

Also, if you are already working somewhere, your current organization would be the best place to make the transition into another function. After that, you can look for a company where your preferred role is available and where data science is encouraged.

Q: What’s the interviewer’s approach when hiring fresh data scientists?

Conceptual clarity is particularly important even if you don’t have years of experience in different data science domains. Make sure you clearly understand the concept behind everything you mention in your resume. The interviewer will also evaluate your understanding of basic concepts, which include mathematics, statistics, and machine learning. This will give the company a sense of how much effort is required to train the candidate.

Q: How do I tell a story about myself and my projects to stand out?

It is especially important to provide the interviewer with an opportunity to look at the work you have done. For that purpose, you can use the GitHub repository to make your analytics available. Including links to your repository on your resume is a good idea too. Even better is to build a portfolio on WordPress to get noticed.

Portfolio websites are becoming more common nowadays. If you look at the companies’ hiring pages, they do ask for a LinkedIn profile, GitHub repository, and your website. So, this is a fantastic opportunity to highlight your work efficiently. Your portfolio should not be limited to your code and output only but should also include some writing sample that describes your output. It’s always a clever idea to highlight your communication skills. Most of the time, the hiring person is evaluating if you’re able to clearly communicate your analysis and findings, so communication becomes an essential skill.

If you put your work online, it becomes easier for the hiring team to research you. So, at the time of the interview, they have a better idea of your abilities which could make a significant difference.

The webinar was a perfect combination of practical information and guidelines to kick start your career in data science. A great deal of the discussion applies to candidates applying for a role outside of the data science domain.

It’s important for candidates to have a conceptual understanding of the field and demonstrate an interest in and understanding of the company they are applying for. To start your career in data science, your first step is to have a strong foundation of the core subjects. The next step is to build your portfolio. Make sure to always be working on your experience. Volunteering for a community project is a wonderful way to practice your skills. Having strong technical skills along with interpersonal and communication skills will help you stand out from the crowd in this highly competitive job market.  Don’t forget about applying for smaller companies. Your role will be more involved, and the lessons you learn from mistakes and successes will be more profound.

Thanks for reading! I hope this has given you a good understanding of data science career options and how to best prepare for an interview. Here is another awesome blog on 101 Data Science Interview Questions to help you get fully prepared for the interview.

June 10, 2022
