Overview
Data cleaning is the process of identifying and then correcting or removing errors, inconsistencies, and irrelevant information in a dataset. Its goal is to make the data accurate, consistent, and relevant so that it is ready for analysis, modeling, or visualization.
In this course, you will:
- Identify missing values in a dataset
- Apply techniques such as imputation for handling missing values
- Identify and handle duplicate data in a dataset
- Understand the importance of consistent data formats
- Standardize data types and values in a given dataset
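As a rough preview of these objectives, the sketch below walks a tiny dataset through the steps using only the Python standard library. The sample records, the field names, and the `clean` helper are hypothetical and exist only for illustration; the course material may use different tools or a different order of steps.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value.
rows = [
    {"name": "Alice", "city": " new york ", "age": 30},
    {"name": "Bob", "city": "New York", "age": None},
    {"name": "Alice", "city": "New York", "age": 30},
    {"name": "Cara", "city": "BOSTON", "age": 40},
]


def clean(records):
    # 1. Standardize formats: trim whitespace and use title case for cities.
    standardized = [{**r, "city": r["city"].strip().title()} for r in records]

    # 2. Remove exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for r in standardized:
        key = (r["name"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Identify missing ages and impute them with the median of known ages.
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill = median(known_ages)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in unique]


cleaned = clean(rows)
for row in cleaned:
    print(row)
```

In this sketch, formats are standardized first so that the two "Alice" rows become recognizable as duplicates, and duplicates are removed before imputation so that the repeated row does not skew the median. That ordering is one reasonable choice, not the only one.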
Course Contents
1. Missing Data
- What is Missing Data?
- Why is it important?
- Deleting the Missing Values
- Imputation Using Central Tendency Measures
- Knowledge check
2. Duplicate Data
- What is Duplicate Data?
- Why is it important?
- Identifying Duplicate Data
- Removing Duplicate Data
- Knowledge check
3. Inconsistent Data
- What is Inconsistent Data?
- Why is it important?
- Data Standardization
- Knowledge check