Overview

Data cleaning is the process of identifying and then correcting or removing errors, inconsistencies, and irrelevant information in a dataset. The goal is to prepare the data for further analysis, modeling, or visualization by ensuring that it is accurate, consistent, and relevant.

In this course, you will:

Identify missing values in a dataset
Apply techniques such as imputation for handling missing values
Identify and handle duplicate data in a dataset
Understand the importance of consistent data formats
Standardize data types and values in a given dataset
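
As a rough preview of these objectives, the sketch below walks a tiny, made-up dataset through all three steps using only the Python standard library. The records, the helper names (count_missing, impute_mean, drop_duplicates, standardize_country), and the COUNTRY_ALIASES mapping are illustrative assumptions, not material taken from the course, which may well use a library such as pandas instead.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"name": "Ana", "age": 34, "country": "US"},
    {"name": "Ben", "age": None, "country": "usa "},
    {"name": "Ana", "age": 34, "country": "US"},
    {"name": "Cho", "age": 28, "country": "U.S."},
]

# Assumed mapping from lowercase spellings to one canonical value.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US"}


def count_missing(records, field):
    """Count the records whose value for `field` is missing (None)."""
    return sum(1 for r in records if r[field] is None)


def standardize_country(records):
    """Trim whitespace and map known spellings to one canonical value."""
    result = []
    for r in records:
        raw = r["country"].strip()
        result.append({**r, "country": COUNTRY_ALIASES.get(raw.lower(), raw)})
    return result


def drop_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def impute_mean(records, field):
    """Fill missing values of `field` with the mean of the observed ones."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


cleaned = impute_mean(drop_duplicates(standardize_country(rows)), "age")
for record in cleaned:
    print(record)
```

The order here is a design choice rather than a rule: standardizing first lets records that differ only in formatting be recognized as duplicates, and removing duplicates before imputing keeps repeated rows from skewing the mean.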

Course Contents

1. Missing Data

What is Missing Data?
Why is it important?
Deleting the Missing Values
Imputation Using Central Tendency Measures
Knowledge check

2. Duplicate Data

What is Duplicate Data?
Why is it important?
Identifying Duplicate Data
Removing Duplicate Data
Knowledge check

3. Inconsistent Data

What is Inconsistent Data?
Why is it important?
Data Standardization
Knowledge check

Data Cleaning Using Python

3 Modules

1 Hour