Overview
Data normalization and scaling are techniques for adjusting the values in a dataset so that different features share a comparable range. This matters because many statistical and machine learning algorithms assume that the data is approximately normally distributed, or that the features are on similar scales; when that assumption is violated, these algorithms can produce biased or inaccurate results. By normalizing or scaling the data, you transform it into a consistent, interpretable form that is suitable for further analysis and modeling, which helps ensure the validity and accuracy of the results.
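As a minimal illustration of what "putting features on a similar scale" means in practice, here is a sketch (with made-up feature names and values) using min-max scaling, one common rescaling approach:

```python
import numpy as np

# Hypothetical features measured in very different units.
income = np.array([30_000.0, 45_000.0, 60_000.0, 120_000.0])  # dollars
age = np.array([25.0, 33.0, 40.0, 55.0])                      # years

def min_max(x):
    """Rescale values linearly into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

# After scaling, both features span [0, 1] and are directly comparable.
print(min_max(income))
print(min_max(age))
```

Without this step, an algorithm that measures distances between rows would be dominated by the income column simply because its raw values are thousands of times larger.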
In this course, you will:
- Apply normalization and scaling techniques to datasets
- Choose between normalization and scaling based on the characteristics of your dataset
- Identify datasets that need to be scaled and/or normalized before proceeding with analysis
Course Contents
1. Data Normalization
- What is data normalization?
- Why is it important?
- Z-score normalization
- Unit vector normalization
- Mean normalization
- Quantile normalization
- Knowledge check
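To preview the normalization techniques listed above, here is a sketch of each in numpy (the small arrays are made-up example data; the course covers each method in detail):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Z-score normalization: center at 0, scale by the standard deviation.
z = (x - x.mean()) / x.std()

# Unit vector normalization: divide by the Euclidean (L2) norm,
# so the resulting vector has length 1.
unit = x / np.linalg.norm(x)

# Mean normalization: center at the mean, scale by the range.
mean_norm = (x - x.mean()) / (x.max() - x.min())

# Quantile normalization (sketch): force two samples (columns) to share
# the same distribution by averaging the values at each rank.
data = np.array([[5.0, 4.0],
                 [2.0, 1.0],
                 [3.0, 4.0],
                 [4.0, 2.0]])
ranks = data.argsort(axis=0).argsort(axis=0)     # rank of each value per column
rank_means = np.sort(data, axis=0).mean(axis=1)  # mean value at each rank
quantile_norm = rank_means[ranks]
```

Note that the quantile normalization sketch breaks ties arbitrarily; production implementations typically average tied ranks.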
2. Data Scaling
- What is data scaling?
- Why is it important?
- Logarithmic scaling
- Decimal scaling
- Robust scaling
- Mean absolute deviation scaling
- Knowledge check
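Likewise, the scaling techniques listed above can each be sketched in a few lines of numpy (the sample array is made-up; definitions follow the common textbook forms):

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0, 250.0])

# Logarithmic scaling: compress a wide dynamic range
# (values must be positive; use np.log1p for data that may contain 0).
log_scaled = np.log10(x)

# Decimal scaling: divide by 10**d, where d is the smallest power of
# ten that brings every |value| below 1.
d = int(np.floor(np.log10(np.abs(x).max()))) + 1
decimal_scaled = x / 10**d

# Robust scaling: center at the median, scale by the interquartile
# range, so outliers have less influence than with z-scores.
q1, q3 = np.percentile(x, [25, 75])
robust_scaled = (x - np.median(x)) / (q3 - q1)

# Mean absolute deviation (MAD) scaling: like z-scores, but spread is
# measured by the mean absolute deviation from the mean.
mad = np.abs(x - x.mean()).mean()
mad_scaled = (x - x.mean()) / mad
```

The contrast between `robust_scaled` and a z-score on this array shows why robust scaling is preferred for outlier-heavy data: the single large value (1000) inflates the standard deviation far more than it does the interquartile range.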