
Feature Engineering Using Python

2 Modules
1 Hour

Overview

Data enrichment is the process of improving raw data by adding information from external sources. This can include adding new fields, filling in missing values, correcting errors, and providing more context about the data. The goal is to make the data more complete, accurate, and useful for analysis.

Adding geographic data is a common example. Suppose you have a dataset of customers and their orders that includes each customer’s address but no coordinates. With an external geocoding service, you can enrich the dataset by adding latitude and longitude to each address. You can then analyze patterns in customer locations, visualize the data on a map, or combine it with other geospatial data to gain further insights.

Enrichment is useful in a variety of contexts, including marketing, sales, customer service, and research, where more complete and accurate data can lead to better insights and decisions.
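The geocoding example above can be sketched as a lookup step that adds two new fields to each record. This is a minimal illustration in plain Python, not the course's own code: the customer records, the `FAKE_GEOCODER` table, its coordinates, and the function names are all invented here, and the table stands in for a real geocoding service that would be called over the network.

```python
# Hypothetical customer orders; names, addresses, and totals are invented.
orders = [
    {"customer": "Ana", "address": "1 Main St, Springfield", "total": 40.0},
    {"customer": "Ben", "address": "9 Oak Ave, Shelbyville", "total": 15.5},
    {"customer": "Cy", "address": "Unknown Rd, Nowhere", "total": 22.0},
]

# Stand-in for an external geocoding service: a fixed address -> (lat, lon)
# table. The coordinates are made up for illustration.
FAKE_GEOCODER = {
    "1 Main St, Springfield": (39.80, -89.64),
    "9 Oak Ave, Shelbyville": (39.41, -88.79),
}


def geocode(address):
    """Return (latitude, longitude) for an address, or (None, None) if unknown."""
    return FAKE_GEOCODER.get(address, (None, None))


def enrich_with_coordinates(rows):
    """Return copies of the rows with 'latitude' and 'longitude' fields added."""
    enriched = []
    for row in rows:
        lat, lon = geocode(row["address"])
        # Build a new dict so the original raw records stay untouched.
        enriched.append({**row, "latitude": lat, "longitude": lon})
    return enriched


enriched_orders = enrich_with_coordinates(orders)
for row in enriched_orders:
    print(row["customer"], row["latitude"], row["longitude"])
```

Addresses the lookup cannot resolve keep `None` coordinates rather than being dropped, so the gaps stay visible for later cleaning. With pandas, the same step would typically be a left merge between the orders table and a table of geocoded addresses.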

In this course, you will:

  • Improve your raw data by enriching it with additional information
  • Create new useful features using feature engineering techniques
  • Understand why enrichment is important for getting accurate insights

Course Contents

1. Data Augmentation

  • What is data augmentation?
  • Why is it important?
  • Data annotation
  • Text augmentation
  • Synthetic data generation
  • Knowledge check

2. Feature Engineering

  • What is feature engineering?
  • Why is it important?
  • Feature selection
  • Binning
  • Encoding categorical variables
  • Derived features
  • Knowledge check
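
As a preview of the second module, three of the listed techniques (binning, encoding categorical variables, and derived features) can be sketched in a few lines of plain Python. The records, cut points, and function names below are assumptions made up for illustration, not material from the course, and feature selection is not shown.

```python
from datetime import date

# Hypothetical customer records, invented for illustration.
customers = [
    {"age": 23, "plan": "basic", "signup": date(2023, 1, 15), "spend": 120.0, "orders": 4},
    {"age": 41, "plan": "premium", "signup": date(2021, 6, 1), "spend": 900.0, "orders": 12},
    {"age": 67, "plan": "basic", "signup": date(2022, 11, 30), "spend": 0.0, "orders": 0},
]


def bin_age(age):
    """Binning: map a numeric age to a coarse label (cut points are arbitrary)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


def one_hot(value, categories, prefix):
    """Encoding: turn one categorical value into 0/1 indicator columns."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def average_order_value(spend, orders):
    """Derived feature: spend per order, 0.0 when there are no orders."""
    return spend / orders if orders else 0.0


# Collect the distinct plan names once so every row gets the same columns.
plans = sorted({c["plan"] for c in customers})

features = []
for c in customers:
    row = {
        "age_bin": bin_age(c["age"]),
        "signup_month": c["signup"].month,  # derived from the signup date
        "avg_order_value": average_order_value(c["spend"], c["orders"]),
    }
    row.update(one_hot(c["plan"], plans, prefix="plan"))
    features.append(row)

for row in features:
    print(row)
```

Each output row holds only engineered columns, which makes it easy to see what each technique contributes. In a pandas workflow the same steps are usually written with `pd.cut` for binning and `pd.get_dummies` for one-hot encoding.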