Python for Data Science

Data Transformation Using Python

3 Modules
1 Hour


Data transformation is the process of converting raw data into a clean, structured format that is ready for analysis and modeling. It can involve several steps, such as merging data from multiple sources, reshaping data into a consistent layout, and grouping data to reveal patterns and insights. The result is a standardized dataset that is well suited to statistical modeling and other types of data analysis.

In this course, you will:

  • Understand how to convert raw data into a clean and structured format
  • Synthesize information from multiple datasets to create a unified dataset
  • Summarize and interpret data to produce valuable insights
  • Evaluate the quality of data to facilitate effective decision-making

Course Contents

1. Merging Data

  • What is merging data?
  • Why is it important?
  • Performing inner join
  • Performing left join
  • Performing right join
  • Performing outer join
  • Performing cross join
  • Performing concatenation
  • Knowledge check
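
As a rough preview of the join types listed in this module, here is a minimal sketch that assumes the pandas library (the course outline does not name a specific library). The `customers` and `orders` tables and their column names are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sample tables (not taken from the course material)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [50, 75, 20]})

# Inner join: keep only keys present in both tables (ids 2 and 3)
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Left join: keep every customer; customers without an order get NaN in "amount"
left = pd.merge(customers, orders, on="customer_id", how="left")

# Right join: keep every order; orders without a known customer get NaN in "name"
right = pd.merge(customers, orders, on="customer_id", how="right")

# Outer join: keep keys from either table (ids 1 through 4)
outer = pd.merge(customers, orders, on="customer_id", how="outer")

# Cross join: pair every name with every amount (3 x 3 = 9 rows);
# how="cross" assumes pandas 1.2 or newer
cross = pd.merge(customers[["name"]], orders[["amount"]], how="cross")

# Concatenation: stack tables with the same columns on top of each other
stacked = pd.concat([customers, customers], ignore_index=True)
```

The only thing that changes between the four keyed joins is the `how` argument, which decides whose unmatched rows survive.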

2. Aggregating Data

  • What is aggregating data?
  • Why is it important?
  • Grouping data
  • Binning data
  • Performing statistical measures
  • Performing multi-level indexing
  • Knowledge check
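
The aggregation topics above can be sketched in a few lines, again assuming pandas. The `sales` table, its columns, and the bin edges are hypothetical choices made for this illustration.

```python
import pandas as pd

# Hypothetical sample data (not taken from the course material)
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# Grouping with statistical measures: total and average units per region
by_region = sales.groupby("region")["units"].agg(["sum", "mean"])

# Binning: place each row into a labeled range, here (0, 25] and (25, 50]
sales["size"] = pd.cut(sales["units"], bins=[0, 25, 50], labels=["small", "large"])

# Multi-level indexing: grouping by two columns yields a two-level index
by_region_product = sales.groupby(["region", "product"])["units"].sum()

# Select a single entry by giving one value per index level
west_b_units = by_region_product.loc[("West", "B")]
```

Grouping by a list of columns is one common way to end up with a multi-level index; each level can then be selected or summarized separately.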

3. Reshaping Data

  • What is reshaping data?
  • Why is it important?
  • Creating pivot tables
  • Melting data
  • Stacking and unstacking data
  • Transposing data
  • Encoding categorical data
  • Knowledge check
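
The reshaping operations in this module might look like the following sketch, which assumes pandas. The `long_df` table and its city, year, and temperature values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in "long" format (one row per city and year)
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2022, 2023, 2022, 2023],
    "temp": [5, 6, 15, 16],
})

# Pivot table: one row per city, one column per year
wide = long_df.pivot_table(index="city", columns="year", values="temp", aggfunc="mean")

# Melting: turn the year columns back into rows
melted = wide.reset_index().melt(id_vars="city", var_name="year", value_name="temp")

# Stacking moves column labels into the row index; unstacking reverses it
stacked = wide.stack()
unstacked = stacked.unstack()

# Transposing: swap rows and columns
transposed = wide.T

# Encoding categorical data: one indicator column per city (one-hot encoding)
encoded = pd.get_dummies(long_df["city"], prefix="city")
```

Pivoting and melting are roughly inverse operations, as are stacking and unstacking, so each pair is worth learning together.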