Clustering is an unsupervised machine learning technique that does not require labeled data. The goal of clustering is to group similar items, such as similar customers, products, or students. Popular clustering algorithms include K-means, hierarchical clustering, DBSCAN, and PAM (partitioning around medoids), among others.
In this session, participants will learn how hierarchical clustering methods build clusters from the ground up. We will share the benefits of a hierarchical clustering approach and cover a few of its disadvantages. Our example use case applies College Scorecard data to cluster similar schools based on various demographic and institutional characteristics, helping students identify comparable schools as they explore their options.
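To make the "from the ground up" idea concrete, here is a minimal sketch of bottom-up (agglomerative) clustering in plain Python. It is only an illustration, not the method used in the session: the function name `agglomerative`, the choice of average linkage, and the `schools` rows are all assumptions, and the feature values are made up rather than taken from College Scorecard data.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    # Start from the ground up: every point is its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, in order

    def avg_linkage(a, b):
        # Average linkage (an assumed choice): mean pairwise Euclidean
        # distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_linkage(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[a], clusters[b], avg_linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges


# Hypothetical toy rows: two already-scaled characteristics per school.
schools = [(0.10, 0.20), (0.15, 0.22), (0.80, 0.90), (0.82, 0.88), (0.50, 0.10)]
clusters, merges = agglomerative(schools, n_clusters=2)
print(clusters)
```

The `merges` list records the order in which clusters were joined, which is the information a dendrogram displays. On real data, a library routine such as `scipy.cluster.hierarchy.linkage` would be the more practical choice, with features scaled first so that no single characteristic dominates the distances.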
Senior Data Scientist at Nelnet