
Introduction to Hierarchical Clustering with College Scorecard Data

Agenda

Clustering is an unsupervised machine learning technique, which means the data need not be labeled. The goal of clustering is to group similar items together, such as similar customers, similar products, or similar students. Popular clustering algorithms include K-means, hierarchical clustering, DBSCAN, and PAM (Partitioning Around Medoids).

In this session, participants will learn how hierarchical clustering methods build clusters from the ground up, starting with individual observations and merging the most similar groups step by step. The session covers the benefits of a hierarchical clustering approach along with a few of its disadvantages. Our example use case uses College Scorecard data to cluster similar schools together based on various demographic and institutional characteristics, helping students identify comparable schools as they explore their options.
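To make the bottom-up idea concrete, here is a minimal Python sketch using SciPy. It is an illustration only, not the session's actual code: the school names, the three feature columns, and every value are made up for the example (they are not real College Scorecard data), and Ward linkage and the choice of three clusters are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical schools and made-up values (NOT real College Scorecard data).
# Assumed columns: admission rate, average annual cost (USD), share of part-time students.
schools = ["School A", "School B", "School C", "School D", "School E", "School F"]
features = np.array([
    [0.15, 60000, 0.02],
    [0.18, 58000, 0.03],
    [0.70, 22000, 0.25],
    [0.75, 20000, 0.30],
    [0.95,  9000, 0.60],
    [0.92,  8000, 0.65],
])

# Standardize each column so cost (in dollars) does not dominate the distances.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

# Agglomerative clustering: every school starts as its own cluster, and the
# two closest clusters are merged repeatedly. Ward linkage is an assumed choice.
merges = linkage(scaled, method="ward")

# Cut the resulting tree so that at most three clusters remain.
labels = fcluster(merges, t=3, criterion="maxclust")

for name, label in zip(schools, labels):
    print(f"{name}: cluster {label}")
```

Each row of `merges` records one merge step (the two clusters joined, the distance between them, and the size of the new cluster), so six schools produce five merges. Passing `merges` to `scipy.cluster.hierarchy.dendrogram` would draw the tree that these merges describe.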

Richard Huebner

Senior Data Scientist at Nelnet

Dr. Richard Huebner has 25+ years of experience in data-related roles, including senior individual contributor and Director-level roles in data science and data architecture. He earned his Ph.D. in IT, and his research focuses on developing analytics and data science capabilities for organizations. His expertise is in educational data mining and academic/learning analytics. Bridging the gap between academia and industry, Dr. Rich routinely publishes articles, speaks at conferences, and serves on several advisory boards for universities.

