The total amount of digital data generated worldwide is growing rapidly, and approximately 80% of this newly generated data is unstructured: data that does not conform to a table- or object-based model. Examples of unstructured data include text, images, protein structures, geospatial information, and IoT data streams. Yet the vast majority of companies and organizations have no way of storing and analyzing these increasingly large quantities of unstructured data. Embeddings, which are high-dimensional, dense vectors that represent the semantic content of unstructured data, can remedy this.
In this tutorial, we’ll introduce embeddings and vector search from both an ML and an application-level perspective. We’ll start with a high-level overview of embeddings and discuss best practices for generating and using them. We’ll then apply this knowledge to build two systems: semantic text search and reverse image search. Finally, we’ll see how to put these applications into production using Milvus, the world’s most popular open-source vector database.
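To make the idea of vector search concrete, here is a minimal sketch of brute-force similarity search over embeddings using NumPy. It is not the tutorial's own code: the three-dimensional vectors and their captions are made-up placeholders (real embeddings come from a trained model and typically have hundreds of dimensions), and `cosine_top_k` is a hypothetical helper name chosen for illustration.

```python
import numpy as np


def cosine_top_k(query, corpus, k=2):
    """Return the indices and cosine scores of the k corpus vectors closest to query."""
    # Normalize to unit length so the dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = corpus_unit @ query_unit
    # Sort by descending score and keep the first k indices.
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy "embeddings" (assumed values, for illustration only).
corpus = np.array([
    [0.9, 0.1, 0.0],  # e.g. "a photo of a cat"
    [0.8, 0.2, 0.1],  # e.g. "a kitten sleeping"
    [0.0, 0.1, 0.9],  # e.g. "quarterly sales report"
])
query = np.array([1.0, 0.0, 0.0])  # e.g. an embedding of the query "cat"

indices, scores = cosine_top_k(query, corpus, k=2)
print(indices, scores)
```

Comparing the query against every stored vector, as this sketch does, scales linearly with the size of the collection. A vector database such as Milvus is meant to take over this step at scale, typically by using approximate nearest-neighbor indexes instead of an exhaustive scan.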