Matryoshka Embeddings with Aditya Kusupati, Zach Nussbaum, and Zain Hasan – Weaviate Podcast #89!



Hey everyone! Thank you so much for watching the 89th Weaviate Podcast on Matryoshka Representation Learning! I am beyond grateful to be joined by Aditya Kusupati, the lead author of Matryoshka Representation Learning; Zach Nussbaum, a Machine Learning Engineer at Nomic AI who is bringing these embeddings to production; and my Weaviate colleague Zain Hasan, who has done amazing research on Matryoshka Embeddings! We think this is a super powerful development for Vector Search! This podcast covers all sorts of details: what Matryoshka embeddings are, the challenges of training them, Nomic AI’s experience building an embeddings API product and how it ties in with Nomic Atlas, Aditya’s research on differentiable ANN indexes, and much more! This was such a fun one, and I really hope you find it useful! Please let us know what you think!

Links:

Matryoshka Representation Learning (MRL) from the Ground Up:

Matryoshka Representation Learning:

Nomic AI Embeddings:

Unboxing Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning:

Chapters
0:00 Welcome Everyone!
1:57 Matryoshka in Nomic Embeddings
3:10 Origin of Matryoshka Embeddings
7:50 Optimization Challenge
11:02 Is there any reason not to do this?
13:13 Datasets for Embedding Training
18:18 Synthetic Queries
24:08 Adding MRL to existing models
32:06 Fine-Tuning Embedding APIs
33:48 Nomic Atlas and Matryoshka Embeds
35:38 Nomic Embeddings Launch
38:10 Matryoshka Weightings
42:48 Information Diffusion
48:04 How do you measure clustering?
55:18 Differentiable ANN Indexes
1:08:40 Exciting directions for the future!
