

Virtual Paper Review – Unifying Embeddings: From Words to Pixels

July 16 @ 6:00 pm – 7:30 pm


Join us virtually this Wednesday as we crack open the magic behind modern multimodal embeddings and show why they’re not just “text embeddings with pictures pasted on.”

Part I – Foundations

  • Modality alignment 101 – How contrastive pre-training pulls text, images, and video frames into one joint space.

  • Vector anatomy – Why pixel patches, temporal frame sequences, and word tokens all end up as vectors compared with the same dot-product math.

  • Evaluation metrics – What “recall,” “precision,” and “rankability” really mean once your queries aren’t just strings.
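To make the Part I ideas concrete, here is a minimal, dependency-free sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired text/image embeddings, plus recall@k. It assumes the vectors have already come out of some encoder. The temperature of 0.07 and all function names are illustrative assumptions, not the loss used by either paper discussed below. Note that once vectors are L2-normalized, every comparison (text to text, text to image, image to video frame) is the same dot product.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity_matrix(text_embs, image_embs):
    """S[i][j] = cosine similarity between text i and image j."""
    t = [l2_normalize(v) for v in text_embs]
    m = [l2_normalize(v) for v in image_embs]
    return [[dot(ti, mj) for mj in m] for ti in t]


def log_softmax_at(row, idx):
    """log(exp(row[idx]) / sum_j exp(row[j])), computed with the log-sum-exp trick."""
    mx = max(row)
    lse = mx + math.log(sum(math.exp(x - mx) for x in row))
    return row[idx] - lse


def symmetric_info_nce(text_embs, image_embs, temperature=0.07):
    """Pair i is the positive; every other item in the batch is a negative."""
    sims = similarity_matrix(text_embs, image_embs)
    n = len(sims)
    logits = [[s / temperature for s in row] for row in sims]
    # Text -> image: row i should peak at column i.
    t2i = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Image -> text: column j should peak at row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    i2t = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return (t2i + i2t) / 2


def recall_at_k(sims, k):
    """Fraction of queries (rows) whose true match (same index) lands in the top-k columns."""
    hits = 0
    for i, row in enumerate(sims):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top
    return hits / len(sims)


# Toy batch: three captions with matching images, then the same images shuffled.
text = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
aligned = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 1.1]]
shuffled = [aligned[1], aligned[2], aligned[0]]

print(recall_at_k(similarity_matrix(text, aligned), 1))   # 1.0
print(recall_at_k(similarity_matrix(text, shuffled), 1))  # 0.0
print(symmetric_info_nce(text, aligned) < symmetric_info_nce(text, shuffled))  # True
```

Training pushes the diagonal of the similarity matrix up and everything else down, which is exactly what drives recall@k toward 1.0 on held-out pairs.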

Part II – Deep Dive

We’ll explore two recent papers as our testbed for these concepts:

  • JinaEmbed V4 – A single-path backbone unifying images, video, text, and code.
  • On the Rankability of Visual Embeddings – Recovering numeric “more vs. less” axes with only two labeled examples.
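As a rough intuition for the rankability idea, the sketch below builds a “more vs. less” axis from just two labeled anchors by taking the normalized difference of their embeddings, then orders other items by their projection onto that axis. This is an illustrative assumption about how two examples can define an axis; the paper’s exact procedure may differ, and `axis_from_two_examples` and `rank_along_axis` are hypothetical helpers.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def axis_from_two_examples(emb_less, emb_more):
    """Hypothetical helper: unit direction pointing from the 'less' anchor to the 'more' anchor."""
    return normalize([m - l for l, m in zip(emb_less, emb_more)])


def rank_along_axis(embs, axis):
    """Indices sorted from least to most, by scalar projection onto the axis."""
    scores = [sum(x * a for x, a in zip(e, axis)) for e in embs]
    return sorted(range(len(embs)), key=lambda i: scores[i])


# Synthetic embeddings where dimension 0 loosely encodes the attribute (e.g. "age").
items = [[0.5, 0.2, 0.1], [0.1, 0.3, 0.0], [0.9, 0.1, 0.2], [0.3, 0.0, 0.3]]
axis = axis_from_two_examples(emb_less=[0.0, 0.25, 0.1], emb_more=[1.0, 0.15, 0.1])
print(rank_along_axis(items, axis))  # [1, 3, 0, 2]
```

If an embedding space is “rankable” for an attribute, a cheap linear direction like this already recovers a sensible ordering without retraining the encoder.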

We’ll finish by surveying how to turn these insights into a two-stage reranking pipeline: dense k-NN retrieval for recall, followed by lightweight LoRA heads or simple linear probes that rerank the candidates for precision.
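A toy version of that two-stage shape might look like the sketch below. Stage 1 uses brute-force dot-product k-NN as a stand-in for a real vector index. Stage 2 uses a linear probe over the element-wise product of query and document vectors. The probe’s feature choice and weights are hypothetical; in practice they would be learned from labeled query/document pairs, and a LoRA head would replace the probe entirely.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_knn(query, corpus, k):
    """Stage 1 (recall): top-k corpus indices by dot product, brute force."""
    order = sorted(range(len(corpus)), key=lambda i: dot(query, corpus[i]), reverse=True)
    return order[:k]


def linear_probe_scorer(weights, bias=0.0):
    """Stage 2 (precision), hypothetical: linear model over the query * doc element-wise product."""
    def score(query, doc):
        return bias + sum(w * q * d for w, q, d in zip(weights, query, doc))
    return score


def two_stage_search(query, corpus, k, rerank, top_n):
    """Retrieve k candidates cheaply, then rerank only those and keep top_n."""
    candidates = dense_knn(query, corpus, k)
    return sorted(candidates, key=lambda i: rerank(query, corpus[i]), reverse=True)[:top_n]


query = [1.0, 1.0]
corpus = [[0.9, 0.2], [0.2, 0.8], [0.5, 0.4], [-1.0, -1.0]]
probe = linear_probe_scorer(weights=[0.0, 1.0])  # illustrative weights, not trained

print(dense_knn(query, corpus, 3))                       # [0, 1, 2]
print(two_stage_search(query, corpus, 3, probe, top_n=2))  # [1, 2]
```

The design point is cost: the expensive, more precise scorer only ever sees the k candidates that the cheap first stage lets through, so document 3 is never scored at all.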

Details

Date: July 16
Time: 6:00 pm – 7:30 pm