Join us virtually this Wednesday as we crack open the magic behind modern multimodal embeddings and explain why they’re not just “text embeddings with pictures pasted on.”
Part I – Foundations
- Modality alignment 101 – How contrastive pre-training pulls text, images, and video frames into one joint space (sketched in code right after this list).
- Vector anatomy – Why pixel patches, temporal frame sequences, and word tokens all reduce to the same dot-product math.
- Evaluation metrics – What “recall,” “precision,” and “rankability” really mean once your queries aren’t just strings.
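To make the foundations concrete ahead of time, here is a minimal NumPy sketch of the ideas above: two modalities sharing one dot-product similarity matrix, a symmetric InfoNCE-style contrastive loss over it, and recall@k as the retrieval metric. The batch size, dimensionality, and random vectors are placeholders for illustration, not anything from the papers we’ll discuss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a batch of aligned pairs a real model would produce:
# text embeddings and image (or video-frame) embeddings projected to the same
# width. Both the batch size and the 512-dim width are placeholders.
text_emb = rng.normal(size=(8, 512))
image_emb = rng.normal(size=(8, 512))


def l2_normalize(x, eps=1e-12):
    """Unit-normalize rows so cosine similarity becomes a plain dot product."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


text_emb = l2_normalize(text_emb)
image_emb = l2_normalize(image_emb)

# One similarity matrix covers the modality pairing: entry (i, j) scores
# text i against image j. Patches, frames, and tokens all reduce to rows here.
sim = text_emb @ image_emb.T


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(sim, temperature=0.07):
    """Symmetric InfoNCE-style loss: matching pairs sit on the diagonal and
    are pulled together; off-diagonal pairs are pushed apart."""
    logits = sim / temperature
    idx = np.arange(sim.shape[0])
    loss_t2i = -log_softmax(logits)[idx, idx].mean()
    loss_i2t = -log_softmax(logits.T)[idx, idx].mean()
    return (loss_t2i + loss_i2t) / 2


def recall_at_k(sim, k=1):
    """Fraction of queries whose true match appears in the top-k results."""
    top_k = np.argsort(-sim, axis=1)[:, :k]
    return float(np.mean([i in top_k[i] for i in range(sim.shape[0])]))


print("contrastive loss:", contrastive_loss(sim))
print("recall@1:", recall_at_k(sim, k=1))
```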
Part II – Deep Dive
We’ll explore two recent papers as our testbed for these concepts:
- JinaEmbed V4 – A single-path backbone unifying images, video, text, and code.
- On the Rankability of Visual Embeddings – Recovering numeric “more vs. less” axes with only two labeled examples.
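As a taste of the second paper’s theme, here is a hedged sketch of the general idea of recovering a “more vs. less” axis from just two labeled examples: take the direction between a “low” anchor and a “high” anchor and rank everything by its projection onto that direction. The embeddings, dimensions, and anchor choice below are made up for illustration and are not the paper’s exact recipe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder image embeddings, e.g. from a frozen vision encoder.
embeddings = rng.normal(size=(100, 512))

# Two labeled anchors along the attribute of interest, e.g. an image showing
# "few" of something and an image showing "many" of it.
low_anchor, high_anchor = embeddings[0], embeddings[1]

# Candidate "more vs. less" axis: the direction from the low anchor to the
# high anchor, unit-normalized.
axis = high_anchor - low_anchor
axis /= np.linalg.norm(axis)

# Project every embedding onto the axis and sort by the scalar projection:
# the ordering is the recovered ranking along the attribute.
scores = embeddings @ axis
ranking = np.argsort(-scores)
print("top-10 'most' items:", ranking[:10])
```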
We’ll finish by surveying how to turn these insights into a two-stage reranking pipeline: dense k-NN retrieval for recall, followed by lightweight LoRA heads or simple linear probes for a precision uplift.
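To preview that closing discussion, here is a minimal sketch of such a two-stage pipeline, assuming a pre-trained joint space and a linear probe whose weights were fit offline on a few labeled pairs; the corpus, query, and probe weights below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder corpus and query embeddings from the same joint space.
corpus = rng.normal(size=(10_000, 512))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = rng.normal(size=512)
query /= np.linalg.norm(query)

# Stage 1 – recall: dense k-NN over the whole corpus by cosine similarity.
k = 100
first_stage_scores = corpus @ query
candidates = np.argsort(-first_stage_scores)[:k]

# Stage 2 – precision: rescore only the k candidates with a lightweight
# linear probe (a single weight vector w and bias b, assumed to have been
# trained offline; a LoRA head would slot in here as a heavier alternative).
w = rng.normal(size=512)  # placeholder for learned probe weights
b = 0.0
second_stage_scores = corpus[candidates] @ w + b
reranked = candidates[np.argsort(-second_stage_scores)]
print("final ranking (top 10):", reranked[:10])
```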