This event has passed.

Virtual Paper Review – Gemma 4 & TurboQuant: Fitting New Models on Your Hardware

Start: 2026-04-15T18:00:00-05:00
End: 2026-04-15T19:30:00-05:00

Join us virtually on Wednesday, April 15, at 6 pm to continue our monthly Paper Review series! Google released the Gemma 4 model family and TurboQuant KV cache compression about a week apart. We’ll look at what’s actually new in the architecture and how TurboQuant holds up against its claims, then walk through fitting these models on different hardware.

Part I – Foundations

KV cache: what it is and why it’s the bottleneck for long contexts
Per-Layer Embeddings: why half the parameters “don’t count”
Sliding window + global attention and the 5:1 hybrid layer approach
Quantization tradeoffs from fp16 down to sub-4-bit

Part II – Deep Dive

Gemma 4 architecture: what matters for local deployment
TurboQuant: the algorithm, the claims, and what implementations show
Live experiments across RTX PRO 6000 Blackwell, RTX 5090, A6000 Ampere, Mac M3 Pro
Fine-tuning with Unsloth/TRL on local hardware

Links:

Gemma 4 Model Card: https://ai.google.dev/gemma/docs/core/model_card_4

Gemma 4 HF Blog: https://huggingface.co/blog/gemma4

TurboQuant Paper: https://arxiv.org/abs/2504.19874

Details:

Date – April 15, 2026
Time – 6:00 – 7:30 pm (UTC−05:00)
Google Meet – https://meet.google.com/acw-zrnm-akn
