Virtual Paper Review – Gemma 4 & TurboQuant: Fitting New Models on Your Hardware

April 15 @ 6:00 pm – 7:30 pm

Join us virtually this Wednesday at 6 pm to continue our monthly Paper Review series! Google released the Gemma 4 model family and TurboQuant KV cache compression about a week apart. We’ll look at what’s actually new in the architecture, examine how TurboQuant holds up against its claims, and walk through fitting these models on different hardware.

Part I – Foundations

  • KV cache: what it is and why it’s the bottleneck for long contexts
  • Per-Layer Embeddings: why half the parameters “don’t count”
  • Sliding window + global attention and the 5:1 hybrid layer approach
  • Quantization tradeoffs from fp16 down to sub-4-bit
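
As a warm-up for the topics above, here is a rough back-of-the-envelope sketch of why the KV cache dominates memory at long context, and how a sliding-window/global mix and a narrower storage width shrink it. The layer count, head sizes, window length, and context length are hypothetical placeholders, not Gemma 4's actual configuration, and `bits_per_value` only models storage width: it is not the TurboQuant algorithm and it ignores any per-block scale or metadata overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value=16, global_every=None, window=None):
    """Estimate KV cache size in bytes for a single sequence.

    Each cached token stores one key and one value vector per layer
    (hence the factor of 2). If `global_every` and `window` are given,
    every `global_every`-th layer is treated as a global-attention layer
    that caches the full sequence, and the remaining layers cache at most
    `window` tokens. A 5:1 local-to-global mix means global_every=6.
    """
    if global_every is None or window is None:
        cached_tokens = num_layers * seq_len
    else:
        num_global = num_layers // global_every
        num_local = num_layers - num_global
        cached_tokens = num_global * seq_len + num_local * min(seq_len, window)
    total_bits = 2 * num_kv_heads * head_dim * bits_per_value * cached_tokens
    return total_bits // 8


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
CONTEXT, WINDOW = 131072, 1024

full_fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
hybrid_fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT,
                             global_every=6, window=WINDOW)
hybrid_4bit = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT,
                             bits_per_value=4, global_every=6, window=WINDOW)

GIB = 1024 ** 3
for label, size in [("all-global, 16-bit", full_fp16),
                    ("5:1 hybrid, 16-bit", hybrid_fp16),
                    ("5:1 hybrid, 4-bit", hybrid_4bit)]:
    print(f"{label}: {size / GIB:.2f} GiB")
```

With these placeholder numbers, the all-global 16-bit cache comes to 24 GiB for one sequence, which is why long contexts can outgrow a consumer GPU before the weights do. The hybrid layout and narrower storage each cut that substantially; how much accuracy survives at low bit widths is the kind of question the session digs into.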

Part II – Deep Dive

  • Gemma 4 architecture: what matters for local deployment
  • TurboQuant: the algorithm, the claims, and what implementations show
  • Live experiments across RTX PRO 6000 Blackwell, RTX 5090, A6000 Ampere, Mac M3 Pro
  • Fine-tuning with Unsloth/TRL on local hardware

Links:

Gemma 4 Model Card: https://ai.google.dev/gemma/docs/core/model_card_4

Gemma 4 HF Blog: https://huggingface.co/blog/gemma4

TurboQuant Paper: https://arxiv.org/abs/2504.19874

Details:

Date: April 15

Time: 6:00 pm – 7:30 pm