Mixture of Experts: Harnessing the Hidden Architecture of GPT-4
October 25 @ 6:00 pm - 7:00 pm
We’re back at HudsonAlpha this week to talk about Mixture of Experts (MoE). Josh Phillips has been working with MoE across several LLM-based projects over the last few months. He will give us an overview of MoE and the open-source community’s current progress with this pattern.
In today’s AI landscape, we often face a trade-off between the capabilities of dense language models and the computational cost of running them. The Mixture of Experts (MoE) architecture offers an intriguing solution to this dilemma. By replacing standard feed-forward layers with a gating mechanism that routes each token to a small set of specialized ‘experts,’ MoE models keep per-token compute roughly constant while expanding the model’s total parameter count and memory footprint. This allows for high capacity without the usual spike in inference time, though it comes at the cost of increased complexity during both training and inference.
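To make the routing idea concrete, here is a minimal NumPy sketch of a top-k gated MoE feed-forward layer. All names, dimensions, and the plain-softmax gate are illustrative assumptions, not any particular model's implementation; production systems (e.g. the Switch Transformer linked below) add load-balancing losses and batched expert dispatch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Illustrative top-k gated Mixture-of-Experts feed-forward layer."""

    def __init__(self, d_model, d_hidden, n_experts, top_k=2):
        self.top_k = top_k
        # Each expert is a small two-layer feed-forward network.
        self.w1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.02
        self.w2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.02
        # The gate scores every expert for every token.
        self.wg = rng.standard_normal((d_model, n_experts)) * 0.02

    def __call__(self, x):                         # x: (tokens, d_model)
        scores = softmax(x @ self.wg)              # (tokens, n_experts)
        # Keep only the top-k experts per token; the rest are never run,
        # so per-token compute stays constant as n_experts grows.
        top = np.argsort(-scores, axis=-1)[:, :self.top_k]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for e in top[t]:
                h = np.maximum(x[t] @ self.w1[e], 0.0)   # expert FFN
                out[t] += scores[t, e] * (h @ self.w2[e])
        return out

layer = MoELayer(d_model=16, d_hidden=32, n_experts=8, top_k=2)
y = layer(rng.standard_normal((4, 16)))
```

With `top_k=2` of 8 experts, each token touches only a quarter of the layer's parameters per forward pass, which is the source of the compute/memory trade-off described above.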
The Mixture of Experts architecture is widely rumored to be the “secret sauce” behind the GPT-4 model currently available from OpenAI, although concrete details of their implementation have not been confirmed.
In this talk we will explore the building blocks of the Mixture of Experts architecture and how it compares to traditional dense transformers. We will also survey the current paradigms for implementing MoE, example projects, and the challenges each one faces.
Papers
- Sparse MoE (Cornerstone) : https://arxiv.org/abs/1701.06538
- Sparse MoE (ModuleFormer) : https://arxiv.org/abs/2306.04640
- Two Layer Feedforward Networks for Efficient Transformers : https://arxiv.org/pdf/2310.10837.pdf
- Soft MoE : https://arxiv.org/pdf/2308.00951.pdf
- MoE in Real World Cloud Applications : https://arxiv.org/abs/2211.10017
- Mixture of LoRA : https://arxiv.org/abs/2309.05444
- Stable MoE : https://arxiv.org/abs/2202.08906
- Parameter sharing for wide linear layers : https://arxiv.org/pdf/2107.11817.pdf
- Scaling Router Based Language Models : https://arxiv.org/abs/2202.01169
Repos / Code
- ModuleFormer : https://github.com/IBM/ModuleFormer
- Interactive Switch Transformer Guide : https://nn.labml.ai/transformers/switch/index.html
- Soft MoE : https://github.com/lucidrains/mixture-of-experts/
- Stable MoE : https://github.com/lucidrains/st-moe-pytorch/
- Cross GPU Training for MoE : https://www.deepspeed.ai/tutorials/mixture-of-experts-nlg/
- T5 (Encoder/Decoder) MoE : https://github.com/google-research/t5x/tree/main/t5x/contrib/moe
- Vision MoE : https://github.com/google-research/vmoe/
- Issues with MoE LoRAs : https://sumo43.github.io/jekyll/update/2023/08/25/lora_moe.html
- Date – 10/25/2023
- Time – 6:00 – 7:00 pm
- Location – HudsonAlpha
- Address – 601 Genome Way Northwest, Huntsville, AL 35806
- Zoom – https://us02web.zoom.us/j/84751900660?pwd=Zy9NeXhBdkhYMUJWcGVuNTdFQ0NEUT09