Tag
moe
4 verified claims carrying this tag. Each is backed by primary sources (most by 2+) and an HMAC-SHA256 signature.
Mixtral 8x7B released on: 2023-12-11.
410aec4f418f2b11 · 2 sources · 95% confidence
Mixtral 8x7B architecture: Sparse Mixture-of-Experts (8 experts × 7B params, 2 experts routed per token).
ad79b14fafb362cd · 2 sources · 100% confidence
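The top-2 routing in the claim above can be sketched minimally. This is an illustrative simplification for a single token, not Mixtral's actual implementation: the function names, shapes, and the softmax-over-selected-experts renormalization are assumptions of the sketch.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x through the top-k of len(experts) expert networks.

    Hypothetical single-token sketch: real MoE layers batch tokens and
    the experts are full MLPs; here each expert is just a linear map.
    """
    logits = gate_w @ x                    # one gating score per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other experts
    # are never evaluated, which is what makes the layer sparse.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((8, d))       # 8 experts, as in Mixtral 8x7B
experts = [lambda v, W=rng.standard_normal((d, d)): W @ v for _ in range(8)]
y = moe_forward(x, gate_w, experts)
print(y.shape)
```

Only 2 of the 8 experts run per token, so compute per token scales with the routed experts rather than with the full parameter count.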
Sparsely-Gated Mixture-of-Experts (MoE) introduced in paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al., 2017).
2d6d7f61f1db6493 · 1 source · 100% confidence
Switch Transformer introduced in paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Fedus et al., 2021).
3d9c14b9379038c9 · 2 sources · 100% confidence