Mixtral 8x7B architecture: Sparse Mixture-of-Experts decoder-only transformer. Each layer contains 8 expert feed-forward networks, and a learned router sends each token to the top-2 experts. Despite the name, total parameters are ~47B rather than 8 × 7B = 56B, because the experts share the attention layers; only ~13B parameters are active per token.
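The top-2 routing above can be sketched as follows. This is a minimal illustration, not Mixtral's implementation: the function and variable names (`top2_route`, `gate_w`, `experts`) are made up for the example, and it operates on a single token vector with plain callables standing in for the expert FFNs. It does follow the Mixtral-style convention of taking the softmax over only the two selected experts' router logits.

```python
import numpy as np

def top2_route(x, gate_w, experts):
    """Route one token vector x through the top-2 of the given experts (sketch)."""
    logits = x @ gate_w                      # router scores, one per expert
    top2 = np.argsort(logits)[-2:]           # indices of the 2 highest-scoring experts
    # Softmax over only the selected experts' logits (Mixtral-style gating)
    w = np.exp(logits[top2] - logits[top2].max())
    w /= w.sum()
    # Output is the gate-weighted sum of the two chosen experts' outputs
    return sum(wi * experts[i](x) for wi, i in zip(w, top2))
```

Because only 2 of the 8 experts run per token, the forward-pass FLOPs scale with the active parameters (~13B), not the total (~47B), which is the central efficiency argument for sparse MoE.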