Topic hub · 47 claims
Open-weight LLMs — the 2023-2025 catalog
The open-weight LLM landscape — every major release verified against the official announcement and the Hugging Face model card. Includes license, parameter count, release date, and family lineage.
The open-weight wave
Between Llama 2 (July 2023) and Llama 4 (April 2025), open-weight LLMs went from rare research artifacts to a competitive parallel ecosystem that rivals frontier closed APIs on most general benchmarks. The open-weight families (Meta Llama, Mistral, Google Gemma, Alibaba Qwen, DeepSeek, Allen AI OLMo, IBM Granite, TII Falcon, 01.AI Yi, Stability AI Stable LM, Microsoft Phi, Tencent Hunyuan) gave researchers, fine-tuners, and on-prem deployments real options. The license diversity matters: some models are plain Apache 2.0 (most Mistral releases, OLMo, IBM Granite), some use vendor-specific terms (Gemma 2 under the Gemma Terms, Nemotron under the NVIDIA Open Model License), some are conditional (Llama 3, with its monthly-active-user threshold), and some are research-only.
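The license classes described above can be captured in a small lookup. The following is a minimal sketch, not a legal tool: the class names, the `LICENSES` table, and the `needs_separate_agreement` helper are hypothetical, and only the 700M monthly-active-user threshold for Llama 3 comes from this hub.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LicenseClass(Enum):
    PERMISSIVE = "permissive"      # e.g. Apache 2.0
    VENDOR_TERMS = "vendor-terms"  # custom terms written by the vendor
    CONDITIONAL = "conditional"    # permissive up to a usage threshold


@dataclass(frozen=True)
class LicenseInfo:
    name: str
    license_class: LicenseClass
    mau_threshold: Optional[int] = None  # monthly active users, if any


# Illustrative subset of this catalog; the grouping is an assumption.
LICENSES = {
    "Mistral 7B": LicenseInfo("Apache 2.0", LicenseClass.PERMISSIVE),
    "OLMo 2": LicenseInfo("Apache 2.0", LicenseClass.PERMISSIVE),
    "Gemma 2": LicenseInfo("Gemma Terms", LicenseClass.VENDOR_TERMS),
    "Nemotron-4 340B": LicenseInfo(
        "NVIDIA Open Model License", LicenseClass.VENDOR_TERMS
    ),
    "Llama 3": LicenseInfo(
        "Llama 3 Community License", LicenseClass.CONDITIONAL, 700_000_000
    ),
}


def needs_separate_agreement(model: str, monthly_active_users: int) -> bool:
    """True if the model's license has a MAU threshold and it is exceeded."""
    info = LICENSES[model]
    if info.mau_threshold is None:
        return False
    return monthly_active_users > info.mau_threshold
```

A lookup like this only flags which license text to read; the license itself remains the authority on what is permitted.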
Sizes and architectures span more than three orders of magnitude
Open weights range from the on-device-tier SmolLM 135M up to DeepSeek-V3 at 671B total parameters (37B active per token). Architectures span dense Transformers (most Llama models, Mistral 7B, Mistral Nemo, Gemma 2), Mixture-of-Experts (Mixtral 8x7B/8x22B, DeepSeek-V2/V3, Hunyuan-Large), and hybrid SSM-Transformers (AI21 Jamba, the first production-grade Mamba hybrid). The choice of architecture maps to deployment tradeoffs: MoE gives high quality at a lower active-parameter cost per token, although every expert must still be held in memory; dense models are simpler to serve; SSM hybrids support longer context windows without paying the full quadratic cost of attention.
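To make the size range and the MoE tradeoff concrete, here is a short arithmetic sketch using only parameter counts listed in this catalog. The function and variable names are illustrative.

```python
import math

# (total parameters, active parameters per token), as listed in this catalog.
MOE_MODELS = {
    "DeepSeek-V3": (671e9, 37e9),
    "Snowflake Arctic": (480e9, 17e9),
    "Hunyuan-Large": (389e9, 52e9),
    "DBRX": (132e9, 36e9),
    "AI21 Jamba": (52e9, 12e9),
}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in one token's forward pass."""
    return active / total


def orders_of_magnitude(smallest: float, largest: float) -> float:
    """Base-10 log of the size ratio between two models."""
    return math.log10(largest / smallest)


if __name__ == "__main__":
    for name, (total, active) in MOE_MODELS.items():
        print(f"{name}: {active_fraction(total, active):.1%} of weights active")
    # SmolLM 135M versus the largest total parameter count in the table.
    largest = max(total for total, _ in MOE_MODELS.values())
    print(f"size span: {orders_of_magnitude(135e6, largest):.2f} orders")
```

The active fraction is a rough proxy for per-token compute; memory footprint still follows the total parameter count.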
Multilingual + specialist forks
Beyond the English-default releases, the open-weight ecosystem has specialist models. Cohere Aya 23 covers 23 languages; Mistral Saba targets Arabic and South Asian languages; Allen AI Tülu 3 is an open recipe for reaching Llama-3-Instruct quality; StarCoder 2 and Mistral Codestral focus on code. The composition matters because cost per token is roughly constant across open models of similar size, while quality on a specialist task varies widely. Pick the right specialist before fine-tuning a general model.
Why this catalog matters for verification
AI assistants hallucinate most readily on precise facts: a release date, a license, a parameter count, or a family lineage stated with confidence but wrong. Open-weight models make this worse because versions are closely spaced: Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 3.3, and Llama 4 are six distinct releases with six distinct dates, and they are frequently mis-attributed. This hub holds the verified record for each claim listed below.
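One way to use such a record is to check an assistant's stated release date against it. This is a minimal sketch with a hypothetical `check_release_date` helper; the dates are the ones listed in this hub, and the three-way result is an assumption about how a checker might report.

```python
from datetime import date

# Release dates copied from the claims in this hub (illustrative subset).
VERIFIED_RELEASES = {
    "Llama 2": date(2023, 7, 18),
    "Llama 3": date(2024, 4, 18),
    "Llama 3.2 Vision": date(2024, 9, 25),
    "Mistral 7B": date(2023, 9, 27),
}


def check_release_date(model: str, claimed: date) -> str:
    """Compare a claimed release date with the verified record.

    Model names are matched exactly, so "Llama 3.1" never falls back
    to the "Llama 3" entry.
    """
    verified = VERIFIED_RELEASES.get(model)
    if verified is None:
        return "unknown"
    return "verified" if verified == claimed else "contradicted"
```

Exact name matching is deliberate: treating "Llama 3.1" as "Llama 3" is the kind of mis-attribution the catalog exists to prevent.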
Defined terms (5)
- Open-weight
- A model whose trained weights are publicly downloadable, with a license permitting at least research use. Distinct from open-source (which would require open training data + code + weights).
- Mixture-of-Experts (MoE)
- Architecture where each token routes to a small subset of expert sub-networks. Examples: Mixtral 8x7B (8 experts per layer, 2 active per token; about 47B total parameters rather than 56B, because only the feed-forward blocks are replicated), DeepSeek-V3 (671B total / 37B active), Hunyuan-Large (389B total / 52B active).
- Apache 2.0 license
- Permissive open-source license allowing commercial use, modification, redistribution. Used by Mistral 7B, Mixtral, OLMo 2, IBM Granite, AI21 Jamba, Mistral Pixtral 12B, Mistral Nemo.
- Llama 3 Community License
- Meta's license for the Llama 3 family. Permissive for most uses, but requires a separate agreement with Meta if your platform exceeds 700M monthly active users.
- Tülu
- Allen Institute for AI's open-recipe instruction-tuning project. Tülu 3 (2024-11) replicates Llama-3-Instruct quality with fully-open training data + code + recipes.
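The Mixture-of-Experts definition above can be illustrated with a toy top-k router. This is a generic sketch in plain Python, not any particular model's implementation: the experts are stand-in scalar functions, and `route_top_k` and `moe_forward` are illustrative names.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalise their weights to sum to 1."""
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    router_logits: Sequence[float],
    k: int = 2,
) -> float:
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


if __name__ == "__main__":
    # Eight toy experts; expert i multiplies its input by i.
    experts = [lambda x, scale=i: x * scale for i in range(8)]
    logits = [0.1, 2.0, -1.0, 3.0, 0.0, 0.3, -0.5, 1.0]
    print(route_top_k(logits))            # experts 3 and 1 are selected
    print(moe_forward(10.0, experts, logits))
```

Only the selected experts run for a given token, which is why active parameters are far fewer than total parameters.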
All claims in this topic (47)
- AI21 Jamba · publicly released on 2024-03-28 by AI21 Labs; first production-grade hybrid SSM-Transformer (Mamba + Transformer) model, 52B total / 12B active (1.00 · 2 sources)
- Allen AI OLMo 2 · publicly released on 2024-11-26 by Allen Institute for AI; fully-open 7B + 13B models with full training data, code, recipes (1.00 · 2 sources)
- Cohere Aya 23 · publicly released on 2024-05-22 by Cohere For AI; multilingual model covering 23 languages (1.00 · 2 sources)
- Cohere Aya Vision · publicly released on 2025-03-04 by Cohere For AI; multilingual open-weight vision-language models (8B + 32B), 23 languages (1.00 · 2 sources)
- Databricks DBRX · publicly released on 2024-03-27 by Databricks; 132B-parameter MoE (36B active per token), Databricks Open Model License (1.00 · 2 sources)
- DeepSeek-R1 · released on 2025-01-20 with reasoning chain-of-thought capabilities (1.00 · 2 sources)
- DeepSeek-V2 · publicly released on 2024-05-07 by DeepSeek; MoE 236B-parameter open model (1.00 · 2 sources)
- DeepSeek-V3 · publicly released on 2024-12-26 by DeepSeek AI; 671B-parameter MoE (37B active), open weights (1.00 · 2 sources)
- Falcon LLM · publicly released on 2023-05-23 by Technology Innovation Institute (TII) (1.00 · 2 sources)
- Gemma · released on 2024-02-21 (1.00 · 2 sources)
- Genmo Mochi 1 · publicly released on 2024-10-22 by Genmo; open-weight text-to-video diffusion model, 10B parameters (1.00 · 2 sources)
- Google Gemma 2 · publicly released on 2024-06-27 by Google DeepMind; Gemma 2 family (9B + 27B), Gemma terms (1.00 · 2 sources)
- Google Gemma 3 · publicly released on 2025-03-12 by Google DeepMind; Gemma 3 family (1B/4B/12B/27B), 128k context, multimodal vision (1.00 · 2 sources)
- Hugging Face SmolLM · publicly released on 2024-07-16 by Hugging Face; small-LM family (135M/360M/1.7B) optimized for on-device inference (1.00 · 2 sources)
- IBM Granite · publicly released on 2024-05-09 by IBM; open-weight enterprise-AI model family (3B/8B/13B/20B/34B), Apache 2.0 (1.00 · 2 sources)
- Llama 2 · released on 2023-07-18 (1.00 · 2 sources)
- Llama 2 70B · parameter count 70,000,000,000 (70B) (1.00 · 2 sources)
- Llama 3 · released on 2024-04-18 (1.00 · 2 sources)
- Llama 3 70B · parameter count 70,000,000,000 (70B) (1.00 · 2 sources)
- Llama 3 8B · parameter count 8,000,000,000 (8B) (1.00 · 2 sources)
- Meta Llama 3.2 Vision · publicly released on 2024-09-25 by Meta; 11B + 90B vision-language variants of Llama 3.2 (1.00 · 2 sources)
- Mistral 7B · released on 2023-09-27 (1.00 · 3 sources)
- Mistral AI · founded in 2023 (1.00 · 1 source)
- Mistral Codestral · publicly released on 2024-05-29 by Mistral AI; code-specialized model (1.00 · 2 sources)
- Mistral Large 2 · released on 2024-07-24 by Mistral AI (1.00 · 2 sources)
- Mistral Le Chat · publicly released on 2024-02-26 by Mistral AI; consumer chat assistant interface to Mistral models (1.00 · 2 sources)
- Mistral Nemo · publicly released on 2024-07-18 by Mistral AI + NVIDIA; 12B model with 128k context, Apache 2.0 (1.00 · 2 sources)
- Mistral OCR · publicly released on 2025-03-06 by Mistral AI; document-understanding OCR model with high-fidelity table + math extraction (1.00 · 2 sources)
- Mistral Pixtral 12B · publicly released on 2024-09-11 by Mistral AI; 12B multimodal vision-language model, Apache 2.0 (1.00 · 2 sources)
- Mistral Saba · publicly released on 2025-02-17 by Mistral AI; 24B model optimized for Arabic + South Asian languages (1.00 · 2 sources)
- Mistral Small 3 · publicly released on 2025-01-30 by Mistral AI (1.00 · 2 sources)
- Mixtral 8x7B · architecture Sparse Mixture-of-Experts (8 experts × 7B params, 2 experts routed per token) (1.00 · 2 sources)
- MoE Mixtral 8x22B · released on 2024-04-10 by Mistral AI (1.00 · 2 sources)
- NVIDIA Nemotron-4 340B · publicly released on 2024-06-14 by NVIDIA; 340B-parameter open-weight model optimized for synthetic data generation (1.00 · 2 sources)
- OLMo · released on 2024-02-01 by Allen Institute for AI (1.00 · 2 sources)
- Snowflake Arctic · publicly released on 2024-04-24 by Snowflake; 480B-parameter MoE LLM (17B active), Apache 2.0 (1.00 · 2 sources)
- Stability AI Stable Diffusion 3.5 · publicly released on 2024-10-22 by Stability AI; SD 3.5 Large (8B) + Medium + Large Turbo open-weight image generation (1.00 · 2 sources)
- Stable LM · publicly released on 2023-04-19 by Stability AI (1.00 · 2 sources)
- Tencent Hunyuan Video · publicly released on 2024-12-04 by Tencent; 13B-parameter open-weight text-to-video model (1.00 · 2 sources)
- Tencent Hunyuan-Large · publicly released on 2024-11-05 by Tencent; 389B-parameter open-weight MoE (52B active) (1.00 · 2 sources)
- Yi (01.AI) · publicly released on 2023-11-05 by 01.AI (Kai-Fu Lee) (1.00 · 2 sources)
- DeepSeek V3 · released on 2024-12-26 (0.95 · 2 sources)
- Mistral Magistral · publicly released on 2025-06-10 by Mistral AI; first Mistral reasoning model (Magistral Small open-weight + Medium API) (0.95 · 2 sources)
- Mistral Medium 3 · publicly released on 2025-05-07 by Mistral AI; Medium-tier proprietary model balancing cost + performance (0.95 · 2 sources)
- Mixtral 8x7B · released on 2023-12-11 (0.95 · 2 sources)
- Qwen · released on 2023-08-03 (0.95 · 2 sources)
- Qwen 3 · publicly released on 2025-04-29 by Alibaba; Qwen 3 family (0.6B-235B), hybrid thinking mode, 128k context (0.95 · 2 sources)
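Each claim line above follows the pattern subject · statement (confidence · N sources), so the list can be loaded into structured records. The parser below is a sketch that assumes this pattern holds for every line; the `Claim` type and `parse_claim` function are hypothetical names, and the regular expression tolerates optional spaces around the separators.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "- <subject> · <statement> (<confidence> · <n> source[s])"
CLAIM_RE = re.compile(
    r"^-\s*(?P<subject>.+?)\s*·\s*(?P<statement>.+?)\s*"
    r"\((?P<confidence>\d\.\d{2})\s*·\s*(?P<sources>\d+)\s+sources?\)\s*$"
)
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass(frozen=True)
class Claim:
    subject: str
    statement: str
    confidence: float
    sources: int
    release_date: Optional[str]  # first ISO date in the statement, if any


def parse_claim(line: str) -> Optional[Claim]:
    """Parse one claim line; return None if it does not match the pattern."""
    match = CLAIM_RE.match(line.strip())
    if match is None:
        return None
    statement = match.group("statement")
    found = DATE_RE.search(statement)
    return Claim(
        subject=match.group("subject"),
        statement=statement,
        confidence=float(match.group("confidence")),
        sources=int(match.group("sources")),
        release_date=found.group(0) if found else None,
    )
```

Because the subject is matched lazily up to the first separator, parenthesised names such as "Yi (01.AI)" survive intact, and only a trailing group of the form (0.95 · 2 sources) is treated as the confidence annotation.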