Open-weight LLMs — the 2023-2025 catalog

The open-weight wave

Between Llama 2 (July 2023) and Llama 4 (April 2025), open-weight LLMs went from rare research artifacts to a competitive parallel ecosystem that approaches or matches frontier closed APIs on many general benchmarks. The open-weight families (Meta Llama, Mistral, Google Gemma, Alibaba Qwen, DeepSeek, Allen AI OLMo, IBM Granite, TII Falcon, 01.AI Yi, Stability AI StableLM, Microsoft Phi, Tencent Hunyuan) gave researchers, fine-tuners, and on-prem deployments real options. License diversity matters. Some releases are Apache 2.0 (many Mistral models, OLMo). Some use custom vendor terms (Gemma 2 under the Gemma Terms of Use). Some are conditional (Llama 3, with a monthly-active-user threshold). Some use the NVIDIA Open Model License (Nemotron), and some are research-only.

Sizes + architectures span 4 orders of magnitude

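Open weights range from on-device-tier SmolLM 135M up to DeepSeek-V3 at 671B total parameters (37B active, MoE). Hunyuan-Large reaches 389B total (52B active, MoE), and among the largest dense releases is Llama 3.1 405B. Architectures fall into three groups:

- dense Transformers: most Llama models, Mistral 7B, Mistral Nemo, Gemma 2;
- Mixture-of-Experts: Mixtral 8x7B/8x22B, DeepSeek-V2/V3, Hunyuan-Large;
- hybrid SSM-Transformer: AI21 Jamba, described by AI21 as the first production-grade Mamba-based model.

The choice of architecture maps to deployment tradeoffs. MoE delivers high quality at lower per-token compute, because only the active parameters run for each token, but all parameters must still fit in memory. Dense models are simpler to serve. SSM hybrids support long context windows while avoiding much of attention's quadratic cost.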
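The sketch below gives a rough back-of-the-envelope view of the MoE-versus-dense tradeoff. It assumes bf16 weights (2 bytes per parameter) and the common approximation of about 2 FLOPs per active parameter per token. It ignores the KV cache, activations, and attention's sequence-length term. The DeepSeek-V3 and Hunyuan-Large parameter counts come from this page; the Llama 3.1 405B dense comparison point is an addition. The helper names are illustrative and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    name: str
    total_params_b: float   # every parameter that must sit in memory, in billions
    active_params_b: float  # parameters used per token (equal to total for dense models)


def weight_memory_gb(m: ModelSize, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone (no KV cache, no activations), in decimal GB.
    The default of 2 bytes/param corresponds to bf16/fp16 storage."""
    return m.total_params_b * bytes_per_param


def gflops_per_token(m: ModelSize) -> float:
    """Rough forward-pass cost: ~2 FLOPs (one multiply, one add) per active parameter."""
    return 2.0 * m.active_params_b


MODELS = [
    ModelSize("Llama 3.1 405B (dense)", 405, 405),
    ModelSize("Hunyuan-Large (MoE)", 389, 52),
    ModelSize("DeepSeek-V3 (MoE)", 671, 37),
]

if __name__ == "__main__":
    for m in MODELS:
        print(f"{m.name:26s} weights ~{weight_memory_gb(m):6.0f} GB  "
              f"compute ~{gflops_per_token(m):5.0f} GFLOPs/token")
```

On these assumptions, Hunyuan-Large needs slightly less weight memory than Llama 3.1 405B (about 778 GB vs 810 GB) but roughly an eighth of the per-token compute (about 104 vs 810 GFLOPs). That gap is the MoE tradeoff: memory scales with total parameters, compute with active ones.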

Multilingual + specialist forks

Beyond the English-default releases, the open-weight ecosystem has specialist forks:

- Cohere Aya 23 covers 23 languages.
- Mistral Saba targets Arabic and South Asian languages.
- Allen AI Tülu 3 is an open post-training recipe that replicates Llama-3-Instruct quality.
- StarCoder 2 focuses on code.

Stability AI's StableLM, by contrast, is a general-purpose family; its name refers to the company, not to stability of generation. The mix matters because serving cost per token is roughly the same across open families at a given model size, while quality on a specialist task can vary widely. Pick the right specialist before fine-tuning a general model.

Why this catalog matters for verification

AI assistants are most likely to hallucinate on specifics such as release dates, licenses, parameter counts, and family lineage, and they state those errors confidently. Open-weight models confuse the picture further. Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 3.3, and Llama 4 are six distinct releases with six distinct dates, and they are frequently mis-attributed. This hub holds the verified record for each.
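The sketch below shows one way a verified record could back an automated check on release-date claims. The values are the public release months of the six Llama releases; only the Llama 2 and Llama 4 months appear in the text above. The dictionary and function names are hypothetical.

```python
import re

# Public release month (YYYY-MM) of each Llama generation.
LLAMA_RELEASES = {
    "llama 2": "2023-07",
    "llama 3": "2024-04",
    "llama 3.1": "2024-07",
    "llama 3.2": "2024-09",
    "llama 3.3": "2024-12",
    "llama 4": "2025-04",
}


def normalize(name: str) -> str:
    """Lower-case and collapse spaces, hyphens, underscores: 'Llama-3.1' -> 'llama 3.1'."""
    return re.sub(r"[\s_-]+", " ", name.strip().lower())


def check_release_claim(model: str, claimed_year_month: str) -> bool:
    """True if the claimed release month matches the record; KeyError if there is no record."""
    key = normalize(model)
    if key not in LLAMA_RELEASES:
        raise KeyError(f"no verified record for {model!r}")
    return LLAMA_RELEASES[key] == claimed_year_month


if __name__ == "__main__":
    print(check_release_claim("Llama-3.1", "2024-07"))  # True
    print(check_release_claim("Llama 3", "2024-07"))    # False: July 2024 is Llama 3.1
```

Raising on an unknown model, rather than returning False, keeps "no record" distinct from "record contradicts the claim."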

Defined terms (5)

Open-weight

A model whose trained weights are publicly downloadable, with a license permitting at least research use. Distinct from open-source (which would require open training data + code + weights).

Mixture-of-Experts (MoE)

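Architecture where each token routes to a small subset of expert sub-networks. Examples:

- Mixtral 8x7B: 8 experts per layer, 2 active per token. That gives about 47B total / 13B active parameters rather than 8 × 7B, because only the feed-forward blocks are replicated while attention is shared.
- DeepSeek-V3: 671B total / 37B active.
- Hunyuan-Large: 389B total / 52B active.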
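The toy, pure-Python sketch below shows Mixtral-style top-k routing, in four steps:

1. A linear router scores every expert.
2. The top k experts are selected.
3. A softmax over only those k scores gives the gate weights.
4. The output is the gate-weighted sum of the selected experts' outputs.

The experts here are hypothetical stand-ins (each scales its input by a constant) rather than real feed-forward blocks. Other MoE designs, DeepSeek-V3 among them, use different gating and load-balancing schemes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(x: Vector, router_weights: Sequence[Vector]) -> List[float]:
    """Linear router: one score per expert, computed as a dot product with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; softmax over only their logits gives the gates."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x: Vector, router_weights: Sequence[Vector],
                experts: Sequence[Expert], k: int = 2) -> Vector:
    """Run only the k routed experts and return their gate-weighted sum."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits(x, router_weights), k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
EXPERTS: List[Expert] = [(lambda v, s=i + 1: [s * vi for vi in v]) for i in range(8)]

# Toy router that only looks at x[0]; experts 3 and 5 tie for the highest score.
ROUTER = [[0.1, 0.0], [0.2, 0.0], [0.0, 0.0], [2.0, 0.0],
          [0.3, 0.0], [2.0, 0.0], [-1.0, 0.0], [0.5, 0.0]]

if __name__ == "__main__":
    x = [1.0, 2.0]
    print(top_k_route(router_logits(x, ROUTER), k=2))  # [(3, 0.5), (5, 0.5)]
    print(moe_forward(x, ROUTER, EXPERTS, k=2))        # [5.0, 10.0]
```

With eight experts and k = 2, only two expert functions run per token. The other six contribute nothing and cost nothing for that token, which is why active parameters, not total parameters, set per-token compute.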

Apache 2.0 license

Permissive open-source license allowing commercial use, modification, and redistribution. It is used by Mistral 7B, Mixtral, OLMo 2, IBM Granite, AI21 Jamba (the original v0.1 release; Jamba 1.5 moved to AI21's Jamba Open Model License), Mistral Pixtral 12B, and Mistral Nemo.

Llama 3 Community License

Meta's license for Llama 3 family — permissive for most use but requires a separate agreement if your platform exceeds 700M monthly active users.

Tülu

Allen Institute for AI's open-recipe instruction-tuning project. Tülu 3 (2024-11) replicates Llama-3-Instruct quality with fully-open training data + code + recipes.
