Tag
inference
7 verified claims carrying this tag. Each claim is backed by at least two primary sources and carries an HMAC-SHA256 signature.
vLLM introduced in: Kwon et al. 2023 — high-throughput LLM serving via PagedAttention.
468a9e2c047d8f2f · 2 sources · 100% confidence
llama.cpp publicly released on: 2023-03-10 by Georgi Gerganov.
2c6ddc094019890c · 2 sources · 100% confidence
GPTQ introduced in: Frantar et al. 2022 — accurate post-training quantization for GPT models.
a9ab1ec12062f7ae · 2 sources · 100% confidence
Triton inference server publicly released on: 2018-11 by NVIDIA — formerly TensorRT Inference Server.
78ec1ceed08a221c · 2 sources · 100% confidence
SGLang introduced in: Zheng et al. 2024 — fast execution of structured language model programs.
4244c11611a72550 · 2 sources · 100% confidence
Groq LPU publicly released on: 2024-02-19 by Groq — Language Processing Unit for low-latency LLM inference.
6e19ed543cadbcdd · 2 sources · 95% confidence
Speculative decoding introduced in: Leviathan, Kalman, and Matias 2023 — fast inference from transformers, Google Research.
6cdc7730bf41bb3d · 2 sources · 100% confidence
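The 16-hex-character IDs above are consistent with truncated HMAC-SHA256 signatures over each claim. A minimal sketch of how such a signature could be produced, assuming a JSON canonical form (sorted keys, compact separators) and 64-bit truncation — the registry's actual serialization, key, and truncation length are not documented here, so `sign_claim` and `demo-key` are hypothetical:

```python
import hashlib
import hmac
import json

def sign_claim(claim: dict, key: bytes) -> str:
    # Canonical serialization: sorted keys, no whitespace (an assumption;
    # any stable byte encoding of the claim would work).
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()
    digest = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    # The listing shows 16 hex characters, so truncate to 64 bits (assumption).
    return digest[:16]
```

Because the serialization sorts keys, the signature is independent of the order in which the claim's fields were assembled, and any edit to a field changes the signature.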