Tag: foundational
38 verified claims carry this tag. Each is backed by primary sources (most by two or more) and an HMAC-SHA256 signature; a signing sketch follows the list.
Transformer architecture introduced in paper: Attention Is All You Need (Vaswani et al., 2017).
ad17e76a8baad7a1 · 3 sources · 100% confidence
Reinforcement Learning from Human Feedback (RLHF) introduced in paper: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
67866330cd60e54d · 3 sources · 100% confidence
Retrieval-Augmented Generation (RAG) introduced in paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020).
d15057ced937a103 · 2 sources · 100% confidence
Low-Rank Adaptation (LoRA) introduced in paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021).
d7b97d1b93d8d8bc · 2 sources · 100% confidence
Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023).
a3e691683a4577af · 2 sources · 100% confidence
BERT (Bidirectional Encoder Representations from Transformers) introduced in paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018).
4c1ee70007dc89c1 · 2 sources · 100% confidence
GPT-2 introduced in paper: Language Models are Unsupervised Multitask Learners (Radford et al., 2019).
859551dc078c46f8 · 2 sources · 100% confidence
ResNet (Residual Networks) introduced in paper: Deep Residual Learning for Image Recognition (He et al., 2015).
4f55f77c4bfb316e · 2 sources · 100% confidence
T5 (Text-to-Text Transfer Transformer) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019).
ef28341c3b308737 · 2 sources · 100% confidence
Sparsely-Gated Mixture-of-Experts (MoE) introduced in paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al., 2017).
2d6d7f61f1db6493 · 1 source · 100% confidence
Switch Transformer introduced in paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Fedus et al., 2021).
3d9c14b9379038c9 · 2 sources · 100% confidence
Chinchilla scaling laws introduced in paper: Training Compute-Optimal Large Language Models (Hoffmann et al., 2022).
8befcae6bce01a95 · 2 sources · 100% confidence
Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017).
00f224e1ccc158ef · 2 sources · 100% confidence
Mamba state-space model introduced in paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao, 2023).
3518f8aa40cb0d36 · 2 sources · 100% confidence
Chain-of-Thought prompting introduced in paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022).
3af924da138ff84c · 2 sources · 100% confidence
Adam optimizer introduced in paper: Adam: A Method for Stochastic Optimization (Kingma & Ba, 2014).
dffbe905003cc581 · 2 sources · 100% confidence
AlexNet introduced in paper: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, Hinton, 2012).
98b6e774be89d967 · 2 sources · 100% confidence
ImageNet dataset introduced in paper: ImageNet: A Large-Scale Hierarchical Image Database (Deng et al., 2009).
045e628def62181d · 2 sources · 100% confidence
Vision Transformer (ViT) introduced in paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2020).
d3681b0981e0b700 · 2 sources · 100% confidence
Generative Adversarial Networks (GANs) introduced in paper: Generative Adversarial Networks (Goodfellow et al., 2014).
5b0c0612bd9e55b0 · 2 sources · 100% confidence
Variational Autoencoder (VAE) introduced in paper: Auto-Encoding Variational Bayes (Kingma & Welling, 2013).
62789e45973ab631 · 2 sources · 100% confidence
Denoising Diffusion Probabilistic Models (DDPM) introduced in paper: Denoising Diffusion Probabilistic Models (Ho, Jain, Abbeel, 2020).
e700f81fff6f38c7 · 2 sources · 100% confidence
Word2Vec introduced in paper: Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013).
4978f76d228a3db1 · 2 sources · 100% confidence
Byte-Pair Encoding (BPE) for Neural Machine Translation introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
e942c93d70a4dab2 · 2 sources · 100% confidence
ReAct (Reasoning + Acting) introduced in paper: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022).
fceea64fa7d04d3a · 2 sources · 100% confidence
LoRA (Low-Rank Adaptation) introduced in paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021).
f191b2876790dc6e · 2 sources · 100% confidence
QLoRA introduced in paper: QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023).
767cbe41c961be1a · 2 sources · 100% confidence
Rotary Position Embedding (RoPE) introduced in paper: RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al., 2021).
f8d64457ba9fd35b · 2 sources · 100% confidence
Byte-Pair Encoding (BPE) for NMT introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
aede848e23c8de8e · 2 sources · 100% confidence
SentencePiece tokenizer introduced in paper: SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo & Richardson, 2018).
0d47bb8eb637a2e4 · 2 sources · 100% confidence
CLIP introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
bcdef949cc6d3644 · 2 sources · 100% confidence
ELMo (Embeddings from Language Models) introduced in paper: Deep contextualized word representations (Peters et al., 2018).
ee150c6e44364a3d · 2 sources · 100% confidence
Latent Diffusion Models (LDM) introduced in paper: High-Resolution Image Synthesis with Latent Diffusion Models (Rombach et al., 2021).
1aacbf0bf9248dc7 · 2 sources · 100% confidence
ELECTRA introduced in paper: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Clark et al., 2020).
2f9c79357e9d4da9 · 2 sources · 100% confidence
GPT-3 introduced in paper: Language Models are Few-Shot Learners (Brown et al., 2020).
7d3e6a39b1656571 · 2 sources · 100% confidence
Codex introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021).
79be9b25cd64f250 · 2 sources · 100% confidence
SuperGLUE benchmark introduced in paper: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (Wang et al., 2019).
1a1e87145608c91a · 2 sources · 100% confidence
GLUE benchmark introduced in paper: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018).
aa113b5e61d5c214 · 2 sources · 100% confidence
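Signing sketch. The registry's key management, canonical claim serialization, and ID scheme are not documented above, so the following is only a minimal illustration: it assumes each claim's text is signed with HMAC-SHA256 under a secret key and that the 16-character hex identifier shown next to each claim is a truncated digest. The key, function names, and truncation rule are assumptions made for this sketch, not the registry's confirmed behavior.

import hmac
import hashlib

# Hypothetical signing key; the registry's real key handling is not documented here.
SECRET_KEY = b"registry-signing-key-placeholder"

def sign_claim(claim_text: str, key: bytes = SECRET_KEY) -> str:
    # Full HMAC-SHA256 digest of the claim text, as lowercase hex.
    return hmac.new(key, claim_text.encode("utf-8"), hashlib.sha256).hexdigest()

def claim_id(claim_text: str, key: bytes = SECRET_KEY) -> str:
    # Assumed ID scheme: the first 16 hex characters of the signature.
    return sign_claim(claim_text, key)[:16]

def verify_claim(claim_text: str, expected_sig: str, key: bytes = SECRET_KEY) -> bool:
    # Constant-time comparison of a recomputed signature against a stored one.
    return hmac.compare_digest(sign_claim(claim_text, key), expected_sig)

if __name__ == "__main__":
    claim = ("Transformer architecture introduced in paper: "
             "Attention Is All You Need (Vaswani et al., 2017).")
    signature = sign_claim(claim)
    print(claim_id(claim), verify_claim(claim, signature))

With a different key the short ID would not match the value listed above; the sketch only shows the shape of the sign/verify round trip, not how the published identifiers were derived.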