Tag
tokenization
3 verified claims carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.
Byte-Pair Encoding (BPE) for Neural Machine Translation introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
e942c93d70a4dab2 · 2 sources · 100% confidence
Byte-Pair Encoding (BPE) for NMT introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
aede848e23c8de8e · 2 sources · 100% confidence
SentencePiece tokenizer introduced in paper: SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo & Richardson, 2018).
0d47bb8eb637a2e4 · 2 sources · 100% confidence