Vision Transformer (ViT) was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020).
Subject
Vision Transformer (ViT)
Predicate
introduced_in_paper
Object
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2020)
Primary source · preprint · 2020-10-22
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — arXiv (Dosovitskiy et al., Google Research)