EleViT: exploiting element-wise products for designing efficient and lightweight vision transformers

Uzair Shah, Jens Schneider, Giovanni Pintore, Enrico Gobbetti, Mahmood Alzubaidi, Mowafa Househ, Marco Agus

Proc. T4V - IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - 2024


We introduce EleViT, a novel vision transformer optimized for image processing tasks. In line with the trend towards sustainable computing, EleViT addresses the need for lightweight, fast models that do not compromise performance. It does so by redefining the multi-head attention mechanism to rely primarily on element-wise products instead of traditional matrix multiplication. This modification preserves attention capabilities while enabling multiple multi-head attention blocks within a convolutional projection framework. The result is a model with fewer parameters and more efficient training and inference, especially on moderately complex datasets. Benchmarks against state-of-the-art vision transformers show competitive performance on low-data-regime datasets such as CIFAR-10, CIFAR-100, and Tiny-ImageNet-200.
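The abstract does not give the exact formulation, so the following is only a minimal sketch, assuming NumPy, of why replacing matrix products with element-wise products can lower cost. `matmul_attention` is standard scaled dot-product attention. `elementwise_attention` is a hypothetical toy variant written purely for illustration; it is not the EleViT mechanism. The helper `multiply_count` (also hypothetical) tallies the scalar multiplications in the score and output steps of each.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def matmul_attention(q, k, v):
    # Standard scaled dot-product attention: Q @ K^T builds an (N, N) score
    # matrix, so cost grows quadratically with the number of tokens N.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (N, N)
    return softmax(scores, axis=-1) @ v      # (N, d)

def elementwise_attention(q, k, v):
    # HYPOTHETICAL toy variant (not EleViT's published formulation):
    # queries and keys interact only through per-element products, so no
    # (N, N) matrix is formed and every intermediate stays (N, d).
    d = q.shape[-1]
    weights = softmax(q * k / np.sqrt(d), axis=0)  # normalized over tokens
    return weights * v                             # (N, d)

def multiply_count(n, d):
    # Hypothetical helper: scalar multiplications for the score and output steps.
    matmul = n * n * d + n * n * d      # Q @ K^T and weights @ V
    elementwise = n * d + n * d         # Q * K and weights * V
    return matmul, elementwise

rng = np.random.default_rng(0)
n, d = 64, 32
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(matmul_attention(q, k, v).shape)       # (64, 32)
print(elementwise_attention(q, k, v).shape)  # (64, 32)
print(multiply_count(n, d))                  # (262144, 4096)
```

The matrix-product version materializes an N × N score matrix, so its multiply count grows quadratically with N; the element-wise toy keeps every intermediate at N × d, giving up most cross-token interaction in exchange for linear cost. How EleViT retains attention capabilities under this kind of substitution is described in the paper itself.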


BibTeX reference

@InProceedings{SSPGAHA24,
  author       = {Shah, U. and Schneider, J. and Pintore, G. and Gobbetti, E. and Alzubaidi, M. and Househ, M. and Agus, M.},
  title        = {EleViT: exploiting element-wise products for designing efficient and lightweight vision transformers},
  booktitle    = {Proc. T4V - IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year         = {2024},
  note         = {To appear},
  keywords     = {image processing, vision transformers},
  url          = {https://publications.crs4.it/pubdocs/2024/SSPGAHA24},
}
