Tracking vision transformer with class and regression tokens
Di Nardo E.; Ciaramella A.
2023-01-01
Abstract
The aim of this work is to introduce a novel visual object tracking model based on a siamese network and a vision transformer. Tracking is performed by multiple tokens that exploit the learning and memorization capabilities of vision transformers. The tracking problem is therefore divided into multiple sub-tasks, and experiments are carried out using multiple tokens, each learning an individual sub-task. This makes it possible to learn a robust characterization of the problem with an explainable architecture, in which the reasons behind the network's choices can be understood. This explainability comes from the transformer's attention mechanism: by exploiting the representational capacity of the tokens, it makes it possible to identify where the model's attention is focused, more simply than with other architectures and methodologies. Experiments on benchmark datasets show the proposed tracker to be among the best-performing trackers in the state of the art in terms of explainability, precision, robustness and speed.
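The minimal NumPy sketch below makes the token-based design more concrete. It is an illustration under stated assumptions and not the authors' implementation. It assumes a single shared (siamese) linear patch embedding for the template and search patches. It also assumes one learnable class token and one learnable regression token (following the title), prepended to the joint token sequence, a single self-attention layer, and two linear heads. The heads read a target-confidence score from the class token and a normalized box (4 values) from the regression token. All names (`make_params`, `track_step`, `cls_token`, `reg_token`, ...) and all sizes are hypothetical. The class token's attention weights over the search patches illustrate how attention can show where the model is focusing.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_params(rng, patch_dim, dim):
    """Hypothetical randomly initialized weights (untrained; shapes only)."""
    def w(*shape):
        return rng.standard_normal(shape) / np.sqrt(shape[0])
    return {
        "embed": w(patch_dim, dim),  # shared by template and search branches (siamese)
        "cls_token": rng.standard_normal((1, dim)) * 0.02,  # learnable class token
        "reg_token": rng.standard_normal((1, dim)) * 0.02,  # learnable regression token
        "w_q": w(dim, dim), "w_k": w(dim, dim), "w_v": w(dim, dim),
        "cls_head": w(dim),     # class token -> target-confidence logit
        "reg_head": w(dim, 4),  # regression token -> box (cx, cy, w, h) logits
    }


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; also returns the weights."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n, n), each row sums to 1
    return weights @ v, weights


def track_step(template_patches, search_patches, params):
    """One forward pass: returns (score, box, class-token attention over search patches)."""
    z = template_patches @ params["embed"]
    x = search_patches @ params["embed"]
    # Sequence layout: [cls, reg, template patches..., search patches...]
    tokens = np.vstack([params["cls_token"], params["reg_token"], z, x])
    out, weights = self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    score = float(sigmoid(out[0] @ params["cls_head"]))
    box = sigmoid(out[1] @ params["reg_head"])  # 4 values in (0, 1), relative to the search region
    search_start = 2 + len(z)
    heatmap = weights[0, search_start:]  # class-token attention on each search patch
    return score, box, heatmap


rng = np.random.default_rng(0)
params = make_params(rng, patch_dim=48, dim=32)  # e.g. 4x4 RGB patches -> 48 values
template = rng.standard_normal((4, 48))  # 2x2 grid of template patches
search = rng.standard_normal((16, 48))   # 4x4 grid of search-region patches
score, box, heatmap = track_step(template, search, params)
print(f"score={score:.3f} box={np.round(box, 3)} heatmap grid={heatmap.reshape(4, 4).shape}")
```

The weights here are random, so the sketch demonstrates only the data flow and the shapes. It omits positional embeddings, multiple heads and stacked layers, which a trained tracker would typically include. The heatmap sums to less than 1 because the class token also attends to itself, to the regression token and to the template patches.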