Facebook Detr Resnet50
The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100.
The model is trained using a "bipartite matching loss": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
@article{DBLP:journals/corr/abs-2005-12872,
author = {Nicolas Carion and
Francisco Massa and
Gabriel Synnaeve and
Nicolas Usunier and
Alexander Kirillov and
Sergey Zagoruyko},
title = {End-to-End Object Detection with Transformers},
journal = {CoRR},
volume = {abs/2005.12872},
year = {2020},
url = {https://arxiv.org/abs/2005.12872},
archivePrefix = {arXiv},
eprint = {2005.12872},
timestamp = {Thu, 28 May 2020 17:38:09 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2005-12872.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Select File
Drag Files or Click to BrowseURL
Recommended Models
Table Transformer Detection
DETR Document Table Detection

License Plate Blurring

Yolos Tiny

Face Analyzer

YoloV5 License Plate Detection

Parts of Car

Signature Detection
Facebook Detr Resnet50
