Faster R-CNN: Down the rabbit hole of modern object detection (tryolabs.com)

Does anyone try to get accurate bounding boxes (rotation, correct angle) with these object detection models? Or does that greatly harden the problem?
That’s exactly what Faster-RCNN does. Edit: Except for rotation — they are axis aligned bounding boxes.
Mask-RCNN (more recent) takes it a step further and also generates a per-object pixel segmentation mask, which is obviously even better than a bounding box. For that reason, Mask-RCNN is much more exciting to me, and incredibly impressive if you look at examples of what it can do.
That said, “under the hood” of Mask-RCNN are still axis aligned 2D bounding boxes for every object (and this occasionally creates artifacts when a box is erroneously too small and crops off part of an object). IMO we need to somehow get away from these AABBs, but right now methods that use them simply work the best.
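To make the "mask inside an AABB" relationship concrete, here's a minimal sketch (not Mask-RCNN's actual code, just an illustration with a toy binary mask): the tight axis-aligned box around a mask, and how an undersized box crops off part of the object.

```python
# Illustrative only: how a per-pixel mask relates to the axis-aligned
# bounding box (AABB) it lives inside, and the cropping artifact an
# undersized box causes. Shapes and values are made up for the example.
import numpy as np

def mask_to_aabb(mask):
    """Tight axis-aligned box (x0, y0, x1, y1) around a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# A diagonal "object" occupying 6 pixels of a 6x6 image.
mask = np.eye(6, dtype=bool)
print(mask_to_aabb(mask))  # (0, 0, 5, 5)

# If the predicted box is too small, the mask is cropped along with it:
x0, y0, x1, y1 = 0, 0, 3, 3              # undersized box
cropped = mask[y0:y1 + 1, x0:x1 + 1]
print(int(mask.sum() - cropped.sum()))   # 2 pixels of the object are lost
```

The last two lines are the artifact the comment above describes: since the mask is predicted only inside the box, any pixels of the object outside the box are unrecoverable.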
Object detection is an interesting failure for deep learning. Systems such as these perform well but whenever you have something like non max suppression at the end you are bound to get hard to fix errors. I'm more optimistic about deep mask and similar pixel wise approaches as well as using RNNs to generate a list of objects from an image.
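For readers unfamiliar with the step being criticized: greedy non-max suppression keeps the highest-scoring box and discards any box that overlaps it too much, then repeats. A minimal sketch (box format and the 0.5 threshold are illustrative, not tied to any particular detector):

```python
# Sketch of greedy non-max suppression (NMS). Boxes are (x0, y0, x1, y1).
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object, plus a second object:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the duplicate (index 1) is suppressed
```

The hard-to-fix errors the comment mentions come from this greediness: two genuinely distinct but heavily overlapping objects can be merged into one detection, and no downstream step can undo that.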
I saw this today: https://github.com/facebookresearch/Detectron
Wasn't R-CNN already superseded by YOLO[1]? I didn't read the article, but it doesn't mention YOLO as a comparison, so maybe it's outdated.
Has anyone had the time to dig deeper into this?
Tradeoffs: R-CNN has better accuracy; YOLO is faster.
R-CNN is a two-stage detector; SSD is single-stage.
Take a look at SSD instead; it seems to be more precise than YOLO and a bit faster. R-CNN variants are usually 10x slower than either of these two.
Is this what they use for self driving cars?
Faster R-CNN gives you only about 5fps on a high-end GPU, so the answer is no.