Reflections on Non-Maximum Suppression (NMS)

Before NMS and after NMS
SSD and YOLO object detection networks (from 12).
select only the rectangles above a confidence threshold
sort the thresholded rectangles in descending order of confidence
create an empty set of kept rectangles
loop over the sorted thresholded rectangles:
    loop over the set of kept rectangles:
        compute the IOU between the two rectangles
        if the IOU is above the IOU threshold, break the loop
    if all IOUs are below the IOU threshold, add the rectangle to the kept set
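The steps above can be sketched in plain Python. This is a minimal illustration, not the code of any particular library: the function names, the (x1, y1, x2, y2) box layout and the default thresholds are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    # select only rectangles above the confidence threshold
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    # sort the thresholded rectangles in descending order of confidence
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        for j in kept:
            if iou(boxes[i], boxes[j]) > iou_threshold:
                break  # overlaps a kept rectangle too much: drop it
        else:
            kept.append(i)  # inner loop did not break: keep it
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 is dropped because its IOU with box 0 is 81/119, about 0.68, and box 3 never enters the loop because its score is below the confidence threshold.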
create a priority queue of rectangles ordered by score
create an empty set of selected rectangles
loop over the priority queue:
    loop over the selected set:
        compute the IOU between the two rectangles
        if the IOU is above the threshold, break the loop
    if the loop did not break, add the priority-queue rectangle to the selected set
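A rough Python equivalent of the priority-queue variant is shown below, using the standard-library heapq module. It is a sketch under assumptions: the function name, the box layout and the optional max_output cap are hypothetical choices for this example, not the TensorFlow source.

```python
import heapq


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_priority_queue(boxes, scores, iou_threshold=0.5, max_output=None):
    """Pop boxes in descending score order and keep the non-overlapping ones."""
    # heapq is a min-heap, so store negated scores to pop the best box first
    heap = [(-s, i) for i, s in enumerate(scores)]
    heapq.heapify(heap)
    selected = []
    while heap and (max_output is None or len(selected) < max_output):
        _, i = heapq.heappop(heap)
        # all() stops at the first overlap, like the "break" in the inner loop
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.8, 0.9, 0.7]
print(nms_priority_queue(boxes, scores))  # [1, 2]
```

One reason to prefer a queue over a full sort is that the loop can stop as soon as enough boxes have been selected, which the hypothetical max_output argument illustrates.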
radix sort the rectangles in descending order of score (DeviceRadixSort)
flip boxes so that x1 < x2 and y1 < y2, if necessary
for each box, compute a bitmask of the other boxes with IOU > threshold (NMSKernel)
build a global bitmask of the selected boxes (NMSReduce)
    (each thread handles a number of boxes)
    set all bits of the global bitmask to 1 (0xFFFFFFFF per word, i.e. all boxes start selected)
    loop over all boxes:
        if the bit corresponding to the box is still 1:
            AND the global bitmask with the inverse of that box's bitmask
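The bitmask idea can be simulated on the CPU with Python integers standing in for the GPU bit arrays. This is only a sequential sketch of the logic: the function names are invented for the example, there is no thread partitioning, and it assumes each box's bitmask records only the lower-scored boxes it overlaps.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def flip(box):
    """Reorder corners so that x1 < x2 and y1 < y2."""
    x1, y1, x2, y2 = box
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def bitmask_nms(boxes, scores, iou_threshold=0.5):
    """Return original indices of the selected boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    sorted_boxes = [flip(boxes[i]) for i in order]
    n = len(sorted_boxes)
    # per-box bitmask: bit j is set when lower-scored box j overlaps box i
    masks = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if iou(sorted_boxes[i], sorted_boxes[j]) > iou_threshold:
                masks[i] |= 1 << j
    # global mask starts with every bit set: all boxes selected
    selected = (1 << n) - 1
    for i in range(n):
        if (selected >> i) & 1:      # box i is still selected
            selected &= ~masks[i]    # clear every box it suppresses
    return [order[i] for i in range(n) if (selected >> i) & 1]


# a chain: A overlaps B, B overlaps C, but A does not overlap C
chain = [(0, 0, 10, 10), (3, 0, 13, 10), (6, 0, 16, 10)]
print(bitmask_nms(chain, [0.9, 0.8, 0.7]))  # [0, 2]
```

In the chain example B is cleared by A, so B's bitmask is never applied and C survives, which matches the result of the sequential loops.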
NMS execution time vs number of input boxes.
NMS execution time vs number of distinct (non-overlapping) boxes, with input boxes fixed at 54000.
radix sort the rectangles in descending order of score (CUB DeviceRadixSort)
run the kernel as follows:
    for each rectangle:
        calculate the IOU with each rectangle lower in score
        if the IOU is above the threshold, mark the lower-scored rectangle by setting its index to -1
extract the indices that are not -1 (CUB DeviceSelect::If)
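The mark-and-select steps can be mimicked sequentially in Python. The sketch below follows the pseudocode literally: the function name is hypothetical, the outer loop stands in for work the kernel would do in parallel, and a list comprehension plays the role of the DeviceSelect::If compaction.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mark_and_select_nms(boxes, scores, iou_threshold=0.5):
    """Mark suppressed boxes with -1, then compact the surviving indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    index = list(order)  # index[k] is an original index, or -1 once marked
    n = len(order)
    for k in range(n):                 # "for each rectangle"
        for m in range(k + 1, n):      # each rectangle lower in score
            if iou(boxes[order[k]], boxes[order[m]]) > iou_threshold:
                index[m] = -1          # mark the lower-scored rectangle
    # extract the non -1 indices
    return [i for i in index if i != -1]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(mark_and_select_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

Taken literally, a rectangle that has already been marked still marks the rectangles below it, because every rectangle is processed independently. For a chain where A overlaps B and B overlaps C but A does not overlap C, this sketch keeps only A, whereas the sequential loops would also keep C. Whether the original kernel skips already-marked rectangles is not stated in the steps above.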
  1. OpenCV NMS
  2. tf.image.non_max_suppression
  3. Tensorflow 1.15 NMS — CPU
  4. Improving Object Detection With One Line of Code
  5. Tensorflow 1.15 NMS — GPU
  6. Learning non-maximum suppression
  7. MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors
  8. non_max_suppression GPU version is 3x slower than CPU version in TF 1.15
  10. An efficient end-to-end object detection pipeline on GPU using CUDA
  11. Code to experiment with NMS ops (or other ops) in Tensorflow 1.x
  12. SSD: Single Shot MultiBox Detector
  13. Daedalus: Breaking Non-Maximum Suppression in Object Detection via Adversarial Examples




Subrata Goswami
