Definition
Non-maximum suppression converts a dense response map — or a set of candidate detections scored by confidence — into a sparse set of locally maximal responses by discarding any element that is not the strongest in its neighbourhood.
Input. A response function on a domain (a pixel grid, a continuous gradient field, or a finite set of scored bounding boxes), together with a neighbourhood relation (a grid window, a 1-D directional interval, or a pairwise-overlap measure).
Suppression rule. Retain element if and only if
Output. The subset of whose members survive the rule; each surviving element is a local maximum of with respect to .
The neighbourhood relation determines what "local" means. Three operationally distinct instantiations appear across detectors: a square window on a 2-D response map, used by corner detectors; a 1-D interval along the gradient direction, used by edge detectors; and an overlap-based set, used by object detectors. All three satisfy the single suppression rule above.
Mathematical Description
Grid / window suppression on a response map
Let be a response map on the pixel grid. For an window centred at pixel , the neighbourhood is
and is retained iff for all . Harris corner detection applies this over an 8-connected neighbourhood () to the signed response : only pixels with that are local maxima are kept as corners. FAST corner detection applies the same rule to its corner score — the largest threshold for which the pixel still passes the segment test — to resolve the adjacent-corner clustering the segment test alone cannot suppress: without it, multiple pixels around one geometric corner all pass the criterion.
Gradient-direction (1-D) suppression on an edge map
Canny's edge detector suppresses along the gradient direction rather than over a 2-D window, thinning the gradient ridge to a one-pixel-wide edge map. At each pixel the unit gradient direction defines a 1-D neighbourhood — the two pixels immediately adjacent to along — and is retained iff
This is equivalent to locating zero-crossings of the directional second derivative of the smoothed image along . Because is continuous and the image discrete, is typically quantised to one of four or eight cardinal directions and the two neighbours compared directly; sub-pixel implementations bilinearly interpolate the gradient-magnitude map along .
Greedy overlap-based suppression on detections
Object detectors produce candidate boxes with scores . Greedy suppression proceeds: sort by score descending; move the highest-scoring box to the kept set; discard every remaining box with
repeat until none remain. The procedure is a heuristic — not a globally optimal non-overlapping subset — and runs in in the number of candidates. YOLO applies it per class at inference time and reports a 2–3% mAP gain from this step on PASCAL VOC 2007.
Numerical Concerns
Window size vs feature density. In grid suppression a window too large suppresses genuine nearby features; one too small retains clusters of responses from the same physical feature. The neighbourhood used by corner detectors is aggressive but still admits multiple corners from a single junction when the response map has a broad plateau.
Tie-breaking. The rule retains a pixel when its score is greater than or equal to every neighbour, so ties retain both pixels. Implementations break ties by a coordinate convention or by using strict inequality, which keeps only the first pixel of a tied set.
Interpolation in gradient-direction suppression. When falls between cardinal directions, quantising it introduces a localisation bias of up to (8-direction quantisation) proportional to edge curvature; bilinear interpolation of the gradient-magnitude map reduces the bias to sub-pixel levels at extra cost.
IoU threshold and the precision/recall trade-off. A lower suppresses more boxes — higher precision, lower recall in crowded scenes; a higher threshold retains duplicates. PASCAL VOC practice uses ; benchmarks that penalise duplicates more heavily push toward lower thresholds or soft-suppression variants.
Plateau handling. Broad flat maxima — common when the integration window is large relative to feature spacing — produce plateaus of equal maximal score. Grid suppression with retains all plateau pixels; with it retains none. Neither is correct; morphological thinning after suppression, or a strict criterion with a small additive margin, is the usual mitigation.
Border effects. Near image boundaries the window extends outside the image. Zero-padding the response map suppresses responses within pixels of the border; mirror- or replicate-padding avoids this.
Where it appears
Non-maximum suppression is a post-processing step shared across gradient-based, intensity-comparison, and learned detectors.
- canny-edge-detector — gradient-direction (1-D) suppression on the gradient-magnitude map, thinning the response ridge to one pixel wide; the only registered page where the neighbourhood follows a continuous direction.
- harris-corner-detector — 8-connected grid suppression on the signed Harris response; retained corner pixels satisfy and are local maxima.
- shi-tomasi-corner-detector — identical grid suppression on ; because that response is non-negative everywhere, no sign gating is needed.
- fast-corner-detector — grid suppression on the corner score ; the mechanism that resolves the adjacent-corner clustering produced by the segment test.
- loy-fast-radial-symmetry — grid suppression on the radial-symmetry accumulator, retaining local maxima at the expected radius of the symmetric structure.
- yolo-v1 — greedy IoU-based suppression applied per class to the bounding-box candidates produced per image.
References
- J. Canny. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6)–698, 1986.
- C. Harris, M. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988.
- E. Rosten, T. Drummond. Machine Learning for High-Speed Corner Detection. ECCV, 2006.
- J. Redmon, S. Divvala, R. Girshick, A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection. IEEE CVPR, 2016.
- J. Shi, C. Tomasi. Good Features to Track. IEEE CVPR, 1994.