Definition
The integral image — also called a summed-area table — is a precomputed two-dimensional prefix-sum array derived from a grayscale input image. Every entry stores the sum of all pixel values above and to the left of that position, inclusive. Any axis-aligned rectangular sum over the original image can then be retrieved using exactly four array reads, in time independent of the rectangle's area.
Given a grayscale image of width and height , the integral image is defined at every pixel by
For a rectangle with corners (top-left), (top-right), (bottom-left), (bottom-right), the pixel sum over the rectangle is
The evaluation cost is regardless of rectangle dimensions.
Origin and naming
The data structure originates not in computer vision but in computer graphics. Frank Crow introduced it in 1984 as the summed-area table, a texture-map antialiasing technique: it let a renderer average a texture over an arbitrary axis-aligned rectangle in constant time, lifting the square-only restriction of mip-mapping so the filter footprint could match a perspective-projected pixel. Viola and Jones reintroduced the identical construction to computer vision in 2001 under the name integral image, using it to evaluate Haar-like features for face detection at every scale and position in four reads. Every later computer-vision use — SURF's box filters, BRIEF's patch smoothing, sliding-window detectors — descends from that 2001 reintroduction, but the table itself, and the four-corner sum, are Crow's.
Mathematical Description
Construction by two-pass recurrence
The integral image is computed in a single scan over the input using two scalar recurrences. A column accumulator holds the running column sum,
which is then accumulated horizontally,
These two recurrences traverse the image once in row-major order, writing values with arithmetic per pixel. No temporary buffer larger than one column is required.
Constant-time rectangle sum
Given the integral image, the pixel sum over a rectangle spanning columns and rows is computed with four reads — , , , — and the combination . The cost is independent of and , which is what makes box-filter convolutions at multiple scales practical on a CPU.
Rotated integral images
Sums over -tilted rectangles — required by SURF and the rotated-rectangle Haar features — use a rotated integral image that accumulates pixel values along a diagonal,
A sum over a -rotated rectangle then reduces to a small fixed number of reads of , keeping tilted features as cheap as axis-aligned ones.
Higher-order tables
A squared integral image accumulates using the same two-pass recurrence. With first and squared rectangle sums and , the pixel variance over any rectangle is recovered in ,
which Viola and Jones use to variance-normalise each detection sub-window before feature evaluation.
Numerical Concerns
Accumulator magnitude and integer overflow. The bottom-right entry of the integral image equals the sum of all pixels. For a image with 8-bit pixels the maximum is ; a 32-bit unsigned accumulator overflows once , i.e. for images larger than roughly pixels. The squared integral image reaches and overflows a 32-bit accumulator at roughly pixels — so squared tables require 64-bit accumulators at any practical resolution.
Loss of significance in rectangle sums. The four-tap formula subtracts two large, nearly equal quantities when the rectangle sits near the bottom-right corner. In floating-point implementations this causes catastrophic cancellation when ; integer implementations are exact provided the accumulator is wide enough. Floating-point integral images should use 64-bit doubles when sub-pixel precision is required.
Variance-normalisation instability. Computing from two integral images subtracts two large squared quantities; for a nearly uniform sub-window and rounding can drive the argument slightly negative, producing a NaN. Implementations should clamp the argument to zero before the square root.
Border initialisation. The recurrence requires the sentinels and . Implementations typically pad the array with one zero row and column and offset all lookups by one; a missing border silently corrupts every rectangle sum that touches the image edge.
Cache behaviour. The four corner reads span two distinct cache lines for all but single-row rectangles. For feature detection with many small rectangles the access pattern is irregular enough to produce frequent cache misses; tiling construction and evaluation improves throughput.
Where it appears
- viola-jones-detector — the integral image is the enabling primitive: each two-, three-, or four-rectangle Haar feature over a sub-window is retrieved in four reads regardless of rectangle size, and a second squared integral image supplies the sub-window variance used for contrast normalisation. Both tables are computed once per image and reused across all scales and positions.
- surf — replaces continuous Gaussian second-derivative filters with box-filter approximations evaluated on the integral image in per pixel; the box-filter size grows across octaves while the image stays at full resolution, so all filter sizes are equally cheap.
- brief — integral-image box filtering is a fast approximation to the Gaussian patch smoothing that precedes the binary tests, the step that dominates BRIEF's descriptor-computation time.
- orb — reads sub-window mean intensities from an integral image for its steered binary tests, and computes the patch moments behind its intensity-centroid orientation as integral-image sub-window sums.
References
- P. Viola, M. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.
- H. Bay, T. Tuytelaars, L. Van Gool. SURF: Speeded Up Robust Features. ECCV, 2006.
- M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. ECCV, 2010.
- E. Rublee, V. Rabaud, K. Konolige, G. Bradski. ORB: An Efficient Alternative to SIFT or SURF. IEEE ICCV, 2011.
- F. C. Crow. Summed-Area Tables for Texture Mapping. ACM SIGGRAPH, 1984.