Integral Image | VitaVision
Back to atlas

Integral Image

7 min readBeginnerView in graph
Based on
Summed-area tables for texture mapping
Crow · ACM SIGGRAPH Computer Graphics 1984
DOI ↗

Definition

The integral image — also called a summed-area table — is a precomputed two-dimensional prefix-sum array derived from a grayscale input image. Every entry stores the sum of all pixel values above and to the left of that position, inclusive. Any axis-aligned rectangular sum over the original image can then be retrieved using exactly four array reads, in time independent of the rectangle's area.

Definition
Integral image and rectangle sum

Given a grayscale image II of width WW and height HH, the integral image IIII is defined at every pixel (x,y)(x, y) by

II(x,y)=xxyyI(x,y).II(x,\, y) = \sum_{\substack{x' \le x \\ y' \le y}} I(x',\, y').

For a rectangle with corners AA (top-left), BB (top-right), CC (bottom-left), DD (bottom-right), the pixel sum SS over the rectangle is

S=II(D)II(B)II(C)+II(A).S = II(D) - II(B) - II(C) + II(A).

The evaluation cost is O(1)O(1) regardless of rectangle dimensions.

Origin and naming

The data structure originates not in computer vision but in computer graphics. Frank Crow introduced it in 1984 as the summed-area table, a texture-map antialiasing technique: it let a renderer average a texture over an arbitrary axis-aligned rectangle in constant time, lifting the square-only restriction of mip-mapping so the filter footprint could match a perspective-projected pixel. Viola and Jones reintroduced the identical construction to computer vision in 2001 under the name integral image, using it to evaluate Haar-like features for face detection at every scale and position in four reads. Every later computer-vision use — SURF's box filters, BRIEF's patch smoothing, sliding-window detectors — descends from that 2001 reintroduction, but the table itself, and the four-corner sum, are Crow's.

Mathematical Description

Construction by two-pass recurrence

The integral image is computed in a single scan over the input using two scalar recurrences. A column accumulator s(x,y)s(x, y) holds the running column sum,

s(x,y)=s(x,y1)+I(x,y),s(x,1)=0,s(x,\, y) = s(x,\, y-1) + I(x,\, y), \qquad s(x,\, -1) = 0,

which is then accumulated horizontally,

II(x,y)=II(x1,y)+s(x,y),II(1,y)=0.II(x,\, y) = II(x-1,\, y) + s(x,\, y), \qquad II(-1,\, y) = 0.

These two recurrences traverse the image once in row-major order, writing O(WH)O(WH) values with O(1)O(1) arithmetic per pixel. No temporary buffer larger than one column is required.

Constant-time rectangle sum

Given the integral image, the pixel sum over a rectangle spanning columns [x1,x2][x_1, x_2] and rows [y1,y2][y_1, y_2] is computed with four reads — A=II(x11,y11)A = II(x_1-1, y_1-1), B=II(x2,y11)B = II(x_2, y_1-1), C=II(x11,y2)C = II(x_1-1, y_2), D=II(x2,y2)D = II(x_2, y_2) — and the combination S=DBC+AS = D - B - C + A. The cost is independent of (x2x1)(x_2 - x_1) and (y2y1)(y_2 - y_1), which is what makes box-filter convolutions at multiple scales practical on a CPU.

Rotated integral images

Sums over 45°45°-tilted rectangles — required by SURF and the rotated-rectangle Haar features — use a rotated integral image IIRII_R that accumulates pixel values along a diagonal,

IIR(x,y)=xxyyyyI(x,y).II_R(x,\, y) = \sum_{\substack{|x' - x| \le y - y' \\ y' \le y}} I(x',\, y').

A sum over a 45°45°-rotated rectangle then reduces to a small fixed number of reads of IIRII_R, keeping tilted features as cheap as axis-aligned ones.

Higher-order tables

A squared integral image accumulates I(x,y)2I(x', y')^2 using the same two-pass recurrence. With first and squared rectangle sums S1(R)S_1(R) and S2(R)S_2(R), the pixel variance over any rectangle is recovered in O(1)O(1),

Var(R)=S2(R)R(S1(R)R)2,\mathrm{Var}(R) = \frac{S_2(R)}{|R|} - \left(\frac{S_1(R)}{|R|}\right)^2,

which Viola and Jones use to variance-normalise each detection sub-window before feature evaluation.

Numerical Concerns

Accumulator magnitude and integer overflow. The bottom-right entry of the integral image equals the sum of all pixels. For a W×HW \times H image with 8-bit pixels the maximum is 255WH255\,WH; a 32-bit unsigned accumulator overflows once 255WH>2321255\,WH > 2^{32}-1, i.e. for images larger than roughly 4100×41004100 \times 4100 pixels. The squared integral image reaches 2552WH255^2\,WH and overflows a 32-bit accumulator at roughly 660×660660 \times 660 pixels — so squared tables require 64-bit accumulators at any practical resolution.

Loss of significance in rectangle sums. The four-tap formula subtracts two large, nearly equal quantities when the rectangle sits near the bottom-right corner. In floating-point implementations this causes catastrophic cancellation when DB+CD \approx B + C; integer implementations are exact provided the accumulator is wide enough. Floating-point integral images should use 64-bit doubles when sub-pixel precision is required.

Variance-normalisation instability. Computing σ=μ2μ2\sigma = \sqrt{\mu_2 - \mu^2} from two integral images subtracts two large squared quantities; for a nearly uniform sub-window μ2μ2\mu_2 \approx \mu^2 and rounding can drive the argument slightly negative, producing a NaN. Implementations should clamp the argument to zero before the square root.

Border initialisation. The recurrence requires the sentinels II(1,y)=0II(-1, y) = 0 and s(x,1)=0s(x, -1) = 0. Implementations typically pad the array with one zero row and column and offset all lookups by one; a missing border silently corrupts every rectangle sum that touches the image edge.

Cache behaviour. The four corner reads A,B,C,DA, B, C, D span two distinct cache lines for all but single-row rectangles. For feature detection with many small rectangles the access pattern is irregular enough to produce frequent cache misses; tiling construction and evaluation improves throughput.

Where it appears

  • viola-jones-detector — the integral image is the enabling primitive: each two-, three-, or four-rectangle Haar feature over a 24×2424 \times 24 sub-window is retrieved in four reads regardless of rectangle size, and a second squared integral image supplies the sub-window variance used for contrast normalisation. Both tables are computed once per image and reused across all scales and positions.
  • surf — replaces continuous Gaussian second-derivative filters with box-filter approximations Dxx,Dyy,DxyD_{xx}, D_{yy}, D_{xy} evaluated on the integral image in O(1)O(1) per pixel; the box-filter size grows across octaves while the image stays at full resolution, so all filter sizes are equally cheap.
  • brief — integral-image box filtering is a fast approximation to the Gaussian patch smoothing that precedes the binary tests, the step that dominates BRIEF's descriptor-computation time.
  • orb — reads 5×55 \times 5 sub-window mean intensities from an integral image for its steered binary tests, and computes the patch moments behind its intensity-centroid orientation as integral-image sub-window sums.

References

  1. P. Viola, M. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.
  2. H. Bay, T. Tuytelaars, L. Van Gool. SURF: Speeded Up Robust Features. ECCV, 2006.
  3. M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. ECCV, 2010.
  4. E. Rublee, V. Rabaud, K. Konolige, G. Bradski. ORB: An Efficient Alternative to SIFT or SURF. IEEE ICCV, 2011.
  5. F. C. Crow. Summed-Area Tables for Texture Mapping. ACM SIGGRAPH, 1984.