Definition
The image gradient at a pixel $(x, y)$ is the 2-vector of partial derivatives of the image intensity function $I(x, y)$:
$$\nabla I = \begin{pmatrix} I_x \\ I_y \end{pmatrix} = \begin{pmatrix} \partial I / \partial x \\ \partial I / \partial y \end{pmatrix}$$
It points in the direction of steepest ascent of $I$ and has magnitude equal to the rate of change in that direction. Because $I$ is defined on a discrete pixel grid, $\nabla I$ is computed by convolving $I$ with a discrete derivative kernel rather than by analytic differentiation.
The gradient is the foundational quantity in image analysis: edges, corners, texture descriptors, optical flow, and calibration target detectors are all built on $I_x$ and $I_y$. It is not itself a feature; it is the raw material from which features are constructed.
Mathematical Description
Discrete derivative kernels
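On a discrete image $I[x, y]$, differentiation in the $x$-direction is approximated by a finite-difference kernel convolved row-wise. The three standard choices trade isotropy against noise sensitivity (taps are listed in order of increasing $x$):
- Forward difference, $[-1 \;\; 1]$. The simplest approximation; spans a single pixel step.
- Central difference, $\tfrac{1}{2}[-1 \;\; 0 \;\; 1]$. Symmetric; better approximation of the derivative at the sample point. It is a 1-D kernel rather than a 2-D one, applied row-wise for $I_x$ and column-wise for $I_y$.
- Sobel, $S_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$. Combines differentiation in one axis with smoothing in the perpendicular axis. Separable: $S_x = s\,d^\top$ where $d = [-1, 0, 1]^\top$ and $s = [1, 2, 1]^\top$ is a binomial smoother.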
The Scharr kernel is a 3×3 optimised variant of Sobel with improved rotational isotropy. The Prewitt kernel has the same structure with uniform smoothing weights $[1, 1, 1]$ instead of $[1, 2, 1]$.
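The separable structure of the Sobel kernel can be checked numerically. The sketch below is illustrative only: it assumes NumPy, writes kernels in correlation form, and uses a hypothetical helper `correlate_valid` that skips border pixels entirely. The division by 8 is the usual normalisation that makes the Sobel output equal the true slope on a linear ramp.

```python
import numpy as np

# 1-D building blocks, taps listed in order of increasing x.
d_central = np.array([-1.0, 0.0, 1.0])    # unnormalised central difference
s_binomial = np.array([1.0, 2.0, 1.0])    # binomial smoother

# Sobel x-kernel as an outer product: smooth down columns, differentiate along rows.
sobel_x = np.outer(s_binomial, d_central)

def correlate_valid(image, kernel):
    """Plain 2-D correlation over the 'valid' region (no border handling)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out

# Horizontal ramp I[y, x] = 3x, whose true derivative dI/dx is 3 everywhere.
ramp = np.tile(3.0 * np.arange(6.0), (5, 1))
gx = correlate_valid(ramp, sobel_x) / 8.0   # divide by 8 to normalise Sobel
```

On this ramp every entry of `gx` equals the true slope. Swapping `s_binomial` for `[1, 1, 1]` (and dividing by 6) gives the Prewitt variant under the same assumptions.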
Gradient of a Gaussian
Smoothing before differentiation is the standard practice for noisy images. Let $G_\sigma$ denote the Gaussian with standard deviation $\sigma$. By the commutativity of convolution and differentiation:
$$\nabla (G_\sigma * I) = (\nabla G_\sigma) * I$$
so computing the smoothed gradient is equivalent to convolving $I$ with the derivative of a Gaussian. The parameter $\sigma$ sets the spatial scale at which structure is detected: small $\sigma$ resolves fine structure but amplifies noise; large $\sigma$ suppresses noise but smears thin edges.
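In the discrete setting the same identity holds exactly, because convolution is associative. The 1-D sketch below assumes NumPy, a Gaussian truncated at three standard deviations, and a central difference as the discrete derivative; the value `sigma=1.5` and the step-edge test signal are arbitrary choices for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1-D Gaussian, truncated at 3 sigma (an assumed cutoff), unit sum."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

g = gaussian_kernel(sigma=1.5)
# Central difference written in convolution (flipped) form:
# (s * d)[n] corresponds to (s[n+1] - s[n-1]) / 2 after centring.
d = np.array([0.5, 0.0, -0.5])

signal = np.concatenate([np.zeros(20), np.ones(20)])   # 1-D step edge

# Route A: smooth the signal, then differentiate.
route_a = np.convolve(np.convolve(signal, g), d)
# Route B: differentiate the Gaussian once, then apply a single convolution.
route_b = np.convolve(signal, np.convolve(g, d))
```

Route B is what makes derivative-of-Gaussian filtering cheap in practice: the combined kernel is computed once and reused for every image.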
Gradient magnitude and direction
Magnitude: $\|\nabla I\| = \sqrt{I_x^2 + I_y^2}$. The Euclidean norm of the gradient vector; measures edge strength.
Direction: $\theta = \operatorname{atan2}(I_y, I_x)$. The angle of steepest ascent, measured from the positive $x$-axis.
For edge detection and oriented descriptor computation the gradient direction is taken modulo $\pi$ (unsigned orientation), since an edge gradient points perpendicular to the edge and the sign depends on which side is brighter. For optical flow and other vector-field applications the full range $(-\pi, \pi]$ is retained.
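A minimal sketch of these quantities, assuming NumPy and gradient components already available as arrays. The function name `magnitude_and_direction` is made up for this example.

```python
import numpy as np

def magnitude_and_direction(ix, iy):
    """Return edge strength, signed direction, and unsigned orientation."""
    mag = np.hypot(ix, iy)               # sqrt(ix^2 + iy^2) without overflow
    theta = np.arctan2(iy, ix)           # signed angle from the positive x-axis
    unsigned = np.mod(theta, np.pi)      # fold opposite directions together
    return mag, theta, unsigned

# Two gradients pointing in opposite directions, plus one pointing along +y.
ix = np.array([3.0, -3.0, 0.0])
iy = np.array([4.0, -4.0, 2.0])
mag, theta, unsigned = magnitude_and_direction(ix, iy)
```

The first two vectors have the same magnitude and different signed directions, but identical unsigned orientation, which is the behaviour wanted when only the edge's orientation matters.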
Separability and the structure tensor
The outer product $\nabla I \, \nabla I^\top$ is a rank-1 matrix whose entries are $I_x^2$, $I_x I_y$, and $I_y^2$. Summing this outer product over a local window $W$ produces the structure tensor $M = \sum_{W} \nabla I \, \nabla I^\top$, which encodes the dominant gradient orientations in the neighbourhood. Every corner detector based on the structure tensor (Harris, Shi-Tomasi) is directly built from $I_x$ and $I_y$.
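The construction can be sketched as follows, assuming NumPy, central-difference gradients from `np.gradient`, and an unweighted 3×3 box window. Practical detectors typically use a Gaussian window instead, so treat the window choice and the helper names as illustrative assumptions.

```python
import numpy as np

def window_sum(a, radius):
    """Sum over a (2r+1)x(2r+1) box window, 'valid' region only."""
    k = 2 * radius + 1
    oh, ow = a.shape[0] - k + 1, a.shape[1] - k + 1
    out = np.zeros((oh, ow))
    for i in range(k):
        for j in range(k):
            out += a[i:i + oh, j:j + ow]
    return out

def structure_tensor(ix, iy, radius=1):
    """Windowed sums of the three distinct entries of the gradient outer product."""
    return (window_sum(ix * ix, radius),
            window_sum(ix * iy, radius),
            window_sum(iy * iy, radius))

def min_eigenvalue(sxx, sxy, syy):
    """Smaller eigenvalue of the symmetric 2x2 tensor, in closed form."""
    half_trace = 0.5 * (sxx + syy)
    root = np.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    return half_trace - root

# Synthetic L-shaped corner: bright lower-right quadrant on a dark background.
image = np.zeros((8, 8))
image[4:, 4:] = 1.0
iy, ix = np.gradient(image)              # np.gradient returns (d/drow, d/dcol)
lam = min_eigenvalue(*structure_tensor(ix, iy, radius=1))
```

Output index `(i, j)` corresponds to the window centred at pixel `(i + 1, j + 1)`. The smaller eigenvalue is positive at the corner and vanishes on a straight edge, where the summed tensor stays rank 1.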
Numerical Concerns
Discretization error. No 3×3 kernel exactly recovers the continuous derivative of the underlying scene radiance. The ideal derivative has frequency response $i\omega$; the Sobel kernel tracks it closely at low frequencies but falls increasingly short of it at high frequencies. The Scharr kernel minimises the angular error in the frequency domain. The central-difference kernel is second-order accurate, compared with first-order for the forward difference, but has no built-in smoothing.
Pre-smoothing is mandatory before differentiation. Differentiation multiplies the power spectrum by $\omega^2$, so differentiating a raw image amplifies high-frequency noise quadratically in frequency. In practice, computing $\nabla (G_\sigma * I)$ with a small $\sigma$, measured in pixels, is the minimum viable preprocessing. Skipping pre-smoothing produces gradient maps dominated by sensor noise rather than scene structure.
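One way to quantify the effect: for independent, identically distributed pixel noise, the output variance of any linear filter equals the input variance times the sum of its squared taps. The sketch below compares that noise gain for plain differences and for derivative-of-Gaussian kernels. It assumes NumPy and a 3-sigma truncation, and the $\sigma$ values 1 and 2 are arbitrary illustrations, not recommendations from the text.

```python
import numpy as np

def noise_gain(kernel):
    """Output variance per unit input variance for white noise."""
    kernel = np.asarray(kernel, dtype=float)
    return float(np.sum(kernel ** 2))

def derivative_of_gaussian(sigma):
    """Sampled derivative of a unit-sum Gaussian, truncated at 3 sigma (assumed)."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    return -x / sigma ** 2 * g

gain_forward = noise_gain([-1.0, 1.0])           # 1 + 1 = 2.0
gain_central = noise_gain([-0.5, 0.0, 0.5])      # 0.25 + 0.25 = 0.5
gain_dog_1 = noise_gain(derivative_of_gaussian(1.0))
gain_dog_2 = noise_gain(derivative_of_gaussian(2.0))
```

All four kernels respond with roughly unit magnitude to a unit-slope ramp, so the gains are comparable. The gain falls steeply as $\sigma$ grows, which is the noise-suppression side of the scale trade-off.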
Scale selection. The choice of $\sigma$ is a free parameter. Gradient-based detectors are only consistent across images if the same $\sigma$ is used. Multi-scale detectors (scale-space methods) compute $\nabla (G_\sigma * I)$ at several values of $\sigma$ and select the response scale that maximizes a scale-normalized measure.
Border handling. Convolution is undefined at pixel locations within $\lfloor k/2 \rfloor$ pixels of the image boundary (for a kernel of width $k$). Common strategies are: replicate the border pixel (constant extrapolation), reflect the image (symmetric extension), or zero-pad. Each choice affects the gradient values in a border strip of width equal to the kernel radius.
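The effect of each strategy is easy to see on a single row. This sketch assumes NumPy and uses `np.pad` modes as stand-ins for the strategies named above (`edge` for replication, `reflect` for mirroring about the border pixel, `constant` for zero-padding). The helper name and the sample values are illustrative.

```python
import numpy as np

def central_diff_padded(row, mode):
    """Central difference on a 1-D row after padding one pixel per side."""
    padded = np.pad(np.asarray(row, dtype=float), 1, mode=mode)
    return 0.5 * (padded[2:] - padded[:-2])

row = [10, 20, 40, 80]
replicated = central_diff_padded(row, "edge")       # border pixel repeated
reflected = central_diff_padded(row, "reflect")     # mirrored about the border pixel
zero_padded = central_diff_padded(row, "constant")  # zeros outside the image
```

The two interior values agree across all three modes; only the first and last samples, the border strip for a kernel of radius 1, depend on the padding choice. Zero-padding in particular creates a spurious edge at the boundary of any non-dark image.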
Sub-pixel interpretation. Even though $I_x$ and $I_y$ are computed at integer pixel positions, they are used in downstream algorithms (e.g., Lucas-Kanade optical flow, subpixel corner refinement) as if they represent the derivative of a continuous function interpolated from the discrete values. The interpolation model is implicit and depends on the kernel used.
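As an illustration of making that interpolation model explicit, the sketch below samples a gradient map at a non-integer position with bilinear interpolation. Bilinear weighting is an assumption chosen for simplicity here, not the model any particular algorithm is stated to use; `bilinear_sample` is a hypothetical helper with no bounds checking.

```python
import numpy as np

def bilinear_sample(field, x, y):
    """Sample a 2-D array at real-valued (x, y); x indexes columns, y rows.

    Assumes (x, y) lies at least one pixel inside the lower/right border.
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * field[y0, x0] + fx * field[y0, x0 + 1]
    bottom = (1.0 - fx) * field[y0 + 1, x0] + fx * field[y0 + 1, x0 + 1]
    return (1.0 - fy) * top + fy * bottom

# Image I[y, x] = x^2, so the true derivative is dI/dx = 2x.
xs = np.arange(6.0)
image = np.tile(xs ** 2, (6, 1))
iy, ix = np.gradient(image)              # central differences in the interior

value = bilinear_sample(ix, 2.5, 3.0)    # gradient between columns 2 and 3
```

Here the interpolated value matches the analytic derivative at $x = 2.5$ because the central difference is exact for a quadratic and $2x$ is linear. On real images the result depends on both the derivative kernel and the interpolation scheme.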
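Dynamic range and normalization. The Sobel kernel output for an 8-bit image spans roughly $[-1020, 1020]$ before any normalization. Implementations that accumulate in integer arithmetic must use 32-bit accumulators to avoid overflow when forming the structure tensor, since a single product such as $I_x^2$ can reach $1020^2 \approx 1.04 \times 10^6$, far beyond the 16-bit range.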
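The bounds can be worked out directly. The sketch below assumes NumPy and an unnormalised 3×3 Sobel kernel applied to an 8-bit image; the variable names are illustrative.

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.int32)

# Worst case for an 8-bit image: a vertical step from 0 to 255.
patch = np.array([[0, 0, 255]] * 3, dtype=np.int32)
max_response = int((sobel_x * patch).sum())          # (1 + 2 + 1) * 255

int16_max = np.iinfo(np.int16).max
int32_max = np.iinfo(np.int32).max

response_fits_int16 = max_response <= int16_max      # the raw response fits
max_product = max_response ** 2                      # one structure-tensor term
product_fits_int16 = max_product <= int16_max        # the product does not
# Number of worst-case products a 32-bit accumulator can sum without overflow.
max_terms_int32 = int32_max // max_product
```

Under these assumptions a 32-bit accumulator is safe for box windows of up to 2064 pixels (for example 45×45). Weighted windows or larger kernels change the bound, so it is worth recomputing for the actual configuration.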
Numerical accuracy of atan2. Computing the gradient direction via atan2(Iy, Ix) is well-defined except at $I_x = I_y = 0$, where the gradient is zero and its direction is undefined. Downstream algorithms that bin gradient orientations (e.g., SIFT descriptor histograms) must guard against the zero-magnitude case.
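A minimal sketch of such a guard, assuming NumPy. Note that `np.arctan2(0.0, 0.0)` returns 0 rather than raising, so an unguarded count-based histogram silently assigns flat pixels to the bin containing angle 0. The function name, the threshold `eps`, and the bin layout are assumptions for illustration; this is not SIFT's exact scheme.

```python
import numpy as np

def orientation_histogram(ix, iy, bins=8, eps=1e-12, weighted=False):
    """Histogram of gradient directions, skipping pixels with ~zero magnitude."""
    ix = np.asarray(ix, dtype=float).ravel()
    iy = np.asarray(iy, dtype=float).ravel()
    mag = np.hypot(ix, iy)
    valid = mag > eps                           # direction undefined otherwise
    theta = np.arctan2(iy[valid], ix[valid])    # signed angle
    # Map [-pi, pi] onto bin indices 0..bins-1; +pi wraps around to bin 0.
    idx = np.floor((theta + np.pi) / (2.0 * np.pi) * bins).astype(int) % bins
    weights = mag[valid] if weighted else None
    return np.bincount(idx, weights=weights, minlength=bins)

# Four diagonal gradients (one per quadrant) plus one flat pixel.
ix = np.array([1.0, -1.0, 1.0, -1.0, 0.0])
iy = np.array([1.0, 1.0, -1.0, -1.0, 0.0])
hist = orientation_histogram(ix, iy, bins=4)
```

With magnitude weighting the flat pixel would contribute zero anyway. The explicit mask matters most for count-based or normalised histograms, and it avoids dividing by zero if the gradient is later normalised to unit length.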
Where it appears
The image gradient is the lowest-level quantity on which feature detection and image analysis are built. Nearly every algorithm on this site that operates on raw pixel data computes $\nabla I$ as its first step.
- harris-corner-detector: builds the structure tensor from $I_x^2$, $I_x I_y$, $I_y^2$; the Harris response is a function of these gradient products.
- shi-tomasi-corner-detector: uses the identical gradient-based structure tensor; replaces the Harris response with the minimum eigenvalue $\lambda_{\min}$.
- chess-corners: the ChESS detector samples gradient-derived intensity contrasts on a ring pattern; gradient orientation is used to compute the dominant direction.
- fast-corner-detector: does not use gradients directly; pixel-intensity comparisons on a circle substitute for gradient computation, which is one reason FAST is faster than Harris.
- pyramidal-blur-aware-xcorner: operates on an image pyramid, computing gradients at each pyramid level; scale selection is driven by gradient-based saddle-point measures.
- loy-fast-radial-symmetry: votes along the gradient orientation at each pixel; positively- and negatively-affected pixels at distance $n$ accumulate magnitude and orientation contributions, yielding a symmetry-contribution map per radius $n$.
- sift: gradient magnitude and orientation are the fundamental inputs to both orientation assignment (36-bin histogram) and descriptor construction (4×4 array of 8-bin histograms; 128-D total). One of the most cited downstream consumers of image-gradient computation.
- lucas-kanade: $\nabla I$ enters the registration update in two roles: as the per-pixel rate of intensity change with respect to displacement (linearising the photometric residual) and as the outer-product sum $\sum \nabla I \, \nabla I^\top$ that forms the coefficient matrix of the per-iteration normal equation.
- horn-schunck: extends the gradient to the temporal axis: each of $I_x$, $I_y$, $I_t$ is estimated as the average of four first differences taken along parallel edges of a $2 \times 2 \times 2$ spatiotemporal cube, ensuring all three partial derivatives refer to the same cube-centre in $(x, y, t)$. The brightness-constancy equation $I_x u + I_y v + I_t = 0$ links the spatiotemporal gradient to the per-pixel flow velocity $(u, v)$.
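The cube-based derivative estimate mentioned for Horn-Schunck can be sketched as follows, assuming NumPy and two consecutive frames of equal size. The function name and the synthetic frames are illustrative; the test pattern is a linear ramp translated by one pixel in each axis, so the brightness-constancy residual should vanish for $(u, v) = (1, 1)$.

```python
import numpy as np

def cube_derivatives(frame0, frame1):
    """Estimate Ix, Iy, It at the centre of every 2x2x2 spatiotemporal cube.

    Each derivative averages four first differences along parallel cube edges.
    Outputs have shape (H - 1, W - 1).
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # Cube corners: a = top-left, b = top-right, c = bottom-left, d = bottom-right.
    a0, b0, c0, d0 = f0[:-1, :-1], f0[:-1, 1:], f0[1:, :-1], f0[1:, 1:]
    a1, b1, c1, d1 = f1[:-1, :-1], f1[:-1, 1:], f1[1:, :-1], f1[1:, 1:]
    ix = 0.25 * ((b0 - a0) + (d0 - c0) + (b1 - a1) + (d1 - c1))
    iy = 0.25 * ((c0 - a0) + (d0 - b0) + (c1 - a1) + (d1 - b1))
    it = 0.25 * ((a1 - a0) + (b1 - b0) + (c1 - c0) + (d1 - d0))
    return ix, iy, it

yy, xx = np.mgrid[0:5, 0:6].astype(float)
frame0 = 2.0 * xx + 3.0 * yy                      # linear intensity ramp
frame1 = 2.0 * (xx - 1.0) + 3.0 * (yy - 1.0)      # same ramp shifted by (1, 1)

ix, iy, it = cube_derivatives(frame0, frame1)
u, v = 1.0, 1.0
residual = ix * u + iy * v + it                   # brightness-constancy residual
```

Because all three estimates are centred on the same point of the cube, the residual is exactly zero for this linear pattern. On real imagery it is only approximately zero, and the flow has to be regularised.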
References
- C. Harris, M. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988. Defines the structure tensor directly from image partial derivatives.
- D. Forsyth, J. Ponce. Computer Vision: A Modern Approach. 2nd ed. Prentice Hall, 2011. §5 covers linear filtering and discrete differentiation.
- R. Szeliski. Computer Vision: Algorithms and Applications. 2nd ed. Springer, 2022. §3.2 covers image gradients; §3.3 covers Gaussian blur and scale-space.
- J. Prewitt. Object Enhancement and Extraction. In Picture Processing and Psychopictorics, 1970. Original Prewitt kernel.
- H. Scharr. Optimale Operatoren in der Digitalen Bildverarbeitung. Dissertation, Universität Heidelberg, 2000. Derivation of the Scharr kernel as the isotropy-optimal 3×3 derivative filter.