Goal
Estimate the parametric warp that aligns two grayscale image functions given an initial estimate of the warp parameters. Input: two images on a shared pixel domain, a region of interest , and an initial parameter vector (translation, affine, or affine plus linear photometric adjustment). Output: refined warp parameters in that minimise the squared photometric residual over . The algorithm replaces brute-force search over the parameter space with Newton-Raphson iteration on the first-order Taylor expansion of the residual, reducing the per-iteration cost to one linear solve and lowering the average complexity from to for an image with disparity range.
Algorithm
Let denote two grayscale image functions on pixel domain . Let denote the region of interest. Let denote the warp parameter vector — for pure translation and is the displacement. Let denote the spatial intensity gradient of as a column vector. Let denote a per-pixel weight.
First-order Taylor expansion of the warped image around the current estimate .
Sum of squared brightness differences between the warped and the reference over the region .
Symmetric positive-semidefinite matrix formed by summing the outer product of the gradient over .
Substituting the linearised residual into , differentiating with respect to , and setting the gradient to zero yields the closed-form parameter update:
The matrix is identical to the structure tensor used by Harris and Shi-Tomasi corner detectors; its invertibility is the precondition under which the update is well-defined. In one dimension this reduces to the scalar form (eq 9 of the paper):
The natural least-squares weighting falls out of the derivation and is the form used in practice.
- Compute the spatial gradient on .
- Set and .
- Form the gradient outer-product matrix and the right-hand side .
- Solve .
- Update .
- If or , return .
- Otherwise set and return to step 3.
Coarse-to-fine extension
The single-resolution iteration converges only when lies within the basin of attraction; for the canonical sinusoid the basin is , that is, half a wavelength. Smoothing and with a low-pass kernel widens the basin in proportion to the suppressed bandwidth, at the cost of attenuating fine-scale detail. The standard remedy applies the update at successively finer levels of a Gaussian pyramid: solve at the coarsest level, upsample by two, repeat. The pyramid extends the usable disparity to approximately half the image width.
Affine and photometric generalisations
For an affine warp the linearised residual becomes
In two dimensions this yields a normal equation per iteration, against the system of the pure-translation case. A linear photometric correction can be absorbed into the same quadratic form; jointly minimising over recovers normalised cross-correlation when is ignored, and the norm when are ignored as well.
Implementation
The 2-D translation kernel in Rust, mirroring the procedure block above:
fn lucas_kanade_step(
f: &[f32], g: &[f32], w: usize, h: usize,
roi: (usize, usize, usize, usize), // (x0, y0, x1, y1)
dx: (f32, f32),
) -> (f32, f32) {
let (x0, y0, x1, y1) = roi;
let (mut a11, mut a12, mut a22) = (0.0f32, 0.0f32, 0.0f32);
let (mut b1, mut b2) = (0.0f32, 0.0f32);
let sample = |img: &[f32], x: f32, y: f32| -> f32 {
let xi = x.clamp(0.0, (w - 1) as f32);
let yi = y.clamp(0.0, (h - 1) as f32);
let (x0i, y0i) = (xi as usize, yi as usize);
let (fx, fy) = (xi - x0i as f32, yi - y0i as f32);
let i00 = img[y0i * w + x0i];
let i10 = img[y0i * w + (x0i + 1).min(w - 1)];
let i01 = img[(y0i + 1).min(h - 1) * w + x0i];
let i11 = img[(y0i + 1).min(h - 1) * w + (x0i + 1).min(w - 1)];
(1.0 - fx) * (1.0 - fy) * i00 + fx * (1.0 - fy) * i10
+ (1.0 - fx) * fy * i01 + fx * fy * i11
};
for y in y0..y1 {
for x in x0..x1 {
let fx = 0.5 * (sample(f, x as f32 + 1.0 + dx.0, y as f32 + dx.1)
- sample(f, x as f32 - 1.0 + dx.0, y as f32 + dx.1));
let fy = 0.5 * (sample(f, x as f32 + dx.0, y as f32 + 1.0 + dx.1)
- sample(f, x as f32 + dx.0, y as f32 - 1.0 + dx.1));
let r = g[y * w + x] - sample(f, x as f32 + dx.0, y as f32 + dx.1);
a11 += fx * fx; a12 += fx * fy; a22 += fy * fy;
b1 += fx * r; b2 += fy * r;
}
}
let det = a11 * a22 - a12 * a12;
((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
}
The accumulators a11, a12, a22 form the structure tensor; b1, b2 form the gradient-weighted residual. The final two divisions invert the matrix in closed form and return .
Remarks
- Per-iteration cost is for accumulating the five sums and for the linear solve; for 2-D translation the solve is two divisions. Average iteration count is on an image with disparity range.
- Convergence is guaranteed only when the initial estimate lies within the basin of attraction of the linearisation; the canonical sinusoid bound is half a wavelength of the dominant spatial frequency. Coarse-to-fine pyramids extend the usable disparity range.
- The normal-equation matrix is rank-deficient on 1-D edges (aperture problem) and on textureless regions; both cases produce an ill-defined update. Feature-selection criteria based on the smaller eigenvalue of the same matrix avoid both.
- Brightness consistency between and is assumed; an explicit linear photometric correction can be absorbed into the same quadratic form when exposure or gain differs between the images.
- The translation, affine, and photometric variants share one derivation: linearise the residual, differentiate the squared error, solve the resulting normal equation, iterate.
- The method is the basis of sparse optical-flow trackers and the gradient-based stage of many direct visual-odometry frontends; the dense variant is Horn-Schunck rather than Lucas-Kanade.
- Replacing the squared photometric residual with a redescending M-estimator (Lorentzian or Geman-McClure) and solving by IRLS within a graduated non-convexity schedule yields a robust parametric-motion variant that handles multiple motions and outliers within ; see black-anandan-robust-flow. The same machinery extends to the piecewise-smooth dense flow case.