Lucas-Kanade Image Registration | VitaVision
Back to atlas

Lucas-Kanade Image Registration

7 min readIntermediateView in graph
Based on
An Iterative Image Registration Technique with an Application to Stereo Vision
Lucas, Kanade · IJCAI 1981 1981
Open ↗

Goal

Estimate the parametric warp that aligns two grayscale image functions F,G:ΩRF, G : \Omega \to \mathbb{R} given an initial estimate of the warp parameters. Input: two images on a shared pixel domain, a region of interest RΩR \subset \Omega, and an initial parameter vector (translation, affine, or affine plus linear photometric adjustment). Output: refined warp parameters in Rn\mathbb{R}^n that minimise the squared photometric residual over RR. The algorithm replaces brute-force search over the parameter space with Newton-Raphson iteration on the first-order Taylor expansion of the residual, reducing the per-iteration cost to one n×nn \times n linear solve and lowering the average complexity from O(M2N2)O(M^2 N^2) to O(M2logN)O(M^2 \log N) for an N×NN \times N image with M×MM \times M disparity range.

Algorithm

Let F,G:ΩRF, G : \Omega \to \mathbb{R} denote two grayscale image functions on pixel domain ΩR2\Omega \subset \mathbb{R}^2. Let RΩR \subset \Omega denote the region of interest. Let hRn\mathbf{h} \in \mathbb{R}^n denote the warp parameter vector — for pure translation n=2n = 2 and h\mathbf{h} is the displacement. Let F/x\partial F / \partial \mathbf{x} denote the spatial intensity gradient of FF as a column vector. Let w(x)w(\mathbf{x}) denote a per-pixel weight.

Definition
Linearised residual

First-order Taylor expansion of the warped image around the current estimate hk\mathbf{h}_k.

F(x+h)F(x)+hFx.F(\mathbf{x} + \mathbf{h}) \approx F(\mathbf{x}) + \mathbf{h} \cdot \frac{\partial F}{\partial \mathbf{x}}.
Definition
Squared photometric error

Sum of squared brightness differences between the warped FF and the reference GG over the region RR.

E(h)=xR[F(x+h)G(x)]2.E(\mathbf{h}) = \sum_{\mathbf{x} \in R} \bigl[\,F(\mathbf{x} + \mathbf{h}) - G(\mathbf{x})\,\bigr]^{2}.
Definition
Gradient outer-product matrix

Symmetric positive-semidefinite n×nn \times n matrix formed by summing the outer product of the gradient over RR.

M(R)=xRw(x)(Fx)(Fx)T.M(R) = \sum_{\mathbf{x} \in R} w(\mathbf{x})\,\Bigl(\tfrac{\partial F}{\partial \mathbf{x}}\Bigr)\Bigl(\tfrac{\partial F}{\partial \mathbf{x}}\Bigr)^{T}.

Substituting the linearised residual into EE, differentiating with respect to h\mathbf{h}, and setting the gradient to zero yields the closed-form parameter update:

Δh  =  M(R)1xRw(x)(Fx)[G(x)F(x+hk)].\Delta \mathbf{h} \;=\; M(R)^{-1} \sum_{\mathbf{x} \in R} w(\mathbf{x})\,\Bigl(\tfrac{\partial F}{\partial \mathbf{x}}\Bigr)\,\bigl[\,G(\mathbf{x}) - F(\mathbf{x} + \mathbf{h}_{k})\,\bigr].

The matrix M(R)M(R) is identical to the structure tensor used by Harris and Shi-Tomasi corner detectors; its invertibility is the precondition under which the update is well-defined. In one dimension this reduces to the scalar form (eq 9 of the paper):

h    xRF(x)[G(x)F(x)]xRF(x)2.h \;\approx\; \frac{\sum_{x \in R} F'(x)\,[G(x) - F(x)]}{\sum_{x \in R} F'(x)^{2}}.

The natural least-squares weighting w(x)=F/x2w(\mathbf{x}) = \|\partial F / \partial \mathbf{x}\|^{2} falls out of the derivation and is the form used in practice.

Algorithm
Lucas-Kanade iterative registration
Input: Images F,GF, G on shared domain Ω\Omega; region RΩR \subset \Omega; initial parameter vector h0Rn\mathbf{h}_0 \in \mathbb{R}^n; convergence tolerance ε\varepsilon; iteration cap KK.
Output: Refined parameter vector hRn\mathbf{h} \in \mathbb{R}^n.
  1. Compute the spatial gradient F/x\partial F / \partial \mathbf{x} on RR.
  2. Set k=0k = 0 and hk=h0\mathbf{h}_k = \mathbf{h}_0.
  3. Form the gradient outer-product matrix M(R)M(R) and the right-hand side bk=xRw(x)(F/x)[G(x)F(x+hk)]\mathbf{b}_k = \sum_{\mathbf{x} \in R} w(\mathbf{x})\,(\partial F / \partial \mathbf{x})\,[G(\mathbf{x}) - F(\mathbf{x} + \mathbf{h}_k)].
  4. Solve M(R)Δhk=bkM(R)\,\Delta \mathbf{h}_k = \mathbf{b}_k.
  5. Update hk+1=hk+Δhk\mathbf{h}_{k+1} = \mathbf{h}_k + \Delta \mathbf{h}_k.
  6. If Δhk<ε\|\Delta \mathbf{h}_k\| < \varepsilon or k=Kk = K, return hk+1\mathbf{h}_{k+1}.
  7. Otherwise set kk+1k \leftarrow k + 1 and return to step 3.

Coarse-to-fine extension

The single-resolution iteration converges only when h0\mathbf{h}_0 lies within the basin of attraction; for the canonical sinusoid F(x)=sinxF(x) = \sin x the basin is h0<π|h_0| < \pi, that is, half a wavelength. Smoothing FF and GG with a low-pass kernel widens the basin in proportion to the suppressed bandwidth, at the cost of attenuating fine-scale detail. The standard remedy applies the update at successively finer levels of a Gaussian pyramid: solve at the coarsest level, upsample h\mathbf{h} by two, repeat. The pyramid extends the usable disparity to approximately half the image width.

Affine and photometric generalisations

For an affine warp G(x)=F(xA+h)G(\mathbf{x}) = F(\mathbf{x} A + \mathbf{h}) the linearised residual becomes

F(x(A+ΔA)+(h+Δh))    F(xA+h)+(xΔA+Δh)Fx.F(\mathbf{x}(A + \Delta A) + (\mathbf{h} + \Delta \mathbf{h})) \;\approx\; F(\mathbf{x} A + \mathbf{h}) + (\mathbf{x}\, \Delta A + \Delta \mathbf{h}) \cdot \tfrac{\partial F}{\partial \mathbf{x}}.

In two dimensions this yields a 6×66 \times 6 normal equation per iteration, against the 2×22 \times 2 system of the pure-translation case. A linear photometric correction F=αG+βF = \alpha G + \beta can be absorbed into the same quadratic form; jointly minimising over (h,A,α,β)(\mathbf{h}, A, \alpha, \beta) recovers normalised cross-correlation when AA is ignored, and the L2L_2 norm when α,β\alpha, \beta are ignored as well.

Implementation

The 2-D translation kernel in Rust, mirroring the procedure block above:

fn lucas_kanade_step(
    f: &[f32], g: &[f32], w: usize, h: usize,
    roi: (usize, usize, usize, usize),  // (x0, y0, x1, y1)
    dx: (f32, f32),
) -> (f32, f32) {
    let (x0, y0, x1, y1) = roi;
    let (mut a11, mut a12, mut a22) = (0.0f32, 0.0f32, 0.0f32);
    let (mut b1, mut b2) = (0.0f32, 0.0f32);
    let sample = |img: &[f32], x: f32, y: f32| -> f32 {
        let xi = x.clamp(0.0, (w - 1) as f32);
        let yi = y.clamp(0.0, (h - 1) as f32);
        let (x0i, y0i) = (xi as usize, yi as usize);
        let (fx, fy) = (xi - x0i as f32, yi - y0i as f32);
        let i00 = img[y0i * w + x0i];
        let i10 = img[y0i * w + (x0i + 1).min(w - 1)];
        let i01 = img[(y0i + 1).min(h - 1) * w + x0i];
        let i11 = img[(y0i + 1).min(h - 1) * w + (x0i + 1).min(w - 1)];
        (1.0 - fx) * (1.0 - fy) * i00 + fx * (1.0 - fy) * i10
            + (1.0 - fx) * fy * i01 + fx * fy * i11
    };
    for y in y0..y1 {
        for x in x0..x1 {
            let fx = 0.5 * (sample(f, x as f32 + 1.0 + dx.0, y as f32 + dx.1)
                - sample(f, x as f32 - 1.0 + dx.0, y as f32 + dx.1));
            let fy = 0.5 * (sample(f, x as f32 + dx.0, y as f32 + 1.0 + dx.1)
                - sample(f, x as f32 + dx.0, y as f32 - 1.0 + dx.1));
            let r = g[y * w + x] - sample(f, x as f32 + dx.0, y as f32 + dx.1);
            a11 += fx * fx;  a12 += fx * fy;  a22 += fy * fy;
            b1  += fx * r;   b2  += fy * r;
        }
    }
    let det = a11 * a22 - a12 * a12;
    ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
}

The accumulators a11, a12, a22 form the 2×22 \times 2 structure tensor; b1, b2 form the gradient-weighted residual. The final two divisions invert the matrix in closed form and return Δh\Delta \mathbf{h}.

Remarks

  • Per-iteration cost is O(R)O(|R|) for accumulating the five sums and O(n3)O(n^3) for the linear solve; for 2-D translation the solve is two divisions. Average iteration count is O(M2logN)O(M^2 \log N) on an N×NN \times N image with M×MM \times M disparity range.
  • Convergence is guaranteed only when the initial estimate lies within the basin of attraction of the linearisation; the canonical sinusoid bound is half a wavelength of the dominant spatial frequency. Coarse-to-fine pyramids extend the usable disparity range.
  • The normal-equation matrix is rank-deficient on 1-D edges (aperture problem) and on textureless regions; both cases produce an ill-defined update. Feature-selection criteria based on the smaller eigenvalue of the same matrix avoid both.
  • Brightness consistency between FF and GG is assumed; an explicit linear photometric correction can be absorbed into the same quadratic form when exposure or gain differs between the images.
  • The translation, affine, and photometric variants share one derivation: linearise the residual, differentiate the squared error, solve the resulting normal equation, iterate.
  • The method is the basis of sparse optical-flow trackers and the gradient-based stage of many direct visual-odometry frontends; the dense variant is Horn-Schunck rather than Lucas-Kanade.
  • Replacing the squared photometric residual with a redescending M-estimator (Lorentzian or Geman-McClure) and solving by IRLS within a graduated non-convexity schedule yields a robust parametric-motion variant that handles multiple motions and outliers within RR; see black-anandan-robust-flow. The same machinery extends to the piecewise-smooth dense flow case.

References

  1. B. D. Lucas, T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. IJCAI, 1981. PDF
  2. J. Shi, C. Tomasi. Good Features to Track. CVPR, 1994. PDF

Parallel foundation with

  • Horn-Schunck Optical Flow

    Dense variational vs sparse local LSQ — co-founded optical flow in 1981; pick by problem (dense flow field vs sparse displacement of features).

Extended by

  • Shi-Tomasi Corner Detector

    Shi-Tomasi derives the feature-selection threshold from the conditioning of the LK normal-equation matrix and adds a 6-DOF affine variant with dissimilarity monitoring.

  • Black-Anandan Robust Optical Flow

    Robust M-estimator version of the parametric variant; same machinery as Black-Anandan's piecewise-smooth flow.