Goal
Estimate a dense, piecewise-smooth optical flow field at every pixel site , or recover one or more parametric (affine) motion models from a two-frame image sequence, in the presence of multiple competing motions. Input: two consecutive grayscale frames and , where . Output: either (a) a dense flow field with flow in pixels per frame, or (b) a set of affine parameter vectors together with a per-pixel outlier indicator. Both outputs are obtained by replacing the quadratic (L2) penalty in the data conservation and spatial smoothness terms with a redescending robust M-estimator, solved via Iteratively Reweighted Least Squares (IRLS) within a Graduated Non-Convexity (GNC) continuation schedule.
Algorithm
Let , , denote the partial spatial and temporal image derivatives at site . Let , denote the horizontal and vertical flow at . Let denote the 4-connected spatial neighbourhood of . Let weight the smoothness term relative to the data term. Let denote a robust penalty function with scale parameter , and its influence function.
Two penalty functions are used throughout.
The IRLS weight is . The influence function is redescending: it equals zero when , defining the outlier threshold .
The IRLS weight is . The outlier threshold is .
Piecewise-smooth (dense) variant
The robust data energy penalises violation of the brightness-constancy constraint at each pixel.
where is the chosen M-estimator applied to the first-order brightness-constancy residual.
The robust smoothness energy penalises large flow differences between neighbouring sites.
where is the chosen M-estimator for the inter-pixel flow difference.
Minimisation proceeds by SOR within a GNC continuation schedule. The SOR update for at iteration is
where is the overrelaxation parameter and is an upper bound on . The partial derivative expands as
The optimal overrelaxation parameter for an image is
To avoid local minima from the non-convex , the scale is initialised large enough that the objective is convex. For the Lorentzian, convexity holds when for all , which requires , where is the maximum expected residual magnitude. is then reduced linearly toward the target scale over 6 continuation stages. A 3-level Gaussian image pyramid is used; at each level, 20 SOR iterations are performed per continuation stage and the flow estimate is upsampled as initialisation for the finer level. Reference parameter ranges from §6: from to ; from to ; , .
- Build an -level Gaussian image pyramid for each frame.
- Initialise at the coarsest level.
- For each pyramid level from coarsest to finest: warp toward using the current flow (backward warp); recompute , , from the warped frame pair; set ; for each of GNC stages run SOR iterations updating and at every site, then reduce linearly toward the target; upsample to the next finer level.
- Classify site as outlier if .
Parametric (affine) variant
The 6-parameter affine flow model at pixel is
The robust parametric energy over a region is
Minimisation proceeds by SOR over the six parameters with the GNC -schedule. The SOR update for each is
where bounds the second derivative. The overrelaxation parameter is fixed at and the GNC schedule reduces geometrically: .
After recovering the dominant motion parameters , pixels where are collected as outliers. The residual set is re-submitted to the same robust regression to recover a second motion; the process repeats until no consistent motion remains in the residuals.
- Build a 4-level Gaussian image pyramid for each frame.
- Initialise at the coarsest level.
- For each pyramid level from coarsest to finest: set ; repeat until or iterations elapsed — for each compute and apply the SOR update, then reduce by ; upsample to the next finer level.
- Collect outlier pixels where .
- If the outlier set is large enough, submit it to step 2 to recover .
- Repeat step 5 until no consistent motion remains.
Implementation
The per-pixel IRLS update for the dense variant in Rust, implementing the data-plus-smoothness gradient and the SOR step. The snippet covers one horizontal-component update; the vertical component follows the same pattern with in place of .
/// Lorentzian IRLS weight: w(x, σ) = 2σ² / (2σ² + x²).
#[inline]
fn w_lorentzian(x: f32, sigma: f32) -> f32 {
let s2 = 2.0 * sigma * sigma;
s2 / (s2 + x * x)
}
/// One SOR update for u_s (horizontal flow at site s).
fn sor_update_u(
r: f32, // brightness-constancy residual I_x·u_s + I_y·v_s + I_t
ix: f32, // spatial gradient I_x at s
neighbours: &[f32],// (u_n) for n ∈ N(s); length 4 on a 4-connected grid
u_s: f32,
sigma_d: f32,
sigma_s: f32,
lambda: f32,
omega: f32,
t_upper: f32, // upper bound on ∂²E/∂u_s²
) -> f32 {
// ψ_D(r, σ_D) · I_x, with ψ = w · x.
let grad_data = ix * (w_lorentzian(r, sigma_d) * r);
// λ · Σ_{n∈N(s)} ψ_S(u_s − u_n, σ_S).
let grad_smooth: f32 = lambda * neighbours.iter().map(|&u_n| {
let diff = u_s - u_n;
w_lorentzian(diff, sigma_s) * diff
}).sum::<f32>();
u_s - (omega / t_upper) * (grad_data + grad_smooth)
}
The vectorised inner loop in Python:
import numpy as np
def w_lorentzian(x, sigma):
s2 = 2.0 * sigma ** 2
return s2 / (s2 + x ** 2)
def sor_update_u(r, ix, u, u_nbrs, sigma_d, sigma_s, lam, omega, t_upper):
grad_data = ix * (w_lorentzian(r, sigma_d) * r)
diffs = u[None] - u_nbrs # (4, H, W)
grad_smooth = lam * (w_lorentzian(diffs, sigma_s) * diffs).sum(axis=0)
return u - (omega / t_upper) * (grad_data + grad_smooth)
Remarks
- work per pyramid level for the dense variant, where is the number of GNC continuation stages, is the SOR iteration count per stage, and is the number of pixels. Reference values are , for the dense variant, and per pyramid level for the parametric variant.
- Both the data term and the smoothness term must use robust penalties. Robustifying the smoothness term alone — the approach taken by classical line-process methods — leaves the data term quadratic and can increase flow error relative to the fully-quadratic baseline.
- The Lorentzian outlier threshold is , derived from where . The Geman-McClure threshold is . Sites where the final residual magnitude exceeds are classified as motion outliers; this yields an explicit outlier map as a by-product of estimation.
- Local minima in the non-convex energy are avoided via the GNC -schedule: is initialised at (Lorentzian) or (Geman-McClure) so the objective is globally convex, then lowered until the true non-convex penalty is recovered. A geometric schedule is used for the parametric variant; a linear ramp is preferred for the dense variant.
- The Gaussian pyramid handles displacements larger than one pixel per frame. Motions exceeding the coarse-level resolution cause hard failure; transparency that does not yield a dominant motion in a local region and global illumination changes are not handled by the brightness-constancy data term.
- The redescending M-estimator is equivalent to an analog (continuous-valued) outlier process: an explicit line-process layer on the flow field is therefore unnecessary when a redescending is used.
References
- M. J. Black and P. Anandan. The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields. Computer Vision and Image Understanding, 63(1)–104, 1996. doi.1006/cviu.1996.0006
- B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Artificial Intelligence, 17–203, 1981. hdl.1/6337
- B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. Proc. IJCAI, 1981. pdf