Black-Anandan Robust Optical Flow | VitaVision
Back to atlas

Black-Anandan Robust Optical Flow

11 min readAdvancedView in graph
Based on
The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields
Black, Anandan · Computer Vision and Image Understanding 1996
DOI ↗

Goal

Estimate a dense, piecewise-smooth optical flow field (u(s),v(s))(u(s), v(s)) at every pixel site ss, or recover one or more parametric (affine) motion models from a two-frame image sequence, in the presence of multiple competing motions. Input: two consecutive grayscale frames I(x,t)I(\mathbf{x}, t) and I(x,t+1)I(\mathbf{x}, t{+}1), where x=(x,y)\mathbf{x} = (x, y). Output: either (a) a dense flow field with flow in pixels per frame, or (b) a set of affine parameter vectors a\mathbf{a} together with a per-pixel outlier indicator. Both outputs are obtained by replacing the quadratic (L2) penalty in the data conservation and spatial smoothness terms with a redescending robust M-estimator, solved via Iteratively Reweighted Least Squares (IRLS) within a Graduated Non-Convexity (GNC) continuation schedule.

Algorithm

Let IxI_x, IyI_y, ItI_t denote the partial spatial and temporal image derivatives at site ss. Let usu_s, vsv_s denote the horizontal and vertical flow at ss. Let N(s)N(s) denote the 4-connected spatial neighbourhood of ss. Let λ>0\lambda > 0 weight the smoothness term relative to the data term. Let ρ(,σ)\rho(\cdot, \sigma) denote a robust penalty function with scale parameter σ\sigma, and ψ(x,σ)=ρ/x\psi(x, \sigma) = \partial \rho / \partial x its influence function.

Two penalty functions are used throughout.

Definition
Lorentzian M-estimator

ρL(x,σ)=log ⁣(1+12 ⁣(xσ) ⁣2),ψL(x,σ)=2x2σ2+x2.\rho_L(x, \sigma) = \log\!\left(1 + \tfrac{1}{2}\!\left(\frac{x}{\sigma}\right)^{\!2}\right), \qquad \psi_L(x, \sigma) = \frac{2x}{2\sigma^2 + x^2}.

The IRLS weight is wL(x,σ)=ψL(x,σ)/x=2σ2/(2σ2+x2)w_L(x, \sigma) = \psi_L(x, \sigma) / x = 2\sigma^2 / (2\sigma^2 + x^2). The influence function ψL\psi_L is redescending: it equals zero when x=±2σx = \pm\sqrt{2}\,\sigma, defining the outlier threshold τL=2σ\tau_L = \sqrt{2}\,\sigma.

Definition
Geman-McClure M-estimator

ρGM(x,σ)=σx2σ+x2,ψGM(x,σ)=2σx(σ+x2)2.\rho_{GM}(x, \sigma) = \frac{\sigma\,x^2}{\sigma + x^2}, \qquad \psi_{GM}(x, \sigma) = \frac{2\sigma\,x}{(\sigma + x^2)^2}.

The IRLS weight is wGM(x,σ)=2σ/(σ+x2)2w_{GM}(x, \sigma) = 2\sigma / (\sigma + x^2)^2. The outlier threshold is τGM=σ/3\tau_{GM} = \sigma / \sqrt{3}.

Piecewise-smooth (dense) variant

The robust data energy penalises violation of the brightness-constancy constraint at each pixel.

Definition
Robust data energy E_D

ED(u,v)=sρD ⁣(Ixus+Iyvs+It,  σD),E_D(u, v) = \sum_s \rho_D\!\left(I_x u_s + I_y v_s + I_t,\; \sigma_D\right),

where ρD\rho_D is the chosen M-estimator applied to the first-order brightness-constancy residual.

The robust smoothness energy penalises large flow differences between neighbouring sites.

Definition
Robust smoothness energy E_S

ES(u,v)=snN(s) ⁣[ρS ⁣(usun,  σS)+ρS ⁣(vsvn,  σS)],E_S(u, v) = \sum_s \sum_{n \in N(s)} \!\left[\rho_S\!\left(u_s - u_n,\; \sigma_S\right) + \rho_S\!\left(v_s - v_n,\; \sigma_S\right)\right],

where ρS\rho_S is the chosen M-estimator for the inter-pixel flow difference.

Definition
Total robust energy E

E(u,v)=ED(u,v)+λES(u,v).E(u, v) = E_D(u, v) + \lambda\, E_S(u, v).

Minimisation proceeds by SOR within a GNC continuation schedule. The SOR update for usu_s at iteration nn is

usn+1=usnωT(us)Eus,u_s^{n+1} = u_s^n - \frac{\omega}{T(u_s)} \cdot \frac{\partial E}{\partial u_s},

where ω(0,2)\omega \in (0, 2) is the overrelaxation parameter and T(us)T(u_s) is an upper bound on 2E/us2\partial^2 E / \partial u_s^2. The partial derivative expands as

Eus=IxψD(Ixus+Iyvs+It,  σD)+λnN(s)ψS(usun,  σS).\frac{\partial E}{\partial u_s} = I_x\,\psi_D(I_x u_s + I_y v_s + I_t,\; \sigma_D) + \lambda \sum_{n \in N(s)} \psi_S(u_s - u_n,\; \sigma_S).

The optimal overrelaxation parameter for an n×nn \times n image is

βmax=cos ⁣(πn+1),ωopt=21+1βmax2.\beta_{\max} = \cos\!\left(\frac{\pi}{n+1}\right), \qquad \omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \beta_{\max}^2}}.

To avoid local minima from the non-convex ρ\rho, the scale σ\sigma is initialised large enough that the objective is convex. For the Lorentzian, convexity holds when ρL>0\rho_L'' > 0 for all xx, which requires σinitrmax/2\sigma_{\mathrm{init}} \geq r_{\max} / \sqrt{2}, where rmaxr_{\max} is the maximum expected residual magnitude. σ\sigma is then reduced linearly toward the target scale over 6 continuation stages. A 3-level Gaussian image pyramid is used; at each level, 20 SOR iterations are performed per continuation stage and the flow estimate is upsampled as initialisation for the finer level. Reference parameter ranges from §6: σD\sigma_D from 18/218/\sqrt{2} to 5/25/\sqrt{2}; σS\sigma_S from 3/23/\sqrt{2} to 0.03/20.03/\sqrt{2}; λD=5\lambda_D = 5, λS=1\lambda_S = 1.

Algorithm
Dense piecewise-smooth flow
Input: Two grayscale frames ItI_t, It+1I_{t+1}; scale parameters σD\sigma_D, σS\sigma_S; smoothness weight λ\lambda; GNC stages KK; pyramid levels LL; SOR iterations MM per stage.
Output: Dense flow field (us,vs)(u_s, v_s) for all sites ss; per-pixel outlier indicator os=1[rsτ]o_s = \mathbf{1}[|r_s| \geq \tau].
  1. Build an LL-level Gaussian image pyramid for each frame.
  2. Initialise (u,v)=(0,0)(u, v) = (0, 0) at the coarsest level.
  3. For each pyramid level from coarsest to finest: warp It+1I_{t+1} toward ItI_t using the current flow (backward warp); recompute IxI_x, IyI_y, ItI_t from the warped frame pair; set σσinit\sigma \leftarrow \sigma_{\mathrm{init}}; for each of KK GNC stages run MM SOR iterations updating usu_s and vsv_s at every site, then reduce σ\sigma linearly toward the target; upsample (u,v)(u, v) to the next finer level.
  4. Classify site ss as outlier if Ixus+Iyvs+Itτ|I_x u_s + I_y v_s + I_t| \geq \tau.

Parametric (affine) variant

The 6-parameter affine flow model at pixel x=(x,y)\mathbf{x} = (x, y) is

u(x;a)=(a0+a1x+a2ya3+a4x+a5y).\mathbf{u}(\mathbf{x}; \mathbf{a}) = \begin{pmatrix} a_0 + a_1 x + a_2 y \\ a_3 + a_4 x + a_5 y \end{pmatrix}.

The robust parametric energy over a region RR is

Ep(a)=xRρ ⁣((I)Tu(x;a)+It,  σ).E_p(\mathbf{a}) = \sum_{\mathbf{x} \in R} \rho\!\left((\nabla I)^T \mathbf{u}(\mathbf{x}; \mathbf{a}) + I_t,\; \sigma\right).

Minimisation proceeds by SOR over the six parameters a0,,a5a_0, \ldots, a_5 with the GNC σ\sigma-schedule. The SOR update for each aia_i is

ain+1=ainωTaiEpai,a_i^{n+1} = a_i^n - \frac{\omega}{T_{a_i}} \cdot \frac{\partial E_p}{\partial a_i},

where Tai=x{x2,y2,1,}maxxρ(x,σ)T_{a_i} = \sum_{\mathbf{x}} \{x^2, y^2, 1, \ldots\} \cdot \max_x \rho''(x, \sigma) bounds the second derivative. The overrelaxation parameter is fixed at ω=1.995\omega = 1.995 and the GNC schedule reduces σ\sigma geometrically: σi+1=0.95σi\sigma_{i+1} = 0.95\,\sigma_i.

After recovering the dominant motion parameters a\mathbf{a}^*, pixels where rx=(I)Tu(x;a)+Itτ|r_\mathbf{x}| = |(\nabla I)^T \mathbf{u}(\mathbf{x}; \mathbf{a}^*) + I_t| \geq \tau are collected as outliers. The residual set is re-submitted to the same robust regression to recover a second motion; the process repeats until no consistent motion remains in the residuals.

Algorithm
Parametric (affine) motion estimation
Input: Two grayscale frames ItI_t, It+1I_{t+1}; region RR; initial σinit\sigma_{\mathrm{init}}; ω=1.995\omega = 1.995; convergence threshold ϵ=105\epsilon = 10^{-5}; pyramid levels L=4L = 4; SOR iterations M=30M = 30 per level.
Output: One or more affine parameter vectors a(1),a(2),\mathbf{a}^{(1)}, \mathbf{a}^{(2)}, \ldots; per-pixel outlier map.
  1. Build a 4-level Gaussian image pyramid for each frame.
  2. Initialise a=0\mathbf{a} = \mathbf{0} at the coarsest level.
  3. For each pyramid level from coarsest to finest: set σσinit\sigma \leftarrow \sigma_{\mathrm{init}}; repeat until Δa<ϵ\|\Delta \mathbf{a}\|_\infty < \epsilon or MM iterations elapsed — for each aia_i compute Ep/ai\partial E_p / \partial a_i and apply the SOR update, then reduce σ\sigma by σ0.95σ\sigma \leftarrow 0.95\,\sigma; upsample a\mathbf{a} to the next finer level.
  4. Collect outlier pixels where rxτ|r_\mathbf{x}| \geq \tau.
  5. If the outlier set is large enough, submit it to step 2 to recover a(2)\mathbf{a}^{(2)}.
  6. Repeat step 5 until no consistent motion remains.

Implementation

The per-pixel IRLS update for the dense variant in Rust, implementing the data-plus-smoothness gradient E/us\partial E / \partial u_s and the SOR step. The snippet covers one horizontal-component update; the vertical component vsv_s follows the same pattern with IyI_y in place of IxI_x.

/// Lorentzian IRLS weight: w(x, σ) = 2σ² / (2σ² + x²).
#[inline]
fn w_lorentzian(x: f32, sigma: f32) -> f32 {
    let s2 = 2.0 * sigma * sigma;
    s2 / (s2 + x * x)
}

/// One SOR update for u_s (horizontal flow at site s).
fn sor_update_u(
    r: f32,            // brightness-constancy residual I_x·u_s + I_y·v_s + I_t
    ix: f32,           // spatial gradient I_x at s
    neighbours: &[f32],// (u_n) for n ∈ N(s); length 4 on a 4-connected grid
    u_s: f32,
    sigma_d: f32,
    sigma_s: f32,
    lambda: f32,
    omega: f32,
    t_upper: f32,      // upper bound on ∂²E/∂u_s²
) -> f32 {
    // ψ_D(r, σ_D) · I_x, with ψ = w · x.
    let grad_data = ix * (w_lorentzian(r, sigma_d) * r);
    // λ · Σ_{n∈N(s)} ψ_S(u_s − u_n, σ_S).
    let grad_smooth: f32 = lambda * neighbours.iter().map(|&u_n| {
        let diff = u_s - u_n;
        w_lorentzian(diff, sigma_s) * diff
    }).sum::<f32>();
    u_s - (omega / t_upper) * (grad_data + grad_smooth)
}

The vectorised inner loop in Python:

import numpy as np

def w_lorentzian(x, sigma):
    s2 = 2.0 * sigma ** 2
    return s2 / (s2 + x ** 2)

def sor_update_u(r, ix, u, u_nbrs, sigma_d, sigma_s, lam, omega, t_upper):
    grad_data = ix * (w_lorentzian(r, sigma_d) * r)
    diffs = u[None] - u_nbrs                              # (4, H, W)
    grad_smooth = lam * (w_lorentzian(diffs, sigma_s) * diffs).sum(axis=0)
    return u - (omega / t_upper) * (grad_data + grad_smooth)

Remarks

  • O(KMN)O(K \cdot M \cdot N) work per pyramid level for the dense variant, where KK is the number of GNC continuation stages, MM is the SOR iteration count per stage, and NN is the number of pixels. Reference values are K=6K = 6, M=20M = 20 for the dense variant, and M=30M = 30 per pyramid level for the parametric variant.
  • Both the data term and the smoothness term must use robust penalties. Robustifying the smoothness term alone — the approach taken by classical line-process methods — leaves the data term quadratic and can increase flow error relative to the fully-quadratic baseline.
  • The Lorentzian outlier threshold is τL=2σ\tau_L = \sqrt{2}\,\sigma, derived from where ρL=0\rho_L'' = 0. The Geman-McClure threshold is τGM=σ/3\tau_{GM} = \sigma / \sqrt{3}. Sites where the final residual magnitude exceeds τ\tau are classified as motion outliers; this yields an explicit outlier map as a by-product of estimation.
  • Local minima in the non-convex energy are avoided via the GNC σ\sigma-schedule: σ\sigma is initialised at σinitrmax/2\sigma_{\mathrm{init}} \geq r_{\max} / \sqrt{2} (Lorentzian) or σinitrmax3\sigma_{\mathrm{init}} \geq r_{\max} \cdot \sqrt{3} (Geman-McClure) so the objective is globally convex, then lowered until the true non-convex penalty is recovered. A geometric schedule σi+1=0.95σi\sigma_{i+1} = 0.95\,\sigma_i is used for the parametric variant; a linear ramp is preferred for the dense variant.
  • The Gaussian pyramid handles displacements larger than one pixel per frame. Motions exceeding the coarse-level resolution cause hard failure; transparency that does not yield a dominant motion in a local region and global illumination changes are not handled by the brightness-constancy data term.
  • The redescending M-estimator is equivalent to an analog (continuous-valued) outlier process: an explicit line-process layer on the flow field is therefore unnecessary when a redescending ρ\rho is used.

References

  1. M. J. Black and P. Anandan. The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields. Computer Vision and Image Understanding, 63(1)
    –104, 1996. doi
    .1006/cviu.1996.0006
  2. B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Artificial Intelligence, 17
    –203, 1981. hdl
    .1/6337
  3. B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. Proc. IJCAI, 1981. pdf

Extends

  • Horn-Schunck Optical Flow

    Robust M-estimator extension of the quadratic data and smoothness terms; non-convex but more tolerant of outliers and motion discontinuities.

  • Lucas-Kanade Image Registration

    Robust M-estimator version of the parametric variant; same machinery as Black-Anandan's piecewise-smooth flow.