As-Projective-As-Possible Image Stitching | VitaVision
Back to atlas

As-Projective-As-Possible Image Stitching

8 min readAdvancedView in graph
Based on
As-Projective-As-Possible Image Stitching with Moving DLT
Zaragoza, Chin, Brown, Suter · IEEE CVPR 2013
DOI ↗

Goal

Stitch two overlapping images II and II' given a set of point correspondences {(xi,xi)}i=1N\{(x_i, x'_i)\}_{i=1}^{N} across them. Compute, for every pixel xx_* in II, a 3×33 \times 3 projective transformation HH_* that maps the local neighbourhood of xx_* into II'. The field {H}xI\{H_*\}_{x_* \in I} varies smoothly with xx_* and reduces to a single global homography when the data are consistent with the projective model; elsewhere it deviates locally to absorb model inadequacy (parallax, non-rotational camera motion, non-planar scene). Globally projective, locally adjustable.

Algorithm

Let x~=[x,y,1]T\tilde{x} = [x, y, 1]^T denote x=[x,y]Tx = [x, y]^T in homogeneous coordinates. A homography HR3×3H \in \mathbb{R}^{3 \times 3} acts as x~=Hx~\tilde{x}' = H \tilde{x}, with hjTh_j^T the jj-th row of HH and h=vec(H)R9h = \mathrm{vec}(H) \in \mathbb{R}^9 the row-stacked vector.

For each correspondence (xi,xi)(x_i, x'_i), the constraint 03×1=x~i×Hx~i\mathbf{0}_{3 \times 1} = \tilde{x}'_i \times H \tilde{x}_i linearises to three rows in hh, of which two are independent:

ai=[01×3x~iTyix~iTx~iT01×3xix~iT]R2×9.a_i = \begin{bmatrix} \mathbf{0}_{1 \times 3} & -\tilde{x}_i^{T} & y'_i \tilde{x}_i^{T} \\ \tilde{x}_i^{T} & \mathbf{0}_{1 \times 3} & -x'_i \tilde{x}_i^{T} \end{bmatrix} \in \mathbb{R}^{2 \times 9}.

Stacking aia_i for all ii yields the DLT design matrix AR2N×9A \in \mathbb{R}^{2N \times 9}.

Definition
Global DLT homography (\hat{h})

The single homography that minimises the algebraic residual on all correspondences.

h^=argminh=1Ah2,\hat{h} = \arg\min_{\|h\| = 1} \|A h\|^2,

obtained as the right singular vector of AA with smallest singular value.

Definition
Moving DLT weights (w_*^i)

A Gaussian locality kernel on the source-image distance from a query point xx_* to each correspondence's source position xix_i, clipped from below by γ\gamma for stability.

wi=max ⁣(exp ⁣(xxi2σ2),  γ),w_*^i = \max\!\left(\exp\!\left(-\frac{\|x_* - x_i\|^2}{\sigma^2}\right),\; \gamma\right),

with bandwidth σ>0\sigma > 0 and floor γ[0,1]\gamma \in [0, 1]. As γ1\gamma \to 1 the field collapses to the global homography; as γ0\gamma \to 0 extrapolation regions become numerically singular.

Definition
Moving-DLT homography (h_*)

The location-dependent homography at query xx_*, fit by weighting each correspondence's DLT contribution by wiw_*^i.

h=argminh=1WAh2,h_* = \arg\min_{\|h\| = 1} \|W_* A h\|^2,

with W=diag([w1,w1,w2,w2,,wN,wN])R2N×2NW_* = \mathrm{diag}([w_*^1,\, w_*^1,\, w_*^2,\, w_*^2,\, \ldots,\, w_*^N,\, w_*^N]) \in \mathbb{R}^{2N \times 2N} — each weight repeats once for each of the two rows of aia_i. The solution is the right singular vector of WAW_* A with smallest singular value.

Procedure

Algorithm
As-projective-as-possible image stitching
Input: Source image II, target image II', correspondences {(xi,xi)}i=1N\{(x_i, x'_i)\}_{i=1}^{N}, bandwidth σ\sigma, weight floor γ\gamma, grid resolution C1×C2C_1 \times C_2.
Output: A per-cell homography field {H}\{H_*\} and the warped, blended composite of II and II'.
  1. Detect and match keypoints between II and II'. Run RANSAC with a DLT minimal solver to discard outlier correspondences.
  2. Apply Hartley pre-conditioning: translate each point set to centroid zero and isotropically scale so the mean distance from the origin is 2\sqrt{2}. Build the conditioned design matrix AA once.
  3. Partition II into a uniform C1×C2C_1 \times C_2 grid of cells. Take each cell's centre as the query xx_*.
  4. For every cell, compute the weight vector {wi}i=1N\{w_*^i\}_{i=1}^{N} from xx_* and {xi}\{x_i\}.
  5. For every cell, take the right singular vector of WAW_* A with smallest singular value as hh_*, reshape to HH_*, and de-normalise via HT1HTH_* \leftarrow T'^{-1} H_* T.
  6. Warp every pixel inside a cell using that cell's HH_*. Composite the warped source onto the target.

Implementation

Per-cell Moving DLT in Rust. The function takes the once-conditioned design matrix AA and returns the local homography at one cell centre — the kernel that steps 4–5 of the procedure invoke C1C2C_1 \cdot C_2 times.

use nalgebra::{DMatrix, Matrix3};

fn moving_dlt(
    a: &DMatrix<f64>,        // 2N×9 conditioned DLT design matrix
    src_pts: &[[f64; 2]],    // source-image positions of the N matches
    cell_centre: [f64; 2],   // query x_*
    sigma: f64,
    gamma: f64,
) -> Matrix3<f64> {
    let n = src_pts.len();
    let s2 = sigma * sigma;
    // Eq. 11: per-match Gaussian weights with floor γ.
    let w: Vec<f64> = src_pts.iter().map(|p| {
        let dx = p[0] - cell_centre[0];
        let dy = p[1] - cell_centre[1];
        (-(dx * dx + dy * dy) / s2).exp().max(gamma)
    }).collect();
    // Eq. 10: scale row 2i and row 2i+1 of A by w_i so that ‖W_* A h‖² = ∑ w_i² ‖a_i h‖².
    let mut wa = a.clone();
    for i in 0..n {
        let wi = w[i];
        for j in 0..9 {
            wa[(2 * i, j)]     *= wi;
            wa[(2 * i + 1, j)] *= wi;
        }
    }
    // Eq. 9: smallest right singular vector of W_* A.
    let svd = wa.svd(false, true);
    let v_t = svd.v_t.expect("SVD V^T");
    let h = v_t.row(8).transpose();      // last row of V^T → smallest singular direction
    Matrix3::new(
        h[0], h[1], h[2],
        h[3], h[4], h[5],
        h[6], h[7], h[8],
    )
}

Each branch of the function corresponds to one equation: the weight loop is Eq. (11), the row scaling is Eq. (10), and the SVD line is Eq. (9). Hartley pre-conditioning (and de-normalisation HT1HTH_* \leftarrow T'^{-1} H_* T) live one level up, since they are applied once outside the per-cell loop.

Remarks

  • Complexity: each per-cell solve is one SVD of a 2N×92N \times 9 matrix, O(N92+93)=O(N)O(N \cdot 9^2 + 9^3) = O(N) in NN; the full warp is O(C1C2N)O(C_1 C_2 \cdot N) for the field plus O(Ω)O(|\Omega|) for the pixel warp. The paper documents an O(m2log2m)O(m^2 \log^2 m) rank-one SVD update to exploit the observation that most cells share most weights.
  • The bandwidth σ\sigma and the floor γ\gamma trade locality against stability. Small σ\sigma tightens locality and improves overlap-region alignment; small γ\gamma frees the warp in data-poor regions and risks degeneracy. The paper's defaults span σ[8,12]\sigma \in [8, 12] pixels and γ[0.0025,0.025]\gamma \in [0.0025, 0.025] for 1024×7681024 \times 768 to 1500×20001500 \times 2000 images.
  • The warp reduces gracefully to a global homography in two limits: as γ1\gamma \to 1 all weights equalise and every cell solves the same DLT; as the inter-camera translation tends to zero, the data become projectively consistent and every weighted DLT recovers the same HH.
  • The estimator is local in the source image only. Two correspondences with similar xix_i but very different xix'_i — moving objects, occluding edges — receive equal weights and pull the local homography toward an average, producing visible misalignment. The paper relies on RANSAC to remove such matches before MDLT and on downstream blending or seam cutting to absorb residuals.
  • Cell partitioning is a computational shortcut, not a regularisation. Within a cell every pixel uses the same HH_*; the field is piecewise-constant in HH but still continuous in the warped pixel position to within cell-boundary precision, since neighbouring cells solve nearly identical weighted SVDs.
  • Common extensions name the limitation they address: the as-natural-as-possible warp (Lin 2015) attaches a global similarity prior to the boundary cells to suppress perspective distortion in extrapolation regions; bundle-adjusted multi-image stitching iterates MDLT and a shared similarity over >2> 2 views.
  • Compared with Gao DHW: see When to choose Gao DHW over APAP on the Gao page, which hosts the comparison per the older-paper-hosts rule.
  • Compared with Lin SVA: see When to choose Lin SVA over APAP on the Lin SVA page, which hosts the comparison per the older-paper-hosts rule.

References

  1. J. Zaragoza, T.-J. Chin, M. S. Brown, D. Suter. As-Projective-As-Possible Image Stitching with Moving DLT. IEEE CVPR, 2013. DOI: 10.1109/CVPR.2013.303
  2. R. I. Hartley. In Defense of the Eight-Point Algorithm. IEEE TPAMI, 1997. DOI: 10.1109/34.601246
  3. S. Schaefer, T. McPhail, J. Warren. Image Deformation Using Moving Least Squares. ACM SIGGRAPH, 2006. DOI: 10.1145/1141911.1141920
  4. J. Gao, S. J. Kim, M. S. Brown. Constructing Image Panoramas Using Dual-Homography Warping. IEEE CVPR, 2011. DOI: 10.1109/CVPR.2011.5995433
  5. W.-Y. Lin, S. Liu, Y. Matsushita, T.-T. Ng, L.-F. Cheong. Smoothly Varying Affine Stitching. IEEE CVPR, 2011. DOI: 10.1109/CVPR.2011.5995314

Generalises

  • Gao Dual-Homography Stitching

    APAP's continuous grid of per-cell homographies subsumes the two-plane parametrisation; the two methods are not peer practitioner choices.

  • medium
    Lin Smoothly Varying Affine Stitching

    Affine deviation field remains a useful baseline; APAP's projective per-cell grid is more general but not strictly necessary for moderate-parallax planar-scene panoramas.