Goal
Stitch two overlapping images, given a sparse set of SIFT correspondences between them, without requiring the views to be related by a single global parametric warp. Output: a per-pixel warp from the base image to the target, defined as a smoothly varying affine field that interpolates over the base-image feature positions and extrapolates smoothly into non-overlapping regions. The field and the soft correspondence assignments are estimated jointly by an EM-style loop derived from the Coherent Point Drift (CPD) framework.
Algorithm
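Take the SIFT feature positions in the base image (indexed by $i$) and in the target (indexed by $j$). The global affine $A_{\text{global}}$ is RANSAC-initialised from corresponding pairs. Each base feature carries its own affine
$$A_i = A_{\text{global}} + \Delta A_i$$
where the deviation $\Delta A_i$ is regularised to be smooth across the image. The matrix $\Delta A$ collects all deviations.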
The Gaussian-Fourier weight on the deviation field is shown to reduce (Appendix A of paper) to the closed form
$$\operatorname{tr}\!\left(\Delta A\, G^{+}\, \Delta A^{\top}\right)$$
where $G$ is the Gaussian affinity matrix between feature positions and $G^{+}$ is its pseudo-inverse. This is a Tikhonov-style penalty: deviations correlated by spatial proximity (high $G_{ij}$) are inexpensive; high-frequency deviations are heavily penalised.
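The Tikhonov-style behaviour can be checked numerically. The sketch below is illustrative only: it assumes the penalty takes the form $\operatorname{tr}(\Delta A\, G^{+}\, \Delta A^{\top})$ (consistent with the M-step used later, but not copied from the paper), and the function names, the kernel width `gamma`, and the toy feature layout are all hypothetical.

```python
import numpy as np

def gaussian_affinity(points, gamma=1.0):
    """N x N matrix with entries exp(-||p_i - p_j||^2 / (2 * gamma^2))."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * gamma**2))

def smoothness_penalty(delta_a, G):
    """tr(dA G^+ dA^T) for a (6, N) deviation matrix dA (assumed form)."""
    return float(np.trace(delta_a @ np.linalg.pinv(G) @ delta_a.T))

# Ten features on a line with unit spacing (hypothetical normalised coordinates).
pts = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
G = gaussian_affinity(pts)

# Same six affine parameters per feature, varying slowly vs flipping sign.
smooth = np.tile(np.linspace(0.0, 0.1, 10), (6, 1))
jagged = np.tile(0.1 * (-1.0) ** np.arange(10), (6, 1))

penalty_smooth = smoothness_penalty(smooth, G)
penalty_jagged = smoothness_penalty(jagged, G)
print(f"smooth: {penalty_smooth:.4f}  jagged: {penalty_jagged:.4f}")
```

The alternating field has the same magnitude as the ramp's largest entry but is penalised far more heavily, which is the property the regulariser relies on.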
The alignment cost is a Gaussian mixture between warped base features and target features, with a uniform component for outlier handling.
Total objective (Eq. 6 of paper): the alignment cost plus $\lambda$ times the smoothness penalty.
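To make the role of the uniform component concrete, here is a minimal sketch of the soft assignments such a mixture implies. It uses the posterior from the original CPD formulation for 2-D points; the paper's exact constants may differ, and `soft_assign`, `outlier_weight`, and the toy points are assumptions introduced for illustration.

```python
import numpy as np

def soft_assign(warped_base, target, sigma, outlier_weight=0.1):
    """CPD-style posterior phi[i, j] that target j was generated by base feature i."""
    n = warped_base.shape[0]
    m = target.shape[0]
    # Squared distances between every warped base feature and every target feature.
    d2 = np.sum((warped_base[:, None, :] - target[None, :, :]) ** 2, axis=-1)
    gauss = np.exp(-d2 / (2.0 * sigma**2))
    # Constant contributed by the uniform outlier component (CPD form, 2-D points).
    uniform = 2.0 * np.pi * sigma**2 * outlier_weight / (1.0 - outlier_weight) * n / m
    return gauss / (gauss.sum(axis=0, keepdims=True) + uniform)

# Three base features, three nearby targets, and one target far from everything.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([[0.02, -0.01], [0.98, 0.03], [0.01, 1.02], [5.0, 5.0]])
phi = soft_assign(base, target, sigma=0.1)
print(np.round(phi, 3))
```

Each column sums to less than one; the remainder is the probability mass absorbed by the outlier component, which is why the far-away target ends up with essentially zero weight on every base feature.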
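- Initialise. $\Delta A = 0$, $\sigma_t = 1$.
- Outer annealing loop: for each $\sigma_t$ in the schedule:
  - E-step. Compute soft assignment weights $\phi_{ij}$ from the current $\Delta A$.
  - M-step. Solve the closed-form linear system (Eq. 8): $\Delta A = -CG/(2\lambda)$, where $C$ is the matrix of weighted residuals from the E-step.
- Stitching field at query point $z$ in the source image: $v(z)$, a Gaussian-kernel interpolation of the per-feature deviations.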
- Compose with the global affine to produce the warped pixel position; render via Poisson blending with optimal seam finding.
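The last two steps can be sketched as standard Gaussian-kernel interpolation of the per-feature deviations, followed by applying the resulting affine at each query point. This is an illustration under assumptions, not the paper's code: the 6-vector parameter layout `[a, b, tx, c, d, ty]`, the kernel width `gamma`, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Pairwise Gaussian affinities between point sets a (N, 2) and b (Q, 2)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * gamma**2))

def affine_field(query, feats, delta_a, a_global, gamma=1.0):
    """Affine parameters (Q, 6) at each query point.

    delta_a is (6, N): one deviation column per base feature (assumed layout).
    """
    G = gaussian_kernel(feats, feats, gamma)
    weights = delta_a @ np.linalg.pinv(G)      # (6, N) kernel coefficients
    K = gaussian_kernel(feats, query, gamma)   # (N, Q)
    return (a_global[:, None] + weights @ K).T

def warp_points(query, params):
    """Apply the per-point affine [[a, b, tx], [c, d, ty]] to each query point."""
    A = params.reshape(-1, 2, 3)
    homog = np.concatenate([query, np.ones((len(query), 1))], axis=1)
    return np.einsum("qij,qj->qi", A, homog)

# Toy data: a 3 x 3 grid of features with small random deviations.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(3.0), np.arange(3.0))
feats = np.stack([gx.ravel(), gy.ravel()], axis=1)
delta_a = 0.01 * rng.standard_normal((6, len(feats)))
a_global = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # identity affine

at_feats = affine_field(feats, feats, delta_a, a_global)
far_query = np.array([[50.0, 50.0]])
far_warp = warp_points(far_query, affine_field(far_query, feats, delta_a, a_global))
```

At the feature positions the field reproduces each feature's own affine. In this sketch the kernel contribution vanishes far from all features, so the warp there falls back to the global affine.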
```mermaid
flowchart TB
A["SIFT correspondences"] --> B["RANSAC<br/>global affine"]
B --> C["Init ΔA = 0, σ_t = 1"]
C --> D["E-step: soft assign φ_ij"]
D --> E["M-step: ΔA = -CG/(2λ)"]
E --> F{"σ_t < 0.1?"}
F -- "no" --> G["σ_t ← 0.97 · σ_t"]
G --> D
F -- "yes" --> H["Field v(z)<br/>via Gaussian kernel"]
H --> I["Poisson blend +<br/>seam find"]
```
Remarks
- Affine vs projective extrapolation. SVA's affine field interpolates correctly within the overlap region but extrapolates as a smoothly varying affine map. For a translating camera observing a non-planar scene, the correct extrapolation is projective. The APAP paper's Fig. 1b shows this explicitly: SVA's extrapolated portion drifts from ground truth where APAP's projective extrapolation tracks. This is APAP's stated motivation for upgrading the local model from affine to projective.
- Hard failure: depth discontinuities. The smoothness regulariser cannot represent a step change in motion; foreground objects at sharply different depths from the background produce mean errors 2–3× higher than smooth-depth scenes (Fig. 5 in paper: 4.57 px with a depth discontinuity vs 1.92 px for smooth depth, on 500×500 synthetic images).
- Coordinate normalisation. §3 normalises feature positions to zero mean and unit variance before EM, making the parameter settings approximately image-size-invariant. Scene-to-scene depth variation is not normalised, so parameter tuning is still required for novel scenes.
- $O(N^3)$ pseudo-inverse cost. The Gaussian affinity matrix $G$ is $N \times N$ for $N$ base features; computing $G^{+}$ is $O(N^3)$. For large $N$, this is the runtime bottleneck. APAP avoids this entirely by solving a separate SVD per cell, each closed-form with no iteration.
- Annealing schedule. $\sigma_t$ decreases by a factor of 0.97 per outer iteration from 1.0 to 0.1, giving roughly 76 outer iterations ($\ln 0.1 / \ln 0.97 \approx 75.6$). Each outer iteration warm-starts from the previous; the inner M-step is closed-form. In practice, reaching the end of the schedule serves as the stopping criterion.
- Best matcher use. §5.3 of the paper shows that the joint EM also produces ~40% more correct matches than SIFT nearest-neighbour and beats A-SIFT on hard pairs — useful as a downstream matcher even when the warp itself is replaced by a different stitching method.
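The iteration count implied by the annealing schedule in the remarks above can be verified with a few lines. The sketch follows the flowchart's control flow (run an E/M pass, then test $\sigma_t < 0.1$), so it records one pass more than the closed-form count of decay steps; `annealing_schedule` is a hypothetical helper name.

```python
import math

def annealing_schedule(sigma_start=1.0, sigma_stop=0.1, decay=0.97):
    """Sigma values at which an E-step/M-step pass runs, per the flowchart."""
    sigmas = []
    sigma = sigma_start
    while True:
        sigmas.append(sigma)      # one E-step + M-step runs at this sigma
        if sigma < sigma_stop:    # the test happens after the pass
            break
        sigma *= decay
    return sigmas

schedule = annealing_schedule()
decay_steps = math.log(0.1) / math.log(0.97)  # closed-form number of decays
print(len(schedule), round(decay_steps, 2), round(schedule[-1], 4))
```

Because each pass is closed-form and warm-started, the total cost is this fixed number of passes times the per-pass cost, independent of how quickly the assignments settle.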
When to choose Lin SVA over APAP
APAP (Zaragoza 2013) replaces SVA's affine field with a per-cell projective Moving-DLT homography. The two papers are contemporary entries in the spatially-varying-warp family; APAP improves on SVA on every test in its benchmark.
| | Lin SVA (2011) | APAP (2013) |
|---|---|---|
| Local model | affine (6 DOF) | homography (8 DOF, projective) |
| Per-cell solve | iterative EM (CPD-style) over the global field | closed-form weighted DLT per cell |
| Extrapolation | affine | projective |
| Runtime | ~15 min for 1024×768 (MATLAB) | "tens of seconds" same hardware |
| Scaling cost | $O(N^3)$ pseudo-inverse on feature Gram matrix | independent solve per cell, parallelisable |
| Test RMSE on APAP's temple pair | 12.3 px | 1.4 px |
Choose Lin SVA when (1) the scene's depth variation is genuinely smooth and the affine extrapolation is adequate (gentle parallax, no large foreground objects); (2) the joint EM's correspondence-refinement side-effect is itself useful — SVA produces ~40% more matches than SIFT-NN and outperforms A-SIFT on hard pairs (§5.3); (3) the implementation simplicity of the CPD framework is attractive (the algorithm fits on one page of pseudocode). Choose APAP when accuracy or runtime is the gating requirement — APAP outperforms SVA on every test in its 5-pair benchmark, and the per-cell DLT structure is much cheaper at large feature counts. The runtime gap (15 min vs sub-minute) makes APAP the practical default whenever both are available.
References
- W.-Y. Lin, S. Liu, Y. Matsushita, T.-T. Ng, L.-F. Cheong. Smoothly Varying Affine Stitching. IEEE CVPR 2011, pp. 272–279.
- J. Zaragoza, T.-J. Chin, M. S. Brown, D. Suter. As-Projective-As-Possible Image Stitching with Moving DLT. IEEE CVPR 2013. (Direct successor; replaces affine with projective per cell.)
- A. Myronenko, X. Song, M. Carreira-Perpinan. Non-rigid point set registration: Coherent Point Drift. NIPS 2007. (CPD framework that SVA's EM adapts to image stitching.)
- J. Gao, S. J. Kim, M. S. Brown. Constructing Image Panoramas Using Dual-Homography Warping. IEEE CVPR 2011. (Contemporary alternative; two-cluster RANSAC instead of smoothly varying field.)
- T. Igarashi, T. Moscovich, J. F. Hughes. As-Rigid-As-Possible Shape Manipulation. ACM Transactions on Graphics, 2005. (Conceptual ancestor for smooth-field warps.)