Bundle Adjustment | VitaVision
Back to atlas

Bundle Adjustment

7 min readAdvancedView in graph
Based on
A Flexible New Technique for Camera Calibration
Zhang · IEEE Transactions on Pattern Analysis and Machine Intelligence 2000
DOI ↗

Definition

Bundle adjustment is the joint nonlinear least-squares refinement of all camera parameters — intrinsics, lens-distortion coefficients, and per-view extrinsics — and, in the general structure-from-motion case, all 3-D point positions, that minimises the total reprojection error across every observed image point.

Definition
Bundle-adjustment objective

Given nn views and mm 3-D points, with mijm_{ij} the observed image coordinates of point jj in view ii, the bundle-adjustment problem is

minK,k,{Ri,ti},{Mj}ijmijm^(K,k,Ri,ti,Mj)2,\min_{K,\,k,\,\{R_i, t_i\},\,\{M_j\}} \sum_i \sum_j \bigl\| m_{ij} - \hat{m}(K, k, R_i, t_i, M_j) \bigr\|^2,

where m^()\hat{m}(\cdot) is the predicted pixel position obtained by projecting MjM_j through the pinhole model. Input: initial parameter estimates and observed correspondences. Output: the maximum-likelihood parameter set under i.i.d. Gaussian image noise.

In camera calibration the 3-D point positions MjM_j are fixed by the target geometry and only the camera parameters are free; in structure-from-motion the points are unknown and refined jointly.

Mathematical Description

Reprojection-error objective

Under i.i.d. zero-mean Gaussian image noise of equal variance in both pixel coordinates, the maximum-likelihood estimate of all parameters is exactly the minimiser of the sum of squared Euclidean reprojection errors,

minijmijm^(K,k,Ri,ti,Mj)2.\min \sum_i \sum_j \bigl\| m_{ij} - \hat{m}(K, k, R_i, t_i, M_j) \bigr\|^2.

Zhang's final calibration stage minimises this jointly over the five intrinsic parameters, the two radial-distortion coefficients, and the per-view extrinsics — a full bundle adjustment restricted to the camera variables. Weng et al. minimise the identical pixel-residual sum over the non-distortion intrinsics and the five distortion coefficients.

Levenberg-Marquardt solver

The standard solver is Levenberg-Marquardt, a damped Gauss-Newton method. With residual vector r(θ)r(\theta) and Jacobian J=r/θJ = \partial r/\partial\theta, one iteration solves the damped normal equations

(JJ+λD)δ=Jr,\bigl(J^\top J + \lambda D\bigr)\,\delta = -J^\top r,

where λ>0\lambda > 0 is the damping parameter and DD is typically diag(JJ)\mathrm{diag}(J^\top J). Large λ\lambda makes δ\delta a scaled steepest-descent step; small λ\lambda approaches the Gauss-Newton step. The algorithm raises λ\lambda when a step fails to decrease the cost and lowers it when a step is accepted, interpolating between robust gradient descent far from the minimum and fast quadratic convergence near it. Zhang reports convergence in three to five iterations from a closed-form linear initialisation; Tsai's stage-2 refinement is a two-iteration solve restricted to three unknowns.

Sparse block structure

In the full structure-from-motion case each scalar residual depends only on the parameters of one camera and one point. The resulting JJJ^\top J is block-sparse — camera-camera and point-point diagonal blocks with camera-point off-diagonal blocks. The Schur complement eliminates the numerous small point blocks, leaving a reduced camera-only system; this is what keeps bundle adjustment tractable for thousands of views and millions of points. In the calibration-only case the 3-D points are fixed, there are no point blocks, and the normal equations are dense in the camera parameters.

Initialisation from linear estimates

The objective is non-convex and Levenberg-Marquardt converges only to a local minimum, so a sufficiently accurate closed-form initialisation is required to place the solver in the basin of the global minimum. Zhang's linear stage solves a homogeneous system for the image of the absolute conic, extracts the intrinsics in closed form, and recovers per-view extrinsics from the homographies — the seed for the nonlinear refinement. Tsai's radial-alignment-constraint linear stage and Weng's central-point linear solve play the same role. In every case the final nonlinear refinement is a bundle adjustment seeded by a linear estimate.

Numerical Concerns

Initialisation dependence. The reprojection-error objective is non-convex; solution quality depends critically on the initial estimate. Zhang notes that the closed-form linear estimate of the first distortion coefficient can carry the wrong sign — a local-minimum risk that joint refinement resolves only because the other parameters are already well initialised.

Rotation parameterisation. Rotations lie on the three-dimensional manifold SO(3)SO(3); adding an unconstrained perturbation to a rotation matrix breaks orthogonality. The Rodrigues 3-vector keeps the nine matrix entries consistent and the Jacobian unconstrained; a unit quaternion is the alternative, with a unit-norm constraint enforced by normalisation after each update.

Gauge freedom. In unconstrained structure-from-motion the absolute scale is unobservable — rescaling all points and translations leaves every reprojection error unchanged — so JJJ^\top J is rank-deficient and a gauge-fixing convention is required. With a known calibration target the point positions are fixed and gauge freedom does not arise.

Jacobian conditioning. Parameters of very different scale — focal lengths in pixels (103\sim 10^3) versus distortion coefficients (103\sim 10^{-3}) — produce JJ columns of very different norm and a poorly scaled JJJ^\top J. Marquardt damping with D=diag(JJ)D = \mathrm{diag}(J^\top J) normalises each parameter direction to its own curvature; double precision is required for the small intermediate quantities.

Schur-complement cost. Eliminating point blocks introduces fill-in in the reduced camera matrix whenever two cameras share a point; for large problems the reduced system's sparsity must be analysed before choosing a direct Cholesky or a preconditioned-conjugate-gradient solver.

Outlier sensitivity. The least-squares objective penalises all residuals quadratically, so a single mismatch can dominate the Jacobian. Robust loss functions — Huber, Cauchy, truncated quadratic — grow more slowly for large residuals, approximating maximum likelihood under a heavy-tailed error distribution; standard practice alternates robust bundle adjustment with outlier rejection.

Where it appears

Bundle adjustment — in the restricted form of joint camera-parameter refinement over a fixed 3-D pattern — is the final and most accurate step of every classical camera-calibration pipeline in the atlas.

  • zhang-planar-calibration — the nonlinear Levenberg-Marquardt refinement is the bundle-adjustment stage: all intrinsics, both distortion coefficients, and all per-view extrinsics refined jointly by minimising the total reprojection error, seeded by the closed-form IAC solve.
  • tsai-versatile-calibration — stage 2 is a restricted bundle adjustment: with rotation and the lateral translation fixed from the linear stage, only focal length, depth translation, and one distortion coefficient are refined.
  • scaramuzza-omni-calibration — the omnidirectional model's final nonlinear refinement minimises the same reprojection-error objective with the same solver, over the polynomial coefficients and per-view extrinsics.

Bundle adjustment is the numerical procedure that recovers the parameters of the pinhole-camera-model to maximum-likelihood accuracy from image observations.

References

  1. B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon. Bundle Adjustment — A Modern Synthesis. In Vision Algorithms: Theory and Practice, LNCS 1883, Springer, 2000.
  2. Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11)
    –1334, 2000.
  3. R. Y. Tsai. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology. IEEE Journal on Robotics and Automation, 3(4)
    –344, 1987.
  4. J. Weng, P. Cohen, M. Herniou. Camera Calibration with Distortion Models and Accuracy Evaluation. IEEE TPAMI, 14(10)
    –980, 1992.
  5. R. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.