Goal
Given images of a planar target with known metric coordinates, compute the camera intrinsic matrix , two radial distortion coefficients , and the per-view rigid pose . No 3D calibration object is required and the motion between views need not be known; the pattern and the camera may move freely relative to each other. The defining property is a closed-form linear initialization that turns each view's plane-to-image homography into two linear constraints on the image of the absolute conic.
Algorithm
Let be an image of a planar target. Let denote a target point in the world frame (plane on ) and its image projection; their homogeneous forms are and .
The pinhole model with intrinsic matrix and extrinsic reads
where is the -th column of , is the principal point, are axis scale factors, and is the skew. Since , the third column of drops out and the model collapses to a homography
with an unknown non-zero scalar absorbing the homography's scale ambiguity.
A symmetric matrix encoding the intrinsic parameters; its vectorisation is what the linear step estimates.
The row induced by ; two such rows per homography constrain .
Orthonormality of and yields two linear constraints on per view:
Stacking the rows from views produces a matrix and the homogeneous system .
-
For each view , estimate the homography by DLT on normalized point coordinates followed by Levenberg-Marquardt refinement of the geometric reprojection error.
-
For each , build the row pair and ; stack them into . If , append the zero-skew row .
-
Solve by SVD of ; take as the right singular vector associated with the smallest singular value.
-
Recover the intrinsics from in closed form:
-
For each view, set and
Project to by SVD ().
-
Initialise (or by linear least squares on the residuals of the distortion-free model).
-
Refine by Levenberg-Marquardt minimisation of the total reprojection error
where is the pinhole projection of followed by radial distortion
with the normalised image coordinates and the analogous expression for .
Implementation
The linear initialization — building from homographies and extracting from the right null vector — in Rust:
use nalgebra::{Matrix3, Matrix6xX, Vector6, DMatrix};
fn v_row(h: &Matrix3<f64>, i: usize, j: usize) -> Vector6<f64> {
let (hi, hj) = (h.column(i), h.column(j));
Vector6::new(
hi[0]*hj[0],
hi[0]*hj[1] + hi[1]*hj[0],
hi[1]*hj[1],
hi[2]*hj[0] + hi[0]*hj[2],
hi[2]*hj[1] + hi[1]*hj[2],
hi[2]*hj[2],
)
}
fn intrinsics_from_homographies(hs: &[Matrix3<f64>]) -> Matrix3<f64> {
let mut v = DMatrix::<f64>::zeros(2 * hs.len(), 6);
for (i, h) in hs.iter().enumerate() {
let v12 = v_row(h, 0, 1);
let d = v_row(h, 0, 0) - v_row(h, 1, 1);
v.row_mut(2*i ).copy_from(&v12.transpose());
v.row_mut(2*i + 1).copy_from(&d.transpose());
}
let svd = v.svd(false, true);
let vt = svd.v_t.expect("right singular vectors");
let b = vt.row(vt.nrows() - 1);
let (b11, b12, b22, b13, b23, b33) =
(b[0], b[1], b[2], b[3], b[4], b[5]);
let den = b11 * b22 - b12 * b12;
let v0 = (b12 * b13 - b11 * b23) / den;
let eta = b33 - (b13 * b13 + v0 * (b12 * b13 - b11 * b23)) / b11;
let alpha = (eta / b11).sqrt();
let beta = (eta * b11 / den).sqrt();
let gamma = -b12 * alpha * alpha * beta / eta;
let u0 = gamma * v0 / beta - b13 * alpha * alpha / eta;
Matrix3::new(
alpha, gamma, u0,
0.0, beta, v0,
0.0, 0.0, 1.0,
)
}
The per-view pose from , with the final rotation projected to :
fn pose_from_homography(a_inv: &Matrix3<f64>, h: &Matrix3<f64>)
-> (Matrix3<f64>, nalgebra::Vector3<f64>)
{
let lambda = 1.0 / (a_inv * h.column(0)).norm();
let r1 = lambda * (a_inv * h.column(0));
let r2 = lambda * (a_inv * h.column(1));
let r3 = r1.cross(&r2);
let t = lambda * (a_inv * h.column(2));
let q = Matrix3::from_columns(&[r1, r2, r3]);
let svd = q.svd(true, true);
let r = svd.u.unwrap() * svd.v_t.unwrap();
(r, t)
}
Nonlinear refinement of over the total reprojection error is standard Levenberg-Marquardt and is not reproduced here; the rotations are parameterised by the Rodrigues 3-vector to keep the Jacobian unconstrained.
Remarks
- Complexity per calibration is dominated by the per-view homography fit ( in the number of correspondences) and the closed-form SVD of ; the LM refinement is per iteration in the image count and target-point count .
- The linear step minimises an algebraic distance on and does not enforce positive definiteness of ; the recovered can turn imaginary under heavy noise. The LM stage on the geometric reprojection error is what delivers calibration-grade accuracy — the closed form supplies the initial guess only.
- Parallel orientations of the planar target are degenerate: a second view obtained from the first by a rotation about the plane normal (and any translation) contributes constraints linearly dependent on those of the first view.
- Minimum view count depends on which intrinsics are fixed: for the full five-parameter , with skew fixed to , and can only solve two parameters (typical choice: with principal point at the image centre).
- Distortion scope is two-term radial only; tangential (decentering) and thin-prism terms belong to the Brown-Conrady / Weng model and are a drop-in extension of the projection function and its Jacobian in the LM step.
- Compared with Tsai 1987: see When to choose Tsai over Zhang on the Tsai page, which hosts the comparison per the older-paper-hosts rule.
- Image-level RANSAC extension: CCS (Zhang et al., RA-L 2022) augments the LM stage with a view-selection RANSAC — randomly sample a subset of views, estimate intrinsics via the linear + LM steps above, score by reprojection error on all views, retain views below an inlier threshold, repeat until inlier count is sufficient (§III-C). View-level (rather than corner-level) RANSAC suffices because the upstream UNet detector already rejects per-corner outliers via the heatmap variance .
- The CCS RANSAC extension reduces real-data reprojection error from 0.45 px (STD 0.10) for a Matlab Zhang implementation to 0.37 px (STD 0.02) on a HIKROBOT 1440×1080 sensor (Table II) — a five-fold reduction in run-to-run standard deviation attributed to the view-selection step.
When to choose Zhang over Sturm-Maybank
Sturm-Maybank (CVPR 1999) and Zhang (ICCV 1999 / TPAMI 2000) are concurrent derivations of the same two linear IAC constraints per planar homography. Neither paper predates the other; both share the core algorithm. Per the "same year → more general scope" tiebreaker, Zhang hosts this comparison: Zhang's page covers the end-to-end calibration pipeline (homography fit, IAC linear initialisation, intrinsic + extrinsic recovery, two-term radial distortion, LM refinement) while Sturm-Maybank focuses on the linear IAC step plus singularity analysis and the variable-intrinsics generalisation.
| Zhang 2000 | Sturm-Maybank 1999 | |
|---|---|---|
| Core constraint | Eq. 3-4 (rotation column orthonormality) | Eq. 4 (identical content) |
| Distortion model | two-term radial + LM refinement | none (pinhole-only; future work in §4.3) |
| Variable intrinsics | constant across views | per-view (or ) accommodated as new design-matrix columns (§4.2) |
| Singularity catalogue | implicit ("parallel orientations are degenerate") | exhaustive (Tables 1 and 2 enumerate every singular plane configuration) |
| Closed-form extraction | Appendix B formulas | Eq. 5 (algebraically equivalent) |
| End-to-end pipeline | yes (full chain to LM-refined intrinsics + extrinsics + distortion) | no (linear IAC step only; downstream refinement deferred) |
Choose Zhang when (1) you want a complete calibration pipeline including distortion and LM refinement — Zhang's page is the practical end-to-end recipe used by OpenCV, MATLAB, and Kalibr; (2) the camera intrinsics are constant across views (the standard machine-vision case). Choose Sturm-Maybank when (1) the camera intrinsics vary per view (zooming or focusing camera) — this is Sturm-Maybank's unique extension; (2) you need to diagnose a calibration failure and want the explicit singularity catalogue (Tables 1 and 2) to identify which plane orientations make a parameter unrecoverable; (3) you want the cleaner geometric framing via circular-point projections to the IAC, which makes the singularity arguments transparent.
References
- Z. Zhang. A Flexible New Technique for Camera Calibration. MSR-TR-98-71 (also IEEE TPAMI 22(11), 2000). pdf
- R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal on Robotics and Automation 3(4)–344, 1987. pdf
- P. Sturm and S. J. Maybank. On plane-based camera calibration: A general algorithm, singularities, applications. IEEE CVPR, 1999. pdf
- J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE TPAMI 14(10)–980, 1992. pdf
- Y. Zhang, X. Zhao, D. Qian. Learning-Based Distortion Correction and Feature Detection for High Precision and Robust Camera Calibration. IEEE Robotics and Automation Letters 7(4)–10477, 2022. arXiv