Zhang's Planar Camera Calibration | VitaVision
Back to atlas

Zhang's Planar Camera Calibration

10 min readAdvancedView in graph
Based on
A Flexible New Technique for Camera Calibration
Zhang · IEEE Transactions on Pattern Analysis and Machine Intelligence 2000
DOI ↗

Goal

Given n3n \geq 3 images of a planar target with known metric coordinates, compute the camera intrinsic matrix AR3×3A \in \mathbb{R}^{3 \times 3}, two radial distortion coefficients (k1,k2)(k_1, k_2), and the per-view rigid pose (Ri,ti)SO(3)×R3(R_i, t_i) \in SO(3) \times \mathbb{R}^3. No 3D calibration object is required and the motion between views need not be known; the pattern and the camera may move freely relative to each other. The defining property is a closed-form linear initialization that turns each view's plane-to-image homography into two linear constraints on the image of the absolute conic.

Algorithm

Let II be an image of a planar target. Let M=[X,Y,0]TM = [X, Y, 0]^T denote a target point in the world frame (plane on Z=0Z = 0) and m=[u,v]Tm = [u, v]^T its image projection; their homogeneous forms are M~\tilde M and m~\tilde m.

The pinhole model with intrinsic matrix AA and extrinsic [Rt][R \mid t] reads

sm~=A[r1r2r3t]M~,A=[αγu00βv0001],s\,\tilde m = A \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \tilde M, \qquad A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix},

where rir_i is the ii-th column of RR, (u0,v0)(u_0, v_0) is the principal point, (α,β)(\alpha, \beta) are axis scale factors, and γ\gamma is the skew. Since Z=0Z = 0, the third column of RR drops out and the model collapses to a 3×33 \times 3 homography

sm~=HM~,H=[h1h2h3]=λA[r1r2t],s\,\tilde m = H \tilde M, \qquad H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} = \lambda\,A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix},

with λ\lambda an unknown non-zero scalar absorbing the homography's scale ambiguity.

Definition
Image of the absolute conic (IAC)

A symmetric 3×33 \times 3 matrix encoding the intrinsic parameters; its vectorisation bb is what the linear step estimates.

B=ATA1,b=[B11,  B12,  B22,  B13,  B23,  B33]T.B = A^{-T} A^{-1}, \qquad b = \bigl[B_{11},\; B_{12},\; B_{22},\; B_{13},\; B_{23},\; B_{33}\bigr]^T.
Definition
Constraint vector v_{ij}

The row induced by hiTBhj=vijTbh_i^T B h_j = v_{ij}^T b; two such rows per homography constrain bb.

vij=[hi1hj1,  hi1hj2+hi2hj1,  hi2hj2,  hi3hj1+hi1hj3,  hi3hj2+hi2hj3,  hi3hj3]T.v_{ij} = \bigl[\,h_{i1}h_{j1},\; h_{i1}h_{j2} + h_{i2}h_{j1},\; h_{i2}h_{j2},\; h_{i3}h_{j1} + h_{i1}h_{j3},\; h_{i3}h_{j2} + h_{i2}h_{j3},\; h_{i3}h_{j3}\,\bigr]^T.

Orthonormality of r1r_1 and r2r_2 yields two linear constraints on bb per view:

v12Tb=0,(v11v22)Tb=0.v_{12}^T\,b = 0, \qquad \bigl(v_{11} - v_{22}\bigr)^T b = 0.

Stacking the rows from nn views produces a 2n×62n \times 6 matrix VV and the homogeneous system Vb=0V b = 0.

Algorithm
Zhang's planar calibration
Input: Images {Ii}i=1n\{I_i\}_{i=1}^n with n3n \geq 3 of a planar target at different orientations; correspondences {(Mj,mij)}\{(M_j, m_{ij})\} in each view.
Output: Intrinsics AA, radial distortion (k1,k2)(k_1, k_2), per-view extrinsics {(Ri,ti)}i=1n\{(R_i, t_i)\}_{i=1}^n.
  1. For each view ii, estimate the homography HiH_i by DLT on normalized point coordinates followed by Levenberg-Marquardt refinement of the geometric reprojection error.

  2. For each HiH_i, build the row pair v12Tv_{12}^T and (v11v22)T(v_{11} - v_{22})^T; stack them into VR2n×6V \in \mathbb{R}^{2n \times 6}. If n=2n = 2, append the zero-skew row [0,1,0,0,0,0][0, 1, 0, 0, 0, 0].

  3. Solve Vb=0V b = 0 by SVD of VV; take bb as the right singular vector associated with the smallest singular value.

  4. Recover the intrinsics from bb in closed form:

    v0=B12B13B11B23B11B22B122,η=B33B132+v0(B12B13B11B23)B11,α=η/B11,β=ηB11/(B11B22B122),γ=B12α2β/η,u0=γv0/βB13α2/η.\begin{aligned} v_0 &= \frac{B_{12} B_{13} - B_{11} B_{23}}{B_{11} B_{22} - B_{12}^2}, \\ \eta &= B_{33} - \frac{B_{13}^2 + v_0 (B_{12} B_{13} - B_{11} B_{23})}{B_{11}}, \\ \alpha &= \sqrt{\eta / B_{11}}, \\ \beta &= \sqrt{\eta B_{11} / (B_{11} B_{22} - B_{12}^2)}, \\ \gamma &= -B_{12} \alpha^2 \beta / \eta, \\ u_0 &= \gamma v_0 / \beta - B_{13} \alpha^2 / \eta. \end{aligned}
  5. For each view, set λi=1/A1h1(i)\lambda_i = 1 / \lVert A^{-1} h_1^{(i)} \rVert and

    r1(i)=λiA1h1(i),r2(i)=λiA1h2(i),r3(i)=r1(i)×r2(i),ti=λiA1h3(i).r_1^{(i)} = \lambda_i A^{-1} h_1^{(i)}, \quad r_2^{(i)} = \lambda_i A^{-1} h_2^{(i)}, \quad r_3^{(i)} = r_1^{(i)} \times r_2^{(i)}, \quad t_i = \lambda_i A^{-1} h_3^{(i)}.

    Project [r1(i)  r2(i)  r3(i)][r_1^{(i)}\;r_2^{(i)}\;r_3^{(i)}] to SO(3)SO(3) by SVD (R=UVTR = UV^T).

  6. Initialise k1=k2=0k_1 = k_2 = 0 (or by linear least squares on the residuals of the distortion-free model).

  7. Refine (A,k1,k2,{Ri,ti})(A, k_1, k_2, \{R_i, t_i\}) by Levenberg-Marquardt minimisation of the total reprojection error

    i=1nj=1mmijm˘(A,k1,k2,Ri,ti,Mj)2,\sum_{i=1}^{n} \sum_{j=1}^{m} \bigl\lVert\, m_{ij} - \breve m\bigl(A, k_1, k_2, R_i, t_i, M_j\bigr)\bigr\rVert^2,

    where m˘\breve m is the pinhole projection of MjM_j followed by radial distortion

    u˘=u+(uu0)[k1r2+k2r4],r2=x2+y2,\breve u = u + (u - u_0)\bigl[k_1 r^2 + k_2 r^4\bigr], \qquad r^2 = x^2 + y^2,

    with (x,y)(x, y) the normalised image coordinates and the analogous expression for v˘\breve v.

Implementation

The linear initialization — building VV from homographies and extracting AA from the right null vector — in Rust:

use nalgebra::{Matrix3, Matrix6xX, Vector6, DMatrix};

fn v_row(h: &Matrix3<f64>, i: usize, j: usize) -> Vector6<f64> {
    let (hi, hj) = (h.column(i), h.column(j));
    Vector6::new(
        hi[0]*hj[0],
        hi[0]*hj[1] + hi[1]*hj[0],
        hi[1]*hj[1],
        hi[2]*hj[0] + hi[0]*hj[2],
        hi[2]*hj[1] + hi[1]*hj[2],
        hi[2]*hj[2],
    )
}

fn intrinsics_from_homographies(hs: &[Matrix3<f64>]) -> Matrix3<f64> {
    let mut v = DMatrix::<f64>::zeros(2 * hs.len(), 6);
    for (i, h) in hs.iter().enumerate() {
        let v12 = v_row(h, 0, 1);
        let d   = v_row(h, 0, 0) - v_row(h, 1, 1);
        v.row_mut(2*i    ).copy_from(&v12.transpose());
        v.row_mut(2*i + 1).copy_from(&d.transpose());
    }
    let svd = v.svd(false, true);
    let vt  = svd.v_t.expect("right singular vectors");
    let b   = vt.row(vt.nrows() - 1);
    let (b11, b12, b22, b13, b23, b33) =
        (b[0], b[1], b[2], b[3], b[4], b[5]);

    let den  = b11 * b22 - b12 * b12;
    let v0   = (b12 * b13 - b11 * b23) / den;
    let eta  = b33 - (b13 * b13 + v0 * (b12 * b13 - b11 * b23)) / b11;
    let alpha = (eta / b11).sqrt();
    let beta  = (eta * b11 / den).sqrt();
    let gamma = -b12 * alpha * alpha * beta / eta;
    let u0    = gamma * v0 / beta - b13 * alpha * alpha / eta;

    Matrix3::new(
        alpha, gamma, u0,
        0.0,   beta,  v0,
        0.0,   0.0,   1.0,
    )
}

The per-view pose from (A,H)(A, H), with the final rotation projected to SO(3)SO(3):

fn pose_from_homography(a_inv: &Matrix3<f64>, h: &Matrix3<f64>)
    -> (Matrix3<f64>, nalgebra::Vector3<f64>)
{
    let lambda = 1.0 / (a_inv * h.column(0)).norm();
    let r1 = lambda * (a_inv * h.column(0));
    let r2 = lambda * (a_inv * h.column(1));
    let r3 = r1.cross(&r2);
    let t  = lambda * (a_inv * h.column(2));
    let q  = Matrix3::from_columns(&[r1, r2, r3]);
    let svd = q.svd(true, true);
    let r = svd.u.unwrap() * svd.v_t.unwrap();
    (r, t)
}

Nonlinear refinement of (A,k1,k2,{Ri,ti})(A, k_1, k_2, \{R_i, t_i\}) over the total reprojection error is standard Levenberg-Marquardt and is not reproduced here; the rotations are parameterised by the Rodrigues 3-vector to keep the Jacobian unconstrained.

Remarks

  • Complexity per calibration is dominated by the per-view homography fit (O(N)O(N) in the number of correspondences) and the closed-form 2n×62n \times 6 SVD of VV; the LM refinement is O(nm)O(n \cdot m) per iteration in the image count nn and target-point count mm.
  • The linear step minimises an algebraic distance on bb and does not enforce positive definiteness of BB; the recovered α,β\alpha, \beta can turn imaginary under heavy noise. The LM stage on the geometric reprojection error is what delivers calibration-grade accuracy — the closed form supplies the initial guess only.
  • Parallel orientations of the planar target are degenerate: a second view obtained from the first by a rotation about the plane normal (and any translation) contributes constraints linearly dependent on those of the first view.
  • Minimum view count depends on which intrinsics are fixed: n3n \geq 3 for the full five-parameter AA, n2n \geq 2 with skew fixed to γ=0\gamma = 0, and n=1n = 1 can only solve two parameters (typical choice: (α,β)(\alpha, \beta) with principal point at the image centre).
  • Distortion scope is two-term radial only; tangential (decentering) and thin-prism terms belong to the Brown-Conrady / Weng model and are a drop-in extension of the projection function m˘\breve m and its Jacobian in the LM step.
  • Compared with Tsai 1987: see When to choose Tsai over Zhang on the Tsai page, which hosts the comparison per the older-paper-hosts rule.
  • Image-level RANSAC extension: CCS (Zhang et al., RA-L 2022) augments the LM stage with a view-selection RANSAC — randomly sample a subset of views, estimate intrinsics via the linear + LM steps above, score by reprojection error on all views, retain views below an inlier threshold, repeat until inlier count is sufficient (§III-C). View-level (rather than corner-level) RANSAC suffices because the upstream UNet detector already rejects per-corner outliers via the heatmap variance σ\sigma.
  • The CCS RANSAC extension reduces real-data reprojection error from 0.45 px (STD 0.10) for a Matlab Zhang implementation to 0.37 px (STD 0.02) on a HIKROBOT 1440×1080 sensor (Table II) — a five-fold reduction in run-to-run standard deviation attributed to the view-selection step.

When to choose Zhang over Sturm-Maybank

Sturm-Maybank (CVPR 1999) and Zhang (ICCV 1999 / TPAMI 2000) are concurrent derivations of the same two linear IAC constraints per planar homography. Neither paper predates the other; both share the core algorithm. Per the "same year → more general scope" tiebreaker, Zhang hosts this comparison: Zhang's page covers the end-to-end calibration pipeline (homography fit, IAC linear initialisation, intrinsic + extrinsic recovery, two-term radial distortion, LM refinement) while Sturm-Maybank focuses on the linear IAC step plus singularity analysis and the variable-intrinsics generalisation.

Zhang 2000 Sturm-Maybank 1999
Core constraint Eq. 3-4 (rotation column orthonormality) Eq. 4 (identical content)
Distortion model two-term radial (k1,k2)(k_1, k_2) + LM refinement none (pinhole-only; future work in §4.3)
Variable intrinsics constant across views per-view ff (or f,u0,v0f, u_0, v_0) accommodated as new design-matrix columns (§4.2)
Singularity catalogue implicit ("parallel orientations are degenerate") exhaustive (Tables 1 and 2 enumerate every singular plane configuration)
Closed-form KK extraction Appendix B formulas Eq. 5 (algebraically equivalent)
End-to-end pipeline yes (full chain to LM-refined intrinsics + extrinsics + distortion) no (linear IAC step only; downstream refinement deferred)

Choose Zhang when (1) you want a complete calibration pipeline including distortion and LM refinement — Zhang's page is the practical end-to-end recipe used by OpenCV, MATLAB, and Kalibr; (2) the camera intrinsics are constant across views (the standard machine-vision case). Choose Sturm-Maybank when (1) the camera intrinsics vary per view (zooming or focusing camera) — this is Sturm-Maybank's unique extension; (2) you need to diagnose a calibration failure and want the explicit singularity catalogue (Tables 1 and 2) to identify which plane orientations make a parameter unrecoverable; (3) you want the cleaner geometric framing via circular-point projections to the IAC, which makes the singularity arguments transparent.

References

  1. Z. Zhang. A Flexible New Technique for Camera Calibration. MSR-TR-98-71 (also IEEE TPAMI 22(11), 2000). pdf
  2. R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal on Robotics and Automation 3(4)
    –344, 1987. pdf
  3. P. Sturm and S. J. Maybank. On plane-based camera calibration: A general algorithm, singularities, applications. IEEE CVPR, 1999. pdf
  4. J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE TPAMI 14(10)
    –980, 1992. pdf
  5. Y. Zhang, X. Zhao, D. Qian. Learning-Based Distortion Correction and Feature Detection for High Precision and Robust Camera Calibration. IEEE Robotics and Automation Letters 7(4)
    –10477, 2022. arXiv

Parallel foundation with