Zhang's Planar Camera Calibration | VitaVision
Back to atlas

Zhang's Planar Camera Calibration

10 min readAdvancedView in graph
Based on
A Flexible New Technique for Camera Calibration
Zhang · IEEE Transactions on Pattern Analysis and Machine Intelligence 2000
DOI ↗

Goal

Given n3n \geq 3 images of a planar target with known metric coordinates, compute the camera intrinsic matrix AR3×3A \in \mathbb{R}^{3 \times 3}, two radial distortion coefficients (k1,k2)(k_1, k_2), and the per-view rigid pose (Ri,ti)SO(3)×R3(R_i, t_i) \in SO(3) \times \mathbb{R}^3. No 3D calibration object is required and the motion between views need not be known; the pattern and the camera may move freely relative to each other. The defining property is a closed-form linear initialization that turns each view's plane-to-image homography into two linear constraints on the image of the absolute conic.

Algorithm

Let II be an image of a planar target. Let M=[X,Y,0]TM = [X, Y, 0]^T denote a target point in the world frame (plane on Z=0Z = 0) and m=[u,v]Tm = [u, v]^T its image projection; their homogeneous forms are M~\tilde M and m~\tilde m.

The pinhole model with intrinsic matrix AA and extrinsic [Rt][R \mid t] reads

sm~=A[r1r2r3t]M~,A=[αγu00βv0001],s\,\tilde m = A \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \tilde M, \qquad A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix},

where rir_i is the ii-th column of RR, (u0,v0)(u_0, v_0) is the principal point, (α,β)(\alpha, \beta) are axis scale factors, and γ\gamma is the skew. Since Z=0Z = 0, the third column of RR drops out and the model collapses to a 3×33 \times 3 homography

sm~=HM~,H=[h1h2h3]=λA[r1r2t],s\,\tilde m = H \tilde M, \qquad H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} = \lambda\,A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix},

with λ\lambda an unknown non-zero scalar absorbing the homography's scale ambiguity.

Definition
Image of the absolute conic (IAC)

A symmetric 3×33 \times 3 matrix encoding the intrinsic parameters; its vectorisation bb is what the linear step estimates.

B=ATA1,b=[B11,  B12,  B22,  B13,  B23,  B33]T.B = A^{-T} A^{-1}, \qquad b = \bigl[B_{11},\; B_{12},\; B_{22},\; B_{13},\; B_{23},\; B_{33}\bigr]^T.
Definition
Constraint vector v_{ij}

The row induced by hiTBhj=vijTbh_i^T B h_j = v_{ij}^T b; two such rows per homography constrain bb.

vij=[hi1hj1,  hi1hj2+hi2hj1,  hi2hj2,  hi3hj1+hi1hj3,  hi3hj2+hi2hj3,  hi3hj3]T.v_{ij} = \bigl[\,h_{i1}h_{j1},\; h_{i1}h_{j2} + h_{i2}h_{j1},\; h_{i2}h_{j2},\; h_{i3}h_{j1} + h_{i1}h_{j3},\; h_{i3}h_{j2} + h_{i2}h_{j3},\; h_{i3}h_{j3}\,\bigr]^T.

Orthonormality of r1r_1 and r2r_2 yields two linear constraints on bb per view:

v12Tb=0,(v11v22)Tb=0.v_{12}^T\,b = 0, \qquad \bigl(v_{11} - v_{22}\bigr)^T b = 0.

Stacking the rows from nn views produces a 2n×62n \times 6 matrix VV and the homogeneous system Vb=0V b = 0.

Algorithm
Zhang's planar calibration
Input: Images {Ii}i=1n\{I_i\}_{i=1}^n with n3n \geq 3 of a planar target at different orientations; correspondences {(Mj,mij)}\{(M_j, m_{ij})\} in each view.
Output: Intrinsics AA, radial distortion (k1,k2)(k_1, k_2), per-view extrinsics {(Ri,ti)}i=1n\{(R_i, t_i)\}_{i=1}^n.
  1. For each view ii, estimate the homography HiH_i by DLT on normalized point coordinates followed by Levenberg-Marquardt refinement of the geometric reprojection error.

  2. For each HiH_i, build the row pair v12Tv_{12}^T and (v11v22)T(v_{11} - v_{22})^T; stack them into VR2n×6V \in \mathbb{R}^{2n \times 6}. If n=2n = 2, append the zero-skew row [0,1,0,0,0,0][0, 1, 0, 0, 0, 0].

  3. Solve Vb=0V b = 0 by SVD of VV; take bb as the right singular vector associated with the smallest singular value.

  4. Recover the intrinsics from bb in closed form:

    v0=B12B13B11B23B11B22B122,η=B33B132+v0(B12B13B11B23)B11,α=η/B11,β=ηB11/(B11B22B122),γ=B12α2β/η,u0=γv0/βB13α2/η.\begin{aligned} v_0 &= \frac{B_{12} B_{13} - B_{11} B_{23}}{B_{11} B_{22} - B_{12}^2}, \\ \eta &= B_{33} - \frac{B_{13}^2 + v_0 (B_{12} B_{13} - B_{11} B_{23})}{B_{11}}, \\ \alpha &= \sqrt{\eta / B_{11}}, \\ \beta &= \sqrt{\eta B_{11} / (B_{11} B_{22} - B_{12}^2)}, \\ \gamma &= -B_{12} \alpha^2 \beta / \eta, \\ u_0 &= \gamma v_0 / \beta - B_{13} \alpha^2 / \eta. \end{aligned}
  5. For each view, set λi=1/A1h1(i)\lambda_i = 1 / \lVert A^{-1} h_1^{(i)} \rVert and

    r1(i)=λiA1h1(i),r2(i)=λiA1h2(i),r3(i)=r1(i)×r2(i),ti=λiA1h3(i).r_1^{(i)} = \lambda_i A^{-1} h_1^{(i)}, \quad r_2^{(i)} = \lambda_i A^{-1} h_2^{(i)}, \quad r_3^{(i)} = r_1^{(i)} \times r_2^{(i)}, \quad t_i = \lambda_i A^{-1} h_3^{(i)}.

    Project [r1(i)  r2(i)  r3(i)][r_1^{(i)}\;r_2^{(i)}\;r_3^{(i)}] to SO(3)SO(3) by SVD (R=UVTR = UV^T).

  6. Initialise k1=k2=0k_1 = k_2 = 0 (or by linear least squares on the residuals of the distortion-free model).

  7. Refine (A,k1,k2,{Ri,ti})(A, k_1, k_2, \{R_i, t_i\}) by Levenberg-Marquardt minimisation of the total reprojection error

    i=1nj=1mmijm˘(A,k1,k2,Ri,ti,Mj)2,\sum_{i=1}^{n} \sum_{j=1}^{m} \bigl\lVert\, m_{ij} - \breve m\bigl(A, k_1, k_2, R_i, t_i, M_j\bigr)\bigr\rVert^2,

    where m˘\breve m is the pinhole projection of MjM_j followed by radial distortion

    u˘=u+(uu0)[k1r2+k2r4],r2=x2+y2,\breve u = u + (u - u_0)\bigl[k_1 r^2 + k_2 r^4\bigr], \qquad r^2 = x^2 + y^2,

    with (x,y)(x, y) the normalised image coordinates and the analogous expression for v˘\breve v.

Implementation

The linear initialization — building VV from homographies and extracting AA from the right null vector — in Rust:

use nalgebra::{Matrix3, Matrix6xX, Vector6, DMatrix};

fn v_row(h: &Matrix3<f64>, i: usize, j: usize) -> Vector6<f64> {
    let (hi, hj) = (h.column(i), h.column(j));
    Vector6::new(
        hi[0]*hj[0],
        hi[0]*hj[1] + hi[1]*hj[0],
        hi[1]*hj[1],
        hi[2]*hj[0] + hi[0]*hj[2],
        hi[2]*hj[1] + hi[1]*hj[2],
        hi[2]*hj[2],
    )
}

fn intrinsics_from_homographies(hs: &[Matrix3<f64>]) -> Matrix3<f64> {
    let mut v = DMatrix::<f64>::zeros(2 * hs.len(), 6);
    for (i, h) in hs.iter().enumerate() {
        let v12 = v_row(h, 0, 1);
        let d   = v_row(h, 0, 0) - v_row(h, 1, 1);
        v.row_mut(2*i    ).copy_from(&v12.transpose());
        v.row_mut(2*i + 1).copy_from(&d.transpose());
    }
    let svd = v.svd(false, true);
    let vt  = svd.v_t.expect("right singular vectors");
    let b   = vt.row(vt.nrows() - 1);
    let (b11, b12, b22, b13, b23, b33) =
        (b[0], b[1], b[2], b[3], b[4], b[5]);

    let den  = b11 * b22 - b12 * b12;
    let v0   = (b12 * b13 - b11 * b23) / den;
    let eta  = b33 - (b13 * b13 + v0 * (b12 * b13 - b11 * b23)) / b11;
    let alpha = (eta / b11).sqrt();
    let beta  = (eta * b11 / den).sqrt();
    let gamma = -b12 * alpha * alpha * beta / eta;
    let u0    = gamma * v0 / beta - b13 * alpha * alpha / eta;

    Matrix3::new(
        alpha, gamma, u0,
        0.0,   beta,  v0,
        0.0,   0.0,   1.0,
    )
}

The per-view pose from (A,H)(A, H), with the final rotation projected to SO(3)SO(3):

fn pose_from_homography(a_inv: &Matrix3<f64>, h: &Matrix3<f64>)
    -> (Matrix3<f64>, nalgebra::Vector3<f64>)
{
    let lambda = 1.0 / (a_inv * h.column(0)).norm();
    let r1 = lambda * (a_inv * h.column(0));
    let r2 = lambda * (a_inv * h.column(1));
    let r3 = r1.cross(&r2);
    let t  = lambda * (a_inv * h.column(2));
    let q  = Matrix3::from_columns(&[r1, r2, r3]);
    let svd = q.svd(true, true);
    let r = svd.u.unwrap() * svd.v_t.unwrap();
    (r, t)
}

Nonlinear refinement of (A,k1,k2,{Ri,ti})(A, k_1, k_2, \{R_i, t_i\}) over the total reprojection error is standard Levenberg-Marquardt and is not reproduced here; the rotations are parameterised by the Rodrigues 3-vector to keep the Jacobian unconstrained.

Remarks

  • Complexity per calibration is dominated by the per-view homography fit (O(N)O(N) in the number of correspondences) and the closed-form 2n×62n \times 6 SVD of VV; the LM refinement is O(nm)O(n \cdot m) per iteration in the image count nn and target-point count mm.
  • The linear step minimises an algebraic distance on bb and does not enforce positive definiteness of BB; the recovered α,β\alpha, \beta can turn imaginary under heavy noise. The LM stage on the geometric reprojection error is what delivers calibration-grade accuracy — the closed form supplies the initial guess only.
  • Parallel orientations of the planar target are degenerate: a second view obtained from the first by a rotation about the plane normal (and any translation) contributes constraints linearly dependent on those of the first view.
  • Minimum view count depends on which intrinsics are fixed: n3n \geq 3 for the full five-parameter AA, n2n \geq 2 with skew fixed to γ=0\gamma = 0, and n=1n = 1 can only solve two parameters (typical choice: (α,β)(\alpha, \beta) with principal point at the image centre).
  • Distortion scope is two-term radial only; tangential (decentering) and thin-prism terms belong to the Brown-Conrady / Weng model and are a drop-in extension of the projection function m˘\breve m and its Jacobian in the LM step.
  • Compared with Tsai 1987: see When to choose Tsai over Zhang on the Tsai page, which hosts the comparison per the older-paper-hosts rule.
  • Image-level RANSAC extension: CCS (Zhang et al., RA-L 2022) augments the LM stage with a view-selection RANSAC — randomly sample a subset of views, estimate intrinsics via the linear + LM steps above, score by reprojection error on all views, retain views below an inlier threshold, repeat until inlier count is sufficient (§III-C). View-level (rather than corner-level) RANSAC suffices because the upstream UNet detector already rejects per-corner outliers via the heatmap variance σ\sigma.
  • The CCS RANSAC extension reduces real-data reprojection error from 0.45 px (STD 0.10) for a Matlab Zhang implementation to 0.37 px (STD 0.02) on a HIKROBOT 1440×1080 sensor (Table II) — a five-fold reduction in run-to-run standard deviation attributed to the view-selection step.

When to choose Zhang over Sturm-Maybank

Sturm-Maybank (CVPR 1999) and Zhang (ICCV 1999 / TPAMI 2000) are concurrent derivations of the same two linear IAC constraints per planar homography. Neither paper predates the other; both share the core algorithm. Per the "same year → more general scope" tiebreaker, Zhang hosts this comparison: Zhang's page covers the end-to-end calibration pipeline (homography fit, IAC linear initialisation, intrinsic + extrinsic recovery, two-term radial distortion, LM refinement) while Sturm-Maybank focuses on the linear IAC step plus singularity analysis and the variable-intrinsics generalisation.

Zhang 2000 Sturm-Maybank 1999
Core constraint Eq. 3-4 (rotation column orthonormality) Eq. 4 (identical content)
Distortion model two-term radial (k1,k2)(k_1, k_2) + LM refinement none (pinhole-only; future work in §4.3)
Variable intrinsics constant across views per-view ff (or f,u0,v0f, u_0, v_0) accommodated as new design-matrix columns (§4.2)
Singularity catalogue implicit ("parallel orientations are degenerate") exhaustive (Tables 1 and 2 enumerate every singular plane configuration)
Closed-form KK extraction Appendix B formulas Eq. 5 (algebraically equivalent)
End-to-end pipeline yes (full chain to LM-refined intrinsics + extrinsics + distortion) no (linear IAC step only; downstream refinement deferred)

Choose Zhang when (1) you want a complete calibration pipeline including distortion and LM refinement — Zhang's page is the practical end-to-end recipe used by OpenCV, MATLAB, and Kalibr; (2) the camera intrinsics are constant across views (the standard machine-vision case). Choose Sturm-Maybank when (1) the camera intrinsics vary per view (zooming or focusing camera) — this is Sturm-Maybank's unique extension; (2) you need to diagnose a calibration failure and want the explicit singularity catalogue (Tables 1 and 2) to identify which plane orientations make a parameter unrecoverable; (3) you want the cleaner geometric framing via circular-point projections to the IAC, which makes the singularity arguments transparent.

References

  1. Z. Zhang. A Flexible New Technique for Camera Calibration. MSR-TR-98-71 (also IEEE TPAMI 22(11), 2000). pdf
  2. R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal on Robotics and Automation 3(4):323–344, 1987. pdf
  3. P. Sturm and S. J. Maybank. On plane-based camera calibration: A general algorithm, singularities, applications. IEEE CVPR, 1999. pdf
  4. J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE TPAMI 14(10):965–980, 1992. pdf
  5. Y. Zhang, X. Zhao, D. Qian. Learning-Based Distortion Correction and Feature Detection for High Precision and Robust Camera Calibration. IEEE Robotics and Automation Letters 7(4):10470–10477, 2022. arXiv

Parallel foundation with