Goal
Estimate the intrinsic and extrinsic parameters of any central omnidirectional camera, whether a catadioptric system (mirror plus conventional camera) or a dioptric fisheye lens, from a small set of images of a planar checkerboard pattern at unknown poses. The input is the pixel coordinates of the checkerboard corners across multiple views, together with the known metric geometry of the calibration pattern. The output is a Taylor polynomial $f(\rho)$ encoding the radially-symmetric imaging function, an affine transform $\mathbf A$ between sensor-plane and pixel coordinates, the image center $O_c$ in pixel coordinates, and the per-view rotation $\mathbf R^i$ and translation $\mathbf T^i$. The method requires no prior knowledge of the mirror or lens model and no 3-D calibration fixture. Its only governing assumption is central projection: all 3-D rays pass through a single effective viewpoint.
Algorithm
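Let $\mathbf u' = [u', v']^T$ denote pixel coordinates and $\mathbf u'' = [u'', v'']^T$ denote sensor-plane coordinates, related by the affine map
$$\mathbf u' = \mathbf A\,\mathbf u'' + \mathbf t,$$
where $\mathbf A \in \mathbb R^{2\times 2}$ is a stretch matrix and $\mathbf t \in \mathbb R^{2}$ is the image-center translation. Let $\rho = \sqrt{u''^2 + v''^2}$ denote the radial distance on the sensor plane.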
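A minimal sketch of the affine map and the radial distance, assuming $\mathbf A$ is stored as a general row-major 2×2 array. The struct and method names are illustrative, not taken from any toolbox. The pixel-to-sensor direction, which turns detected corners into $(u'', v'')$, uses the closed-form 2×2 inverse.

```rust
/// Affine relation between sensor-plane coordinates u'' and pixel
/// coordinates u': u' = A u'' + t.
#[derive(Clone, Copy, Debug)]
struct Affine {
    a: [[f64; 2]; 2], // stretch matrix A, row-major
    t: [f64; 2],      // image-centre translation t, in pixels
}

impl Affine {
    /// Sensor plane -> pixels: u' = A u'' + t.
    fn sensor_to_pixel(&self, s: [f64; 2]) -> [f64; 2] {
        [
            self.a[0][0] * s[0] + self.a[0][1] * s[1] + self.t[0],
            self.a[1][0] * s[0] + self.a[1][1] * s[1] + self.t[1],
        ]
    }

    /// Pixels -> sensor plane: u'' = A^{-1} (u' - t), via the closed-form 2x2 inverse.
    fn pixel_to_sensor(&self, p: [f64; 2]) -> [f64; 2] {
        let [[a, b], [c, d]] = self.a;
        let det = a * d - b * c;
        assert!(det.abs() > 1e-12, "stretch matrix A must be invertible");
        let (x, y) = (p[0] - self.t[0], p[1] - self.t[1]);
        [(d * x - b * y) / det, (-c * x + a * y) / det]
    }
}

/// Radial distance on the sensor plane: rho = sqrt(u''^2 + v''^2).
fn radial_distance(s: [f64; 2]) -> f64 {
    s[0].hypot(s[1])
}
```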
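The sensor-plane point $\mathbf u''$ and its corresponding 3-D viewing ray $\mathbf p''$ are related by the rotationally-symmetric vector function
$$\mathbf p'' \propto g(\mathbf u'') = \begin{bmatrix} u'' \\ v'' \\ f(\rho) \end{bmatrix},$$
where $f(\rho) = a_0 + a_1\rho + a_2\rho^2 + \dots + a_N\rho^N$ is a polynomial in $\rho$.
For all standard mirror and fisheye models, $\left.\frac{df}{d\rho}\right|_{\rho=0} = 0$, forcing $a_1 = 0$. The simplified Taylor form is
$$f(\rho) = a_0 + a_2\rho^2 + a_3\rho^3 + \dots + a_N\rho^N,$$
with coefficients $a_0, a_2, \dots, a_N$ to be estimated. Typical degree $N = 4$.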
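The polynomial and the back-projection $g(\mathbf u'')$ can be sketched as follows. The coefficient layout `[a0, a1, ..., aN]` (with `a1 = 0` supplied by the caller) and the helper names are assumptions for illustration. Horner's rule evaluates $f(\rho)$ without forming explicit powers, and the ray is normalised to unit length for convenience.

```rust
/// Evaluate f(rho) = a0 + a1 rho + a2 rho^2 + ... + aN rho^N by Horner's rule.
/// `coeffs` holds [a0, a1, ..., aN]; for the central models above a1 = 0.
fn taylor_f(coeffs: &[f64], rho: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &a| acc * rho + a)
}

/// Back-project a sensor-plane point to a unit viewing ray:
/// g(u'') = [u'', v'', f(rho)]^T, divided by its length.
fn back_project(coeffs: &[f64], s: [f64; 2]) -> [f64; 3] {
    let rho = s[0].hypot(s[1]);
    let g = [s[0], s[1], taylor_f(coeffs, rho)];
    let n = (g[0] * g[0] + g[1] * g[1] + g[2] * g[2]).sqrt();
    [g[0] / n, g[1] / n, g[2] / n]
}
```

Because $g$ depends on $(u'', v'')$ only through $\rho$ in its third component, all points on a circle around the center map to rays on a common cone.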
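Let $K$ denote the number of calibration views ($i = 1, \dots, K$) and $L$ the number of checkerboard corners per view. Let $\mathbf M_j = [X_j, Y_j, Z_j]^T$ denote the 3-D coordinates of corner $j$ in the pattern frame ($Z_j = 0$), and let $(u''_{ij}, v''_{ij})$ be its observed sensor-plane coordinates in view $i$, with radius $\rho_{ij}$. Let $\mathbf R^i = [\mathbf r_1^i\ \mathbf r_2^i\ \mathbf r_3^i]$ and $\mathbf T^i = [t_1^i, t_2^i, t_3^i]^T$ denote the per-view rotation and translation.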
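For view $i$ and corner $j$, the back-projected ray must be collinear with the camera–point vector. Because $Z_j = 0$, the column $\mathbf r_3^i$ drops out:
$$\lambda_{ij}\begin{bmatrix}u''_{ij}\\ v''_{ij}\\ f(\rho_{ij})\end{bmatrix} = \begin{bmatrix}\mathbf r_1^i & \mathbf r_2^i & \mathbf T^i\end{bmatrix}\begin{bmatrix}X_j\\ Y_j\\ 1\end{bmatrix}.$$
Requiring the cross product of the two sides to vanish produces three scalar equations. Dropping the indices and writing $u, v, \rho$ for $u''_{ij}, v''_{ij}, \rho_{ij}$:
$$v\,(r_{31}X + r_{32}Y + t_3) - f(\rho)\,(r_{21}X + r_{22}Y + t_2) = 0 \qquad (10.1)$$
$$f(\rho)\,(r_{11}X + r_{12}Y + t_1) - u\,(r_{31}X + r_{32}Y + t_3) = 0 \qquad (10.2)$$
$$u\,(r_{21}X + r_{22}Y + t_2) - v\,(r_{11}X + r_{12}Y + t_1) = 0 \qquad (10.3)$$
The third involves neither $f$ nor $t_3$ and is linear in the six unknowns $\mathbf H = [r_{11}, r_{12}, r_{21}, r_{22}, t_1, t_2]^T$. Stacking it over the $L$ corners of a view gives $\mathbf M\mathbf H = 0$, where row $j$ of $\mathbf M$ is $[v_jX_j,\ v_jY_j,\ -u_jX_j,\ -u_jY_j,\ v_j,\ -u_j]$ (Eq. 10.3 multiplied by $-1$).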
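To make the linearity of Eq. 10.3 concrete, the sketch below builds one row of $\mathbf M$ and checks it against the cross product. The function names are hypothetical. The row layout matches the SVD code in the Implementation section (Eq. 10.3 times $-1$), so the row's dot product with $\mathbf H$ equals minus the third cross-product component, and it vanishes whenever $(u, v)$ is proportional to the first two components of the camera-frame point.

```rust
/// Cross product of two 3-vectors.
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// One row of M: coefficients of H = [r11, r12, r21, r22, t1, t2] in
/// v (r11 X + r12 Y + t1) - u (r21 X + r22 Y + t2) = 0, i.e. Eq. 10.3 times -1.
/// (u, v) are sensor-plane coordinates; (X, Y) are pattern coordinates with Z = 0.
fn m_row(u: f64, v: f64, x: f64, y: f64) -> [f64; 6] {
    [v * x, v * y, -u * x, -u * y, v, -u]
}

/// Dot product of a row of M with H.
fn row_dot(row: &[f64; 6], h: &[f64; 6]) -> f64 {
    row.iter().zip(h).map(|(a, b)| a * b).sum()
}
```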
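Substituting the recovered per-view extrinsics into the remaining two equations, (10.1) and (10.2), leaves two equations per corner that are linear in the Taylor coefficients and in $t_3^i$. Writing $u_{ij}, v_{ij}$ for $u''_{ij}, v''_{ij}$:
$$A_{ij}\,(a_0 + a_2\rho_{ij}^2 + \dots + a_N\rho_{ij}^N) - v_{ij}\,t_3^i = B_{ij},\qquad C_{ij}\,(a_0 + a_2\rho_{ij}^2 + \dots + a_N\rho_{ij}^N) - u_{ij}\,t_3^i = D_{ij},$$
with $A_{ij} = r_{21}^iX_j + r_{22}^iY_j + t_2^i$, $B_{ij} = v_{ij}(r_{31}^iX_j + r_{32}^iY_j)$, $C_{ij} = r_{11}^iX_j + r_{12}^iY_j + t_1^i$ and $D_{ij} = u_{ij}(r_{31}^iX_j + r_{32}^iY_j)$. Stacking all views yields the overdetermined system
$$\begin{bmatrix}
A_{11} & A_{11}\rho_{11}^2 & \cdots & A_{11}\rho_{11}^N & -v_{11} & \cdots & 0\\
C_{11} & C_{11}\rho_{11}^2 & \cdots & C_{11}\rho_{11}^N & -u_{11} & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots\\
A_{KL} & A_{KL}\rho_{KL}^2 & \cdots & A_{KL}\rho_{KL}^N & 0 & \cdots & -v_{KL}\\
C_{KL} & C_{KL}\rho_{KL}^2 & \cdots & C_{KL}\rho_{KL}^N & 0 & \cdots & -u_{KL}
\end{bmatrix}\mathbf x =
\begin{bmatrix}B_{11}\\ D_{11}\\ \vdots\\ B_{KL}\\ D_{KL}\end{bmatrix}$$
with $\mathbf x = [a_0, a_2, \dots, a_N, t_3^1, \dots, t_3^K]^T$: the Taylor coefficients plus one depth per view.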
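Each corner contributes two rows to this system, one from (10.1) and one from (10.2), once its view's extrinsics are known. The sketch below assembles those two rows for a single corner; the `ViewPose` struct, the 0-based view index and the function name are hypothetical choices, and the unknowns are ordered as $[a_0, a_2, \dots, a_N, t_3^1, \dots, t_3^K]$.

```rust
/// Per-view extrinsics from the first stage; t3 is still unknown here.
struct ViewPose {
    r11: f64, r12: f64, r21: f64, r22: f64, r31: f64, r32: f64,
    t1: f64, t2: f64,
}

/// The two rows (from Eqs. 10.1 and 10.2) that one corner adds to the global
/// system, with their right-hand sides. Unknowns: [a0, a2..aN, t3^1..t3^K].
fn global_rows(
    pose: &ViewPose,
    view: usize,         // 0-based index of this corner's view
    num_views: usize,    // K
    degree: usize,       // N >= 2
    obs: (f64, f64),     // sensor-plane (u, v) of the corner
    pattern: (f64, f64), // pattern-plane (X, Y), Z = 0
) -> ([Vec<f64>; 2], [f64; 2]) {
    let ((u, v), (x, y)) = (obs, pattern);
    let rho = u.hypot(v);
    let a = pose.r21 * x + pose.r22 * y + pose.t2; // multiplies f(rho) in Eq. 10.1
    let c = pose.r11 * x + pose.r12 * y + pose.t1; // multiplies f(rho) in Eq. 10.2
    let s = pose.r31 * x + pose.r32 * y;
    let (b, d) = (v * s, u * s); // right-hand sides
    let cols = degree + num_views;
    let mut row_a = vec![0.0; cols];
    let mut row_c = vec![0.0; cols];
    // Polynomial columns rho^0, rho^2, ..., rho^N (no rho^1 column since a1 = 0).
    for (col, p) in std::iter::once(0).chain(2..=degree).enumerate() {
        let pw = rho.powi(p as i32);
        row_a[col] = a * pw;
        row_c[col] = c * pw;
    }
    // Depth column t3 of this view.
    row_a[degree + view] = -v;
    row_c[degree + view] = -u;
    ([row_a, row_c], [b, d])
}
```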
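The final estimate minimises the total squared reprojection error
$$E = \sum_{i=1}^{K}\sum_{j=1}^{L}\left\lVert \mathbf m_{ij} - \hat{\mathbf m}\left(\mathbf R^i, \mathbf T^i, \mathbf A, O_c, a_0, a_2, \dots, a_N, \mathbf M_j\right)\right\rVert^2,$$
where $\mathbf m_{ij}$ is the observed pixel coordinate of corner $j$ in view $i$ and $\hat{\mathbf m}(\cdot)$ is the reprojection predicted by the current parameter estimate.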
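For intuition about how the predicted reprojection is computed, the sketch below maps a camera-frame point to the sensor plane, then to pixels, and sums the squared errors. It assumes the root of $f(\rho) = (Z/\sqrt{X^2+Y^2})\,\rho$ can be bracketed on a caller-supplied interval $[0, \rho_{\max}]$ and uses plain bisection. This is an illustrative choice, not the toolbox's own projection routine, and if several roots lie in the interval, bisection returns one of them. All function names are hypothetical.

```rust
/// f(rho) = a0 + a1 rho + ... + aN rho^N with coefficients [a0, a1, ..., aN].
fn poly(coeffs: &[f64], rho: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &a| acc * rho + a)
}

/// Camera-frame point P -> sensor plane. Finds rho > 0 with [u'', v'', f(rho)]
/// parallel to P, i.e. f(rho) = (Z / r_xy) * rho, by bisection on [0, rho_max].
/// Returns None when no sign change is bracketed.
fn project_to_sensor(coeffs: &[f64], p: [f64; 3], rho_max: f64) -> Option<[f64; 2]> {
    let r_xy = p[0].hypot(p[1]);
    if r_xy == 0.0 {
        // On the optical axis: images at the centre if Z has the sign of f(0) = a0.
        return if p[2] * coeffs[0] > 0.0 { Some([0.0, 0.0]) } else { None };
    }
    let slope = p[2] / r_xy;
    let h = |rho: f64| poly(coeffs, rho) - slope * rho;
    let (mut lo, mut hi) = (0.0, rho_max);
    if h(lo) * h(hi) > 0.0 {
        return None;
    }
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        if h(lo) * h(mid) <= 0.0 {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    let rho = 0.5 * (lo + hi);
    Some([p[0] / r_xy * rho, p[1] / r_xy * rho])
}

/// Sensor plane -> pixels with u' = A u'' + t (A row-major).
fn sensor_to_pixel(a: [[f64; 2]; 2], t: [f64; 2], s: [f64; 2]) -> [f64; 2] {
    [a[0][0] * s[0] + a[0][1] * s[1] + t[0], a[1][0] * s[0] + a[1][1] * s[1] + t[1]]
}

/// Sum of squared reprojection errors between observed and predicted pixels.
fn ssre(observed: &[[f64; 2]], predicted: &[[f64; 2]]) -> f64 {
    observed
        .iter()
        .zip(predicted)
        .map(|(o, m)| (o[0] - m[0]).powi(2) + (o[1] - m[1]).powi(2))
        .sum()
}
```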
Procedure
- For each view $i$, stack the third cross-product constraint into $\mathbf M$ and solve $\mathbf M\mathbf H = 0$ by SVD under the unit-norm constraint $\lVert\mathbf H\rVert = 1$, which recovers $\mathbf H$ up to scale. Apply the orthonormality constraints $\mathbf r_1^T\mathbf r_2 = 0$ and $\lVert\mathbf r_1\rVert = \lVert\mathbf r_2\rVert = 1$ to fix the scale and recover $r_{31}, r_{32}$.
- Substitute all recovered extrinsics into the global system and solve it by pseudoinverse to obtain $a_0, a_2, \dots, a_N$ and the per-view depths $t_3^i$. Select $N$ by incrementing it from $N = 2$ until the mean reprojection error reaches a local minimum.
- Run a two-pass linear refinement: re-estimate the extrinsics with the updated intrinsics (as in the first step), then re-estimate the intrinsics with the updated extrinsics (as in the second step).
- Conduct a coarse-to-fine grid search over candidate image-center positions $O_c = (x_c, y_c)$. At each candidate, evaluate the sum of squared reprojection errors. Halt when successive candidates differ by less than $\varepsilon$ pixels.
- Minimise the total squared reprojection error by Levenberg–Marquardt, initialised from the linear result with $\mathbf A = \mathbf I$. Solve in two sequential sub-steps: extrinsics first, then intrinsics.
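The scale-recovery part of the first step has a closed form. Write $n_1 = r_{11}^2 + r_{21}^2$, $n_2 = r_{12}^2 + r_{22}^2$ and $p = r_{11}r_{12} + r_{21}r_{22}$ for the entries of the unit-norm SVD solution. Squaring $r_{31}r_{32} = -s\,p$ and substituting $r_{31}^2 = 1 - s\,n_1$, $r_{32}^2 = 1 - s\,n_2$ shows that $s = \lambda^2$ satisfies $(r_{11}r_{22} - r_{12}r_{21})^2 s^2 - (n_1 + n_2)\,s + 1 = 0$, and only the smaller root keeps both squares non-negative. The sketch below is one way to implement this; the function name and the convention $r_{31} \ge 0$ are assumptions, and the global sign of the solution is left open.

```rust
/// Recover the scale lambda and r31, r32 from H = [r11, r12, r21, r22, t1, t2]
/// known only up to scale, using ||r1|| = ||r2|| = 1 and r1 . r2 = 0.
/// Returns (lambda * H, r31, r32) with the convention r31 >= 0.
fn complete_rotation(h: [f64; 6]) -> ([f64; 6], f64, f64) {
    let [a, b, c, d, _, _] = h; // r11, r12, r21, r22 up to the unknown scale
    let n1 = a * a + c * c; // r11^2 + r21^2
    let n2 = b * b + d * d; // r12^2 + r22^2
    assert!(n1 + n2 > 0.0, "H must have a non-zero rotation part");
    let p = a * b + c * d; // r11 r12 + r21 r22
    let det2 = (a * d - b * c).powi(2); // equals n1 * n2 - p^2
    // s = lambda^2 solves det2 s^2 - (n1 + n2) s + 1 = 0. The smaller root keeps
    // 1 - s n1 and 1 - s n2 non-negative; this form stays finite when det2 == 0.
    let disc = ((n1 + n2).powi(2) - 4.0 * det2).max(0.0);
    let s = 2.0 / ((n1 + n2) + disc.sqrt());
    let lambda = s.sqrt();
    let r31 = (1.0 - s * n1).max(0.0).sqrt();
    let mut r32 = (1.0 - s * n2).max(0.0).sqrt();
    // r1 . r2 = 0 requires r31 r32 = -s p; with r31 >= 0 this fixes r32's sign.
    if s * p > 0.0 {
        r32 = -r32;
    }
    (h.map(|x| lambda * x), r31, r32)
}
```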
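The degree-selection rule in the second step reduces to a small loop. In this sketch `mean_error` is a hypothetical callback standing in for re-running the linear stages at a given degree. The loop stops at the first degree whose successor does not lower the error.

```rust
/// Increment the polynomial degree from `n_min` and stop at the first local
/// minimum of the mean reprojection error (or at `n_max`).
fn select_degree(n_min: usize, n_max: usize, mut mean_error: impl FnMut(usize) -> f64) -> usize {
    let mut best_n = n_min;
    let mut best_err = mean_error(n_min);
    for n in (n_min + 1)..=n_max {
        let err = mean_error(n);
        if err >= best_err {
            break; // error stopped decreasing: the previous degree is a local minimum
        }
        best_n = n;
        best_err = err;
    }
    best_n
}
```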
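The image-center search in the fourth step can be sketched as a shrinking grid around the current best center. Here `cost` is a hypothetical callback that re-runs the linear calibration with a candidate center and returns the sum of squared reprojection errors. The 5×5 grid and the halving schedule are assumptions. This version stops when the grid spacing falls below `eps` rather than testing the displacement between successive centers, because a center that does not move at a coarse level would otherwise end the search before the grid has been refined.

```rust
/// Coarse-to-fine grid search for the image centre (xc, yc).
/// `cost(xc, yc)` should return the sum of squared reprojection errors
/// obtained with that centre.
fn search_center(
    cost: impl Fn(f64, f64) -> f64,
    start: (f64, f64),
    initial_step: f64, // grid spacing at the coarsest level, in pixels
    eps: f64,          // stop once the spacing is below this, in pixels
) -> (f64, f64) {
    const HALF: i32 = 2; // (2 * HALF + 1)^2 = 25 candidates per level
    assert!(initial_step > 0.0 && eps > 0.0);
    let mut center = start;
    let mut step = initial_step;
    while step >= eps {
        let mut best = (center, cost(center.0, center.1));
        for i in -HALF..=HALF {
            for j in -HALF..=HALF {
                let cand = (center.0 + f64::from(i) * step, center.1 + f64::from(j) * step);
                let c = cost(cand.0, cand.1);
                if c < best.1 {
                    best = (cand, c);
                }
            }
        }
        center = best.0;
        // Halving keeps the next window (HALF * step / 2 = step) wide enough
        // to cover the half-cell around the point just selected.
        step *= 0.5;
    }
    center
}
```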
```mermaid
flowchart LR
    A["Per-view linear extrinsics<br/>SVD on cross-product"] --> B["Global Taylor coefficients<br/>Pseudoinverse over views"]
    B --> C["Two-pass linear refinement<br/>extrinsics ↔ intrinsics"]
    C --> D["Image-center search<br/>iterative SSRE minimum"]
    D --> E["Levenberg–Marquardt MLE<br/>final refinement"]
```
Implementation
The per-view linear stage in Rust:
```rust
use nalgebra::{DMatrix, DVector, SVD};

/// Solve M H = 0 for H = [r11, r12, r21, r22, t1, t2]^T from the
/// cross-product third-component constraint, given sensor-plane
/// corner coordinates and pattern-plane metric coordinates for one view.
/// The returned H has unit norm; its scale and sign are fixed afterwards
/// from the orthonormality of the rotation.
fn solve_per_view_extrinsics(
    corners_sensor: &[(f64, f64)], // (u'', v'')
    pattern_xy: &[(f64, f64)],     // (X, Y), Z = 0
) -> DVector<f64> {
    assert_eq!(
        corners_sensor.len(),
        pattern_xy.len(),
        "each detected corner needs its pattern point"
    );
    let l = corners_sensor.len();
    // Pad with zero rows to at least 6 so the thin SVD returns a full 6x6 V^T;
    // zero rows leave the null space of M unchanged.
    let mut m = DMatrix::<f64>::zeros(l.max(6), 6);
    for (k, (&(u, v), &(x, y))) in corners_sensor.iter().zip(pattern_xy).enumerate() {
        // Third row of the cross product (Eq. 10.3) times -1: linear in H.
        m[(k, 0)] = v * x; // r11
        m[(k, 1)] = v * y; // r12
        m[(k, 2)] = -u * x; // r21
        m[(k, 3)] = -u * y; // r22
        m[(k, 4)] = v; // t1
        m[(k, 5)] = -u; // t2
    }
    // H is the right singular vector for the smallest singular value. Locate it
    // explicitly instead of assuming the singular values are sorted.
    let svd = SVD::new(m, false, true);
    let idx = svd.singular_values.imin();
    let vt = svd.v_t.expect("V^T was requested");
    vt.row(idx).transpose()
}
```
The intrinsic stage stacks the rows of equations (10.1) and (10.2) from all views into the global system and solves it by pseudoinverse, and the recovery of $r_{31}, r_{32}$ from orthonormality closes each per-view rotation. Both reuse the same linear-algebra machinery as the code above and are omitted for brevity.
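For completeness, here is a dependency-free sketch of the least-squares solve that the intrinsic stage needs. It substitutes Householder QR for the pseudoinverse named in the procedure. For a full-column-rank system both give the same least-squares solution, and QR avoids forming the normal equations. The function name, the row-major `Vec<Vec<f64>>` layout and the full-rank assumption are illustrative choices.

```rust
/// Least-squares solution of the overdetermined system M x = b by Householder QR.
/// `m` is row-major (rows x cols, rows >= cols) and assumed to have full column rank.
fn lstsq(mut m: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let (rows, cols) = (m.len(), m[0].len());
    assert!(rows >= cols && b.len() == rows, "need a tall system with matching b");
    for k in 0..cols {
        // Householder reflector that zeroes column k below the diagonal.
        let norm = (k..rows).map(|i| m[i][k] * m[i][k]).sum::<f64>().sqrt();
        if norm == 0.0 {
            continue;
        }
        let alpha = if m[k][k] > 0.0 { -norm } else { norm };
        let mut v: Vec<f64> = (k..rows).map(|i| m[i][k]).collect();
        v[0] -= alpha;
        let vv: f64 = v.iter().map(|x| x * x).sum();
        // Apply H = I - 2 v v^T / (v^T v) to the remaining columns of M and to b.
        for j in k..cols {
            let f = 2.0 * (k..rows).map(|i| v[i - k] * m[i][j]).sum::<f64>() / vv;
            for i in k..rows {
                m[i][j] -= f * v[i - k];
            }
        }
        let f = 2.0 * (k..rows).map(|i| v[i - k] * b[i]).sum::<f64>() / vv;
        for i in k..rows {
            b[i] -= f * v[i - k];
        }
    }
    // Back-substitution with the upper-triangular factor R (top cols x cols of m).
    let mut x = vec![0.0; cols];
    for k in (0..cols).rev() {
        let s: f64 = ((k + 1)..cols).map(|j| m[k][j] * x[j]).sum();
        x[k] = (b[k] - s) / m[k][k];
    }
    x
}
```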
Remarks
- The intrinsic step solves $2KL$ linear equations in $N + K$ unknowns ($N$ Taylor coefficients $a_0, a_2, \dots, a_N$ plus $K$ depths $t_3^i$). Conditioning improves with more views and more corners per view; $2KL \ge N + K$ is the minimum requirement for full column rank, although it does not guarantee it.
- The polynomial degree $N$ controls model capacity. The increment-and-stop heuristic (start at $N = 2$, raise $N$ until the mean reprojection error stops decreasing) provides modest protection against overfitting; $N = 4$ is typical for both catadioptric and fisheye sensors.
- The central-projection assumption (a single effective viewpoint) is a hard constraint. Non-central catadioptric systems, in which the mirror focus is significantly misaligned with the camera's optical centre, violate it and produce irreducible systematic residuals that the Levenberg–Marquardt stage cannot eliminate.
- Near-coplanar viewing geometries make the linear systems ill-conditioned: the per-view SVD approaches rank deficiency when the corner configurations span a low-dimensional subspace.
- The algorithm does not recover pixel skew or non-unit aspect ratio in the linear phase: $\mathbf A$ is initialised to the identity and refined only by Levenberg–Marquardt. Sensors with strong axis misalignment or a large aspect ratio degrade the linear initialisation.
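The equation count in the first remark can be checked before any solve. The sketch below encodes only necessary conditions: at least five corners per view, so that the $L \times 6$ matrix $\mathbf M$ can have a one-dimensional null space, and $2KL \ge N + K$ for the global system. Passing these checks does not rule out the degenerate, near-coplanar configurations described above. The function names are hypothetical. For instance, 10 views of a 48-corner board with $N = 4$ give $2 \cdot 10 \cdot 48 = 960$ equations in $4 + 10 = 14$ unknowns.

```rust
/// Size of the global intrinsic system: 2KL equations in N + K unknowns
/// (a0, a2, ..., aN and one t3 per view).
fn intrinsic_system_shape(k: usize, l: usize, n: usize) -> (usize, usize) {
    (2 * k * l, n + k)
}

/// Necessary, not sufficient, counting conditions for the linear stages.
fn counts_are_sufficient(k: usize, l: usize, n: usize) -> bool {
    let (rows, cols) = intrinsic_system_shape(k, l, n);
    // M is L x 6: rank 5 (a one-dimensional null space) needs L >= 5.
    l >= 5 && rows >= cols
}
```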
References
- D. Scaramuzza, A. Martinelli, R. Siegwart. A Toolbox for Easily Calibrating Omnidirectional Cameras. IEEE/RSJ IROS, 2006.
- Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE TPAMI 22(11), 2000.
- M. Rufli, D. Scaramuzza, R. Siegwart. Automatic Detection of Checkerboards on Blurred and Distorted Images. IEEE/RSJ IROS, 2008.