Pinhole Camera Model | VitaVision

Pinhole Camera Model

Based on
A Flexible New Technique for Camera Calibration
Zhang · IEEE Transactions on Pattern Analysis and Machine Intelligence 2000

Definition

The pinhole camera model is the projective map from a 3-D scene point to a 2-D image pixel through a single centre of projection — the optical centre — in which every ray from the scene passes through that centre and strikes the image plane at a unique location. No lens optics, aperture, or depth-of-field effects are modelled: the camera is an ideal perspective projector.

Central projection equation

Given a scene point $\tilde{M} = [X, Y, Z, 1]^T$ in homogeneous world coordinates, its image in homogeneous pixel coordinates $\tilde{m} = [u, v, 1]^T$ satisfies

$$\tilde{m} \sim K\,[R \mid t]\,\tilde{M},$$

where $K$ is the $3 \times 3$ intrinsic matrix, $[R \mid t]$ is the $3 \times 4$ extrinsic matrix encoding the rigid transformation from world to camera frame, and $\sim$ denotes equality up to a nonzero scale factor. Input: a 3-D point in world coordinates. Output: a 2-D point in pixel coordinates.

The up-to-scale relation reflects the homogeneous-coordinate ambiguity: multiplying $\tilde{m}$ by any nonzero scalar gives the same pixel. Recovering the absolute depth $Z$ from $\tilde{m}$ alone is impossible; depth is the information irreversibly discarded by the projection.
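As a minimal sketch of the central projection equation, the NumPy snippet below projects two points that lie on the same viewing ray at different depths and lands on the same pixel. The function name `project` and all numeric values are illustrative assumptions, not taken from the source.

```python
import numpy as np

def project(K, R, t, M):
    """Project a 3-D world point M (shape (3,)) to pixel coordinates (u, v)."""
    M_h = np.append(M, 1.0)                  # homogeneous world point [X, Y, Z, 1]
    Rt = np.hstack([R, t.reshape(3, 1)])     # 3x4 extrinsic matrix [R | t]
    m_h = K @ Rt @ M_h                       # homogeneous pixel, defined up to scale
    return m_h[:2] / m_h[2]                  # divide out the scale to obtain (u, v)

# Hypothetical intrinsics; camera frame coincides with the world frame.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

uv_near = project(K, R, t, np.array([0.1, -0.05, 2.0]))
uv_far = project(K, R, t, np.array([0.2, -0.10, 4.0]))   # same ray, twice the depth
print(uv_near, uv_far)
```

Both calls return the same pixel, which illustrates why depth cannot be recovered from a single projection.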

Mathematical Description

Intrinsic matrix

The intrinsic matrix encodes how the 3-D optical geometry maps to the sensor's discrete pixel grid:

$$K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.$$

  • $f_x = f / d_x$ and $f_y = f / d_y$ are the focal lengths in pixels: the physical focal length $f$ divided by the pixel pitches $d_x$, $d_y$. Zhang writes these $(\alpha, \beta)$, Weng et al. $(f_u, f_v)$.
  • $(c_x, c_y)$ is the principal point: the pixel coordinates where the optical axis meets the image plane. Tsai fixes it at the image centre and instead calibrates a scale factor $s_x$ to absorb CCD scanning uncertainty.
  • $\gamma$ is the skew, non-zero only when the pixel axes are not perpendicular. Modern digital sensors have $\gamma \approx 0$; the parameter is retained for generality.

$K$ has five degrees of freedom in general, four when zero skew is enforced.
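The relation between physical and pixel focal lengths can be sketched as follows. The helper `intrinsic_matrix` and the sensor figures (4 mm lens, 5 micrometre square pixels) are hypothetical values chosen only for illustration.

```python
import numpy as np

def intrinsic_matrix(f_mm, dx_mm, dy_mm, cx, cy, skew=0.0):
    """Assemble K from a physical focal length and pixel pitches (all in mm)."""
    fx = f_mm / dx_mm      # focal length in pixels along u
    fy = f_mm / dy_mm      # focal length in pixels along v
    return np.array([[fx, skew, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Hypothetical sensor: 4 mm lens, 5 micrometre square pixels, 640x480 image.
K = intrinsic_matrix(f_mm=4.0, dx_mm=0.005, dy_mm=0.005, cx=320.0, cy=240.0)
print(K)
```

With square pixels the two focal lengths coincide; a non-square pitch would make them differ while the physical focal length stays the same.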

Extrinsic transform

The extrinsic matrix $[R \mid t]$ concatenates a rotation $R \in SO(3)$ and translation $t \in \mathbb{R}^3$, mapping a world point $M$ to camera coordinates $M_c = RM + t$. Tsai parameterises the rotation by yaw, pitch, and roll; Zhang uses the Rodrigues 3-vector, a minimal parameterisation that keeps the Levenberg-Marquardt refinement unconstrained. The extrinsic parameters are per-view: each image of a calibration target yields its own $(R_i, t_i)$.
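The Rodrigues 3-vector encodes a rotation as axis times angle. A minimal sketch of the conversion to a rotation matrix, followed by the world-to-camera map $M_c = RM + t$, is given below; the function name `rodrigues` and the sample pose are assumptions for illustration.

```python
import numpy as np

def rodrigues(r):
    """Convert a rotation vector r (axis * angle, shape (3,)) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)                      # treat a vanishing vector as no rotation
    k = r / theta                             # unit rotation axis
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])       # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

R = rodrigues(np.array([0.0, 0.0, np.pi / 2]))   # 90 degrees about the z axis
t = np.array([0.0, 0.0, 5.0])
M = np.array([1.0, 0.0, 0.0])
M_c = R @ M + t                                   # world point in camera coordinates
print(M_c)
```

Because the three components of the vector are free, an optimiser can update them directly without enforcing orthonormality on $R$.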

Projection matrix and normalised coordinates

The $3 \times 4$ projection matrix combines intrinsics and extrinsics, $P = K\,[R \mid t]$, giving, for zero skew ($\gamma = 0$), the expanded pixel projection

$$u = f_x \frac{r_1^T M + t_x}{r_3^T M + t_z} + c_x, \qquad v = f_y \frac{r_2^T M + t_y}{r_3^T M + t_z} + c_y,$$

with $r_i^T$ the $i$-th row of $R$. The normalised image coordinates are the camera-frame coordinates $M_c = [X_c, Y_c, Z_c]^T$ divided by depth, before intrinsic scaling,

$$x_n = X_c / Z_c, \qquad y_n = Y_c / Z_c,$$

so that, for $\gamma = 0$, $u = f_x x_n + c_x$ and $v = f_y y_n + c_y$. These normalised coordinates are the input to the distortion model.
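The round trip between camera-frame points, normalised coordinates, and pixels can be sketched as below. The helper names and the numeric intrinsics are hypothetical; zero skew is assumed.

```python
import numpy as np

# Hypothetical zero-skew intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])

def to_normalised(M_c):
    """Camera-frame point [Xc, Yc, Zc] -> normalised coordinates (x_n, y_n)."""
    return M_c[:2] / M_c[2]

def normalised_to_pixel(K, xy_n):
    """Apply the intrinsic matrix to [x_n, y_n, 1]; the third component stays 1."""
    return (K @ np.append(xy_n, 1.0))[:2]

def pixel_to_normalised(K, uv):
    """Invert the intrinsic mapping by solving K [x_n, y_n, 1]^T = [u, v, 1]^T."""
    return np.linalg.solve(K, np.append(uv, 1.0))[:2]

M_c = np.array([0.5, -0.25, 2.5])
xy_n = to_normalised(M_c)
uv = normalised_to_pixel(K, xy_n)
xy_back = pixel_to_normalised(K, uv)
print(xy_n, uv, xy_back)
```

The inverse step is where a distortion model would be undone in a full pipeline, since distortion acts on $(x_n, y_n)$ rather than on pixels.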

Calibration homography and planar targets

When the calibration target is planar (placed at $Z = 0$), the third column of $R$ drops out and the $3 \times 4$ projection reduces to a $3 \times 3$ plane-to-image homography,

$$s\,\tilde{m} = H\,\tilde{M}_{2D}, \qquad H = K\,[r_1 \;\; r_2 \;\; t],$$

with $r_1, r_2$ the first two columns of $R$, $h_1, h_2$ the first two columns of $H$, and $\tilde{M}_{2D} = [X, Y, 1]^T$ the homogeneous target-plane point. Because $r_1$ and $r_2$ are orthonormal, the product $B = K^{-T}K^{-1}$ (the image of the absolute conic) satisfies two linear constraints per homography, $h_1^T B h_2 = 0$ and $h_1^T B h_1 - h_2^T B h_2 = 0$. Stacking two rows per view across $n \geq 3$ views yields a homogeneous system whose null vector encodes the five intrinsic parameters. Sturm and Maybank derive the same constraints independently in the form $h_1^T \omega h_1 - h_2^T \omega h_2 = 0$, $h_1^T \omega h_2 = 0$ with $\omega = K^{-T}K^{-1}$.
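The two constraints per homography can be checked numerically on synthetic data. The sketch below builds exact homographies from a made-up $K$ and three made-up poses, stacks the constraint rows, and recovers the entries of $B$ up to scale from the null vector. The vectorisation order $b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]$, the helper names, and all numbers are assumptions for illustration; the intrinsics are deliberately well scaled rather than in pixel units so that the raw linear system stays well conditioned.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from a nonzero rotation vector (axis times angle)."""
    theta = np.linalg.norm(r)
    k = r / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def v(H, i, j):
    """Row with v(H, i, j) @ b == h_i^T B h_j for b = [B11, B12, B22, B13, B23, B33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def constraint_rows(H):
    """Two rows per homography: h1^T B h2 = 0 and h1^T B h1 - h2^T B h2 = 0."""
    return np.vstack([v(H, 0, 1), v(H, 0, 0) - v(H, 1, 1)])

# Made-up, well-scaled intrinsics (not pixel units).
K = np.array([[1.2, 0.01, 0.10],
              [0.0, 1.1, -0.05],
              [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)
B = K_inv.T @ K_inv
b_true = np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])

# Three made-up poses (rotation vector, translation) with non-parallel target planes.
views = [(np.array([0.5, -0.3, 0.1]), np.array([0.1, -0.2, 3.0])),
         (np.array([-0.4, 0.6, 0.3]), np.array([-0.3, 0.1, 4.0])),
         (np.array([0.2, 0.7, -0.5]), np.array([0.2, 0.3, 3.5]))]

rows = []
for rvec, t in views:
    R = rodrigues(rvec)
    H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # H = K [r1 r2 t]
    rows.append(constraint_rows(H))
V = np.vstack(rows)                                   # 6 x 6 stacked system V b = 0

residual = V @ b_true                                 # should vanish for the true B
_, _, Vt = np.linalg.svd(V)
b_est = Vt[-1]                                        # null vector, defined up to scale
print(np.abs(residual).max())
print(b_est / b_est[5], b_true / b_true[5])
```

With real detections the homographies are noisy, so the null vector only gives an initial estimate that a nonlinear refinement would then improve.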

Departure from the ideal model

Real lenses displace the normalised coordinates $(x_n, y_n)$ from their ideal positions by radial, tangential, and thin-prism components. The pinhole model describes the undistorted ideal case; the additive correction is treated separately in camera-distortion-models.

Numerical Concerns

Homogeneous-coordinate scale ambiguity. The third component of $P\tilde{M}$ is the scene depth $Z_c$, the divisor that recovers pixel coordinates. A near-zero $Z_c$ (a point on or near the plane through the optical centre parallel to the image plane) makes the division ill-conditioned, and a negative $Z_c$ corresponds to a point behind the camera; both cases must be guarded in any implementation.
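One way to guard the division is sketched below. The function name `project_guarded` and the threshold `min_depth` are hypothetical; the depth test relies on the last row of $K$ being $[0, 0, 1]$, so that the third homogeneous component equals $Z_c$.

```python
import numpy as np

def project_guarded(P, M, min_depth=1e-6):
    """Return (u, v), or None when the point is not safely in front of the camera."""
    m_h = P @ np.append(M, 1.0)
    depth = m_h[2]                 # equals Z_c when the last row of K is [0, 0, 1]
    if depth < min_depth:          # rejects near-zero and negative depths alike
        return None
    return m_h[:2] / depth

# Hypothetical camera at the world origin looking along +Z.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

in_front = project_guarded(P, np.array([0.0, 0.0, 2.0]))
behind = project_guarded(P, np.array([0.0, 0.0, -1.0]))
on_plane = project_guarded(P, np.array([1.0, 1.0, 0.0]))
print(in_front, behind, on_plane)
```

Returning `None` is just one possible convention; a batch implementation might instead return a validity mask alongside the pixel array.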

Principal-point and focal-length correlation. The principal point is statistically correlated with the radial distortion coefficients and, to a lesser degree, the focal length. Calibration sets with narrow angular diversity (all views near fronto-parallel) yield a poorly conditioned constraint system; inter-view rotations near $45^\circ$ from the image plane give the best conditioning.

Pixel vs metric units. The focal lengths $f_x, f_y$ are expressed in pixels, being the physical focal length divided by the pixel pitch, whereas the physical focal length and the pixel pitches themselves carry metric units. Mixing the two conventions in a single Jacobian is a common error source.

Skew near zero. For virtually all digital sensors $\gamma \approx 0$, which drives the IAC entry $B_{12} \approx 0$ and makes the extraction of $K$ from $B$ numerically stable. Cameras with genuine skew face a less stable extraction.

Planar degeneracy and minimum views. A single view of a planar target gives only 2 independent constraints on the 5-DOF intrinsic matrix; at least 3 views at non-parallel orientations are required for a fully determined system, or 2 with the zero-skew prior. Parallel planes contribute linearly dependent rows regardless of view count — a provable rank deficiency, not a conditioning issue.
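The rank deficiency for parallel planes can be observed numerically. The sketch below stacks the two IAC constraint rows per view for three views of parallel target planes (same orientation with different translations, plus an in-plane rotation) and for three tilted views, then compares numerical ranks. Helper names, poses, the well-scaled intrinsics, and the explicit rank tolerance are all illustrative assumptions.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from a nonzero rotation vector (axis times angle)."""
    theta = np.linalg.norm(r)
    k = r / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def v(H, i, j):
    """Row with v(H, i, j) @ b == h_i^T B h_j for b = [B11, B12, B22, B13, B23, B33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0], hi[0] * hj[1] + hi[1] * hj[0], hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2], hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def stacked_rows(K, poses):
    """Stack the two constraint rows of every view into one matrix."""
    rows = []
    for R, t in poses:
        H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # H = K [r1 r2 t]
        rows.append(v(H, 0, 1))
        rows.append(v(H, 0, 0) - v(H, 1, 1))
    return np.vstack(rows)

# Made-up, well-scaled intrinsics (not pixel units).
K = np.array([[1.2, 0.01, 0.10],
              [0.0, 1.1, -0.05],
              [0.0, 0.0, 1.0]])

R0 = rodrigues(np.array([0.5, -0.3, 0.1]))
Rz = rodrigues(np.array([0.0, 0.0, 0.8]))    # rotation within the target plane

# Three views whose target planes are all parallel (same plane normal R0[:, 2]).
parallel = [(R0, np.array([0.1, -0.2, 3.0])),
            (R0, np.array([-0.3, 0.4, 5.0])),
            (R0 @ Rz, np.array([0.2, 0.1, 4.0]))]

# Three views with distinct plane orientations.
tilted = [(R0, np.array([0.1, -0.2, 3.0])),
          (rodrigues(np.array([-0.4, 0.6, 0.3])), np.array([-0.3, 0.1, 4.0])),
          (rodrigues(np.array([0.2, 0.7, -0.5])), np.array([0.2, 0.3, 3.5]))]

# Explicit tolerance so that rounding noise is not counted as rank.
rank_parallel = np.linalg.matrix_rank(stacked_rows(K, parallel), tol=1e-10)
rank_tilted = np.linalg.matrix_rank(stacked_rows(K, tilted), tol=1e-10)
print(rank_parallel, rank_tilted)
```

Adding further parallel views to the first list would not raise its rank, whereas the tilted set reaches the rank needed to fix $B$ up to scale.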

Normalisation for the linear estimate. The DLT system for the calibration homography is poorly conditioned when raw pixel and world coordinates are mixed; isotropic normalisation of both point sets is required before assembly and inverted afterwards.
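A sketch of a normalised DLT for the plane-to-image homography follows. It assumes the common Hartley-style convention (centroid moved to the origin, mean distance scaled to $\sqrt{2}$), which is one reading of "isotropic normalisation" rather than something the source specifies; the function names and the synthetic homography are hypothetical.

```python
import numpy as np

def isotropic_normalisation(pts):
    """Return (T, pts_n): T moves the centroid to the origin and scales the
    mean distance from it to sqrt(2); pts_n are the transformed 2-D points."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return T, (T @ pts_h.T).T[:, :2]

def dlt_homography(src, dst):
    """Estimate H with dst ~ H src from at least 4 correspondences."""
    T_src, s = isotropic_normalisation(src)
    T_dst, d = isotropic_normalisation(dst)
    rows = []
    for (X, Y), (u, v) in zip(s, d):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    H_n = Vt[-1].reshape(3, 3)                 # homography between normalised sets
    H = np.linalg.inv(T_dst) @ H_n @ T_src     # undo both normalisations
    return H / H[2, 2]

# Synthetic check: a 5x4 grid of target points (25-unit spacing) and a made-up H.
H_true = np.array([[1.2, 0.1, 30.0],
                   [-0.05, 1.1, 20.0],
                   [1e-4, 2e-4, 1.0]])
grid = np.array([[x, y] for x in range(5) for y in range(4)], dtype=float) * 25.0
proj = (H_true @ np.column_stack([grid, np.ones(len(grid))]).T).T
pixels = proj[:, :2] / proj[:, 2:3]

H_est = dlt_homography(grid, pixels)
print(np.abs(H_est - H_true).max())
```

The final division by `H[2, 2]` fixes the free scale for comparison only; it presumes that entry is nonzero, which holds for this synthetic example.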

Where it appears

The pinhole camera model is the shared projection foundation of every calibration and pose-estimation algorithm in the atlas.

  • zhang-planar-calibration: recovers $K$ and per-view $(R_i, t_i)$ from multiple planar views via the IAC linear system derived from the $H = K[r_1\;r_2\;t]$ factorisation.
  • tsai-versatile-calibration: recovers $K$ and $(R, t)$ from a 3-D rig via the radial alignment constraint, parameterising $K$ by effective focal length and scale factor $s_x$.
  • sturm-plane-based-calibration: derives the same two IAC constraints per homography independently, with an exhaustive singularity catalogue for degenerate plane configurations.
  • scaramuzza-omni-calibration: replaces the perspective projection with a polynomial omnidirectional model; the standard pinhole projection is its small-field-of-view limit.
  • epnp: solves for the extrinsic pose $[R \mid t]$ given a calibrated camera and $n$ 2D-3D correspondences, assuming the pinhole projection.
  • camera-distortion-models: the additive correction to the normalised image coordinates that accounts for real-lens departure from the ideal map.

References

  1. Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11) –1334, 2000.
  2. R. Y. Tsai. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE Journal on Robotics and Automation, 3(4) –344, 1987.
  3. P. F. Sturm, S. J. Maybank. On Plane-Based Camera Calibration: A General Algorithm, Singularities, Applications. IEEE CVPR, 1999.
  4. J. Weng, P. Cohen, M. Herniou. Camera Calibration with Distortion Models and Accuracy Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10) –980, 1992.
  5. R. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.