Definition
The pinhole camera model is the projective map from a 3-D scene point to a 2-D image pixel through a single centre of projection — the optical centre — in which every ray from the scene passes through that centre and strikes the image plane at a unique location. No lens optics, aperture, or depth-of-field effects are modelled: the camera is an ideal perspective projector.
Given a scene point in homogeneous world coordinates, its image in homogeneous pixel coordinates satisfies
where is the intrinsic matrix, is the extrinsic matrix encoding the rigid transformation from world to camera frame, and denotes equality up to a nonzero scale factor. Input: a 3-D point in world coordinates. Output: a 2-D point in pixel coordinates.
The up-to-scale relation reflects the homogeneous-coordinate ambiguity: multiplying by any nonzero scalar gives the same pixel. Recovering the absolute depth from alone is impossible; depth is the information irreversibly discarded by the projection.
Mathematical Description
Intrinsic matrix
The intrinsic matrix encodes how the 3-D optical geometry maps to the sensor's discrete pixel grid:
- and are the focal lengths in pixels — the physical focal length divided by the pixel pitches , . Zhang writes these , Weng et al. .
- is the principal point — the pixel coordinates where the optical axis meets the image plane. Tsai fixes it at the image centre and instead calibrates a scale factor to absorb CCD scanning uncertainty.
- is the skew, non-zero only when the pixel axes are not perpendicular. Modern digital sensors have ; the parameter is retained for generality.
has five degrees of freedom in general, four when zero skew is enforced.
Extrinsic transform
The extrinsic matrix concatenates a rotation and translation , mapping a world point to camera coordinates . Tsai parameterises the rotation by yaw, pitch, and roll; Zhang uses the Rodrigues 3-vector to keep the Jacobian unconstrained during Levenberg-Marquardt refinement. The extrinsic parameters are per-view: each image of a calibration target yields its own .
Projection matrix and normalised coordinates
The projection matrix combines intrinsics and extrinsics, , giving the expanded pixel projection
with the -th row of . The normalised image coordinates are the camera-frame coordinates before intrinsic scaling,
so that and . These normalised coordinates are the input to the distortion model.
Calibration homography and planar targets
When the calibration target is planar — placed at — the third column of drops out and the projection reduces to a plane-to-image homography,
with the first two columns of . Because and are orthonormal, the product — the image of the absolute conic — satisfies two linear constraints per homography, and . Stacking two rows per view across views yields a homogeneous system whose null vector encodes the five intrinsic parameters. Sturm and Maybank derive the same constraints independently in the form , with .
Departure from the ideal model
Real lenses displace the normalised coordinates from their ideal positions by radial, tangential, and thin-prism components. The pinhole model describes the undistorted ideal case; the additive correction is treated separately in camera-distortion-models.
Numerical Concerns
Homogeneous-coordinate scale ambiguity. The third component of is the scene depth , the divisor that recovers pixel coordinates. A near-zero — a point at or behind the camera — makes the projection ill-defined and must be guarded in any implementation.
Principal-point and focal-length correlation. The principal point is statistically correlated with the radial distortion coefficients and, to a lesser degree, the focal length. Calibration sets with narrow angular diversity — all views near fronto-parallel — yield a poorly conditioned constraint system; inter-view rotations near from the image plane give the best conditioning.
Pixel vs metric units. Focal lengths in pixels are dimensionless ratios, whereas the physical focal length and pixel pitches carry units. Mixing the two conventions in a single Jacobian is a common error source.
Skew near zero. For virtually all digital sensors , which drives the IAC entry and makes the extraction of from numerically stable. Cameras with genuine skew face a less stable extraction.
Planar degeneracy and minimum views. A single view of a planar target gives only 2 independent constraints on the 5-DOF intrinsic matrix; at least 3 views at non-parallel orientations are required for a fully determined system, or 2 with the zero-skew prior. Parallel planes contribute linearly dependent rows regardless of view count — a provable rank deficiency, not a conditioning issue.
Normalisation for the linear estimate. The DLT system for the calibration homography is poorly conditioned when raw pixel and world coordinates are mixed; isotropic normalisation of both point sets is required before assembly and inverted afterwards.
Where it appears
The pinhole camera model is the shared projection foundation of every calibration and pose-estimation algorithm in the atlas.
- zhang-planar-calibration — recovers and per-view from multiple planar views via the IAC linear system derived from the factorisation.
- tsai-versatile-calibration — recovers and from a 3-D rig via the radial alignment constraint, parameterising by effective focal length and scale factor .
- sturm-plane-based-calibration — derives the same two IAC constraints per homography independently, with an exhaustive singularity catalogue for degenerate plane configurations.
- scaramuzza-omni-calibration — replaces the perspective projection with a polynomial omnidirectional model; the standard pinhole projection is its small-field-of-view limit.
- epnp — solves for the extrinsic pose given a calibrated camera and 2D-3D correspondences, assuming the pinhole projection.
- camera-distortion-models — the additive correction to the normalised image coordinates that accounts for real-lens departure from the ideal map.
References
- Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11)–1334, 2000.
- R. Y. Tsai. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE Journal on Robotics and Automation, 3(4)–344, 1987.
- P. F. Sturm, S. J. Maybank. On Plane-Based Camera Calibration: A General Algorithm, Singularities, Applications. IEEE CVPR, 1999.
- J. Weng, P. Cohen, M. Herniou. Camera Calibration with Distortion Models and Accuracy Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10)–980, 1992.
- R. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.