Definition
Pose estimation is the recovery of the 6-DOF rigid transformation that places a camera relative to a scene, an object, or a second camera. The transformation is a rotation $R \in SO(3)$ and a translation $t \in \mathbb{R}^3$; together they form the extrinsic matrix $[R \mid t]$ of the pinhole camera model. Two structurally distinct sub-problems are in common use, distinguished by the input correspondences.
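Absolute pose (PnP). Given a calibrated camera with intrinsic matrix $K$ and $n$ correspondences between known 3-D world points $X_i$ and their observed image projections $u_i$, recover $(R, t)$ such that $\lambda_i \tilde{u}_i = K (R X_i + t)$ for all $i$, where $\tilde{u}_i$ is $u_i$ in homogeneous coordinates and $\lambda_i > 0$ is the point depth. Input: $n$ 2D-3D point pairs $(u_i, X_i)$ and $K$. Output: the camera pose $(R, t)$.
Relative pose. Given two calibrated views with $n$ corresponding image points $x_i \leftrightarrow x'_i$ in normalised coordinates, recover the rotation $R$ and translation direction $t$ (unit norm, scale unobservable) satisfying the epipolar constraint ${x'_i}^{\top} E \, x_i = 0$, where $E = [t]_\times R$ is the essential matrix. Input: $n$ corresponding normalised points. Output: $R$ and $t$ up to scale, with a four-fold ambiguity resolved by cheirality.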
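The two definitions can be checked numerically on synthetic data. The sketch below is illustrative only: the intrinsics, pose and point cloud are made-up values, view 1 is assumed to sit at the world origin, and the essential matrix is built with the convention $E = [t]_\times R$ for points mapped as $X' = R X + t$.

```python
import numpy as np

rng = np.random.default_rng(0)

def skew(v):
    """Skew-symmetric matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation_about_axis(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    S = skew(a / np.linalg.norm(a))
    return np.eye(3) + np.sin(angle) * S + (1.0 - np.cos(angle)) * S @ S

# Hypothetical intrinsics and pose, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = rotation_about_axis([0.2, 1.0, 0.1], 0.3)
t = np.array([0.5, -0.1, 0.2])

# World points placed in front of both cameras.
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))

# Absolute pose: lambda * [u, 1] = K (R X + t).
Xc = X @ R.T + t
uv_h = Xc @ K.T
u = uv_h[:, :2] / uv_h[:, 2:3]          # pixel coordinates

# Relative pose: view 1 at the world origin, view 2 at (R, t).
x1 = X / X[:, 2:3]                      # normalised coordinates, view 1
x2 = Xc / Xc[:, 2:3]                    # normalised coordinates, view 2
E = skew(t / np.linalg.norm(t)) @ R
epipolar = np.einsum("ni,ij,nj->n", x2, E, x1)   # x2^T E x1, one value per point
```

For noise-free correspondences every entry of `epipolar` vanishes up to rounding, which is the constraint a relative-pose solver inverts.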
Mathematical Description
Absolute pose: Perspective-n-Point
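The calibrated PnP problem asks for $(R, t)$ given $n$ 2D-3D correspondences $(u_i, X_i)$ and $K$. EPnP solves it in $O(n)$ by expressing every reference point as a barycentric combination of four virtual control points,
$$X_i = \sum_{j=1}^{4} \alpha_{ij} C_j, \qquad \sum_{j=1}^{4} \alpha_{ij} = 1.$$
Because affine combinations are preserved by rigid motion, the same weights hold in camera coordinates, $X_i^c = \sum_{j=1}^{4} \alpha_{ij} C_j^c$. Substituting the camera-frame decomposition into the perspective projection and eliminating the projective scale yields two linear equations per point; stacking $n$ points gives the homogeneous system $M x = 0$, where $x \in \mathbb{R}^{12}$ stacks the camera-frame control points $C_j^c$ and $M$ is $2n \times 12$. The solution lies in the null space of the constant-size $12 \times 12$ matrix $M^\top M$,
$$x = \sum_{k=1}^{N} \beta_k v_k,$$
with $N \in \{1, \dots, 4\}$ the effective null-space dimension and $v_k$ the corresponding null eigenvectors of $M^\top M$. The $\beta_k$ are fixed by requiring inter-control-point distances in camera coordinates to match the world distances, and the pose is extracted from the recovered control points by absolute orientation. The minimal cases are (planar) and (general position).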
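The barycentric step can be sketched as follows. This is a minimal illustration, not the full EPnP solver: the control points are taken as the centroid plus scaled principal directions (one common choice), and the point cloud and rigid motion are made-up values. It shows that the weights computed in the world frame reproduce the points in the camera frame as well.

```python
import numpy as np

rng = np.random.default_rng(1)
X_w = rng.normal(size=(20, 3))                    # reference points, world frame

# Control points: centroid plus scaled principal directions (assumed choice).
c0 = X_w.mean(axis=0)
_, s, Vt = np.linalg.svd(X_w - c0, full_matrices=False)
scale = s / np.sqrt(len(X_w))
C_w = np.vstack([c0, c0 + scale[:, None] * Vt])   # shape (4, 3)

# Barycentric weights: solve [C_w^T; 1 1 1 1] alpha_i = [X_i; 1] for every point.
A = np.vstack([C_w.T, np.ones(4)])                # (4, 4)
B = np.vstack([X_w.T, np.ones(len(X_w))])         # (4, n)
alpha = np.linalg.solve(A, B).T                   # (n, 4), rows sum to 1

# Apply an arbitrary rigid motion to both the points and the control points.
theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.2, 5.0])
X_c = X_w @ R.T + t
C_c = C_w @ R.T + t

# The same weights hold in camera coordinates.
max_error = np.abs(alpha @ C_c - X_c).max()
```

Because the weights are pose-independent, the unknowns of the linear system are only the twelve coordinates of the camera-frame control points, whatever the number of reference points.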
Relative pose: the essential matrix
For two calibrated views with normalised coordinates $x_i \leftrightarrow x'_i$, the essential matrix encodes the relative pose through the bilinear epipolar constraint ${x'_i}^{\top} E \, x_i = 0$, with
$$E = [t]_\times R, \qquad [t]_\times = \begin{pmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{pmatrix}$$
the skew-symmetric matrix of the unit translation $t$. Each correspondence supplies one linear constraint on the nine entries of $E$; eight or more correspondences determine their ratios by linear least squares, and the scale is fixed by $\operatorname{tr}(E E^\top) = 2$, which is equivalent to $\|t\| = 1$. The translation components follow from $E E^\top = I - t t^\top$, whose diagonal entries are $1 - t_k^2$ and off-diagonal entries $-t_j t_k$; the rotation follows in closed form. The recovered pose carries a four-fold sign ambiguity, resolved by cheirality: all reconstructed points must lie in front of both cameras.
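A minimal sketch of the linear step, assuming the convention $E = [t]_\times R$ with the constraint written as $x'^\top E x = 0$, and noise-free synthetic correspondences. The function names are illustrative, not taken from any library. With noisy data the estimate would additionally need to be projected onto the set of valid essential matrices, which is omitted here.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def eight_point(x1, x2):
    """Linear estimate of E from n >= 8 normalised correspondences of shape (n, 3)."""
    # Each row encodes x2^T E x1 = 0 for the row-major vectorisation of E.
    A = np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)                      # least-squares null vector
    return E * np.sqrt(2.0 / np.trace(E @ E.T))   # scale so that trace(E E^T) = 2

def translation_from_E(E):
    """Unit translation, up to sign, from E E^T = I - t t^T."""
    T = np.eye(3) - E @ E.T                       # equals t t^T for a noise-free E
    k = np.argmax(np.diag(T))                     # most reliable component
    return T[:, k] / np.sqrt(T[k, k])

# Synthetic two-view data (made-up pose, view 1 at the origin).
rng = np.random.default_rng(2)
theta = 0.25
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, -0.1])
t = t / np.linalg.norm(t)
X = rng.uniform([-2, -2, 4], [2, 2, 9], size=(12, 3))
Xc = X @ R.T + t
x1 = X / X[:, 2:3]
x2 = Xc / Xc[:, 2:3]

E_true = skew(t) @ R
E_est = eight_point(x1, x2)
t_est = translation_from_E(E_est)
```

The estimate matches `E_true` only up to a global sign, and `t_est` only up to sign as well; that residual ambiguity is what the cheirality test removes.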
Pose from a calibration target
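During calibration, per-view extrinsic pose is an implicit output. For a planar target the projection reduces to the homography $H = [h_1 \; h_2 \; h_3] \simeq K [r_1 \; r_2 \; t]$ with $Z = 0$; the rotation columns and translation are extracted directly,
$$r_1 = \lambda K^{-1} h_1, \qquad r_2 = \lambda K^{-1} h_2, \qquad t = \lambda K^{-1} h_3,$$
with $\lambda = 1 / \|K^{-1} h_1\|$ enforcing unit norm and $r_3 = r_1 \times r_2$ completing the rotation.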
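The extraction can be sketched as below, assuming a target plane at $Z = 0$ and a homography known up to an arbitrary (possibly negative) scale. Two steps go beyond the formulas in the text and are assumptions of this sketch: the sign of the scale is chosen so that the target lies in front of the camera, and the result is projected onto the nearest rotation by SVD, which matters once the homography is noisy. The function name and test values are illustrative.

```python
import numpy as np

def pose_from_homography(H, K):
    """Extract (R, t) from a plane-to-image homography H ~ K [r1 r2 t]."""
    M = np.linalg.solve(K, H)                 # K^{-1} H, columns ~ r1, r2, t
    lam = 1.0 / np.linalg.norm(M[:, 0])       # enforce ||r1|| = 1
    if M[2, 2] < 0:                           # keep t_z > 0 (target in front)
        lam = -lam
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Nearest rotation in the Frobenius sense; a no-op for an exact H.
    U, _, Vt = np.linalg.svd(R)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return R, t

# Round trip on made-up values.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
theta = 0.3
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(theta), -np.sin(theta)],
                   [0.0, np.sin(theta), np.cos(theta)]])
t_true = np.array([0.1, -0.2, 3.0])
H = -2.5 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = pose_from_homography(H, K)
```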
Nonlinear refinement
Closed-form solutions seed a nonlinear refinement that minimises the total reprojection error over the six pose degrees of freedom — and, during calibration, jointly over the intrinsics.
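A minimal sketch of such a refinement, assuming a rotation-vector (Rodrigues) parameterisation, fixed intrinsics, plain Gauss-Newton steps and a finite-difference Jacobian. Production code would typically use analytic Jacobians and a damped (Levenberg-Marquardt) update; the function names and test values here are illustrative.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix exp([w]_x) from a rotation vector w."""
    theta = np.linalg.norm(w)
    S = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + S                   # first-order approximation
    return (np.eye(3) + np.sin(theta) / theta * S
            + (1.0 - np.cos(theta)) / theta**2 * S @ S)

def residuals(p, K, X, u):
    """Stacked reprojection residuals (pixels) for the pose p = (w, t)."""
    proj = (X @ rodrigues(p[:3]).T + p[3:]) @ K.T
    return (proj[:, :2] / proj[:, 2:3] - u).ravel()

def refine_pose(p0, K, X, u, iters=20, eps=1e-6):
    """Gauss-Newton on the reprojection error over six pose parameters."""
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(iters):
        r = residuals(p, K, X, u)
        J = np.empty((r.size, 6))
        for k in range(6):                     # forward-difference Jacobian
            dp = np.zeros(6)
            dp[k] = eps
            J[:, k] = (residuals(p + dp, K, X, u) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        p = p + step
        if np.linalg.norm(step) < 1e-12:
            break
    return p

# Noise-free synthetic test: start from a perturbed pose and refine.
rng = np.random.default_rng(5)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
p_true = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 5.0])
X = rng.uniform(-1.0, 1.0, size=(12, 3))
proj = (X @ rodrigues(p_true[:3]).T + p_true[3:]) @ K.T
u = proj[:, :2] / proj[:, 2:3]
p0 = p_true + np.array([0.05, -0.03, 0.04, 0.05, -0.05, 0.1])
p_est = refine_pose(p0, K, X, u)
```

Joint refinement of the intrinsics during calibration follows the same pattern with a longer parameter vector.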
Numerical Concerns
Rotation parameterisation. A rotation matrix has nine entries under six constraints; unconstrained optimisation of the entries violates orthogonality. The Rodrigues 3-vector keeps all nine entries consistent and the Jacobian unconstrained; a unit quaternion is the alternative, with a unit-norm side constraint handled by normalisation.
Translation scale ambiguity. Relative pose recovers $t$ only as a direction; absolute scale is unobservable from image data alone and requires a metric anchor such as a known distance, a stereo baseline, or an IMU.
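The ambiguity is easy to reproduce: scaling the scene and the translation by the same factor leaves every image measurement unchanged. The sketch below uses made-up values and a hypothetical helper `normalised_projection`.

```python
import numpy as np

def normalised_projection(R, t, X):
    """Normalised image coordinates of points X seen by a camera at (R, t)."""
    Xc = X @ R.T + t
    return Xc[:, :2] / Xc[:, 2:3]

rng = np.random.default_rng(3)
theta = 0.2
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.4, 0.0, 0.1])
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(15, 3))

s = 3.0  # arbitrary global scale factor
view1_a = normalised_projection(np.eye(3), np.zeros(3), X)
view1_b = normalised_projection(np.eye(3), np.zeros(3), s * X)
view2_a = normalised_projection(R, t, X)
view2_b = normalised_projection(R, s * t, s * X)
# Both pairs agree, so no image-based criterion can distinguish s from 1.
```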
Planar and collinear degeneracy. A coplanar, tilted point set admits two valid PnP solutions; collinear points make the constraint matrix rank-deficient regardless of $n$. For the eight-point algorithm, coplanar or collinear configurations similarly cause rank loss.
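The rank loss in the eight-point constraint matrix can be observed directly. The sketch below builds that matrix for three synthetic configurations (general position, a tilted plane, a line) under a made-up relative pose; the helper names and the relative tolerance used for the numerical rank are assumptions of this example.

```python
import numpy as np

def epipolar_matrix(x1, x2):
    """Rows of the linear system x2^T E x1 = 0 in the nine entries of E."""
    return np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)

def two_views(X, R, t):
    """Normalised homogeneous coordinates in view 1 (origin) and view 2 (R, t)."""
    Xc = X @ R.T + t
    return X / X[:, 2:3], Xc / Xc[:, 2:3]

def numerical_rank(A, rel_tol=1e-9):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(4)
theta = 0.25
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, -0.1])

general = rng.uniform([-2, -2, 4], [2, 2, 9], size=(20, 3))
planar = general.copy()
planar[:, 2] = 6.0 + 0.3 * planar[:, 0]            # all points on a tilted plane
collinear = (np.array([0.0, 0.0, 5.0])
             + np.linspace(-1.0, 1.0, 20)[:, None] * np.array([1.0, 0.5, 0.8]))

rank_general = numerical_rank(epipolar_matrix(*two_views(general, R, t)))
rank_planar = numerical_rank(epipolar_matrix(*two_views(planar, R, t)))
rank_collinear = numerical_rank(epipolar_matrix(*two_views(collinear, R, t)))
```

In general position the matrix has rank 8, leaving the one-dimensional null space that defines $E$. For the coplanar and collinear sets the rank is lower, so the null space contains a whole family of candidate matrices and the linear solution is no longer unique, however many points are added.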
Sensitivity to correspondence noise. PnP solvers and the eight-point algorithm are pure model fitters — a single mismatch degrades the solution. The standard remedy wraps the solver in a RANSAC loop over minimal samples.
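A RANSAC wrapper around a relative-pose solver might look like the following sketch. Several simplifications are assumptions made for brevity: it samples eight points and uses the linear eight-point solver instead of a minimal five-point solver, it scores with the algebraic residual $|x'^\top E x|$ where a geometric measure such as the Sampson distance is more usual, it runs a fixed number of iterations, and the data are noise-free inliers plus gross outliers.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear estimate of E (scaled to trace(E E^T) = 2) from n >= 8 pairs."""
    A = np.einsum("ni,nj->nij", x2, x1).reshape(len(x1), 9)
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    return E * np.sqrt(2.0 / np.trace(E @ E.T))

def epipolar_residual(E, x1, x2):
    """Algebraic residual |x2^T E x1| for each correspondence."""
    return np.abs(np.einsum("ni,ij,nj->n", x2, E, x1))

def ransac_essential(x1, x2, threshold, iters=200, seed=0):
    """Keep the sample model with most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), size=8, replace=False)
        E = eight_point(x1[idx], x2[idx])
        inliers = epipolar_residual(E, x1, x2) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best

# Synthetic data: 40 exact correspondences and 10 gross mismatches.
rng = np.random.default_rng(6)
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.1, 0.2])
t = t / np.linalg.norm(t)
X = rng.uniform([-2, -2, 4], [2, 2, 9], size=(50, 3))
Xc = X @ R.T + t
x1 = X / X[:, 2:3]
x2 = Xc / Xc[:, 2:3]
is_outlier = np.arange(50) >= 40
x2[is_outlier, :2] = rng.uniform(-0.5, 0.5, size=(10, 2))   # corrupt the matches

E_est, inliers = ransac_essential(x1, x2, threshold=1e-6)
```

The same loop applies to absolute pose by swapping in a P3P or EPnP solver and a reprojection-error residual.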
Minimal vs overdetermined solvers. The minimal calibrated absolute-pose solver is P3P; the minimal relative-pose solver is the five-point algorithm. EPnP targets the overdetermined regime ($n \geq 4$) and is suited to run after RANSAC inlier selection.
Reprojection error vs pose error. Reprojection error is the measurable surrogate for pose error, but minimising it does not strictly minimise rotation or translation error; near degeneracy or at high noise the two criteria disagree.
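When ground truth is available, for example in simulation, the two criteria can be reported side by side. The functions below are a sketch with illustrative names: the rotation error is the geodesic angle between the two rotations, the translation error is a plain Euclidean distance, and the reprojection error is the RMS pixel distance.

```python
import numpy as np

def rotation_error_deg(R_est, R_true):
    """Angle of the relative rotation R_est^T R_true, in degrees."""
    cos_angle = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def translation_error(t_est, t_true):
    """Euclidean distance between estimated and true translations."""
    return np.linalg.norm(t_est - t_true)

def rms_reprojection_error(K, R, t, X, u):
    """Root-mean-square pixel distance between projections and observations."""
    proj = (X @ R.T + t) @ K.T
    d = proj[:, :2] / proj[:, 2:3] - u
    return np.sqrt(np.mean(np.sum(d**2, axis=1)))

# Made-up example: the estimate is off by a 2 degree rotation about the optical axis.
rng = np.random.default_rng(7)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 5.0])
a = np.radians(2.0)
R_est = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
X = rng.uniform(-1.0, 1.0, size=(30, 3))
proj = (X @ R_true.T + t_true) @ K.T
u = proj[:, :2] / proj[:, 2:3]

rot_err = rotation_error_deg(R_est, R_true)
reproj_true = rms_reprojection_error(K, R_true, t_true, X, u)
reproj_est = rms_reprojection_error(K, R_est, t_true, X, u)
```

Comparing such metrics across noise levels or point configurations is one way to see where low reprojection error stops implying low pose error.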
Where it appears
Pose estimation appears at every layer of the calibration and localisation pipeline.
- epnp — the calibrated PnP solver: absolute pose in $O(n)$ from 2D-3D correspondences with a known intrinsic matrix.
- longuet-higgins-eight-point — the foundational linear algorithm for relative pose from calibrated correspondences; introduces the essential matrix and cheirality resolution.
- zhang-planar-calibration — recovers per-view extrinsic pose alongside intrinsics; pose extraction follows from the per-view homography factorisation.
- tsai-versatile-calibration — recovers extrinsic pose from a 3-D calibration rig via the radial alignment constraint.
Pose is defined within the pinhole-camera-model: it is precisely the extrinsic component of the central projection equation.
References
- V. Lepetit, F. Moreno-Noguer, P. Fua. EPnP: An Accurate O(n) Solution to the PnP Problem. International Journal of Computer Vision, 81(2):155–166, 2009.
- H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133–135, 1981.
- Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, 2000.
- G. Schweighofer, A. Pinz. Robust Pose Estimation from a Planar Target. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2024–2030, 2006.
- D. Nistér. An Efficient Solution to the Five-Point Relative Pose Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):756–777, 2004.