Goal
Recover camera pose from known 3D-to-2D point correspondences and a calibrated intrinsic matrix . Input: reference points with known world coordinates, their 2D image projections , and containing focal lengths , and principal point . Output: rotation and translation placing the world frame in the camera frame. The defining property is an O(n) non-iterative formulation: every reference point is expressed as a weighted sum of four virtual control points, reducing the unknowns to a constant-size 12-vector regardless of . The dominant cost — forming — is linear in and dominates for .
Algorithm
Let denote the -th reference point in world coordinates, . Let denote its measured 2D projection. Let denote the intrinsic matrix with focal lengths , and principal point . Let , , denote the four virtual control points in world coordinates (three for the planar case). Let denote the barycentric weight of control point for reference point . Let denote the camera-frame coordinates of control point . Let denote the unknown 12-vector of camera-frame control-point coordinates. Let denote the effective null-space dimension of . Let denote the -th null eigenvector of , and its coefficient.
Each reference point decomposes as an affine combination of the control points:
Rigid motion preserves affine combinations, so the same weights hold in camera coordinates:
Substituting the camera-frame decomposition into the perspective projection and eliminating the projective scale yields two linear constraints per reference point:
Stacking all pairs gives the homogeneous system
where is (or for the planar case). The matrix is always and is computed in time.
The solution lies in the null space of , expressed as a linear combination of the null eigenvectors of :
The candidate is selected by minimum reprojection error over all correspondences:
β recovery
The coefficients are determined by requiring that inter-control-point distances computed from match the known world-frame distances — six quadratic constraints. Four closed-form cases handle the effective null-space dimension .
Case . A single eigenvector spans the null space; is obtained in closed form as a ratio of dot products over squared norms,
The sign of is chosen so that all camera-frame control points have positive -coordinates.
Cases and . Expanding the pairwise distance constraints and linearising the bilinear products as new unknowns yields a linear system
where is the 6-vector of squared world-frame inter-control-point distances . For , and — solved by pseudoinverse. For , and — solved by direct inverse.
Case . With ten products and only six distance constraints, the system is underdetermined. The relinearisation technique adds commutativity equations,
where is any permutation of . These four additional scalar equations close a linear system. The same relinearisation is required in the planar case for , where the six distance constraints drop to three.
The optional constant-time refinement minimises pairwise camera-frame distance residuals over only four scalar parameters:
with control-point camera coordinates parameterised as
Fewer than 10 Gauss-Newton iterations are required; the overall complexity remains .
Procedure
- Select control points. Set to the centroid of the reference points; set to the scaled principal directions of the point cloud — analogous to DLT normalisation. For planar input, use three control points.
- Solve barycentric weights. For each , compute from with . The weights transfer unchanged to camera coordinates.
- Build . For each reference point, form two rows from the linear constraints above. is (or for the planar case).
- Compute . The product is and costs . Extract the smallest-eigenvalue eigenvectors .
- Solve for for each candidate . Apply the appropriate closed-form case from the β recovery section. Compute the reprojection residual for each candidate and retain the that minimises it.
- Reconstruct camera-frame control points. Form ; partition into . Choose signs of so that all .
- Recover . Apply absolute orientation (Horn / Arun) aligning onto . This step operates on exactly four point pairs and is constant-time.
- Optionally refine. Minimise the Gauss-Newton objective over with control-point coordinates parameterised by the four-eigenvector combination; fewer than 10 iterations suffice.
Implementation
The per-correspondence row builder for , corresponding line-by-line to the two linear constraints derived above:
/// Build two rows of M for one reference point.
/// `alpha`: barycentric weights α_{ij}, j = 0..4
/// `u`, `v`: 2D projection of the reference point
/// `fu`, `fv`, `uc`, `vc`: calibrated intrinsics
fn build_m_rows(
alpha: &[f32; 4],
u: f32, v: f32,
fu: f32, fv: f32,
uc: f32, vc: f32,
) -> [[f32; 12]; 2] {
let mut row_u = [0.0f32; 12];
let mut row_v = [0.0f32; 12];
for j in 0..4 {
let a = alpha[j];
// α_ij · fu · x_j^c + α_ij · (uc − u) · z_j^c = 0
row_u[3 * j] = a * fu;
row_u[3 * j + 1] = 0.0;
row_u[3 * j + 2] = a * (uc - u);
// α_ij · fv · y_j^c + α_ij · (vc − v) · z_j^c = 0
row_v[3 * j] = 0.0;
row_v[3 * j + 1] = a * fv;
row_v[3 * j + 2] = a * (vc - v);
}
[row_u, row_v]
}
The full is the vertical stack of build_m_rows over all correspondences. The next steps form (, ), extract its null eigenvectors, solve for in each of the four cases, and recover from the four control-point pairs by absolute orientation.
Remarks
- The product is and costs ; its eigendecomposition is constant-time. For the product dominates the total runtime; below that threshold the constant-time eigendecomposition and -solve remain the bottleneck.
- The effective null-space dimension depends on focal length and measurement noise. As focal length grows toward orthographic projection, all four smallest eigenvalues of approach zero simultaneously. At higher noise, the distribution of shifts toward and . All four candidates are always computed; the reprojection selector chooses the best.
- Planar reference-point configurations require three control points. The rank-9 system has only three pairwise distance constraints (not six); relinearisation is required for .
- Tilted planar reference points carry an inherent two-fold pose ambiguity. The closed-form solution does not arbitrate between the two valid poses, and the Gauss-Newton step on inter-control-point distances cannot break the ambiguity either.
- Wrap EPnP in Fischler–Bolles RANSAC for outlier-robust pose estimation; the original real-image experiment uses sample size 7 and runs EPnP on the resulting inlier set for final pose refinement. The optional Gauss-Newton step is independent of the outlier-rejection loop.
References
- V. Lepetit, F. Moreno-Noguer, P. Fua. EPnP: An Accurate O(n) Solution to the PnP Problem. International Journal of Computer Vision, 2009. hdl/2117/10327
- M. A. Fischler, R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6)–395, 1981. dl.acm.org