EPnP: O(n) Perspective-n-Point | VitaVision
Back to atlas

EPnP: O(n) Perspective-n-Point

10 min readAdvancedView in graph
Based on
EPnP: An Accurate O(n) Solution to the PnP Problem
Lepetit, Moreno-Noguer, Fua · International Journal of Computer Vision 2009
DOI ↗

Goal

Recover camera pose [Rt][\mathbf{R} \mid \mathbf{t}] from n4n \geq 4 known 3D-to-2D point correspondences and a calibrated intrinsic matrix A\mathbf{A}. Input: nn reference points piwR3\mathbf{p}_i^w \in \mathbb{R}^3 with known world coordinates, their 2D image projections uiR2\mathbf{u}_i \in \mathbb{R}^2, and A\mathbf{A} containing focal lengths fuf_u, fvf_v and principal point (uc,vc)(u_c, v_c). Output: rotation R\mathbf{R} and translation t\mathbf{t} placing the world frame in the camera frame. The defining property is an O(n) non-iterative formulation: every reference point is expressed as a weighted sum of four virtual control points, reducing the unknowns to a constant-size 12-vector regardless of nn. The dominant cost — forming MM\mathbf{M}^\top\mathbf{M} — is linear in nn and dominates for n15n \gtrsim 15.

Algorithm

Let piwR3\mathbf{p}_i^w \in \mathbb{R}^3 denote the ii-th reference point in world coordinates, i=1,,ni = 1, \ldots, n. Let ui=(ui,vi)R2\mathbf{u}_i = (u_i, v_i) \in \mathbb{R}^2 denote its measured 2D projection. Let A\mathbf{A} denote the intrinsic matrix with focal lengths fuf_u, fvf_v and principal point (uc,vc)(u_c, v_c). Let cjw\mathbf{c}_j^w, j=1,,4j = 1, \ldots, 4, denote the four virtual control points in world coordinates (three for the planar case). Let αijR\alpha_{ij} \in \mathbb{R} denote the barycentric weight of control point jj for reference point ii. Let cjcR3\mathbf{c}_j^c \in \mathbb{R}^3 denote the camera-frame coordinates of control point jj. Let x=[c1c,c2c,c3c,c4c]R12\mathbf{x} = [\mathbf{c}_1^{c\top}, \mathbf{c}_2^{c\top}, \mathbf{c}_3^{c\top}, \mathbf{c}_4^{c\top}]^\top \in \mathbb{R}^{12} denote the unknown 12-vector of camera-frame control-point coordinates. Let N{1,2,3,4}N \in \{1, 2, 3, 4\} denote the effective null-space dimension of MM\mathbf{M}^\top\mathbf{M}. Let viR12\mathbf{v}_i \in \mathbb{R}^{12} denote the ii-th null eigenvector of MM\mathbf{M}^\top\mathbf{M}, and βi\beta_i its coefficient.

Each reference point decomposes as an affine combination of the control points:

Definition
Barycentric decomposition
piw=j=14αijcjw,j=14αij=1.\mathbf{p}_i^w = \sum_{j=1}^{4} \alpha_{ij}\, \mathbf{c}_j^w, \qquad \sum_{j=1}^{4} \alpha_{ij} = 1.

Rigid motion preserves affine combinations, so the same weights hold in camera coordinates:

pic=j=14αijcjc.\mathbf{p}_i^c = \sum_{j=1}^{4} \alpha_{ij}\, \mathbf{c}_j^c.

Substituting the camera-frame decomposition into the perspective projection and eliminating the projective scale wi=jαijzjcw_i = \sum_j \alpha_{ij} z_j^c yields two linear constraints per reference point:

j=14αijfuxjc+αij(ucui)zjc=0,j=14αijfvyjc+αij(vcvi)zjc=0.\begin{aligned} \sum_{j=1}^{4} \alpha_{ij} f_u\, x_j^c + \alpha_{ij}(u_c - u_i)\, z_j^c &= 0, \\ \sum_{j=1}^{4} \alpha_{ij} f_v\, y_j^c + \alpha_{ij}(v_c - v_i)\, z_j^c &= 0. \end{aligned}

Stacking all nn pairs gives the homogeneous system

Mx=0,M\mathbf{x} = \mathbf{0},

where M\mathbf{M} is 2n×122n \times 12 (or 2n×92n \times 9 for the planar case). The matrix MM\mathbf{M}^\top\mathbf{M} is always 12×1212 \times 12 and is computed in O(n)O(n) time.

Definition
Null-space combination

The solution lies in the null space of M\mathbf{M}, expressed as a linear combination of the NN null eigenvectors vi\mathbf{v}_i of MM\mathbf{M}^\top\mathbf{M}:

x=i=1Nβivi.\mathbf{x} = \sum_{i=1}^{N} \beta_i\, \mathbf{v}_i.
Definition
Reprojection residual

The candidate NN is selected by minimum reprojection error over all nn correspondences:

res=idist2 ⁣(A[Rt][piw1],ui).\mathrm{res} = \sum_i \mathrm{dist}^2\!\left(\mathbf{A}[\mathbf{R}\mid\mathbf{t}]\begin{bmatrix}\mathbf{p}_i^w \\ 1\end{bmatrix},\, \mathbf{u}_i\right).

β recovery

The βi\beta_i coefficients are determined by requiring that inter-control-point distances computed from x\mathbf{x} match the known world-frame distances — six quadratic constraints. Four closed-form cases handle the effective null-space dimension NN.

Case N=1N = 1. A single eigenvector v1\mathbf{v}_1 spans the null space; β1\beta_1 is obtained in closed form as a ratio of dot products over squared norms,

β=ciwcjw2v1[i]v1[j]2.\beta = \sqrt{\frac{\|\mathbf{c}_i^w - \mathbf{c}_j^w\|^2}{\|\mathbf{v}_1^{[i]} - \mathbf{v}_1^{[j]}\|^2}}.

The sign of β1\beta_1 is chosen so that all camera-frame control points have positive zz-coordinates.

Cases N=2N = 2 and N=3N = 3. Expanding the pairwise distance constraints and linearising the bilinear products βab=βaβb\beta_{ab} = \beta_a \beta_b as new unknowns yields a linear system

Lβ=ρ,L\boldsymbol{\beta} = \boldsymbol{\rho},

where ρ\boldsymbol{\rho} is the 6-vector of squared world-frame inter-control-point distances ciwcjw2\|\mathbf{c}_i^w - \mathbf{c}_j^w\|^2. For N=2N = 2, LR6×3L \in \mathbb{R}^{6 \times 3} and β=[β11,β12,β22]\boldsymbol{\beta} = [\beta_{11}, \beta_{12}, \beta_{22}]^\top — solved by pseudoinverse. For N=3N = 3, LR6×6L \in \mathbb{R}^{6 \times 6} and β=[β11,β12,β13,β22,β23,β33]\boldsymbol{\beta} = [\beta_{11}, \beta_{12}, \beta_{13}, \beta_{22}, \beta_{23}, \beta_{33}]^\top — solved by direct inverse.

Case N=4N = 4. With ten βab\beta_{ab} products and only six distance constraints, the system is underdetermined. The relinearisation technique adds commutativity equations,

βabβcd=βaβbβcβd=βabβcd,\beta_{ab}\beta_{cd} = \beta_a\beta_b\beta_c\beta_d = \beta_{a'b'}\beta_{c'd'},

where {a,b,c,d}\{a', b', c', d'\} is any permutation of {a,b,c,d}\{a, b, c, d\}. These four additional scalar equations close a 10×1010 \times 10 linear system. The same relinearisation is required in the planar case for N3N \geq 3, where the six distance constraints drop to three.

Definition
Gauss-Newton objective

The optional constant-time refinement minimises pairwise camera-frame distance residuals over only four scalar βj\beta_j parameters:

Error(β)=(i,j)i<j(ciccjc2ciwcjw2)2,\mathrm{Error}(\boldsymbol{\beta}) = \sum_{\substack{(i,j) \\ i < j}} \left(\|\mathbf{c}_i^c - \mathbf{c}_j^c\|^2 - \|\mathbf{c}_i^w - \mathbf{c}_j^w\|^2\right)^2,

with control-point camera coordinates parameterised as

cic=j=14βjvj[i].\mathbf{c}_i^c = \sum_{j=1}^{4} \beta_j\, \mathbf{v}_j^{[i]}.

Fewer than 10 Gauss-Newton iterations are required; the overall complexity remains O(n)O(n).

Procedure

Algorithm
EPnP pose estimation
Input: Reference points piw\mathbf{p}_i^w (i=1,,ni = 1, \ldots, n, n4n \geq 4); 2D projections ui\mathbf{u}_i; intrinsic matrix A\mathbf{A} with fu,fv,uc,vcf_u, f_v, u_c, v_c.
Output: Camera pose [Rt][\mathbf{R} \mid \mathbf{t}].
  1. Select control points. Set c1w\mathbf{c}_1^w to the centroid of the nn reference points; set c2w,c3w,c4w\mathbf{c}_2^w, \mathbf{c}_3^w, \mathbf{c}_4^w to the scaled principal directions of the point cloud — analogous to DLT normalisation. For planar input, use three control points.
  2. Solve barycentric weights. For each ii, compute αij\alpha_{ij} from piw=jαijcjw\mathbf{p}_i^w = \sum_j \alpha_{ij} \mathbf{c}_j^w with jαij=1\sum_j \alpha_{ij} = 1. The weights transfer unchanged to camera coordinates.
  3. Build M\mathbf{M}. For each reference point, form two rows from the linear constraints above. M\mathbf{M} is 2n×122n \times 12 (or 2n×92n \times 9 for the planar case).
  4. Compute MM\mathbf{M}^\top\mathbf{M}. The product is 12×1212 \times 12 and costs O(n)O(n). Extract the NN smallest-eigenvalue eigenvectors vi\mathbf{v}_i.
  5. Solve for β\boldsymbol{\beta} for each candidate N{1,2,3,4}N \in \{1, 2, 3, 4\}. Apply the appropriate closed-form case from the β recovery section. Compute the reprojection residual for each candidate and retain the NN that minimises it.
  6. Reconstruct camera-frame control points. Form x=iβivi\mathbf{x} = \sum_i \beta_i \mathbf{v}_i; partition into cjc\mathbf{c}_j^c. Choose signs of βa\beta_a so that all zjc>0z_j^c > 0.
  7. Recover [Rt][\mathbf{R} \mid \mathbf{t}]. Apply absolute orientation (Horn / Arun) aligning {cjc}\{\mathbf{c}_j^c\} onto {cjw}\{\mathbf{c}_j^w\}. This step operates on exactly four point pairs and is constant-time.
  8. Optionally refine. Minimise the Gauss-Newton objective over β=[β1,β2,β3,β4]\boldsymbol{\beta} = [\beta_1, \beta_2, \beta_3, \beta_4]^\top with control-point coordinates parameterised by the four-eigenvector combination; fewer than 10 iterations suffice.

Implementation

The per-correspondence row builder for M\mathbf{M}, corresponding line-by-line to the two linear constraints derived above:

/// Build two rows of M for one reference point.
/// `alpha`: barycentric weights α_{ij}, j = 0..4
/// `u`, `v`: 2D projection of the reference point
/// `fu`, `fv`, `uc`, `vc`: calibrated intrinsics
fn build_m_rows(
    alpha: &[f32; 4],
    u: f32, v: f32,
    fu: f32, fv: f32,
    uc: f32, vc: f32,
) -> [[f32; 12]; 2] {
    let mut row_u = [0.0f32; 12];
    let mut row_v = [0.0f32; 12];
    for j in 0..4 {
        let a = alpha[j];
        // α_ij · fu · x_j^c + α_ij · (uc − u) · z_j^c = 0
        row_u[3 * j]     = a * fu;
        row_u[3 * j + 1] = 0.0;
        row_u[3 * j + 2] = a * (uc - u);
        // α_ij · fv · y_j^c + α_ij · (vc − v) · z_j^c = 0
        row_v[3 * j]     = 0.0;
        row_v[3 * j + 1] = a * fv;
        row_v[3 * j + 2] = a * (vc - v);
    }
    [row_u, row_v]
}

The full M\mathbf{M} is the vertical stack of build_m_rows over all nn correspondences. The next steps form MM\mathbf{M}^\top\mathbf{M} (12×1212 \times 12, O(n)O(n)), extract its null eigenvectors, solve for β\boldsymbol{\beta} in each of the four NN cases, and recover [Rt][\mathbf{R}\mid\mathbf{t}] from the four control-point pairs by absolute orientation.

Remarks

  • The MM\mathbf{M}^\top\mathbf{M} product is 12×1212 \times 12 and costs O(n)O(n); its eigendecomposition is constant-time. For n15n \gtrsim 15 the product dominates the total runtime; below that threshold the constant-time eigendecomposition and β\boldsymbol{\beta}-solve remain the bottleneck.
  • The effective null-space dimension NN depends on focal length and measurement noise. As focal length grows toward orthographic projection, all four smallest eigenvalues of MM\mathbf{M}^\top\mathbf{M} approach zero simultaneously. At higher noise, the distribution of NN shifts toward N=3N = 3 and N=4N = 4. All four candidates are always computed; the reprojection selector chooses the best.
  • Planar reference-point configurations require three control points. The rank-9 system has only three pairwise distance constraints (not six); relinearisation is required for N3N \geq 3.
  • Tilted planar reference points carry an inherent two-fold pose ambiguity. The closed-form solution does not arbitrate between the two valid poses, and the Gauss-Newton step on inter-control-point distances cannot break the ambiguity either.
  • Wrap EPnP in Fischler–Bolles RANSAC for outlier-robust pose estimation; the original real-image experiment uses sample size 7 and runs EPnP on the resulting inlier set for final pose refinement. The optional Gauss-Newton step is independent of the outlier-rejection loop.

References

  1. V. Lepetit, F. Moreno-Noguer, P. Fua. EPnP: An Accurate O(n) Solution to the PnP Problem. International Journal of Computer Vision, 2009. hdl/2117/10327
  2. M. A. Fischler, R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6)
    –395, 1981. dl.acm.org