ORB: Oriented FAST and Rotated BRIEF | VitaVision
Back to atlas

ORB: Oriented FAST and Rotated BRIEF

8 min readIntermediateView in graph
Based on
ORB: An efficient alternative to SIFT or SURF
Rublee, Rabaud, Konolige, Bradski · ICCV 2011
DOI ↗

Goal

Detect oriented keypoints from a greyscale image at multiple scales and compute a 256-bit binary descriptor per keypoint suitable for Hamming-distance matching. Input: a greyscale image II. Output: a list of keypoints (x,y,θ,)(x, y, \theta, \ell) — pixel location, orientation angle, and pyramid level — each paired with a 256-bit binary descriptor d{0,1}256\mathbf{d} \in \{0,1\}^{256}. The defining property is a rotation-invariant binary descriptor pipeline (oFAST + rBRIEF) that operates at video rate on a single CPU core without GPU or SIMD acceleration.

Algorithm

Let I:ΩRI: \Omega \to \mathbb{R} denote the greyscale input image. Let II_\ell denote the image at pyramid level \ell, with I0=II_0 = I. Let s=2s = \sqrt{2} denote the scale factor between successive pyramid levels. Let L=5L = 5 denote the total number of pyramid levels. Let NN denote the target total keypoint count. Let r=15r = 15 denote the half-width of the 31-pixel orientation patch (so the patch covers [r,r][-r, r] in both dimensions). Let RHR_H denote the Harris cornerness response used to rank FAST candidates. Let pp denote the integral-image-smoothed patch on a 31×3131 \times 31 window centred at a keypoint. Let wp=31w_p = 31 denote the patch side length. Let wt=5w_t = 5 denote the sub-window side length used in each binary test. Let n=256n = 256 denote the descriptor bit length. Let SS denote the 2×n2 \times n matrix of test-pair offsets. Let RθR_\theta denote the 2×22 \times 2 rotation matrix for angle θ\theta. Let Sθ=RθSS_\theta = R_\theta S denote the steered offset matrix. Let τ\tau denote a single pixel-pair test. Let fnf_n denote the packed binary descriptor. Let gng_n denote the steered descriptor operator.

oFAST: oriented FAST

Definition
Intensity moment

First- and zeroth-order image moments integrated over the circular patch of radius rr centred at the keypoint. Equation 1 of Rublee et al. 2011.

mpq=x[r,r]  y[r,r]xpyqI(x,y).m_{pq} = \sum_{x \in [-r,\, r]}\; \sum_{y \in [-r,\, r]} x^p\, y^q\, I(x, y).
Definition
Intensity centroid

The centre of mass of the patch intensity distribution, used to define a rotation-stable reference direction. Equation 2 of Rublee et al. 2011.

C= ⁣(m10m00,  m01m00).C = \!\left(\frac{m_{10}}{m_{00}},\;\frac{m_{01}}{m_{00}}\right).
Definition
Patch orientation

The angle from the patch centre to the intensity centroid, quantised to the nearest of 30 bins of width 2π/302\pi/30 for the lookup table. Equation 3 of Rublee et al. 2011.

θ=atan2(m01,  m10).\theta = \operatorname{atan2}(m_{01},\; m_{10}).
Algorithm
oFAST keypoint detection
Input: Greyscale image II; pyramid levels L=5L = 5; scale factor s=2s = \sqrt{2}; target count NN; FAST-9 threshold tt; patch radius r=15r = 15.
Output: Oriented keypoints {(xk,yk,θk,k)}\{(x_k, y_k, \theta_k, \ell_k)\}.
  1. Build a 55-level image pyramid {I}=04\{I_\ell\}_{\ell=0}^{4} by downsampling II at successive scale factor 2\sqrt{2}, using area-based interpolation.
  2. At each level \ell, run the FAST-9 segment test with threshold tt set adaptively so that at least N/LN/L candidate keypoints are produced.
  3. Compute the Harris cornerness response RHR_H at each candidate; retain the top N/LN/L candidates per level ranked by RHR_H.
  4. For each surviving keypoint, build the integral image of II_\ell over the [r,r][-r, r] window and accumulate m00m_{00}, m10m_{10}, m01m_{01}.
  5. Compute θ=atan2(m01,m10)\theta = \operatorname{atan2}(m_{01}, m_{10}) and quantise to the nearest bin of the 2π/302\pi/30 lookup table.

rBRIEF: rotated BRIEF

Definition
Pixel-pair test

A single bit that compares the mean intensities of two 5×55 \times 5 sub-windows within the 31×3131 \times 31 patch, read from the integral image. Equation 4 of Rublee et al. 2011.

τ(p;  x,y)={1if p(x)<p(y),0otherwise.\tau(p;\; x, y) = \begin{cases} 1 & \text{if } p(x) < p(y), \\ 0 & \text{otherwise.} \end{cases}
Definition
Binary descriptor

Packed concatenation of n=256n = 256 pixel-pair tests on the offset table {(xi,yi)}i=1n\{(x_i, y_i)\}_{i=1}^{n}. Equation 5 of Rublee et al. 2011.

fn(p)=1in2i1τ(p;  xi,  yi).f_n(p) = \sum_{1 \le i \le n} 2^{i-1}\, \tau(p;\; x_i,\; y_i).
Definition
Steered descriptor

The descriptor evaluated with the test offsets rotated to the keypoint's orientation θ\theta, using the precomputed steered set Sθ=RθSS_\theta = R_\theta S. Equation 6 of Rublee et al. 2011.

gn(p,θ)=fn(p)    (xi,yi)Sθ.g_n(p,\, \theta) = f_n(p)\;\big|\;(x_i, y_i) \in S_\theta.

The candidate test pool is drawn from the (wpwt)2=676(w_p - w_t)^2 = 676 possible non-overlapping 5×55 \times 5 sub-window positions inside the 31×3131 \times 31 patch. Pairing two sub-windows and discarding pairs whose support overlaps yields M=205,590M = 205{,}590 usable test pairs.

Algorithm
rBRIEF descriptor extraction
Input: Oriented keypoint (xk,yk,θk)(x_k, y_k, \theta_k); integral image of II_\ell; precomputed steered table T\mathcal{T} indexed by 30 angle bins; n=256n = 256.
Output: 256-bit descriptor dk\mathbf{d}_k stored as 4×644 \times 64-bit words.
  1. Quantise θk\theta_k to the nearest of 30 bins: b=θk/(2π/30)mod30b = \lfloor \theta_k / (2\pi/30) \rceil \bmod 30.
  2. Look up the steered offset pair list Sθk=T[b]S_{\theta_k} = \mathcal{T}[b], a precomputed 2×2562 \times 256 integer table.
  3. For each test i{1,,256}i \in \{1, \ldots, 256\}, read two 5×55 \times 5 sub-window sums from the integral image at offsets (xi,yi)(x_i, y_i) centred on (xk,yk)(x_k, y_k); compute τi\tau_i as a single integer comparison of the sub-window means.
  4. Pack the 256 bits τi\tau_i into 4 consecutive 64-bit words to obtain dk\mathbf{d}_k.

Greedy uncorrelated-test selection

This one-time offline step produces the steered table T\mathcal{T}. It is run once on the PASCAL 2006 training set and the resulting table is compiled into the algorithm.

Algorithm
Greedy decorrelated test selection
Input: Training set of  ⁣300,000\approx\!300{,}000 keypoint patches from PASCAL 2006; candidate pool of M=205,590M = 205{,}590 test pairs; target length n=256n = 256.
Output: Ordered list of 256 test pairs forming the rBRIEF offset table.
  1. For each of the MM candidate pairs, evaluate τ\tau over all training patches; compute the mean τˉ\bar\tau and sort pairs in ascending order of τˉ0.5|\bar\tau - 0.5| (pairs near 0.5 have maximum variance).
  2. Initialise the selected set to the highest-variance pair.
  3. For each subsequent candidate (in sorted order), compute the absolute pairwise correlation with every already-selected test; admit the candidate if its maximum absolute correlation with the current selection is below the adaptive threshold.
  4. If fewer than 256 tests are admitted, raise the correlation threshold and restart from step 2.
  5. The final 256-test ordered list defines SS; the 30 steered versions Sθ=RθSS_\theta = R_\theta S are precomputed for θ{0,2π/30,,292π/30}\theta \in \{0,\, 2\pi/30,\, \ldots,\, 29 \cdot 2\pi/30\} and stored as integer tables in T\mathcal{T}.

Implementation

The runtime descriptor packing in Rust, given a precomputed steered offset table indexed by angle bin:

const N: usize = 256;
const WORDS: usize = N / 64;
const BINS: usize = 30;

type Descriptor = [u64; WORDS];

struct SteerTable {
    offsets: [[(i32, i32); 2]; N],
}

fn box_sum(ii: &[u32], stride: usize, h: usize, cx: i32, cy: i32, r: i32) -> u32 {
    let x0 = (cx - r).max(0) as usize;
    let y0 = (cy - r).max(0) as usize;
    let x1 = (cx + r).min(stride as i32 - 1) as usize;
    let y1 = (cy + r).min(h as i32 - 1) as usize;
    ii[y1 * stride + x1]
        .wrapping_sub(ii[y0 * stride + x1])
        .wrapping_sub(ii[y1 * stride + x0])
        .wrapping_add(ii[y0 * stride + x0])
}

fn rbrief(
    ii: &[u32], stride: usize, h: usize,
    kx: i32, ky: i32, theta: f32,
    table: &[SteerTable; BINS],
) -> Descriptor {
    let step = 2.0 * std::f32::consts::PI / BINS as f32;
    let bin = ((theta / step).round() as isize).rem_euclid(BINS as isize) as usize;
    let st = &table[bin];
    let half_wt: i32 = 2;
    let mut d: Descriptor = [0u64; WORDS];
    for (i, pair) in st.offsets.iter().enumerate() {
        let s1 = box_sum(ii, stride, h, kx + pair[0].0, ky + pair[0].1, half_wt);
        let s2 = box_sum(ii, stride, h, kx + pair[1].0, ky + pair[1].1, half_wt);
        if s1 < s2 {
            d[i / 64] |= 1u64 << (i % 64);
        }
    }
    d
}

fn hamming(a: &Descriptor, b: &Descriptor) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

Remarks

  • Per-keypoint description costs O(n)O(n) integer comparisons — one integral-image box-sum pair per test — plus n/64\lceil n / 64 \rceil word-level XOR + popcount for matching. Pyramid construction, FAST, and the Harris ranking dominate frame time over the descriptor packing itself.
  • Scale is handled by a discrete 5-level 2\sqrt{2} pyramid; a keypoint's scale is the pyramid level at which it survived non-maximum suppression, not a continuous sub-level interpolation. Rotation is handled by the intensity-centroid orientation, which is reliable only when C|C| is bounded away from zero.
  • Naively rotating the BRIEF test set collapses descriptor variance: the eigenvalue spectrum after PCA on steered-but-unlearned BRIEF is far sharper than for unsteered BRIEF. The greedy decorrelated selection is what restores the variance — without the learned offset table, steering hurts more than it helps.
  • ORB is in-plane rotation invariant only. The intensity-centroid orientation is a single scalar and cannot encode 3-D tilt independently; out-of-plane rotation degrades matching, and the binary descriptor provides no recovery mechanism.
  • Common extensions include per-keypoint scale interpolation across pyramid levels, GPU and SIMD popcount acceleration, and learned descriptor stacks that replace the entire oFAST + rBRIEF pipeline with a jointly trained network.

References

  1. E. Rublee, V. Rabaud, K. Konolige, G. Bradski. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011. doi.org
  2. M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. Lecture Notes in Computer Science, 2010. link.springer.com
  3. E. Rosten, T. Drummond. Machine Learning for High-Speed Corner Detection. European Conference on Computer Vision, 2006.
  4. C. Harris, M. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988.
  5. D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

Extends

Fed by

Has learned alternative

  • SuperPoint

    SuperPoint replaces ORB's oFAST + rBRIEF detector-descriptor bundle with a single learned encoder + dual decoder heads; descriptor matching is float-valued L2 instead of Hamming.

  • XFeat

    XFeat targets ORB-class deployment budgets (mobile, real-time, low-power CPU) and replaces ORB's hand-crafted oFAST + rBRIEF binary pipeline with a learned 64-D float descriptor.