Goal
Detect oriented keypoints from a greyscale image at multiple scales and compute a 256-bit binary descriptor per keypoint suitable for Hamming-distance matching. Input: a greyscale image . Output: a list of keypoints — pixel location, orientation angle, and pyramid level — each paired with a 256-bit binary descriptor . The defining property is a rotation-invariant binary descriptor pipeline (oFAST + rBRIEF) that operates at video rate on a single CPU core without GPU or SIMD acceleration.
Algorithm
Let denote the greyscale input image. Let denote the image at pyramid level , with . Let denote the scale factor between successive pyramid levels. Let denote the total number of pyramid levels. Let denote the target total keypoint count. Let denote the half-width of the 31-pixel orientation patch (so the patch covers in both dimensions). Let denote the Harris cornerness response used to rank FAST candidates. Let denote the integral-image-smoothed patch on a window centred at a keypoint. Let denote the patch side length. Let denote the sub-window side length used in each binary test. Let denote the descriptor bit length. Let denote the matrix of test-pair offsets. Let denote the rotation matrix for angle . Let denote the steered offset matrix. Let denote a single pixel-pair test. Let denote the packed binary descriptor. Let denote the steered descriptor operator.
oFAST: oriented FAST
First- and zeroth-order image moments integrated over the circular patch of radius centred at the keypoint. Equation 1 of Rublee et al. 2011.
The centre of mass of the patch intensity distribution, used to define a rotation-stable reference direction. Equation 2 of Rublee et al. 2011.
The angle from the patch centre to the intensity centroid, quantised to the nearest of 30 bins of width for the lookup table. Equation 3 of Rublee et al. 2011.
- Build a -level image pyramid by downsampling at successive scale factor , using area-based interpolation.
- At each level , run the FAST-9 segment test with threshold set adaptively so that at least candidate keypoints are produced.
- Compute the Harris cornerness response at each candidate; retain the top candidates per level ranked by .
- For each surviving keypoint, build the integral image of over the window and accumulate , , .
- Compute and quantise to the nearest bin of the lookup table.
rBRIEF: rotated BRIEF
A single bit that compares the mean intensities of two sub-windows within the patch, read from the integral image. Equation 4 of Rublee et al. 2011.
Packed concatenation of pixel-pair tests on the offset table . Equation 5 of Rublee et al. 2011.
The descriptor evaluated with the test offsets rotated to the keypoint's orientation , using the precomputed steered set . Equation 6 of Rublee et al. 2011.
The candidate test pool is drawn from the possible non-overlapping sub-window positions inside the patch. Pairing two sub-windows and discarding pairs whose support overlaps yields usable test pairs.
- Quantise to the nearest of 30 bins: .
- Look up the steered offset pair list , a precomputed integer table.
- For each test , read two sub-window sums from the integral image at offsets centred on ; compute as a single integer comparison of the sub-window means.
- Pack the 256 bits into 4 consecutive 64-bit words to obtain .
Greedy uncorrelated-test selection
This one-time offline step produces the steered table . It is run once on the PASCAL 2006 training set and the resulting table is compiled into the algorithm.
- For each of the candidate pairs, evaluate over all training patches; compute the mean and sort pairs in ascending order of (pairs near 0.5 have maximum variance).
- Initialise the selected set to the highest-variance pair.
- For each subsequent candidate (in sorted order), compute the absolute pairwise correlation with every already-selected test; admit the candidate if its maximum absolute correlation with the current selection is below the adaptive threshold.
- If fewer than 256 tests are admitted, raise the correlation threshold and restart from step 2.
- The final 256-test ordered list defines ; the 30 steered versions are precomputed for and stored as integer tables in .
Implementation
The runtime descriptor packing in Rust, given a precomputed steered offset table indexed by angle bin:
const N: usize = 256;
const WORDS: usize = N / 64;
const BINS: usize = 30;
type Descriptor = [u64; WORDS];
struct SteerTable {
offsets: [[(i32, i32); 2]; N],
}
fn box_sum(ii: &[u32], stride: usize, h: usize, cx: i32, cy: i32, r: i32) -> u32 {
let x0 = (cx - r).max(0) as usize;
let y0 = (cy - r).max(0) as usize;
let x1 = (cx + r).min(stride as i32 - 1) as usize;
let y1 = (cy + r).min(h as i32 - 1) as usize;
ii[y1 * stride + x1]
.wrapping_sub(ii[y0 * stride + x1])
.wrapping_sub(ii[y1 * stride + x0])
.wrapping_add(ii[y0 * stride + x0])
}
fn rbrief(
ii: &[u32], stride: usize, h: usize,
kx: i32, ky: i32, theta: f32,
table: &[SteerTable; BINS],
) -> Descriptor {
let step = 2.0 * std::f32::consts::PI / BINS as f32;
let bin = ((theta / step).round() as isize).rem_euclid(BINS as isize) as usize;
let st = &table[bin];
let half_wt: i32 = 2;
let mut d: Descriptor = [0u64; WORDS];
for (i, pair) in st.offsets.iter().enumerate() {
let s1 = box_sum(ii, stride, h, kx + pair[0].0, ky + pair[0].1, half_wt);
let s2 = box_sum(ii, stride, h, kx + pair[1].0, ky + pair[1].1, half_wt);
if s1 < s2 {
d[i / 64] |= 1u64 << (i % 64);
}
}
d
}
fn hamming(a: &Descriptor, b: &Descriptor) -> u32 {
a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}
Remarks
- Per-keypoint description costs integer comparisons — one integral-image box-sum pair per test — plus word-level XOR + popcount for matching. Pyramid construction, FAST, and the Harris ranking dominate frame time over the descriptor packing itself.
- Scale is handled by a discrete 5-level pyramid; a keypoint's scale is the pyramid level at which it survived non-maximum suppression, not a continuous sub-level interpolation. Rotation is handled by the intensity-centroid orientation, which is reliable only when is bounded away from zero.
- Naively rotating the BRIEF test set collapses descriptor variance: the eigenvalue spectrum after PCA on steered-but-unlearned BRIEF is far sharper than for unsteered BRIEF. The greedy decorrelated selection is what restores the variance — without the learned offset table, steering hurts more than it helps.
- ORB is in-plane rotation invariant only. The intensity-centroid orientation is a single scalar and cannot encode 3-D tilt independently; out-of-plane rotation degrades matching, and the binary descriptor provides no recovery mechanism.
- Common extensions include per-keypoint scale interpolation across pyramid levels, GPU and SIMD popcount acceleration, and learned descriptor stacks that replace the entire oFAST + rBRIEF pipeline with a jointly trained network.
References
- E. Rublee, V. Rabaud, K. Konolige, G. Bradski. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011. doi.org
- M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. Lecture Notes in Computer Science, 2010. link.springer.com
- E. Rosten, T. Drummond. Machine Learning for High-Speed Corner Detection. European Conference on Computer Vision, 2006.
- C. Harris, M. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988.
- D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.