BRIEF: Binary Robust Independent Elementary Features | VitaVision
Back to atlas

BRIEF: Binary Robust Independent Elementary Features

7 min readIntermediateView in graph
Based on
BRIEF: Binary Robust Independent Elementary Features
Calonder, Lepetit, Strecha, Fua · Lecture notes in computer science 2010
DOI ↗

Goal

Encode a smoothed image patch around a previously detected keypoint as a ndn_d-bit binary string by running a fixed table of ndn_d pairwise pixel-intensity tests, then match between images by Hamming distance via bitwise XOR and popcount. Input: a greyscale image I:ΩRI: \Omega \to \mathbb{R} and a list of keypoint locations from any external detector. Output: a binary descriptor f{0,1}nd\mathbf{f} \in \{0, 1\}^{n_d} per keypoint, with nd{128,256,512}n_d \in \{128, 256, 512\}. The algorithm is specific to the use of binary tests on a fixed integer offset table — there is no learned codebook, no orientation assignment, no scale-space construction; description and matching are integer-only after the smoothing stage.

Algorithm

Let I:ΩRI: \Omega \to \mathbb{R} denote the greyscale image. Let p:PRp: \mathcal{P} \to \mathbb{R} denote the Gaussian-smoothed patch on the keypoint-centred window P\mathcal{P} of size S×SS \times S. Let SS denote the patch side length in pixels. Let σ\sigma denote the standard deviation of the smoothing Gaussian and w×ww \times w the discrete kernel window. Let ndn_d denote the descriptor bit length. Let (xi,yi)(x_i, y_i) denote the ii-th integer pixel-pair offset within P\mathcal{P}. Let τ\tau denote a single pixel-pair test and fndf_{n_d} the packed descriptor.

Definition
Pixel-pair test

Single-bit comparison between two pixels of the smoothed patch. Equation 1 in the paper.

τ(p;x,y)={1if p(x)<p(y),0otherwise.\tau(p;\, x, y) = \begin{cases} 1 & \text{if } p(x) < p(y), \\ 0 & \text{otherwise}. \end{cases}
Definition
Binary descriptor

Packed concatenation of ndn_d tests on a fixed offset table {(xi,yi)}i=1nd\{(x_i, y_i)\}_{i=1}^{n_d}. Equation 2 in the paper.

fnd(p)=1ind2i1τ(p;xi,yi).f_{n_d}(p) = \sum_{1 \le i \le n_d} 2^{i-1}\, \tau(p;\, x_i, y_i).
Definition
Hamming distance

Bit-disagreement count between two descriptors, evaluated as a bitwise XOR followed by a population count.

dH(f,g)=popcount(fg).d_H(\mathbf{f}, \mathbf{g}) = \operatorname{popcount}(\mathbf{f} \oplus \mathbf{g}).

The smoothing parameters σ=2\sigma = 2 and w=9w = 9 produce stable recognition rates across the tested datasets; values of σ\sigma between 1 and 3 give similar results, while σ0\sigma \to 0 degrades sharply because individual pixel comparisons are noise-sensitive. The patch size S=48S = 48 pixels is the convention from the paper's experiments. The descriptor lengths are nd{128,256,512}n_d \in \{128, 256, 512\}, named BRIEF-16, BRIEF-32, BRIEF-64 by storage in bytes.

Sampling distributions

The offset table {(xi,yi)}\{(x_i, y_i)\} is drawn once before any descriptors are computed and reused thereafter. Five distributions are evaluated; G II dominates the others empirically and is the recommended default.

  • G I. (xi,yi)Uniform(S/2,S/2)2(x_i, y_i) \sim \text{Uniform}(-S/2,\, S/2)^2 i.i.d.
  • G II. (xi,yi)N ⁣(0,S2/25)2(x_i, y_i) \sim \mathcal{N}\!\bigl(0,\, S^2/25\bigr)^2 i.i.d., isotropic.
  • G III. xiN(0,S2/25)x_i \sim \mathcal{N}(0,\, S^2/25), yiN(xi,S2/100)y_i \sim \mathcal{N}(x_i,\, S^2/100) (two-stage; second sample concentrated near the first).
  • G IV. (xi,yi)(x_i, y_i) drawn at random from a coarse polar grid.
  • G V. xi=(0,0)x_i = (0, 0)^\top for all ii; yiy_i scans every position of a coarse polar grid of ndn_d points.
Algorithm
BRIEF descriptor extraction
Input: Greyscale image II; keypoint locations {kj}\{\mathbf{k}_j\}; fixed offset table {(xi,yi)}i=1nd\{(x_i, y_i)\}_{i=1}^{n_d}; smoothing σ\sigma, ww.
Output: Per-keypoint binary descriptors {fj}{0,1}nd\{\mathbf{f}_j\} \subset \{0, 1\}^{n_d}.
  1. Smooth II with a discrete Gaussian of width w×ww \times w and scale σ\sigma (or replace by a box-filter approximation on an integral image).
  2. For each keypoint kj\mathbf{k}_j, extract the S×SS \times S patch pp centred at kj\mathbf{k}_j from the smoothed image.
  3. For each i{1,,nd}i \in \{1, \ldots, n_d\}, compute τi=τ(p;xi,yi)\tau_i = \tau(p;\, x_i, y_i) as a single integer pixel comparison.
  4. Pack the ndn_d bits into nd/64\lceil n_d / 64 \rceil 64-bit words to obtain fj\mathbf{f}_j.
  5. Match across images by computing dH(fa,fb)d_H(\mathbf{f}_a, \mathbf{f}_b) with bitwise XOR plus a popcount instruction; nearest-neighbour search with a left-right consistency check is recommended for outlier removal.

Implementation

Per-keypoint test-and-pack in Rust on an already-smoothed greyscale image. The offset table is integer and shared across all calls; the descriptor is stored as 64-bit words so that matching reduces to word-wise XOR and count_ones().

const ND: usize = 256;                  // BRIEF-32: 256 bits = 4 u64 words
const WORDS: usize = ND / 64;
type Descriptor = [u64; WORDS];

// Fixed offset table, generated once by sampling from G II
// (i.i.d. Gaussian centred at the patch origin with sigma^2 = S^2/25).
struct Pair { x: (i32, i32), y: (i32, i32) }
type OffsetTable = [Pair; ND];

fn intensity(img: &[u8], w: usize, h: usize, kx: i32, ky: i32, dx: i32, dy: i32) -> u8 {
    let x = (kx + dx).clamp(0, w as i32 - 1) as usize;
    let y = (ky + dy).clamp(0, h as i32 - 1) as usize;
    img[y * w + x]
}

fn brief(smoothed: &[u8], w: usize, h: usize, kx: i32, ky: i32, table: &OffsetTable) -> Descriptor {
    let mut f: Descriptor = [0; WORDS];
    for (i, pair) in table.iter().enumerate() {
        let pa = intensity(smoothed, w, h, kx, ky, pair.x.0, pair.x.1);
        let pb = intensity(smoothed, w, h, kx, ky, pair.y.0, pair.y.1);
        if pa < pb {
            f[i / 64] |= 1u64 << (i % 64);
        }
    }
    f
}

fn hamming(a: &Descriptor, b: &Descriptor) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

Remarks

  • Per-keypoint description is O(nd)O(n_d) integer pixel comparisons; matching is O(nd/64)O(n_d / 64) word-level XOR plus popcount. Smoothing dominates the total runtime — replacing the 9×99 \times 9 Gaussian with a box-filter approximation on an integral image is the practical speedup path.
  • Storage is 16, 32, or 64 bytes per descriptor for nd{128,256,512}n_d \in \{128, 256, 512\}, against 256 bytes for a 64-D float SURF descriptor.
  • The descriptor is rotation-sensitive: recognition rate is approximately constant up to 10101515^\circ in-plane rotation and drops steeply beyond, because no orientation normalisation is applied.
  • Distribution G II (i.i.d. Gaussian, σ2=S2/25\sigma^2 = S^2/25) consistently outperforms the four alternatives across the paper's benchmark; the regular polar design G V is the worst.
  • A constant offset table is required for any two descriptors to be comparable — resampling between query and database is a silent match failure.
  • ORB extends BRIEF by steering the offset table SS to the keypoint orientation θ\theta via a 30-bin angle lookup table (2π/302\pi/30 resolution), then replaces the i.i.d. Gaussian sampling of the offsets with 256 tests selected by greedy uncorrelated-test search over 205590\sim 205\,590 candidate pairs on 300000\sim 300\,000 PASCAL 2006 keypoints. Naive steering alone collapses descriptor variance; the learned offset table is what restores it. ORB's 31×3131 \times 31 patch and 5×55 \times 5 sub-window sizes also differ from BRIEF's 48×4848 \times 48 patch — the two descriptors are not bit-for-bit comparable.

References

  1. M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. Lecture Notes in Computer Science, 2010. link.springer.com
  2. E. Rosten, T. Drummond. Machine Learning for High-Speed Corner Detection. European Conference on Computer Vision, 2006.
  3. H. Bay, T. Tuytelaars, L. Van Gool. SURF: Speeded Up Robust Features. Lecture Notes in Computer Science, 2006.
  4. D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

Extended by

Compared with

Has learned alternative

  • SuperPoint

    SuperPoint replaces the FAST+BRIEF / SIFT / ORB classical pipeline with a single learned encoder + decoder heads.

  • XFeat

    XFeat replaces the FAST+BRIEF binary-descriptor pipeline with a featherweight learned model targeting CPU inference.