Goal

Encode a smoothed image patch around a previously detected keypoint as a $n_d$ -bit binary string by running a fixed table of $n_d$ pairwise pixel-intensity tests, then match between images by Hamming distance via bitwise XOR and popcount. Input: a greyscale image $I: \Omega \to \mathbb{R}$ and a list of keypoint locations from any external detector. Output: a binary descriptor $\mathbf{f} \in \{0, 1\}^{n_d}$ per keypoint, with $n_d \in \{128, 256, 512\}$ . The algorithm is specific to the use of binary tests on a fixed integer offset table — there is no learned codebook, no orientation assignment, no scale-space construction; description and matching are integer-only after the smoothing stage.

Algorithm

Let $I: \Omega \to \mathbb{R}$ denote the greyscale image. Let $p: \mathcal{P} \to \mathbb{R}$ denote the Gaussian-smoothed patch on the keypoint-centred window $\mathcal{P}$ of size $S \times S$ . Let $S$ denote the patch side length in pixels. Let $\sigma$ denote the standard deviation of the smoothing Gaussian and $w \times w$ the discrete kernel window. Let $n_d$ denote the descriptor bit length. Let $(x_i, y_i)$ denote the $i$ -th integer pixel-pair offset within $\mathcal{P}$ . Let $\tau$ denote a single pixel-pair test and $f_{n_d}$ the packed descriptor.

Definition

Pixel-pair test

Single-bit comparison between two pixels of the smoothed patch. Equation 1 in the paper.

\tau(p;\, x, y) = \begin{cases} 1 & \text{if } p(x) < p(y), \\ 0 & \text{otherwise}. \end{cases}

Definition

Binary descriptor

Packed concatenation of $n_d$ tests on a fixed offset table $\{(x_i, y_i)\}_{i=1}^{n_d}$ . Equation 2 in the paper.

f_{n_d}(p) = \sum_{1 \le i \le n_d} 2^{i-1}\, \tau(p;\, x_i, y_i).

Definition

Hamming distance

Bit-disagreement count between two descriptors, evaluated as a bitwise XOR followed by a population count.

d_H(\mathbf{f}, \mathbf{g}) = \operatorname{popcount}(\mathbf{f} \oplus \mathbf{g}).

The smoothing parameters $\sigma = 2$ and $w = 9$ produce stable recognition rates across the tested datasets; values of $\sigma$ between 1 and 3 give similar results, while $\sigma \to 0$ degrades sharply because individual pixel comparisons are noise-sensitive. The patch size $S = 48$ pixels is the convention from the paper's experiments. The descriptor lengths are $n_d \in \{128, 256, 512\}$ , named BRIEF-16, BRIEF-32, BRIEF-64 by storage in bytes.

Sampling distributions

The offset table $\{(x_i, y_i)\}$ is drawn once before any descriptors are computed and reused thereafter. Five distributions are evaluated; G II dominates the others empirically and is the recommended default.

G I. $(x_i, y_i) \sim \text{Uniform}(-S/2,\, S/2)^2$ i.i.d.
G II. $(x_i, y_i) \sim \mathcal{N}\!\bigl(0,\, S^2/25\bigr)^2$ i.i.d., isotropic.
G III. $x_i \sim \mathcal{N}(0,\, S^2/25)$ , $y_i \sim \mathcal{N}(x_i,\, S^2/100)$ (two-stage; second sample concentrated near the first).
G IV. $(x_i, y_i)$ drawn at random from a coarse polar grid.
G V. $x_i = (0, 0)^\top$ for all $i$ ; $y_i$ scans every position of a coarse polar grid of $n_d$ points.

Algorithm

BRIEF descriptor extraction

Input: Greyscale image

I

; keypoint locations

\{\mathbf{k}_j\}

; fixed offset table

\{(x_i, y_i)\}_{i=1}^{n_d}

; smoothing

\sigma

w

Output: Per-keypoint binary descriptors

\{\mathbf{f}_j\} \subset \{0, 1\}^{n_d}

Smooth $I$ with a discrete Gaussian of width $w \times w$ and scale $\sigma$ (or replace by a box-filter approximation on an integral image).
For each keypoint $\mathbf{k}_j$ , extract the $S \times S$ patch $p$ centred at $\mathbf{k}_j$ from the smoothed image.
For each $i \in \{1, \ldots, n_d\}$ , compute $\tau_i = \tau(p;\, x_i, y_i)$ as a single integer pixel comparison.
Pack the $n_d$ bits into $\lceil n_d / 64 \rceil$ 64-bit words to obtain $\mathbf{f}_j$ .
Match across images by computing $d_H(\mathbf{f}_a, \mathbf{f}_b)$ with bitwise XOR plus a popcount instruction; nearest-neighbour search with a left-right consistency check is recommended for outlier removal.

Implementation

Per-keypoint test-and-pack in Rust on an already-smoothed greyscale image. The offset table is integer and shared across all calls; the descriptor is stored as 64-bit words so that matching reduces to word-wise XOR and count_ones().

const ND: usize = 256;                  // BRIEF-32: 256 bits = 4 u64 words
const WORDS: usize = ND / 64;
type Descriptor = [u64; WORDS];

// Fixed offset table, generated once by sampling from G II
// (i.i.d. Gaussian centred at the patch origin with sigma^2 = S^2/25).
struct Pair { x: (i32, i32), y: (i32, i32) }
type OffsetTable = [Pair; ND];

fn intensity(img: &[u8], w: usize, h: usize, kx: i32, ky: i32, dx: i32, dy: i32) -> u8 {
    let x = (kx + dx).clamp(0, w as i32 - 1) as usize;
    let y = (ky + dy).clamp(0, h as i32 - 1) as usize;
    img[y * w + x]
}

fn brief(smoothed: &[u8], w: usize, h: usize, kx: i32, ky: i32, table: &OffsetTable) -> Descriptor {
    let mut f: Descriptor = [0; WORDS];
    for (i, pair) in table.iter().enumerate() {
        let pa = intensity(smoothed, w, h, kx, ky, pair.x.0, pair.x.1);
        let pb = intensity(smoothed, w, h, kx, ky, pair.y.0, pair.y.1);
        if pa < pb {
            f[i / 64] |= 1u64 << (i % 64);
        }
    }
    f
}

fn hamming(a: &Descriptor, b: &Descriptor) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

Remarks

Per-keypoint description is $O(n_d)$ integer pixel comparisons; matching is $O(n_d / 64)$ word-level XOR plus popcount. Smoothing dominates the total runtime — replacing the $9 \times 9$ Gaussian with a box-filter approximation on an integral image is the practical speedup path.
Storage is 16, 32, or 64 bytes per descriptor for $n_d \in \{128, 256, 512\}$ , against 256 bytes for a 64-D float SURF descriptor.
The descriptor is rotation-sensitive: recognition rate is approximately constant up to $10$ – $15^\circ$ in-plane rotation and drops steeply beyond, because no orientation normalisation is applied.
Distribution G II (i.i.d. Gaussian, $\sigma^2 = S^2/25$ ) consistently outperforms the four alternatives across the paper's benchmark; the regular polar design G V is the worst.
A constant offset table is required for any two descriptors to be comparable — resampling between query and database is a silent match failure.
ORB extends BRIEF by steering the offset table $S$ to the keypoint orientation $\theta$ via a 30-bin angle lookup table ( $2\pi/30$ resolution), then replaces the i.i.d. Gaussian sampling of the offsets with 256 tests selected by greedy uncorrelated-test search over $\sim 205\,590$ candidate pairs on $\sim 300\,000$ PASCAL 2006 keypoints. Naive steering alone collapses descriptor variance; the learned offset table is what restores it. ORB's $31 \times 31$ patch and $5 \times 5$ sub-window sizes also differ from BRIEF's $48 \times 48$ patch — the two descriptors are not bit-for-bit comparable.

References

M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. Lecture Notes in Computer Science, 2010. link.springer.com
E. Rosten, T. Drummond. Machine Learning for High-Speed Corner Detection. European Conference on Computer Vision, 2006.
H. Bay, T. Tuytelaars, L. Van Gool. SURF: Speeded Up Robust Features. Lecture Notes in Computer Science, 2006.
D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

BRIEF: Binary Robust Independent Elementary Features

Goal

Algorithm

Sampling distributions

Implementation

Remarks

References

Prerequisites

Extended by

Compared with

Has learned alternative