Felzenszwalb–Huttenlocher Graph-Based Image Segmentation | VitaVision
Back to atlas

Felzenszwalb–Huttenlocher Graph-Based Image Segmentation

8 min readIntermediateView in graph
Based on
Efficient Graph-Based Image Segmentation
Felzenszwalb, Huttenlocher · International Journal of Computer Vision 2004
DOI ↗

Goal

Partition an image I:ΩRI: \Omega \to \mathbb{R} (or I:ΩR3I: \Omega \to \mathbb{R}^3 for colour) into perceptually coherent regions without fixing the number of regions. Input: an image II; a non-negative dissimilarity weight graph G=(V,E)G = (V, E) over pixels (8-connected grid or kk-nearest-neighbour in feature space); a Gaussian smoothing scale σ\sigma; a scale-of-observation parameter kk; and an optional minimum component size mminm_\text{min}. Output: a segmentation S={C1,,Cr}S = \{C_1, \ldots, C_r\} — a partition of VV into connected components — satisfying the formal properties of being neither too fine nor too coarse: no neighbouring pair of output components lacks evidence of a boundary (Theorem 1), and no proper refinement of SS also satisfies that condition (Theorem 2). The boundary predicate is

D(C1,C2)=[Dif(C1,C2)>MInt(C1,C2)],D(C_1, C_2) = \bigl[\,\text{Dif}(C_1, C_2) > \text{MInt}(C_1, C_2)\,\bigr],

with the size-adaptive threshold τ(C)=k/C\tau(C) = k/|C| controlling the scale at which evidence accumulates. The greedy Kruskal-style merge runs in O(mlogm)O(m \log m)O(nlogn)O(n \log n) for grid graphs where m=O(n)m = O(n).

Algorithm

Let G=(V,E)G = (V, E) denote the graph of pixels and their connections. Let w(vi,vj)w(v_i, v_j) denote the non-negative dissimilarity weight on edge (vi,vj)E(v_i, v_j) \in E. Let CVC \subseteq V denote a component (segment). Let C|C| denote the number of vertices in CC. Let MST(C,E)\mathrm{MST}(C, E) denote the minimum spanning tree of CC using edges in EE. Let k>0k > 0 denote the scale-of-observation parameter. Let σ0\sigma \geq 0 denote the Gaussian smoothing scale. Let mminm_\text{min} denote the optional minimum component size.

Definition
Internal variation

The maximum edge weight in the minimum spanning tree of component CC — the heaviest link that must be cut to split the component:

Int(C)=maxeMST(C,E)w(e).(Eq. 1)\text{Int}(C) = \max_{e \in \mathrm{MST}(C, E)} w(e). \tag{Eq. 1}
Definition
Pairwise difference

The minimum weight over all edges crossing the boundary between C1C_1 and C2C_2 — the cheapest available bridge:

Dif(C1,C2)=min(vi,vj)EviC1,vjC2w(vi,vj).(Eq. 2)\text{Dif}(C_1, C_2) = \min_{\substack{(v_i, v_j) \in E \\ v_i \in C_1,\; v_j \in C_2}} w(v_i, v_j). \tag{Eq. 2}
Definition
Minimum internal difference

The smaller of the two components' internal variations, each inflated by a size-adaptive prior τ(C)=k/C\tau(C) = k/|C|:

MInt(C1,C2)=min ⁣(Int(C1)+τ(C1),  Int(C2)+τ(C2)),(Eq. 4)\text{MInt}(C_1, C_2) = \min\!\bigl(\text{Int}(C_1) + \tau(C_1),\; \text{Int}(C_2) + \tau(C_2)\bigr), \tag{Eq. 4}

where

τ(C)=kC.(Eq. 5)\tau(C) = \frac{k}{|C|}. \tag{Eq. 5}

For a singleton component Int(C)=0\text{Int}(C) = 0; the prior τ\tau supplies the minimum required cross-boundary gap before the component is kept separate. As C|C| grows, τ0\tau \to 0 and the criterion approaches a bare MST-weight comparison. The parameter kk sets the scale of observation but is not a minimum component size — a small component survives when the boundary evidence is strong.

The boundary predicate D(C1,C2)D(C_1, C_2) is true (a boundary exists between C1C_1 and C2C_2) when:

D(C1,C2)=[Dif(C1,C2)>MInt(C1,C2)].(Eq. 3)D(C_1, C_2) = \bigl[\,\text{Dif}(C_1, C_2) > \text{MInt}(C_1, C_2)\,\bigr]. \tag{Eq. 3}

Graph constructions

Grid graph. Each pixel pip_i is a vertex; edges connect each pixel to its 8 spatial neighbours with weight w(vi,vj)=I(pi)I(pj)w(v_i, v_j) = |I(p_i) - I(p_j)|. The edge count is m=O(n)m = O(n), giving O(nlogn)O(n \log n) total runtime. Components formed by this construction are spatially connected.

Feature-space (nearest-neighbour) graph. Each pixel is embedded as the five-dimensional point (x,y,r,g,b)(x, y, r, g, b) with L2 distance as edge weight. Edges connect each pixel to its 10 nearest neighbours; an approximate nearest-neighbour structure builds the graph in sub-quadratic time. This construction permits spatially disjoint but texturally coherent regions to be co-segmented.

Procedure

Algorithm
Felzenszwalb–Huttenlocher graph segmentation
Input: Image II; graph G=(V,E)G = (V, E) with m=Em = |E| edges carrying non-negative weights; parameters σ\sigma, kk, mminm_\text{min}.
Output: Segmentation SS — a partition of VV into components satisfying DD neither too fine nor too coarse (Theorems 1–2). Each component labelled by its union-find representative.
  1. Smooth. Apply a Gaussian of scale σ=0.8\sigma = 0.8 to II to suppress digitisation noise. For grid graphs, compute edge weights w(vi,vj)=I(pi)I(pj)w(v_i, v_j) = |I(p_i) - I(p_j)| on the smoothed image. For feature-space graphs, compute L2 distances in the (x,y,r,g,b)(x, y, r, g, b) embedding.
  2. Sort edges. Produce the ordered sequence π=(o1,o2,,om)\pi = (o_1, o_2, \ldots, o_m) of all edges in non-decreasing order of weight. For 8-bit integer weights, counting sort achieves this in O(m)O(m).
  3. Initialise. Assign each vertex viVv_i \in V to its own singleton component; set Int(Ci0)=0\text{Int}(C_i^0) = 0 for all ii. Use a union-find structure with path compression and union by rank.
  4. Greedy merge. For q=1,,mq = 1, \ldots, m in order: let oq=(vi,vj)o_q = (v_i, v_j). If viv_i and vjv_j belong to distinct components Ciq1C_i^{q-1} and Cjq1C_j^{q-1}, and w(oq)MInt(Ciq1,Cjq1)w(o_q) \leq \text{MInt}(C_i^{q-1}, C_j^{q-1}), then merge Ciq1C_i^{q-1} and Cjq1C_j^{q-1} into a single component; update Int\text{Int} of the merged component to w(oq)w(o_q) in O(1)O(1) (Lemma 1 — the merge edge is the maximum-weight MST edge of the merged component). Otherwise leave the partition unchanged.
  5. Post-filter (optional). For each edge (vi,vj)(v_i, v_j) whose endpoints lie in different components: if min(Ci,Cj)<mmin\min(|C_i|, |C_j|) < m_\text{min}, merge those two components unconditionally.
  6. Return the final partition SmS^m.

The O(1)O(1) per-merge update of Int\text{Int} follows from the sorted edge order: because edges are visited in non-decreasing weight, the merge-triggering edge for any pair is always the minimum-weight edge between those components, and therefore the maximum-weight edge in the resulting component's MST.

Implementation

The greedy merge loop in Rust:

struct UnionFind {
    parent: Vec<usize>,
    rank:   Vec<usize>,
    size:   Vec<usize>,
    int:    Vec<f32>,   // Int(C): max MST edge weight
}

impl UnionFind {
    fn new(n: usize) -> Self {
        Self {
            parent: (0..n).collect(),
            rank:   vec![0; n],
            size:   vec![1; n],
            int:    vec![0.0; n],
        }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x { self.parent[x] = self.find(self.parent[x]); }
        self.parent[x]
    }

    fn merge(&mut self, a: usize, b: usize, w: f32) {
        let (ra, rb) = (self.find(a), self.find(b));
        let (root, child) = if self.rank[ra] >= self.rank[rb] { (ra, rb) } else { (rb, ra) };
        self.parent[child] = root;
        if self.rank[root] == self.rank[child] { self.rank[root] += 1; }
        self.size[root]  += self.size[child];
        self.int[root]    = w;              // merge edge is max MST edge (Lemma 1)
    }
}

/// `edges`: sorted (weight, u, v) triples. `k`: scale parameter. `min_size`: 0 to skip.
fn segment(n: usize, edges: &[(f32, usize, usize)], k: f32, min_size: usize) -> UnionFind {
    let mut uf = UnionFind::new(n);
    for &(w, u, v) in edges {
        let (ru, rv) = (uf.find(u), uf.find(v));
        if ru == rv { continue; }
        let mint = (uf.int[ru] + k / uf.size[ru] as f32)
            .min(uf.int[rv] + k / uf.size[rv] as f32);
        if w <= mint { uf.merge(ru, rv, w); }
    }
    if min_size > 0 {
        for &(w, u, v) in edges {
            let (ru, rv) = (uf.find(u), uf.find(v));
            if ru != rv && (uf.size[ru] < min_size || uf.size[rv] < min_size) {
                uf.merge(ru, rv, w);
            }
        }
    }
    uf
}

The equivalent merge loop in Python with a minimal union-find:

import numpy as np

class UF:
    def __init__(self, n):
        self.p    = list(range(n))
        self.rank = [0] * n
        self.size = [1] * n
        self.int_ = [0.0] * n          # Int(C)

    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x

    def union(self, a, b, w):
        a, b = self.find(a), self.find(b)
        if self.rank[a] < self.rank[b]: a, b = b, a
        self.p[b] = a
        if self.rank[a] == self.rank[b]: self.rank[a] += 1
        self.size[a] += self.size[b]
        self.int_[a]  = w              # Lemma 1: merge edge is max MST edge

def felzenszwalb(n, edges_sorted, k, min_size=0):
    """edges_sorted: array of (w, u, v) sorted by w ascending."""
    uf = UF(n)
    for w, u, v in edges_sorted:
        ru, rv = uf.find(u), uf.find(v)
        if ru == rv: continue
        mint = min(uf.int_[ru] + k / uf.size[ru], uf.int_[rv] + k / uf.size[rv])
        if w <= mint:
            uf.union(ru, rv, w)
    if min_size:
        for w, u, v in edges_sorted:
            ru, rv = uf.find(u), uf.find(v)
            if ru != rv and min(uf.size[ru], uf.size[rv]) < min_size:
                uf.union(ru, rv, w)
    return uf

Remarks

  • Complexity. The overall runtime is O(mlogm)O(m \log m) dominated by the edge sort. For grid graphs (m=O(n)m = O(n)) this reduces to O(nlogn)O(n \log n); for 8-bit integer weights, counting sort lowers the sort to O(m)O(m). Union-find operations (steps 3–4) cost O(mα(m,n))O(m\,\alpha(m, n)) where α\alpha is the inverse Ackermann function, effectively O(m)O(m).
  • kk scales with image resolution. Because τ(C)=k/C\tau(C) = k/|C|, the same numeric value of kk produces coarser segmentation on larger images. The paper uses k=150k = 150 for 128×128128 \times 128 images and k=300k = 300 for 320×240320 \times 240 or larger; rescale kk proportionally to pixel count when applying to new image sizes.
  • Single cheap bridge causes chaining. Dif(C1,C2)\text{Dif}(C_1, C_2) uses only the minimum cross-boundary edge. A single low-weight edge between two otherwise distinct regions suppresses boundary detection for the entire pair. No simple robustness fix is available: replacing the minimum with any quantile K>0K > 0 of inter-component edge weights makes finding an optimal segmentation NP-hard (Theorem 4 — reduction from min-ratio-cut with uniform capacities and demands).
  • Grid mode produces only spatially connected components. Texturally coherent but spatially disjoint regions (e.g., repeated objects of the same colour) cannot be captured with the 8-connected grid graph. The feature-space nearest-neighbour graph mode addresses this, embedding each pixel as (x,y,r,g,b)(x, y, r, g, b) with L2 distance and 10 nearest neighbours per pixel.
  • Colour images run three independent monochrome segmentations and intersect. The R, G, and B channel segmentations are intersected: two pixels are co-segmented only if they co-segment in all three planes. This yields better results than a single segmentation on a combined-channel weight.
  • Order-independence. Tie-breaking on equal-weight edges does not affect the final segmentation (Theorem 3), which aids reproducibility on integer-weight graphs.

References

  1. Felzenszwalb, P. F., & Huttenlocher, D. P. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59(2)
    –181, 2004. paper
  2. Zahn, C. T. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers, 20(1)
    –86, 1971. (MST-based graph segmentation antecedent; this paper uses the same greedy merge structure with an adaptive stopping criterion in place of a global threshold.)

Prerequisites