Goal
Partition an image (or for colour) into perceptually coherent regions without fixing the number of regions. Input: an image ; a non-negative dissimilarity weight graph over pixels (8-connected grid or -nearest-neighbour in feature space); a Gaussian smoothing scale ; a scale-of-observation parameter ; and an optional minimum component size . Output: a segmentation — a partition of into connected components — satisfying the formal properties of being neither too fine nor too coarse: no neighbouring pair of output components lacks evidence of a boundary (Theorem 1), and no proper refinement of also satisfies that condition (Theorem 2). The boundary predicate is
with the size-adaptive threshold controlling the scale at which evidence accumulates. The greedy Kruskal-style merge runs in — for grid graphs where .
Algorithm
Let denote the graph of pixels and their connections. Let denote the non-negative dissimilarity weight on edge . Let denote a component (segment). Let denote the number of vertices in . Let denote the minimum spanning tree of using edges in . Let denote the scale-of-observation parameter. Let denote the Gaussian smoothing scale. Let denote the optional minimum component size.
The maximum edge weight in the minimum spanning tree of component — the heaviest link that must be cut to split the component:
The minimum weight over all edges crossing the boundary between and — the cheapest available bridge:
The smaller of the two components' internal variations, each inflated by a size-adaptive prior :
where
For a singleton component ; the prior supplies the minimum required cross-boundary gap before the component is kept separate. As grows, and the criterion approaches a bare MST-weight comparison. The parameter sets the scale of observation but is not a minimum component size — a small component survives when the boundary evidence is strong.
The boundary predicate is true (a boundary exists between and ) when:
Graph constructions
Grid graph. Each pixel is a vertex; edges connect each pixel to its 8 spatial neighbours with weight . The edge count is , giving total runtime. Components formed by this construction are spatially connected.
Feature-space (nearest-neighbour) graph. Each pixel is embedded as the five-dimensional point with L2 distance as edge weight. Edges connect each pixel to its 10 nearest neighbours; an approximate nearest-neighbour structure builds the graph in sub-quadratic time. This construction permits spatially disjoint but texturally coherent regions to be co-segmented.
Procedure
- Smooth. Apply a Gaussian of scale to to suppress digitisation noise. For grid graphs, compute edge weights on the smoothed image. For feature-space graphs, compute L2 distances in the embedding.
- Sort edges. Produce the ordered sequence of all edges in non-decreasing order of weight. For 8-bit integer weights, counting sort achieves this in .
- Initialise. Assign each vertex to its own singleton component; set for all . Use a union-find structure with path compression and union by rank.
- Greedy merge. For in order: let . If and belong to distinct components and , and , then merge and into a single component; update of the merged component to in (Lemma 1 — the merge edge is the maximum-weight MST edge of the merged component). Otherwise leave the partition unchanged.
- Post-filter (optional). For each edge whose endpoints lie in different components: if , merge those two components unconditionally.
- Return the final partition .
The per-merge update of follows from the sorted edge order: because edges are visited in non-decreasing weight, the merge-triggering edge for any pair is always the minimum-weight edge between those components, and therefore the maximum-weight edge in the resulting component's MST.
Implementation
The greedy merge loop in Rust:
struct UnionFind {
parent: Vec<usize>,
rank: Vec<usize>,
size: Vec<usize>,
int: Vec<f32>, // Int(C): max MST edge weight
}
impl UnionFind {
fn new(n: usize) -> Self {
Self {
parent: (0..n).collect(),
rank: vec![0; n],
size: vec![1; n],
int: vec![0.0; n],
}
}
fn find(&mut self, x: usize) -> usize {
if self.parent[x] != x { self.parent[x] = self.find(self.parent[x]); }
self.parent[x]
}
fn merge(&mut self, a: usize, b: usize, w: f32) {
let (ra, rb) = (self.find(a), self.find(b));
let (root, child) = if self.rank[ra] >= self.rank[rb] { (ra, rb) } else { (rb, ra) };
self.parent[child] = root;
if self.rank[root] == self.rank[child] { self.rank[root] += 1; }
self.size[root] += self.size[child];
self.int[root] = w; // merge edge is max MST edge (Lemma 1)
}
}
/// `edges`: sorted (weight, u, v) triples. `k`: scale parameter. `min_size`: 0 to skip.
fn segment(n: usize, edges: &[(f32, usize, usize)], k: f32, min_size: usize) -> UnionFind {
let mut uf = UnionFind::new(n);
for &(w, u, v) in edges {
let (ru, rv) = (uf.find(u), uf.find(v));
if ru == rv { continue; }
let mint = (uf.int[ru] + k / uf.size[ru] as f32)
.min(uf.int[rv] + k / uf.size[rv] as f32);
if w <= mint { uf.merge(ru, rv, w); }
}
if min_size > 0 {
for &(w, u, v) in edges {
let (ru, rv) = (uf.find(u), uf.find(v));
if ru != rv && (uf.size[ru] < min_size || uf.size[rv] < min_size) {
uf.merge(ru, rv, w);
}
}
}
uf
}
The equivalent merge loop in Python with a minimal union-find:
import numpy as np
class UF:
def __init__(self, n):
self.p = list(range(n))
self.rank = [0] * n
self.size = [1] * n
self.int_ = [0.0] * n # Int(C)
def find(self, x):
while self.p[x] != x:
self.p[x] = self.p[self.p[x]]
x = self.p[x]
return x
def union(self, a, b, w):
a, b = self.find(a), self.find(b)
if self.rank[a] < self.rank[b]: a, b = b, a
self.p[b] = a
if self.rank[a] == self.rank[b]: self.rank[a] += 1
self.size[a] += self.size[b]
self.int_[a] = w # Lemma 1: merge edge is max MST edge
def felzenszwalb(n, edges_sorted, k, min_size=0):
"""edges_sorted: array of (w, u, v) sorted by w ascending."""
uf = UF(n)
for w, u, v in edges_sorted:
ru, rv = uf.find(u), uf.find(v)
if ru == rv: continue
mint = min(uf.int_[ru] + k / uf.size[ru], uf.int_[rv] + k / uf.size[rv])
if w <= mint:
uf.union(ru, rv, w)
if min_size:
for w, u, v in edges_sorted:
ru, rv = uf.find(u), uf.find(v)
if ru != rv and min(uf.size[ru], uf.size[rv]) < min_size:
uf.union(ru, rv, w)
return uf
Remarks
- Complexity. The overall runtime is dominated by the edge sort. For grid graphs () this reduces to ; for 8-bit integer weights, counting sort lowers the sort to . Union-find operations (steps 3–4) cost where is the inverse Ackermann function, effectively .
- scales with image resolution. Because , the same numeric value of produces coarser segmentation on larger images. The paper uses for images and for or larger; rescale proportionally to pixel count when applying to new image sizes.
- Single cheap bridge causes chaining. uses only the minimum cross-boundary edge. A single low-weight edge between two otherwise distinct regions suppresses boundary detection for the entire pair. No simple robustness fix is available: replacing the minimum with any quantile of inter-component edge weights makes finding an optimal segmentation NP-hard (Theorem 4 — reduction from min-ratio-cut with uniform capacities and demands).
- Grid mode produces only spatially connected components. Texturally coherent but spatially disjoint regions (e.g., repeated objects of the same colour) cannot be captured with the 8-connected grid graph. The feature-space nearest-neighbour graph mode addresses this, embedding each pixel as with L2 distance and 10 nearest neighbours per pixel.
- Colour images run three independent monochrome segmentations and intersect. The R, G, and B channel segmentations are intersected: two pixels are co-segmented only if they co-segment in all three planes. This yields better results than a single segmentation on a combined-channel weight.
- Order-independence. Tie-breaking on equal-weight edges does not affect the final segmentation (Theorem 3), which aids reproducibility on integer-weight graphs.
References
- Felzenszwalb, P. F., & Huttenlocher, D. P. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59(2)–181, 2004. paper
- Zahn, C. T. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers, 20(1)–86, 1971. (MST-based graph segmentation antecedent; this paper uses the same greedy merge structure with an adaptive stopping criterion in place of a global threshold.)