Graph-Cut Interactive Segmentation | VitaVision
Back to atlas

Graph-Cut Interactive Segmentation

4 min readIntermediateView in graph
Based on
Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images
Boykov, Jolly · ICCV 2001
DOI ↗

Goal

An N-dimensional image I:PRI: P \to \mathbb{R} and two disjoint user-marked seed sets OPO \subset P (object) and BPB \subset P (background) are the inputs. The output is a binary labelling A:P{obj,bkg}A: P \to \{\text{obj}, \text{bkg}\} that satisfies the hard constraints pO,Ap="obj"\forall p \in O,\, A_p = \text{"obj"} and pB,Ap="bkg"\forall p \in B,\, A_p = \text{"bkg"}, and is the global minimum of a combined region-and-boundary MRF energy E(A)E(A). The global optimum is computed exactly via a single s-t min-cut on a pixel graph; both segments may consist of multiple disconnected components.

Algorithm

Let PP denote the set of all pixels (or voxels). Let NN denote the neighbourhood system — 8-connectivity in 2D, 26-connectivity in 3D. Let OPO \subset P and BPB \subset P be the disjoint object and background seed sets. Let Ap{"obj","bkg"}A_p \in \{\text{"obj"}, \text{"bkg"}\} be the label assigned to pixel pp. Let IpI_p denote the intensity at pixel pp. Let σ>0\sigma > 0 be the boundary contrast scale. Let λ0\lambda \geq 0 be the scalar weight trading off the region term against the boundary term. Let KK be the seed-enforcing capacity. Let SS denote the source terminal (object) and TT the sink terminal (background).

Definition
Energy E(A)

The total energy over a labelling AA combines a region term and a boundary term.

E(A)=λR(A)+B(A).E(A) = \lambda \cdot R(A) + B(A).

Definition
Region term R(A)

The region term sums per-pixel log-likelihood penalties under the seed-intensity histograms.

R(A)=pPRp(Ap),Rp("obj")=lnPr(IpO),Rp("bkg")=lnPr(IpB).R(A) = \sum_{p \in P} R_p(A_p), \qquad R_p(\text{"obj"}) = -\ln \Pr(I_p \mid O), \quad R_p(\text{"bkg"}) = -\ln \Pr(I_p \mid B).

Definition
Boundary term B(A)

The boundary term accumulates penalties only at label discontinuities.

B(A)={p,q}NBp,qδ(Ap,Aq),Bp,qexp ⁣((IpIq)22σ2)dist(p,q),B(A) = \sum_{\{p,q\} \in N} B_{p,q} \cdot \delta(A_p, A_q), \qquad B_{p,q} \propto \frac{\exp\!\left(-\dfrac{(I_p - I_q)^2}{2\sigma^2}\right)}{\mathrm{dist}(p, q)},

where δ(Ap,Aq)=1\delta(A_p, A_q) = 1 if ApAqA_p \neq A_q, and 00 otherwise.

Definition
Seed capacity K

KK is chosen large enough to guarantee that the minimum cut never severs a seed's assigned t-link.

K=1+maxpPq:{p,q}NBp,q.K = 1 + \max_{p \in P} \sum_{q:\,\{p,q\} \in N} B_{p,q}.

Procedure

Algorithm
Graph-cut interactive segmentation
Input: N-D image II; seed sets OO, BB; parameters σ\sigma, λ\lambda
Output: Binary labelling A:P{"obj","bkg"}A: P \to \{\text{"obj"}, \text{"bkg"}\} minimising E(A)E(A) subject to hard seed constraints
  1. Compute n-link weights Bp,qB_{p,q} for every neighbour pair {p,q}N\{p,q\} \in N.
  2. Compute K=1+maxpPq:{p,q}NBp,qK = 1 + \max_{p \in P} \sum_{q:\{p,q\}\in N} B_{p,q}.
  3. Build the directed graph G=(V,E)G = (V, E) with V=P{S,T}V = P \cup \{S, T\}. Add an n-link edge {p,q}\{p, q\} with capacity Bp,qB_{p,q} for every neighbour pair in NN.
  4. For each unlabelled pixel pOBp \notin O \cup B, add t-link {p,S}\{p, S\} with capacity λRp("bkg")\lambda \cdot R_p(\text{"bkg"}) and t-link {p,T}\{p, T\} with capacity λRp("obj")\lambda \cdot R_p(\text{"obj"}).
  5. For each object seed pOp \in O, add t-link {p,S}\{p, S\} with capacity KK and {p,T}\{p, T\} with capacity 00.
  6. For each background seed pBp \in B, add t-link {p,S}\{p, S\} with capacity 00 and {p,T}\{p, T\} with capacity KK.
  7. Compute the minimum s-t cut CEC^* \subseteq E via max-flow. Recover the segmentation: Ap="obj"A_p = \text{"obj"} if {p,T}C\{p, T\} \in C^*, else Ap="bkg"A_p = \text{"bkg"}.
  8. On a new seed, update only the two t-links at the seeded pixel and re-augment from the existing flow.

Implementation

The graph-construction step in Rust:

fn build_graph<G: GraphBuilder>(
    graph: &mut G,
    pixels: &[f32],                      // intensity per pixel
    neighbors: &[(usize, usize, f32)],   // (p, q, dist_pq) for each N-link pair
    ln_pr_obj: &[f32],                   // -ln Pr(I_p | O) per pixel
    ln_pr_bkg: &[f32],                   // -ln Pr(I_p | B) per pixel
    seeds_obj: &[usize],
    seeds_bkg: &[usize],
    sigma: f32,
    lambda: f32,
) {
    let n = pixels.len();
    let nodes: Vec<G::NodeId> = (0..n).map(|_| graph.add_node()).collect();
    let s = graph.source();
    let t = graph.sink();

    let mut n_link_sum = vec![0.0_f32; n];
    let mut n_link_caps = Vec::with_capacity(neighbors.len());
    for &(p, q, dist_pq) in neighbors {
        let diff = pixels[p] - pixels[q];
        let cap = (-(diff * diff) / (2.0 * sigma * sigma)).exp() / dist_pq;
        n_link_caps.push((p, q, cap));
        n_link_sum[p] += cap;
        n_link_sum[q] += cap;
    }
    for (p, q, cap) in n_link_caps {
        graph.add_edge(nodes[p], nodes[q], cap, cap);
    }

    let k: f32 = 1.0 + n_link_sum.iter().cloned().fold(f32::NEG_INFINITY, f32::max);

    let mut is_obj = vec![false; n];
    let mut is_bkg = vec![false; n];
    for &p in seeds_obj { is_obj[p] = true; }
    for &p in seeds_bkg { is_bkg[p] = true; }

    for p in 0..n {
        if is_obj[p] {
            graph.add_edge(nodes[p], s, k, 0.0);
            graph.add_edge(nodes[p], t, 0.0, 0.0);
        } else if is_bkg[p] {
            graph.add_edge(nodes[p], s, 0.0, 0.0);
            graph.add_edge(nodes[p], t, k, 0.0);
        } else {
            graph.add_edge(nodes[p], s, lambda * ln_pr_bkg[p], 0.0);
            graph.add_edge(nodes[p], t, lambda * ln_pr_obj[p], 0.0);
        }
    }
}

Remarks

  • Globality: a single min-cut yields the global minimum of E(A)E(A) subject to the hard seed constraints — failures trace to the energy definition, not to a local minimiser.
  • The pairwise boundary term carries a shrinking bias towards short cuts; λ\lambda requires per-image tuning: too small produces small segments, too large fragments the result through region competition.
  • Scope: the formulation is binary. Multi-region segmentation requires repeated binary cuts (sequential foreground/background passes) or different formulations such as α\alpha-expansion.
  • Common extension: GrabCut replaces dense seed strokes with a single bounding-box prior, swaps the fixed seed histograms for full-covariance RGB Gaussian mixture models re-fit at each pass, and adds a regularised 1-D α\alpha-profile in a ±6\pm 6-pixel ribbon for sub-pixel boundary transitions. See grabcut-iterative-segmentation.

References

  1. Y. Boykov, M.-P. Jolly. Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images. ICCV, 2001. PDF