Goal
A colour image with and a user-drawn bounding rectangle are the inputs. The output is a binary opacity array partitioning the image into foreground and background, plus a continuous in a narrow border ribbon of width pixels for smooth edge transitions. A single rectangle suffices as the sole required interaction; the algorithm needs no inner-boundary trace. The Gibbs energy decreases monotonically across the iterative coordinate-descent loop, guaranteeing convergence to a local minimum.
Algorithm
Let denote the RGB image, . Let denote the binary opacity matte; is background, is foreground. Let , , denote the per-pixel GMM component assignment. Let be the number of Gaussian components per side. Let collect the GMM mixture weights, means, and full covariance matrices. Let , , denote the hard-background, unknown, and hard-foreground pixel sets forming the trimap. Let denote the set of 8-connected neighbour pairs. Let be the contrast normalisation derived from image statistics. Let be the smoothness weight. Let be the border-matting ribbon half-width in pixels. Let and be the DP regulariser parameters. Let be the neighbourhood patch size for GMM parameter estimation in border matting. Let and parameterise the -profile step function along contour point .
The total energy over decomposes into a data term and a smoothness term.
is minimised over and by the min-cut and GMM-assignment steps; penalises label discontinuities across high-contrast edges.
The data term sums per-pixel negative log-likelihoods under the GMM.
Each GMM component contributes a mixture-weight penalty and a Mahalanobis distance from the component mean.
The smoothness term applies a contrast-weighted penalty at every label boundary in the 8-connected graph.
where is computed from all neighbouring pixel pairs in the image, and .
Along contour , a soft step function is fitted per contour point by dynamic programming.
is the signed distance of pixel from the boundary. Parameters (position) and (width) are optimised with a DP regulariser controlled by and , over a pixel neighbourhood patch centred on each contour point, with ribbon half-width .
Procedure
- Set to all pixels outside ; set to all pixels inside ; set . Assign for and for . Compute from all 8-connected neighbour pairs. Initialise foreground and background GMMs ( components each) from pixels with and respectively.
- Assign: for each pixel , set (hard assignment to the nearest GMM component).
- Learn: for each component in each GMM, recompute the sample mean , covariance , and weight from the pixels currently assigned to that component.
- Min-cut: construct the contrast-weighted pixel graph with -links from and -links from ; compute the global s-t min-cut to update over pixels.
- Repeat steps 2–4 until ceases to decrease.
- Optional user editing: hard-constrain misclassified pixels as foreground or background via brush strokes, then re-run step 4 (single min-cut) or the full loop from step 2.
- Border matting: detect the boundary contour of the hard segmentation. For each contour point , fit by DP with regulariser parameters , , over the pixel patch, within ribbon half-width . Steal foreground colours from to eliminate background bleeding at mixed pixels.
flowchart TB
A["Assign<br/>k_n ← argmin D(α_n, k, θ, z_n)"] --> B["Learn<br/>μ, Σ, π ← sample stats"]
B --> C["Min-cut<br/>α ← argmin E(α, k, θ, z)"]
C --> D{"E decreased?"}
D -- yes --> A
D -- no --> E["Converged"]
Implementation
The iterative coordinate-descent loop in Rust:
trait MinCut {
fn solve(&mut self, data: &[f32], smoothness: &[f32]) -> Vec<u8>;
}
struct Gmm {
means: Vec<[f32; 3]>,
covs: Vec<[[f32; 3]; 3]>,
weights: Vec<f32>,
}
impl Gmm {
fn assign(&self, z: [f32; 3]) -> usize {
(0..self.means.len())
.min_by(|&a, &b| self.mahal(z, a).partial_cmp(&self.mahal(z, b)).unwrap())
.unwrap()
}
fn mahal(&self, z: [f32; 3], k: usize) -> f32 { /* (z-μ)ᵀ Σ⁻¹ (z-μ) */ 0.0 }
fn fit(pixels: &[[f32; 3]], component_ids: &[usize], k: usize) -> Gmm { todo!() }
}
fn grabcut_iterate(
z: &[[f32; 3]],
alpha: &mut Vec<u8>,
fg_gmm: &mut Gmm,
bg_gmm: &mut Gmm,
tu_mask: &[bool],
solver: &mut impl MinCut,
) {
let n = z.len();
let mut comp = vec![0usize; n];
loop {
for i in 0..n {
if tu_mask[i] {
comp[i] = if alpha[i] == 1 { fg_gmm.assign(z[i]) }
else { bg_gmm.assign(z[i]) };
}
}
*fg_gmm = Gmm::fit(z, &comp, fg_gmm.means.len());
*bg_gmm = Gmm::fit(z, &comp, bg_gmm.means.len());
let new_alpha = solver.solve(&[], &[]);
for i in 0..n { if tu_mask[i] { alpha[i] = new_alpha[i]; } }
break; // caller checks ΔE and re-enters; graph build follows graph-cut-segmentation.
}
}
Remarks
- Each full pass (assign → learn → min-cut) minimises over one variable group; the energy decreases monotonically per pass. Representative images converge in approximately 12 iterations.
- Each min-cut step runs on a -node graph with edges for an 8-connected pixel lattice; per-iteration cost is dominated by one max-flow computation.
- was calibrated on 15 training images; changing it shifts the balance between the colour-model unary and the boundary smoothness term and can cause over-smooth or fragmented results.
- GMM components per side is a fixed design choice: too few components underfit multi-modal colour distributions; too many starve individual components of training pixels.
- Three failure modes: (i) low-contrast boundaries where the exponential smoothness term cannot localise the edge; (ii) foreground–background camouflage where the GMM unary term provides no gradient; (iii) background material inside the rectangle that is absent from the rectangle exterior, biasing the background GMM at initialisation.
- Border matting covers only the pixel boundary ribbon; full-image soft transparency (hair, smoke) is out of scope. A lasso input instead of a rectangle provides tighter background coverage and partially mitigates failure mode (iii).