Goal
Detect corners in a grayscale image on pixel domain . Output: a set of integer pixel locations at which intensity varies strongly in two independent directions. The response is the smaller eigenvalue of the gradient structure tensor — the quantity that directly bounds the worst-conditioned direction of local intensity variation. The acceptance threshold is not a heuristic: it is derived in §3 of the paper from the condition that the linear system governing feature displacement must be both above the noise level and well-conditioned for reliable tracking.
Algorithm
Let denote the image derivatives in the horizontal and vertical directions, approximated by central differences or Sobel operators. Let denote a Gaussian window with standard deviation :
Let denote the eigenvalues of at a given pixel.
The symmetric matrix summarising gradient covariance in a local neighbourhood, identical to the structure tensor of Harris.
with entries , , , so that and .
The smaller eigenvalue of :
Computed in closed form from and as
so
Procedure
- Compute and across the image via central differences or Sobel operators.
- Form the per-pixel products , , and .
- Convolve each product with the Gaussian window to obtain the entries , , of at every pixel.
- Compute at every pixel.
- Discard pixels with . The threshold encodes a feature-tracking quality bound: a small implies an aperture problem in at least one direction, preventing reliable displacement estimation (§3 of the paper).
- Apply non-maximum suppression: retain only pixels whose value is an 8-way local maximum.
Implementation
The per-pixel response computation in Rust:
fn shi_tomasi_response(img: &[f32], w: usize, h: usize, sigma: f32) -> Vec<f32> {
let (ix, iy) = gradients(img, w, h);
let ixx: Vec<f32> = ix.iter().zip(&ix).map(|(a, b)| a * b).collect();
let iyy: Vec<f32> = iy.iter().zip(&iy).map(|(a, b)| a * b).collect();
let ixy: Vec<f32> = ix.iter().zip(&iy).map(|(a, b)| a * b).collect();
let a = gaussian_blur(&ixx, w, h, sigma);
let b = gaussian_blur(&iyy, w, h, sigma);
let c = gaussian_blur(&ixy, w, h, sigma);
a.iter().zip(&b).zip(&c)
.map(|((&aa, &bb), &cc)| {
let trace = aa + bb;
let det = aa * bb - cc * cc;
let disc = (trace * trace - 4.0 * det).max(0.0).sqrt();
0.5 * (trace - disc)
})
.collect()
}
gradients and gaussian_blur are standard helpers; gradients returns central-difference derivatives and gaussian_blur applies a separable Gaussian kernel. The .max(0.0) clamp before .sqrt() guards against floating-point round-off: the discriminant is non-negative by construction, but finite-precision arithmetic can produce a small negative value.
Remarks
- Complexity: per scale — gradient computation, three convolutions (one per product , , ), and a fixed scalar arithmetic pass per pixel. Identical to Harris.
- The closed form for avoids explicit eigendecomposition; the only extra cost over Harris is one square root per pixel.
- The response is rotation-invariant: is a function of and , both of which are rotation-invariant quantities.
- The detector is not scale-invariant. Response peaks at the scale of the integration window . Multi-scale detection requires running the detector on a Gaussian image pyramid.
- No free sensitivity parameter analogous to Harris's : the threshold has a direct interpretation as a tracking-quality bound derived from the linear displacement system (equation (8) of the paper).
References
- J. Shi, C. Tomasi. Good Features to Track. IEEE CVPR, 1994. DOI: 10.1109/CVPR.1994.323794
- C. Harris, M. J. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988. DOI: 10.5244/c.2.23