Shi-Tomasi Corner Detector | VitaVision
Back to atlas

Shi-Tomasi Corner Detector

6 min readIntermediateView in graph
Based on
Good Features to Track
Shi, Tomasi · IEEE CVPR 1994
DOI ↗

Goal

Detect corners in a grayscale image I:ΩRI: \Omega \to \mathbb{R} on pixel domain ΩZ2\Omega \subset \mathbb{Z}^2. Output: a set of integer pixel locations {(xi,yi)}Ω\{(x_i, y_i)\} \subset \Omega at which intensity varies strongly in two independent directions. The response is the smaller eigenvalue of the gradient structure tensor — the quantity that directly bounds the worst-conditioned direction of local intensity variation. The acceptance threshold τ\tau is not a heuristic: it is derived in §3 of the paper from the condition that the linear system governing feature displacement must be both above the noise level and well-conditioned for reliable tracking.

Algorithm

Let Ix,Iy:ΩRI_x, I_y: \Omega \to \mathbb{R} denote the image derivatives in the horizontal and vertical directions, approximated by central differences or Sobel operators. Let w(u,v)w(u, v) denote a Gaussian window with standard deviation σ\sigma:

w(u,v)=exp ⁣(u2+v22σ2).w(u, v) = \exp\!\left(-\frac{u^2 + v^2}{2\sigma^2}\right).

Let λ1λ2\lambda_1 \geq \lambda_2 denote the eigenvalues of MM at a given pixel.

Definition
Structure tensor (M)

The 2×22 \times 2 symmetric matrix summarising gradient covariance in a local neighbourhood, identical to the structure tensor of Harris.

M=(u,v)w(u,v)[Ix2IxIyIxIyIy2],M = \sum_{(u,v)} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},

with entries A=Ix2wA = I_x^2 \otimes w, B=Iy2wB = I_y^2 \otimes w, C=(IxIy)wC = (I_x I_y) \otimes w, so that tr(M)=A+B=λ1+λ2\operatorname{tr}(M) = A + B = \lambda_1 + \lambda_2 and det(M)=ABC2=λ1λ2\det(M) = AB - C^2 = \lambda_1 \lambda_2.

Definition
Shi-Tomasi response (R)

The smaller eigenvalue of MM:

R=λ2=min(λ1,λ2).R = \lambda_2 = \min(\lambda_1, \lambda_2).

Computed in closed form from tr(M)\operatorname{tr}(M) and det(M)\det(M) as

λ1,2=12(tr(M)±tr(M)24det(M)),\lambda_{1,2} = \tfrac{1}{2}\left(\operatorname{tr}(M) \pm \sqrt{\operatorname{tr}(M)^2 - 4\det(M)}\right),

so

R=12(tr(M)tr(M)24det(M)).R = \tfrac{1}{2}\left(\operatorname{tr}(M) - \sqrt{\operatorname{tr}(M)^2 - 4\det(M)}\right).

Procedure

Algorithm
Shi-Tomasi corner detection
Input: Grayscale image II on domain Ω\Omega; Gaussian window parameter σ\sigma; threshold τ\tau.
Output: Set of integer pixel locations {(xi,yi)}\{(x_i, y_i)\} marking detected corners.
  1. Compute IxI_x and IyI_y across the image via central differences or Sobel operators.
  2. Form the per-pixel products Ix2I_x^2, Iy2I_y^2, and IxIyI_x I_y.
  3. Convolve each product with the Gaussian window ww to obtain the entries AA, BB, CC of MM at every pixel.
  4. Compute R(x,y)=12 ⁣((A+B)(A+B)24(ABC2))R(x, y) = \tfrac{1}{2}\!\left((A+B) - \sqrt{(A+B)^2 - 4(AB - C^2)}\right) at every pixel.
  5. Discard pixels with R(x,y)τR(x, y) \leq \tau. The threshold τ\tau encodes a feature-tracking quality bound: a small λ2\lambda_2 implies an aperture problem in at least one direction, preventing reliable displacement estimation (§3 of the paper).
  6. Apply non-maximum suppression: retain only pixels whose RR value is an 8-way local maximum.

Implementation

The per-pixel response computation in Rust:

fn shi_tomasi_response(img: &[f32], w: usize, h: usize, sigma: f32) -> Vec<f32> {
    let (ix, iy) = gradients(img, w, h);
    let ixx: Vec<f32> = ix.iter().zip(&ix).map(|(a, b)| a * b).collect();
    let iyy: Vec<f32> = iy.iter().zip(&iy).map(|(a, b)| a * b).collect();
    let ixy: Vec<f32> = ix.iter().zip(&iy).map(|(a, b)| a * b).collect();

    let a = gaussian_blur(&ixx, w, h, sigma);
    let b = gaussian_blur(&iyy, w, h, sigma);
    let c = gaussian_blur(&ixy, w, h, sigma);

    a.iter().zip(&b).zip(&c)
        .map(|((&aa, &bb), &cc)| {
            let trace = aa + bb;
            let det = aa * bb - cc * cc;
            let disc = (trace * trace - 4.0 * det).max(0.0).sqrt();
            0.5 * (trace - disc)
        })
        .collect()
}

gradients and gaussian_blur are standard helpers; gradients returns central-difference derivatives and gaussian_blur applies a separable Gaussian kernel. The .max(0.0) clamp before .sqrt() guards against floating-point round-off: the discriminant tr(M)24det(M)=(λ1λ2)2\operatorname{tr}(M)^2 - 4\det(M) = (\lambda_1 - \lambda_2)^2 is non-negative by construction, but finite-precision arithmetic can produce a small negative value.

Remarks

  • Complexity: O(Ω)O(|\Omega|) per scale — gradient computation, three convolutions (one per product Ix2I_x^2, Iy2I_y^2, IxIyI_x I_y), and a fixed scalar arithmetic pass per pixel. Identical to Harris.
  • The closed form for min(λ1,λ2)\min(\lambda_1, \lambda_2) avoids explicit eigendecomposition; the only extra cost over Harris is one square root per pixel.
  • The response is rotation-invariant: RR is a function of tr(M)\operatorname{tr}(M) and det(M)\det(M), both of which are rotation-invariant quantities.
  • The detector is not scale-invariant. Response peaks at the scale of the integration window σ\sigma. Multi-scale detection requires running the detector on a Gaussian image pyramid.
  • No free sensitivity parameter analogous to Harris's kk: the threshold τ\tau has a direct interpretation as a tracking-quality bound derived from the linear displacement system (equation (8) of the paper).
  • Compared with Harris: see When to choose Harris over Shi-Tomasi on the Harris page, which hosts the comparison per the older-paper-hosts rule.

When to choose Shi-Tomasi over SIFT

SIFT detects scale-space extrema in a DoG pyramid and bundles a 128-D descriptor; Shi-Tomasi computes a single-scale corner-response score from the structure-tensor minimum eigenvalue. They solve different problems — Shi-Tomasi locates corners at a fixed integration scale, SIFT produces matchable keypoints across scale.

Shi-Tomasi SIFT
Scale handling single scale; multi-scale only by external pyramid scale-invariant by construction
Output corner-response score per pixel keypoint (x,y,σ,θ)(x, y, \sigma, \theta) + 128-D descriptor
Per-detection cost gradient + 3 convolutions + 1 sqrt full DoG pyramid, sub-pixel refinement, orientation histogram, descriptor
Descriptor none bundled 128-D L2-normalised vector
Threshold meaning minimum-eigenvalue value, transfers across exposures DoG contrast threshold + Hessian-ratio edge test

Choose Shi-Tomasi when (1) scale is controlled by the application — KLT tracking is the canonical use case (Shi-Tomasi corners are the highest-quality features for the linear displacement model the paper derives); (2) the exposure-portable threshold matters more than scale invariance; (3) downstream code does its own descriptor or operates on raw patches.

Choose SIFT when (1) images differ in scale, distance, or focal length and corners must repeat across scale; (2) a single-step detect-and-describe pipeline is required; (3) the matching task is wide-baseline rather than frame-to-frame tracking.

References

  1. J. Shi, C. Tomasi. Good Features to Track. IEEE CVPR, 1994. DOI: 10.1109/CVPR.1994.323794
  2. C. Harris, M. J. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988. DOI: 10.5244/c.2.23

Extends

  • Lucas-Kanade Image Registration

    Shi-Tomasi derives the feature-selection threshold from the conditioning of the LK normal-equation matrix and adds a 6-DOF affine variant with dissimilarity monitoring.

Has learned alternative