Shi-Tomasi Corner Detector | VitaVision

Shi-Tomasi Corner Detector

Vitaly Vorobyev · 4 min read
intermediate · computer-vision · feature-detection · corner

Goal

Detect corners in a grayscale image I: \Omega \to \mathbb{R} on pixel domain \Omega \subset \mathbb{Z}^2. Output: a set of integer pixel locations \{(x_i, y_i)\} \subset \Omega at which intensity varies strongly in two independent directions. The response is the smaller eigenvalue of the gradient structure tensor, the quantity that directly bounds the worst-conditioned direction of local intensity variation. The acceptance threshold \tau is not a heuristic: it is derived in §3 of the paper from the condition that the linear system governing feature displacement must be both above the noise level and well-conditioned for reliable tracking.

Algorithm

Let I_x, I_y: \Omega \to \mathbb{R} denote the image derivatives in the horizontal and vertical directions, approximated by central differences or Sobel operators. Let w(u, v) denote a Gaussian window with standard deviation \sigma:

w(u, v) = \exp\!\left(-\frac{u^2 + v^2}{2\sigma^2}\right).

Let \lambda_1 \geq \lambda_2 denote the eigenvalues of M at a given pixel.
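As a concrete discretisation of the window above (a sketch only: the truncation radius of 3\sigma and the unit-sum normalisation are common conventions, not something the article prescribes):

```rust
/// Sample a (2r+1) x (2r+1) Gaussian window with r = ceil(3*sigma) and
/// normalise it to unit sum. Both choices are assumptions of this sketch.
fn gaussian_window(sigma: f32) -> Vec<Vec<f32>> {
    let r = (3.0 * sigma).ceil() as i32;
    let mut w: Vec<Vec<f32>> = Vec::new();
    let mut sum = 0.0f32;
    for v in -r..=r {
        let mut row = Vec::new();
        for u in -r..=r {
            // w(u, v) = exp(-(u^2 + v^2) / (2 sigma^2))
            let val = (-((u * u + v * v) as f32) / (2.0 * sigma * sigma)).exp();
            sum += val;
            row.push(val);
        }
        w.push(row);
    }
    // Normalise so the weights sum to one (keeps the response scale
    // independent of sigma).
    for row in w.iter_mut() {
        for x in row.iter_mut() {
            *x /= sum;
        }
    }
    w
}
```

For sigma = 1 this yields a 7 x 7 window whose centre weight dominates and whose entries sum to one.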

Definition
Structure tensor (M)

The 2 \times 2 symmetric matrix summarising gradient covariance in a local neighbourhood, identical to the structure tensor of Harris.

M = \sum_{(u,v)} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},

with entries A = I_x^2 \otimes w, B = I_y^2 \otimes w, C = (I_x I_y) \otimes w, so that \operatorname{tr}(M) = A + B = \lambda_1 + \lambda_2 and \det(M) = AB - C^2 = \lambda_1 \lambda_2.

Definition
Shi-Tomasi response (R)

The smaller eigenvalue of M:

R = \lambda_2 = \min(\lambda_1, \lambda_2).

Computed in closed form from \operatorname{tr}(M) and \det(M) as

\lambda_{1,2} = \tfrac{1}{2}\left(\operatorname{tr}(M) \pm \sqrt{\operatorname{tr}(M)^2 - 4\det(M)}\right),

so

R = \tfrac{1}{2}\left(\operatorname{tr}(M) - \sqrt{\operatorname{tr}(M)^2 - 4\det(M)}\right).
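The closed form is easy to sanity-check numerically. A minimal sketch for one symmetric 2 x 2 matrix (not part of the reference implementation):

```rust
/// Smaller eigenvalue of the symmetric 2x2 matrix [[a, c], [c, b]],
/// computed from trace and determinant as in the closed form above.
fn min_eigenvalue(a: f32, b: f32, c: f32) -> f32 {
    let trace = a + b;
    let det = a * b - c * c;
    // The discriminant equals (lambda1 - lambda2)^2 >= 0 exactly;
    // clamp guards against round-off.
    let disc = (trace * trace - 4.0 * det).max(0.0).sqrt();
    0.5 * (trace - disc)
}
```

For M = [[3, 1], [1, 3]] the eigenvalues are 4 and 2, so `min_eigenvalue(3.0, 3.0, 1.0)` returns 2.0; for a diagonal matrix it simply returns the smaller diagonal entry.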

Procedure

Algorithm
Shi-Tomasi corner detection
Input: Grayscale image I on domain \Omega; Gaussian window parameter \sigma; threshold \tau.
Output: Set of integer pixel locations \{(x_i, y_i)\} marking detected corners.
  1. Compute I_x and I_y across the image via central differences or Sobel operators.
  2. Form the per-pixel products I_x^2, I_y^2, and I_x I_y.
  3. Convolve each product with the Gaussian window w to obtain the entries A, B, C of M at every pixel.
  4. Compute R(x, y) = \tfrac{1}{2}\left((A+B) - \sqrt{(A+B)^2 - 4(AB - C^2)}\right) at every pixel.
  5. Discard pixels with R(x, y) \leq \tau. The threshold \tau encodes a feature-tracking quality bound: a small \lambda_2 implies an aperture problem in at least one direction, preventing reliable displacement estimation (§3 of the paper).
  6. Apply non-maximum suppression: retain only pixels whose R value is an 8-way local maximum.
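Steps 5 and 6 can be sketched as follows, assuming a row-major response map; for simplicity this sketch skips the one-pixel border and requires a strict maximum (tie-breaking is a design choice the article leaves open):

```rust
/// Keep pixels whose response exceeds tau and is a strict 8-way local
/// maximum. `r` is the response map, row-major, width `w`, height `h`.
fn threshold_and_nms(r: &[f32], w: usize, h: usize, tau: f32) -> Vec<(usize, usize)> {
    let mut corners = Vec::new();
    for y in 1..h - 1 {
        for x in 1..w - 1 {
            let v = r[y * w + x];
            if v <= tau {
                continue; // step 5: below the tracking-quality bound
            }
            // Step 6: compare against the 8 neighbours.
            let mut is_max = true;
            for dy in -1i32..=1 {
                for dx in -1i32..=1 {
                    if dx == 0 && dy == 0 {
                        continue;
                    }
                    let nx = (x as i32 + dx) as usize;
                    let ny = (y as i32 + dy) as usize;
                    if r[ny * w + nx] >= v {
                        is_max = false;
                    }
                }
            }
            if is_max {
                corners.push((x, y));
            }
        }
    }
    corners
}
```

On a 5 x 5 map that is zero everywhere except a single peak, the function returns exactly that peak's coordinates.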

Implementation

The per-pixel response computation in Rust:

fn shi_tomasi_response(img: &[f32], w: usize, h: usize, sigma: f32) -> Vec<f32> {
    // Image derivatives I_x, I_y.
    let (ix, iy) = gradients(img, w, h);
    // Per-pixel gradient products I_x^2, I_y^2, I_x * I_y.
    let ixx: Vec<f32> = ix.iter().zip(&ix).map(|(a, b)| a * b).collect();
    let iyy: Vec<f32> = iy.iter().zip(&iy).map(|(a, b)| a * b).collect();
    let ixy: Vec<f32> = ix.iter().zip(&iy).map(|(a, b)| a * b).collect();

    // Gaussian-weighted sums: the structure-tensor entries A, B, C.
    let a = gaussian_blur(&ixx, w, h, sigma);
    let b = gaussian_blur(&iyy, w, h, sigma);
    let c = gaussian_blur(&ixy, w, h, sigma);

    // Smaller eigenvalue of M from trace and determinant, per pixel.
    a.iter().zip(&b).zip(&c)
        .map(|((&aa, &bb), &cc)| {
            let trace = aa + bb;
            let det = aa * bb - cc * cc;
            // Discriminant is (lambda1 - lambda2)^2 >= 0 exactly; clamp
            // against floating-point round-off before taking the root.
            let disc = (trace * trace - 4.0 * det).max(0.0).sqrt();
            0.5 * (trace - disc)
        })
        .collect()
}

gradients and gaussian_blur are standard helpers; gradients returns central-difference derivatives and gaussian_blur applies a separable Gaussian kernel. The .max(0.0) clamp before .sqrt() guards against floating-point round-off: the discriminant \operatorname{tr}(M)^2 - 4\det(M) = (\lambda_1 - \lambda_2)^2 is non-negative by construction, but finite-precision arithmetic can produce a small negative value.
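One possible shape for the gradients helper, assuming central differences with clamped (replicated) borders; the border convention is an assumption of this sketch and the actual helper may differ:

```rust
/// Central-difference derivatives I_x, I_y of a row-major image.
/// Borders are clamped to the nearest valid pixel (one common convention,
/// assumed here rather than taken from the article).
fn gradients(img: &[f32], w: usize, h: usize) -> (Vec<f32>, Vec<f32>) {
    let mut ix = vec![0.0f32; w * h];
    let mut iy = vec![0.0f32; w * h];
    let at = |x: usize, y: usize| img[y * w + x];
    for y in 0..h {
        for x in 0..w {
            let xl = x.saturating_sub(1);
            let xr = (x + 1).min(w - 1);
            let yu = y.saturating_sub(1);
            let yd = (y + 1).min(h - 1);
            // Central differences: (I(x+1) - I(x-1)) / 2, likewise in y.
            ix[y * w + x] = 0.5 * (at(xr, y) - at(xl, y));
            iy[y * w + x] = 0.5 * (at(x, yd) - at(x, yu));
        }
    }
    (ix, iy)
}
```

On a horizontal intensity ramp I(x, y) = x, interior pixels get I_x = 1 and I_y = 0, as expected.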

Remarks

  • Complexity: O(|\Omega|) per scale: gradient computation, three convolutions (one per product I_x^2, I_y^2, I_x I_y), and a fixed scalar arithmetic pass per pixel. Identical to Harris.
  • The closed form for \min(\lambda_1, \lambda_2) avoids explicit eigendecomposition; the only extra cost over Harris is one square root per pixel.
  • The response is rotation-invariant: R is a function of \operatorname{tr}(M) and \det(M), both of which are rotation-invariant quantities.
  • The detector is not scale-invariant. Response peaks at the scale of the integration window \sigma. Multi-scale detection requires running the detector on a Gaussian image pyramid.
  • No free sensitivity parameter analogous to Harris's k: the threshold \tau has a direct interpretation as a tracking-quality bound derived from the linear displacement system (equation (8) of the paper).
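The multi-scale remark can be sketched as a pyramid loop. The 2 x 2 box downsample below is a crude stand-in for a proper blur-then-subsample step (a simplification for illustration only); the single-scale detector would then be run on each level independently:

```rust
/// Build an image pyramid by repeated 2x2 box averaging. Each element is
/// (pixels, width, height), coarsest level last. The box filter is an
/// assumption of this sketch; a Gaussian pre-blur is the usual choice.
fn build_pyramid(img: Vec<f32>, w: usize, h: usize, levels: usize) -> Vec<(Vec<f32>, usize, usize)> {
    let mut pyramid = vec![(img, w, h)];
    for _ in 1..levels {
        // Clone keeps the sketch borrow-free and simple.
        let (prev, pw, ph) = {
            let last = pyramid.last().unwrap();
            (last.0.clone(), last.1, last.2)
        };
        let (nw, nh) = (pw / 2, ph / 2);
        if nw == 0 || nh == 0 {
            break; // image too small to halve again
        }
        let mut next = vec![0.0f32; nw * nh];
        for y in 0..nh {
            for x in 0..nw {
                // Average the 2x2 block of the finer level.
                let s = prev[2 * y * pw + 2 * x]
                    + prev[2 * y * pw + 2 * x + 1]
                    + prev[(2 * y + 1) * pw + 2 * x]
                    + prev[(2 * y + 1) * pw + 2 * x + 1];
                next[y * nw + x] = 0.25 * s;
            }
        }
        pyramid.push((next, nw, nh));
    }
    pyramid
}
```

Running shi_tomasi_response on each level, then mapping coordinates back to the base resolution, yields detections across scales.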

References

  1. J. Shi, C. Tomasi. Good Features to Track. IEEE CVPR, 1994. DOI: 10.1109/CVPR.1994.323794
  2. C. Harris, M. J. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988. DOI: 10.5244/c.2.23
