Structure Tensor | VitaVision

Structure Tensor


Definition

The structure tensor at a pixel $(x, y)$ is the $2 \times 2$ symmetric positive-semidefinite matrix

$$M(x, y) = \sum_{(u,v) \in \mathcal{W}} w(u, v)\,\nabla I(x+u, y+v)\,\nabla I(x+u, y+v)^T,$$

where $\mathcal{W}$ is a local window, $w$ is a weighting function (typically a Gaussian), and $\nabla I = (I_x, I_y)^T$ is the image gradient. Writing out the components:

$$M = \begin{bmatrix} \sum_\mathcal{W} w\,I_x^2 & \sum_\mathcal{W} w\,I_x I_y \\ \sum_\mathcal{W} w\,I_x I_y & \sum_\mathcal{W} w\,I_y^2 \end{bmatrix}.$$

The structure tensor is the weighted second-moment matrix of the gradient distribution in the window. Its eigenvectors give the directions of maximum and minimum gradient energy, and each eigenvalue is the weighted sum of squared gradient components along its eigenvector: $\lambda = \sum_\mathcal{W} w\,(\nabla I^T e)^2$ for a unit eigenvector $e$.
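The definition can be checked numerically on a single window. The sketch below assumes the gradient samples $I_x, I_y$ and the weights $w$ are already available as NumPy arrays; `structure_tensor_at` is a hypothetical helper name. It builds a window in which every gradient points along $(1, 1)/\sqrt{2}$, so $M$ should be rank-1 with its dominant eigenvector along that direction.

```python
import numpy as np

def structure_tensor_at(gx, gy, w):
    """Return M = sum_W w * grad I grad I^T for one window.

    gx, gy: gradient samples I_x, I_y over the window (equal shapes).
    w:      window weights with the same shape (e.g. a sampled Gaussian).
    """
    a = np.sum(w * gx * gx)   # sum of w * I_x^2
    b = np.sum(w * gx * gy)   # sum of w * I_x * I_y
    c = np.sum(w * gy * gy)   # sum of w * I_y^2
    return np.array([[a, b], [b, c]])

# All gradients point along (1, 1)/sqrt(2) with varying strength:
# an edge whose normal is at 45 degrees.
rng = np.random.default_rng(0)
strength = rng.uniform(0.5, 1.0, size=(5, 5))
gx = strength / np.sqrt(2)
gy = strength / np.sqrt(2)
w = np.full((5, 5), 1.0 / 25.0)      # box window, for simplicity

M = structure_tensor_at(gx, gy, w)
evals, evecs = np.linalg.eigh(M)     # eigenvalues in ascending order
normal = evecs[:, 1]                 # direction of maximum gradient energy
```

The smaller eigenvalue comes out numerically zero and `normal` is $\pm(1, 1)/\sqrt{2}$, matching the edge classification in the next section.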

Mathematical Description

Eigenvalue classification

Let $\lambda_1 \geq \lambda_2 \geq 0$ be the eigenvalues of $M$. The gradient distribution in the window is characterized by the magnitudes of the eigenvalues and by their ratio $\lambda_2 / \lambda_1$:

Definition
Structure tensor eigenvalue classification

Classification of local image structure from $(\lambda_1, \lambda_2)$.

  • $\lambda_1 \approx \lambda_2 \approx 0$: flat region; no significant gradient in any direction.
  • $\lambda_1 \gg \lambda_2 \approx 0$: edge; strong gradient in one direction only.
  • $\lambda_1 \approx \lambda_2 \gg 0$: corner; strong gradient in two independent directions.
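The three cases can be turned into a decision rule once "approximately zero" and "much greater than" are given numbers. Both thresholds in this sketch, `flat_thresh` and `ratio_thresh`, are hypothetical tuning parameters, not values from the source.

```python
def classify(l1, l2, flat_thresh, ratio_thresh=0.1):
    """Label local structure from eigenvalues l1 >= l2 >= 0.

    flat_thresh (> 0) and ratio_thresh are hypothetical tuning parameters:
    flat_thresh is in the eigenvalues' units (gradient squared) and decides
    "approximately zero"; ratio_thresh is the largest l2 / l1 still called an edge.
    """
    if flat_thresh <= 0:
        raise ValueError("flat_thresh must be positive")
    if l1 < flat_thresh:
        return "flat"
    if l2 / l1 < ratio_thresh:
        return "edge"
    return "corner"

print(classify(10.0, 0.1, flat_thresh=1e-2))   # edge
print(classify(10.0, 8.0, flat_thresh=1e-2))   # corner
```

Testing the ratio rather than an absolute value of $\lambda_2$ keeps the edge/corner split independent of image contrast; only the flat test still depends on absolute gradient energy.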

Corner response functions

Three cornerness measures are derived from the eigenvalues of $M$:

Definition
Harris response

Avoids explicit eigenvalue computation by expressing the determinant and trace in terms of $M$'s entries.

$$R_{\text{Harris}} = \det(M) - k\,(\mathrm{tr}\,M)^2 = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2,$$

with empirical constant $k \in [0.04, 0.06]$.

Definition
Shi-Tomasi response

The minimum eigenvalue: a window scores high only if the gradient energy is large in every direction.

$$R_{\text{ShiTomasi}} = \min(\lambda_1, \lambda_2) = \lambda_2.$$

Definition
Förstner response

Half the harmonic mean of the eigenvalues. Dividing the determinant by the trace gives a response in the same units as the eigenvalues: it scales linearly when $M$ is scaled, whereas the Harris response scales quadratically.

$$R_{\text{Forstner}} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} = \frac{\det(M)}{\mathrm{tr}(M)}.$$

All three measures are large at true corners ($\lambda_1 \approx \lambda_2$, both large) and small at flat regions and edges. Harris becomes negative at edges, where $\lambda_1 \gg \lambda_2$. Shi-Tomasi and Förstner are non-negative everywhere because $M$ is positive semidefinite; Förstner is undefined where $\mathrm{tr}(M) = 0$, i.e. in perfectly flat regions.
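All three responses can be evaluated from the entries of $M = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ without a general eigensolver. In this sketch `corner_responses` is a hypothetical name, the smaller eigenvalue uses the closed form for a symmetric $2 \times 2$ matrix, and returning 0 for Förstner when $\mathrm{tr}(M) = 0$ is an assumed convention.

```python
import math

def corner_responses(a, b, c, k=0.04):
    """Harris, Shi-Tomasi and Foerstner responses for M = [[a, b], [b, c]]."""
    det = a * c - b * b
    tr = a + c
    harris = det - k * tr * tr
    # lambda_{1,2} = (a + c)/2 +/- sqrt(((a - c)/2)^2 + b^2); keep the smaller one
    half_gap = math.hypot((a - c) / 2.0, b)
    shi_tomasi = tr / 2.0 - half_gap
    foerstner = det / tr if tr > 0 else 0.0   # undefined for tr = 0; report 0
    return harris, shi_tomasi, foerstner

print(corner_responses(10.0, 0.0, 10.0))   # corner: (84.0, 10.0, 5.0)
print(corner_responses(10.0, 0.0, 0.0))    # edge:   (-4.0, 0.0, 0.0)
```

The edge case shows the sign behaviour described above: Harris goes negative while Shi-Tomasi and Förstner drop to zero.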

Anisotropy and coherence

The coherence of the gradient field in the window is measured by

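$$C = \left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2 \in [0, 1].$$

$C = 1$ at perfect edges (one nonzero eigenvalue) and $C = 0$ at isotropic structures such as junctions ($\lambda_1 = \lambda_2$). In flat regions the ratio is $0/0$; implementations conventionally set $C = 0$ there, often by adding a small constant to the denominator. Anisotropic diffusion algorithms use $C$ to steer smoothing along edges rather than across them.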
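Coherence also needs no eigendecomposition: for $M = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$, $(\lambda_1 - \lambda_2)^2 = (a - c)^2 + 4b^2$ and $\lambda_1 + \lambda_2 = a + c$. The sketch below uses these identities; the function name and the value of `eps` (which turns the flat-region $0/0$ into 0) are assumptions.

```python
def coherence(a, b, c, eps=1e-12):
    """Coherence ((l1 - l2) / (l1 + l2))^2 of M = [[a, b], [b, c]], from its entries.

    Uses (l1 - l2)^2 = (a - c)^2 + 4 b^2 and l1 + l2 = a + c.
    eps is an assumed small constant so that flat regions (M = 0) give 0, not 0/0;
    it should be small compared with typical values of (a + c)^2.
    """
    return ((a - c) ** 2 + 4 * b * b) / ((a + c) ** 2 + eps)
```

An axis-aligned edge `coherence(5.0, 0.0, 0.0)` and a diagonal one `coherence(2.5, 2.5, 2.5)` both give values within $10^{-12}$ of 1, while an isotropic corner `coherence(3.0, 0.0, 3.0)` gives exactly 0.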

Two-scale construction

The structure tensor involves two distinct smoothing scales:

  1. Gradient scale $\sigma_d$: the standard deviation of the Gaussian applied before computing $I_x$ and $I_y$ (equivalently, the scale of the derivative-of-Gaussian kernels). Controls which frequency band is differentiated.
  2. Integration scale $\sigma_i$: the standard deviation of the Gaussian window $w$ that weights the outer-product sum. Controls the size of the neighbourhood over which gradient statistics are accumulated.

Setting $\sigma_i = 1.5\,\sigma_d$ is a common heuristic, but the optimal ratio depends on the target feature scale.
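The two scales map onto two filtering passes: derivative-of-Gaussian filters at $\sigma_d$ produce $I_x, I_y$, and a Gaussian at $\sigma_i$ smooths the three products. The NumPy sketch below is one way to write this under stated assumptions: kernels truncated at $3\sigma$, separable correlation, reflect padding (`np.pad` mode `"reflect"`), and hypothetical helper names. An optimized filter routine would replace the explicit loops in practice.

```python
import numpy as np

def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian g and derivative kernel d, truncated at 3*sigma.

    d is laid out for correlation: correlating a signal with d approximates
    the first derivative of the Gaussian-smoothed signal.
    """
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                     # unit-sum smoothing kernel
    d = x / sigma**2 * g             # equals G'(-x), so correlation = convolution with G'
    return g, d

def correlate_sep(img, kx, ky):
    """Separable correlation with reflect padding; kx acts along x (columns), ky along y (rows)."""
    rx, ry = len(kx) // 2, len(ky) // 2
    h, w = img.shape
    p = np.pad(img, ((ry, ry), (rx, rx)), mode="reflect")
    tmp = sum(k * p[:, i:i + w] for i, k in enumerate(kx))      # filter along x
    return sum(k * tmp[j:j + h, :] for j, k in enumerate(ky))   # then along y

def structure_tensor(img, sigma_d, sigma_i):
    """Per-pixel entries (a, b, c) of M = [[a, b], [b, c]]."""
    img = np.asarray(img, dtype=float)
    g_d, d_d = gaussian_kernels(sigma_d)
    ix = correlate_sep(img, d_d, g_d)    # derivative along x, smoothing along y
    iy = correlate_sep(img, g_d, d_d)    # smoothing along x, derivative along y
    g_i, _ = gaussian_kernels(sigma_i)   # integration window w
    a = correlate_sep(ix * ix, g_i, g_i)
    b = correlate_sep(ix * iy, g_i, g_i)
    c = correlate_sep(iy * iy, g_i, g_i)
    return a, b, c

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0                          # bright square
a, b, c = structure_tensor(img, sigma_d=1.0, sigma_i=1.5)
lam2 = (a + c) / 2 - np.hypot((a - c) / 2, b)    # Shi-Tomasi response per pixel
```

The example uses the $\sigma_i = 1.5\,\sigma_d$ heuristic. Because the integration kernel sums to one, the entries of $M$ here are weighted averages rather than raw sums, so widening $\sigma_i$ does not by itself inflate them.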

Relation to the autocorrelation surface

$M$ is, up to a factor of 2, the Hessian at the origin of the local autocorrelation function $E(\Delta x, \Delta y)$, the weighted sum of squared differences between the window and its shifted copy:

$$E(\Delta x, \Delta y) = \sum_\mathcal{W} w(u, v)\,[I(x+u+\Delta x, y+v+\Delta y) - I(x+u, y+v)]^2 \approx [\Delta x, \Delta y]\,M\,[\Delta x, \Delta y]^T.$$

The eigenvalues of $M$ are therefore half the principal curvatures of $E$ at zero shift. Harris's original motivation was to find pixels where $E$ has large curvature in all directions; the trace-and-determinant formulation measures this without computing the curvatures explicitly.
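The quadratic approximation comes from a first-order Taylor expansion of the shifted image, with $\nabla I$ evaluated at $(x+u, y+v)$ and $d = [\Delta x, \Delta y]^T$:

```latex
\begin{aligned}
I(x+u+\Delta x,\, y+v+\Delta y) - I(x+u, y+v)
  &\approx I_x\,\Delta x + I_y\,\Delta y = \nabla I^T d, \\
E(\Delta x, \Delta y)
  &\approx \sum_{\mathcal{W}} w(u, v)\,\bigl(\nabla I^T d\bigr)^2
   = d^T \Bigl(\sum_{\mathcal{W}} w\,\nabla I\,\nabla I^T\Bigr) d
   = d^T M\, d, \\
\nabla^2_{d}\, E \,\big|_{d = 0} &\approx 2M .
\end{aligned}
```

The last line is where the factor of 2 between $M$ and the Hessian of $E$ comes from.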

Numerical Concerns

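Accumulation range and precision. Each entry of $M$ is a sum of squared or cross-multiplied gradient values, and its range depends on the derivative kernel. For 8-bit images and an unnormalized $3 \times 3$ Sobel kernel, $|I_x| \le 4 \times 255 = 1020$, so $I_x^2$ reaches about $1.04 \times 10^6$; integer implementations therefore need at least 32-bit accumulators, and even those overflow once a window holds a few thousand worst-case terms. Implementations using 32-bit floats can accumulate rounding error across large windows.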
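The integer bound is simple arithmetic. This sketch assumes the unnormalized $3 \times 3$ Sobel x-kernel and unit (box) window weights; a different derivative kernel or integer-scaled Gaussian weights changes the numbers.

```python
# Worst case for an unnormalized 3x3 Sobel x-derivative on 8-bit input:
# kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], positive weights sum to 4.
max_abs_ix = 4 * 255                # 1020
max_ix2 = max_abs_ix ** 2           # 1_040_400, fits in 20 bits
INT32_MAX = 2**31 - 1

# Number of unit-weight worst-case terms that still fit in a signed 32-bit sum.
max_terms = INT32_MAX // max_ix2    # 2064

for side in (5, 15, 45, 47):
    fits = side * side <= max_terms
    print(f"{side}x{side} window: worst case fits in int32: {fits}")
```

A $45 \times 45$ box (2025 terms) still fits; a $46 \times 46$ box (2116 terms) can overflow in the worst case.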

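Ill-conditioning. At an edge ($\lambda_1 \gg \lambda_2 \approx 0$), $M$ is nearly rank-1: the Förstner and Shi-Tomasi responses are near zero, while the Harris response is negative, approximately $-k\lambda_1^2$. Evaluating $\det(M)$ from the entries, $(\sum_\mathcal{W} w\,I_x^2)(\sum_\mathcal{W} w\,I_y^2) - (\sum_\mathcal{W} w\,I_x I_y)^2$, subtracts two nearly equal products, so the small determinant carries a large relative error. Its absolute error is of order $\varepsilon\,\lambda_1^2$ (with $\varepsilon$ the machine epsilon), far below $k\lambda_1^2$, so thresholded detection is unaffected in practice. When explicit eigenvalues are needed, the closed form $\lambda_{1,2} = \tfrac{1}{2}(a + c) \pm \sqrt{\tfrac{1}{4}(a - c)^2 + b^2}$ for $M = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is cheaper than a general eigendecomposition.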
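The sketch below illustrates both points under assumed values: it builds a nearly rank-1 tensor from known eigenvalues ($\lambda_1 = 10^4$, $\lambda_2 = 10^{-2}$, rotated by 0.3 rad), evaluates the entry-formula determinant in float32, and measures its error against $k\lambda_1^2$ with $k = 0.04$. `eig_sym2x2` is a hypothetical helper implementing the closed form.

```python
import numpy as np

def eig_sym2x2(a, b, c):
    """Eigenvalues (l1, l2), l1 >= l2, of the symmetric matrix [[a, b], [b, c]].

    Uses (a + c)/2 +/- sqrt(((a - c)/2)^2 + b^2), which avoids forming
    tr^2 - 4*det explicitly.
    """
    mean = (a + c) / 2
    r = np.hypot((a - c) / 2, b)
    return mean + r, mean - r

# Build a nearly rank-1 (edge-like) tensor from known eigenvalues.
l1_true, l2_true, theta = 1.0e4, 1.0e-2, 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
M = rot @ np.diag([l1_true, l2_true]) @ rot.T

# Entry-formula determinant in float32: two products of about 8e6 are
# subtracted to recover a true determinant of l1_true * l2_true = 100.
a32, b32, c32 = M.astype(np.float32)[[0, 0, 1], [0, 1, 1]]
det32 = a32 * c32 - b32 * b32
abs_err = abs(float(det32) - l1_true * l2_true)
rel_to_harris = abs_err / (0.04 * l1_true**2)   # error relative to k * l1^2
```

Whatever error `det32` carries is bounded by roughly $\varepsilon$ times the products being subtracted, so `rel_to_harris` stays tiny even when the determinant itself is poorly resolved.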

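Scale sensitivity. The response amplitude depends on both scales. With an unnormalized window (a plain sum over $\mathcal{W}$), the entries of $M$ grow roughly with the window area, i.e. with $\sigma_i^2$; with a unit-sum Gaussian window they do not. Separately, Gaussian-derivative magnitudes shrink as $\sigma_d$ grows, so multi-scale detectors such as Harris-Laplace multiply $M$ by $\sigma_d^2$ before comparing responses across scales.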
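The effect of the derivative scale can be seen on an ideal step edge, where the derivative-of-Gaussian response has a closed form. This sketch assumes a continuous Gaussian and a perfect step of height $h$ (a sampled filter only approximates this); it shows that $I_x^2$ at the edge falls as $1/\sigma_d^2$ while $\sigma_d^2 I_x^2$ stays constant.

```python
import math

def step_edge_peak_gradient(h, sigma_d):
    """Peak derivative-of-Gaussian response to an ideal step of height h.

    d/dx (G_sigma * (h * step))(x) = h * G_sigma(x), which peaks at x = 0
    with value h / (sigma * sqrt(2 pi)).
    """
    return h / (sigma_d * math.sqrt(2 * math.pi))

sigmas = (1.0, 2.0, 4.0)
raw = {s: step_edge_peak_gradient(1.0, s) ** 2 for s in sigmas}   # ~ 1 / sigma_d^2
normalized = {s: s**2 * raw[s] for s in sigmas}                    # h^2 / (2 pi)
```

Doubling $\sigma_d$ divides `raw` by four, while every entry of `normalized` equals $1/(2\pi)$.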

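Window boundary effects. Near image borders, the Gaussian window is truncated, and the chosen padding changes the response there. Zero-padding the gradient maps adds a gradient-free border that suppresses corners near the image edge; zero-padding the image itself creates an artificial step edge along the border; reflective padding (the default border mode in OpenCV) avoids both but mirrors nearby structure. A simple alternative is to discard responses within a margin of about $3\sigma_i$ plus the derivative kernel radius.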

Detection threshold units. Thresholds applied to $R_{\text{Harris}}$ have units of gradient to the fourth power, and thresholds applied to $R_{\text{ShiTomasi}}$ have units of gradient squared. Neither transfers between images of different exposure, resolution, or preprocessing. Normalizing the response by its maximum value in the image, or by a matching power of the mean gradient energy, makes thresholds more portable.
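A relative threshold combined with $3 \times 3$ non-maximum suppression can be sketched as follows. The function name and `rel_thresh` are hypothetical; the idea is similar in spirit to the `qualityLevel` parameter of OpenCV's `goodFeaturesToTrack`, which is multiplied by the best corner's response. Because the threshold is tied to the image maximum, rescaling the response leaves the detections unchanged.

```python
import numpy as np

def detect(response, rel_thresh=0.01):
    """Return (row, col) of 3x3 local maxima above rel_thresh * max(response).

    rel_thresh is a hypothetical parameter; a threshold relative to the image
    maximum does not depend on the response's absolute units.
    """
    response = np.asarray(response, dtype=float)
    thresh = rel_thresh * response.max()
    h, w = response.shape
    p = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    # The 8 neighbours of every pixel, stacked along a new first axis.
    neighbours = np.stack([p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if (dy, dx) != (0, 0)])
    is_max = response >= neighbours.max(axis=0)
    return np.argwhere(is_max & (response > thresh))

R = np.zeros((10, 10))
R[2, 3], R[7, 7], R[5, 5] = 5.0, 1.0, 0.01
print(detect(R, rel_thresh=0.1))   # [[2 3]
                                   #  [7 7]]
```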

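Ridge-edge ambiguity. The structure tensor cannot distinguish a ridge (a thin bright or dark line) from a step edge: both concentrate gradient energy in one orientation and give $\lambda_1 \gg \lambda_2$. A small but nonzero $\lambda_2$, caused by curvature or noise, does not mark a separate class; position along the structure remains poorly constrained. Harris scores such pixels as edges (negative $R$) as long as $\lambda_2 / \lambda_1$ stays below about $0.044$ for $k = 0.04$; Shi-Tomasi gives them a small positive score $\lambda_2$, which survives only if it exceeds the detection threshold. This distinction matters for tracking algorithms, which require corners to be localizable in both directions.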
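The point where the Harris response changes sign follows from substituting $\lambda_2 = r\lambda_1$, which gives $R = \lambda_1^2\,(r - k(1 + r)^2)$. The sketch below (hypothetical function name) solves for the crossover ratio using the product of the quadratic's roots to avoid cancellation.

```python
import math

def harris_crossover_ratio(k):
    """Largest r = l2 / l1 (with 0 <= r <= 1) for which the Harris response is negative.

    With l2 = r * l1, R = l1^2 * (r - k * (1 + r)^2). R < 0 below the smaller
    root of k r^2 + (2k - 1) r + k = 0; the roots multiply to 1, so the smaller
    root is written in a cancellation-free form. Requires 0 < k < 1/4.
    """
    return 2 * k / ((1 - 2 * k) + math.sqrt(1 - 4 * k))

for k in (0.04, 0.06):
    print(f"k = {k}: Harris < 0 while l2/l1 < {harris_crossover_ratio(k):.4f}")
# k = 0.04: Harris < 0 while l2/l1 < 0.0436
# k = 0.06: Harris < 0 while l2/l1 < 0.0685
```

The crossover ratio is close to $k$ itself, so the choice of $k$ directly sets how elongated a structure can be before Harris stops calling it an edge.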

Where it appears

The structure tensor is the shared algebraic core of the classical gradient-based corner detectors. The three standard cornerness measures, Harris, Shi-Tomasi, and Förstner, compute $M$ identically and differ only in the scalar function of its eigenvalues used as the response.

  • harris-corner-detector: computes $M$ as described above; the Harris response $R = \det(M) - k\,\mathrm{tr}(M)^2$ is the standard cornerness score.
  • shi-tomasi-corner-detector: identical structure tensor construction; the response is replaced by $\min(\lambda_1, \lambda_2)$, which Shi and Tomasi motivate from the conditioning of the feature-tracking equations in "Good Features to Track".
  • lucas-kanade: the iterative image registration update uses $M$ as the coefficient matrix of the per-iteration normal equations; $M$ must be invertible, and in practice well conditioned, for the gradient-based displacement estimate to be well defined.
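The role of $M$ in the Lucas-Kanade update can be sketched as a single linearized step: solve $M d = -\sum_\mathcal{W} w\,\nabla I\, I_t$ with $I_t = I_1 - I_0$. This is an illustrative sketch, not the registered algorithm: it takes one step with no iterations or image pyramid, uses `np.gradient` central differences, and the condition-number guard `max_cond` and all names are hypothetical.

```python
import numpy as np

def lk_step(I0, I1, w, max_cond=1e6):
    """One linearized Lucas-Kanade step: solve M d = -sum(w * grad(I0) * I_t).

    Returns d = (dx, dy), the estimated motion of image content from I0 to I1.
    max_cond is a hypothetical guard against a (nearly) singular M.
    """
    iy, ix = np.gradient(I0)            # central differences: d/drow, d/dcol
    it = I1 - I0                        # temporal difference
    M = np.array([[np.sum(w * ix * ix), np.sum(w * ix * iy)],
                  [np.sum(w * ix * iy), np.sum(w * iy * iy)]])
    rhs = -np.array([np.sum(w * ix * it), np.sum(w * iy * it)])
    if np.linalg.cond(M) > max_cond:
        raise ValueError("structure tensor is (nearly) singular: aperture problem")
    return np.linalg.solve(M, rhs)

# Synthetic check: a Gaussian blob moved by a known sub-pixel shift.
y, x = np.mgrid[0:41, 0:41].astype(float)

def blob(cx, cy, s=4.0):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s**2))

true_d = np.array([0.3, -0.2])                      # (dx, dy)
I0 = blob(20.0, 20.0)
I1 = blob(20.0 + true_d[0], 20.0 + true_d[1])       # content moved by +d
w = np.ones_like(I0)                                # uniform window over the patch
d = lk_step(I0, I1, w)
```

On the blob the estimate lands within a few hundredths of a pixel of `true_d`. On a straight edge such as `np.tanh(x - 20.0)`, $M$ is singular and the guard raises, which is the aperture problem seen through the invertibility condition above.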

The structure tensor also appears in anisotropic diffusion, where the coherence $C$ steers the diffusion tensor; that algorithm is not yet registered on this site.

References

  • W. Förstner, E. Gülch. "A Fast Operator for Detection and Precise Location of Distinct Points, Corners and Centres of Circular Features." ISPRS Intercommission Workshop, 1987. Introduces the harmonic-mean cornerness measure and the two-scale framework.
  • C. Harris, M. Stephens. "A Combined Corner and Edge Detector." Alvey Vision Conference, 1988. Original motivation via the autocorrelation surface and the trace-determinant response.
  • J. Shi, C. Tomasi. "Good Features to Track." IEEE CVPR, 1994. Derives $\min(\lambda_1, \lambda_2)$ as a feature-quality measure from the conditioning of the tracking equations.
  • H. Knutsson. "Representing Local Structure Using Tensors." Scandinavian Conference on Image Analysis, 1989. General framework for the structure tensor in signal processing.
  • J. Bigun, G. H. Granlund. "Optimal Orientation Detection of Linear Symmetry." IEEE ICCV, 1987. Simultaneous independent derivation of the structure tensor for orientation estimation.