Atlas 91
Practical computer vision atlas — algorithms, models, and concepts.
Algorithms 47
Solve the hand-eye equation AX=XB jointly for rotation and translation by parametrising rigid motions as unit dual quaternions and extractin…
Recover the constant rigid transform from a robot gripper to a rigidly mounted camera by solving the AX=XB equation in two stages — modified…
Real-time frontal-face detection by sliding a fixed 24×24 sub-window across a grayscale image at multiple scales, scoring each position with…
Partition an image into perceptually coherent regions by a Kruskal-style greedy merge over a pixel graph, accepting an inter-component edge …
Extract a foreground from a colour image using a single bounding rectangle as the only required input by alternating Gaussian mixture compon…
Compute the global minimum of a binary region-and-boundary MRF energy as a single s-t min-cut on a pixel graph; user-marked seeds enter as h…
Founding random-sample-consensus paradigm: fit a parametric model to data containing an unknown fraction of gross outliers by drawing minima…
Robust estimator that eliminates the user-tuned inlier threshold by treating the noise scale σ as a random variable on [0, σ_max] and margin…
Engineering decomposition of practical RANSAC into four pluggable stages — sampling (PROSAC), model verification (SPRT), local optimisation …
Rectifies a monocular image to a metric overhead (bird's-eye) view by constructing the rectifying homography in closed form from two CNN-reg…
Detect a target object class in arbitrary images by scoring every position and scale in a HOG feature pyramid with a mixture of star-structu…
1981 closed-form linear method for relative orientation of two viewpoints from eight calibrated point correspondences, introducing the bilin…
Refine pixel-level chessboard corner positions to sub-pixel accuracy by nonlinear least-squares fitting a seven-parameter ideal blurred-corn…
Detects scale- and rotation-invariant blob keypoints as scale-space maxima of the Hessian determinant, approximated with box filters on an i…
Detect thin step edges in greyscale images by smoothing with a Gaussian, computing gradient magnitude and direction, suppressing non-maxima …
Calibrate any central catadioptric or fisheye camera from a few planar checkerboard views by fitting a radially-symmetric Taylor-polynomial …
Non-iterative O(n) solver for the calibrated Perspective-n-Point problem: express the n reference points as weighted sums of four virtual co…
Gradient-vote operator that highlights pixels of high local radial symmetry — bright/dark blobs and approximately circular features. Each pi…
Affine extension of FRST: each pixel votes along a corrected direction $\hat V = G M G^{-1} M^{-1} \nabla I$ at radius $n$, where $G = R D \…
Stitch two-plane outdoor panoramas by clustering SIFT correspondences into a ground group and a distant group via spatial K-means, fitting o…
Detect checkerboard X-corners by computing a four-quadrant corner likelihood at each pixel using axis-aligned and 45°-rotated prototype filt…
Post-process a partially detected checkerboard by training two Gaussian processes (one per pixel coordinate) on the allocated (boardXY, boar…
Stitch two images under moderate parallax by replacing the global affine with a per-feature deviation field, regularised to be smooth via a …
Compute the fundamental matrix from n ≥ 8 point correspondences by conditioning the linear DLT system via a similarity normalisation, recove…
Recover camera intrinsics from one or more views of one or more planar targets via the same two IAC-on-homography constraints as Zhang's met…
Two-stage 1987 camera calibration that uses the radial alignment constraint to recover extrinsics and image scale linearly from a precision …
Replace a global homography with a spatially varying field of homographies, each fit by a per-cell weighted DLT (Moving DLT) on the same poi…
Extend Tsai's radial alignment constraint to a non-frontal sensor by modelling lens–sensor tilt as a 2-DoF rotation, projecting observations…
Detect checkerboard X-junctions by approximating a localized Radon transform with 1-D box filters on rotated copies of the image; the per-pi…
Recover camera intrinsics, radial distortion, and per-view extrinsics from at least three images of a planar pattern at different orientatio…
Detect chessboard X-junctions in heavily blurred or high-resolution images by computing a 16-sample circular x-corner intensity at every lev…
Recover the largest visible checkerboard subgraph from a partially occluded pattern by running VF2 subgraph isomorphism against a model grap…
Recover the integer $(i, j)$ grid coordinate of every corner in a checkerboard calibration image by Delaunay-triangulating the corners, merg…
A chessboard-specific corner detector: scores each pixel by how well its local neighborhood matches an alternating bright-dark X-junction pa…
Segment-test corner detector on a 16-pixel Bresenham ring of radius 3 around each candidate; classifies a point as a corner when N contiguou…
Compute a fixed-length descriptor for an image window by binning pixel gradients into 8×8 cells of 9 unsigned-orientation histograms, normal…
Encodes a Gaussian-smoothed image patch around a detected keypoint as a 128/256/512-bit binary string by running a fixed table of pairwise p…
Detects rotation-invariant oriented keypoints by running FAST-9 on a √2 image pyramid, ranking by Harris cornerness, and assigning orientati…
Detect a full planar checkerboard in an image by reducing the gradient-magnitude edge set to a single-pixel centreline graph, extracting inn…
Detect every corner of a chessboard calibration pattern and assign it an integer grid coordinate by counting ring-alternations to locate X-j…
Detect and decode a self-identifying checkerboard calibration pattern: saddle-point corners from a Hessian response, grid reconstruction via…
Scores each pixel by the Harris response R = det(M) − k·tr(M)², where M is the gradient covariance matrix summed over a Gaussian window; ret…
Scores each pixel by the smaller eigenvalue of the gradient structure tensor M; returns integer pixel locations where that eigenvalue exceed…
Optical flow that replaces the quadratic data and smoothness penalties of variational flow with redescending M-estimators, solved by SOR wit…
Dense optical flow recovered by minimising a variational energy that combines the brightness-constancy constraint with a global smoothness p…
Iterative Newton-Raphson method that estimates the parametric warp between two images by linearising the residual and solving the resulting …
Detects keypoints as scale-space extrema in a Difference-of-Gaussian image pyramid, refines location and scale by 3D quadratic interpolation…
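The two corner scores listed above, the Harris response R = det(M) − k·tr(M)² and the smaller eigenvalue of the structure tensor M, can be compared on a single symmetric 2×2 matrix. The sketch below is illustrative only: the function names, the scalar arguments `a`, `b`, `c` standing for M = [[a, b], [b, c]], and the default k = 0.04 are assumptions made for this example, not values taken from the atlas entries.

```python
import math


def harris_response(a, b, c, k=0.04):
    """Harris score R = det(M) - k * tr(M)^2 for M = [[a, b], [b, c]].

    k = 0.04 is an assumed, commonly quoted default; tune it per application.
    """
    det = a * c - b * b
    trace = a + c
    return det - k * trace * trace


def min_eigenvalue(a, b, c):
    """Smaller eigenvalue of the symmetric matrix M = [[a, b], [b, c]]."""
    half_trace = 0.5 * (a + c)
    half_diff = 0.5 * (a - c)
    return half_trace - math.sqrt(half_diff * half_diff + b * b)


# Hypothetical structure tensors: a corner-like one (two strong gradient
# directions) and an edge-like one (a single strong direction).
corner_like = (2.0, 0.0, 2.0)
edge_like = (1.0, 0.0, 0.0)

for name, m in (("corner", corner_like), ("edge", edge_like)):
    print(name, harris_response(*m), min_eigenvalue(*m))
```

On the edge-like tensor the Harris score is negative and the smaller eigenvalue is zero, so either score rejects it; on the corner-like tensor both scores are positive.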
Models 18
Two-stage CNN object detector that replaces external Selective Search / EdgeBoxes proposals with a learned Region Proposal Network sharing c…
Single-stage CNN object detector that frames detection as one regression problem from full-image pixels to a 7×7×30 tensor of grid-cell box …
Dense semantic segmentation by repurposing an ImageNet classifier with atrous (dilated) convolution to preserve spatial resolution, an Atrou…
Two-stage instance segmentation by adding a parallel FCN mask branch to Faster R-CNN — per-class binary masks predicted at each RoI under a …
Symmetric encoder-decoder fully-convolutional network for dense pixel-wise biomedical image segmentation — contracting path with channel-dou…
Encoder-decoder CNN for dense pixel-wise classification — converts ImageNet classifiers into fully convolutional networks via 1×1-conv reint…
Family of very deep CNN image classifiers (18 to 152 layers) built from residual blocks $y = \mathcal{F}(x, \{W_i\}) + x$ that reformulate e…
Eight-layer convolutional neural network for 1000-class image classification on ImageNet, trained end-to-end on two GPUs with ReLU activatio…
Twenty-two-layer CNN built from Inception modules — parallel 1×1, 3×3, 5×5 convolutions and 3×3 max-pool concatenated along the channel axis…
Family of very deep CNN image classifiers (11 to 19 weight layers) built from stacked 3×3 convolutions with stride 1 and 2×2 max-pool stride…
First learned per-pixel checkerboard X-corner detector: a three-convolutional-layer CNN with 2,939 parameters trained with mean-squared-erro…
Fully-convolutional CNN that jointly detects interest points and computes 256-D descriptors in a single forward pass, trained without human …
Fully convolutional network that regresses a per-pixel checkerboard-corner response map; trained with weighted cross-entropy and paired with…
Lightweight CNN that jointly detects keypoints, extracts 64-D dense descriptors, and refines semi-dense matches from coarse descriptor pairs…
Three-stage learning-based camera calibration pipeline: a CNN regresses radial-distortion-correction parameters, a UNet predicts per-corner …
Adaptive-depth Transformer matcher for sparse local features: stacks 9 self+cross-attention layers with rotary positional encoding and a per…
Detector-free dense feature matcher: shared CNN backbone produces coarse and fine feature maps, a Linear Transformer with interleaved self- …
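The residual-block formula quoted above, y = F(x, {W_i}) + x, can be illustrated without any deep-learning framework. This is a minimal sketch under stated assumptions: `residual_block` and `residual_fn` are hypothetical names, the input is a flat list of floats rather than a feature map, and `residual_fn` stands in for the learned branch F.

```python
def residual_block(x, residual_fn):
    """Return y = F(x) + x, where residual_fn plays the role of F.

    The skip connection requires F(x) to have the same length as x.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the input length")
    return [f + xi for f, xi in zip(fx, x)]


# If the residual branch outputs zeros, the block reduces to the identity map.
identity_out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))

# A hypothetical non-trivial branch that doubles its input.
doubled_out = residual_block([1.0, 2.0], lambda v: [2.0 * t for t in v])

print(identity_out, doubled_out)
```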
Concepts 26
Computes each output element as a learned, input-dependent weighted average of value vectors, letting every element aggregate information fr…
The linear, shift-invariant operation that produces each output pixel as a kernel-weighted sum of input pixels in a local neighbourhood.
The framework that poses image labelling and segmentation as minimising an objective combining a per-pixel data term and a pairwise smoothne…
A discrete multi-resolution representation — a sequence of images at progressively coarser resolution, each smoothed and downsampled from it…
A precomputed prefix-sum array that returns the sum of pixel values over any axis-aligned rectangle in constant time with four array reads.
Reducing a dense response map or a set of overlapping detections to a sparse set of local maxima by discarding every element that is not str…
The projective map from 3-D scene points to 2-D image pixels through a single centre of projection, parameterised by an intrinsic matrix and…
Estimating a geometric entity defined only up to scale by stacking constraints into a homogeneous linear system and taking the smallest righ…
Random sample consensus — a paradigm for fitting a parametric model to data containing an unknown fraction of gross outliers, by drawing min…
Verify candidate calibration-pattern corners by constructing a graph over them (Delaunay triangulation, k-nearest-neighbours, or proximity) …
Joint nonlinear least-squares refinement of all camera parameters — and, in structure-from-motion, all 3-D points — that minimises the total…
A feed-forward network that builds a spatial hierarchy of learned features by alternating weight-shared convolution layers, pointwise nonlin…
Recovery of the 6-DOF rigid transformation — rotation and translation — relating a camera to a scene, an object, or a second camera.
Mathematical models for departures from the ideal pinhole projection — radial barrel/pincushion, tangential decentering, thin-prism — and th…
A two-line similarity transform — translate the point centroid to the origin, isotropically scale so the average distance is √2 — that condi…
The intrinsic projective geometry of two views of a scene, encoding the constraint that a point visible in one image must lie on a specific …
An invertible projective transformation of the plane, represented by a 3×3 matrix defined up to a non-zero scalar, mapping points between tw…
The 2-vector of partial derivatives of image intensity with respect to spatial coordinates, measuring the rate and direction of brightness c…
A one-parameter family of images obtained by progressively blurring an input image with Gaussians of increasing standard deviation, providin…
Fixed-length vectors encoding the local image appearance around a keypoint, built so the same physical point yields similar descriptors acro…
Twenty-five years of methods for finding the inner corners of a planar checkerboard calibration target — from Harris-on-thresholded-images t…
A scalar response computed from the determinant of the image Hessian, negative at saddle points (X-corners) and zero at flat regions, edges,…
A 2011–2013 lineage of stitching methods that replace the single global homography with a spatially varying warp field — fitted as either tw…
A symmetric 2×2 matrix formed by summing the outer products of the image gradient over a local window, encoding the dominant orientation and…
Establishing keypoint correspondences between two images by comparing descriptors and resolving them into a consistent partial assignment — …
The apparent 2-D velocity field of image brightness between consecutive frames, recovered from the spatio-temporal gradient under the bright…
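The prefix-sum array described in the list above (constant-time rectangle sums with four array reads) is short enough to sketch in plain Python. The function names, the list-of-lists image layout, the extra zero row and column, and the half-open rectangle convention are assumptions chosen for this example.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) prefix-sum table for a list-of-lists image.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1; the
    extra zero row and column remove boundary special cases.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] (half-open) using four reads."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # sum of the lower-right 2x2 block: 5 + 6 + 8 + 9
```

Building the table costs one pass over the image; every later rectangle query is independent of the rectangle size.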