Motivation

Detect inner corners of a planar checkerboard pattern in a greyscale image without requiring the pattern's square count $(r \times c)$ as a prior — and do it with a learned CNN rather than a hand-crafted gradient or saddle-fitting pipeline. MATE (Donné et al., Sensors 2016) is the first deep-learning checkerboard X-corner detector and is the direct architectural ancestor that CCDN (Chen et al., 2023) extends and supersedes.

Prior work — ChESS, ROCHADE, OCamCalib (Rufli 2008) — was entirely hand-crafted: ring-sampling, gradient-magnitude centrelines, saddle-fitting refinement. MATE's contribution is to show that a minimal three-layer CNN trained on labelled checkerboard images can learn the X-corner signature directly from data, and to do so without any prior on the pattern's grid size.

Architecture

Stub note. The full paper PDF (MDPI Sensors 16(11):1858, 2016) was not accessible during ingestion (HTTP 403 to automated fetchers). The architecture sketch below is reconstructed from the CCDN paper's explicit comparisons and from the donne2016-mate index.yaml notes. Specifics (channel counts, exact kernel sizes per layer, training hyperparameters) are not available from secondary sources and are marked accordingly.

Family & shape. Fully-convolutional CNN. Input: greyscale image $X \in \mathbb{R}^{H \times W}$. Output: per-pixel corner-response map at a spatial resolution coarser than the input (CCDN §2.1 explicitly contrasts MATE's subsampled-grid output with CCDN's stride-1 max-pool design, which preserves resolution).
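To make the consequence of a subsampled output concrete, the sketch below maps an output-grid index back to input-pixel coordinates. The stride value and the cell-centre convention are assumptions chosen for illustration; MATE's actual subsampling factor is not recorded in the secondary sources.

```python
import numpy as np

def output_to_input(rc, stride):
    """Map an output-grid (row, col) index to input-pixel coordinates.

    Assumes each output cell covers a stride x stride block of input pixels
    and reports the block centre (a hypothetical convention, not MATE's).
    """
    rc = np.asarray(rc, dtype=float)
    return (rc + 0.5) * stride - 0.5

def worst_case_axis_error(stride):
    """Largest per-axis distance from a block centre to the block edge."""
    return stride / 2.0

stride = 2  # hypothetical subsampling factor
xy = output_to_input((3, 5), stride)
print(xy, worst_case_axis_error(stride))
```

Under these assumptions a corner can sit up to half a stride away from the reported position along each axis, which is why a coarser output grid needs a separate refinement step to recover accuracy.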

Depth and parameter budget. Three convolutional layers with ReLU non-linearities. Total 2,939 parameters — roughly $5.5\times$ smaller than CCDN's 16,301 (CCDN §2.1; CCDN page Complexity).

First-layer kernel. CCDN §2.1 cites MATE's own analysis of the spatial-support radius trade-off: a too-small first-layer kernel admits background false detections; a too-large kernel loses recall on true corners. MATE establishes the radius-4 target that CCDN inherits (9 × 9 first-layer kernel). MATE's exact first-layer size is not directly recorded in the available secondary sources but is consistent with a 7 × 7 kernel given the parameter-count differential.

Loss. Mean-squared error between the predicted response map and a per-pixel binary corner mask. Critically, no positive/negative class balancing is applied. At VGA the positive-label fraction is approximately $10^{-4}$, so under MSE the combined gradient from the few positive pixels is dwarfed by the aggregate background gradient. CCDN's Fig. 4 shows MATE's MSE training "started out much more slowly for the first 150 epochs" relative to CCDN's weighted cross-entropy.
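The imbalance can be illustrated numerically. The sketch below builds a VGA-sized binary mask and compares how much of the unweighted MSE gradient comes from corner pixels versus background pixels. The corner count and the constant prediction of 0.5 are assumptions picked so that every pixel has the same residual magnitude, which isolates the effect of the pixel counts alone.

```python
import numpy as np

H, W = 480, 640   # VGA
n_corners = 30    # hypothetical count, giving a positive fraction near 1e-4

rng = np.random.default_rng(0)
mask = np.zeros(H * W)
mask[rng.choice(H * W, size=n_corners, replace=False)] = 1.0

# Hypothetical prediction, equidistant from both labels.
pred = np.full(H * W, 0.5)

# Gradient of mean((pred - mask)**2) with respect to each prediction.
grad = 2.0 * (pred - mask) / mask.size

pos_fraction = mask.mean()
pos_mass = np.abs(grad[mask == 1.0]).sum()
neg_mass = np.abs(grad[mask == 0.0]).sum()
imbalance = neg_mass / pos_mass

print(f"positive fraction: {pos_fraction:.2e}")
print(f"background / positive gradient mass: {imbalance:.0f}")
```

Each pixel contributes a gradient of the same magnitude here, so the ratio of roughly $10^4$ is purely a consequence of how few positive pixels there are. A class-weighted loss rescales the two sums so that neither dominates.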

Post-processing. A fixed threshold of $0.5$ is applied to the response map (CCDN §2.2). No non-maximum suppression. No spatial clustering. Every supra-threshold pixel becomes a corner candidate.
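A minimal sketch of this post-processing, assuming the response map is available as a NumPy array. The function name and the toy response values are illustrative and not taken from the MATE paper.

```python
import numpy as np

def fixed_threshold_candidates(response, threshold=0.5):
    """Return (row, col) indices of every pixel above a fixed threshold.

    No non-maximum suppression or clustering is applied, so neighbouring
    supra-threshold pixels are all reported as separate candidates.
    """
    rows, cols = np.nonzero(response > threshold)
    return np.stack([rows, cols], axis=1)

# Toy response map: one corner whose response spreads over adjacent pixels.
response = np.zeros((5, 5))
response[2, 2] = 1.2
response[2, 3] = 0.7
response[1, 2] = 0.4  # below threshold, discarded

candidates = fixed_threshold_candidates(response)
print(candidates)
```

In this toy map a single physical corner yields two candidates, which is the mechanism behind the double detections reported for MATE.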

Assessment

Novelty

First learned X-corner detector for checkerboards. Prior work (ROCHADE, ChESS, OCamCalib) was hand-crafted; MATE established that a minimal CNN can replace those pipelines.
Pattern-agnostic by design. The per-pixel response formulation removes the need for $(r \times c)$ at inference, a property inherited by CCDN, XFeat, and any subsequent pattern-aware-but-grid-free detector.
Compact (2,939 parameters). Demonstrates that a three-layer network suffices to capture the X-corner signature in clean, calibration-controlled imagery.

Limitations

Fixed 0.5 threshold is scene-independent. ReLU outputs are unbounded above. A corner whose response is 2.3 is correctly detected, but a corner whose response is 0.4 (low-contrast image, distant camera, strong blur) is missed with no recourse. CCDN §2.2 replaces this with an adaptive $0.5\cdot\max$ rule.
No NMS or clustering. Every supra-threshold pixel emits a corner candidate. On the ROCHADE uEye dataset MATE produces 492 false positives; on the GoPro dataset (strong radial distortion) MATE produces 4.556 % double detections and 389 FPs. CCDN's three-stage post-processing (adaptive threshold + 4×4 NMS + k-means++) cuts these to 93 / 0 % / 0 (Tables 1–2).
MSE under extreme class imbalance. Each pixel contributes to the MSE gradient with the same weight, so without per-class normalisation the aggregate gradient from positive pixels is roughly $10^4\times$ smaller than the aggregate gradient from background pixels. Convergence is slow and the trained response at true corners is at risk of being weak.
Subsampled output grid. Non-unit-stride max-pooling means the output map is spatially coarser than the input, so the pooling stride sets a floor on the achievable localisation error. No subpixel refinement is built in.
Strictly dominated by CCDN on reported benchmarks. No regime is reported in which MATE outperforms CCDN on uEye or GoPro across any of the four metrics (mean error, missed rate, double-detection rate, false-positive count).
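The threshold limitation above can be seen on a toy low-contrast response map. The sketch compares the fixed 0.5 rule with a $0.5\cdot\max$ rule; taking the maximum over the whole response map is an assumption of this sketch, and the function names and values are illustrative.

```python
import numpy as np

def fixed_mask(response, threshold=0.5):
    """Fixed threshold, independent of the scene."""
    return response > threshold

def adaptive_mask(response, fraction=0.5):
    """Threshold set relative to the strongest response in the map."""
    return response > fraction * response.max()

# Toy low-contrast map: the true corner peaks at only 0.4.
low_contrast = np.zeros((5, 5))
low_contrast[2, 2] = 0.4  # weak but genuine corner response
low_contrast[0, 0] = 0.1  # background ripple

n_fixed = int(fixed_mask(low_contrast).sum())
n_adaptive = int(adaptive_mask(low_contrast).sum())
print(n_fixed, n_adaptive)
```

The fixed rule drops the weak corner entirely, while the relative rule keeps it and still rejects the smaller background ripple.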

When to choose MATE over CCDN

CCDN (Chen et al., 2023) is MATE's direct architectural successor, reusing the per-pixel-response formulation but doubling the depth (six convolutions vs three), replacing MSE with positive-negative-balanced cross-entropy, enforcing stride-1 max-pools to preserve input resolution, and adding adaptive-threshold + NMS + k-means++ post-processing. CCDN supersedes MATE on every reported metric.

	MATE (2016)	CCDN (2023)
Convolutional layers	3	6
Parameters	2,939	16,301 ($\sim 5.5\times$)
Loss	MSE	weighted cross-entropy + L2
Output resolution	subsampled (max-pool stride > 1)	full input resolution (stride-1 max-pool)
Threshold	fixed 0.5	adaptive $0.5\cdot\max$
Post-processing	none	4 × 4 NMS + k-means++ ($k=10$, drop $N_i < 2$)
uEye mean error / missed / doubles / FP	1.009 px / 3.065 % / 0.809 % / 492	0.812 px / 1.169 % / 0 % / 93
GoPro mean error / missed / doubles / FP	0.835 px / 4.566 % / 4.556 % / 389	0.576 px / 0.907 % / 0 % / 0

MATE remains relevant in three narrow situations:

Extreme parameter budget. On deeply embedded hardware where the $5.5\times$ smaller weight count is the binding constraint and accuracy under distortion is not critical.
Trivial post-processing. Where the calling code cannot host a k-means++ + NMS pipeline and a single fixed threshold is the only viable inference path.
Historical baseline. When reproducing MATE's numbers for a paper comparing successive generations of learned checkerboard detectors.

For production calibration use under any non-trivial imaging conditions (lens distortion, low contrast, partial visibility), choose CCDN.

When to choose MATE over CCS

CCS (Zhang et al., RA-L 2022) is a full learning-based calibration pipeline that pairs a UNet 2D-Gaussian heatmap detector with sub-pixel Gaussian surface fitting and image-level RANSAC over Zhang's planar calibration. The detection stage operates at sub-pixel accuracy (CCS Table III: 0.78 / 0.51 / 0.71 px under noise / bad lighting / distortion) versus MATE's pixel-level response map without built-in refinement. CCS supersedes MATE for end-to-end sub-pixel calibration tasks; MATE's role narrows to scenarios where only a lightweight integer-grid response is needed and the surrounding pipeline owns refinement, distortion correction, and parameter estimation.

	MATE (2016)	CCS (2022)
Scope	Detector only	Full calibration pipeline (detection + distortion correction + estimation)
Detection output	Per-pixel response map; integer-grid accuracy	UNet 2D-Gaussian heatmap; SVD-based Gaussian surface-fit sub-pixel coordinates with confidence $\sigma$
Distortion correction	None	CNN-regressed radial correction model (5 parameters) preceding detection
Outlier handling	None (every supra-threshold pixel emits a candidate)	Distribution-aware $\sigma$ rejection upstream + image-level RANSAC over views downstream
Parameter estimation	Not included	Built-in Zhang + RANSAC view selection
Implementation status	No verified public implementation	Official PyTorch release (MIT) with detection weights

Choose MATE when only a lightweight learned X-corner response is needed and the calling system already owns sub-pixel refinement, distortion handling, and parameter estimation — for example, a research baseline studying a single-stage CNN detector in isolation. Choose CCS when the goal is an end-to-end calibration pipeline with sub-pixel accuracy and the camera's distortion can be matched to the CCS training distribution.

Implementations

No public implementation of MATE has been verified at the time of writing this stub. CCDN's reference implementation (https://github.com/AnkaChan/new_chessboards_test) reproduces MATE's architecture as a comparison baseline; readers seeking a runnable MATE should verify the architecture in that repository against the original paper before relying on it.
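One inexpensive sanity check when verifying a candidate re-implementation is to count its learnable parameters and compare against the 2,939 figure quoted above. The sketch below does this for a stack of square-kernel convolutions. The layer specification shown is a deliberately hypothetical placeholder, not MATE's actual layout, which the secondary sources do not record.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Learnable parameters in one square-kernel 2D convolution."""
    return out_ch * (in_ch * kernel * kernel + (1 if bias else 0))

def total_params(layers):
    """Sum parameters over (in_ch, out_ch, kernel) layer tuples."""
    return sum(conv_params(*layer) for layer in layers)

MATE_REPORTED_PARAMS = 2939  # figure quoted from CCDN §2.1

# Hypothetical three-layer spec as (in_ch, out_ch, kernel); NOT MATE's real layout.
candidate = [(1, 8, 7), (8, 8, 3), (8, 1, 3)]

count = total_params(candidate)
matches = count == MATE_REPORTED_PARAMS
print(count, matches)
```

A mismatch, as with this placeholder, indicates that the candidate differs from the reported network in channel counts, kernel sizes, or bias usage. A match is necessary but not sufficient, since different layouts can share a parameter total.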

References

S. Donné, J. De Vylder, B. Goossens, W. Philips. MATE: Machine Learning for Adaptive Calibration Template Detection. MDPI Sensors 16(11):1858, 2016. doi:10.3390/s16111858
B. Chen, C. Xiong, Q. Zhang. CCDN: Checkerboard Corner Detection Network for Robust Camera Calibration. arXiv 2302.05097, 2023.
S. Placht, P. Fürsattel, E. Mengue, H. Hofmann, C. Schaller, M. Balda, E. Angelopoulou. ROCHADE: Robust Checkerboard Advanced Detection for Camera Calibration. ECCV 2014, 766–779.
S. Bennett, J. Lasenby. ChESS — Quick and Robust Detection of Chess-board Features. arXiv 1301.5491, 2013.
M. Rufli, D. Scaramuzza, R. Siegwart. Automatic Detection of Checkerboards on Blurred and Distorted Images. IROS 2008, 3121–3126.
Y. Zhang, X. Zhao, D. Qian. Learning-Based Distortion Correction and Feature Detection for High Precision and Robust Camera Calibration. IEEE Robotics and Automation Letters 7(4):10470–10477, 2022.
