
MATE

Based on
MATE: Machine Learning for Adaptive Calibration Template Detection
Donné, De Vylder, Goossens, Philips · Sensors 2016

Motivation

Detect inner corners of a planar checkerboard pattern in a greyscale image without requiring the pattern's square count $r \times c$ as a prior, and do it with a learned CNN rather than a hand-crafted gradient or saddle-fitting pipeline. MATE (Donné et al., Sensors 2016) is the first deep-learning checkerboard X-corner detector and is the direct architectural ancestor that CCDN (Chen et al., 2023) extends and supersedes.

Prior work — ChESS, ROCHADE, OCamCalib (Rufli 2008) — was entirely hand-crafted: ring-sampling, gradient-magnitude centrelines, saddle-fitting refinement. MATE's contribution is to show that a minimal three-layer CNN trained on labelled checkerboard images can learn the X-corner signature directly from data, and to do so without any prior on the pattern's grid size.

Architecture

Stub note. The full paper PDF (MDPI Sensors 16(11), 2016) was not accessible during ingestion (HTTP 403 to automated fetchers). The architecture sketch below is reconstructed from the CCDN paper's explicit comparisons and from the donne2016-mate index.yaml notes. Specifics (channel counts, exact kernel sizes per layer, training hyperparameters) are not available from secondary sources and are marked accordingly.

Family & shape. Fully-convolutional CNN. Input: greyscale image $X \in \mathbb{R}^{H \times W}$. Output: per-pixel corner-response map at a spatial resolution coarser than the input (CCDN §2.1 explicitly contrasts MATE's subsampled-grid output with CCDN's stride-1 max-pool design, which preserves resolution).
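To make the coarser-output point concrete, the sketch below maps an index in a subsampled response map back to input-pixel coordinates. The stride value, the helper names `cell_centre` and `worst_case_offset`, and the simple no-padding cell model are illustrative assumptions; MATE's actual pooling stride is not recorded in the sources used here.

```python
def cell_centre(out_idx: int, stride: int) -> float:
    """Input-pixel coordinate at the centre of the cell covered by out_idx.

    Assumes each output cell covers `stride` consecutive input pixels,
    starting at out_idx * stride (no padding or receptive-field offset).
    """
    return out_idx * stride + (stride - 1) / 2.0


def worst_case_offset(stride: int) -> float:
    """Largest 1-D distance from any input pixel in a cell to the cell centre."""
    return (stride - 1) / 2.0


stride = 2                      # hypothetical value, for illustration only
x_in = cell_centre(3, stride)   # output column 3 covers input columns 6 and 7
offset = worst_case_offset(stride)
```

Under this model a stride of 1 gives a zero offset, which is the resolution-preserving case CCDN is described as using; any larger stride leaves a quantisation error that only a separate refinement step can remove.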

Depth and parameter budget. Three convolutional layers with ReLU non-linearities. Total 2,939 parameters, roughly $5.5\times$ fewer than CCDN's 16,301 (CCDN §2.1; CCDN page Complexity).

First-layer kernel. CCDN §2.1 cites MATE's own analysis of the spatial-support radius trade-off: a too-small first-layer kernel admits background false detections; a too-large kernel loses recall on true corners. MATE establishes the radius-4 target that CCDN inherits (9 × 9 first-layer kernel). MATE's exact first-layer size is not directly recorded in the available secondary sources but is consistent with a 7 × 7 kernel given the parameter-count differential.
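The arithmetic behind the radius and parameter figures can be checked with a short sketch. The helper names `kernel_side` and `conv_params` are hypothetical, and the single-channel case is used only to compare per-filter cost; the real channel counts of MATE are not available from the sources above.

```python
def kernel_side(radius: int) -> int:
    """Side length of a square kernel with the given spatial-support radius."""
    return 2 * radius + 1


def conv_params(kernel: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weight count of a standard 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


MATE_PARAMS = 2_939    # totals quoted in the text
CCDN_PARAMS = 16_301
ratio = CCDN_PARAMS / MATE_PARAMS

side_r4 = kernel_side(4)   # radius-4 target -> 9 x 9
side_r3 = kernel_side(3)   # radius 3 -> 7 x 7

# Cost of one greyscale-input filter (one output channel) at each size.
per_filter_9 = conv_params(side_r4, c_in=1, c_out=1)
per_filter_7 = conv_params(side_r3, c_in=1, c_out=1)
```

A radius-4 kernel costs 82 weights per first-layer filter against 50 for radius 3, so the kernel size alone cannot account for the overall gap between the two networks; depth and channel width must contribute as well.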

Loss. Mean-squared error between the predicted response map and a per-pixel binary corner mask. Critically, no positive/negative class balancing is applied: at VGA the positive-label fraction is approximately $10^{-4}$, so the per-positive gradient contribution is dwarfed by the aggregate background gradient under MSE. CCDN's Fig. 4 shows MATE's MSE training "started out much more slowly for the first 150 epochs" relative to CCDN's weighted cross-entropy.
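The imbalance can be illustrated numerically. The sketch below is not MATE's training code: the corner count of 30, the random corner placement, and the uniform initial prediction of 0.5 are all assumptions chosen to show how small the positive share of the MSE gradient is at VGA resolution.

```python
import numpy as np

H, W = 480, 640          # VGA
n_corners = 30           # hypothetical number of labelled corners

rng = np.random.default_rng(0)
mask = np.zeros((H, W), dtype=np.float64)
idx = rng.choice(H * W, size=n_corners, replace=False)
mask.flat[idx] = 1.0     # binary corner mask, one pixel per corner

# Assume an untrained network that predicts 0.5 everywhere.
pred = np.full((H, W), 0.5)

# Gradient of mean((pred - mask)^2) with respect to pred.
grad = 2.0 * (pred - mask) / mask.size

positive_fraction = mask.mean()
pos_share = np.abs(grad[mask == 1]).sum() / np.abs(grad).sum()
```

With these assumptions both `positive_fraction` and `pos_share` come out near $10^{-4}$: each pixel pulls equally hard, so the corner pixels together supply only about one ten-thousandth of the total gradient magnitude. A class-weighted loss, as CCDN is described as using, rescales that share.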

Post-processing. A fixed threshold of 0.5 is applied to the response map (CCDN §2.2). No non-maximum suppression. No spatial clustering. Every supra-threshold pixel becomes a corner candidate.
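A minimal sketch of this post-processing, assuming the response map is available as a NumPy array. The function name `fixed_threshold_candidates` and the toy response values are illustrative, not taken from the paper.

```python
import numpy as np


def fixed_threshold_candidates(response: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return the (row, col) index of every pixel whose response exceeds the threshold."""
    return np.argwhere(response > threshold)


# Toy response map: one corner whose response spreads over two adjacent pixels.
response = np.zeros((6, 6))
response[2, 2] = 0.9
response[2, 3] = 0.7     # neighbouring pixel of the same blob
response[4, 1] = 0.3     # below the threshold, dropped

candidates = fixed_threshold_candidates(response)
```

Because there is no suppression or clustering step, the single blob yields two candidates, `(2, 2)` and `(2, 3)`, which is the mechanism behind the double detections reported for MATE.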

Assessment

Novelty

  • First learned X-corner detector for checkerboards. Prior work (ROCHADE, ChESS, OCamCalib) was hand-crafted; MATE established that a minimal CNN can replace those pipelines.
  • Pattern-agnostic by design. The per-pixel response formulation removes the need for $r \times c$ at inference, a property inherited by CCDN, XFeat, and any subsequent pattern-aware-but-grid-free detector.
  • Compact (2,939 parameters). Demonstrates that a three-layer network suffices to capture the X-corner signature in clean, calibration-controlled imagery.

Limitations

  • Fixed 0.5 threshold is scene-independent. ReLU outputs are unbounded above. A corner whose response is 2.3 is correctly detected, but a corner whose response is 0.4 (low-contrast image, distant camera, strong blur) is missed with no recourse. CCDN §2.2 replaces this with an adaptive $0.5 \cdot \max$ rule.
  • No NMS or clustering. Every supra-threshold pixel emits a corner candidate. On the ROCHADE uEye dataset MATE produces 492 false positives; on the GoPro dataset (strong radial distortion) it produces 4.556 % double detections and 389 false positives. CCDN's three-stage post-processing (adaptive threshold + 4 × 4 NMS + k-means++) cuts these to 93 false positives on uEye, and to 0 % double detections and 0 false positives on GoPro (Tables 1–2).
  • MSE under extreme class imbalance. Without per-class normalisation, the aggregate gradient from positive pixels is $\sim 10^4\times$ smaller than the aggregate gradient from background pixels. Convergence is slow and the trained response at true corners risks being weak.
  • Subsampled output grid. Non-unit-stride max-pooling means the output map is spatially coarser than the input; the localisation error floor is bounded below by the pooling stride. No subpixel refinement is built in.
  • Strictly dominated by CCDN on reported benchmarks. No regime is reported in which MATE outperforms CCDN on uEye or GoPro across any of the four metrics (mean error, missed rate, double-detection rate, false-positive count).
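The threshold limitation can be shown on a toy low-contrast response map. This sketch reads the adaptive rule literally as "keep pixels above half of the map's maximum"; CCDN's actual implementation may differ in detail, and the function names and response values are hypothetical.

```python
import numpy as np


def fixed_mask(response: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """MATE-style rule: keep pixels above a scene-independent threshold."""
    return response > threshold


def adaptive_mask(response: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Adaptive rule: keep pixels above a fraction of the map's own maximum."""
    return response > fraction * response.max()


# Low-contrast scene: true corners respond weakly, all below 0.5.
response = np.zeros((5, 5))
response[1, 1] = 0.40
response[3, 3] = 0.35
response[0, 4] = 0.10    # weak background response

n_fixed = int(fixed_mask(response).sum())
n_adaptive = int(adaptive_mask(response).sum())
```

Here the fixed rule keeps nothing, while the adaptive rule sets its cut at 0.2 and keeps the two stronger responses. The adaptive rule still needs NMS or clustering afterwards, since it does nothing to merge neighbouring pixels of the same corner.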

When to choose MATE over CCDN

CCDN (Chen et al., 2023) is MATE's direct architectural successor, reusing the per-pixel-response formulation but doubling the depth (six convolutions vs three), replacing MSE with positive-negative-balanced cross-entropy, enforcing stride-1 max-pools to preserve input resolution, and adding adaptive-threshold + NMS + k-means++ post-processing. CCDN supersedes MATE on every reported metric.

| | MATE (2016) | CCDN (2023) |
|---|---|---|
| Convolutional layers | 3 | 6 |
| Parameters | 2,939 | 16,301 ($\sim 5.5\times$) |
| Loss | MSE | weighted cross-entropy + L2 |
| Output resolution | subsampled (max-pool stride > 1) | full input resolution (stride-1 max-pool) |
| Threshold | fixed 0.5 | adaptive $0.5 \cdot \max$ |
| Post-processing | none | 4 × 4 NMS + k-means++ ($k = 10$, drop $N_i < 2$) |
| uEye mean error / missed / doubles / FP | 1.009 px / 3.065 % / 0.809 % / 492 | 0.812 px / 1.169 % / 0 % / 93 |
| GoPro mean error / missed / doubles / FP | 0.835 px / 4.566 % / 4.556 % / 389 | 0.576 px / 0.907 % / 0 % / 0 |

MATE remains relevant in three narrow situations:

  • Extreme parameter budget. On deeply embedded hardware where the $5.5\times$ smaller weight count is the binding constraint and accuracy under distortion is not critical.
  • Trivial post-processing. Where the calling code cannot host a k-means++ + NMS pipeline and a single fixed threshold is the only viable inference path.
  • Historical baseline. When reproducing MATE's numbers for a paper comparing successive generations of learned checkerboard detectors.

For production calibration use under any non-trivial imaging conditions (lens distortion, low contrast, partial visibility), choose CCDN.

When to choose MATE over CCS

CCS (Zhang et al., RA-L 2022) is a full learning-based calibration pipeline that pairs a UNet 2D-Gaussian heatmap detector with sub-pixel Gaussian surface fitting and image-level RANSAC over Zhang's planar calibration. The detection stage operates at sub-pixel accuracy (CCS Table III: 0.78 / 0.51 / 0.71 px under noise / bad lighting / distortion) versus MATE's pixel-level response map without built-in refinement. CCS supersedes MATE for end-to-end sub-pixel calibration tasks; MATE's role narrows to scenarios where only a lightweight integer-grid response is needed and the surrounding pipeline owns refinement, distortion correction, and parameter estimation.

| | MATE (2016) | CCS (2022) |
|---|---|---|
| Scope | Detector only | Full calibration pipeline (detection + distortion correction + estimation) |
| Detection output | Per-pixel response map; integer-grid accuracy | UNet 2D-Gaussian heatmap; SVD-based Gaussian surface-fit sub-pixel coordinates with confidence $\sigma$ |
| Distortion correction | None | CNN-regressed radial correction model (5 parameters) preceding detection |
| Outlier handling | None (every supra-threshold pixel emits a candidate) | Distribution-aware $\sigma$ rejection upstream + image-level RANSAC over views downstream |
| Parameter estimation | Not included | Built-in Zhang + RANSAC view selection |
| Implementation status | No verified public implementation | Official PyTorch release (MIT) with detection weights |

Choose MATE when only a lightweight learned X-corner response is needed and the calling system already owns sub-pixel refinement, distortion handling, and parameter estimation — for example, a research baseline studying a single-stage CNN detector in isolation. Choose CCS when the goal is an end-to-end calibration pipeline with sub-pixel accuracy and the camera's distortion can be matched to the CCS training distribution.

Implementations

No public implementation of MATE has been verified at the time of writing this stub. CCDN's reference implementation (https://github.com/AnkaChan/new_chessboards_test) reproduces MATE's architecture as a comparison baseline; readers seeking a runnable MATE should verify the architecture in that repository against the original paper before relying on it.

References

  1. S. Donné, J. De Vylder, B. Goossens, W. Philips. MATE: Machine Learning for Adaptive Calibration Template Detection. MDPI Sensors 16(11), 2016. doi:10.3390/s16111858
  2. B. Chen, C. Xiong, Q. Zhang. CCDN: Checkerboard Corner Detection Network for Robust Camera Calibration. arXiv .05097, 2023.
  3. S. Placht, P. Fürsattel, E. Mengue, H. Hofmann, C. Schaller, M. Balda, E. Angelopoulou. ROCHADE: Robust Checkerboard Advanced Detection for Camera Calibration. ECCV 2014, 766–779.
  4. S. Bennett, J. Lasenby. ChESS: Quick and Robust Detection of Chess-board Features. arXiv .5491, 2013.
  5. M. Rufli, D. Scaramuzza, R. Siegwart. Automatic Detection of Checkerboards on Blurred and Distorted Images. IROS 2008, 3121–3126.
  6. Y. Zhang, X. Zhao, D. Qian. Learning-Based Distortion Correction and Feature Detection for High Precision and Robust Camera Calibration. IEEE Robotics and Automation Letters 7(4) –10477, 2022. arXiv

Compared with

  • CCDN
  • CCS

    Different scope: MATE is a detector, CCS is a full calibration pipeline; comparison is at the corner-detection level.