Security · 25 February 2026 · 11 min read

From 3 Signals to 10: How We Rebuilt Document Fraud Detection

A forger edits the name on a South African ID card in Photoshop. The text is pixel-perfect. The JPEG compression looks clean. But the text sharpness in the edited region doesn't match the rest of the card, the color histogram has gaps that real ink doesn't produce, and the barcode on the back still says the original name. Our old 3-signal pipeline would have caught none of those. The new one catches all three — plus seven more attack surfaces we weren't even looking at.

Why 3 Signals Weren't Enough

FaceVault's original document fraud pipeline ran three signals: moire FFT (screen capture detection), Error Level Analysis (splice detection), and EXIF metadata checks. A fourth signal — barcode decoding — extracted data but never scored it. These three signals were good at catching the obvious attacks: photographing a screen, crude Photoshop splices, GAN-generated images with no EXIF.

But document fraud has evolved. Attackers don't just put an ID on a screen anymore. They print high-quality reproductions. They use AI inpainting to edit individual fields while preserving compression artifacts. They photograph real cards and swap just the face photo. Each of these attacks slips through at least one of the original three signals.

Gap 1

No physical document validation

We never checked whether the image contained an actual card-shaped object with the right aspect ratio. A cropped screenshot of a document could pass all three checks.

Gap 2

Barcode data was decoded but never compared

We extracted PDF417/QR data from barcodes but stored it as informational. The obvious next step — comparing barcode fields against OCR/MRZ data — was left on the roadmap.

Gap 3

No text-level forensics

Sophisticated forgeries edit individual text fields (name, DOB) while leaving everything else untouched. ELA catches crude splices, but AI inpainting tools match compression artifacts well enough to fool block-level analysis. We needed character-level sharpness consistency checks.

Gap 4

MRZ check digits were extracted but never scored

PassportEye already validated MRZ check digits per field. We stored the results but never factored them into the fraud score. A forged MRZ with failing check digits was treated the same as a valid one.

The upgrade closes all four gaps. Seven new signals bring the total to ten, following the same weighted fusion pattern we use in our anti-spoofing pipeline. Each signal is independent, scores 0–1, and missing signals are excluded with weight redistribution.

The 10-Signal Fusion Architecture

Every signal follows the same contract: take an input, return {"score": float, "details": dict}. A score of 1.0 means "this looks authentic". A score near 0.0 means "this looks forged". Signals that can't run (missing data, library unavailable) return "skip": true and are excluded from fusion.
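A minimal sketch of that contract in Python. The helper name and the clamping behavior are illustrative, not the production code; only the shape of the return values comes from the description above:

```python
from typing import Any, Callable

def run_signal(fn: Callable[..., dict], *args: Any) -> dict:
    """Run one signal and normalize its output to the shared contract.

    Success -> {"score": float in [0, 1], "details": dict}
    Skip    -> {"skip": True, "reason": str}
    """
    try:
        result = fn(*args)
    except Exception as exc:  # a crashed signal is treated as unavailable
        return {"skip": True, "reason": repr(exc)}
    if result.get("skip"):
        return result
    result.setdefault("details", {})
    # clamp to the 0-1 range the fusion step expects
    result["score"] = max(0.0, min(1.0, float(result["score"])))
    return result
```

Treating exceptions as skips (rather than failures) matters: a crashed signal should reduce confidence via weight redistribution, not poison the fused score.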

Weighted fusion
FRAUD_SIGNAL_WEIGHTS = {
    # Image-based (Phase A — run on upload)
    "moire":             0.35,   # FFT screen detection
    "ela":               0.20,   # Error Level Analysis
    "exif":              0.10,   # EXIF metadata
    "edge_detection":    0.15,   # Card boundary + aspect ratio
    "color_consistency": 0.15,   # Histogram gamut + correlation
    "face_on_id":        0.10,   # ID face quality check
    "recapture":         0.15,   # Glare, bezels, perspective
    "text_sharpness":    0.10,   # Block-level text consistency
    # Cross-reference (Phase B — run after MRZ extraction)
    "barcode_xref":      0.15,   # Barcode vs MRZ/OCR fields
    "mrz_check_digits":  0.10,   # PassportEye check digit validation
}
# Sum = 1.55 (intentional — see below)
# At runtime, divide by the sum of *available* weights — same as anti_spoofing.py

The weights sum to more than 1.0. This is intentional. At runtime, we divide by the sum of available weights — signals that returned data. If barcode cross-reference can't run (no barcode found), its 0.15 weight is redistributed proportionally across the other nine signals. This guarantees the final score is always 0–1 regardless of how many signals fired.
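The redistribution step can be sketched as follows. This restates the weight table above and assumes the per-signal result dicts described earlier; the neutral 0.5 fallback when no signal ran is an illustrative assumption:

```python
FRAUD_SIGNAL_WEIGHTS = {
    "moire": 0.35, "ela": 0.20, "exif": 0.10, "edge_detection": 0.15,
    "color_consistency": 0.15, "face_on_id": 0.10, "recapture": 0.15,
    "text_sharpness": 0.10, "barcode_xref": 0.15, "mrz_check_digits": 0.10,
}

def fuse(results: dict) -> float:
    """Weighted average over available signals.

    Skipped or missing signals simply fall out of the denominator, which is
    exactly the "redistribute proportionally" behavior described above.
    """
    total_w = weighted = 0.0
    for name, w in FRAUD_SIGNAL_WEIGHTS.items():
        r = results.get(name)
        if r is None or r.get("skip"):
            continue
        total_w += w
        weighted += w * r["score"]
    return weighted / total_w if total_w else 0.5  # neutral when nothing ran
```

Because the denominator is the sum of whatever weights actually fired, the result stays in 0–1 whether one signal ran or all ten did.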

Same pattern, proven at scale. Our anti-spoofing pipeline has used this exact fusion pattern with 12 signals since launch. Weight redistribution handles everything from desktop browsers (no rPPG, no gyroscope) to locked-down corporate phones (no camera_trust). The document fraud pipeline now benefits from the same resilience.

Signal 1: Moire FFT

Weight: 0.35

The original signal, now with a reduced weight. When a camera photographs a screen, the screen's pixel grid creates interference patterns visible as directional spikes in the mid-frequency annulus of the 2D FFT. Three sub-signals: angular peak ratio (moire directionality), spectral flatness (1/f² deviation), and HF/LF energy ratio (synthetic content).
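The three sub-signals above can be sketched with NumPy. The annulus band limits, bin count, and epsilons here are illustrative assumptions, not the production thresholds:

```python
import numpy as np

def moire_subsignals(gray: np.ndarray) -> dict:
    """Compute the three FFT sub-signals on a 2-D grayscale array."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(np.float64))))
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    band = min(h, w)
    annulus = (r > band * 0.10) & (r < band * 0.35)  # mid-frequency ring
    energy = mag[annulus]
    # angular peak ratio: does annulus energy pile up in a few directions?
    theta = np.arctan2(yy - h // 2, xx - w // 2)[annulus]
    hist, _ = np.histogram(theta, bins=36, weights=energy)
    angular_peak = float(hist.max() / (hist.mean() + 1e-9))
    # spectral flatness: geometric / arithmetic mean (1.0 = flat, noise-like)
    flatness = float(np.exp(np.log(energy + 1e-9).mean()) / (energy.mean() + 1e-9))
    # HF/LF energy ratio: synthetic content is often high-frequency-heavy
    hf = mag[r >= band * 0.35].sum()
    lf = mag[(r > 0) & (r <= band * 0.10)].sum()
    return {"angular_peak": angular_peak, "spectral_flatness": flatness,
            "hf_lf_ratio": float(hf / (lf + 1e-9))}
```

A screen recapture drives the angular peak ratio up (directional spikes from the pixel grid) while pulling spectral flatness down.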

Weight reduced from 0.50 to 0.35 because the new recapture and color signals now share responsibility for screen detection. Moire remains the strongest single indicator for direct screen photography, but it misses high-DPI screens and angled captures that the new signals catch.

Signal 2: Error Level Analysis

Weight: 0.20

Re-saves the JPEG at Q75 and Q90, computes pixel-wise difference from the original. Spliced regions show different compression artifacts because they were last saved at a different quality level. Also detects double-compression (screen photo → camera → server) via the Q75/Q90 error ratio.
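A minimal version of the re-save step with Pillow, assuming the upload is available as a PIL image. The two quality levels come from the text; the function name and returned fields are illustrative:

```python
import io

import numpy as np
from PIL import Image

def ela_details(img: Image.Image) -> dict:
    """Re-save at Q75 and Q90 and measure mean pixel-wise error."""
    rgb = np.asarray(img.convert("RGB"), dtype=np.int16)

    def mean_error(quality: int) -> float:
        buf = io.BytesIO()
        img.convert("RGB").save(buf, "JPEG", quality=quality)
        resaved = np.asarray(Image.open(buf), dtype=np.int16)
        return float(np.abs(rgb - resaved).mean())

    e75, e90 = mean_error(75), mean_error(90)
    return {
        "error_q75": e75,
        "error_q90": e90,
        # a ratio close to 1 hints the file was already heavily compressed
        # (screen photo -> camera -> server double compression)
        "double_compression_ratio": e75 / (e90 + 1e-9),
    }
```

A production version would also look at the spatial distribution of the error image, since a spliced region shows up as a localized anomaly rather than a shift in the global mean.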

Weight reduced from 0.30 to 0.20. ELA is reliable for crude splices but increasingly unreliable against AI inpainting tools that match the surrounding compression level. The new text sharpness signal picks up where ELA falls short.

Signal 3: EXIF Metadata

Weight: 0.10

Flags editing software tags (Photoshop, GIMP, Canva), completely stripped metadata (GAN indicator), missing camera info, and suspicious dimensions. Weight reduced from 0.20 to 0.10 — EXIF is easily spoofable and our server intentionally strips EXIF on re-save, so this signal carries less forensic value after processing.

Signal 4: Document Edge Detection

New — Weight: 0.15

A real ID card photographed on a surface should have a visible rectangular boundary. A cropped screenshot or a full-bleed scan won't. This signal uses Canny edge detection, contour extraction, and polygon approximation to find the card outline.

Detection logic
# Canny + contour extraction
GaussianBlur(5,5) → Canny(50,150) → dilate
findContours → sort by area
approxPolyDP → look for 4-point polygon

# Scoring
if quad found AND aspect ratio within 0.15 of ISO 7810:  1.0
if quad found AND non-standard aspect ratio:              0.85
if 5-6 sided polygon (partial edges):                     0.6
if no card edges found:                                   0.3

The ISO 7810 ID-1 standard specifies 85.6 × 53.98mm (aspect ratio 1.586). Passports (ID-3) are 125 × 88mm (1.420). We check the detected quad's aspect ratio against both standards. A card photographed at an angle will produce a non-rectangular quadrilateral, which minAreaRect normalizes before comparing.
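The scoring table maps directly onto a small function. The contour-extraction step (OpenCV Canny, approxPolyDP, minAreaRect) is assumed to have already produced a vertex count and a normalized aspect ratio:

```python
# ISO 7810 aspect ratios: ID-1 (85.6 x 53.98 mm) and ID-3 (125 x 88 mm)
ID1_RATIO, ID3_RATIO = 85.6 / 53.98, 125 / 88

def edge_score(n_vertices, aspect_ratio=None) -> float:
    """Score a detected card outline per the table above.

    n_vertices is None when no polygon was found; aspect_ratio is the
    minAreaRect-normalized long/short side ratio of a detected quad.
    """
    if n_vertices == 4:
        if any(abs(aspect_ratio - std) <= 0.15 for std in (ID1_RATIO, ID3_RATIO)):
            return 1.0   # quad with a standard aspect ratio
        return 0.85      # quad, non-standard ratio
    if n_vertices in (5, 6):
        return 0.6       # partial edges
    return 0.3           # no card edges found
```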

Catches: Cropped screenshots, digital scans without borders, partial document crops

× Weak against: Physical card photographed flush against a same-color surface (edges blend into background)

Signal 5: Color Consistency

New — Weight: 0.15

Screens have a fundamentally different color signature than physical objects. A camera photograph of ink on plastic has wide gamut (many histogram bins occupied), high cross-channel correlation (natural lighting creates correlated R/G/B responses), and many histogram peaks (complex surface textures). Screens compress all of these.

Gamut coverage: fraction of the 256 histogram bins holding >0.1% of pixels. Natural: 60–90%; screen: 30–50%. Sub-weight 0.35
Cross-channel correlation: Pearson correlation between the B–G, B–R, and G–R channel pairs. Natural: 0.6–0.95; screen: 0.3–0.6. Sub-weight 0.35
Peak clustering: number of distinct peaks per histogram. Real ink and plastic produce many spread peaks; screen subpixels produce clustered peaks. Sub-weight 0.30

This signal complements moire FFT. Moire catches the pixel grid interference pattern; color consistency catches the gamut limitation. A high-DPI Retina screen might not produce visible moire, but it still compresses the color gamut compared to physical ink on polycarbonate.
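The first two sub-signals can be sketched with NumPy alone. The 0.1% bin threshold matches the description above; averaging coverage across channels and correlation across channel pairs is an assumption:

```python
import numpy as np

def color_consistency(rgb: np.ndarray) -> dict:
    """Gamut coverage and cross-channel correlation for an HxWx3 uint8 image."""
    n_pixels = rgb.shape[0] * rgb.shape[1]
    # gamut coverage: fraction of the 256 bins holding >0.1% of pixels, per channel
    coverages = []
    for c in range(3):
        hist = np.bincount(rgb[..., c].reshape(-1), minlength=256)
        coverages.append(float((hist > 0.001 * n_pixels).mean()))
    # cross-channel Pearson correlation, averaged over the three channel pairs
    flat = rgb.reshape(-1, 3).astype(np.float64)
    corr = np.corrcoef(flat.T)
    cross = float((corr[0, 1] + corr[0, 2] + corr[1, 2]) / 3.0)
    return {"gamut_coverage": float(np.mean(coverages)), "cross_channel": cross}
```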

Signal 6: Face-on-ID Quality

New — Weight: 0.10

Every ID card has a face photo on it. This signal detects that face using a Haar cascade, then compares its Laplacian variance (sharpness) against the surrounding document region. A genuine card was printed as a single unit — the face photo and the text should have similar sharpness. A pasted face will have a measurably different sharpness profile.

Sharpness analysis
# Detect face on the ID card
Haar cascade → largest face ROI

# Compare sharpness
face_blur  = Laplacian(face_ROI).var()
doc_blur   = Laplacian(document_excluding_face).var()
ratio      = face_blur / doc_blur

# Scoring
face sharp (≥ 35) AND ratio 0.5–2.0:    1.0  (consistent)
face sharp AND ratio < 0.3 or > 3.0:    0.3  (paste artifact)
face blurry (15–35):                    0.6
face very blurry (< 15):                0.4
no face detected:                       0.5  (neutral)

The 35.0 Laplacian threshold is the same one we use for the selfie blur gate in our face analysis pipeline. When no face is detected (some ID cards have very small photos, or passport bio pages at odd angles), the signal returns a neutral 0.5 and is effectively excluded from the fusion.
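A NumPy sketch of the sharpness comparison, using a 4-neighbour Laplacian in place of cv2.Laplacian. The face ROI is assumed to have been found by the Haar cascade; the behaviour for ratios between the two bands (0.3–0.5 and 2.0–3.0) is an assumption the table above leaves open:

```python
import numpy as np

def laplacian_var(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian (stand-in for cv2.Laplacian(...).var())."""
    g = gray.astype(np.float64)
    lap = (-4 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def face_sharpness_score(face: np.ndarray, doc: np.ndarray) -> float:
    """Score per the table above: face ROI vs document-excluding-face region."""
    face_blur = laplacian_var(face)
    if face_blur < 15:
        return 0.4   # face very blurry
    if face_blur < 35:
        return 0.6   # face blurry
    ratio = face_blur / (laplacian_var(doc) + 1e-9)
    if 0.5 <= ratio <= 2.0:
        return 1.0   # sharpness consistent with the rest of the card
    if ratio < 0.3 or ratio > 3.0:
        return 0.3   # paste artifact
    return 0.6       # borderline ratio: an assumption, the table leaves this open
```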

Signal 7: Recapture Artifacts

New — Weight: 0.15

When someone photographs a document displayed on a screen, the resulting image carries physical artifacts beyond moire patterns. This signal catches three of them:

Glare detection

Threshold at 240 brightness, then find connected bright clusters. Localized glare (a few large bright spots) is characteristic of screen photography: the camera flash or ambient light reflecting off the glass surface. Diffuse overexposure, which spreads evenly across the frame, is distinguished from clustered glare and not treated as a recapture indicator. Weighted 0.40.

Bezel detection

Compare mean intensity of 5% border strips against the image center. If 3–4 borders are significantly darker than the center, the image was likely captured with the screen bezel visible. This catches the common mistake of not cropping tightly enough around the document on screen. Weighted 0.35.

Perspective distortion

HoughLinesP extracts dominant lines, then measures how far they deviate from 0°/90° axes. A flat document scan has perfectly aligned lines. A hand-held photo has mild tilt (<8°). A recaptured screen photo taken at an angle shows significant angular variance (>15°). Weighted 0.25.
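Of the three sub-signals above, the bezel check is the simplest to sketch. The 30-intensity gap threshold is an illustrative assumption; the 5% border strips and the 3-of-4 rule come from the description:

```python
import numpy as np

def bezel_check(gray: np.ndarray, margin: float = 0.05, dark_gap: float = 30.0) -> dict:
    """Compare mean intensity of 5% border strips against the image centre."""
    h, w = gray.shape
    mh, mw = max(1, int(h * margin)), max(1, int(w * margin))
    center = float(gray[mh:-mh, mw:-mw].mean())
    borders = [
        float(gray[:mh, :].mean()),   # top strip
        float(gray[-mh:, :].mean()),  # bottom strip
        float(gray[:, :mw].mean()),   # left strip
        float(gray[:, -mw:].mean()),  # right strip
    ]
    # count borders significantly darker than the centre
    dark = sum(center - b > dark_gap for b in borders)
    return {"dark_borders": dark, "bezel_suspect": dark >= 3}
```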

Complementary to moire. Moire FFT detects the frequency-domain fingerprint of screen pixel grids. Recapture artifact detection catches the spatial-domain evidence: light reflections, physical bezels, geometric distortion. Together, they cover both the electromagnetic and physical signatures of screen recapture.

Signal 8: Text Sharpness Consistency

New — Weight: 0.10

This is the signal we're most excited about. When a forger edits text on an ID — changing a name, altering a date of birth — the edited text almost always has a different sharpness profile than the original printing. Even if the font and size match perfectly, the rendering pipeline is different: the original text was laser-printed or offset-lithographed on polycarbonate, while the edit was rasterized in software and re-compressed as JPEG.

Detection logic
# Isolate dark text pixels via HSV
text_mask = (V < 80) OR (V < 120 AND S < 60)

# Divide into 8×8 grid
for each block:
    if text_pixels > 5% of block:
        variance = Laplacian(block).var()
        record variance

# Score coefficient of variation (CoV = std/mean)
CoV < 0.3:    1.0  (uniform — authentic)
CoV 0.3–0.5:  0.8  (mild — angled capture)
CoV 0.5–0.8:  0.4  (suspicious editing)
CoV > 0.8:    0.2  (clear text pasting)
< 4 blocks:   0.5  (insufficient data)

The HSV dark-text mask is adapted from our OCR pipeline, which uses the same technique to isolate text for extraction. The key metric is coefficient of variation (standard deviation divided by mean) of Laplacian variance across blocks. A genuine card has uniform text sharpness everywhere. An edited card has anomalous sharpness in the edited regions.
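The grid-and-CoV step can be sketched as follows; the text mask is assumed to have been precomputed by the HSV rule above, and the 4-neighbour Laplacian stands in for the OpenCV one:

```python
import numpy as np

def text_sharpness_cov(gray: np.ndarray, text_mask: np.ndarray, grid: int = 8):
    """Coefficient of variation of Laplacian variance across text-bearing blocks.

    Returns None when fewer than 4 blocks qualify (caller scores a neutral 0.5).
    """
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    variances = []
    for by in range(grid):
        for bx in range(grid):
            block = gray[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw].astype(np.float64)
            mask = text_mask[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            if mask.mean() <= 0.05:  # fewer than 5% text pixels: skip block
                continue
            lap = (-4 * block[1:-1, 1:-1] + block[:-2, 1:-1] + block[2:, 1:-1]
                   + block[1:-1, :-2] + block[1:-1, 2:])
            variances.append(lap.var())
    if len(variances) < 4:
        return None
    v = np.asarray(variances)
    return float(v.std() / (v.mean() + 1e-9))
```

A genuine card produces a low CoV (uniform sharpness everywhere); a locally edited field produces one or two anomalous blocks and pushes the CoV up.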

Catches: Text field editing (name/DOB changes), AI inpainting with different rendering, pasted text overlays

× Weak against: Full-page reprints where all text was re-rendered at once (uniform sharpness), documents with very little text

Signal 9: Barcode Cross-Reference

New — Phase B — Weight: 0.15

Many ID documents carry machine-readable barcodes — PDF417 on US/Canadian driver's licenses, QR codes on South African IDs, Data Matrix on some European cards. These barcodes encode the same data visible on the card: name, date of birth, document number. A forger who edits the visible text almost never edits the barcode, because barcode formats use checksums and structured encoding that's harder to manipulate.

This signal compares three fields between barcode data and MRZ/OCR-extracted data: name (Jaccard token match), date of birth (normalized date comparison), and document number (exact match after normalization).

All 3 fields match:  1.0
2 of 3 fields match: 0.7
1 of 3 fields match: 0.3
0 fields match:      0.1
No barcode found:    0.5 (excluded)

Not all documents have barcodes (passports generally don't, most European IDs don't). When no barcode is found, the signal returns skip: true and its weight is redistributed. But when a barcode is present and disagrees with the OCR/MRZ data, that's one of the strongest fraud indicators possible — nearly zero false positive rate.

Signal 10: MRZ Check Digits

New — Phase B — Weight: 0.10

The Machine Readable Zone on passports and some ID cards includes check digits — single-digit checksums calculated from specific fields (document number, date of birth, expiry date, composite). These are mathematically derived from the field values. A legitimate MRZ has all check digits valid. A forged MRZ where the attacker changed a field but forgot to recalculate the check digit will fail validation.

All check digits valid: 1.0
1 failure:              0.5
2 failures:             0.25
3+ failures:            0.15
No MRZ / OCR source:    0.5 (excluded)

A single check digit failure can be a scanning error (OCR misread, damaged card). Two or more failures are statistically very unlikely on a genuine document — it almost certainly means the MRZ was manually edited without recalculating the checksums. The signal is excluded for OCR-sourced extractions (no check digits available) and for documents without MRZ lines.
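The check digit itself is the ICAO Doc 9303 algorithm: repeating 7-3-1 weights over the character values (digits are themselves, A–Z map to 10–35, filler `<` is 0), summed modulo 10. A sketch, with the scoring table from above:

```python
def mrz_check_digit(field: str) -> int:
    """ICAO Doc 9303 check digit over an uppercase MRZ field."""
    def value(ch: str) -> int:
        if ch.isdigit():
            return int(ch)
        if ch == "<":
            return 0
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    weights = (7, 3, 1)
    return sum(value(c) * weights[i % 3] for i, c in enumerate(field)) % 10

def score_check_digits(failures: int) -> float:
    """Map a count of failed check digits to the table above."""
    return {0: 1.0, 1: 0.5, 2: 0.25}.get(failures, 0.15)
```

The two test values below are the document number and birth date from the ICAO sample passport MRZ, whose published check digits are 6 and 2.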

Two-Phase Execution

Not all signals can run at the same time. The image-based signals (1–8) only need the raw photo. The cross-reference signals (9–10) need MRZ/OCR data, which is extracted in a separate background thread. So we split execution into two phases:

Phase A During ID upload — ~580ms

All 8 image-based signals run in the background thread alongside MRZ extraction, but before the EXIF strip and resize. Results are written to the database immediately so /complete can read them even if MRZ extraction is still running.

Phase B After MRZ extraction — ~6ms

Once MRZ extraction completes, enhance_fraud_with_document_data() reads the existing fraud result from the database, computes the two cross-reference signals, adds them to the signal map, recomputes the fused score, and writes the enhanced result back. If MRZ extraction fails, Phase B is skipped and the score from Phase A stands.

Race condition handled. The /complete endpoint can fire before Phase B finishes. That's fine — it reads whatever fraud data is available. If only Phase A has run, the score is computed from 8 signals. If Phase B has also run, the score includes all 10. The trust engine sees an opaque doc_fraud_score float either way. No code changes needed downstream.

By the Numbers

10 independent fraud signals
8 image-based (Phase A)
2 cross-reference (Phase B)
~580 ms Phase A latency
~6 ms Phase B latency
44 unit tests

Every Signal Is a Bet Against the Attacker

The old pipeline made three bets: moire will catch screen captures, ELA will catch splices, EXIF will catch synthetics. Three bets is not enough when attackers have access to AI inpainting, high-DPI screens, and metadata spoofing tools.

The new pipeline makes ten bets. Edge geometry. Color physics. Sharpness forensics. Barcode integrity. MRZ mathematics. Each bet exploits a different physical or informational constraint. An attacker who beats moire still has to explain why the color gamut looks like a screen. An attacker who matches the text sharpness still has to deal with the barcode that says a different name. An attacker who recalculates the MRZ check digits still has to produce correct edge geometry.

And because the fusion engine redistributes weight dynamically, we can keep adding signals without touching the trust engine, the session flow, or the database schema. Same JSON column, richer content, better decisions.

Related Posts

Deepfake Defense: An IDS/IPS for Identity Verification — The anti-spoofing pipeline that inspired this architecture

Why We Rebuilt Our OCR Pipeline From Scratch — The text extraction engine that feeds barcode cross-reference

How FaceVault Verifies a Face in Under 30 Seconds — The full ML pipeline from upload to decision

Your Face Is Encrypted Before It Hits Disk — How all fraud signal data is encrypted at rest via Vault Transit