From 3 Signals to 10: How We Rebuilt
Document Fraud Detection
A forger edits the name on a South African ID card in Photoshop. The text is pixel-perfect. The JPEG compression looks clean. But the text sharpness in the edited region doesn't match the rest of the card, the color histogram has gaps that real ink doesn't produce, and the barcode on the back still says the original name. Our old 3-signal pipeline had no check for any of those three tells. The new one catches all three, plus seven more attack surfaces we weren't even looking at.
Why 3 Signals Weren't Enough
FaceVault's original document fraud pipeline ran three signals: moire FFT (screen capture detection), Error Level Analysis (splice detection), and EXIF metadata checks. A fourth signal — barcode decoding — extracted data but never scored it. These three signals were good at catching the obvious attacks: photographing a screen, crude Photoshop splices, GAN-generated images with no EXIF.
But document fraud has evolved. Attackers don't just put an ID on a screen anymore. They print high-quality reproductions. They use AI inpainting to edit individual fields while preserving compression artifacts. They photograph real cards and swap just the face photo. Each of these attacks slips through at least one of the original three signals.
No physical document validation
We never checked whether the image contained an actual card-shaped object with the right aspect ratio. A cropped screenshot of a document could pass all three checks.
Barcode data was decoded but never compared
We extracted PDF417/QR data from barcodes but stored it as informational. The obvious next step — comparing barcode fields against OCR/MRZ data — was left on the roadmap.
No text-level forensics
Sophisticated forgeries edit individual text fields (name, DOB) while leaving everything else untouched. ELA catches crude splices, but AI inpainting tools match compression artifacts well enough to fool block-level analysis. We needed character-level sharpness consistency checks.
MRZ check digits were extracted but never scored
PassportEye already validated MRZ check digits per field. We stored the results but never factored them into the fraud score. A forged MRZ with failing check digits was treated the same as a valid one.
The upgrade closes all four gaps. Seven new signals bring the total to ten, following the same weighted fusion pattern we use in our anti-spoofing pipeline. Each signal is independent, scores 0–1, and missing signals are excluded with weight redistribution.
The 10-Signal Fusion Architecture
Every signal follows the same contract: take an input, return {"score": float, "details": dict}. A score of 1.0 means "this looks authentic". A score near 0.0 means "this looks forged". Signals that can't run (missing data, library unavailable) return "skip": true and are excluded from fusion.
FRAUD_SIGNAL_WEIGHTS = {
# Image-based (Phase A — run on upload)
"moire": 0.35, # FFT screen detection
"ela": 0.20, # Error Level Analysis
"exif": 0.10, # EXIF metadata
"edge_detection": 0.15, # Card boundary + aspect ratio
"color_consistency": 0.15, # Histogram gamut + correlation
"face_on_id": 0.10, # ID face quality check
"recapture": 0.15, # Glare, bezels, perspective
"text_sharpness": 0.10, # Block-level text consistency
# Cross-reference (Phase B — run after MRZ extraction)
"barcode_xref": 0.15, # Barcode vs MRZ/OCR fields
"mrz_check_digits": 0.10, # PassportEye check digit validation
}
# Sum = 1.55 (intentional)
# Divide by available weight sum — same as anti_spoofing.py

The weights sum to more than 1.0. This is intentional. At runtime, we divide by the sum of available weights — the signals that actually returned data. If barcode cross-reference can't run (no barcode found), its 0.15 weight is redistributed proportionally across the other nine signals. This guarantees the final score is always 0–1 regardless of how many signals fired.
Signal 1: Moire FFT
The original signal, now with a reduced weight. When a camera photographs a screen, the screen's pixel grid creates interference patterns visible as directional spikes in the mid-frequency annulus of the 2D FFT. Three sub-signals: angular peak ratio (moire directionality), spectral flatness (1/f² deviation), and HF/LF energy ratio (synthetic content).
Weight reduced from 0.50 to 0.35 because the new recapture and color signals now share responsibility for screen detection. Moire remains the strongest single indicator for direct screen photography, but it misses high-DPI screens and angled captures that the new signals catch.
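One of the three sub-signals, the HF/LF energy ratio, can be sketched with NumPy alone. The 0.25 radius cutoff here is illustrative, not the production value:

```python
import numpy as np

def hf_lf_ratio(gray: np.ndarray) -> float:
    """Ratio of mid/high- to low-frequency energy in the 2D FFT magnitude.
    Synthetic or screen-recaptured content pushes energy into higher bands."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(np.float64))))
    h, w = mag.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)  # radial distance from DC
    r_max = min(h, w) // 2
    low = mag[r < 0.25 * r_max].sum()
    high = mag[(r >= 0.25 * r_max) & (r <= r_max)].sum()
    return float(high / (low + 1e-9))
```

A flat or smoothly lit surface concentrates energy near DC; noise-like or grid-interference content spreads it outward, raising the ratio.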
Signal 2: Error Level Analysis
Re-saves the JPEG at Q75 and Q90, computes pixel-wise difference from the original. Spliced regions show different compression artifacts because they were last saved at a different quality level. Also detects double-compression (screen photo → camera → server) via the Q75/Q90 error ratio.
Weight reduced from 0.30 to 0.20. ELA is reliable for crude splices but increasingly unreliable against AI inpainting tools that match the surrounding compression level. The new text sharpness signal picks up where ELA falls short.
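The core re-save trick is simple enough to show with Pillow. This is a sketch of the technique only; the production Q75/Q90 ratio logic is omitted:

```python
import io
from PIL import Image, ImageChops, ImageStat

def ela_error(img: Image.Image, quality: int = 90) -> float:
    """Re-save as JPEG at the given quality and return the mean absolute
    pixel error. Spliced regions re-compress differently, so they light
    up against the rest of the image."""
    rgb = img.convert("RGB")
    buf = io.BytesIO()
    rgb.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(rgb, resaved)
    return sum(ImageStat.Stat(diff).mean) / 3  # average over R, G, B
```

Running this at both Q75 and Q90 and comparing the two errors gives the double-compression ratio described above.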
Signal 3: EXIF Metadata
Flags editing software tags (Photoshop, GIMP, Canva), completely stripped metadata (GAN indicator), missing camera info, and suspicious dimensions. Weight reduced from 0.20 to 0.10 — EXIF is easily spoofable and our server intentionally strips EXIF on re-save, so this signal carries less forensic value after processing.
Signal 4: Document Edge Detection
A real ID card photographed on a surface should have a visible rectangular boundary. A cropped screenshot or a full-bleed scan won't. This signal uses Canny edge detection, contour extraction, and polygon approximation to find the card outline.
# Canny + contour extraction
GaussianBlur(5,5) → Canny(50,150) → dilate
findContours → sort by area
approxPolyDP → look for 4-point polygon
# Scoring
if quad found AND aspect ratio within 0.15 of ISO 7810: 1.0
if quad found AND non-standard aspect ratio: 0.85
if 5-6 sided polygon (partial edges): 0.6
if no card edges found: 0.3
The ISO 7810 ID-1 standard specifies 85.6 × 53.98mm (aspect ratio 1.586). Passports (ID-3) are 125 × 88mm (1.420). We check the detected quad's aspect ratio against both standards. A card photographed at an angle will produce a non-rectangular quadrilateral, which minAreaRect normalizes before comparing.
✓ Catches: Cropped screenshots, digital scans without borders, partial document crops
× Weak against: Physical card photographed flush against a same-color surface (edges blend into background)
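The aspect-ratio half of the scoring is pure arithmetic. A sketch, with the 0.15 tolerance and the two ISO 7810 ratios from above (`aspect_ratio_score` is a hypothetical helper name):

```python
# ISO 7810 nominal sizes: ID-1 cards 85.60 x 53.98 mm, ID-3 passports 125 x 88 mm
ISO_7810_RATIOS = {"ID-1": 85.60 / 53.98, "ID-3": 125.0 / 88.0}

def aspect_ratio_score(width: float, height: float, tol: float = 0.15) -> float:
    """1.0 if the detected quad matches a standard card ratio, else 0.85
    (quad found, non-standard ratio). Orientation-agnostic."""
    ratio = max(width, height) / min(width, height)
    if any(abs(ratio - std) <= tol for std in ISO_7810_RATIOS.values()):
        return 1.0
    return 0.85
```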
Signal 5: Color Consistency
Screens have a fundamentally different color signature than physical objects. A camera photograph of ink on plastic has wide gamut (many histogram bins occupied), high cross-channel correlation (natural lighting creates correlated R/G/B responses), and many histogram peaks (complex surface textures). Screens compress all of these.
This signal complements moire FFT. Moire catches the pixel grid interference pattern; color consistency catches the gamut limitation. A high-DPI Retina screen might not produce visible moire, but it still compresses the color gamut compared to physical ink on polycarbonate.
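Two of the sub-metrics, gamut occupancy and cross-channel correlation, can be sketched with NumPy. The mapping from these raw numbers to a 0–1 score is omitted here:

```python
import numpy as np

def color_metrics(img: np.ndarray) -> dict:
    """img: HxWx3 uint8. Physical ink under natural light occupies many
    histogram bins and shows correlated R/G/B responses; screens compress both."""
    occupancy = np.mean([
        np.count_nonzero(np.bincount(img[..., c].ravel(), minlength=256)) / 256
        for c in range(3)
    ])
    flat = img.reshape(-1, 3).astype(np.float64)
    corr = np.corrcoef(flat.T)  # 3x3 cross-channel correlation matrix
    mean_corr = (corr[0, 1] + corr[0, 2] + corr[1, 2]) / 3
    return {"gamut_occupancy": float(occupancy),
            "channel_correlation": float(mean_corr)}
```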
Signal 6: Face-on-ID Quality
Every ID card has a face photo on it. This signal detects that face using a Haar cascade, then compares its Laplacian variance (sharpness) against the surrounding document region. A genuine card was printed as a single unit — the face photo and the text should have similar sharpness. A pasted face will have a measurably different sharpness profile.
# Detect face on the ID card
Haar cascade → largest face ROI
# Compare sharpness
face_blur = Laplacian(face_ROI).var()
doc_blur = Laplacian(document_excluding_face).var()
ratio = face_blur / doc_blur
# Scoring
face sharp (≥ 35) AND ratio 0.5–2.0: 1.0 (consistent)
face sharp AND ratio < 0.3 or > 3.0: 0.3 (paste artifact)
face blurry (15–35): 0.6
face very blurry (< 15): 0.4
no face detected: 0.5 (neutral)

The 35.0 Laplacian threshold is the same one we use for the selfie blur gate in our face analysis pipeline. When no face is detected (some ID cards have very small photos, or passport bio pages at odd angles), the signal returns a neutral 0.5 and is effectively excluded from the fusion.
Signal 7: Recapture Artifacts
When someone photographs a document displayed on a screen, the resulting image carries physical artifacts beyond moire patterns. This signal catches three of them:
Glare detection
Threshold at 240 brightness, find connected bright clusters. Localized glare (a few large bright spots) is characteristic of screen photography — the camera flash or ambient light reflecting off the glass surface. Diffuse overexposure from poor lighting on a genuine capture is scored differently from tight, clustered glare, which is the screen tell. Weighted 0.40.
Bezel detection
Compare mean intensity of 5% border strips against the image center. If 3–4 borders are significantly darker than the center, the image was likely captured with the screen bezel visible. This catches the common mistake of not cropping tightly enough around the document on screen. Weighted 0.35.
Perspective distortion
HoughLinesP extracts dominant lines, then measures how far they deviate from 0°/90° axes. A flat document scan has perfectly aligned lines. A hand-held photo has mild tilt (<8°). A recaptured screen photo taken at an angle shows significant angular variance (>15°). Weighted 0.25.
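The perspective scoring reduces to measuring how far line angles stray from the nearest axis. A sketch using the 8° and 15° cutoffs above; the intermediate 0.7 score and the neutral fallback are illustrative assumptions:

```python
def perspective_score(line_angles_deg: list[float]) -> float:
    """Score angular deviation of dominant lines (e.g. from HoughLinesP)
    from the nearest 0/90-degree axis."""
    if not line_angles_deg:
        return 0.5  # no lines found: neutral (assumed fallback)
    deviations = [min(a % 90, 90 - (a % 90)) for a in line_angles_deg]
    worst = max(deviations)
    if worst < 8:       # hand-held tilt: acceptable
        return 1.0
    if worst <= 15:     # borderline
        return 0.7
    return 0.3          # significant angular variance: likely angled recapture
```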
Signal 8: Text Sharpness Consistency
This is the signal we're most excited about. When a forger edits text on an ID — changing a name, altering a date of birth — the edited text almost always has a different sharpness profile than the original printing. Even if the font and size match perfectly, the rendering pipeline is different: the original text was laser-printed or offset-lithographed on polycarbonate, while the edit was rasterized in software and re-compressed as JPEG.
# Isolate dark text pixels via HSV
text_mask = (V < 80) OR (V < 120 AND S < 60)
# Divide into 8×8 grid
for each block:
if text_pixels > 5% of block:
variance = Laplacian(block).var()
record variance
# Score coefficient of variation (CoV = std/mean)
CoV < 0.3: 1.0 (uniform — authentic)
CoV 0.3–0.5: 0.8 (mild — angled capture)
CoV 0.5–0.8: 0.4 (suspicious editing)
CoV > 0.8: 0.2 (clear text pasting)
< 4 blocks: 0.5 (insufficient data)

The HSV dark-text mask is adapted from our OCR pipeline, which uses the same technique to isolate text for extraction. The key metric is coefficient of variation (standard deviation divided by mean) of Laplacian variance across blocks. A genuine card has uniform text sharpness everywhere. An edited card has anomalous sharpness in the edited regions.
✓ Catches: Text field editing (name/DOB changes), AI inpainting with different rendering, pasted text overlays
× Weak against: Full-page reprints where all text was re-rendered at once (uniform sharpness), documents with very little text
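Given the per-block Laplacian variances, the CoV-to-score mapping shown above is a few lines of stdlib Python:

```python
import statistics

def cov_score(block_variances: list[float]) -> float:
    """Map the coefficient of variation of per-block Laplacian variances
    to the 0-1 scoring bands."""
    if len(block_variances) < 4:
        return 0.5  # insufficient data
    mean = statistics.fmean(block_variances)
    cov = statistics.pstdev(block_variances) / mean if mean else 0.0
    if cov < 0.3:
        return 1.0   # uniform sharpness: authentic
    if cov < 0.5:
        return 0.8   # mild variation: angled capture
    if cov < 0.8:
        return 0.4   # suspicious editing
    return 0.2       # clear text pasting
```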
Signal 9: Barcode Cross-Reference
Many ID documents carry machine-readable barcodes — PDF417 on US/Canadian driver's licenses, QR codes on South African IDs, Data Matrix on some European cards. These barcodes encode the same data visible on the card: name, date of birth, document number. A forger who edits the visible text almost never edits the barcode, because barcode formats use checksums and structured encoding that's harder to manipulate.
This signal compares three fields between barcode data and MRZ/OCR-extracted data: name (Jaccard token match), date of birth (normalized date comparison), and document number (exact match after normalization).
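The name comparison is a token-set Jaccard match, order- and case-insensitive. A sketch (`jaccard_name_match` is a hypothetical helper; date and document-number normalization are omitted):

```python
import re

def jaccard_name_match(a: str, b: str) -> float:
    """Jaccard similarity over uppercase name tokens: |A & B| / |A | B|."""
    ta = set(re.findall(r"[A-Z]+", a.upper()))
    tb = set(re.findall(r"[A-Z]+", b.upper()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```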
Not all documents have barcodes (passports generally don't, most European IDs don't). When no barcode is found, the signal returns skip: true and its weight is redistributed. But when a barcode is present and disagrees with the OCR/MRZ data, that's one of the strongest fraud indicators possible — nearly zero false positive rate.
Signal 10: MRZ Check Digits
The Machine Readable Zone on passports and some ID cards includes check digits — single-digit checksums calculated from specific fields (document number, date of birth, expiry date, composite). These are mathematically derived from the field values. A legitimate MRZ has all check digits valid. A forged MRZ where the attacker changed a field but forgot to recalculate the check digit will fail validation.
A single check digit failure can be a scanning error (OCR misread, damaged card). Two or more failures are statistically very unlikely on a genuine document — it almost certainly means the MRZ was manually edited without recalculating the checksums. The signal is excluded for OCR-sourced extractions (no check digits available) and for documents without MRZ lines.
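The check digit algorithm itself is public (ICAO Doc 9303): value each character (digits as-is, A=10 through Z=35, filler `<` as 0), weight positions by 7, 3, 1 repeating, and take the sum mod 10:

```python
def mrz_check_digit(field: str) -> int:
    """ICAO Doc 9303 check digit: weights 7,3,1 repeating, sum mod 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch.isalpha():
            value = ord(ch.upper()) - ord("A") + 10
        else:               # '<' filler counts as 0
            value = 0
        total += value * weights[i % 3]
    return total % 10
```

Changing any character in the field shifts the expected digit, which is exactly why a forger who edits a date of birth without recalculating gets caught.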
Two-Phase Execution
Not all signals can run at the same time. The image-based signals (1–8) only need the raw photo. The cross-reference signals (9–10) need MRZ/OCR data, which is extracted in a separate background thread. So we split execution into two phases:
Phase A: all eight image-based signals run in the background thread alongside MRZ extraction, but before the EXIF strip and resize. Results are written to the database immediately so /complete can read them even if MRZ extraction is still running.
Phase B: once MRZ extraction completes, enhance_fraud_with_document_data() reads the existing fraud result from the database, computes the two cross-reference signals, adds them to the signal map, recomputes the fused score, and writes the enhanced result back. If MRZ extraction fails, Phase B is skipped and the Phase A score stands.
The /complete endpoint can fire before Phase B finishes. That's fine — it reads whatever fraud data is available. If only Phase A has run, the score is computed from 8 signals. If Phase B has also run, the score includes all 10. The trust engine sees an opaque doc_fraud_score float either way. No code changes needed downstream.
By the Numbers
10
Independent fraud signals
8
Image-based (Phase A)
2
Cross-reference (Phase B)
580
Phase A milliseconds
6
Phase B milliseconds
44
Unit tests
Every Signal Is a Bet Against the Attacker
The old pipeline made three bets: moire will catch screen captures, ELA will catch splices, EXIF will catch synthetics. Three bets is not enough when attackers have access to AI inpainting, high-DPI screens, and metadata spoofing tools.
The new pipeline makes ten bets. Edge geometry. Color physics. Sharpness forensics. Barcode integrity. MRZ mathematics. Each bet exploits a different physical or informational constraint. An attacker who beats moire still has to explain why the color gamut looks like a screen. An attacker who matches the text sharpness still has to deal with the barcode that says a different name. An attacker who recalculates the MRZ check digits still has to produce correct edge geometry.
And because the fusion engine redistributes weight dynamically, we can keep adding signals without touching the trust engine, the session flow, or the database schema. Same JSON column, richer content, better decisions.
Related Posts
Deepfake Defense: An IDS/IPS for Identity Verification — The anti-spoofing pipeline that inspired this architecture
Why We Rebuilt Our OCR Pipeline From Scratch — The text extraction engine that feeds barcode cross-reference
How FaceVault Verifies a Face in Under 30 Seconds — The full ML pipeline from upload to decision
Your Face Is Encrypted Before It Hits Disk — How all fraud signal data is encrypted at rest via Vault Transit