Building Privacy-First KYC:
Why We Delete Your Face
It's 2am. I just pushed the last commit of a pre-launch security audit that turned into a 14-hour rabbit hole. My terminal is full of green checkmarks and my coffee is ice cold. Before I crash, I need to talk about something that's been on my mind all night — why most KYC providers treat your biometric data like it's theirs to keep, and why we chose to do the opposite.
The Uncomfortable Truth About KYC Data
Here's something that should bother you more than it probably does: when you scan your passport for a crypto exchange, a neobank, or a fintech app, there's a very good chance your face, your document, and your personal details are sitting on a server somewhere. Indefinitely. Maybe encrypted. Maybe not. You'll never know.
The KYC industry has a dirty secret. Most providers retain biometric data far longer than they need to — sometimes forever. They call it "audit requirements" or "regulatory compliance." And sure, some of that is real. But a lot of it is just inertia. Nobody sat down and asked: do we actually need to keep this person's face photo six months after we verified them?
I've been building FaceVault for months now, and tonight, after hours of running our final security audit and tightening every bolt I could find, I keep coming back to the same thought: the most secure data is data you don't have.
Verify, Then Forget
Here's the core idea behind FaceVault, and honestly it's embarrassingly simple: confirm the person is who they say they are, then get rid of the evidence.
Think about it like a bouncer at a club. They check your ID, they look at your face, they nod and let you in. They don't photocopy your driving licence and file it in a cabinet out back. They don't need to. The verification happened. The answer was yes or no. That's all that matters.
FaceVault works the same way. A user uploads their ID and takes a selfie with built-in liveness detection. Our AI pipeline compares the faces, extracts the document data, and returns a result. Then — and this is the part that makes us different — we start counting down to deletion.
1. Upload & Verify — User submits ID + selfie. AI pipeline runs face matching, MRZ extraction, liveness check. Result: pass or fail.
2. Webhook Fires — Your backend gets the result: match score, extracted data, pass/fail. You have everything you need.
3. Clock Starts Ticking — Photos get a retention deadline. Free tier: 7 days. Pro tier: 30 days. After that, they're gone.
4. Auto-Purge — A daily job deletes expired photos from disk, clears file paths from the database, and marks the session as purged. Irreversible.
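The retention deadline itself is just a timestamp computed when the session completes. Here's a minimal sketch — the tier mapping and helper name are illustrative, not our actual schema:

```python
from datetime import datetime, timedelta, timezone

# Illustrative tier-to-retention mapping (the real schema may differ)
RETENTION_DAYS = {"free": 7, "pro": 30}

def retention_deadline(tier: str, completed_at: datetime) -> datetime:
    """Return the moment after which a session's photos are eligible for purge."""
    return completed_at + timedelta(days=RETENTION_DAYS[tier])

done = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(retention_deadline("free", done))  # 2025-01-08 00:00:00+00:00
print(retention_deadline("pro", done))   # 2025-01-31 00:00:00+00:00
```

Once that timestamp is set, nothing else needs to track state: the nightly purge job simply queries for sessions whose deadline has passed.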
The verification result — the pass/fail, the match score, the extracted name and date of birth — that stays in the database as an audit record. But the photos? The actual biometric data? Gone. Truly gone. Not moved to cold storage. Not "soft deleted." os.remove(), shutil.rmtree(), overwritten in the database with None.
Auto-Purge: Your Photos Have an Expiry Date
I'm going to show you the actual code, because I think it matters. When people tell you they "delete your data," you should be able to verify that claim. Here's what runs every night at 3am UTC on our server:
async def purge_expired_sessions(db: AsyncSession) -> int:
    now = datetime.now(timezone.utc)
    # Find every session past its retention deadline
    result = await db.execute(
        select(VerificationSession).where(
            and_(
                VerificationSession.retained_until <= now,
                VerificationSession.status != "purged",
            )
        )
    )
    sessions = result.scalars().all()
    count = 0
    for session in sessions:
        # Delete every photo file from disk
        for path_attr in ("id_photo_path", "selfie_photo_path",
                          "liveness_photo_path"):
            photo_path = getattr(session, path_attr)
            if photo_path and os.path.exists(photo_path):
                os.remove(photo_path)
            setattr(session, path_attr, None)  # Clear DB reference
        # Nuke the entire session directory (UPLOAD_ROOT is the upload base dir)
        session_dir = os.path.join(UPLOAD_ROOT, str(session.id))
        shutil.rmtree(session_dir, ignore_errors=True)
        session.status = "purged"
        count += 1
    await db.commit()
    return count
There's no "are you sure?" prompt. No grace period. No recycle bin. When the retention deadline passes, the photos are deleted from disk and the database paths are set to None. The session directory itself is removed. It's not a flag flip — it's actual file deletion.
Retention windows
• Free tier — 7 days
• Pro tier — 30 days
And here's the thing that took me a while to get right: we send email warnings before deletion. Two days before, and again one day before. Because maybe you need that session data for a dispute, or a compliance review, or a support ticket. We give you time. But when the clock runs out, it runs out.
Developers can also delete sessions immediately through the API or the dashboard. Don't need it? Kill it now. Don't wait for the timer.
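Those warning emails fall out of simple arithmetic on the retention deadline. A sketch, with an illustrative function name:

```python
from datetime import datetime, timedelta, timezone

def warning_times(retained_until: datetime) -> list[datetime]:
    """When to email the developer before purge: two days out, then one day out."""
    return [retained_until - timedelta(days=d) for d in (2, 1)]

deadline = datetime(2025, 3, 10, 3, 0, tzinfo=timezone.utc)  # purge moment
first, second = warning_times(deadline)
print(first.isoformat())   # 2025-03-08T03:00:00+00:00
print(second.isoformat())  # 2025-03-09T03:00:00+00:00
```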
Hashing Everything That Moves
Early in development, we stored API keys with SHA-256 hashing — standard practice. But during our pre-launch security audit, I realised we weren't applying that same rigour everywhere. Verification link tokens? Plaintext. Session tokens? Plaintext. If someone ever got read access to the database — SQL injection, leaked backup, compromised server — they'd have every active token.
The window of exposure would be small (tokens expire), but it's still a window I don't want open. So we applied the same pattern across the board before going live: hash before storage, compare hashes on lookup.
import hashlib
import secrets

# When creating a verification link:
raw_token = secrets.token_hex(32)  # 64 hex chars
token_hash = hashlib.sha256(raw_token.encode()).hexdigest()  # stored in DB
token_prefix = raw_token[:8]  # for human identification
# Return raw_token to the developer ONCE.
# We never see it again.

# When someone uses the link:
incoming_hash = hashlib.sha256(request.token.encode()).hexdigest()
link = db.query(token_hash == incoming_hash)  # compare hashes, never raw tokens

The developer gets the raw token exactly once, when the link is created. After that, we only store the hash. If the database leaks, an attacker gets a pile of SHA-256 hashes — which are useless. They can't reverse them into working tokens.
secrets.token_hex(32) — 256 bits of pure entropy. There's no dictionary to attack. No rainbow table helps you. Bcrypt's slow hashing is designed to protect weak passwords. For high-entropy secrets, SHA-256 is perfectly safe and doesn't add 100ms of latency to every API call.
We now hash three things: API keys (fv_live_ prefix), verification link tokens, and session auth tokens. If our database ever shows up on a paste site — which, god forbid — none of those secrets are usable.
Photos That Can't Be Stolen
Let me walk you through how a face photo travels through FaceVault, because I think the journey tells the story better than any privacy policy could.
Capture: camera to server, nothing in between
When the user takes a selfie in the webapp, the camera frame goes: canvas.drawImage() → canvas.toBlob() → upload(). The photo is captured, compressed to JPEG, and uploaded in one motion. It's never stored on the device. It's never cached in the browser. It's never displayed back to the user. Capture and upload. That's it.
Validation: JPEG or nothing
Before we write a single byte to disk, we check the magic bytes. If the first three bytes aren't FF D8 FF — the JPEG file signature — the upload is rejected. No SVGs with embedded scripts. No HTML files renamed to .jpg. No polyglot files. We also enforce a hard 5 MB limit before the file touches disk. This isn't just about storage — it's about preventing upload-based attacks.
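That magic-byte check fits in a few lines. This is a sketch of the idea — the names are illustrative, not the actual handler:

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hard 5 MB cap
JPEG_MAGIC = b"\xff\xd8\xff"        # JPEG file signature

def validate_upload(data: bytes) -> bool:
    """Accept only plausibly-sized files that start with the JPEG magic bytes."""
    return len(data) <= MAX_UPLOAD_BYTES and data[:3] == JPEG_MAGIC

print(validate_upload(b"\xff\xd8\xff\xe0" + b"\x00" * 64))  # True  (JPEG header)
print(validate_upload(b"<svg onload=alert(1)>"))            # False (not a JPEG)
```

Checking content rather than filename or Content-Type is the point: both of those are attacker-controlled, while the first bytes of a genuine JPEG are not.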
Serving: never the original
When a developer views a session photo in the dashboard, they don't get the original JPEG. We convert it to WebP on the fly. This strips all EXIF metadata — GPS coordinates, camera model, timestamps, anything embedded in the original. The response headers tell the browser: Cache-Control: private, no-store. No proxy caches it. No CDN caches it. Close the tab, and the photo exists only on our server, counting down to deletion.
Access: authentication required, always
Photo URLs aren't public. They don't have guessable paths. Every photo request requires a valid JWT, and the API verifies that the requesting developer actually owns the session. You can't access another developer's photos, even if you somehow knew the session ID.
return Response(
    content=webp_bytes,  # Converted from JPEG, EXIF stripped
    media_type="image/webp",
    headers={
        "Cache-Control": "private, no-store",
        "Content-Disposition": "inline",
    },
)

The original JPEG never leaves the server. What you see in the dashboard is a lossy, metadata-stripped, cache-disabled WebP preview that exists only in your browser's memory for as long as the tab is open.
Locking the Front Door
When you're deep in building an auth system, it's easy to focus on the happy path: does login work? Do tokens validate? Does TOTP verify? But the security audit forced me to think like an attacker. And the first question an attacker asks is: what happens if I try a million passwords?
That's the thing about building a product — you know the right things to do, you plan to do them, but the "plumbing" gets deprioritised while you're focused on making the core product work. Our pre-launch audit was specifically about catching those gaps before they ever see production traffic.
Here's what we built into the auth layer:
Rate Limiting (slowapi)
Login: 10 attempts per 15 minutes. Registration: 5 per hour. Password reset: 5 per hour. Email verification: 10 per hour. These are per-IP limits enforced at the application level.
Account Lockout
5 wrong passwords in a row? Account locked for 15 minutes. We track attempts per email, not per IP — so VPN hopping doesn't help. The user gets a 423 response. Clean, clear, not negotiable.
TOTP + Backup Codes
Developers can enable TOTP 2FA. When they do, they get 10 one-time backup codes (8-character hex, SHA-256 hashed in the database). If they lose their authenticator, they can still get in. Each backup code works exactly once.
SSRF Protection
Webhook callback URLs are validated at creation time. Must be HTTPS. Hostnames are resolved and checked against private IP ranges: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254. No hitting the cloud metadata endpoint through our API.
# Track failed attempts per developer email
if not verify_password(body.password, developer.password_hash):
    developer.failed_login_attempts += 1
    if developer.failed_login_attempts >= 5:
        developer.locked_until = now + timedelta(minutes=15)
    await db.commit()
    raise HTTPException(401, "Invalid email or password")

# Successful login: reset everything
developer.failed_login_attempts = 0
developer.locked_until = None

Notice the error message: "Invalid email or password." Not "Invalid password" or "User not found." Generic responses prevent email enumeration. An attacker can't tell whether they have the right email and wrong password, or the wrong email entirely.
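The SSRF validation described above can be sketched with the standard library. This is an illustrative version of the check, not the production code:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_webhook_url(url: str) -> bool:
    """HTTPS only, and every address the hostname resolves to must be public."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hostname: reject
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False  # blocks 10/8, 172.16/12, 192.168/16, 127/8, 169.254/16
    return True

print(is_safe_webhook_url("http://example.com/hook"))                   # False: not HTTPS
print(is_safe_webhook_url("https://169.254.169.254/latest/meta-data"))  # False: metadata IP
```

One caveat worth knowing: validating at creation time doesn't stop DNS rebinding, so a stricter design re-resolves and re-checks at delivery time too.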
GDPR: Not Just a Checkbox
I'll admit something. When I first read the GDPR, my eyes glazed over somewhere around Article 17. It's dense. It's legal. It's written by people who use the phrase "data controller" like it's a job title everyone understands.
But the more I built FaceVault, the more I realised the GDPR isn't just bureaucratic overhead. It's a pretty good blueprint for how you should handle personal data if you actually give a damn about the people behind the data. Here's how we align:
Article 5(1)(c) — Data Minimisation
"Personal data shall be adequate, relevant and limited to what is necessary." We don't store IP addresses. We don't fingerprint browsers. We don't track location. We store: the photos needed for verification, the extracted document data, and the match result. That's it.
Article 5(1)(e) — Storage Limitation
"Kept in a form which permits identification for no longer than is necessary." Our auto-purge system is a direct implementation of this principle. Photos have a defined retention window. When it expires, they're deleted. Not archived. Deleted.
Article 17 — Right to Erasure
"The data subject shall have the right to obtain erasure of personal data without undue delay." Developers can delete any session instantly via the API or dashboard. The photo files are removed from disk immediately, not queued for later deletion. Right to erasure isn't a 30-day process — it's a single API call.
Article 25 — Data Protection by Design
"The controller shall implement appropriate technical measures designed to implement data-protection principles." That's what this entire blog post is about. Privacy isn't a feature we bolted on. It's a design constraint we build around. Every decision — from token hashing to photo expiry to cookie scoping — starts with the question: how do we minimise the blast radius if something goes wrong?
Being Honest About What We Store
I think a lot of privacy pages are kind of misleading. They tell you what they don't do, but they never tell you what they do. So let me be painfully explicit.
What we store (always)
• Session ID — random UUID, not tied to any personal identifier
• External user ID — whatever your app passes in (could be a Telegram ID, email, or internal reference)
• Extracted document data — name, DOB, nationality from MRZ/OCR (JSON)
• User-confirmed data — what the user typed: name, DOB, nationality (JSON)
• Face match score — a float between 0 and 1
• Pass/fail result — boolean
• Timestamps — created_at, completed_at
What we store temporarily (then delete)
• ID photo — JPEG file on disk, 7–30 days
• Selfie photo — JPEG file on disk (captured during liveness step), 7–30 days
What we never store
× IP addresses
× Browser fingerprints or user agents
× GPS or location data
× Raw camera frames (only the final captures)
× Liveness video sequence (only the final frontal frame)
× EXIF metadata from photos (stripped during WebP conversion)
× Face embeddings or biometric templates (computed in memory, never persisted)
That last one is important. The 512-dimensional ArcFace embedding — the mathematical representation of someone's face — is computed during the /complete call, used for comparison, and then discarded. It never touches the database. It never touches disk. It lives in memory for the duration of the request, and then it's gone.
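The comparison itself is typically a similarity score between the two vectors. Here's a toy sketch using cosine similarity — whether FaceVault uses cosine or another metric is an assumption on my part, and real ArcFace embeddings have 512 dimensions, not two:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embeddings: ~1.0 = same direction, ~0.0 = unrelated.
    Both vectors exist only in memory; nothing here touches disk or a database."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0
```

The key property is that the score is all you need to keep: it answers "same person?" without retaining anything that could reconstruct a face.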
2am Thoughts
It's weird, writing this at 2am. The house is quiet. My terminal is still open with the last deployment log scrolling. Every security check is passing. And I keep thinking about the people whose faces will pass through this system.
They'll never read this blog post. They'll never know about the token hashing or the httpOnly cookies or the nightly purge job. They'll open an app, scan their passport, take a selfie, and move on with their day. And that's fine. That's how it should work.
But I think there's something important about building systems that respect people even when they're not watching. About deleting data you could legally keep. About hashing tokens you could've stored in plaintext. About choosing the harder path because the easier one felt wrong.
Privacy isn't a feature. It's not a compliance requirement. It's a decision you make at 2am when nobody's looking and the easier option is right there.
We chose the harder path. And I'm going to bed now. Good night.
References & Further Reading
GDPR Article 5 — Principles relating to processing of personal data
GDPR Article 17 — Right to erasure ('right to be forgotten')
GDPR Article 25 — Data protection by design and by default
OWASP: HttpOnly Cookie Flag — Why httpOnly cookies matter
OWASP: JWT Security Cheat Sheet — Token storage best practices
How FaceVault Verifies a Face in Under 30 Seconds — the AI pipeline behind the verification
FaceVault API Documentation — integrate in 10 minutes