Security 14 February 2026 · 14 min read

Building Privacy-First KYC: Why We Delete Your Face

It's 2am. I just pushed the last commit of a pre-launch security audit that turned into a 14-hour rabbit hole. My terminal is full of green checkmarks and my coffee is ice cold. Before I crash, I need to talk about something that's been on my mind all night — why most KYC providers treat your biometric data like it's theirs to keep, and why we chose to do the opposite.

The Uncomfortable Truth About KYC Data

Here's something that should bother you more than it probably does: when you scan your passport for a crypto exchange, a neobank, or a fintech app, there's a very good chance your face, your document, and your personal details are sitting on a server somewhere. Indefinitely. Maybe encrypted. Maybe not. You'll never know.

The KYC industry has a dirty secret. Most providers retain biometric data far longer than they need to — sometimes forever. They call it "audit requirements" or "regulatory compliance." And sure, some of that is real. But a lot of it is just inertia. Nobody sat down and asked: do we actually need to keep this person's face photo six months after we verified them?

I've been building FaceVault for months now, and tonight, after hours of running our final security audit and tightening every bolt I could find, I keep coming back to the same thought: the most secure data is data you don't have.

An uncomfortable stat: In 2024 alone, over 1.1 billion records were exposed in data breaches worldwide. Every one of those records was data that some company decided it needed to keep. How much of it did they actually need?

Verify, Then Forget

Here's the core idea behind FaceVault, and honestly it's embarrassingly simple: confirm the person is who they say they are, then get rid of the evidence.

Think about it like a bouncer at a club. They check your ID, they look at your face, they nod and let you in. They don't photocopy your driving licence and file it in a cabinet out back. They don't need to. The verification happened. The answer was yes or no. That's all that matters.

FaceVault works the same way. A user uploads their ID and takes a selfie with built-in liveness detection. Our AI pipeline compares the faces, extracts the document data, returns a result. Then — and this is the part that makes us different — we start counting down to deletion.

01

Upload & Verify

User submits ID + selfie. AI pipeline runs face matching, MRZ extraction, liveness check. Result: pass or fail.

02

Webhook Fires

Your backend gets the result: match score, extracted data, pass/fail. You have everything you need.

03

Clock Starts Ticking

Photos get a retention deadline. Free tier: 7 days. Pro tier: 30 days. After that, they're gone.

04

Auto-Purge

A daily job deletes expired photos from disk, clears file paths from the database, marks the session as purged. Irreversible.
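The deadline in step 03 is just a timestamp stamped onto the session when verification completes. A minimal sketch of the arithmetic (the `RETENTION_DAYS` mapping and function name are mine, not FaceVault's actual code):

```python
from datetime import datetime, timedelta, timezone

# Retention windows per tier, matching step 03 (illustrative constants)
RETENTION_DAYS = {"free": 7, "pro": 30}

def retention_deadline(tier: str, completed_at: datetime) -> datetime:
    """Timestamp after which the nightly purge job may delete
    this session's photos."""
    return completed_at + timedelta(days=RETENTION_DAYS[tier])

done = datetime(2026, 2, 14, tzinfo=timezone.utc)
free_gone = retention_deadline("free", done)   # 2026-02-21
pro_gone = retention_deadline("pro", done)     # 2026-03-16
```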

The verification result — the pass/fail, the match score, the extracted name and date of birth — that stays in the database as an audit record. But the photos? The actual biometric data? Gone. Truly gone. Not moved to cold storage. Not "soft deleted." os.remove(), shutil.rmtree(), overwritten in the database with None.

Auto-Purge: Your Photos Have an Expiry Date

I'm going to show you the actual code, because I think it matters. When people tell you they "delete your data," you should be able to verify that claim. Here's what runs every night at 3am UTC on our server:

retention.py — the nightly purge
import os
import shutil
from datetime import datetime, timezone

from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

# UPLOAD_ROOT and VerificationSession are defined elsewhere in the app

async def purge_expired_sessions(db: AsyncSession) -> int:
    now = datetime.now(timezone.utc)

    # Find every session past its retention deadline
    result = await db.execute(
        select(VerificationSession).where(
            and_(
                VerificationSession.retained_until <= now,
                VerificationSession.status != "purged",
            )
        )
    )
    sessions = result.scalars().all()

    for session in sessions:
        # Delete every photo file from disk
        for path_attr in ("id_photo_path", "selfie_photo_path",
                          "liveness_photo_path"):
            photo_path = getattr(session, path_attr)
            if photo_path and os.path.exists(photo_path):
                os.remove(photo_path)
            setattr(session, path_attr, None)  # Clear DB reference

        # Nuke the entire per-session upload directory
        session_dir = os.path.join(UPLOAD_ROOT, str(session.id))
        shutil.rmtree(session_dir, ignore_errors=True)
        session.status = "purged"

    await db.commit()
    return len(sessions)

There's no "are you sure?" prompt. No grace period. No recycle bin. When the retention deadline passes, the photos are deleted from disk and the database paths are set to None. The session directory itself is removed. It's not a flag flip — it's actual file deletion.

Retention windows

Free tier: 7 days · Pro tier: 30 days

And here's the thing that took me a while to get right: we send email warnings before deletion. Two days before, and again one day before. Because maybe you need that session data for a dispute, or a compliance review, or a support ticket. We give you time. But when the clock runs out, it runs out.
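The warning schedule above boils down to a window check against the retention deadline. A sketch of the selection logic in pure Python (the real job would query the database; `needs_warning` is an illustrative name):

```python
from datetime import datetime, timedelta, timezone

def needs_warning(retained_until: datetime, now: datetime,
                  days_before: int) -> bool:
    """True when the deadline falls within the next `days_before` days,
    i.e. it's time to send the heads-up email for that window."""
    return retained_until - timedelta(days=days_before) <= now < retained_until

deadline = datetime(2026, 2, 21, tzinfo=timezone.utc)
two_days_out = needs_warning(deadline, datetime(2026, 2, 19, tzinfo=timezone.utc), 2)
too_early = needs_warning(deadline, datetime(2026, 2, 15, tzinfo=timezone.utc), 2)
```

Run once per day per window (2 days out, then 1 day out) and you get exactly the two emails described above.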

Developers can also delete sessions immediately through the API or the dashboard. Don't need it? Kill it now. Don't wait for the timer.

Hashing Everything That Moves

Early in development, we stored API keys with SHA-256 hashing — standard practice. But during our pre-launch security audit, I realised we weren't applying that same rigour everywhere. Verification link tokens? Plaintext. Session tokens? Plaintext. If someone ever got read access to the database — SQL injection, leaked backup, compromised server — they'd have every active token.

The window of exposure would be small (tokens expire), but it's still a window I don't want open. So we applied the same pattern across the board before going live: hash before storage, compare hashes on lookup.

The pattern
import hashlib
import secrets

def sha256(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()

# When creating a verification link:
raw_token = secrets.token_hex(32)        # 64 hex chars
token_hash = sha256(raw_token)           # Stored in DB
token_prefix = raw_token[:8]             # For human identification

# Return raw_token to the developer ONCE.
# We never see it again.

# When someone uses the link, hash the incoming token and look up
# the stored hash; raw tokens are never stored or compared directly:
incoming_hash = sha256(request.token)
link = db.query(token_hash == incoming_hash)  # Compare hashes

The developer gets the raw token exactly once, when the link is created. After that, we only store the hash. If the database leaks, an attacker gets a pile of SHA-256 hashes — which are useless. They can't reverse them into working tokens.

Why SHA-256 and not bcrypt? For API keys and session tokens, the input comes from secrets.token_hex(32): 256 bits of pure entropy. There's no dictionary to attack. No rainbow table helps you. Bcrypt's slow hashing is designed to protect weak passwords. For high-entropy secrets, SHA-256 is perfectly safe and doesn't add 100ms of latency to every API call.
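To put numbers on that claim, here's the back-of-the-envelope version. Guessing the preimage of one of these hashes means searching a 2^256 keyspace, and even an absurdly fast attacker doesn't dent it:

```python
import hashlib
import secrets

token = secrets.token_hex(32)                 # 32 random bytes as 64 hex chars
digest = hashlib.sha256(token.encode()).hexdigest()

# Even at a trillion guesses per second, the expected search time
# is astronomically larger than the age of the universe:
keyspace = 2 ** 256
seconds = keyspace / 10 ** 12
years = seconds / (60 * 60 * 24 * 365)        # on the order of 10**57 years
```

This is why key stretching buys nothing here: the work factor bcrypt adds is meant to compensate for low-entropy inputs, and these inputs have no entropy problem.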

We now hash three things: API keys (fv_live_ prefix), verification link tokens, and session auth tokens. If our database ever shows up on a paste site — which, god forbid — none of those secrets are usable.

The localStorage Moment That Kept Me Up

When I first built the developer dashboard, I did what every React JWT tutorial tells you: stored tokens in localStorage. Access token, refresh token, right there. It works. It's easy. Half the internet does this.

But during our security review, I sat down and really thought about it. And the more I thought, the less I liked it. Because localStorage is one XSS vulnerability away from total account compromise.

Here's the problem: any JavaScript running on your page can read localStorage. If an attacker finds an XSS hole — an unsanitised input, a malicious third-party script, anything — they can grab both tokens with two lines of code:

The attack (this simple)
// Any XSS payload can do this:
const access = localStorage.getItem('fv_access_token');
const refresh = localStorage.getItem('fv_refresh_token');
// Send to attacker's server. Game over.

With the refresh token, the attacker can mint new access tokens for 7 days. They own the account until the refresh token expires or the developer changes their password.

So before launch, I ripped it out. All of it. Tokens now live in httpOnly cookies.

auth_routes.py — how tokens are stored now
def _set_auth_cookies(response, access_token, refresh_token):
    response.set_cookie(
        "fv_access",
        access_token,
        httponly=True,       # JavaScript cannot read this
        secure=True,         # HTTPS only
        samesite="lax",      # CSRF protection
        domain=".facevault.id",
        path="/",
        max_age=3600,        # 1 hour
    )
    response.set_cookie(
        "fv_refresh",
        refresh_token,
        httponly=True,
        secure=True,
        samesite="lax",
        domain=".facevault.id",
        path="/api/v1/auth",  # Only sent to auth endpoints
        max_age=604800,       # 7 days
    )

Three things make this dramatically more secure:

httpOnly

The browser includes the cookie in requests automatically, but JavaScript literally cannot access it. Not through document.cookie, not through any API. XSS can't steal what XSS can't see.

Scoped paths

The refresh token cookie is scoped to /api/v1/auth. It's only sent when hitting the auth endpoints — login, refresh, logout. It never travels with regular API calls. Even if an attacker MITM'd a regular API request, they wouldn't see the refresh token.

SameSite: lax

The cookie is only sent with same-site requests and top-level navigations. A malicious page on another domain can't trigger authenticated API calls on the user's behalf. This is your CSRF protection, built into the cookie itself.

Was this a fun migration at midnight? No. Did it require changing every fetch call to include credentials: 'include'? Yes. Was it worth it? Absolutely. Some things you don't cut corners on just because the deadline is close.

Photos That Can't Be Stolen

Let me walk you through how a face photo travels through FaceVault, because I think the journey tells the story better than any privacy policy could.

Capture: camera to server, nothing in between

When the user takes a selfie in the webapp, the camera frame goes: canvas.drawImage() → canvas.toBlob() → upload(). The photo is captured, compressed to JPEG, and uploaded in one motion. It's never stored on the device. It's never cached in the browser. It's never displayed back to the user. Capture and upload. That's it.

Validation: JPEG or nothing

Before we write a single byte to disk, we check the magic bytes. If the first three bytes aren't FF D8 FF — the JPEG file signature — the upload is rejected. No SVGs with embedded scripts. No HTML files renamed to .jpg. No polyglot files. We also enforce a hard 5 MB limit before the file touches disk. This isn't just about storage — it's about preventing upload-based attacks.
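The check itself is a few lines. A sketch of the validation described above (names are illustrative):

```python
JPEG_MAGIC = b"\xFF\xD8\xFF"          # the JPEG file signature
MAX_UPLOAD_BYTES = 5 * 1024 * 1024    # hard 5 MB cap, enforced before disk I/O

def upload_allowed(data: bytes) -> bool:
    """Reject oversized payloads and anything whose first bytes are
    not the JPEG signature (SVGs, renamed HTML, polyglot files)."""
    return len(data) <= MAX_UPLOAD_BYTES and data.startswith(JPEG_MAGIC)

real_jpeg = upload_allowed(b"\xff\xd8\xff\xe0" + b"\x00" * 16)   # True
svg_smuggle = upload_allowed(b"<svg onload=alert(1)>")           # False
```

Checking magic bytes instead of trusting the file extension or Content-Type header is the whole point: both of those are attacker-controlled.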

Serving: never the original

When a developer views a session photo in the dashboard, they don't get the original JPEG. We convert it to WebP on the fly. This strips all EXIF metadata — GPS coordinates, camera model, timestamps, anything embedded in the original. The response headers tell the browser: Cache-Control: private, no-store. No proxy caches it. No CDN caches it. Close the tab, and the photo exists only on our server, counting down to deletion.
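The conversion step can be sketched with Pillow (an assumed library choice; FaceVault's actual pipeline isn't shown here). Re-encoding to WebP writes only pixel data, so EXIF does not survive unless you pass it along explicitly:

```python
from io import BytesIO

from PIL import Image

def to_clean_webp(jpeg_bytes: bytes, quality: int = 80) -> bytes:
    """Re-encode a JPEG as WebP. Only the pixels make the round trip;
    GPS coordinates, camera model, and timestamps are left behind."""
    img = Image.open(BytesIO(jpeg_bytes))
    out = BytesIO()
    img.convert("RGB").save(out, format="WEBP", quality=quality)
    return out.getvalue()
```

A WebP file is a RIFF container, so the output starts with `RIFF` and carries `WEBP` at offset 8, which makes the re-encoding easy to verify.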

Access: authentication required, always

Photo URLs aren't public. They don't have guessable paths. Every photo request requires a valid JWT, and the API verifies that the requesting developer actually owns the session. You can't access another developer's photos, even if you somehow knew the session ID.
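The ownership rule can be sketched in a few lines (the `Session` shape and function names are illustrative, not FaceVault's actual models):

```python
import uuid
from dataclasses import dataclass

@dataclass
class Session:
    id: str             # random UUID: photo paths keyed on it aren't guessable
    developer_id: str   # the developer who created the session

def can_view_photo(session: Session, requester_developer_id: str) -> bool:
    """A valid JWT is necessary but not sufficient: the requester
    must also own the session the photo belongs to."""
    return session.developer_id == requester_developer_id

s = Session(id=str(uuid.uuid4()), developer_id="dev_alice")
owner_ok = can_view_photo(s, "dev_alice")       # True
stranger = can_view_photo(s, "dev_mallory")     # False, even with the session ID
```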

Photo serving response
return Response(
    content=webp_bytes,           # Converted from JPEG, EXIF stripped
    media_type="image/webp",
    headers={
        "Cache-Control": "private, no-store",
        "Content-Disposition": "inline",
    },
)

The original JPEG never leaves the server. What you see in the dashboard is a lossy, metadata-stripped, cache-disabled WebP preview that exists only in your browser's memory for as long as the tab is open.

Locking the Front Door

When you're deep in building an auth system, it's easy to focus on the happy path: does login work? Do tokens validate? Does TOTP verify? But the security audit forced me to think like an attacker. And the first question an attacker asks is: what happens if I try a million passwords?

That's the thing about building a product — you know the right things to do, you plan to do them, but the "plumbing" gets deprioritised while you're focused on making the core product work. Our pre-launch audit was specifically about catching those gaps before they ever see production traffic.

Here's what we built into the auth layer:

01

Rate Limiting (slowapi)

Login: 10 attempts per 15 minutes. Registration: 5 per hour. Password reset: 5 per hour. Email verification: 10 per hour. These are per-IP limits enforced at the application level.

02

Account Lockout

5 wrong passwords in a row? Account locked for 15 minutes. We track attempts per email, not per IP — so VPN hopping doesn't help. The user gets a 423 response. Clean, clear, not negotiable.

03

TOTP + Backup Codes

Developers can enable TOTP 2FA. When they do, they get 10 one-time backup codes (8-character hex, SHA-256 hashed in the database). If they lose their authenticator, they can still get in. Each backup code works exactly once.

04

SSRF Protection

Webhook callback URLs are validated at creation time. Must be HTTPS. Hostnames are resolved and checked against private and link-local ranges: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and 169.254.0.0/16 (which includes the cloud metadata address 169.254.169.254). No hitting the cloud metadata endpoint through our API.
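The SSRF check in item 04 can be sketched with the standard library (function name is mine; I block all of 169.254.0.0/16, which contains the metadata address). The key detail is resolving the hostname and checking the resulting addresses, not just pattern-matching the URL string:

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Networks a webhook target must never resolve into
BLOCKED_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "169.254.0.0/16",
)]

def webhook_url_allowed(url: str) -> bool:
    """Reject non-HTTPS URLs and any hostname that resolves to a
    private or link-local address (incl. the metadata endpoint)."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if any(addr in net for net in BLOCKED_NETS):
            return False
    return True
```

One caveat worth knowing: resolve-then-fetch can still be raced by DNS rebinding, so the safest setups also pin the resolved address when making the actual webhook request.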

Account lockout logic
from datetime import datetime, timedelta, timezone

# Track failed attempts per developer email
now = datetime.now(timezone.utc)
if not verify_password(body.password, developer.password_hash):
    developer.failed_login_attempts += 1
    if developer.failed_login_attempts >= 5:
        developer.locked_until = now + timedelta(minutes=15)
    await db.commit()
    raise HTTPException(401, "Invalid email or password")

# Successful login: reset everything
developer.failed_login_attempts = 0
developer.locked_until = None

Notice the error message: "Invalid email or password." Not "Invalid password" or "User not found." Generic responses prevent email enumeration. An attacker can't tell whether they have the right email and wrong password, or the wrong email entirely.

GDPR: Not Just a Checkbox

I'll admit something. When I first read the GDPR, my eyes glazed over somewhere around Article 17. It's dense. It's legal. It's written by people who use the phrase "data controller" like it's a job title everyone understands.

But the more I built FaceVault, the more I realised the GDPR isn't just bureaucratic overhead. It's a pretty good blueprint for how you should handle personal data if you actually give a damn about the people behind the data. Here's how we align:

Article 5(1)(c) — Data Minimisation

"Personal data shall be adequate, relevant and limited to what is necessary." We don't store IP addresses. We don't fingerprint browsers. We don't track location. We store: the photos needed for verification, the extracted document data, and the match result. That's it.

Article 5(1)(e) — Storage Limitation

"Kept in a form which permits identification for no longer than is necessary." Our auto-purge system is a direct implementation of this principle. Photos have a defined retention window. When it expires, they're deleted. Not archived. Deleted.

Article 17 — Right to Erasure

"The data subject shall have the right to obtain erasure of personal data without undue delay." Developers can delete any session instantly via the API or dashboard. The photo files are removed from disk immediately, not queued for later deletion. Right to erasure isn't a 30-day process — it's a single API call.

Article 25 — Data Protection by Design

"The controller shall implement appropriate technical measures designed to implement data-protection principles." That's what this entire blog post is about. Privacy isn't a feature we bolted on. It's a design constraint we build around. Every decision — from token hashing to photo expiry to cookie scoping — starts with the question: how do we minimise the blast radius if something goes wrong?

A note on biometric data: Under GDPR Article 9, biometric data is a "special category" that requires explicit consent and legitimate purpose. FaceVault processes face photos only for the explicit purpose of identity verification, and only for the duration necessary to complete and audit that verification. We don't use biometric data for profiling, analytics, or model training.

Being Honest About What We Store

I think a lot of privacy pages are kind of misleading. They tell you what they don't do, but they never tell you what they do. So let me be painfully explicit.

What we store (always)

Session ID — random UUID, not tied to any personal identifier

External user ID — whatever your app passes in (could be a Telegram ID, email, or internal reference)

Extracted document data — name, DOB, nationality from MRZ/OCR (JSON)

User-confirmed data — what the user typed: name, DOB, nationality (JSON)

Face match score — a float between 0 and 1

Pass/fail result — boolean

Timestamps — created_at, completed_at

What we store temporarily (then delete)

ID photo — JPEG file on disk, 7–30 days

Selfie photo — JPEG file on disk (captured during liveness step), 7–30 days

What we never store

× IP addresses

× Browser fingerprints or user agents

× GPS or location data

× Raw camera frames (only the final captures)

× Liveness video sequence (only the final frontal frame)

× EXIF metadata from photos (stripped during WebP conversion)

× Face embeddings or biometric templates (computed in memory, never persisted)

That last one is important. The 512-dimensional ArcFace embedding — the mathematical representation of someone's face — is computed during the /complete call, used for comparison, and then discarded. It never touches the database. It never touches disk. It lives in memory for the duration of the request, and then it's gone.
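For concreteness: ArcFace embeddings are conventionally compared with cosine similarity, so the in-memory comparison plausibly looks like the sketch below. Whether FaceVault uses exactly this metric, and how it maps the result onto its 0-to-1 match score, is an assumption on my part:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two face embeddings. In a setup like the one
    described above, inputs and result exist only for the lifetime
    of the request; nothing is written to disk or database."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 0.0], [1.0, 0.0])       # 1.0: same direction
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0: orthogonal
```

Because the metric needs only the two vectors, there is no architectural reason to persist an embedding after the comparison returns, which is exactly the point being made above.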

2am Thoughts

It's weird, writing this at 2am. The house is quiet. My terminal is still open with the last deployment log scrolling. Every security check is passing. And I keep thinking about the people whose faces will pass through this system.

They'll never read this blog post. They'll never know about the token hashing or the httpOnly cookies or the nightly purge job. They'll open an app, scan their passport, take a selfie, and move on with their day. And that's fine. That's how it should work.

But I think there's something important about building systems that respect people even when they're not watching. About deleting data you could legally keep. About hashing tokens you could've stored in plaintext. About choosing the harder path because the easier one felt wrong.

Privacy isn't a feature. It's not a compliance requirement. It's a decision you make at 2am when nobody's looking and the easier option is right there.

We chose the harder path. And I'm going to bed now. Good night.
