Security 22 February 2026 · 10 min read

Your Face Is Encrypted Before It Hits Disk

A few weeks ago I wrote about why we delete your face. Auto-purge, retention windows, the whole "verify then forget" philosophy. I still believe in all of that. But it left a gap that's been nagging at me — what happens to your photos during those 7–30 days while they're sitting on our server, waiting to be deleted?

The Gap Between "We'll Delete It" and "It's Safe Now"

Here's the scenario that kept bugging me. Someone uploads their passport and selfie. Our AI pipeline runs. Verification completes. The photos now sit on disk for up to 30 days, depending on the developer's tier. During that window, the files are there. On a Linux filesystem. As JPEG bytes. Readable by anything with the right path.

We had mitigations: photos are never served via static file handlers, every access requires JWT authentication, the dashboard checks developer ownership. But the files themselves? Plaintext. If someone got shell access to the server — a compromised dependency, an SSH key leak, a zero-day in the container runtime — they could cat every face photo on disk.

"We delete your data after 7 days" is a great promise. But it's not a security control. The question isn't just how long you keep the data — it's what happens to it while you have it.

The uncomfortable truth: Every KYC provider that says "your data is secure" but stores photos as plaintext JPEG files on disk is one server compromise away from a biometric data breach. Retention policies don't protect against exfiltration. Encryption does.

AES-256-GCM: Every File, Every Time

So we added encryption at rest. Not the "tick a box on your cloud provider's dashboard" kind. Application-level encryption, with a key we control, wrapping every photo before it touches the filesystem.

Here's what happens now when you upload a photo:

01

Upload & Validate

Photo arrives over TLS. Magic bytes checked (FF D8 FF). EXIF stripped. Resized. Written temporarily as plaintext for ML processing.

02

ML Pipeline Runs

Background threads compute face embeddings, run anti-spoofing, extract MRZ data. All while the photo is briefly in plaintext.

03

Encrypt & Replace

Once ML is done, the photo is encrypted with AES-256-GCM, written as .jpg.enc, and the plaintext original is deleted. The window of plaintext exposure is seconds, not days.

04

Ciphertext Until Purge

For the rest of the retention window, the file on disk is nonce (12 bytes) || ciphertext + authentication tag. Not a JPEG. Not openable. Not parseable. Just noise.

crypto.py — the encrypt function
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(plaintext: bytes) -> bytes:
    """Encrypt with AES-256-GCM. Returns nonce || ciphertext+tag."""
    key = _get_key()
    if key is None:
        return plaintext  # No key configured = passthrough

    nonce = os.urandom(12)           # 96-bit random nonce
    aesgcm = AESGCM(key)
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)
    return nonce + ciphertext        # 12 + len(plaintext) + 16 bytes

AES-256-GCM isn't exotic. It's what TLS 1.3 uses. It's what your browser is using right now to read this page. The "GCM" part is important — it's authenticated encryption. If someone tampers with the ciphertext (flips a bit, truncates the file), decryption fails. You get either the original data or nothing. No silent corruption.

Every photo gets a unique nonce. Each encryption operation generates a fresh 12-byte random nonce via os.urandom(). Even if two users upload identical photos, the ciphertext is completely different. There's no pattern to exploit, no frequency analysis to attempt.

Not Just Photos

Encrypting the photos was the obvious move. But the moment I looked at what else was on disk, I wasn't happy. Our ML pipeline caches intermediate results as JSON files alongside each photo:

straight_embedding.json — 512-dimensional ArcFace face embedding
anti_spoofing_cache.json — 12-signal deepfake detection results
rppg_result.json — remote photoplethysmography liveness signal
meta.json — rPPG frame metadata (fps, frame count)

That face embedding? It's a mathematical fingerprint of someone's face. You can search for it. You can match it against other faces. If someone exfiltrated those JSON files, they'd have a biometric database that survives even after we delete the photos.

So we encrypt all of them. Same key, same AES-256-GCM, same nonce-per-file scheme. The JSON is serialized, encrypted as raw bytes, and written to disk. Reading it back requires the key.

crypto.py — JSON cache helpers
import json

def write_json_cache(path: str, data) -> None:
    """Write JSON data to file, encrypting if key is configured."""
    raw = json.dumps(data).encode("utf-8")
    encrypted = encrypt_file(raw)
    with open(path, "wb") as f:
        f.write(encrypted)

def read_json_cache(path: str):
    """Read JSON data from file, decrypting if needed."""
    with open(path, "rb") as f:
        raw = f.read()
    decrypted = decrypt_file(raw)
    return json.loads(decrypted)

We also encrypt PII fields in the database itself — confirmed names, dates of birth, nationalities. Same AES-256-GCM, base64-encoded. If someone gets a database dump, they see ciphertext, not personal data.
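For the database fields, the scheme is the same AES-256-GCM blob wrapped in base64 so it fits a text column. A sketch under that description — the helper names and the explicit key parameter are mine, not FaceVault's:

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_db_field(value: str, key: bytes) -> str:
    """AES-256-GCM-encrypt a PII string and base64 it for storage.
    Hypothetical helper — the post only specifies the scheme."""
    nonce = os.urandom(12)
    blob = nonce + AESGCM(key).encrypt(nonce, value.encode("utf-8"), None)
    return base64.b64encode(blob).decode("ascii")

def decrypt_db_field(stored: str, key: bytes) -> str:
    """Inverse: base64-decode, split nonce || ciphertext+tag, decrypt."""
    blob = base64.b64decode(stored)
    return AESGCM(key).decrypt(blob[:12], blob[12:], None).decode("utf-8")
```

A dumped `confirmed_data` column then contains only base64 noise; without the key, a name or date of birth is unrecoverable.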

What's encrypted at rest

On disk

ID photos (.jpg.enc)

Selfie photos (.jpg.enc)

Face embeddings (JSON)

Anti-spoofing caches (JSON)

rPPG results (JSON)

In database

confirmed_data (name, DOB)

mrz_data (passport fields)

face_embedding (512-d vector)

Belt and Suspenders: Blocking the File Path

Encryption is the main defense. But we also asked: what if someone tries to access the files via the web server? Our Caddy reverse proxy serves a /media/ path for some public assets. The KYC photos live under /media/kyc/.

Even though those files are now encrypted and useless without the key, we don't want them served at all. So we added a hard block:

Caddyfile
# Block all KYC data from the web — hard 403
handle_path /media/kyc/* {
    respond 403
}

# General media (public assets) — served normally
handle_path /media/* {
    root * /media
    file_server
}

Order matters. The /media/kyc/* block comes first. Anyone hitting that path — whether they know the session ID or not, whether they're authenticated or not — gets a 403. Photos are only served through the API, which requires authentication and checks developer ownership.

Locked by Default

There's a UX decision here that I think says something about how we think about this. When a developer opens a session in the dashboard, photos are locked by default. You see a placeholder with a padlock icon, not the face.

To see the actual photos, you click "Decrypt Photos." The dashboard makes an authenticated API call; the server decrypts in memory, converts to WebP (stripping any residual metadata), and streams it back with Cache-Control: private, no-store. Close the modal, and the decrypted image exists only in your browser's memory until garbage collection clears it.

This isn't about making life harder for developers. It's about making the default state safe. If someone glances at your screen, if you share your screen during a call, if you leave the tab open — no face photos are visible unless you explicitly chose to view them.

The default matters. Every other KYC dashboard I've seen shows face photos immediately when you open a session. We think that's backwards. The photos are sensitive biometric data. The default should be locked. Access should be intentional.

Let's Be Honest About What This Isn't

I could call this "encrypted end-to-end" and most people wouldn't question it. But that would be dishonest, so let me be clear about the threat model.

This is encryption at rest, not end-to-end encryption. The server holds the key. It has to — because the server needs to decrypt photos in memory to run face matching, anti-spoofing, and OCR. True E2EE would mean only the user can decrypt, which makes server-side ML impossible.

What this protects against

Stolen disk / backup tapes — ciphertext without the key

Filesystem access without app context — files are unreadable binary

Database dump exfiltration — PII fields are encrypted

Casual browsing of the media directory — 403 block + encryption

Leaked JSON cache files — embeddings are encrypted, not searchable

What this doesn't protect against

× Full server compromise with root access — attacker can read the DEK from process memory

× Compromised application process — the running API has the DEK loaded

× Insider with production access — anyone who can read the env can read the key (fixed — see the update below)

I'm telling you this because I think honesty about limitations builds more trust than vague claims about being "fully encrypted." Every layer of defense has a boundary. We're clear about where ours is. Update: we've since eliminated one of these limitations entirely. Keep reading.

Update: The Key No Longer Lives Here


When I wrote the section above, I listed three things this doesn't protect against. The third one — "anyone who can read the env can read the key" — bothered me the most. The encryption key was a base64 string sitting in a .env file. If you had shell access to the server, you could cat .env and decrypt everything. That's not a theoretical concern. It's the most likely attack path.

So we moved key management into HashiCorp Vault Transit. Here's the new architecture:

01

Master Key Lives in Vault

The AES-256-GCM master key is created inside Vault's Transit engine and never exported. It doesn't exist in any env var, config file, or application memory. Vault performs all master-key operations internally.

02

Data Encryption Key (DEK)

Vault generates a DEK via its datakey endpoint. We get two things: the plaintext DEK (cached in API memory for fast local AES-GCM) and an encrypted copy (stored on disk). The encrypted DEK is useless without Vault.

03

On Restart, Ask Vault

When the API restarts, it reads the encrypted DEK from disk and sends it to Vault Transit for decryption. Vault verifies the request, decrypts with the master key, and returns the plaintext DEK. No env var needed.

The key hierarchy
Vault Transit (master key — never leaves Vault)
  └─ encrypts → DEK (stored as vault:v1:... ciphertext on disk)
  └─ encrypts → BYOK client keys (stored in DB as vault:v1:...)

DEK (plaintext, cached in API memory only)
  └─ encrypts → photos, JSON caches, DB PII fields
  └─ local AES-256-GCM (fast, no network call per operation)

The key insight is separation of concerns. The API handles encryption/decryption of data (fast, local, no network overhead per photo). Vault handles encryption/decryption of keys (infrequent, only on startup or BYOK operations). The master key that protects everything never enters the API's address space.

What this eliminates: reading the .env file no longer gives you the encryption key. There is no encryption key in the env. The API authenticates to Vault with a scoped token that can only encrypt, decrypt, and generate data keys — it can't export the master key, read policies, or perform any admin operations.

To be clear about what hasn't changed: the DEK still lives in the API's process memory at runtime. A full memory dump of the running process could recover it. But the attack surface shrunk significantly — from "read a text file" to "dump a running process's memory," which requires a much higher level of access and sophistication.

Vault also gives us an audit trail (every encrypt/decrypt operation is logged) and key versioning (rotate the master key without re-encrypting everything). And the FACEVAULT_ENCRYPTION_KEY env var is gone from production entirely.

The Full Stack

Encryption at rest is one layer. Here's the full picture of how a face photo is protected from capture to deletion:

In transit

TLS 1.3 via Caddy with X25519 key exchange. HTTP/2 and HTTP/3 (QUIC). HSTS with preload. The photo never travels unencrypted between the user's device and our server.

At rest (disk)

AES-256-GCM with unique 12-byte nonce per file. Photos stored as .jpg.enc. All JSON caches (embeddings, anti-spoofing, rPPG) encrypted with the same scheme. Plaintext exists only in memory during ML processing.

Key management (Vault Transit)

Master key lives inside HashiCorp Vault — never exported, never in env vars. DEK generated via Vault datakey endpoint, cached in API memory. BYOK client keys wrapped/unwrapped via Vault Transit. Scoped API token: encrypt/decrypt only.

At rest (database)

PII fields (confirmed data, MRZ data, face embeddings) encrypted with AES-256-GCM and base64-encoded. Database dumps contain ciphertext, not personal data.

Access control

Photos served only via authenticated API (JWT required, developer ownership verified). Caddy returns 403 for direct /media/kyc/ access. Dashboard shows locked placeholders by default.

Retention & purge

Auto-purge after 7–30 days (tier-dependent). Photos deleted from disk, paths NULLed in database, face embeddings wiped, PII fields cleared. shutil.rmtree() on the session directory. Irreversible.

No single layer is bulletproof. That's the point. Defense in depth means an attacker has to breach multiple independent controls to get to the data. Get past TLS? Files are encrypted. Get the files? No key. Get the key? Photos are auto-purged. Miss the purge window? Caddy blocks the path. Bypass Caddy? API checks ownership. Each layer assumes the one above it might fail.

The Point

A few weeks ago, our promise was: we verify your face and then we delete it. That's still true. But now the promise is bigger: while we have it, nobody else can read it.

3,305 photos encrypted. 383 JSON cache files encrypted. Zero plaintext files remaining on disk. Every new verification session is encrypted from the moment ML processing finishes — typically a few seconds after upload.

Is this perfect? No. I told you the limitations. A sophisticated enough attacker with root access and memory-dumping tools could still extract the DEK from the running process. But the bar is dramatically higher than it was yesterday — the master key never leaves Vault, there's no env var to steal, and the attack surface is "dump a running process" rather than "read a config file." That's a meaningful difference.

We delete your face. And while we have it, it's encrypted. That feels right.

References & Further Reading

AES-GCM (Galois/Counter Mode) — authenticated encryption used in TLS 1.3

HashiCorp Vault Transit Secrets Engine — encryption-as-a-service: master key never leaves Vault

GDPR Article 32 — Security of Processing — "encryption of personal data" as an appropriate technical measure

Building Privacy-First KYC: Why We Delete Your Face — auto-purge, token hashing, the verify-then-forget philosophy

Deepfake Defense: An IDS/IPS for Identity Verification — the 12-signal anti-spoofing pipeline

FaceVault API Documentation — integrate in 10 minutes