Why We Don't Use Cloud AI APIs
AWS Rekognition. Google Cloud Vision. Azure Face API. Every cloud provider wants to run your face recognition for you. We run ours on our own servers. Not because we can't afford cloud. Because your face shouldn't leave our infrastructure.
What Cloud AI APIs Actually Do With Your Data
When a KYC provider uses AWS Rekognition to compare a selfie against an ID photo, here's what actually happens: the selfie leaves the provider's server, traverses the internet (TLS-encrypted, yes), arrives at an AWS region, gets decoded in Amazon's infrastructure, processed by Amazon's models on Amazon's GPUs, and a response is sent back.
During that round trip, your face — the biometric data that uniquely identifies you — exists in Amazon's memory. Briefly. But it exists there.
Now read the fine print:
Rekognition
"We may store and use content processed by Amazon Rekognition to develop and improve the service unless you opt out." Opt-out exists, but it's not the default. And you, the developer using the KYC provider, have no visibility into whether they opted out.
Cloud Vision
"Google does not use customer data to train its AI models for Cloud AI services." Good. But your images still transit Google's network, are processed in Google's data centers, and are subject to Google's data processing agreements. Subpoenas happen.
Face API
Microsoft retired the Face API's emotion- and attribute-recognition features and, as of mid-2023, gates face recognition behind a Limited Access application process. The direction is right, but the underlying architecture is the same: your data still leaves your infrastructure.
The question isn't whether these companies are trustworthy. It's whether your users' biometric data needs to travel through any third party's infrastructure at all. We decided the answer is no.
What We Run Instead
Every model in our pipeline runs on our own servers via ONNX Runtime on CPU. No cloud API calls. No third-party inference. No data leaving our infrastructure.
| Task | Cloud Alternative | What We Use |
|---|---|---|
| Face detection | Rekognition, Vision | MediaPipe (client) + ONNX (server) |
| Face matching | Rekognition CompareFaces | ArcFace w600k_r50 (ONNX, INT8) |
| Depth estimation | N/A (most don't) | Depth Anything ViT-S (ONNX, INT8) |
| OCR / text extraction | Textract, Vision OCR | OnnxTR (db_resnet50 + PARSeq, INT8) |
| MRZ reading | Third-party APIs | PassportEye + Tesseract fallback |
| Anti-spoofing | Varies (often proprietary) | 12-signal fusion (rPPG, depth, FFT, ELA, GAN texture...) |
All four neural networks are INT8 quantized — 72% smaller, roughly 2x faster, with no measurable accuracy loss. They run on standard x86 CPUs. No GPU required. No specialized hardware. The entire inference pipeline fits comfortably on a modest VPS.
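To make the face-matching step concrete, here's a minimal sketch of the decision that sits on top of the embeddings: cosine similarity against a threshold. The 512-dimensional random vectors and the 0.35 threshold are illustrative stand-ins — in production the vectors come from the ArcFace ONNX session's output and the threshold is calibrated against labeled data.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two L2-normalized face embeddings."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

def faces_match(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.35) -> bool:
    # Threshold is illustrative; a real deployment calibrates it
    # against a labeled same-person / different-person dataset.
    return cosine_similarity(emb_a, emb_b) >= threshold

# Stand-ins for 512-dim ArcFace embeddings (the real vectors would
# come from the ONNX Runtime session, not a random generator).
rng = np.random.default_rng(0)
same = rng.normal(size=512)
noisy_copy = same + rng.normal(scale=0.1, size=512)  # same face, slight noise
other = rng.normal(size=512)                         # unrelated face

print(faces_match(same, noisy_copy))  # high similarity -> match
print(faces_match(same, other))       # near-zero similarity -> no match
```

The whole decision is a dot product and a comparison — there is nothing in it that needs a third party's GPU.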
The Privacy Argument
This is the one that matters most.
When you use FaceVault, your selfie goes from your camera to our server. That's one hop. It gets processed, encrypted at rest (AES-256-GCM), and eventually purged according to your retention policy. At no point does it transit a third party's network, exist in a third party's memory, or touch a third party's disk.
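As a sketch of the encrypt-at-rest step, here's what AES-256-GCM over raw image bytes can look like using Python's `cryptography` package. The key handling is illustrative only — a real deployment sources the key from a KMS or HSM rather than generating it inline.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_photo(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt image bytes with AES-256-GCM before they touch disk.

    Output layout: 12-byte nonce || ciphertext+tag. The GCM tag
    authenticates the data, so tampering is detected on decrypt.
    """
    nonce = os.urandom(12)  # unique per encryption, never reused with a key
    ct = AESGCM(key).encrypt(nonce, plaintext, associated_data=None)
    return nonce + ct

def decrypt_photo(blob: bytes, key: bytes) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, associated_data=None)

key = AESGCM.generate_key(bit_length=256)  # illustrative; use a KMS/HSM in production
selfie = b"\x89PNG...raw image bytes..."
blob = encrypt_photo(selfie, key)
assert decrypt_photo(blob, key) == selfie
```

Because encryption happens on our server before anything is written, "at rest" never means "in the clear" — and since inference is local too, the plaintext only ever exists in our process memory.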
Compare that to a provider using AWS Rekognition:
FaceVault:
User → Our server → Local ONNX inference → Result
(1 hop, 0 third parties)
Cloud AI provider:
User → Provider server → AWS Rekognition → Provider server → Result
(3 hops, 1 third party, 2 network transits)

Under GDPR, every entity that processes personal data is either a controller or a processor, and each adds compliance obligations, data processing agreements, and liability surface. A cloud AI API is an additional processor. We have zero additional processors in our ML pipeline.
Privacy isn't a feature you bolt on. It's an architecture you build from the ground up. Running our own models is the foundation of that architecture.
The Cost Argument
Cloud AI APIs charge per call. Every face comparison, every OCR extraction, every detection — a line item on your bill.
| Service | Cost per 1,000 calls |
|---|---|
| AWS Rekognition CompareFaces | $1.00 |
| AWS Rekognition DetectFaces | $1.00 |
| AWS Textract (document) | $1.50 |
| Google Cloud Vision (OCR) | $1.50 |
| Typical KYC check (3–4 API calls) | $0.004–0.006 per check |
That looks cheap at $0.005 per verification. But it's a marginal cost that scales linearly with volume. At 100K verifications per month, that's $500/month just for cloud AI — on top of your compute, storage, bandwidth, and everything else.
Our approach has zero marginal cost for ML inference. The models run on the same server that handles the API requests. The 100,001st verification costs exactly the same as the first: nothing extra. Our per-verification cost decreases as volume increases; cloud AI's per-verification cost stays flat no matter how much volume you run.
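The arithmetic fits in a few lines. This sketch uses the ~$0.005-per-check figure from the table above; the $40/month server bill is an assumption for illustration, not a quoted price:

```python
def cloud_ml_cost(verifications: int, cost_per_check: float = 0.005) -> float:
    """Cloud AI spend scales linearly with volume."""
    return verifications * cost_per_check

def self_hosted_cost(verifications: int, server_monthly: float = 40.0) -> float:
    """Self-hosted inference: a flat server bill, zero marginal cost.
    The $40/month VPS figure is an illustrative assumption."""
    return server_monthly

for volume in (1_000, 10_000, 100_000):
    print(f"{volume:>7} checks: cloud ${cloud_ml_cost(volume):.2f}, "
          f"self-hosted ${self_hosted_cost(volume):.2f}")
```

At 100K checks the cloud line item is $500/month and growing; the self-hosted line is the same flat bill it was at 1K.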
This is how we offer verification at a fraction of what Sumsub and Onfido charge. We're not subsidizing with VC money. We just don't have a cloud AI bill.
The Latency Argument
A cloud AI API call involves:
1. Serialize the image to a request payload
2. TLS handshake (if new connection)
3. Upload the image over the network
4. Queue in the cloud provider's inference pipeline
5. Inference on their hardware
6. Response back over the network
Steps 2–4 and 6 are pure overhead. On a local inference call, none of them exist. The image is already in memory. The model is already loaded. The result is returned from a function call, not a network request.
In practice, a cloud API call adds 100–400ms of latency per model invocation. When your verification pipeline runs multiple models sequentially (face detection, face matching, OCR, anti-spoofing), those milliseconds compound. Local inference eliminates the network overhead entirely.
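The compounding is easy to model. The per-model inference times and the 250 ms round-trip overhead below are illustrative numbers, not measurements:

```python
def pipeline_latency_ms(model_times_ms: list[float],
                        per_call_overhead_ms: float = 0.0) -> float:
    """Total latency for models run sequentially.
    per_call_overhead_ms models network transit + provider-side
    queueing per API call; for local inference it is zero."""
    return sum(t + per_call_overhead_ms for t in model_times_ms)

# Illustrative per-model times: detection, matching, OCR, anti-spoofing.
models = [30, 80, 120, 60]

local = pipeline_latency_ms(models)                             # inference only
cloud = pipeline_latency_ms(models, per_call_overhead_ms=250)   # + 4 round trips
print(f"local: {local} ms, cloud: {cloud} ms")
```

Four sequential cloud calls at even 250 ms of overhead each add a full second to every verification, before any model has done any work.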
The Control Argument
When you depend on a cloud AI API, you depend on:
Their uptime — AWS Rekognition goes down, your verification goes down. You have no fallback. No degraded mode. Just a 503 and an angry user.
Their pricing — cloud providers change pricing. They deprecate APIs. They add "enterprise-only" gates. You have zero leverage.
Their model updates — when the cloud provider updates their model, your thresholds might shift. That face match score that was 0.95 yesterday might be 0.88 today. You find out in production.
Their data jurisdiction — your users are in the EU but the closest Rekognition region is us-east-1? Congratulations, you just exported biometric data across the Atlantic.
When we run our own models, we control everything. We choose the model. We choose the version. We set the thresholds. We decide when to update. We know exactly what code is running on every request, because we wrote it.
Model updates happen on our schedule, after our test suite passes, after we've validated against our calibration dataset. Not when a cloud provider pushes a silent update on a Tuesday afternoon.
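One way to gate an update is to re-score a fixed calibration set with both model versions and reject the update if any decision flips or the score distribution drifts too far. A minimal sketch, with illustrative thresholds and synthetic scores:

```python
import statistics

def update_is_safe(old_scores: list[float], new_scores: list[float],
                   threshold: float = 0.35,
                   max_mean_shift: float = 0.02,
                   max_flips: int = 0) -> bool:
    """Compare match scores from old and new model versions on the
    same calibration pairs. Reject if the mean score drifts beyond
    tolerance or any accept/reject decision flips at the threshold.
    All tolerances here are illustrative, not production values."""
    mean_shift = abs(statistics.mean(new_scores) - statistics.mean(old_scores))
    flips = sum((o >= threshold) != (n >= threshold)
                for o, n in zip(old_scores, new_scores))
    return mean_shift <= max_mean_shift and flips <= max_flips

old = [0.91, 0.88, 0.12, 0.95, 0.07]        # current model's scores
drifted = [0.84, 0.80, 0.10, 0.89, 0.05]    # candidate model: scores shifted down

print(update_is_safe(old, old))      # identical scores pass
print(update_is_safe(old, drifted))  # distribution drift fails the gate
```

This is exactly the check a cloud provider's silent Tuesday-afternoon update denies you: you never get to run the old and new model side by side.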
The Trade-offs (Honest)
This approach isn't free of trade-offs. Here's what we gave up:
No GPU acceleration (yet)
Cloud AI runs on beefy GPU clusters. We run on CPU. INT8 quantization and ONNX Runtime session tuning close most of the gap, but a V100 would still be faster for raw inference. For our volume, CPU is more than sufficient. When it isn't, we'll add a GPU — still on our hardware.
We maintain the models
When a cloud provider updates their face recognition model, you get the improvement for free. When we want better accuracy, we have to find, evaluate, benchmark, and deploy a new model ourselves. This is more work. It's also how you build genuine expertise.
Scaling requires infrastructure, not just a credit card
Cloud AI scales by throwing money at AWS. We scale by optimizing code, quantizing models, tuning concurrency, and adding hardware when needed. It's slower to scale but cheaper to operate — and we keep full control throughout.
These trade-offs are real. We've accepted them knowingly, because the benefits — privacy, cost, latency, control — outweigh them for our use case. A startup doing 100 verifications a day might reasonably choose a cloud API for speed of development. But if you're building a KYC platform, the infrastructure is the product. You can't outsource it.
It's a Philosophy, Not a Limitation
The cloud AI model is seductive. Write ten lines of code, get state-of-the-art face recognition. Ship it. Move on. Someone else worries about the models, the hardware, the updates.
But "someone else" is processing your users' faces. "Someone else" knows how many verifications you run. "Someone else" has a data processing agreement that they wrote, protecting their interests. "Someone else" can change their pricing, their terms, or their model at any time.
We chose a different path. Open-source models. Local inference. Encrypted at rest. Purged on schedule. No third-party processors in the ML pipeline. It's more work to build. It's more work to maintain. But it means we can look our users in the eye and say:
"Your face never leaves our server. No cloud API saw it. No third party processed it. We ran the models ourselves, on hardware we control, and the result stays between you and us."
That's not a limitation. That's a promise.
Further Reading
We Made Our AI 3x Faster by Making It Dumber — INT8 quantization on our self-hosted models
Your Face Is Encrypted Before It Hits Disk — AES-256-GCM encryption for every photo
Building Privacy-First KYC: Why We Delete Your Face — our data retention philosophy
Deepfake Defense: An IDS/IPS for Identity Verification — our 12-signal anti-spoofing pipeline
ONNX Runtime — the inference engine that makes this possible