We Can See Your Heartbeat
Through Your Camera
Your face changes colour 72 times a minute. You can't see it. We can. This is the story of how we use remote photoplethysmography — detecting blood flow through a standard webcam — to prove that the person on the other side of the camera is alive.
The Invisible Pulse
Every time your heart beats, it pushes a wave of oxygenated blood through your capillaries. When that wave reaches the capillaries in your face, it changes the amount of light your skin absorbs — specifically in the green channel. The change is tiny: about 0.1-0.5% of the total reflected light. Invisible to the human eye.
But not invisible to a camera sensor recording at 30 frames per second.
If you extract the average green pixel intensity from a face region across 90 frames, you get a noisy time series. Apply a bandpass filter to isolate the 0.7–4.0 Hz range (42–240 BPM), run an FFT, and a peak emerges at the person's heart rate. That peak is a physiological fingerprint that cannot be faked by a photograph, a screen replay, or a deepfake video that doesn't model subcutaneous blood flow.
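That whole idea fits in a few lines. Here is a minimal sketch (assuming NumPy, with a synthetic "green channel" standing in for real camera data) showing how a 72 BPM pulse buried in sensor noise pops out of the spectrum:

```python
import numpy as np

fps = 30
t = np.arange(90) / fps                       # 90 frames = 3 seconds
# Synthetic "green channel": a 1.2 Hz (72 BPM) pulse buried in noise
rng = np.random.default_rng(0)
green = 0.002 * np.sin(2 * np.pi * 1.2 * t) + 0.0005 * rng.standard_normal(90)

# Zero-pad to 256 points and take the FFT
spectrum = np.abs(np.fft.rfft(green, n=256))
freqs = np.fft.rfftfreq(256, d=1.0 / fps)

# Look only in the 0.7-4.0 Hz cardiac band
band = (freqs >= 0.7) & (freqs <= 4.0)
peak_hz = freqs[band][np.argmax(spectrum[band])]
bpm = peak_hz * 60
```

With only 3 seconds of data, the bin spacing after padding is about 0.12 Hz, so the estimate lands near, not exactly on, 72 BPM; what matters for liveness is how sharply the peak stands out, not its precise position.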
This technique is called remote photoplethysmography — rPPG. It was first demonstrated by Verkruysse et al. in 2008 using a $30 webcam and ambient light. Nearly two decades later, it's one of the most powerful liveness signals available — and FaceVault runs it on every camera-based verification session.
Why It Matters for KYC
The core problem in identity verification isn't matching faces. ArcFace does that with 99.83% accuracy. The problem is: is the face real?
A printed photograph has no pulse. A screen replay of a recorded video has no pulse. A deepfake generated frame-by-frame has no pulse. Even a sophisticated real-time face swap running through OBS has no pulse — the face swapping algorithm preserves skin colour but doesn't model the micro-fluctuations caused by blood flow.
rPPG is the one signal that requires an actual cardiovascular system on the other side of the camera. You can't fake physics.
Printed photo
Zero temporal variation. FFT shows flat noise floor. No peak anywhere in the 0.7–4.0 Hz band.
Screen replay
Screen refresh rate (60 Hz) dominates. Any residual "pulse" from the original video is buried under display artefacts and moiré patterns.
Deepfake / face swap
GAN-generated frames model skin tone, not hemodynamics. The colour variations that encode blood flow are treated as noise and smoothed out by the generator.
Capturing 90 Frames in 3 Seconds
rPPG capture happens client-side, inside the user's browser, during the selfie step. While the user is looking at the camera for their selfie photo, we're quietly recording 90 frames at 30 FPS in the background. The user doesn't notice — it takes exactly 3 seconds.
// 128x96 canvas — tiny frames, fast upload
const canvas = document.createElement('canvas');
canvas.width = 128;
canvas.height = 96;
const ctx = canvas.getContext('2d');
const frames = [];
// Capture at ~30 FPS for 3 seconds (90 frames)
const interval = setInterval(() => {
  // videoEl: the <video> element playing the getUserMedia stream
  ctx.drawImage(videoEl, 0, 0, 128, 96);
  canvas.toBlob(blob => frames.push(blob),
    'image/jpeg', 0.6);
  if (frames.length >= 90) clearInterval(interval);
}, 33); // ~30 FPS
// SHA-256 hash chain for tamper detection
// hash[i] = SHA256(hash[i-1] || frame[i])

Each frame is 128×96 pixels at JPEG quality 0.6 — roughly 4 KB per frame. The entire 90-frame payload is about 360 KB. That's less than a single high-res selfie.
The frames are uploaded in a single batch to POST /{session_id}/rppg. The server validates each frame has a valid JPEG SOI header (0xFF 0xD8 0xFF) and silently discards any that don't — no error messages, no oracle for attackers to probe.
Alongside capture, the client folds every frame into a SHA-256 hash chain: hash[i] = SHA256(hash[i-1] || frame[i]). This creates a tamper-evident chain — reordering, dropping, or substituting even a single frame breaks it. The final hash is sent alongside the frames for server-side verification.
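Server-side, both checks (the SOI header filter and the hash chain) are a few lines. The sketch below is a hypothetical helper, assuming the chain is seeded with an empty byte string and computed over the frames that survive the JPEG check; the exact seeding and error handling in production may differ:

```python
import hashlib

SOI = b"\xff\xd8\xff"  # JPEG Start-of-Image marker

def verify_batch(frames: list[bytes], claimed_final_hash: str) -> list[bytes]:
    """Silently drop frames without a JPEG SOI header, then verify the chain."""
    valid = [f for f in frames if f.startswith(SOI)]
    h = b""  # assumed seed for hash[-1]
    for frame in valid:
        # hash[i] = SHA256(hash[i-1] || frame[i])
        h = hashlib.sha256(h + frame).digest()
    if h.hex() != claimed_final_hash:
        raise ValueError("hash chain mismatch: frames reordered, dropped, or substituted")
    return valid
```

Note the silent discard: a frame that fails the SOI check is simply removed, so an attacker probing the endpoint gets no oracle about which frames were rejected.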
The POS Method: Extracting Pulse from Pixels
The server receives 90 tiny JPEG frames. Now we need to extract a blood volume pulse signal from pixel data. We use the Plane Orthogonal to Skin (POS) method by Wang et al. (2017) — a colour-space projection specifically designed for rPPG.
Here's the pipeline, step by step:
Step 1: Extract colour signals
For each frame, crop the centre 50% of the face region (avoiding hair and background). Compute the mean R, G, and B pixel values. This gives us three time series of 90 values each: R(t), G(t), B(t).
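As a sketch (assuming frames already decoded into NumPy H×W×3 arrays; the helper name is ours), step 1 looks like:

```python
import numpy as np

def mean_rgb(frame: np.ndarray) -> tuple:
    """Mean R, G, B over the centre 50% of a decoded HxWx3 frame."""
    h, w, _ = frame.shape
    # Centre crop avoids hair and background pixels
    roi = frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    return tuple(roi.reshape(-1, 3).mean(axis=0))
```

Running this over all 90 frames yields the three 90-sample series R(t), G(t), B(t).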
Step 2: Normalise
Divide each channel by its temporal mean to remove the static skin colour component. What remains is the relative fluctuation — the tiny periodic changes caused by blood flow. A person with dark skin and a person with light skin will produce different absolute RGB values, but after normalisation, the pulse signal has comparable amplitude.
Step 3: POS projection
This is the key insight from Wang et al. The blood volume pulse affects R, G, and B channels differently. POS projects the normalised signals onto two orthogonal axes that separate pulse from motion noise:
S1 = G(t) - B(t)
S2 = G(t) + B(t) - 2×R(t)
α = std(S1) / std(S2)
pulse = S1 + α × S2

The adaptive α parameter balances the two components based on the signal's own statistics. This makes POS robust to different skin tones, lighting conditions, and camera white balance settings.
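Steps 2 and 3 together, as a NumPy sketch (a simplified whole-window variant; the published POS algorithm applies the projection in short sliding windows and overlap-adds the results):

```python
import numpy as np

def pos_pulse(r: np.ndarray, g: np.ndarray, b: np.ndarray) -> np.ndarray:
    """POS-style projection on per-frame mean RGB traces (whole-window variant)."""
    # Step 2: temporal normalisation removes the static skin-colour component
    rn, gn, bn = r / r.mean(), g / g.mean(), b / b.mean()
    # Step 3: project onto two axes that separate pulse from motion noise
    s1 = gn - bn
    s2 = gn + bn - 2 * rn
    alpha = s1.std() / s2.std()
    return s1 + alpha * s2
```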
Step 4: Bandpass filter
Apply a 3rd-order Butterworth bandpass filter (0.7–4.0 Hz). The low cutoff at 0.7 Hz (42 BPM) removes breathing and slow lighting changes. The high cutoff at 4.0 Hz (240 BPM) removes camera noise and electrical interference. What survives this filter is, almost exclusively, the cardiac pulse signal.
The filter is applied with filtfilt (forward-backward filtering) for zero phase distortion.
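With SciPy, this stage is a few lines; the cutoffs are normalised to the Nyquist frequency (15 Hz at 30 FPS). A sketch:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(pulse: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """3rd-order Butterworth bandpass, 0.7-4.0 Hz, zero-phase via filtfilt."""
    nyq = fps / 2.0
    b, a = butter(3, [0.7 / nyq, 4.0 / nyq], btype="band")
    return filtfilt(b, a, pulse)  # forward-backward pass: zero phase distortion
```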
Finding the Heartbeat in the Frequency Domain
After filtering, we have a clean-ish pulse signal in the time domain. But we need to measure it. Is there actually a heartbeat in here, or just noise?
We run an FFT (Fast Fourier Transform) on the filtered signal, zero-padded to the next power of 2 (at least 256 points). This transforms the time-domain pulse into a frequency spectrum — a graph showing how much energy is present at each frequency.
What the FFT reveals
Real face: clear dominant peak at the cardiac frequency. SNR > 5 dB.
Printed photo: flat noise floor. No dominant frequency. SNR < 1 dB.
We measure two things from the power spectrum:
Peak frequency — the dominant frequency in the 0.7–4.0 Hz band. Multiply by 60 to get BPM. A real person at rest will produce a peak between 50 and 120 BPM.
Signal-to-Noise Ratio (SNR) — how much the peak stands out above the surrounding noise. A real pulse typically produces an SNR above 5 dB. A photograph produces an SNR below 1 dB — there's nothing to detect but sensor noise.
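A sketch of both measurements (the exact SNR definition in production isn't spelled out above, so the peak-versus-rest-of-band estimate here is an assumption):

```python
import numpy as np

def measure_pulse(filtered: np.ndarray, fps: float = 30.0):
    """Return (BPM, SNR in dB) from the filtered pulse signal."""
    n = max(256, 1 << (len(filtered) - 1).bit_length())  # zero-pad: power of 2, >= 256
    power = np.abs(np.fft.rfft(filtered, n=n)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    p_band, f_band = power[band], freqs[band]
    i = int(np.argmax(p_band))
    # Assumed SNR: energy at the peak (+/- one bin) vs the rest of the cardiac band
    signal = p_band[max(i - 1, 0): i + 2].sum()
    noise = p_band.sum() - signal
    snr_db = 10 * np.log10(signal / noise) if noise > 0 else float("inf")
    return f_band[i] * 60.0, snr_db
```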
From Spectrum to Score
The FFT gives us a peak frequency and an SNR. The scoring function converts these into a single rPPG score from 0.0 to 1.0:
SNR scoring (70% weight)
BPM plausibility (30% weight)
rppg_score = snr_score × 0.70 + bpm_score × 0.30
# Example: SNR = 6.5 dB (score 0.8), BPM = 72 (score 1.0)
# rppg_score = 0.8 × 0.7 + 1.0 × 0.3 = 0.86

SNR carries 70% of the weight because it's the primary indicator of whether a genuine cardiac signal is present. BPM plausibility is a sanity check — even a noisy signal should produce a biologically plausible heart rate if it's real.
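The two sub-score curves aren't fully specified above, so the shapes in this sketch (a linear SNR ramp saturating at 8 dB, full credit inside 50-120 BPM) are illustrative assumptions; only the 70/30 weighting comes from the text:

```python
def rppg_score(snr_db: float, bpm: float) -> float:
    """Combine SNR and BPM plausibility into a single 0.0-1.0 score."""
    # Assumed ramp: 0 dB -> 0.0, 8 dB and above -> 1.0
    snr_score = min(max(snr_db / 8.0, 0.0), 1.0)
    # Assumed plausibility: full credit in the resting 50-120 BPM range
    if 50.0 <= bpm <= 120.0:
        bpm_score = 1.0
    else:
        bpm_score = max(0.0, 1.0 - abs(bpm - 85.0) / 85.0)
    return snr_score * 0.70 + bpm_score * 0.30
```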
One Signal in a 12-Signal Orchestra
rPPG doesn't work alone. It's one of 12 anti-spoofing signals in FaceVault's fusion engine, each exploiting a different physical property that attacks can't simultaneously satisfy:
rPPG: blood flow
Eye specular: corneal reflections
Blendshapes: micro-expressions
GAN texture: spectral forensics
Depth: 3D geometry
Blink: eye closure
Moire FFT: screen detection
ELA: splice detection
+4 more: EXIF, noise, colour, JPEG
rPPG carries a 10% weight in the fusion. That might seem low — but it's by design. Not every session can generate rPPG data (file uploads from older devices, browsers that block camera access). The fusion engine normalises weights across available signals only, so when rPPG is present, it pulls its weight. When it's absent from a camera session, a missing rPPG penalty of 0.15 is injected — because a live camera session that produces no heartbeat signal is suspicious.
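The renormalise-over-available-signals behaviour can be sketched like this (a hypothetical helper; the signal names and the exact point where the 0.15 penalty is applied are assumptions):

```python
def fuse(scores: dict, weights: dict, camera_session: bool,
         missing_rppg_penalty: float = 0.15) -> float:
    """Weighted fusion over available signals only, with the missing-rPPG penalty."""
    avail = {k: w for k, w in weights.items() if k in scores}
    total = sum(avail.values())
    fused = sum(scores[k] * w / total for k, w in avail.items())
    if camera_session and "rppg" not in scores:
        fused -= missing_rppg_penalty  # live camera but no heartbeat data: suspicious
    return max(0.0, min(1.0, fused))
```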
What It Defeats
Printed photographs
BLOCKED: Zero temporal variation in skin colour. The FFT shows nothing but flat noise. SNR < 1 dB, score ≈ 0.14.
Screen replays
BLOCKED: The screen's refresh rate (50/60 Hz) adds a dominant artefact far above the cardiac band. Any residual pulse from the original recording is destroyed by the display's gamma curve and backlight PWM.
Deepfakes & face swaps
BLOCKED: GAN generators model appearance, not hemodynamics. The sub-pixel colour fluctuations that encode blood flow are treated as noise by the generator and smoothed away. No generator architecture in production today preserves rPPG signals.
Silicone masks
WEAKENED: Thin masks may transmit some blood flow signal from the wearer's face underneath. However, the signal is severely attenuated (SNR 2–4 dB vs typical 5–7 dB), pushing the score into the review band. Combined with depth and texture signals, masks are consistently flagged.
Where It Struggles
rPPG isn't magic. It has real limitations, and we think being honest about them matters more than marketing:
Low light
In very dim environments (< 50 lux), the camera sensor's noise floor overwhelms the pulse signal. SNR drops below 3 dB and the score bottoms out. This is physics — no algorithm can extract signal from noise that isn't there.
Excessive head movement
The current implementation uses a fixed ROI (centre 50%). Large head movements shift the face out of the ROI, causing discontinuities in the colour signal. Face-tracking ROI would help — it's on the roadmap.
Browser compatibility
rPPG requires getUserMedia() and a stable video stream. Some browsers, privacy extensions, or corporate firewalls block camera access. When rPPG frames can't be captured, the system falls back to the remaining 11 signals. No user is ever rejected solely because rPPG was unavailable.
By the Numbers
90 frames captured
3 s capture duration
360 KB total payload
0.7–4.0 Hz bandpass range
5+ dB SNR (real pulse)
50 ms server analysis time
Your Face Proves You're Alive
Remote photoplethysmography is one of those technologies that sounds like science fiction until you see the math. Extract colour channels from video frames. Normalise. Project onto a colour space tuned for hemodynamics. Filter. FFT. Read the peak.
And there it is — a heartbeat, measured through a webcam, from 90 frames captured in 3 seconds, weighing less than a single photograph. No special hardware. No infrared sensors. No finger clips. Just physics, signal processing, and the fact that your skin changes colour with every beat of your heart.
That's the liveness signal that deepfakes can't fake. And it's running on every FaceVault verification session, right now.
References & Further Reading
Algorithmic Principles of Remote PPG — Wang et al., IEEE TBME 2017 (the POS method used by FaceVault)
Remote plethysmographic imaging using ambient light — Verkruysse et al., Optics Express 2008 (first webcam rPPG demonstration)
DeepFake Detection: A Survey — Tolosana et al., IEEE TIFS 2020
DeepFakesON-Phys: rPPG for Face Forgery Detection — Hernández-Ortega et al., 2020 (rPPG as a deepfake detector)
Deepfake Defense: An IDS/IPS for Identity Verification — the full 12-signal anti-spoofing pipeline
How FaceVault Verifies a Face in Under 30 Seconds — the verification pipeline this signal feeds into