Data Strategy · 8 min read

How Much Training Data a Custom Vitals Model Needs

A data-strategy brief on dataset size, diversity, and reference-device needs for a reliable camera vitals model, written for automotive Tier-1 engineering teams.

tryvitalsapp.com Research Team · June 20, 2026

Engineering leads at automotive Tier-1 suppliers often stall at the same point when a contactless vitals feature reaches the planning stage: they assume that a camera-based heart rate or respiration model will need tens of thousands of recorded subjects before it works at all. That fear shapes budgets, kills pilot projects, and pushes teams toward generic licensed engines that were never tuned for their cabin camera. The reality is more nuanced. The raw volume of training data for a custom vitals model matters far less than how that data is distributed across people, conditions, and the exact sensor that will ship in the vehicle. Getting that distribution right is what separates a model that passes a demo from one that survives a Euro NCAP review.

"Public remote photoplethysmography datasets are dominated by lighter skin tones, with darker Fitzpatrick types often making up under 10 percent of subjects, a gap that directly produces biased error rates in deployed models." - analysis drawn from Dasari et al., demographic bias study in rPPG datasets, 2021.

Why training data for a custom vitals model is about distribution, not just volume

Remote photoplethysmography (rPPG) recovers a pulse signal from tiny color changes in skin reflectance captured on video. A model learns to separate that faint signal from noise: motion, lighting shifts, compression artifacts, and sensor characteristics unique to one camera. The question of how much training data a custom vitals model needs is really three separate questions about subject count, condition coverage, and reference-device quality.
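As a rough illustration of the signal-recovery step described above, the sketch below estimates heart rate from a mean green-channel trace by picking the strongest frequency inside a plausible pulse band. It is a classical baseline, not the learned model this brief is about. The function name, the 0.7 to 3.0 Hz band, and the synthetic trace are all assumptions made for the example.

```python
import numpy as np

def estimate_heart_rate_bpm(green_trace, fps, low_hz=0.7, high_hz=3.0):
    """Estimate heart rate from a mean-green-channel trace via an FFT peak."""
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                    # remove the constant skin brightness level
    x = x * np.hanning(len(x))          # taper the clip to limit spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)   # roughly 42 to 180 bpm
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic 20 s clip at 30 fps (illustrative values, not measured data):
# a faint 72 bpm pulse on top of a much larger slow lighting drift plus noise.
fps = 30
t = np.arange(600) / fps
rng = np.random.default_rng(0)
trace = (120.0
         + 0.3 * np.sin(2 * np.pi * 1.2 * t)    # pulse at 1.2 Hz
         + 2.0 * np.sin(2 * np.pi * 0.1 * t)    # lighting drift at 0.1 Hz
         + 0.1 * rng.standard_normal(t.size))   # sensor noise
bpm = estimate_heart_rate_bpm(trace, fps)
print(f"estimated heart rate: {bpm:.1f} bpm")
```

On this clean synthetic trace the estimate should land at or very near 72 bpm. Real cabin footage adds motion, lighting transitions, and sensor-specific noise that a fixed band-pass cannot separate, which is the gap a trained model is meant to close.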

The published research offers useful anchors. The widely used UBFC-rPPG benchmark, released by Bobbia and colleagues in 2017, contains only 42 participants recorded with a single consumer webcam, yet it remains a workhorse for algorithm development. VIPL-HR, introduced by Niu et al. in 2018, expanded to 107 subjects across nine acquisition scenarios and multiple devices. More recent collection efforts such as Vital Videos (2023) reached roughly 900 subjects across six skin tones, and the MCD-rPPG effort gathered around 600 subjects with synchronized PPG and ECG. The pattern is clear: condition and demographic diversity, not raw subject counts in the tens of thousands, is what moved the field forward.

For a custom build targeting one specific camera, the math is more forgiving than most procurement teams expect. A model fine-tuned for a single known sensor does not need to generalize across every camera on earth. It needs to generalize across every person and every condition that sensor will encounter. That narrowing of scope is the central reason custom builds can succeed on dataset sizes that would be hopeless for a universal engine.

Dataset size benchmarks for a custom rPPG model training dataset

The table below summarizes how dataset requirements scale with the ambition of the build. These ranges reflect patterns seen across public datasets and applied collection programs, not a guarantee for any single project.

Build goal	Typical subject range	Condition coverage needed	Reference device	Relative data burden
Proof of concept on one camera	20 to 50	Controlled lighting, limited motion	Finger pulse oximeter	Low
Production model, single fixed use case	150 to 400	Multiple lighting states, natural motion	Medical-grade PPG or ECG	Medium
Safety-relevant automotive cabin model	400 to 1,000+	Day, dusk, night, IR, vibration, occlusion	ECG plus contact PPG	High
Multi-camera platform model	1,000+	All of the above across devices	ECG plus contact PPG	Very high

A few practical observations follow from these ranges:

Subject diversity matters more than session count. Two hundred well-distributed people beat fifty people recorded ten times each.
Skin tone coverage across the full Fitzpatrick scale is non-negotiable for any deployed product, because higher melanin content absorbs more of the light that carries the pulse signal and so lowers the signal-to-noise ratio.
Motion and lighting variety create most of the real-world difficulty, so a cabin model needs vibration, glare, and low-light examples rather than just more calm seated clips.
Age, facial hair, glasses, and makeup all shift the signal and should appear in proportion to the expected user base.
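One way to act on these observations during recruitment is to check the enrolled roster against a minimum share per category before recording continues. The sketch below does that for Fitzpatrick type. The roster, the field name, and the 10 percent floor are hypothetical values chosen for illustration, not recommended targets.

```python
from collections import Counter

def coverage_gaps(subjects, key, categories, min_share):
    """Return {category: share} for categories below min_share of all subjects."""
    total = len(subjects)
    if total == 0:
        raise ValueError("roster is empty")
    counts = Counter(s[key] for s in subjects)
    shares = {c: counts.get(c, 0) / total for c in categories}
    return {c: share for c, share in shares.items() if share < min_share}

# Hypothetical roster of 200 enrolled subjects, one Fitzpatrick type each.
roster = ([{"fitzpatrick": 1}] * 30 + [{"fitzpatrick": 2}] * 60 +
          [{"fitzpatrick": 3}] * 60 + [{"fitzpatrick": 4}] * 30 +
          [{"fitzpatrick": 5}] * 15 + [{"fitzpatrick": 6}] * 5)

# Assumed floor: every type should make up at least 10% of the roster.
gaps = coverage_gaps(roster, "fitzpatrick", range(1, 7), min_share=0.10)
for fitz_type, share in gaps.items():
    print(f"type {fitz_type}: {share:.1%} of roster, below floor")
```

The same function can be pointed at age band, glasses, or facial hair by changing the key and category list, so one roster check covers every attribute in the list above.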

Industry applications and their data profiles

Automotive driver monitoring

Cabin cameras are usually near-infrared, run at constrained frame rates, and face brutal lighting transitions from tunnels to direct sun. A driver monitoring model needs data captured through the actual IR sensor under those conditions, not borrowed RGB footage from a lab. Vibration and steering motion must be represented. Because regulatory frameworks such as Euro NCAP increasingly reward occupant monitoring, the reference-grade ground truth bar is higher here than in most consumer categories.
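A recording plan for this kind of cabin data can be tracked as a grid of conditions, with each clip ticking off one cell. The following is a minimal sketch of that bookkeeping. The condition names and the clip metadata fields are hypothetical, and a real program would likely add axes such as seat position or sensor exposure mode.

```python
from itertools import product

# Hypothetical condition axes for an IR cabin camera collection campaign.
LIGHTING = ("day", "dusk", "night")
MOTION = ("parked", "driving")
OCCLUSION = ("none", "sunglasses", "hand_on_face")

def missing_cells(recorded_clips):
    """List (lighting, motion, occlusion) combinations with no recorded clip."""
    seen = {(c["lighting"], c["motion"], c["occlusion"]) for c in recorded_clips}
    return [cell for cell in product(LIGHTING, MOTION, OCCLUSION)
            if cell not in seen]

# Two clips recorded so far (illustrative metadata only).
clips = [
    {"lighting": "day", "motion": "parked", "occlusion": "none"},
    {"lighting": "night", "motion": "driving", "occlusion": "sunglasses"},
]
todo = missing_cells(clips)
print(f"{len(todo)} condition cells still have no footage")
```

Listing empty cells rather than counting total clips keeps attention on the conditions that are missing, which matches the emphasis on coverage over volume.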

IoT and smart home devices

Fixed cameras in mirrors, panels, and appliances see users at varying distances and angles but in relatively stable lighting. The data burden is lower than automotive, though distance variation and incidental motion still need coverage. A few hundred diverse subjects through the target sensor is often enough for a dependable single-use model.

Smart glasses and wearable optics

Head-mounted cameras introduce constant micro-motion and unusual viewing geometry of the skin. These builds benefit from data captured during natural head movement and from reference signals that stay synchronized despite the wearer moving.

Current research and evidence

Two research threads should reassure teams worried about an impossible data-collection mountain.

First, demographic bias is a sampling problem, not just a volume problem. Work by Dasari and colleagues in 2021 documented that public rPPG datasets skew heavily toward lighter skin tones, and that this skew translates into measurably worse error for darker-skinned users. The fix is targeted recruitment, not simply more total footage. A deliberately balanced 400-subject set can outperform an unbalanced 2,000-subject set on the populations that matter for compliance.
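Bias of this kind only becomes visible when error is reported per group instead of as one pooled number. The sketch below computes mean absolute heart-rate error per skin-tone group and the gap between the best and worst group. The predictions, references, and group labels are made-up toy values for illustration and are not results from any study or dataset.

```python
import numpy as np

def mae_by_group(pred_bpm, ref_bpm, groups):
    """Mean absolute heart-rate error (bpm) for each group label."""
    pred = np.asarray(pred_bpm, dtype=float)
    ref = np.asarray(ref_bpm, dtype=float)
    labels = np.asarray(groups)
    return {str(g): float(np.mean(np.abs(pred[labels == g] - ref[labels == g])))
            for g in np.unique(labels)}

# Toy values only: model output, reference heart rate, and Fitzpatrick band.
pred = [71, 80, 66, 90, 75, 62]
ref = [70, 82, 65, 84, 70, 68]
tone = ["I-II", "I-II", "III-IV", "V-VI", "V-VI", "III-IV"]

per_group = mae_by_group(pred, ref, tone)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"worst-to-best gap: {gap:.1f} bpm")
```

Tracking the worst-to-best gap alongside overall error gives a collection team a concrete signal for where additional recruitment is likely to help most.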

Second, transfer learning and synthetic augmentation sharply reduce the real-world data needed for a new camera. A 2024 line of work on test-time and source-free domain adaptation, including the SFDA-rPPG framework, showed that models pretrained on existing corpora can adapt to a new target domain with limited or even unlabeled target data. Comparative studies of convolutional transfer-learning regressors for rPPG, published in MDPI venues, reported that pretrained backbones cut the dependency on large high-quality datasets. Generative approaches that synthesize plausible pulse signals further broaden heart-rate and lighting coverage without recruiting more people.

The combined message: a custom vitals model rarely starts from zero. A strong pretrained foundation plus a focused, well-designed collection campaign on the target sensor is the efficient path. Ground truth quality is the part that cannot be shortcut. Whether the reference is a contact PPG oximeter or an ECG chest signal, synchronization to the video frame and clinical-grade accuracy of that reference set the ceiling on what the model can learn. Sloppy ground truth poisons even a large dataset.
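To make the synchronization requirement concrete, the sketch below resamples a reference waveform onto video frame timestamps and then estimates any residual offset by cross-correlation. It is one possible check under stated assumptions, not a prescribed pipeline. The function names, the 100 Hz reference rate, the 0.4 s search window, and the synthetic signals are all hypothetical.

```python
import numpy as np

def resample_to_frames(ref_times_s, ref_values, frame_times_s):
    """Linearly interpolate a reference waveform onto video frame timestamps."""
    return np.interp(frame_times_s, ref_times_s, ref_values)

def estimate_lag_seconds(video_signal, reference_signal, fps, max_lag_s=0.4):
    """Seconds by which video_signal trails reference_signal (both at fps).

    The search window should stay shorter than one beat interval, because a
    periodic pulse correlates with itself again one beat later.
    """
    v = np.asarray(video_signal, dtype=float)
    r = np.asarray(reference_signal, dtype=float)
    v = (v - v.mean()) / v.std()
    r = (r - r.mean()) / r.std()
    n = len(v)
    max_lag = int(round(max_lag_s * fps))
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for k in lags:
        if k >= 0:
            a, b = v[k:], r[:n - k]     # shift video back by k frames
        else:
            a, b = v[:n + k], r[-k:]    # shift reference back by |k| frames
        scores.append(np.mean(a * b))
    return lags[int(np.argmax(scores))] / fps

def pulse(t):
    """Synthetic pulse whose rate drifts from 72 to 96 bpm over 20 s."""
    return np.sin(2 * np.pi * (1.2 * t + 0.01 * t ** 2))

fps, n_frames, true_lag_s = 30, 600, 0.2
frame_t = np.arange(n_frames) / fps
ref_t = np.arange(2000) / 100.0                 # assumed 100 Hz contact sensor
reference_100hz = pulse(ref_t + true_lag_s)     # reference sees beats 0.2 s early
reference = resample_to_frames(ref_t, reference_100hz, frame_t)
rng = np.random.default_rng(0)
video = pulse(frame_t) + 0.05 * rng.standard_normal(n_frames)

lag_s = estimate_lag_seconds(video, reference, fps)
print(f"video trails reference by about {lag_s:.2f} s")
```

A persistent non-zero lag suggests a clock or trigger offset between the camera and the reference device. Correcting it before training is far cheaper than collecting more subjects to compensate for misaligned labels.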

The future of training data for custom vitals models

Three shifts are reshaping the data-burden conversation for Tier-1 suppliers. Synthetic data generation is maturing fast, with GAN and physics-based pipelines producing labeled pulse signals that fill rare corners of the distribution such as extreme heart rates or underrepresented skin tones. Privacy-preserving collection, including differentially private synthetic training data, is becoming a procurement requirement as biometric regulation tightens. And foundation-style pretraining means future custom builds will increasingly be adaptation tasks rather than ground-up training efforts, shrinking the marginal data a new camera requires.

The net direction is encouraging: the absolute volume of newly recorded subjects needed per camera is trending down, while the premium on diversity and ground-truth rigor is trending up. Teams that plan their collection around those two levers spend less and ship more reliable models.

Frequently asked questions

How many subjects do I really need for a production camera vitals model?

For a single fixed use case, applied programs commonly target 150 to 400 diverse subjects recorded through the production sensor. Safety-relevant automotive cabin models usually need 400 to 1,000 or more because of the wide range of lighting, motion, and IR conditions involved. Diversity across these subjects matters more than sheer count.

What counts as acceptable ground truth vitals data?

A synchronized, clinical-grade reference signal. Contact photoplethysmography from a finger oximeter is the common baseline, while ECG is preferred for heart-rate-variability and safety-critical work. The reference must be time-aligned to individual video frames, because misaligned ground truth caps achievable accuracy regardless of dataset size.

Can transfer learning reduce the data I have to collect?

Yes. Research on transfer learning and domain adaptation for rPPG shows that pretrained models adapt to a new camera with substantially less target data, sometimes even unlabeled data. This is why a custom build for one specific sensor rarely starts from scratch and why data burdens are lower than many teams assume.

Why can't I just use a public dataset instead of collecting my own?

Public datasets are recorded on different cameras, mostly visible-light RGB, and skew toward lighter skin tones. They are excellent for pretraining but cannot capture your sensor's noise profile, frame rate, or IR response. A focused collection on the target camera closes that gap.

Circadify is building custom-trained rPPG models matched to a client's exact camera, sensor, and deployment conditions, with data strategy designed around distribution and reference quality rather than brute-force volume. Automotive Tier-1 teams weighing the data burden of a contactless vitals program can scope a realistic collection plan through a custom build inquiry at circadify.com/custom-builds.
