Custom rPPG Models for IR and Thermal Cameras: How They Work
A technical deep-dive into building custom rPPG models for infrared and thermal camera hardware. Covers NIR, SWIR, and LWIR modalities with applications for automotive, security, and industrial OEMs.

The majority of rPPG research has focused on visible-light RGB cameras, but production hardware increasingly relies on infrared imaging. Automotive driver monitoring systems use 940 nm NIR flood illumination. Access-control and security platforms operate in mixed visible/IR modes. Industrial monitoring applications demand performance in total darkness. Building a custom rPPG model for IR and thermal camera deployments is a fundamentally different engineering challenge from adapting a webcam-trained model -- the underlying signal physics change with wavelength, and the entire training pipeline must be rebuilt around the target spectral band.
"In the near-infrared, you are no longer measuring the same hemoglobin absorption signal. You are measuring a different physiological contrast mechanism, and your model must learn that mechanism from scratch." -- Adapted from Nowara et al., IEEE CVPRW 2021
This post examines how rPPG signal extraction works across the infrared spectrum, what custom model training looks like for each modality, and where the technology is being deployed.
Analysis: rPPG Signal Physics Across the Infrared Spectrum
The rPPG signal in visible light originates primarily from the differential absorption of hemoglobin in arterial blood. Oxygenated hemoglobin (HbO2) and deoxyhemoglobin (Hb) have distinct absorption spectra in the 400-700 nm range, with the strongest pulsatile signal appearing in the green channel (~540 nm) where the absorption difference is greatest. As wavelength increases beyond the visible range, the signal physics shift.
Near-Infrared (NIR): 700-1000 nm
In the NIR band, hemoglobin absorption decreases but does not disappear. The dominant signal mechanism transitions from surface-level absorption to deeper-tissue volumetric scattering. NIR photons penetrate further into tissue, and the pulsatile signal reflects deeper arterial blood-volume changes. The signal exists, but its amplitude, morphology, and spatial distribution differ from the visible-light signal. A model trained on RGB data has learned to look for the wrong signal characteristics.
Active NIR illumination (common in automotive and security applications) provides a controlled, consistent light source that eliminates ambient-illumination variability -- a major advantage over passive visible-light systems. However, the illumination geometry (beam angle, power distribution, distance to subject) directly affects the signal and must be accounted for in training.
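To make the single-channel case concrete, here is a minimal classical baseline, not a trained model: it takes the per-frame mean intensity of a skin ROI from a NIR camera and reads the pulse rate off the spectral peak. The function name, the 0.7-4.0 Hz pulse band, and the linear detrend are illustrative assumptions rather than anything prescribed by the work cited here. A custom-trained model would replace this pipeline, but a baseline like this is a useful sanity check on whether a sensor captures any pulsatile signal at all.

```python
import numpy as np

def estimate_pulse_bpm(roi_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate pulse rate (bpm) from a single-channel ROI intensity trace.

    roi_means: 1-D sequence, mean NIR intensity of the skin ROI per frame.
    lo_hz, hi_hz: assumed pulse band (42-240 bpm), an illustrative choice.
    """
    x = np.asarray(roi_means, dtype=np.float64)
    # Express the trace as a relative (AC/DC) change so that a constant
    # illumination gain cancels out.
    x = x / x.mean() - 1.0
    # Remove slow linear drift (e.g. gradual exposure or posture change).
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)
    # Hann window to limit spectral leakage, then a one-sided power spectrum.
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    # Pick the strongest component inside the assumed pulse band.
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz
```

The AC/DC normalization only cancels a gain that is constant over time. Illumination that varies across the ROI combined with subject motion still leaks into the trace, which is one reason illumination geometry has to be represented in the training data.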
Short-Wave Infrared (SWIR): 1000-2500 nm
SWIR imaging is less common in consumer applications but is used in specialized industrial and medical contexts. Water absorption increases significantly in this band, and the rPPG signal mechanism shifts toward detecting water-content changes associated with arterial pulsation. The signal is weaker and noisier, requiring more sophisticated temporal filtering and longer observation windows. Custom models for SWIR must be trained with high-quality reference signals and substantial data to learn the subtle signal characteristics.
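The need for longer observation windows can be quantified with two textbook relationships, sketched below under idealized assumptions (a steady pulse rate and white noise, neither of which holds exactly in practice). The helper names are hypothetical. The first function gives the spacing between adjacent DFT bins in beats per minute for a window of T seconds. The second gives the idealized improvement in spectral peak-to-noise-floor ratio when the window is lengthened.

```python
import math

def bpm_bin_spacing(window_seconds):
    """Spacing between adjacent DFT bins, in beats per minute (60 / T)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 60.0 / window_seconds

def peak_snr_gain_db(window_seconds, reference_seconds):
    """Idealized gain (dB) in periodogram peak-to-noise-floor ratio.

    Assumes a constant-frequency pulse in white noise, where the ratio
    grows in proportion to window length. Real pulse rates drift, so
    the practical gain is smaller.
    """
    if window_seconds <= 0 or reference_seconds <= 0:
        raise ValueError("window lengths must be positive")
    return 10.0 * math.log10(window_seconds / reference_seconds)
```

For example, `bpm_bin_spacing(10.0)` is 6 bpm and `bpm_bin_spacing(30.0)` is 2 bpm, while doubling the window buys roughly 3 dB in the idealized case. The trade-off is latency: a longer window responds more slowly to real changes in pulse rate.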
Long-Wave Infrared (LWIR / Thermal): 8000-14000 nm
Thermal cameras detect emitted radiation, not reflected light. The rPPG-relevant signal in thermal imagery comes from temperature fluctuations on the skin surface driven by arterial blood flow. These fluctuations are extremely small (tens of millikelvins) and are superimposed on larger thermal variations from respiration, ambient temperature changes, and metabolic heat production. Extracting pulse rate from thermal imagery requires custom models trained specifically on thermal data with paired ground-truth physiological signals.
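A rough way to see why this is hard is to compare the signal amplitude with the camera's per-pixel temporal noise, usually specified as NETD (noise-equivalent temperature difference). The sketch below estimates how many independent samples, pooled across pixels or frames, must be averaged to reach a target signal-to-noise ratio. The function name and the example NETD value are illustrative assumptions, and the independence assumption is optimistic: fixed-pattern noise is spatially correlated and does not average down this way.

```python
import math

def samples_to_average(signal_mk, netd_mk, target_snr):
    """Independent samples needed so averaged noise equals signal / target_snr.

    Assumes zero-mean noise that is independent across samples, so the
    noise of an N-sample average is netd_mk / sqrt(N).
    """
    if min(signal_mk, netd_mk, target_snr) <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil((target_snr * netd_mk / signal_mk) ** 2)

# Illustrative numbers only: a 20 mK pulsatile signal, an assumed
# 50 mK NETD, and a target SNR of 5.
print(samples_to_average(20.0, 50.0, 5.0))  # 157
```

The sample count scales with the square of the noise-to-signal ratio, so halving the signal amplitude quadruples the averaging needed.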
Comparison: rPPG Signal Characteristics by Spectral Band
| Characteristic | Visible RGB (400-700 nm) | NIR (700-1000 nm) | SWIR (1000-2500 nm) | LWIR / Thermal (8-14 um) |
|---|---|---|---|---|
| Primary signal mechanism | Hemoglobin absorption (surface) | Volumetric scattering (deeper tissue) | Water absorption variation | Surface temperature fluctuation |
| Signal amplitude | Strong (~1-2% reflectance change) | Moderate (~0.3-0.8%) | Weak (~0.1-0.3%) | Very weak (10-50 mK) |
| Ambient light dependency | High (passive illumination) | Low (active illumination typical) | Low (active illumination) | None (emitted radiation) |
| Darkness operation | Not possible | Yes (with active illumination) | Yes (with active illumination) | Yes (fully passive) |
| Skin-tone sensitivity | Significant (melanin absorption) | Reduced (melanin transparent in NIR) | Minimal | None |
| Public training data availability | Extensive (UBFC, PURE, COHFACE, etc.) | Very limited | Essentially none | Very limited |
| Custom training requirement | Optional for same-class cameras | Required | Required | Required |
| Typical deployment | Consumer devices, telehealth | Automotive DMS, security | Industrial, specialized medical | Military, building monitoring |
The table makes clear that any deployment outside the visible RGB band requires custom model training. There is no viable path to NIR, SWIR, or thermal rPPG using only public-dataset-trained models.
Applications: IR and Thermal rPPG in Production Hardware
Automotive Driver Monitoring (NIR)
The automotive DMS market is the single largest driver of NIR rPPG development. Camera modules operating at 940 nm (chosen to be invisible to the driver and to avoid interference with eye-tracking IR LEDs at 850 nm) are standard across new-vehicle platforms targeting Euro NCAP protocols. Custom rPPG model training for these modules must account for:
- Single-channel imagery (no RGB decomposition; signal must be extracted from intensity alone)
- Active illumination non-uniformity (LED beam patterns create spatial intensity gradients)
- Driver-cabin vibration (road surface, engine, HVAC introduce broadband motion artifacts)
- Windshield spectral filtering (coated windshields attenuate specific NIR wavelengths)
- Extreme dynamic range (sunlight ingress through windows creates high-contrast scenes)
Each of these factors is specific to the Tier-1 supplier's hardware configuration and must be represented in the training data.
Security and Access Control (Dual RGB-NIR)
Security cameras increasingly ship with dual-mode sensors that capture RGB during daytime and switch to NIR with active illumination at night. An rPPG model for these platforms must either (a) operate in both modalities using a shared architecture with modality-specific input normalization, or (b) consist of two specialized sub-models with a modality-detection switch. Custom training requires paired data collection under both operating modes with the specific camera module.
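Option (b) can be sketched as a thin dispatcher in front of two sub-models. Everything here is hypothetical: the class and function names, the chroma threshold, and the heuristic itself, which assumes the camera delivers near-monochrome three-channel frames in night mode. When the camera reports its IR-cut filter state directly, that flag is a more reliable switch than an image heuristic.

```python
import numpy as np

def detect_modality(frame, chroma_threshold=2.0):
    """Guess 'rgb' or 'nir' from one HxWx3 frame.

    Heuristic assumption: night-mode (NIR) frames have almost identical
    values in all three channels, so their mean chroma is close to zero.
    """
    f = np.asarray(frame, dtype=np.float32)
    chroma = np.abs(f - f.mean(axis=2, keepdims=True)).mean()
    return "nir" if chroma < chroma_threshold else "rgb"

class DualModeEstimator:
    """Routes a clip to the sub-model that matches its modality."""

    def __init__(self, rgb_model, nir_model):
        # Each model is any callable that maps a list of frames to an estimate.
        self.models = {"rgb": rgb_model, "nir": nir_model}

    def __call__(self, frames):
        # Majority vote over the clip so one odd frame cannot flip the mode.
        nir_votes = sum(detect_modality(f) == "nir" for f in frames)
        modality = "nir" if 2 * nir_votes > len(frames) else "rgb"
        return modality, self.models[modality](frames)
```

Voting per clip rather than per frame keeps the switch stable during the day/night transition, when individual frames may be ambiguous.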
Building and Occupancy Monitoring (Thermal)
Thermal cameras deployed for occupancy analytics in smart buildings can extract pulse-rate information as a secondary signal alongside presence detection and occupancy counting. The model operates on 14-bit thermal frames with very different noise characteristics (fixed-pattern noise, non-uniformity correction artifacts) than visible-light imagery. Custom training must be performed on the specific thermal core (microbolometer array) with its native non-uniformity correction firmware.
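One example of a correction artifact is a step in the ROI-mean trace when the core re-runs its non-uniformity correction. Whether and how a given core produces such steps is an assumption here and depends on its firmware. The sketch below, with a hypothetical function name and threshold, flags frame-to-frame jumps that are far outside the normal range and removes the offset they introduce, so later filtering is not dominated by the discontinuity.

```python
import numpy as np

def remove_nuc_steps(trace, k=8.0):
    """Remove step discontinuities from an ROI-mean thermal trace.

    trace: 1-D sequence of per-frame ROI means (raw counts or kelvin).
    k: jump threshold in robust standard deviations (illustrative value).
    Returns (cleaned_trace, indices of the first frame after each step).
    """
    x = np.asarray(trace, dtype=np.float64)
    d = np.diff(x)
    # Robust scale of ordinary frame-to-frame changes via the median
    # absolute deviation, which a handful of steps cannot distort.
    med = np.median(d)
    mad = np.median(np.abs(d - med)) + 1e-12
    is_step = np.abs(d - med) > k * 1.4826 * mad
    # Zero the offending differences, then integrate back to a trace.
    d_clean = np.where(is_step, 0.0, d)
    cleaned = np.concatenate(([x[0]], x[0] + np.cumsum(d_clean)))
    return cleaned, np.flatnonzero(is_step) + 1
```

This only handles abrupt offsets in the spatial mean. Slow drift between corrections and residual per-pixel fixed-pattern noise need separate treatment, which is part of why training on the specific thermal core matters.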
In-Cabin Wellness for Commercial Vehicles (NIR)
Long-haul trucking and fleet management applications are deploying NIR cameras for continuous driver wellness monitoring. The rPPG model must be robust to extended-duration operation (hours-long sessions), gradual changes in driver posture, and vibration profiles specific to heavy commercial vehicles. Custom training datasets for this application capture multi-hour driving sessions with continuous reference physiological monitoring.
Research Foundations
Key publications supporting IR and thermal rPPG custom model development:
- Nowara et al., IEEE CVPRW 2021 -- One of the foundational demonstrations of rPPG signal extraction from NIR imagery. Showed that dedicated NIR-trained models can extract pulse rate from 940 nm camera footage, but models trained on RGB data fail completely on NIR input.
- Negishi et al., Sensors 2020 -- Demonstrated contactless pulse-rate estimation from thermal (LWIR) facial video. Reported that the thermal rPPG signal is detectable but requires careful ROI selection (periorbital and forehead regions where arterial vasculature is closest to the surface) and temporal filtering tuned to the low-amplitude thermal signal.
- Nowara et al., IEEE TBIOM 2022 -- Extended NIR rPPG to multi-wavelength configurations, showing that combining 850 nm and 940 nm channels improves signal robustness. Custom multi-spectral models outperformed single-wavelength models in low-SNR conditions.
- Kuang et al., Biomedical Optics Express 2023 -- Modeled the depth-dependent optical path of NIR photons in tissue and showed that the NIR rPPG signal originates from deeper vascular structures than the visible-light signal. This finding has direct implications for ROI selection and spatial feature learning in custom NIR models.
- Chen et al., IEEE TMI 2024 -- Proposed a cross-spectral transfer learning framework for rPPG, enabling partial knowledge transfer from RGB-trained models to NIR models. Reduced the NIR training data requirement by approximately 40% compared to training from scratch, but still required target-sensor NIR data for the final fine-tuning stage.
Future Directions
Multi-spectral fusion on a single sensor. Camera modules that simultaneously capture RGB and NIR on the same sensor array (using RGBIR Bayer patterns) are entering the market. Custom rPPG models for these sensors can learn to dynamically weight visible and infrared channels based on ambient conditions, extracting the best available signal at all times.
Thermal rPPG with higher-resolution sensors. As thermal microbolometer arrays move from 160x120 to 640x480 and beyond, the spatial resolution becomes sufficient for sub-facial ROI analysis. This enables custom models to identify optimal pulse-extraction regions automatically rather than relying on fixed ROI templates.
Event cameras for rPPG. Neuromorphic event cameras (which output per-pixel brightness changes rather than frames) offer microsecond-scale temporal resolution with very little motion blur. Early research suggests they may capture the rPPG signal with extremely high temporal fidelity. Custom model architectures for event-camera rPPG are a nascent but promising research direction.
Self-supervised pre-training on unlabeled IR video. Collecting ground-truth paired data (video + reference PPG) is the primary bottleneck in custom IR model training. Self-supervised methods that learn useful representations from unlabeled IR video -- then fine-tune with a small labeled dataset -- could substantially reduce the data collection burden.
FAQ
Can a visible-light rPPG model be adapted to work on NIR cameras?
Not directly. The signal physics differ too fundamentally. However, transfer learning from an RGB-trained backbone can provide useful initialization for an NIR model. Chen et al. (IEEE TMI 2024) showed that cross-spectral transfer learning reduces the NIR training data requirement by approximately 40%, but real NIR training data is still required for the fine-tuning stage.
Is rPPG possible with thermal cameras in practice?
Yes, but with significant constraints. The thermal rPPG signal is very low amplitude (10-50 mK) and requires high-sensitivity thermal cores, careful ROI selection, and custom-trained temporal filtering. Pulse-rate estimation from thermal imagery has been demonstrated in controlled environments (Negishi et al., Sensors 2020). Deployment in uncontrolled environments remains challenging.
What frame rate is needed for NIR rPPG?
A minimum of 20 fps is generally required for pulse-rate estimation. For waveform-level BVP analysis (pulse waveform morphology, pulse transit time), 30 fps or higher is recommended. Automotive NIR cameras typically operate at 30-60 fps, which is sufficient for rPPG applications.
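The gap between pulse-rate estimation and waveform analysis shows up in simple sampling arithmetic. The helper below is a hypothetical illustration, not a specification: for a given frame rate and pulse rate it reports how many frames land in each cardiac cycle and how many harmonics of the pulse frequency fit at or below the Nyquist limit. Waveform morphology lives in those harmonics, so fewer of them means a coarser waveform.

```python
def frame_rate_summary(fps, pulse_bpm):
    """Sampling figures for a pulse at pulse_bpm captured at fps."""
    if fps <= 0 or pulse_bpm <= 0:
        raise ValueError("fps and pulse_bpm must be positive")
    pulse_hz = pulse_bpm / 60.0
    nyquist_hz = fps / 2.0
    return {
        # Frames available to describe one cardiac cycle.
        "samples_per_beat": fps / pulse_hz,
        # Harmonics of the pulse frequency that the frame rate can represent.
        "harmonics_up_to_nyquist": int(nyquist_hz // pulse_hz),
    }
```

At 20 fps, a fast 240 bpm pulse gets only 5 frames per beat and 2 representable harmonics. That is enough to locate the fundamental for rate estimation but leaves little to describe waveform shape.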
Does skin tone affect NIR rPPG performance?
Significantly less than visible-light rPPG. Melanin, which is the primary source of skin-tone-dependent signal variation in RGB rPPG, is largely transparent in the NIR band (>850 nm). Nowara et al. (IEEE CVPRW 2021) reported substantially reduced performance variation across skin tones in NIR compared to RGB. This is one of the key advantages of NIR-based rPPG for diverse populations.
How much training data is needed for a custom IR rPPG model?
For NIR models with transfer learning from an RGB backbone: 300-1,000 paired clips on the target sensor typically achieve strong performance. For thermal models trained from scratch: 1,000-3,000 paired clips are generally required due to the lower signal amplitude and more complex noise characteristics. Data should span the intended demographic diversity and operating conditions.
IR and thermal rPPG represent the next frontier of camera-based physiological sensing, but they demand custom model builds tuned to the specific sensor, wavelength, and deployment environment. If your hardware team is building a product around NIR, SWIR, or thermal imaging and needs an rPPG model engineered for your sensor stack, start a custom-build conversation with the Circadify team.
