Remote photoplethysmography (rPPG) is gaining prominence for its non-invasive approach to monitoring physiological signals using only cameras. Despite its promise, rPPG models adapt poorly to new, unseen domains because physiological signals are sensitive to the environment.
To address this issue, we pioneer Test-Time Adaptation (TTA) for rPPG, which adapts pre-trained models to the target domain during inference without requiring annotations or source data, the latter often being unavailable due to privacy considerations. In particular, using only the user's face video stream as the accessible target-domain data, the rPPG model is tuned on each single instance it encounters.
However, 1) TTA algorithms are designed predominantly for classification tasks and are ill-suited to regression tasks such as rPPG because they provide inadequate supervision. 2) Tuning pre-trained models in a single-instance manner introduces variability and instability, making it difficult to separate domain-relevant from domain-irrelevant features while preserving the learned information.
To overcome these challenges, we present Bi-TTA, a novel expert knowledge-based Bidirectional Test-Time Adapter framework. Specifically, leveraging two expert-knowledge priors for providing self-supervision, our Bi-TTA primarily comprises two modules: a prospective adaptation (PA) module using sharpness-aware minimization to eliminate domain-irrelevant noise, enhancing the stability and efficacy during the adaptation process, and a retrospective stabilization (RS) module to dynamically reinforce crucial learned model parameters, averting performance degradation caused by overfitting or catastrophic forgetting.
We also establish a large-scale benchmark for rPPG tasks under the TTA protocol, promoting advancements in both the rPPG and TTA fields. The experimental results demonstrate the significant superiority of our approach over the state-of-the-art (SoTA).
Visualization of the remote photoplethysmography (rPPG) workflow. Traditional physiological measurement devices such as electrocardiograms (ECG), heart rate bands, and finger clips, despite their accuracy, are often expensive and uncomfortable to wear. In contrast, using only ordinary cameras, rPPG offers a non-invasive and more convenient alternative: it extracts blood volume pulses (BVP) from facial videos by analyzing variations in the skin's light absorption, and from them measures heart rate (HR), heart rate variability (HRV), and respiration frequency (RF). These measurements are especially important for tracking health status and sympathetic activity levels.
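As a rough illustration of the last step in this workflow (going from a BVP signal to an HR value), a common approach is to take the dominant frequency of the BVP within a plausible heart-rate band. The sketch below is not the pipeline used in the paper; the function name `estimate_hr_bpm`, the 0.7 to 3.0 Hz band, the frequency step, and the synthetic input signal are all assumptions chosen for illustration.

```python
import math

def estimate_hr_bpm(bvp, fps, lo_hz=0.7, hi_hz=3.0, step_hz=0.01):
    """Return the strongest frequency of `bvp` within [lo_hz, hi_hz], in beats per minute.

    Hypothetical helper: the band limits and step size are illustrative assumptions.
    """
    n = len(bvp)
    mean = sum(bvp) / n
    centered = [x - mean for x in bvp]  # remove the constant offset
    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        freq = lo_hz + k * step_hz
        # Correlate the signal with a cosine and a sine at `freq`.
        re = sum(x * math.cos(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0

# Synthetic 10-second BVP sampled at 30 fps with a 1.2 Hz pulse (72 beats per minute).
fps = 30
bvp = [math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
hr = estimate_hr_bpm(bvp, fps)
```

Real BVP signals are noisier than this synthetic sine, so practical systems typically band-pass filter the signal before locating the spectral peak.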
TTA aims to fine-tune a pre-trained source model at inference time, as illustrated in the figure below, without access to the distribution or labels of either the source or the target data, which naturally eliminates the need for intensive re-training.
Visualization of Domain Adaptation (DA) and Test-Time Adaptation (TTA) methodologies. DA utilizes batch learning with labeled target data, while TTA dynamically refines the model during inference without relying on target data labels or distribution. Both DA and TTA obviate the requirement for source data. Note that domain generalization (DG) is excluded as it does not focus on specific target domain adaptation.
Illustration of spatial-temporal map (STMap) construction and the implementation of our proposed expert knowledge-based priors. (a) The process of generating the STMap, encompassing face alignment and cropping, local signal extraction, and the subsequent integration. (b) The calculation process of the temporal consistency loss (TCL), aimed at minimizing significant discrepancies between the original and temporally shifted HR predictions. (c) The calculation process of the spatial consistency loss (SCL), focused on penalizing pronounced disparities across different facial regions. Note that the boxes colored in pink represent the loss outcomes.
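To make the two priors more concrete, the following is a minimal sketch of what temporal and spatial consistency penalties could look like. It is not the paper's formulation: the hinge form, the `margin` parameter (in beats per minute), and both function names are assumptions introduced only to illustrate the idea of penalizing large disagreements while tolerating small ones.

```python
def temporal_consistency_loss(hr_original, hr_shifted, margin=3.0):
    """Penalize only the part of the HR gap that exceeds `margin`.

    Assumed form: a hinge on the absolute difference between the HR predicted
    for a clip and the HR predicted for its temporally shifted version.
    """
    return max(0.0, abs(hr_original - hr_shifted) - margin)

def spatial_consistency_loss(region_hrs, margin=3.0):
    """Average excess deviation of per-region HR predictions from their mean.

    Assumed form: each facial region should agree with the mean prediction
    up to `margin`; only the excess is penalized.
    """
    mean_hr = sum(region_hrs) / len(region_hrs)
    excess = [max(0.0, abs(hr - mean_hr) - margin) for hr in region_hrs]
    return sum(excess) / len(excess)

tcl_small = temporal_consistency_loss(72.0, 73.5)  # gap of 1.5, inside the margin
tcl_large = temporal_consistency_loss(72.0, 80.0)  # gap of 8.0, outside the margin
scl = spatial_consistency_loss([70.0, 71.0, 72.0, 83.0])  # one outlier region
```

Neither penalty needs a ground-truth label, which is what makes this kind of prior usable as self-supervision at test time.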
Illustration of the proposed Bidirectional Test-Time Adapter (Bi-TTA). Black arrows → indicate the adaptation process using only the two proposed priors, i.e., TCL and SCL. Orange arrows → denote that the prospective adaptation (PA) module adjusts the model parameters using the gradient of a representative neighborhood with radius ρ. Green arrows → show that the retrospective stabilization (RS) module is activated when there is an oscillation, which is a sign of performance degradation, to maintain the essential learned adaptation ability using former tuning gradients.
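The neighborhood-gradient idea behind the PA module follows sharpness-aware minimization (SAM). The sketch below shows one generic SAM update on a toy quadratic loss, not the Bi-TTA implementation: the toy loss, the learning rate, the value of `rho`, and the function names are assumptions. In Bi-TTA the loss would instead come from the self-supervised priors, and the parameters would be those of the rPPG network.

```python
import math

def loss(w):
    """Toy loss L(w) = 0.5 * (w0^2 + 10 * w1^2), used only for illustration."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def loss_grad(w):
    """Analytic gradient of the toy loss."""
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: descend along the gradient taken at a perturbed point.

    The perturbed point lies at distance `rho` from `w` in the gradient
    direction, approximating the worst case inside the neighborhood.
    """
    g = loss_grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = loss_grad(w_adv)
    # Apply the neighborhood gradient to the original (unperturbed) weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
start_loss = loss(w)
for _ in range(50):
    w = sam_step(w)
end_loss = loss(w)
```

Because each update uses the gradient at a perturbed point rather than at the current weights, it favors flat regions of the loss surface, which is the usual motivation for SAM when individual updates are noisy.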
Experimental results on Heart Rate (HR) estimation. "Ours w/o P.R." denotes that only the expert knowledge-based priors are adopted, without the bidirectional adaptation strategy. The best results are highlighted in bold, and the second-best results are underlined. TTA methods generally perform better than DG approaches. Among the TTA methods, our Bi-TTA performs significantly better, especially when prospective and retrospective adaptation are employed together. This highlights Bi-TTA's robustness in adapting to various unseen domains and underscores its potential for real-world applications.
@inproceedings{li2024bi,
author = {Haodong Li and Hao Lu and Ying-Cong Chen},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
title = {Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement},
year = {2024}
}