Heart rate analysis
This tutorial shows how to extract heart rate estimates using photoplethysmography (PPG) data and accelerometer data. The pipeline consists of a stepwise approach to determine signal quality, assessing both PPG morphology and accounting for periodic artifacts using the accelerometer. Based on the signal quality, we extract high-quality segments and estimate the heart rate for every 2 s using the smoothed pseudo Wigner-Ville Distribution.
Load data
This pipeline requires accelerometer and PPG data to run. In this example we loaded data from a participant of the Personalized Parkinson Project. We load the corresponding dataframes using the load_tsdf_dataframe function. The channel green
represents the values obtained with PPG using green light.
In this example we use the interally developed TSDF
(documentation) to load and store data [1].
However, we are aware that there are other common data formats. For example, the following functions can be used depending on the file extension of the data:
.csv:
pandas.read_csv()
(documentation).json:
json.load()
(documentation)
from pathlib import Path
from paradigma.util import load_tsdf_dataframe
path_to_data = Path('../../tests/data')
path_to_prepared_data = path_to_data / '1.prepared_data'
ppg_prefix = 'PPG'
imu_prefix = 'IMU'
df_ppg, metadata_time_ppg, _ = load_tsdf_dataframe(
path_to_data=path_to_prepared_data / ppg_prefix,
prefix=ppg_prefix
)
df_imu, metadata_time_imu, _ = load_tsdf_dataframe(
path_to_data=path_to_prepared_data / imu_prefix,
prefix=imu_prefix
)
# Drop the gyroscope columns from the IMU data
cols_to_drop = df_imu.filter(regex='^gyroscope_').columns
df_acc = df_imu.drop(cols_to_drop, axis=1)
display(df_ppg, df_acc)
time | green | |
---|---|---|
0 | 0.00000 | 649511 |
1 | 0.00996 | 648214 |
2 | 0.01992 | 646786 |
3 | 0.02988 | 645334 |
4 | 0.03984 | 644317 |
... | ... | ... |
64770 | 644.54720 | 553884 |
64771 | 644.55716 | 554088 |
64772 | 644.56712 | 554240 |
64773 | 644.57708 | 555134 |
64774 | 644.58704 | 557205 |
64775 rows × 2 columns
time | accelerometer_x | accelerometer_y | accelerometer_z | |
---|---|---|---|---|
0 | 0.00000 | 0.550718 | 0.574163 | -0.273684 |
1 | 0.01004 | 0.535885 | 0.623445 | -0.254545 |
2 | 0.02008 | 0.504306 | 0.651675 | -0.251675 |
3 | 0.03012 | 0.488517 | 0.686603 | -0.265550 |
4 | 0.04016 | 0.494258 | 0.725359 | -0.278469 |
... | ... | ... | ... | ... |
72942 | 730.74468 | 0.234928 | -0.516268 | -0.802871 |
72943 | 730.75472 | 0.245455 | -0.514354 | -0.806699 |
72944 | 730.76476 | 0.243541 | -0.511005 | -0.807177 |
72945 | 730.77480 | 0.240191 | -0.514354 | -0.808134 |
72946 | 730.78484 | 0.243541 | -0.511005 | -0.808134 |
72947 rows × 4 columns
Step 1: Preprocess data
The first step after loading the data is preprocessing using the preprocess_ppg_data. This begins by isolating segments containing both PPG and IMU data, discarding portions where one modality (e.g., PPG) extends beyond the other, such as when the PPG recording is longer than the accelerometer data. This functionality requires the starting times (metadata_time_ppg.start_iso8601
and metadata_time_imu.start_iso8601
) in iso8601 format as inputs. After this step, the preprocess_ppg_data function resamples the PPG and accelerometer data to uniformly distributed timestamps, addressing the fixed but non-uniform sampling rates of the sensors. After this, a bandpass Butterworth filter (4th-order, bandpass frequencies: 0.4–3.5 Hz) is applied to the PPG signal, while a high-pass Butterworth filter (4th-order, cut-off frequency: 0.2 Hz) is applied to the accelerometer data.
Note: the printed shapes are (rows, columns) with each row corresponding to a single data point and each column representing a data column (e.g.time). The number of rows of the overlapping segments of PPG and accelerometer are not the same due to sampling differences (other sensors and possibly other sampling frequencies).
from paradigma.config import PPGConfig, IMUConfig
from paradigma.preprocessing import preprocess_ppg_data
ppg_config = PPGConfig()
imu_config = IMUConfig()
print(f"Original data shapes:\n- PPG data: {df_ppg.shape}\n- Accelerometer data: {df_imu.shape}")
df_ppg_proc, df_acc_proc = preprocess_ppg_data(
df_ppg=df_ppg,
df_acc=df_acc,
ppg_config=ppg_config,
imu_config=imu_config,
start_time_ppg=metadata_time_ppg.start_iso8601,
start_time_imu=metadata_time_imu.start_iso8601
)
print(f"Overlapping preprocessed data shapes:\n- PPG data: {df_ppg_proc.shape}\n- Accelerometer data: {df_acc_proc.shape}")
display(df_ppg_proc, df_acc_proc)
Original data shapes:
- PPG data: (64775, 2)
- Accelerometer data: (72947, 7)
Overlapping preprocessed data shapes:
- PPG data: (19338, 2)
- Accelerometer data: (64459, 4)
time | green | |
---|---|---|
0 | 0.000000 | -1434.856838 |
1 | 0.033333 | -3461.717789 |
2 | 0.066667 | -5172.913796 |
3 | 0.100000 | -6343.203160 |
4 | 0.133333 | -6875.904054 |
... | ... | ... |
19333 | 644.433333 | -2403.547097 |
19334 | 644.466667 | -7434.311944 |
19335 | 644.500000 | -8214.847078 |
19336 | 644.533333 | -5194.393329 |
19337 | 644.566667 | 70.147000 |
19338 rows × 2 columns
time | accelerometer_x | accelerometer_y | accelerometer_z | |
---|---|---|---|---|
0 | 0.00 | 0.053078 | 0.010040 | -0.273154 |
1 | 0.01 | 0.038337 | 0.058802 | -0.256899 |
2 | 0.02 | 0.006824 | 0.086559 | -0.256739 |
3 | 0.03 | -0.009156 | 0.120855 | -0.273280 |
4 | 0.04 | -0.003770 | 0.159316 | -0.289007 |
... | ... | ... | ... | ... |
64454 | 644.54 | 0.104185 | 0.101627 | -0.029342 |
64455 | 644.55 | 0.120403 | 0.090413 | 0.051113 |
64456 | 644.56 | 0.101289 | 0.128684 | 0.037951 |
64457 | 644.57 | 0.073139 | 0.137399 | 0.030834 |
64458 | 644.58 | 0.004578 | 0.102862 | 0.025627 |
64459 rows × 4 columns
Step 2: Extract signal quality features
The preprocessed data (PPG & accelerometer) is windowed into overlapping windows of length ppg_config.window_length_s
with a window step of ppg_config.window_step_length_s
. From the PPG windows 10 time- and frequency domain features are extracted to assess PPG morphology and from the accelerometer windows one relative power feature is calculated to assess periodic motion artifacts.
The detailed steps are encapsulated in extract_signal_quality_features
(documentation can be found here).
from paradigma.config import HeartRateConfig
from paradigma.pipelines.heart_rate_pipeline import extract_signal_quality_features
ppg_config = HeartRateConfig('ppg')
acc_config = HeartRateConfig('imu')
print("The default window length for the signal quality feature extraction is set to", ppg_config.window_length_s, "seconds.")
print("The default step size for the signal quality feature extraction is set to", ppg_config.window_step_length_s, "seconds.")
df_features = extract_signal_quality_features(
df_ppg=df_ppg_proc,
df_acc=df_acc_proc,
ppg_config=ppg_config,
acc_config=acc_config,
)
df_features
The default window length for the signal quality feature extraction is set to 6 seconds.
The default step size for the signal quality feature extraction is set to 1 seconds.
time | var | mean | median | kurtosis | skewness | signal_to_noise | auto_corr | f_dom | rel_power | spectral_entropy | acc_power_ratio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 1.697510e+07 | 3418.696392 | 2892.815105 | 2.540447 | 0.277237 | 3.241634 | 0.282452 | 1.171875 | 0.179922 | 0.561186 | 0.014196 |
1 | 1.0 | 1.635033e+07 | 3350.769158 | 2892.815105 | 2.647951 | 0.420163 | 3.202313 | 0.276526 | 1.171875 | 0.201972 | 0.547812 | 0.015343 |
2 | 2.0 | 1.921248e+07 | 3698.963099 | 3310.773556 | 2.333687 | 0.481974 | 3.521882 | 0.210984 | 0.820312 | 0.264922 | 0.509015 | 0.042583 |
3 | 3.0 | 1.646562e+07 | 3389.325775 | 2946.588526 | 2.608187 | 0.733913 | 3.346236 | 0.225823 | 0.703125 | 0.270707 | 0.496051 | 0.034215 |
4 | 4.0 | 1.603135e+07 | 3364.366313 | 2912.280798 | 2.268768 | 0.440357 | 3.314565 | 0.305998 | 0.820312 | 0.264097 | 0.496697 | 0.056204 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
634 | 634.0 | 2.281660e+06 | 1106.006037 | 838.307729 | 4.058037 | 0.674184 | 2.072660 | 0.172052 | 0.703125 | 0.204028 | 0.577570 | 0.060486 |
635 | 635.0 | 2.224441e+06 | 1113.194933 | 838.307729 | 3.684159 | 0.067979 | 2.118019 | 0.236408 | 0.703125 | 0.221044 | 0.560362 | 0.053120 |
636 | 636.0 | 5.103141e+06 | 1627.618918 | 1030.113303 | 3.913068 | 0.852331 | 2.040094 | 0.235822 | 0.585938 | 0.157667 | 0.514717 | 0.058853 |
637 | 637.0 | 7.428728e+06 | 2093.736007 | 1520.640067 | 3.188753 | 0.305780 | 2.457446 | 0.322034 | 0.468750 | 0.150582 | 0.458477 | 0.061269 |
638 | 638.0 | 1.549534e+07 | 2974.462555 | 2482.703944 | 4.042171 | -0.628008 | 2.228730 | 0.095918 | 0.468750 | 0.102976 | 0.493830 | 0.105554 |
639 rows × 12 columns
Step 3: Signal quality classification
A trained logistic classifier is used to predict PPG signal quality and returns the pred_sqa_proba
, which is the posterior probability of a PPG window to look like the typical PPG morphology (higher probability indicates toward the typical PPG morphology). The relative power feature from the accelerometer is compared to a threshold for periodic artifacts and therefore pred_sqa_acc_label
is used to return a label indicating predicted periodic motion artifacts (label 0) or no periodic motion artifacts (label 1).
The classification step is implemented in signal_quality_classification
(documentation can be found here).
from importlib.resources import files
from paradigma.pipelines.heart_rate_pipeline import signal_quality_classification
ppg_quality_classifier_package_filename = 'ppg_quality_clf_package.pkl'
full_path_to_classifier_package = files('paradigma') / 'assets' / ppg_quality_classifier_package_filename
config = HeartRateConfig()
df_sqa = signal_quality_classification(
df=df_features,
config=config,
full_path_to_classifier_package=full_path_to_classifier_package
)
df_sqa
time | pred_sqa_proba | pred_sqa_acc_label | |
---|---|---|---|
0 | 0.0 | 0.006031 | 1 |
1 | 1.0 | 0.011725 | 1 |
2 | 2.0 | 0.068063 | 1 |
3 | 3.0 | 0.080808 | 1 |
4 | 4.0 | 0.074591 | 1 |
... | ... | ... | ... |
634 | 634.0 | 0.001387 | 1 |
635 | 635.0 | 0.001570 | 1 |
636 | 636.0 | 0.000380 | 1 |
637 | 637.0 | 0.000406 | 1 |
638 | 638.0 | 0.000004 | 1 |
639 rows × 3 columns
Step 4: Heart rate estimation
For heart rate estimation, we extract segments of config.tfd_length
using estimate_heart_rate. We calculate the smoothed-pseudo Wigner-Ville Distribution (SPWVD) to obtain the frequency content of the PPG signal over time. We extract for every timestamp in the SPWVD the frequency with the highest power. For every non-overlapping 2 s window we average the corresponding frequencies to obtain a heart rate per window.
Note: for the test data we set the tfd_length to 10 s instead of the default of 30 s, because the small PPP test data doesn’t have 30 s of consecutive high-quality PPG data.
from paradigma.pipelines.heart_rate_pipeline import estimate_heart_rate
print("The standard default minimal window length for the heart rate extraction is set to", config.tfd_length, "seconds.")
# set the minimal window length for the heart rate extraction to 10 seconds instead of default of 30 seconds.
config.set_tfd_length(10)
df_hr = estimate_heart_rate(
df_sqa=df_sqa,
df_ppg_preprocessed=df_ppg_proc,
config=config
)
df_hr
The standard default minimal window length for the heart rate extraction is set to 30 seconds.
time | heart_rate | |
---|---|---|
0 | 56.0 | 86.404715 |
1 | 58.0 | 86.640472 |
2 | 60.0 | 86.345776 |
3 | 62.0 | 84.872299 |
4 | 64.0 | 84.872299 |
5 | 66.0 | 84.194499 |
Step 5: Heart rate aggregation
The final step is to aggregate all 2 s heart rate estimates using aggregate_heart_rate. In the current example, the mode and 99th percentile are calculated. We hypothesize that the mode gives representation of the resting heart rate while the 99th percentile indicates the maximum heart rate. In Parkinson’s disease, we expect that these two measures could reflect autonomic (dys)functioning. The nr_hr_est
in the metadata indicates based on how many 2 s windows these aggregates are determined.
import pprint
from paradigma.pipelines.heart_rate_pipeline import aggregate_heart_rate
hr_values = df_hr['heart_rate'].values
df_hr_agg = aggregate_heart_rate(
hr_values=hr_values,
aggregates = ['mode', '99p']
)
pprint.pprint(df_hr_agg)
{'hr_aggregates': {'99p_heart_rate': 86.62868369351669,
'mode_heart_rate': 84.8722986247544},
'metadata': {'nr_hr_est': 6}}