Heart rate analysis

This tutorial shows how to extract heart rate estimates using photoplethysmography (PPG) data and accelerometer data. The pipeline consists of a stepwise approach to determine signal quality, assessing both PPG morphology and accounting for periodic artifacts using the accelerometer. Based on the signal quality, we extract high-quality segments and estimate the heart rate for every 2 s using the smoothed pseudo Wigner-Ville Distribution.

In this tutorial, we use two days of data from a participant of the Personalized Parkinson Project to demonstrate the functionalities. Since ParaDigMa expects contiguous time series, the collected data was stored in two segments each with contiguous timestamps. Per segment, we load the data and perform the following steps:

  1. Preprocess the time series data

  2. Extract signal quality features

  3. Signal quality classification

  4. Heart rate estimation

We then combine the output of the different segments for the final step:

  1. Heart rate aggregation

Load data

This pipeline requires accelerometer and PPG data to run. Here, we start by loading a single contiguous time series (segment), for which we continue running steps 1-4. Below we show how to run these steps for multiple segments. The channel green represents the values obtained with PPG using green light.

In this example we use the interally developed TSDF (documentation) to load and store data [1]. However, we are aware that there are other common data formats. For example, the following functions can be used depending on the file extension of the data:

from pathlib import Path
from paradigma.util import load_tsdf_dataframe

# Set the path to where the prepared data is saved and load the data.
# Note: the test data is stored in TSDF, but you can load your data in your own way
path_to_prepared_data =  Path('../../example_data')

ppg_prefix = 'ppg'
imu_prefix = 'imu'

segment_nr = '0001' 

df_ppg, metadata_time_ppg, metadata_values_ppg = load_tsdf_dataframe(
    path_to_data=path_to_prepared_data / ppg_prefix, 
    prefix=f'PPG_segment{segment_nr}'
)
df_imu, metadata_time_imu, metadata_values_imu = load_tsdf_dataframe(
    path_to_data=path_to_prepared_data / imu_prefix, 
    prefix=f'IMU_segment{segment_nr}'
)

# Drop the gyroscope columns from the IMU data
cols_to_drop = df_imu.filter(regex='^gyroscope_').columns
df_acc = df_imu.drop(cols_to_drop, axis=1)

display(df_ppg, df_acc)
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Cell In[1], line 13
      9 imu_prefix = 'imu'
     11 segment_nr = '0001' 
---> 13 df_ppg, metadata_time_ppg, metadata_values_ppg = load_tsdf_dataframe(
     14     path_to_data=path_to_prepared_data / ppg_prefix, 
     15     prefix=f'PPG_segment{segment_nr}'
     16 )
     17 df_imu, metadata_time_imu, metadata_values_imu = load_tsdf_dataframe(
     18     path_to_data=path_to_prepared_data / imu_prefix, 
     19     prefix=f'IMU_segment{segment_nr}'
     20 )
     22 # Drop the gyroscope columns from the IMU data

File ~/work/paradigma/paradigma/src/paradigma/util.py:134, in load_tsdf_dataframe(path_to_data, prefix, meta_suffix, time_suffix, values_suffix)
    131 time_filename = f"{prefix}_{time_suffix}"
    132 values_filename = f"{prefix}_{values_suffix}"
--> 134 metadata_time, metadata_values = read_metadata(path_to_data, meta_filename, time_filename, values_filename)
    135 df = tsdf.load_dataframe_from_binaries([metadata_time, metadata_values], tsdf.constants.ConcatenationType.columns)
    137 return df, metadata_time, metadata_values

File ~/work/paradigma/paradigma/src/paradigma/util.py:121, in read_metadata(input_path, meta_filename, time_filename, values_filename)
    118 def read_metadata(
    119     input_path: str, meta_filename: str, time_filename: str, values_filename: str
    120 ) -> Tuple[TSDFMetadata, TSDFMetadata]:
--> 121     metadata_dict = tsdf.load_metadata_from_path(
    122         os.path.join(input_path, meta_filename)
    123     )
    124     metadata_time = metadata_dict[time_filename]
    125     metadata_values = metadata_dict[values_filename]

File ~/.cache/pypoetry/virtualenvs/paradigma-1HID61PK-py3.12/lib/python3.12/site-packages/tsdf/read_tsdf.py:84, in load_metadata_from_path(path)
     82 # The data is isomorphic to a JSON
     83 with open(path, "r") as file:
---> 84     data = json.load(file)
     86 abs_path = os.path.realpath(path)
     87 # Parse the data and verify that it complies with TSDF requirements

File /usr/lib/python3.12/json/__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
    275         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    276     """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277     a JSON document) to a Python object.
    278 
   (...)    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
    296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

File /usr/lib/python3.12/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File /usr/lib/python3.12/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File /usr/lib/python3.12/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Step 1: Preprocess data

The first step after loading the data is preprocessing using the preprocess_ppg_data. This begins by isolating segments containing both PPG and IMU data, discarding portions where one modality (e.g., PPG) extends beyond the other, such as when the PPG recording is longer than the accelerometer data. This functionality requires the starting times (metadata_time_ppg.start_iso8601 and metadata_time_imu.start_iso8601) in iso8601 format as inputs. After this step, the preprocess_ppg_data function resamples the PPG and accelerometer data to uniformly distributed timestamps, addressing the fixed but non-uniform sampling rates of the sensors. After this, a bandpass Butterworth filter (4th-order, bandpass frequencies: 0.4–3.5 Hz) is applied to the PPG signal, while a high-pass Butterworth filter (4th-order, cut-off frequency: 0.2 Hz) is applied to the accelerometer data.

Note: the printed shapes are (rows, columns) with each row corresponding to a single data point and each column representing a data column (e.g.time). The number of rows of the overlapping segments of PPG and accelerometer are not the same due to sampling differences (other sensors and possibly other sampling frequencies).

from paradigma.config import PPGConfig, IMUConfig
from paradigma.preprocessing import preprocess_ppg_data

ppg_config = PPGConfig()
imu_config = IMUConfig()

print(f"Original data shapes:\n- PPG data: {df_ppg.shape}\n- Accelerometer data: {df_imu.shape}")
df_ppg_proc, df_acc_proc = preprocess_ppg_data(
    df_ppg=df_ppg, 
    df_acc=df_acc, 
    ppg_config=ppg_config, 
    imu_config=imu_config, 
    start_time_ppg=metadata_time_ppg.start_iso8601,
    start_time_imu=metadata_time_imu.start_iso8601
)

print(f"Overlapping preprocessed data shapes:\n- PPG data: {df_ppg_proc.shape}\n- Accelerometer data: {df_acc_proc.shape}")
display(df_ppg_proc, df_acc_proc)
Original data shapes:
- PPG data: (1029375, 2)
- Accelerometer data: (3455331, 7)
Overlapping preprocessed data shapes:
- PPG data: (1030188, 2)
- Accelerometer data: (3433961, 4)
time green
0 0.000000 -26.315811
1 0.033333 91.335299
2 0.066667 181.603416
3 0.100000 225.760466
4 0.133333 219.937282
... ... ...
1030183 34339.433333 224556.234611
1030184 34339.466667 210075.529517
1030185 34339.500000 163811.629247
1030186 34339.533333 94537.897763
1030187 34339.566667 12915.304284

1030188 rows × 2 columns

time accelerometer_x accelerometer_y accelerometer_z
0 0.00 -0.002324 -0.001442 -0.002116
1 0.01 -0.000390 -0.000914 -0.007396
2 0.02 0.000567 0.002474 -0.005445
3 0.03 -0.000425 0.002414 -0.002099
4 0.04 -0.002807 -0.001408 -0.000218
... ... ... ... ...
3433956 34339.56 -0.402941 0.038710 0.461449
3433957 34339.57 -0.659832 0.098696 0.817136
3433958 34339.58 -0.464138 0.033607 0.471552
3433959 34339.59 -0.389065 0.108485 0.622471
3433960 34339.60 -0.082625 -0.014490 0.119875

3433961 rows × 4 columns

Step 2: Extract signal quality features

The preprocessed data (PPG & accelerometer) is windowed into overlapping windows of length ppg_config.window_length_s with a window step of ppg_config.window_step_length_s. From the PPG windows 10 time- and frequency domain features are extracted to assess PPG morphology and from the accelerometer windows one relative power feature is calculated to assess periodic motion artifacts.

The detailed steps are encapsulated in extract_signal_quality_features (documentation can be found here).

from paradigma.config import HeartRateConfig
from paradigma.pipelines.heart_rate_pipeline import extract_signal_quality_features

ppg_config = HeartRateConfig('ppg')
acc_config = HeartRateConfig('imu')

print("The default window length for the signal quality feature extraction is set to", ppg_config.window_length_s, "seconds.")
print("The default step size for the signal quality feature extraction is set to", ppg_config.window_step_length_s, "seconds.")

df_features = extract_signal_quality_features(
    df_ppg=df_ppg_proc,
    df_acc=df_acc_proc,
    ppg_config=ppg_config, 
    acc_config=acc_config, 
)

df_features
The default window length for the signal quality feature extraction is set to 6 seconds.
The default step size for the signal quality feature extraction is set to 1 seconds.
time var mean median kurtosis skewness signal_to_noise auto_corr f_dom rel_power spectral_entropy acc_power_ratio
0 0.0 1.145652e+05 282.401234 238.829637 2.170853 0.107401 3.320049 0.544165 0.585938 0.138454 0.516336 0.026409
1 1.0 1.102401e+05 271.582177 236.891936 2.251393 -0.029309 3.041878 0.491829 0.585938 0.160433 0.511626 0.023402
2 2.0 1.061479e+05 262.348604 225.915756 2.415221 0.216631 2.818552 0.469092 0.585938 0.167007 0.525025 0.028592
3 3.0 9.514719e+04 245.089445 203.417715 2.481465 0.110420 2.677071 0.415071 0.585938 0.170626 0.550495 0.019296
4 4.0 7.393010e+04 218.379138 187.583267 2.405921 0.084566 2.796140 0.338369 0.585938 0.121113 0.595214 0.020083
... ... ... ... ... ... ... ... ... ... ... ... ...
34329 34329.0 8.176078e+06 1613.021494 438.201240 6.122772 -1.792336 1.378694 0.104389 0.351562 0.046616 0.356027 0.110219
34330 34330.0 3.512188e+07 3307.888927 1069.775894 8.160698 1.746472 1.442643 0.142226 0.351562 0.049424 0.371163 0.178742
34331 34331.0 1.181350e+08 6648.535487 2743.478312 5.654373 0.018587 1.558314 0.136803 0.351562 0.048211 0.366386 0.153351
34332 34332.0 1.252829e+09 20165.525309 6452.244225 6.805051 -1.222184 1.310088 0.666123 0.351562 0.037812 0.359105 0.154910
34333 34333.0 1.008217e+10 42271.328020 12552.656437 23.756877 4.167326 1.212179 0.044647 0.585938 0.113283 0.632749 0.093221

34334 rows × 12 columns

Step 3: Signal quality classification

A trained logistic classifier is used to predict PPG signal quality and returns the pred_sqa_proba, which is the posterior probability of a PPG window to look like the typical PPG morphology (higher probability indicates toward the typical PPG morphology). The relative power feature from the accelerometer is compared to a threshold for periodic artifacts and therefore pred_sqa_acc_label is used to return a label indicating predicted periodic motion artifacts (label 0) or no periodic motion artifacts (label 1).

The classification step is implemented in signal_quality_classification (documentation can be found here).

from importlib.resources import files
from paradigma.pipelines.heart_rate_pipeline import signal_quality_classification

ppg_quality_classifier_package_filename = 'ppg_quality_clf_package.pkl'
full_path_to_classifier_package = files('paradigma') / 'assets' / ppg_quality_classifier_package_filename

config = HeartRateConfig()

df_sqa = signal_quality_classification(
    df=df_features, 
    config=config, 
    full_path_to_classifier_package=full_path_to_classifier_package
)

df_sqa
time pred_sqa_proba pred_sqa_acc_label
0 0.0 1.121315e-02 1
1 1.0 7.126135e-03 1
2 2.0 7.017555e-03 1
3 3.0 4.134224e-03 1
4 4.0 9.195340e-04 1
... ... ... ...
34329 34329.0 1.782669e-08 1
34330 34330.0 2.078262e-06 0
34331 34331.0 1.190223e-07 0
34332 34332.0 1.383614e-08 0
34333 34333.0 7.587516e-07 1

34334 rows × 3 columns

Store as TSDF

The predicted probabilities (and optionally other features) can be stored and loaded in TSDF as demonstrated below.

import tsdf
from paradigma.util import write_df_data

# Set 'path_to_data' to the directory where you want to save the data
metadata_time_store = tsdf.TSDFMetadata(metadata_time_ppg.get_plain_tsdf_dict_copy(), path_to_prepared_data)
metadata_values_store = tsdf.TSDFMetadata(metadata_values_ppg.get_plain_tsdf_dict_copy(), path_to_prepared_data)

# Select the columns to be saved 
metadata_time_store.channels = ['time']
metadata_values_store.channels = ['pred_sqa_proba', 'pred_sqa_acc_label']

# Set the units
metadata_time_store.units = ['Relative seconds']
metadata_values_store.units = ['Unitless', 'Unitless']
metadata_time_store.data_type = float
metadata_values_store.data_type = float

# Set the filenames
meta_store_filename = f'segment{segment_nr}_meta.json'
values_store_filename = meta_store_filename.replace('_meta.json', '_values.bin')
time_store_filename = meta_store_filename.replace('_meta.json', '_time.bin')

metadata_values_store.file_name = values_store_filename
metadata_time_store.file_name = time_store_filename

write_df_data(metadata_time_store, metadata_values_store, path_to_prepared_data, meta_store_filename, df_sqa)
df_sqa, _, _ = load_tsdf_dataframe(path_to_prepared_data, prefix=f'segment{segment_nr}')
df_sqa.head()
time pred_sqa_proba pred_sqa_acc_label
0 0.0 0.011213 1.0
1 1.0 0.007126 1.0
2 2.0 0.007018 1.0
3 3.0 0.004134 1.0
4 4.0 0.000920 1.0

Step 4: Heart rate estimation

For heart rate estimation, we extract segments of config.tfd_length using estimate_heart_rate. We calculate the smoothed-pseudo Wigner-Ville Distribution (SPWVD) to obtain the frequency content of the PPG signal over time. We extract for every timestamp in the SPWVD the frequency with the highest power. For every non-overlapping 2 s window we average the corresponding frequencies to obtain a heart rate per window.

Note: for the test data we set the tfd_length to 10 s instead of the default of 30 s, because the small PPP test data doesn’t have 30 s of consecutive high-quality PPG data.

from paradigma.pipelines.heart_rate_pipeline import estimate_heart_rate

print("The standard default minimal window length for the heart rate extraction is set to", config.tfd_length, "seconds.")
# set the minimal window length for the heart rate extraction to 10 seconds instead of default of 30 seconds.
#config.set_tfd_length(10)       

df_hr = estimate_heart_rate(
    df_sqa=df_sqa, 
    df_ppg_preprocessed=df_ppg_proc, 
    config=config
)

df_hr
The standard default minimal window length for the heart rate extraction is set to 30 seconds.
time heart_rate
0 47.0 80.372915
1 49.0 79.769382
2 51.0 79.136408
3 53.0 78.606477
4 55.0 77.870461
... ... ...
825 32876.0 78.220133
826 32878.0 78.047301
827 32880.0 78.047301
828 32882.0 78.238326
829 32884.0 78.556701

830 rows × 2 columns

Run steps 1 - 4 for multiple segments

If your data is also stored in multiple segments, you can modify segments in the cell below to a list of the filenames of your respective segmented data.

import pandas as pd
from pathlib import Path
from importlib.resources import files

from paradigma.util import load_tsdf_dataframe
from paradigma.config import PPGConfig, IMUConfig, HeartRateConfig
from paradigma.preprocessing import preprocess_ppg_data
from paradigma.pipelines.heart_rate_pipeline import extract_signal_quality_features, signal_quality_classification, estimate_heart_rate

# Set the path to where the prepared data is saved
path_to_prepared_data =  Path('../../example_data')

ppg_prefix = 'ppg'
imu_prefix = 'imu'

# Set the path to the classifier package
ppg_quality_classifier_package_filename = 'ppg_quality_clf_package.pkl'
full_path_to_classifier_package = files('paradigma') / 'assets' / ppg_quality_classifier_package_filename

# Create a list of dataframes to store the estimated heart rates of all segments
list_df_hr = []

segments = ['0001', '0002'] # list with all available segments

for segment_nr in segments:
    
    # Load the data
    df_ppg, metadata_time_ppg, _ = load_tsdf_dataframe(
        path_to_data=path_to_prepared_data / ppg_prefix, 
        prefix=f'PPG_segment{segment_nr}'
    )
    df_imu, metadata_time_imu, _ = load_tsdf_dataframe(
        path_to_data=path_to_prepared_data / imu_prefix, 
        prefix=f'IMU_segment{segment_nr}'   
    )

    # Drop the gyroscope columns from the IMU data
    cols_to_drop = df_imu.filter(regex='^gyroscope_').columns
    df_acc = df_imu.drop(cols_to_drop, axis=1)

    # 1: Preprocess the data

    ppg_config = PPGConfig()
    imu_config = IMUConfig()

    df_ppg_proc, df_acc_proc = preprocess_ppg_data(
        df_ppg=df_ppg, 
        df_acc=df_acc, 
        ppg_config=ppg_config, 
        imu_config=imu_config, 
        start_time_ppg=metadata_time_ppg.start_iso8601,
        start_time_imu=metadata_time_imu.start_iso8601
    )

    # 2: Extract signal quality features
    ppg_config = HeartRateConfig('ppg')
    acc_config = HeartRateConfig('imu')

    df_features = extract_signal_quality_features(
        df_ppg=df_ppg_proc,
        df_acc=df_acc_proc,
        ppg_config=ppg_config, 
        acc_config=acc_config, 
    )
    
    # 3: Signal quality classification
    config = HeartRateConfig()

    df_sqa = signal_quality_classification(
        df=df_features, 
        config=config, 
        full_path_to_classifier_package=full_path_to_classifier_package
    )

    # 4: Estimate heart rate
    df_hr = estimate_heart_rate(
        df_sqa=df_sqa, 
        df_ppg_preprocessed=df_ppg_proc, 
        config=config
    )

    # Add the hr estimations of the current segment to the list
    df_hr['segment_nr'] = segment_nr
    list_df_hr.append(df_hr)

df_hr = pd.concat(list_df_hr, ignore_index=True)

Step 5: Heart rate aggregation

The final step is to aggregate all 2 s heart rate estimates using aggregate_heart_rate. In the current example, the mode and 99th percentile are calculated. We hypothesize that the mode gives representation of the resting heart rate while the 99th percentile indicates the maximum heart rate. In Parkinson’s disease, we expect that these two measures could reflect autonomic (dys)functioning. The nr_hr_est in the metadata indicates based on how many 2 s windows these aggregates are determined.

import pprint
from paradigma.pipelines.heart_rate_pipeline import aggregate_heart_rate

hr_values = df_hr['heart_rate'].values
df_hr_agg = aggregate_heart_rate(
    hr_values=hr_values, 
    aggregates = ['mode', '99p']
)

pprint.pprint(df_hr_agg)
{'hr_aggregates': {'99p_heart_rate': 85.77263444520081,
                   'mode_heart_rate': 63.59175662414131},
 'metadata': {'nr_hr_est': 8684}}