# Pulse rate analysis This tutorial shows how to extract pulse rate estimates using photoplethysmography (PPG) data and accelerometer data. The pipeline consists of a stepwise approach to determine signal quality, assessing both PPG morphology and accounting for periodic artifacts using the accelerometer. Based on the signal quality, we extract high-quality segments and estimate the pulse rate for every 2 s using the smoothed pseudo Wigner-Ville Distribution. In this tutorial, we use two days of data from a participant of the Personalized Parkinson Project to demonstrate the functionalities. Since `ParaDigMa` expects contiguous time series, the collected data was stored in two segments each with contiguous timestamps. Per segment, we load the data and perform the following steps: 1. Preprocess the time series data 2. Extract signal quality features 3. Signal quality classification 4. Pulse rate estimation We then combine the output of the different segments for the final step: 5. Pulse rate aggregation ## Load data This pipeline requires accelerometer and PPG data to run. Here, we start by loading a single contiguous time series (segment), for which we continue running steps 1-4. [Below](#multiple_segments_cell) we show how to run these steps for multiple segments. The channel `green` represents the values obtained with PPG using green light. In this example we use the interally developed `TSDF` ([documentation](https://biomarkersparkinson.github.io/tsdf/)) to load and store data [[1](https://arxiv.org/abs/2211.11294)]. However, we are aware that there are other common data formats. For example, the following functions can be used depending on the file extension of the data: - _.csv_: `pandas.read_csv()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)) - _.json_: `json.load()` ([documentation](https://docs.python.org/3/library/json.html#json.load)) ```python from pathlib import Path from paradigma.util import load_tsdf_dataframe # Set the path to where the prepared data is saved and load the data. # Note: the test data is stored in TSDF, but you can load your data in your own way path_to_prepared_data = Path('../../example_data') ppg_prefix = 'ppg' imu_prefix = 'imu' segment_nr = '0001' df_ppg, metadata_time_ppg, metadata_values_ppg = load_tsdf_dataframe( path_to_data=path_to_prepared_data / ppg_prefix, prefix=f'PPG_segment{segment_nr}' ) df_imu, metadata_time_imu, metadata_values_imu = load_tsdf_dataframe( path_to_data=path_to_prepared_data / imu_prefix, prefix=f'IMU_segment{segment_nr}' ) # Drop the gyroscope columns from the IMU data cols_to_drop = df_imu.filter(regex='^gyroscope_').columns df_acc = df_imu.drop(cols_to_drop, axis=1) display(df_ppg, df_acc) ```

	time	green
0	0.00000	262316
1	0.03340	262320
2	0.06680	262446
3	0.10020	262770
4	0.13360	262623
...	...	...
1029370	34339.49720	1049632
1029371	34339.53056	1049632
1029372	34339.56392	1049632
1029373	34339.59728	1049632
1029374	34339.63064	1020788

1029375 rows × 2 columns

	time	accelerometer_x	accelerometer_y	accelerometer_z
0	0.000000	-0.474641	-0.379426	0.770335
1	0.009933	-0.472727	-0.378947	0.765072
2	0.019867	-0.471770	-0.375598	0.766986
3	0.029800	-0.472727	-0.375598	0.770335
4	0.039733	-0.475120	-0.379426	0.772249
...	...	...	...	...
3455326	34339.561333	-0.257895	-0.319139	-0.761244
3455327	34339.571267	-0.555502	-0.153110	-0.671292
3455328	34339.581200	-0.286124	-0.263636	-0.981340
3455329	34339.591133	-0.232536	-0.161722	-0.832536
3455330	34339.601067	0.180383	-0.368421	-1.525837

3455331 rows × 4 columns

## Step 1: Preprocess data The first step after loading the data is preprocessing using the [preprocess_ppg_data](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/preprocessing.py#:~:text=preprocess_ppg_data). This begins by isolating segments containing both PPG and IMU data, discarding portions where one modality (e.g., PPG) extends beyond the other, such as when the PPG recording is longer than the accelerometer data. This functionality requires the starting times (`metadata_time_ppg.start_iso8601` and `metadata_time_imu.start_iso8601`) in iso8601 format as inputs. After this step, the preprocess_ppg_data function resamples the PPG and accelerometer data to uniformly distributed timestamps, addressing the fixed but non-uniform sampling rates of the sensors. After this, a bandpass Butterworth filter (4th-order, bandpass frequencies: 0.4--3.5 Hz) is applied to the PPG signal, while a high-pass Butterworth filter (4th-order, cut-off frequency: 0.2 Hz) is applied to the accelerometer data. Note: the printed shapes are (rows, columns) with each row corresponding to a single data point and each column representing a data column (e.g.time). The number of rows of the overlapping segments of PPG and accelerometer are not the same due to sampling differences (other sensors and possibly other sampling frequencies). ```python from paradigma.config import PPGConfig, IMUConfig from paradigma.preprocessing import preprocess_ppg_data ppg_config = PPGConfig() imu_config = IMUConfig() print(f"Original data shapes:\n- PPG data: {df_ppg.shape}\n- Accelerometer data: {df_imu.shape}") df_ppg_proc, df_acc_proc = preprocess_ppg_data( df_ppg=df_ppg, df_acc=df_acc, ppg_config=ppg_config, imu_config=imu_config, start_time_ppg=metadata_time_ppg.start_iso8601, start_time_imu=metadata_time_imu.start_iso8601 ) print(f"Overlapping preprocessed data shapes:\n- PPG data: {df_ppg_proc.shape}\n- Accelerometer data: {df_acc_proc.shape}") display(df_ppg_proc, df_acc_proc) ``` Original data shapes: - PPG data: (1029375, 2) - Accelerometer data: (3455331, 7) Overlapping preprocessed data shapes: - PPG data: (1030188, 2) - Accelerometer data: (3433961, 4)

	time	green
0	0.000000	-26.315811
1	0.033333	91.335299
2	0.066667	181.603416
3	0.100000	225.760466
4	0.133333	219.937282
...	...	...
1030183	34339.433333	224556.234611
1030184	34339.466667	210075.529517
1030185	34339.500000	163811.629247
1030186	34339.533333	94537.897763
1030187	34339.566667	12915.304284

1030188 rows × 2 columns

	time	accelerometer_x	accelerometer_y	accelerometer_z
0	0.00	-0.002324	-0.001442	-0.002116
1	0.01	-0.000390	-0.000914	-0.007396
2	0.02	0.000567	0.002474	-0.005445
3	0.03	-0.000425	0.002414	-0.002099
4	0.04	-0.002807	-0.001408	-0.000218
...	...	...	...	...
3433956	34339.56	-0.402941	0.038710	0.461449
3433957	34339.57	-0.659832	0.098696	0.817136
3433958	34339.58	-0.464138	0.033607	0.471552
3433959	34339.59	-0.389065	0.108485	0.622471
3433960	34339.60	-0.082625	-0.014490	0.119875

3433961 rows × 4 columns

## Step 2: Extract signal quality features The preprocessed data (PPG & accelerometer) is windowed into overlapping windows of length `ppg_config.window_length_s` with a window step of `ppg_config.window_step_length_s`. From the PPG windows 10 time- and frequency domain features are extracted to assess PPG morphology and from the accelerometer windows one relative power feature is calculated to assess periodic motion artifacts. The detailed steps are encapsulated in `extract_signal_quality_features` (documentation can be found [here](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/pulse_rate_pipeline.py#:~:text=extract_signal_quality_features)). ```python from paradigma.config import PulseRateConfig from paradigma.pipelines.pulse_rate_pipeline import extract_signal_quality_features ppg_config = PulseRateConfig('ppg') acc_config = PulseRateConfig('imu') print("The default window length for the signal quality feature extraction is set to", ppg_config.window_length_s, "seconds.") print("The default step size for the signal quality feature extraction is set to", ppg_config.window_step_length_s, "seconds.") df_features = extract_signal_quality_features( df_ppg=df_ppg_proc, df_acc=df_acc_proc, ppg_config=ppg_config, acc_config=acc_config, ) df_features ``` The default window length for the signal quality feature extraction is set to 6 seconds. The default step size for the signal quality feature extraction is set to 1 seconds.

	time	var	mean	median	kurtosis	skewness	signal_to_noise	auto_corr	f_dom	rel_power	spectral_entropy	acc_power_ratio
0	0.0	1.145652e+05	282.401234	238.829637	2.170853	0.107401	3.320049	0.544165	0.585938	0.138454	0.516336	0.026409
1	1.0	1.102401e+05	271.582177	236.891936	2.251393	-0.029309	3.041878	0.491829	0.585938	0.160433	0.511626	0.023402
2	2.0	1.061479e+05	262.348604	225.915756	2.415221	0.216631	2.818552	0.469092	0.585938	0.167007	0.525025	0.028592
3	3.0	9.514719e+04	245.089445	203.417715	2.481465	0.110420	2.677071	0.415071	0.585938	0.170626	0.550495	0.019296
4	4.0	7.393010e+04	218.379138	187.583266	2.405921	0.084566	2.796140	0.338369	0.585938	0.121113	0.595214	0.020083
...	...	...	...	...	...	...	...	...	...	...	...	...
34329	34329.0	8.176078e+06	1613.021494	438.201240	6.122772	-1.792336	1.378694	0.104389	0.351562	0.046616	0.356027	0.110219
34330	34330.0	3.512188e+07	3307.888927	1069.775894	8.160698	1.746472	1.442643	0.142226	0.351562	0.049424	0.371163	0.178742
34331	34331.0	1.181350e+08	6648.535487	2743.478312	5.654373	0.018587	1.558314	0.136803	0.351562	0.048211	0.366386	0.153351
34332	34332.0	1.252829e+09	20165.525309	6452.244225	6.805051	-1.222184	1.310088	0.666123	0.351562	0.037812	0.359105	0.154910
34333	34333.0	1.008217e+10	42271.328020	12552.656437	23.756877	4.167326	1.212179	0.044647	0.585938	0.113283	0.632749	0.093221

34334 rows × 12 columns

## Step 3: Signal quality classification A trained logistic classifier is used to predict PPG signal quality and returns the `pred_sqa_proba`, which is the posterior probability of a PPG window to look like the typical PPG morphology (higher probability indicates toward the typical PPG morphology). The relative power feature from the accelerometer is compared to a threshold for periodic artifacts and therefore `pred_sqa_acc_label` is used to return a label indicating predicted periodic motion artifacts (label 0) or no periodic motion artifacts (label 1). The classification step is implemented in `signal_quality_classification` (documentation can be found [here](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/pulse_rate_pipeline.py#:~:text=signal_quality_classification)). ```python from importlib.resources import files from paradigma.pipelines.pulse_rate_pipeline import signal_quality_classification ppg_quality_classifier_package_filename = 'ppg_quality_clf_package.pkl' full_path_to_classifier_package = files('paradigma') / 'assets' / ppg_quality_classifier_package_filename config = PulseRateConfig() df_sqa = signal_quality_classification( df=df_features, config=config, full_path_to_classifier_package=full_path_to_classifier_package ) df_sqa ```

	time	pred_sqa_proba	pred_sqa_acc_label
0	0.0	1.121315e-02	1
1	1.0	7.126135e-03	1
2	2.0	7.017555e-03	1
3	3.0	4.134224e-03	1
4	4.0	9.195340e-04	1
...	...	...	...
34329	34329.0	1.782669e-08	0
34330	34330.0	2.078262e-06	0
34331	34331.0	1.190223e-07	0
34332	34332.0	1.383614e-08	0
34333	34333.0	7.587516e-07	1

34334 rows × 3 columns

#### Store as TSDF The predicted probabilities (and optionally other features) can be stored and loaded in TSDF as demonstrated below. ```python import tsdf from paradigma.util import write_df_data # Set 'path_to_data' to the directory where you want to save the data metadata_time_store = tsdf.TSDFMetadata(metadata_time_ppg.get_plain_tsdf_dict_copy(), path_to_prepared_data) metadata_values_store = tsdf.TSDFMetadata(metadata_values_ppg.get_plain_tsdf_dict_copy(), path_to_prepared_data) # Select the columns to be saved metadata_time_store.channels = ['time'] metadata_values_store.channels = ['pred_sqa_proba', 'pred_sqa_acc_label'] # Set the units metadata_time_store.units = ['Relative seconds'] metadata_values_store.units = ['Unitless', 'Unitless'] metadata_time_store.data_type = float metadata_values_store.data_type = float # Set the filenames meta_store_filename = f'segment{segment_nr}_meta.json' values_store_filename = meta_store_filename.replace('_meta.json', '_values.bin') time_store_filename = meta_store_filename.replace('_meta.json', '_time.bin') metadata_values_store.file_name = values_store_filename metadata_time_store.file_name = time_store_filename write_df_data(metadata_time_store, metadata_values_store, path_to_prepared_data, meta_store_filename, df_sqa) ``` ```python df_sqa, _, _ = load_tsdf_dataframe(path_to_prepared_data, prefix=f'segment{segment_nr}') df_sqa.head() ```

	time	pred_sqa_proba	pred_sqa_acc_label
0	0.0	0.011213	1.0
1	1.0	0.007126	1.0
2	2.0	0.007018	1.0
3	3.0	0.004134	1.0
4	4.0	0.000920	1.0

## Step 4: Pulse rate estimation For pulse rate estimation, we extract segments of `config.tfd_length` using [estimate_pulse_rate](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/pulse_rate_pipeline.py#:~:text=estimate_pulse_rate). We calculate the smoothed-pseudo Wigner-Ville Distribution (SPWVD) to obtain the frequency content of the PPG signal over time. We extract for every timestamp in the SPWVD the frequency with the highest power. For every non-overlapping 2 s window we average the corresponding frequencies to obtain a pulse rate per window. Note: for the test data we set the tfd_length to 10 s instead of the default of 30 s, because the small PPP test data doesn't have 30 s of consecutive high-quality PPG data. ```python from paradigma.pipelines.pulse_rate_pipeline import estimate_pulse_rate print("The standard default minimal window length for the pulse rate extraction is set to", config.tfd_length, "seconds.") df_pr = estimate_pulse_rate( df_sqa=df_sqa, df_ppg_preprocessed=df_ppg_proc, config=config ) df_pr ``` The standard default minimal window length for the pulse rate extraction is set to 30 seconds.

	time	pulse_rate
0	47.0	80.372915
1	49.0	79.769382
2	51.0	79.136408
3	53.0	78.606477
4	55.0	77.870461
...	...	...
801	32876.0	78.220133
802	32878.0	78.047301
803	32880.0	78.047301
804	32882.0	78.238326
805	32884.0	78.556701

806 rows × 2 columns

### Run steps 1 - 4 for multiple segments If your data is also stored in multiple segments, you can modify `segments` in the cell below to a list of the filenames of your respective segmented data. ```python import pandas as pd from pathlib import Path from importlib.resources import files from paradigma.util import load_tsdf_dataframe from paradigma.config import PPGConfig, IMUConfig, PulseRateConfig from paradigma.preprocessing import preprocess_ppg_data from paradigma.pipelines.pulse_rate_pipeline import extract_signal_quality_features, signal_quality_classification, estimate_pulse_rate # Set the path to where the prepared data is saved path_to_prepared_data = Path('../../example_data') ppg_prefix = 'ppg' imu_prefix = 'imu' # Set the path to the classifier package ppg_quality_classifier_package_filename = 'ppg_quality_clf_package.pkl' full_path_to_classifier_package = files('paradigma') / 'assets' / ppg_quality_classifier_package_filename # Create a list of dataframes to store the estimated pulse rates of all segments list_df_pr = [] segments = ['0001', '0002'] # list with all available segments for segment_nr in segments: # Load the data df_ppg, metadata_time_ppg, _ = load_tsdf_dataframe( path_to_data=path_to_prepared_data / ppg_prefix, prefix=f'PPG_segment{segment_nr}' ) df_imu, metadata_time_imu, _ = load_tsdf_dataframe( path_to_data=path_to_prepared_data / imu_prefix, prefix=f'IMU_segment{segment_nr}' ) # Drop the gyroscope columns from the IMU data cols_to_drop = df_imu.filter(regex='^gyroscope_').columns df_acc = df_imu.drop(cols_to_drop, axis=1) # 1: Preprocess the data ppg_config = PPGConfig() imu_config = IMUConfig() df_ppg_proc, df_acc_proc = preprocess_ppg_data( df_ppg=df_ppg, df_acc=df_acc, ppg_config=ppg_config, imu_config=imu_config, start_time_ppg=metadata_time_ppg.start_iso8601, start_time_imu=metadata_time_imu.start_iso8601 ) # 2: Extract signal quality features ppg_config = PulseRateConfig('ppg') acc_config = PulseRateConfig('imu') df_features = extract_signal_quality_features( df_ppg=df_ppg_proc, df_acc=df_acc_proc, ppg_config=ppg_config, acc_config=acc_config, ) # 3: Signal quality classification config = PulseRateConfig() df_sqa = signal_quality_classification( df=df_features, config=config, full_path_to_classifier_package=full_path_to_classifier_package ) # 4: Estimate pulse rate df_pr = estimate_pulse_rate( df_sqa=df_sqa, df_ppg_preprocessed=df_ppg_proc, config=config ) # Add the hr estimations of the current segment to the list df_pr['segment_nr'] = segment_nr list_df_pr.append(df_pr) df_hr = pd.concat(list_df_pr, ignore_index=True) ``` ## Step 5: Pulse rate aggregation The final step is to aggregate all 2 s pulse rate estimates using [aggregate_pulse_rate](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/pulse_rate_pipeline.py#:~:text=aggregate_pulse_rate). In the current example, the mode and 99th percentile are calculated. We hypothesize that the mode gives representation of the resting pulse rate while the 99th percentile indicates the maximum pulse rate. In Parkinson's disease, we expect that these two measures could reflect autonomic (dys)functioning. The `nr_pr_est` in the metadata indicates based on how many 2 s windows these aggregates are determined. ```python import pprint from paradigma.pipelines.pulse_rate_pipeline import aggregate_pulse_rate pr_values = df_pr['pulse_rate'].values df_pr_agg = aggregate_pulse_rate( pr_values=pr_values, aggregates = ['mode', '99p'] ) pprint.pprint(df_pr_agg) ``` {'metadata': {'nr_pr_est': 806}, 'pr_aggregates': {'99p_pulse_rate': 87.65865011636926, 'mode_pulse_rate': 81.25613346418058}}