Data preparation

ParaDigMa requires the sensor data to be in a specific format. This tutorial shows how to prepare your input data for subsequent analysis. The input for ParaDigMa is a dataframe consisting of:

  • A column time, representing the seconds relative to the first row of the dataframe;

  • One or multiple of the following sensor column categories:

    • Accelerometer: accelerometer_x, accelerometer_y and accelerometer_z in g

    • Gyroscope: gyroscope_x, gyroscope_y and gyroscope_z in deg/s

    • PPG: green

The final dataframe should be resampled to 100 Hz, have the correct units for the sensor columns, and the correct format for the time column. Also note that the gait pipeline expects a specific orientation of sensor axes, as explained in Coordinate system.
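The 100 Hz requirement can be met in several ways. The following is a minimal sketch using linear interpolation with NumPy, assuming the time column is already in seconds relative to the first row. The helper name resample_uniform and the toy values are hypothetical, and this is not necessarily how ParaDigMa resamples internally.

```python
import numpy as np
import pandas as pd


def resample_uniform(df, fs=100.0, time_col="time"):
    """Linearly interpolate all non-time columns onto a uniform grid at fs Hz."""
    t = df[time_col].to_numpy(dtype=float)
    # Number of uniform samples that fit between the first and last timestamp
    n_samples = int(np.floor((t[-1] - t[0]) * fs)) + 1
    t_new = t[0] + np.arange(n_samples) / fs
    out = {time_col: t_new}
    for col in df.columns:
        if col != time_col:
            out[col] = np.interp(t_new, t, df[col].to_numpy(dtype=float))
    return pd.DataFrame(out)


# Irregularly sampled toy signal (hypothetical values, not real sensor data)
df_raw = pd.DataFrame({
    "time": [0.0, 0.012, 0.019, 0.031, 0.045],
    "accelerometer_x": [0.0, 1.2, 1.9, 3.1, 4.5],
})
df_100hz = resample_uniform(df_raw, fs=100.0)
print(df_100hz)
```

Linear interpolation is only one option; for signals with fast dynamics, a different interpolation scheme may be more appropriate.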

Load data

This example uses data from the Personalized Parkinson Project, which is stored in Time Series Data Format (TSDF). Inertial Measurement Unit (IMU) and photoplethysmography (PPG) data are sampled at different frequencies and are therefore stored separately. Note that ParaDigMa works independently of the data storage format; it only requires a pandas dataframe as input.

from pathlib import Path
from paradigma.util import load_tsdf_dataframe

path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')
path_to_imu_data = path_to_raw_data / 'imu'

df_imu, imu_time, imu_values = load_tsdf_dataframe(
    path_to_data=path_to_imu_data, 
    prefix='IMU'
)

df_imu.head()
time acceleration_x acceleration_y acceleration_z rotation_x rotation_y rotation_z
0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 -32.012195 26.097561
1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 -34.634146 24.695122
2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 -36.768293 22.926829
3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 -38.719512 21.158537
4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 -41.280488 20.304878
from paradigma.util import load_tsdf_dataframe

# Build the path with pathlib, consistent with the IMU example above
path_to_ppg_data = path_to_raw_data / 'ppg'

df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(
    path_to_data=path_to_ppg_data, 
    prefix='PPG'
)

df_ppg.head()
time green
0 0.000000 649511
1 9.959961 648214
2 9.959961 646786
3 9.959961 645334
4 9.960205 644317

The timestamps in this dataset are stored as the time difference in milliseconds between consecutive samples. As the output shows, the samples are not uniformly spaced.
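To make the sampling irregularity concrete, the sketch below inspects the first few IMU time values printed above. It uses only NumPy; the variable names are illustrative and the five values are copied from the output, not loaded from the dataset.

```python
import numpy as np

# First time values of the IMU data shown above (differences in milliseconds)
delta_ms = np.array([0.0, 10.040039, 10.040039, 10.040039, 10.039795])

# The first entry belongs to the first sample and is not an interval, so skip it
intervals_ms = delta_ms[1:]

# Average sampling frequency implied by the intervals
mean_fs = 1000.0 / intervals_ms.mean()

# Uniform sampling would mean all intervals are (nearly) identical
is_uniform = np.allclose(intervals_ms, intervals_ms[0], rtol=0, atol=1e-6)

print(f"Approximate sampling frequency: {mean_fs:.2f} Hz")
print(f"Uniformly sampled: {is_uniform}")
```

For these five rows the intervals imply a rate of roughly 99.6 Hz with small variations between samples.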

Prepare dataframe

Change column names

To safeguard robustness of the pipeline, ParaDigMa fixes column names to a predefined standard.

from paradigma.constants import DataColumns

accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]
gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]

# Rename dataframe columns
df_imu = df_imu.rename(columns={
    'time': DataColumns.TIME,
    'acceleration_x': DataColumns.ACCELEROMETER_X,
    'acceleration_y': DataColumns.ACCELEROMETER_Y,
    'acceleration_z': DataColumns.ACCELEROMETER_Z,
    'rotation_x': DataColumns.GYROSCOPE_X,
    'rotation_y': DataColumns.GYROSCOPE_Y,
    'rotation_z': DataColumns.GYROSCOPE_Z,
})

# Set columns to a fixed order
df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]

df_imu.head()
time accelerometer_x accelerometer_y accelerometer_z gyroscope_x gyroscope_y gyroscope_z
0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 -32.012195 26.097561
1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 -34.634146 24.695122
2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 -36.768293 22.926829
3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 -38.719512 21.158537
4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 -41.280488 20.304878
from paradigma.constants import DataColumns

ppg_columns = [DataColumns.PPG]

# Rename dataframe columns
df_ppg = df_ppg.rename(columns={
    'time': DataColumns.TIME,
    'green': DataColumns.PPG,  # the raw PPG column in this dataset is named 'green'
})

# Set columns to a fixed order
df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]

df_ppg.head()
time green
0 0.000000 649511
1 9.959961 648214
2 9.959961 646786
3 9.959961 645334
4 9.960205 644317

Change units

ParaDigMa expects acceleration to be measured in g, and rotation in deg/s. ParaDigMa provides utility functions to convert from other units. In this dataset, acceleration is stored in m/s^2 and rotation is already in deg/s.

from paradigma.util import convert_units_accelerometer, convert_units_gyroscope

accelerometer_units = 'm/s^2'
gyroscope_units = 'deg/s'

accelerometer_data = df_imu[accelerometer_columns].values
gyroscope_data = df_imu[gyroscope_columns].values

# Convert units to expected format
df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)
df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)

df_imu.head()
time accelerometer_x accelerometer_y accelerometer_z gyroscope_x gyroscope_y gyroscope_z
0 0.000000 -0.550718 0.574163 -0.273684 -115.670732 -32.012195 26.097561
1 10.040039 -0.535885 0.623445 -0.254545 -110.609757 -34.634146 24.695122
2 10.040039 -0.504306 0.651675 -0.251675 -103.231708 -36.768293 22.926829
3 10.040039 -0.488517 0.686603 -0.265550 -96.280488 -38.719512 21.158537
4 10.039795 -0.494258 0.725359 -0.278469 -92.560976 -41.280488 20.304878
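For intuition, the conversion can be sketched by hand. The constant 9.81 m/s^2 is an assumption: it reproduces the output above for the first row, but it is not confirmed to be the value ParaDigMa uses internally. The function names are hypothetical, and the rad/s case is included only as an illustration for data stored in other units.

```python
import numpy as np

G_MS2 = 9.81  # assumed gravitational acceleration in m/s^2


def ms2_to_g(acc_ms2):
    """Convert acceleration from m/s^2 to g."""
    return np.asarray(acc_ms2, dtype=float) / G_MS2


def rad_s_to_deg_s(gyro_rad_s):
    """Convert angular velocity from rad/s to deg/s."""
    return np.degrees(np.asarray(gyro_rad_s, dtype=float))


# First row of raw accelerometer values from the loaded IMU data
acc_g = ms2_to_g([-5.402541, 5.632536, -2.684842])
print(np.round(acc_g, 6))

# Half a turn per second expressed in deg/s
print(rad_s_to_deg_s(np.pi))
```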

Account for watch side

The Gait & Arm Swing pipeline requires the sensor axes to be oriented correctly. See Coordinate system for details, and set the axes of your data accordingly.

# Change the orientation of the sensor according to the documented coordinate system
df_imu[DataColumns.ACCELEROMETER_Y] *= -1
df_imu[DataColumns.ACCELEROMETER_Z] *= -1
df_imu[DataColumns.GYROSCOPE_Y] *= -1
df_imu[DataColumns.GYROSCOPE_Z] *= -1

df_imu.head()
time accelerometer_x accelerometer_y accelerometer_z gyroscope_x gyroscope_y gyroscope_z
0 0.000000 -0.550718 -0.574163 0.273684 -115.670732 32.012195 -26.097561
1 10.040039 -0.535885 -0.623445 0.254545 -110.609757 34.634146 -24.695122
2 10.040039 -0.504306 -0.651675 0.251675 -103.231708 36.768293 -22.926829
3 10.040039 -0.488517 -0.686603 0.265550 -96.280488 38.719512 -21.158537
4 10.039795 -0.494258 -0.725359 0.278469 -92.560976 41.280488 -20.304878
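Because in-place multiplication by -1 undoes itself when a notebook cell is run twice, it can help to wrap the sign flip in a function that returns a copy. The sketch below assumes a hypothetical helper invert_axes; which axes need inverting depends on your device and on which wrist it is worn, as described in Coordinate system.

```python
import pandas as pd


def invert_axes(df, columns):
    """Return a copy of df with the sign of the given columns flipped."""
    out = df.copy()
    out[columns] = -out[columns]
    return out


# Two rows of example values taken from the output above
df_demo = pd.DataFrame({
    "accelerometer_y": [0.574163, 0.623445],
    "gyroscope_x": [-115.670732, -110.609757],
    "gyroscope_y": [-32.012195, -34.634146],
})

# Flip only the listed columns; the input dataframe is left unchanged
df_flipped = invert_axes(df_demo, ["accelerometer_y", "gyroscope_y"])
print(df_flipped)
```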

Change time column

ParaDigMa expects the time column to be in seconds relative to the first row, which should equal 0. The built-in function transform_time_array converts a time column to this format when the timestamps are stored as the time difference between consecutive samples. Support for transforming other input types (e.g., datetime format) is planned.

from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_imu[DataColumns.TIME] = transform_time_array(
    time_array=df_imu[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_imu.head()
time accelerometer_x accelerometer_y accelerometer_z gyroscope_x gyroscope_y gyroscope_z
0 0.00000 -0.550718 -0.574163 0.273684 -115.670732 32.012195 -26.097561
1 0.01004 -0.535885 -0.623445 0.254545 -110.609757 34.634146 -24.695122
2 0.02008 -0.504306 -0.651675 0.251675 -103.231708 36.768293 -22.926829
3 0.03012 -0.488517 -0.686603 0.265550 -96.280488 38.719512 -21.158537
4 0.04016 -0.494258 -0.725359 0.278469 -92.560976 41.280488 -20.304878
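Conceptually, going from time differences in milliseconds to relative seconds amounts to a cumulative sum followed by a unit change. The sketch below reproduces the time column above with plain NumPy; it illustrates the idea and is not necessarily how transform_time_array is implemented.

```python
import numpy as np

# Time differences in milliseconds, as stored in the raw IMU data
delta_ms = np.array([0.0, 10.040039, 10.040039, 10.040039, 10.039795])

# Accumulate the differences and convert milliseconds to seconds
relative_s = np.cumsum(delta_ms) / 1000.0

print(np.round(relative_s, 5))
```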
from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_ppg[DataColumns.TIME] = transform_time_array(
    time_array=df_ppg[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_ppg.head()
time green
0 0.00000 649511
1 0.00996 648214
2 0.01992 646786
3 0.02988 645334
4 0.03984 644317
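If your timestamps are stored as datetimes instead of time differences, pandas alone can produce the same relative-seconds format. This is a minimal sketch with made-up timestamps; it does not use any ParaDigMa function.

```python
import pandas as pd

# Hypothetical absolute timestamps, 10 ms apart
timestamps = pd.Series(pd.to_datetime([
    "2024-01-01 12:00:00.000",
    "2024-01-01 12:00:00.010",
    "2024-01-01 12:00:00.020",
]))

# Subtract the first timestamp and express the result in seconds
relative_s = (timestamps - timestamps.iloc[0]).dt.total_seconds()

print(relative_s.tolist())
```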

These dataframes are ready to be processed by ParaDigMa.
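Before passing a dataframe on, a quick sanity check can catch common preparation mistakes. The sketch below is a hypothetical helper, not part of ParaDigMa; it checks only the column names and time format described at the top of this tutorial, and the column names are written as plain strings.

```python
import numpy as np
import pandas as pd


def check_prepared_dataframe(df, sensor_columns, time_col="time"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    missing = [c for c in [time_col, *sensor_columns] if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    t = df[time_col].to_numpy(dtype=float)
    if len(t) == 0:
        problems.append("dataframe is empty")
        return problems
    if t[0] != 0:
        problems.append("time does not start at 0")
    if np.any(np.diff(t) <= 0):
        problems.append("time is not strictly increasing")
    if df[sensor_columns].isna().any().any():
        problems.append("sensor columns contain missing values")
    return problems


# A small PPG dataframe in the expected format
df_ok = pd.DataFrame({"time": [0.0, 0.00996, 0.01992],
                      "green": [649511, 648214, 646786]})
# The same data with a time column that does not start at 0
df_bad = df_ok.assign(time=[0.5, 0.50996, 0.51992])

problems_ok = check_prepared_dataframe(df_ok, ["green"])
problems_bad = check_prepared_dataframe(df_bad, ["green"])
print(problems_ok)
print(problems_bad)
```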