Data preparation

ParaDigMa requires sensor data in a specific format. This tutorial shows how to prepare your input data for subsequent analysis. The final input for ParaDigMa is a dataframe consisting of:

  • A time column representing the seconds relative to the first row of the dataframe;

  • One or more of the following sensor column categories:

    • Triaxial accelerometer (x, y, z) in g

    • Triaxial gyroscope (x, y, z) in deg/s

    • Photoplethysmography (PPG)

The final dataframe should be resampled to 100 Hz, have the correct units for the sensor columns, and the correct format for the time column. Also note that the gait pipeline expects a specific orientation of sensor axes, as explained in Coordinate system.
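
The resampling requirement can be illustrated with a minimal sketch that linearly interpolates non-uniformly sampled columns onto a uniform 100 Hz grid. The helper name resample_to_uniform_grid and the toy data are hypothetical, and linear interpolation with NumPy is an assumption chosen for brevity; ParaDigMa's own preprocessing may use a different method.

```python
import numpy as np
import pandas as pd


def resample_to_uniform_grid(df, time_col, value_cols, target_hz=100.0):
    """Linearly interpolate value_cols onto a uniform time grid.

    Hypothetical helper for illustration; not part of ParaDigMa.
    Assumes time_col holds strictly increasing timestamps in seconds.
    """
    t = df[time_col].to_numpy(dtype=float)
    # Number of grid points that fit between the first and last timestamp
    # (the small epsilon guards against floating-point round-off)
    n_samples = int(np.floor((t[-1] - t[0]) * target_hz + 1e-9)) + 1
    t_new = t[0] + np.arange(n_samples) / target_hz

    out = pd.DataFrame({time_col: t_new})
    for col in value_cols:
        out[col] = np.interp(t_new, t, df[col].to_numpy(dtype=float))
    return out


# Toy data: slightly irregular timestamps in seconds, starting at 0
df_toy = pd.DataFrame({
    "time": [0.0, 0.0104, 0.0199, 0.0301, 0.0400],
    "accelerometer_x": [0.0, 1.0, 2.0, 3.0, 4.0],
})
df_uniform = resample_to_uniform_grid(df_toy, "time", ["accelerometer_x"])
```

The resulting dataframe has timestamps spaced exactly 0.01 s apart, which is what a 100 Hz sampling frequency implies.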

Load data

This example uses data from the Personalized Parkinson Project, which is stored in Time Series Data Format (TSDF). Inertial Measurement Unit (IMU) and photoplethysmography (PPG) data are sampled at different frequencies and are therefore stored separately. Note that ParaDigMa works independently of the data storage format; it only requires a pandas dataframe as input.

from pathlib import Path
from paradigma.util import load_tsdf_dataframe

path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')
path_to_imu_data = path_to_raw_data / 'imu'

df_imu, imu_time, imu_values = load_tsdf_dataframe(
    path_to_data=path_to_imu_data,
    prefix='IMU'
)

df_imu.head()
        time  accelerometer_x  accelerometer_y  accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z
0   0.000000        -5.402541         5.632536        -2.684842  -115.670732   -32.012195    26.097561
1  10.040039        -5.257034         6.115995        -2.497091  -110.609757   -34.634146    24.695122
2  10.040039        -4.947244         6.392928        -2.468928  -103.231708   -36.768293    22.926829
3  10.040039        -4.792349         6.735574        -2.605048   -96.280488   -38.719512    21.158537
4  10.039795        -4.848675         7.115770        -2.731780   -92.560976   -41.280488    20.304878
# Path and load_tsdf_dataframe were imported in the IMU cell above
path_to_ppg_data = path_to_raw_data / 'ppg'

df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(
    path_to_data=path_to_ppg_data,
    prefix='PPG'
)

df_ppg.head()
       time   green
0  0.000000  649511
1  9.959961  648214
2  9.959961  646786
3  9.959961  645334
4  9.960205  644317

The timestamps in this dataset are stored as the time difference in milliseconds between consecutive samples. As the output shows, these differences vary slightly, so the samples are not uniformly spaced in time.
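
To make the observation about non-uniform sampling concrete, the following sketch estimates the average sampling frequency from the delta timestamps and checks whether the intervals are all equal. The values are copied from the first five rows of the IMU output above; the variable names and the tolerance are illustrative assumptions, and five rows are far too few for a reliable estimate.

```python
import numpy as np

# Delta timestamps (ms) from the first rows of the IMU output above.
# The first entry is 0 because the first sample has no predecessor.
imu_delta_ms = np.array([0.0, 10.040039, 10.040039, 10.040039, 10.039795])

intervals_ms = imu_delta_ms[1:]             # skip the leading 0
mean_interval_ms = intervals_ms.mean()
estimated_hz = 1000.0 / mean_interval_ms    # approximate sampling frequency

# Intervals count as uniform only if they all match the first one
# within an (arbitrarily chosen) tolerance of 1e-6 ms.
is_uniform = bool(np.allclose(intervals_ms, intervals_ms[0], rtol=0, atol=1e-6))
```

For these rows the estimate lands slightly below 100 Hz and is_uniform is False, which is why resampling is part of the preparation.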

Prepare dataframe

Set column names

You are free to choose column names, although we recommend using the column names defined in ParaDigMa, which simplifies subsequent data processing steps. These are accessible through the class DataColumns, which can be imported from paradigma.constants. For example, we recommend setting acc_x_colname to DataColumns.ACCELEROMETER_X. This is a convenience only; later steps do not strictly require it.

time_colname = 'time'  # DataColumns.TIME

acc_x_colname = 'accelerometer_x'  # DataColumns.ACCELEROMETER_X
acc_y_colname = 'accelerometer_y'  # DataColumns.ACCELEROMETER_Y
acc_z_colname = 'accelerometer_z'  # DataColumns.ACCELEROMETER_Z
gyr_x_colname = 'gyroscope_x'  # DataColumns.GYROSCOPE_X
gyr_y_colname = 'gyroscope_y'  # DataColumns.GYROSCOPE_Y
gyr_z_colname = 'gyroscope_z'  # DataColumns.GYROSCOPE_Z

ppg_colname = 'green'  # DataColumns.PPG
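
If your raw data uses different column names, one option is to rename them once, up front, with a plain pandas mapping. The sketch below assumes a hypothetical export with vendor-style names ('t', 'ax', 'ay', 'az'); the target names are written as plain strings that match the column names used in this tutorial.

```python
import pandas as pd

# Hypothetical raw export with vendor-specific column names
df_raw = pd.DataFrame({
    "t": [0.0, 0.01],
    "ax": [0.10, 0.20],
    "ay": [0.30, 0.40],
    "az": [0.50, 0.60],
})

# Map raw names to the column names used in this tutorial
column_map = {
    "t": "time",
    "ax": "accelerometer_x",
    "ay": "accelerometer_y",
    "az": "accelerometer_z",
}
df_renamed = df_raw.rename(columns=column_map)

# Sanity check: every target column should now be present
missing_columns = set(column_map.values()) - set(df_renamed.columns)
```

Keeping the mapping in a single dictionary makes it easy to review and to reuse across recordings from the same device.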

Change units

ParaDigMa expects acceleration to be measured in g, and rotation in deg/s. Units can be converted conveniently using ParaDigMa functionalities.

from paradigma.util import convert_units_accelerometer, convert_units_gyroscope

# Units in which the raw data were recorded
accelerometer_units = 'm/s^2'
gyroscope_units = 'deg/s'

# State the column names
accelerometer_columns = [acc_x_colname, acc_y_colname, acc_z_colname]
gyroscope_columns = [gyr_x_colname, gyr_y_colname, gyr_z_colname]

accelerometer_data = df_imu[accelerometer_columns].values
gyroscope_data = df_imu[gyroscope_columns].values

# Convert units to expected format
df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)
df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)

df_imu.head()
        time  accelerometer_x  accelerometer_y  accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z
0   0.000000        -0.550718         0.574163        -0.273684  -115.670732   -32.012195    26.097561
1  10.040039        -0.535885         0.623445        -0.254545  -110.609757   -34.634146    24.695122
2  10.040039        -0.504306         0.651675        -0.251675  -103.231708   -36.768293    22.926829
3  10.040039        -0.488517         0.686603        -0.265550   -96.280488   -38.719512    21.158537
4  10.039795        -0.494258         0.725359        -0.278469   -92.560976   -41.280488    20.304878
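
For readers who want to see the arithmetic behind the conversion, here is a minimal sketch using NumPy only. The helper names ms2_to_g and rads_to_degs are hypothetical and are not ParaDigMa functions. The gravity constant of 9.81 m/s^2 is an assumption; it was chosen because it reproduces the first row of the output above, not because the library's internals are documented here.

```python
import numpy as np

# Assumed gravity constant in m/s^2; consistent with the output shown above
G_MS2 = 9.81


def ms2_to_g(values):
    """Convert acceleration from m/s^2 to g (hypothetical helper)."""
    return np.asarray(values, dtype=float) / G_MS2


def rads_to_degs(values):
    """Convert angular velocity from rad/s to deg/s (hypothetical helper)."""
    return np.rad2deg(np.asarray(values, dtype=float))


# First accelerometer row of the raw IMU data, in m/s^2
acc_ms2 = np.array([-5.402541, 5.632536, -2.684842])
acc_g = ms2_to_g(acc_ms2)

# A gyroscope recorded in rad/s would be converted like this
gyr_degs = rads_to_degs([np.pi, np.pi / 2])
```

The gyroscope data in this tutorial is already in deg/s, so the second conversion is shown only for completeness.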

Account for watch side

For the Gait & Arm Swing pipeline, the sensor axes must be oriented correctly. For more information, please read Coordinate system and set the axes of your data accordingly.

# Change the orientation of the sensor axes to match the documented coordinate system.
# The following sign flips are specific to the sensor used here and to its orientation
# relative to the predefined coordinate system.
df_imu[acc_y_colname] *= -1
df_imu[acc_z_colname] *= -1
df_imu[gyr_y_colname] *= -1
df_imu[gyr_z_colname] *= -1

df_imu.head()
        time  accelerometer_x  accelerometer_y  accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z
0   0.000000        -0.550718        -0.574163         0.273684  -115.670732    32.012195   -26.097561
1  10.040039        -0.535885        -0.623445         0.254545  -110.609757    34.634146   -24.695122
2  10.040039        -0.504306        -0.651675         0.251675  -103.231708    36.768293   -22.926829
3  10.040039        -0.488517        -0.686603         0.265550   -96.280488    38.719512   -21.158537
4  10.039795        -0.494258        -0.725359         0.278469   -92.560976    41.280488   -20.304878
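
When several recordings need the same axis correction, the sign flips can be wrapped in a small function that leaves the input untouched. This is only a sketch: flip_axes is a hypothetical helper, and which axes to flip depends entirely on your sensor and how it was worn.

```python
import pandas as pd


def flip_axes(df, columns):
    """Return a copy of df with the sign of the given columns inverted.

    Hypothetical helper for illustration; not part of ParaDigMa.
    """
    out = df.copy()
    out[columns] = -out[columns]
    return out


# First accelerometer row (in g) from the unit-converted output above
df_example = pd.DataFrame({
    "accelerometer_x": [-0.550718],
    "accelerometer_y": [0.574163],
    "accelerometer_z": [-0.273684],
})

# Flip the same accelerometer axes as in the tutorial cell above
df_flipped = flip_axes(df_example, ["accelerometer_y", "accelerometer_z"])
```

Working on a copy avoids accidentally flipping the same dataframe twice when a notebook cell is re-run.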

Change time column

ParaDigMa expects the time column to be in seconds relative to the first row, which should equal 0. The toolbox provides the built-in function transform_time_array to help users convert their time column to this format when the timestamps are stored as the time difference between consecutive samples. Support for transforming other time formats (e.g., datetime) is planned for the near future.

from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_imu[time_colname] = transform_time_array(
    time_array=df_imu[time_colname],
    input_unit_type=TimeUnit.DIFFERENCE_MS,
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_imu.head()
      time  accelerometer_x  accelerometer_y  accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z
0  0.00000        -0.550718        -0.574163         0.273684  -115.670732    32.012195   -26.097561
1  0.01004        -0.535885        -0.623445         0.254545  -110.609757    34.634146   -24.695122
2  0.02008        -0.504306        -0.651675         0.251675  -103.231708    36.768293   -22.926829
3  0.03012        -0.488517        -0.686603         0.265550   -96.280488    38.719512   -21.158537
4  0.04016        -0.494258        -0.725359         0.278469   -92.560976    41.280488   -20.304878
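
The conversion from delta milliseconds to relative seconds amounts to a cumulative sum followed by a unit change, as the sketch below shows with the first five IMU rows. It is not a description of how transform_time_array is implemented; it merely reproduces the output above. The second part shows one possible way to handle absolute datetimes with pandas, using made-up timestamps, for cases where the built-in function does not yet apply.

```python
import numpy as np
import pandas as pd

# Case 1: delta timestamps in milliseconds (copied from the IMU data above)
delta_ms = np.array([0.0, 10.040039, 10.040039, 10.040039, 10.039795])
cumulative_ms = np.cumsum(delta_ms)
# Subtract the first value so the series starts at 0, then convert ms to s
relative_s_from_delta = (cumulative_ms - cumulative_ms[0]) / 1000.0

# Case 2: absolute datetimes (made-up example values)
timestamps = pd.Series(pd.to_datetime([
    "2024-01-01 12:00:00.000",
    "2024-01-01 12:00:00.010",
    "2024-01-01 12:00:00.020",
]))
# Timedelta relative to the first row, expressed in seconds
relative_s_from_datetime = (timestamps - timestamps.iloc[0]).dt.total_seconds()
```

Both results start at 0 and increase monotonically, which is the format described at the start of this section.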
# TimeUnit and transform_time_array were imported in the IMU cell above

df_ppg[time_colname] = transform_time_array(
    time_array=df_ppg[time_colname],
    input_unit_type=TimeUnit.DIFFERENCE_MS,
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_ppg.head()
      time   green
0  0.00000  649511
1  0.00996  648214
2  0.01992  646786
3  0.02988  645334
4  0.03984  644317

These dataframes are ready to be processed by ParaDigMa.
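
Before passing a dataframe on, it can be worth running a few basic checks against the requirements listed at the top of this tutorial. The sketch below is one possible way to do that; check_prepared_dataframe is a hypothetical helper, not a ParaDigMa function, and it only covers the time column and the presence of sensor columns, not units or axis orientation.

```python
import numpy as np
import pandas as pd


def check_prepared_dataframe(df, time_col="time", value_cols=()):
    """Return a list of problems found in a prepared dataframe.

    Hypothetical helper for illustration; an empty list means
    none of the checks below failed.
    """
    missing = [c for c in (time_col, *value_cols) if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    if df.empty:
        return ["dataframe is empty"]

    problems = []
    t = df[time_col].to_numpy(dtype=float)
    if t[0] != 0:
        problems.append("time does not start at 0")
    if np.any(np.diff(t) <= 0):
        problems.append("time is not strictly increasing")
    if df[list(value_cols)].isna().any().any():
        problems.append("sensor columns contain missing values")
    return problems


# A dataframe that satisfies the checks
df_ok = pd.DataFrame({"time": [0.0, 0.01, 0.02], "green": [649511, 648214, 646786]})
problems_ok = check_prepared_dataframe(df_ok, value_cols=["green"])

# A dataframe whose time column was never made relative to the first row
df_bad = pd.DataFrame({"time": [5.0, 5.01, 5.02], "green": [649511, 648214, 646786]})
problems_bad = check_prepared_dataframe(df_bad, value_cols=["green"])
```

Returning a list of messages rather than raising on the first failure makes it easier to see every issue in one pass.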