Data preparation

ParaDigMa requires sensor data in a specific format. This tutorial shows how to prepare your input data for subsequent analysis. The final input for ParaDigMa is a pandas dataframe consisting of:

- A column time, containing the time in seconds relative to the first row of the dataframe;
- One or more of the following groups of sensor columns:
  - Accelerometer: accelerometer_x, accelerometer_y and accelerometer_z in g
  - Gyroscope: gyroscope_x, gyroscope_y and gyroscope_z in deg/s
  - PPG: green
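The column requirements listed above can be checked before running a pipeline. The sketch below uses a hypothetical helper, check_input_columns, that is not part of ParaDigMa. It hard-codes the column names from the list above as plain strings, assuming they match the values of ParaDigMa's DataColumns constants, and it only checks names, not units or sampling rate.

```python
import pandas as pd

# Assumption: these strings equal the values of ParaDigMa's DataColumns constants
SENSOR_GROUPS = {
    "accelerometer": ["accelerometer_x", "accelerometer_y", "accelerometer_z"],
    "gyroscope": ["gyroscope_x", "gyroscope_y", "gyroscope_z"],
    "ppg": ["green"],
}


def check_input_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: return the complete sensor groups in df, or raise."""
    if "time" not in df.columns:
        raise ValueError("missing 'time' column")
    # A group only counts if all of its columns are present
    present = [
        group for group, cols in SENSOR_GROUPS.items()
        if all(col in df.columns for col in cols)
    ]
    if not present:
        raise ValueError("no complete group of sensor columns found")
    return present


df = pd.DataFrame({
    "time": [0.0, 0.01, 0.02],
    "accelerometer_x": [0.1, 0.2, 0.3],
    "accelerometer_y": [0.0, 0.0, 0.0],
    "accelerometer_z": [1.0, 1.0, 1.0],
})
print(check_input_columns(df))  # ['accelerometer']
```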

The final dataframe should be resampled to 100 Hz, use the expected units for the sensor columns, and use the expected format for the time column. Also note that the gait pipeline expects a specific orientation of the sensor axes, as explained in Coordinate system.
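This tutorial does not show the resampling step itself. As a generic illustration only, the sketch below resamples irregular samples onto a uniform 100 Hz grid using linear interpolation with NumPy's np.interp. This is a swapped-in technique, not necessarily how ParaDigMa resamples, and resample_linear is a hypothetical helper. It assumes the time column is already in seconds, strictly increasing, and starts at 0.

```python
import numpy as np
import pandas as pd


def resample_linear(df: pd.DataFrame, fs: float = 100.0, time_col: str = "time") -> pd.DataFrame:
    """Hypothetical helper: linearly interpolate all non-time columns onto a uniform grid at fs Hz.

    Assumes df[time_col] is in seconds, strictly increasing and starts at 0.
    """
    t = df[time_col].to_numpy(dtype=float)
    # Uniform grid from 0 to the last recorded timestamp
    n_samples = int(np.floor(t[-1] * fs)) + 1
    t_new = np.arange(n_samples) / fs
    resampled = {time_col: t_new}
    for col in df.columns:
        if col != time_col:
            # np.interp evaluates the piecewise-linear interpolant at t_new
            resampled[col] = np.interp(t_new, t, df[col].to_numpy(dtype=float))
    return pd.DataFrame(resampled)


# Made-up, irregularly spaced samples of the signal x = 100 * t
raw = pd.DataFrame({"time": [0.0, 0.015, 0.035], "x": [0.0, 1.5, 3.5]})
print(resample_linear(raw))
```

When downsampling from a higher rate, a low-pass filter is usually applied first to avoid aliasing. This sketch omits that step.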

Load data

This example uses data from the Personalized Parkinson Project, stored in the Time Series Data Format (TSDF). Inertial measurement unit (IMU) and photoplethysmography (PPG) data are sampled at different frequencies and are therefore stored separately. Note that ParaDigMa is independent of the data storage format; it only requires a pandas dataframe as input.
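Because only a pandas dataframe is required, data stored in other formats can be loaded with pandas directly. The sketch below reads a hypothetical CSV export (file contents, column names and values are made up, and the accelerometer values are assumed to be in g already), renames the columns to the names listed above, and converts absolute millisecond timestamps into seconds relative to the first row.

```python
import io

import pandas as pd

# Hypothetical CSV export with absolute timestamps in ms; names and values are made up
csv_text = """timestamp_ms,acc_x,acc_y,acc_z
1000,0.01,-0.02,0.98
1010,0.02,-0.01,0.99
1020,0.03,-0.02,1.00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Rename to the expected column names; errors="raise" flags a mistyped source name
df = df.rename(
    columns={"acc_x": "accelerometer_x", "acc_y": "accelerometer_y", "acc_z": "accelerometer_z"},
    errors="raise",
)

# Absolute milliseconds -> seconds relative to the first row
df["time"] = (df["timestamp_ms"] - df["timestamp_ms"].iloc[0]) / 1000.0
df = df[["time", "accelerometer_x", "accelerometer_y", "accelerometer_z"]]
print(df)
```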

from pathlib import Path
from paradigma.util import load_tsdf_dataframe

path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')
path_to_imu_data = path_to_raw_data / 'imu'

df_imu, imu_time, imu_values = load_tsdf_dataframe(
    path_to_data=path_to_imu_data, 
    prefix='IMU'
)

df_imu.head()

	time	acceleration_x	acceleration_y	acceleration_z	rotation_x	rotation_y	rotation_z
0	0.000000	-5.402541	5.632536	-2.684842	-115.670732	-32.012195	26.097561
1	10.040039	-5.257034	6.115995	-2.497091	-110.609757	-34.634146	24.695122
2	10.040039	-4.947244	6.392928	-2.468928	-103.231708	-36.768293	22.926829
3	10.040039	-4.792349	6.735574	-2.605048	-96.280488	-38.719512	21.158537
4	10.039795	-4.848675	7.115770	-2.731780	-92.560976	-41.280488	20.304878

from paradigma.util import load_tsdf_dataframe

path_to_ppg_data = path_to_raw_data / 'ppg'

df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(
    path_to_data=path_to_ppg_data, 
    prefix='PPG'
)

df_ppg.head()

	time	green
0	0.000000	649511
1	9.959961	648214
2	9.959961	646786
3	9.959961	645334
4	9.960205	644317

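In this dataset, the time column contains the number of milliseconds elapsed since the previous sample (delta milliseconds), with 0 for the first row. As the values above show, the intervals between samples are not exactly uniform (e.g., 10.040039 ms versus 10.039795 ms for the IMU data).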
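The delta values also reveal the effective sampling rate. The sketch below uses only the first five IMU time values shown above, which is far too few to characterize a recording; in practice the same computation would run over the whole time column.

```python
import numpy as np

# First five time values of the IMU table above, in ms since the previous sample
delta_ms = np.array([0.000000, 10.040039, 10.040039, 10.040039, 10.039795])

# The first row is the reference point (0), so it is not an interval
intervals_ms = delta_ms[1:]
mean_interval_ms = intervals_ms.mean()

print(f"mean interval: {mean_interval_ms:.6f} ms")
print(f"implied sampling frequency: {1000 / mean_interval_ms:.3f} Hz")
print(f"interval range: {intervals_ms.min():.6f} to {intervals_ms.max():.6f} ms")
```

For these rows the mean interval is about 10.04 ms, i.e. roughly 99.6 Hz rather than exactly 100 Hz.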

Prepare dataframe

Change column names

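To keep the pipeline robust, ParaDigMa expects column names that follow a predefined standard. These names are available as constants in DataColumns, so the columns of the loaded data have to be renamed accordingly.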

from paradigma.constants import DataColumns

accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]
gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]

# Rename dataframe columns
df_imu = df_imu.rename(columns={
    'time': DataColumns.TIME,
    'acceleration_x': DataColumns.ACCELEROMETER_X,
    'acceleration_y': DataColumns.ACCELEROMETER_Y,
    'acceleration_z': DataColumns.ACCELEROMETER_Z,
    'rotation_x': DataColumns.GYROSCOPE_X,
    'rotation_y': DataColumns.GYROSCOPE_Y,
    'rotation_z': DataColumns.GYROSCOPE_Z,
})

# Set columns to a fixed order
df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]

df_imu.head()

	time	accelerometer_x	accelerometer_y	accelerometer_z	gyroscope_x	gyroscope_y	gyroscope_z
0	0.000000	-5.402541	5.632536	-2.684842	-115.670732	-32.012195	26.097561
1	10.040039	-5.257034	6.115995	-2.497091	-110.609757	-34.634146	24.695122
2	10.040039	-4.947244	6.392928	-2.468928	-103.231708	-36.768293	22.926829
3	10.040039	-4.792349	6.735574	-2.605048	-96.280488	-38.719512	21.158537
4	10.039795	-4.848675	7.115770	-2.731780	-92.560976	-41.280488	20.304878

from paradigma.constants import DataColumns

ppg_columns = [DataColumns.PPG]

# Rename dataframe columns
df_ppg = df_ppg.rename(columns={
    'time': DataColumns.TIME,
    'green': DataColumns.PPG,
})

# Set columns to a fixed order
df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]

df_ppg.head()

	time	green
0	0.000000	649511
1	9.959961	648214
2	9.959961	646786
3	9.959961	645334
4	9.960205	644317
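pandas' DataFrame.rename ignores mapping keys that do not exist in the dataframe, because its errors parameter defaults to 'ignore'. A wrong source name therefore leaves a column unrenamed without any warning. Passing errors='raise' turns such a mistake into a KeyError. The mapping below is a hypothetical example with a source key that is not in the dataframe.

```python
import pandas as pd

df = pd.DataFrame({"time": [0.0, 9.959961], "green": [649511, 648214]})

# Hypothetical mapping whose source key "ppg" does not exist in df
mapping = {"ppg": "green_channel"}

# Default behaviour: the unknown key is silently ignored
renamed = df.rename(columns=mapping)
print(list(renamed.columns))  # ['time', 'green']

# errors="raise" surfaces the mistake instead
raised = False
try:
    df.rename(columns=mapping, errors="raise")
except KeyError as exc:
    raised = True
    print("rename failed:", exc)
```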

Change units

ParaDigMa expects acceleration in g and angular velocity in deg/s. In this dataset, the accelerometer data are in m/s^2 and the gyroscope data are already in deg/s; ParaDigMa provides helper functions to convert both.

from paradigma.util import convert_units_accelerometer, convert_units_gyroscope

accelerometer_units = 'm/s^2'
gyroscope_units = 'deg/s'

accelerometer_data = df_imu[accelerometer_columns].values
gyroscope_data = df_imu[gyroscope_columns].values

# Convert units to expected format
df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)
df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)

df_imu.head()

	time	accelerometer_x	accelerometer_y	accelerometer_z	gyroscope_x	gyroscope_y	gyroscope_z
0	0.000000	-0.550718	0.574163	-0.273684	-115.670732	-32.012195	26.097561
1	10.040039	-0.535885	0.623445	-0.254545	-110.609757	-34.634146	24.695122
2	10.040039	-0.504306	0.651675	-0.251675	-103.231708	-36.768293	22.926829
3	10.040039	-0.488517	0.686603	-0.265550	-96.280488	-38.719512	21.158537
4	10.039795	-0.494258	0.725359	-0.278469	-92.560976	-41.280488	20.304878
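The accelerometer conversion amounts to dividing by the gravitational acceleration. The converted values in the table above are consistent with a factor of 9.81 m/s^2 per g (e.g., -5.402541 / 9.81 ≈ -0.550718); whether convert_units_accelerometer uses exactly this constant is an assumption. The sketch also shows the rad/s to deg/s conversion for a gyroscope that reports radians, using made-up values.

```python
import numpy as np

# Assumption: 1 g = 9.81 m/s^2, which matches the converted values shown above
G_IN_M_PER_S2 = 9.81

# First IMU accelerometer row, in m/s^2
acc_m_s2 = np.array([-5.402541, 5.632536, -2.684842])
acc_g = acc_m_s2 / G_IN_M_PER_S2
print(np.round(acc_g, 6))

# Made-up gyroscope values in rad/s; np.rad2deg multiplies by 180 / pi
gyro_rad_s = np.array([0.5, -1.0, 2.0])
gyro_deg_s = np.rad2deg(gyro_rad_s)
print(gyro_deg_s)
```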

Account for watch side

For the Gait & Arm Swing pipeline, the sensor axes must be oriented correctly. See Coordinate System for the expected orientation and set the axes of your data accordingly. For this dataset, that means inverting the y and z axes of both the accelerometer and the gyroscope.

# Change the orientation of the sensor according to the documented coordinate system
df_imu[DataColumns.ACCELEROMETER_Y] *= -1
df_imu[DataColumns.ACCELEROMETER_Z] *= -1
df_imu[DataColumns.GYROSCOPE_Y] *= -1
df_imu[DataColumns.GYROSCOPE_Z] *= -1

df_imu.head()

	time	accelerometer_x	accelerometer_y	accelerometer_z	gyroscope_x	gyroscope_y	gyroscope_z
0	0.000000	-0.550718	-0.574163	0.273684	-115.670732	32.012195	-26.097561
1	10.040039	-0.535885	-0.623445	0.254545	-110.609757	34.634146	-24.695122
2	10.040039	-0.504306	-0.651675	0.251675	-103.231708	36.768293	-22.926829
3	10.040039	-0.488517	-0.686603	0.265550	-96.280488	38.719512	-21.158537
4	10.039795	-0.494258	-0.725359	0.278469	-92.560976	41.280488	-20.304878
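Negating two axes at once, as done above for y and z, is equivalent to rotating the sensor frame by 180 degrees about the x axis. Unlike negating a single axis (a reflection), this keeps the coordinate system right-handed, and because angular velocity transforms like an ordinary vector under proper rotations, the same sign changes apply to the gyroscope axes. This sketch only illustrates the linear algebra; which flips a given device or wrist needs is defined in Coordinate System.

```python
import numpy as np

# Negating y and z is the same as rotating the frame 180 degrees about x
flip_yz = np.diag([1.0, -1.0, -1.0])

# A proper rotation has determinant +1, so the frame stays right-handed
print(np.linalg.det(flip_yz))

# Negating a single axis is a reflection instead (determinant -1)
flip_y_only = np.diag([1.0, -1.0, 1.0])
print(np.linalg.det(flip_y_only))

# Samples as rows, x/y/z as columns: first accelerometer row in g, before the flip
acc = np.array([[-0.550718, 0.574163, -0.273684]])
acc_flipped = acc @ flip_yz.T
print(acc_flipped)
```

The flipped row matches the first accelerometer row of the table above.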

Change time column

ParaDigMa expects the time column to contain seconds relative to the first row, so the first value is 0. If the time column stores the time difference between consecutive samples, as in this dataset, the built-in function transform_time_array converts it to this format. Support for transforming other time formats (e.g., datetime) is planned.

from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_imu[DataColumns.TIME] = transform_time_array(
    time_array=df_imu[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_imu.head()

	time	accelerometer_x	accelerometer_y	accelerometer_z	gyroscope_x	gyroscope_y	gyroscope_z
0	0.00000	-0.550718	-0.574163	0.273684	-115.670732	32.012195	-26.097561
1	0.01004	-0.535885	-0.623445	0.254545	-110.609757	34.634146	-24.695122
2	0.02008	-0.504306	-0.651675	0.251675	-103.231708	36.768293	-22.926829
3	0.03012	-0.488517	-0.686603	0.265550	-96.280488	38.719512	-21.158537
4	0.04016	-0.494258	-0.725359	0.278469	-92.560976	41.280488	-20.304878
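Assuming each raw value is the number of milliseconds elapsed since the previous sample, the conversion to seconds relative to the first row amounts to a cumulative sum followed by division by 1000. The sketch below reproduces the time values in the table above from the first five raw IMU values. It illustrates the arithmetic only and is not ParaDigMa's implementation of transform_time_array.

```python
import numpy as np

# Time column as loaded: ms elapsed since the previous sample (first row = 0)
delta_ms = np.array([0.000000, 10.040039, 10.040039, 10.040039, 10.039795])

# Running total gives ms since the first sample; divide by 1000 for seconds
relative_s = np.cumsum(delta_ms) / 1000.0
print(np.round(relative_s, 5))
```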

from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array

df_ppg[DataColumns.TIME] = transform_time_array(
    time_array=df_ppg[DataColumns.TIME], 
    input_unit_type=TimeUnit.DIFFERENCE_MS, 
    output_unit_type=TimeUnit.RELATIVE_S,
)

df_ppg.head()

	time	green
0	0.00000	649511
1	0.00996	648214
2	0.01992	646786
3	0.02988	645334
4	0.03984	644317

These dataframes are ready to be processed by ParaDigMa.