Data preparation
ParaDigMa requires the sensor data to be of a specific format. This tutorial provides examples of how to prepare your input data for subsequent analysis. In the end, the input for ParaDigMa is a dataframe consisting of:
- A column time, representing the seconds relative to the first row of the dataframe;
- One or multiple of the following sensor column categories:
  - Accelerometer: accelerometer_x, accelerometer_y and accelerometer_z in g
  - Gyroscope: gyroscope_x, gyroscope_y and gyroscope_z in deg/s
  - PPG: green

The final dataframe should be resampled to 100 Hz, have the correct units for the sensor columns, and the correct format for the time column. Also note that the gait pipeline expects a specific orientation of sensor axes, as explained in Coordinate system.
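The requirements listed above can be checked before running any pipeline. The following is a minimal sketch, not part of ParaDigMa: the helper name check_input_format and the toy dataframe are hypothetical, and the column names are the literal strings listed above.

```python
import pandas as pd

# Column groups as listed above; at least one complete group must be present.
SENSOR_GROUPS = {
    "accelerometer": ["accelerometer_x", "accelerometer_y", "accelerometer_z"],
    "gyroscope": ["gyroscope_x", "gyroscope_y", "gyroscope_z"],
    "ppg": ["green"],
}


def check_input_format(df: pd.DataFrame) -> list:
    """Hypothetical helper: return the sensor groups found, or raise ValueError."""
    if "time" not in df.columns:
        raise ValueError("missing 'time' column")
    if df["time"].iloc[0] != 0:
        raise ValueError("'time' must start at 0 seconds")
    if not df["time"].is_monotonic_increasing:
        raise ValueError("'time' must be non-decreasing")
    found = [
        name
        for name, cols in SENSOR_GROUPS.items()
        if all(col in df.columns for col in cols)
    ]
    if not found:
        raise ValueError("no complete sensor column group found")
    return found


# Toy dataframe (made-up layout): three PPG samples spaced 0.01 s apart.
df_example = pd.DataFrame(
    {"time": [0.0, 0.01, 0.02], "green": [649511, 648214, 646786]}
)
groups = check_input_format(df_example)
print(groups)
```

The check covers only column names and the time column; it does not verify units, sampling rate, or axis orientation.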
Load data
This example uses data from the Personalized Parkinson Project, which is stored in Time Series Data Format (TSDF). Inertial Measurement Unit (IMU) and photoplethysmography (PPG) data are sampled at different sampling frequencies and are therefore stored separately. Note that ParaDigMa works independently of the data storage format; it only requires a pandas dataframe as input.
from pathlib import Path
from paradigma.util import load_tsdf_dataframe
path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')
path_to_imu_data = path_to_raw_data / 'imu'
df_imu, imu_time, imu_values = load_tsdf_dataframe(
    path_to_data=path_to_imu_data,
    prefix='IMU'
)
df_imu.head()
| | time | acceleration_x | acceleration_y | acceleration_z | rotation_x | rotation_y | rotation_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | -5.402541 | 5.632536 | -2.684842 | -115.670732 | -32.012195 | 26.097561 |
| 1 | 10.040039 | -5.257034 | 6.115995 | -2.497091 | -110.609757 | -34.634146 | 24.695122 |
| 2 | 10.040039 | -4.947244 | 6.392928 | -2.468928 | -103.231708 | -36.768293 | 22.926829 |
| 3 | 10.040039 | -4.792349 | 6.735574 | -2.605048 | -96.280488 | -38.719512 | 21.158537 |
| 4 | 10.039795 | -4.848675 | 7.115770 | -2.731780 | -92.560976 | -41.280488 | 20.304878 |
from paradigma.util import load_tsdf_dataframe
# Build the PPG path with pathlib, consistent with the IMU example above
path_to_ppg_data = path_to_raw_data / 'ppg'
df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(
    path_to_data=path_to_ppg_data,
    prefix='PPG'
)
df_ppg.head()
| | time | green |
|---|---|---|
| 0 | 0.000000 | 649511 |
| 1 | 9.959961 | 648214 |
| 2 | 9.959961 | 646786 |
| 3 | 9.959961 | 645334 |
| 4 | 9.960205 | 644317 |
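The timestamps in this dataset are stored as delta milliseconds, i.e., the time elapsed since the previous sample. As the tables show, the samples are not perfectly uniformly spaced: the IMU intervals vary between 10.039795 ms and 10.040039 ms, and the PPG intervals between 9.959961 ms and 9.960205 ms.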
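To make the observation about delta timestamps concrete, the sketch below estimates the mean sampling rate and the spread of the intervals from the first five IMU time values shown earlier. It is an illustration on those five values only, not a ParaDigMa function, and the variable names are chosen for this example.

```python
import numpy as np

# First IMU time values from the table above, in delta milliseconds.
# The leading 0 belongs to the first sample and is not an interval.
delta_ms = np.array([0.0, 10.040039, 10.040039, 10.040039, 10.039795])

intervals_ms = delta_ms[1:]

# Mean sampling rate in Hz and the spread between the longest and shortest interval.
mean_rate_hz = 1000.0 / intervals_ms.mean()
spread_ms = intervals_ms.max() - intervals_ms.min()

print(f"mean sampling rate: {mean_rate_hz:.2f} Hz")
print(f"interval spread: {spread_ms:.6f} ms")
```

For these values the mean rate is about 99.6 Hz, close to but not exactly 100 Hz.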
Prepare dataframe
Change column names
To safeguard robustness of the pipeline, ParaDigMa fixes column names to a predefined standard.
from paradigma.constants import DataColumns
accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]
gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]
# Rename dataframe columns
df_imu = df_imu.rename(columns={
    'time': DataColumns.TIME,
    'acceleration_x': DataColumns.ACCELEROMETER_X,
    'acceleration_y': DataColumns.ACCELEROMETER_Y,
    'acceleration_z': DataColumns.ACCELEROMETER_Z,
    'rotation_x': DataColumns.GYROSCOPE_X,
    'rotation_y': DataColumns.GYROSCOPE_Y,
    'rotation_z': DataColumns.GYROSCOPE_Z,
})
# Set columns to a fixed order
df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]
df_imu.head()
| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | -5.402541 | 5.632536 | -2.684842 | -115.670732 | -32.012195 | 26.097561 |
| 1 | 10.040039 | -5.257034 | 6.115995 | -2.497091 | -110.609757 | -34.634146 | 24.695122 |
| 2 | 10.040039 | -4.947244 | 6.392928 | -2.468928 | -103.231708 | -36.768293 | 22.926829 |
| 3 | 10.040039 | -4.792349 | 6.735574 | -2.605048 | -96.280488 | -38.719512 | 21.158537 |
| 4 | 10.039795 | -4.848675 | 7.115770 | -2.731780 | -92.560976 | -41.280488 | 20.304878 |
from paradigma.constants import DataColumns
ppg_columns = [DataColumns.PPG]
# Rename dataframe columns
df_ppg = df_ppg.rename(columns={
    'time': DataColumns.TIME,
    'green': DataColumns.PPG,  # the loaded PPG column is named 'green', not 'ppg'
})
# Set columns to a fixed order
df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]
df_ppg.head()
| | time | green |
|---|---|---|
| 0 | 0.000000 | 649511 |
| 1 | 9.959961 | 648214 |
| 2 | 9.959961 | 646786 |
| 3 | 9.959961 | 645334 |
| 4 | 9.960205 | 644317 |
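One pitfall when renaming: by default, pandas rename silently ignores mapping keys that do not match an existing column, so a misspelled source name goes unnoticed. The sketch below shows this on a toy dataframe and a stricter variant using errors="raise". The literal column names stand in for the DataColumns constants and are an assumption of this example.

```python
import pandas as pd

# Toy dataframe with the column names of the loaded PPG data.
df_demo = pd.DataFrame({"time": [0.0, 9.959961], "green": [649511, 648214]})

# Default behaviour: the unknown key 'ppg' is ignored and nothing is renamed.
silently_unchanged = df_demo.rename(columns={"ppg": "green"})
print(list(silently_unchanged.columns))

# Stricter variant: errors="raise" turns a missing source column into a KeyError.
try:
    df_demo.rename(columns={"ppg": "green"}, errors="raise")
    missing_key_detected = False
except KeyError:
    missing_key_detected = True
print(missing_key_detected)
```

Passing errors="raise" is a cheap safeguard when source column names differ between datasets.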
Change units
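ParaDigMa expects acceleration in g and angular velocity in deg/s. Both can be converted with the ParaDigMa utility functions convert_units_accelerometer and convert_units_gyroscope. In this dataset the accelerometer data is stored in m/s^2 and is converted to g, while the gyroscope data is already in deg/s and therefore remains unchanged.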
from paradigma.util import convert_units_accelerometer, convert_units_gyroscope
accelerometer_units = 'm/s^2'
gyroscope_units = 'deg/s'
accelerometer_data = df_imu[accelerometer_columns].values
gyroscope_data = df_imu[gyroscope_columns].values
# Convert units to expected format
df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)
df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)
df_imu.head()
| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | -0.550718 | 0.574163 | -0.273684 | -115.670732 | -32.012195 | 26.097561 |
| 1 | 10.040039 | -0.535885 | 0.623445 | -0.254545 | -110.609757 | -34.634146 | 24.695122 |
| 2 | 10.040039 | -0.504306 | 0.651675 | -0.251675 | -103.231708 | -36.768293 | 22.926829 |
| 3 | 10.040039 | -0.488517 | 0.686603 | -0.265550 | -96.280488 | -38.719512 | 21.158537 |
| 4 | 10.039795 | -0.494258 | 0.725359 | -0.278469 | -92.560976 | -41.280488 | 20.304878 |
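The accelerometer conversion can be sanity-checked by hand. The sketch below divides the first row of raw accelerometer values (in m/s^2, from the renamed dataframe shown earlier) by an assumed value of 1 g = 9.81 m/s^2. That constant is an assumption chosen because it reproduces the converted values shown above; the tutorial does not state which constant ParaDigMa uses internally.

```python
import numpy as np

# Assumption: 1 g = 9.81 m/s^2 (reproduces the tutorial output to six decimals).
G_IN_M_PER_S2 = 9.81

# First row of accelerometer data before conversion, in m/s^2.
acc_ms2 = np.array([-5.402541, 5.632536, -2.684842])

# Convert to units of g.
acc_g = acc_ms2 / G_IN_M_PER_S2
print(np.round(acc_g, 6))
```

The rounded result agrees with the first row of the converted dataframe.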
Account for watch side
For the gait and arm swing pipeline, it is essential that the sensor axes are oriented correctly. See Coordinate System for details, and set the axes of your data accordingly.
# Change the orientation of the sensor according to the documented coordinate system
df_imu[DataColumns.ACCELEROMETER_Y] *= -1
df_imu[DataColumns.ACCELEROMETER_Z] *= -1
df_imu[DataColumns.GYROSCOPE_Y] *= -1
df_imu[DataColumns.GYROSCOPE_Z] *= -1
df_imu.head()
| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | -0.550718 | -0.574163 | 0.273684 | -115.670732 | 32.012195 | -26.097561 |
| 1 | 10.040039 | -0.535885 | -0.623445 | 0.254545 | -110.609757 | 34.634146 | -24.695122 |
| 2 | 10.040039 | -0.504306 | -0.651675 | 0.251675 | -103.231708 | 36.768293 | -22.926829 |
| 3 | 10.040039 | -0.488517 | -0.686603 | 0.265550 | -96.280488 | 38.719512 | -21.158537 |
| 4 | 10.039795 | -0.494258 | -0.725359 | 0.278469 | -92.560976 | 41.280488 | -20.304878 |
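Because the in-place `*= -1` statements flip the axes again every time the cell is re-run, it can be safer in a notebook to flip on a copy. The following is a minimal sketch of such a helper; invert_axes is a hypothetical name, not a ParaDigMa function, and which columns need inverting for a given device and wrist must still be taken from the Coordinate System documentation.

```python
import pandas as pd


def invert_axes(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with the given columns sign-flipped."""
    out = df.copy()
    out[columns] = -out[columns]
    return out


# Toy single-row dataframe using values from the first row before flipping.
df_demo = pd.DataFrame(
    {
        "accelerometer_x": [-0.550718],
        "accelerometer_y": [0.574163],
        "gyroscope_y": [-32.012195],
    }
)

# The input is left untouched, so re-running this line gives the same result.
df_flipped = invert_axes(df_demo, ["accelerometer_y", "gyroscope_y"])
print(df_flipped)
```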
Change time column
ParaDigMa expects the time column to be in seconds relative to the first row, so the first value should equal 0. The toolbox provides the built-in function transform_time_array to transform a time column that was recorded as delta time between consecutive timestamps into this format. Functionality for transforming other time representations (e.g., datetime format) is planned for a future release.
from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array
df_imu[DataColumns.TIME] = transform_time_array(
    time_array=df_imu[DataColumns.TIME],
    input_unit_type=TimeUnit.DIFFERENCE_MS,
    output_unit_type=TimeUnit.RELATIVE_S,
)
df_imu.head()
| | time | accelerometer_x | accelerometer_y | accelerometer_z | gyroscope_x | gyroscope_y | gyroscope_z |
|---|---|---|---|---|---|---|---|
| 0 | 0.00000 | -0.550718 | -0.574163 | 0.273684 | -115.670732 | 32.012195 | -26.097561 |
| 1 | 0.01004 | -0.535885 | -0.623445 | 0.254545 | -110.609757 | 34.634146 | -24.695122 |
| 2 | 0.02008 | -0.504306 | -0.651675 | 0.251675 | -103.231708 | 36.768293 | -22.926829 |
| 3 | 0.03012 | -0.488517 | -0.686603 | 0.265550 | -96.280488 | 38.719512 | -21.158537 |
| 4 | 0.04016 | -0.494258 | -0.725359 | 0.278469 | -92.560976 | 41.280488 | -20.304878 |
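Conceptually, going from delta milliseconds to relative seconds amounts to a cumulative sum followed by a unit change. The sketch below illustrates that idea with NumPy on the first five IMU time values. It is not the implementation of transform_time_array, and the function name delta_ms_to_relative_s is hypothetical. The second part shows one possible way to handle absolute timestamps with pandas; those timestamps are made up for the example.

```python
import numpy as np
import pandas as pd


def delta_ms_to_relative_s(delta_ms):
    """Hypothetical stand-in: cumulative sum of deltas, shifted to start at 0, in seconds."""
    elapsed_ms = np.cumsum(np.asarray(delta_ms, dtype=float))
    return (elapsed_ms - elapsed_ms[0]) / 1000.0


# First IMU time values before the transformation (delta milliseconds).
imu_delta_ms = [0.0, 10.040039, 10.040039, 10.040039, 10.039795]
relative_s = delta_ms_to_relative_s(imu_delta_ms)
print(np.round(relative_s, 5))

# Absolute timestamps (hypothetical values, not from the tutorial dataset):
# subtract the first timestamp and express the differences in seconds.
timestamps = pd.to_datetime(
    [
        "2024-01-01 12:00:00.000",
        "2024-01-01 12:00:00.010",
        "2024-01-01 12:00:00.020",
    ]
)
relative_s_from_datetime = (timestamps - timestamps[0]).total_seconds()
print(list(relative_s_from_datetime))
```

Rounded to five decimals, the first result matches the time column of the transformed IMU dataframe shown above.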
from paradigma.constants import TimeUnit
from paradigma.util import transform_time_array
df_ppg[DataColumns.TIME] = transform_time_array(
    time_array=df_ppg[DataColumns.TIME],
    input_unit_type=TimeUnit.DIFFERENCE_MS,
    output_unit_type=TimeUnit.RELATIVE_S,
)
df_ppg.head()
| | time | green |
|---|---|---|
| 0 | 0.00000 | 649511 |
| 1 | 0.00996 | 648214 |
| 2 | 0.01992 | 646786 |
| 3 | 0.02988 | 645334 |
| 4 | 0.03984 | 644317 |
These dataframes are ready to be processed by ParaDigMa.