{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data preparation\n",
"ParaDigMa requires sensor data in a specific format. This tutorial shows how to prepare your input data for subsequent analysis. The input for ParaDigMa is a dataframe consisting of:\n",
"* A column `time`, representing the seconds relative to the first row of the dataframe;\n",
"* One or more of the following sensor column categories:\n",
" * Accelerometer: `accelerometer_x`, `accelerometer_y` and `accelerometer_z` in _g_\n",
" * Gyroscope: `gyroscope_x`, `gyroscope_y` and `gyroscope_z` in _deg/s_\n",
" * PPG: `green` \n",
"\n",
"The final dataframe should be resampled to 100 Hz, have the correct units for the sensor columns, and the correct format for the `time` column. Also note that the _gait_ pipeline expects a specific orientation of sensor axes, as explained in [Coordinate system](../guides/coordinate_system.md)."
]
},
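{
"cell_type": "markdown",
"metadata": {},
"source": [
"The requirements listed above can be checked before running a pipeline. The sketch below is only an illustration: `check_paradigma_input` and `SENSOR_COLUMNS` are hypothetical names and not part of the ParaDigMa API, and the checks cover only the `time` column and the presence of at least one complete sensor category.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Column names as listed above. SENSOR_COLUMNS and check_paradigma_input\n",
"# are hypothetical and not part of the ParaDigMa API.\n",
"SENSOR_COLUMNS = {\n",
"    'accelerometer': ['accelerometer_x', 'accelerometer_y', 'accelerometer_z'],\n",
"    'gyroscope': ['gyroscope_x', 'gyroscope_y', 'gyroscope_z'],\n",
"    'ppg': ['green'],\n",
"}\n",
"\n",
"def check_paradigma_input(df):\n",
"    # Return a list of problems; an empty list means all checks passed.\n",
"    if 'time' not in df.columns:\n",
"        return ['missing column: time']\n",
"    time = df['time'].to_numpy(dtype=float)\n",
"    if time.size == 0:\n",
"        return ['dataframe is empty']\n",
"    problems = []\n",
"    if time[0] != 0:\n",
"        problems.append('time does not start at 0')\n",
"    if np.any(np.diff(time) <= 0):\n",
"        problems.append('time is not strictly increasing')\n",
"    # At least one sensor category must be present with all of its columns\n",
"    complete = [name for name, cols in SENSOR_COLUMNS.items()\n",
"                if all(col in df.columns for col in cols)]\n",
"    if not complete:\n",
"        problems.append('no complete sensor column category found')\n",
"    return problems\n",
"\n",
"df_ok = pd.DataFrame({'time': [0.0, 0.01, 0.02], 'green': [649511, 648214, 646786]})\n",
"df_bad = pd.DataFrame({'time': [10.0, 10.0, 10.0], 'green': [649511, 648214, 646786]})\n",
"print(check_paradigma_input(df_ok))\n",
"print(check_paradigma_input(df_bad))\n",
"```\n",
"\n",
"The second dataframe mimics raw delta timestamps, which fail both `time` checks until they are converted as described later in this tutorial."
]
},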
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data\n",
"This example uses data from the [Personalized Parkinson Project](https://pubmed.ncbi.nlm.nih.gov/31315608/), stored in Time Series Data Format (TSDF). Inertial Measurement Unit (IMU) and photoplethysmography (PPG) data are sampled at different frequencies and are therefore stored separately. Note that ParaDigMa works independently of the data storage format; it only requires a `pandas` dataframe as input."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time acceleration_x acceleration_y acceleration_z rotation_x \\\n",
"0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 \n",
"1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 \n",
"2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 \n",
"3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 \n",
"4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 \n",
"\n",
" rotation_y rotation_z \n",
"0 -32.012195 26.097561 \n",
"1 -34.634146 24.695122 \n",
"2 -36.768293 22.926829 \n",
"3 -38.719512 21.158537 \n",
"4 -41.280488 20.304878 "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pathlib import Path\n",
"from paradigma.util import load_tsdf_dataframe\n",
"\n",
"path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')\n",
"path_to_imu_data = path_to_raw_data / 'imu'\n",
"\n",
"df_imu, imu_time, imu_values = load_tsdf_dataframe(\n",
" path_to_data=path_to_imu_data, \n",
" prefix='IMU'\n",
")\n",
"\n",
"df_imu.head()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time green\n",
"0 0.000000 649511\n",
"1 9.959961 648214\n",
"2 9.959961 646786\n",
"3 9.959961 645334\n",
"4 9.960205 644317"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.util import load_tsdf_dataframe\n",
"\n",
"path_to_ppg_data = path_to_raw_data / 'ppg'\n",
"\n",
"df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(\n",
" path_to_data=path_to_ppg_data, \n",
" prefix='PPG'\n",
")\n",
"\n",
"df_ppg.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The timestamps in this dataset are stored as the time difference in milliseconds between consecutive samples (roughly 10 ms). As the first rows show, these intervals are not perfectly uniform."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare dataframe\n",
"\n",
"#### Change column names\n",
"To safeguard robustness of the pipeline, ParaDigMa fixes column names to a predefined standard."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n",
"0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 \n",
"1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 \n",
"2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 \n",
"3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 \n",
"4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 \n",
"\n",
" gyroscope_y gyroscope_z \n",
"0 -32.012195 26.097561 \n",
"1 -34.634146 24.695122 \n",
"2 -36.768293 22.926829 \n",
"3 -38.719512 21.158537 \n",
"4 -41.280488 20.304878 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.constants import DataColumns\n",
"\n",
"accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]\n",
"gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]\n",
"\n",
"# Rename dataframe columns\n",
"df_imu = df_imu.rename(columns={\n",
" 'time': DataColumns.TIME,\n",
" 'acceleration_x': DataColumns.ACCELEROMETER_X,\n",
" 'acceleration_y': DataColumns.ACCELEROMETER_Y,\n",
" 'acceleration_z': DataColumns.ACCELEROMETER_Z,\n",
" 'rotation_x': DataColumns.GYROSCOPE_X,\n",
" 'rotation_y': DataColumns.GYROSCOPE_Y,\n",
" 'rotation_z': DataColumns.GYROSCOPE_Z,\n",
"})\n",
"\n",
"# Set columns to a fixed order\n",
"df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]\n",
"\n",
"df_imu.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time green\n",
"0 0.000000 649511\n",
"1 9.959961 648214\n",
"2 9.959961 646786\n",
"3 9.959961 645334\n",
"4 9.960205 644317"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.constants import DataColumns\n",
"\n",
"ppg_columns = [DataColumns.PPG]\n",
"\n",
"# Rename dataframe columns\n",
"df_ppg = df_ppg.rename(columns={\n",
" 'time': DataColumns.TIME,\n",
" 'green': DataColumns.PPG,\n",
"})\n",
"\n",
"# Set columns to a fixed order\n",
"df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]\n",
"\n",
"df_ppg.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Change units\n",
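"ParaDigMa expects acceleration in _g_ and rotation in _deg/s_, and provides helper functions to convert units. In this dataset, acceleration is stored in m/s^2 and must be converted; the gyroscope data is already in deg/s and remains unchanged."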
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n",
"0 0.000000 -0.550718 0.574163 -0.273684 -115.670732 \n",
"1 10.040039 -0.535885 0.623445 -0.254545 -110.609757 \n",
"2 10.040039 -0.504306 0.651675 -0.251675 -103.231708 \n",
"3 10.040039 -0.488517 0.686603 -0.265550 -96.280488 \n",
"4 10.039795 -0.494258 0.725359 -0.278469 -92.560976 \n",
"\n",
" gyroscope_y gyroscope_z \n",
"0 -32.012195 26.097561 \n",
"1 -34.634146 24.695122 \n",
"2 -36.768293 22.926829 \n",
"3 -38.719512 21.158537 \n",
"4 -41.280488 20.304878 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.util import convert_units_accelerometer, convert_units_gyroscope\n",
"\n",
"accelerometer_units = 'm/s^2'\n",
"gyroscope_units = 'deg/s'\n",
"\n",
"accelerometer_data = df_imu[accelerometer_columns].values\n",
"gyroscope_data = df_imu[gyroscope_columns].values\n",
"\n",
"# Convert units to expected format\n",
"df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)\n",
"df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)\n",
"\n",
"df_imu.head()"
]
},
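{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the conversion above concrete, here is a minimal NumPy sketch of the arithmetic. The helper names are hypothetical, and the constant of 9.81 m/s^2 per _g_ is an assumption inferred from the values shown in this tutorial (for example, -5.402541 m/s^2 becomes -0.550718 _g_); ParaDigMa's own implementation may differ.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumed constant, inferred from the tutorial output; not taken from ParaDigMa's source.\n",
"G_IN_M_PER_S2 = 9.81\n",
"\n",
"def accel_to_g(values, units):\n",
"    # Hypothetical helper: convert accelerometer values to g.\n",
"    values = np.asarray(values, dtype=float)\n",
"    if units == 'm/s^2':\n",
"        return values / G_IN_M_PER_S2\n",
"    if units == 'g':\n",
"        return values\n",
"    raise ValueError(f'unsupported accelerometer unit: {units}')\n",
"\n",
"def gyro_to_deg_per_s(values, units):\n",
"    # Hypothetical helper: convert gyroscope values to deg/s.\n",
"    values = np.asarray(values, dtype=float)\n",
"    if units == 'rad/s':\n",
"        return np.rad2deg(values)\n",
"    if units == 'deg/s':\n",
"        return values\n",
"    raise ValueError(f'unsupported gyroscope unit: {units}')\n",
"\n",
"# First IMU row of this tutorial, in m/s^2\n",
"first_row_g = accel_to_g([-5.402541, 5.632536, -2.684842], 'm/s^2')\n",
"print(np.round(first_row_g, 6))\n",
"```\n",
"\n",
"Raising an error for unknown unit strings keeps a mislabelled column from passing through unconverted."
]
},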
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Account for watch side\n",
"The Gait & Arm Swing pipeline requires a specific orientation of the sensor axes. See [Coordinate System](../guides/coordinate_system.md) for details and adjust the axes of your data accordingly."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n",
"0 0.000000 -0.550718 -0.574163 0.273684 -115.670732 \n",
"1 10.040039 -0.535885 -0.623445 0.254545 -110.609757 \n",
"2 10.040039 -0.504306 -0.651675 0.251675 -103.231708 \n",
"3 10.040039 -0.488517 -0.686603 0.265550 -96.280488 \n",
"4 10.039795 -0.494258 -0.725359 0.278469 -92.560976 \n",
"\n",
" gyroscope_y gyroscope_z \n",
"0 32.012195 -26.097561 \n",
"1 34.634146 -24.695122 \n",
"2 36.768293 -22.926829 \n",
"3 38.719512 -21.158537 \n",
"4 41.280488 -20.304878 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Change the orientation of the sensor according to the documented coordinate system\n",
"df_imu[DataColumns.ACCELEROMETER_Y] *= -1\n",
"df_imu[DataColumns.ACCELEROMETER_Z] *= -1\n",
"df_imu[DataColumns.GYROSCOPE_Y] *= -1\n",
"df_imu[DataColumns.GYROSCOPE_Z] *= -1\n",
"\n",
"df_imu.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Change time column\n",
"ParaDigMa expects the `time` column to be in seconds relative to the first row, which should equal 0. The toolbox provides the built-in function `transform_time_array` to convert timestamps that are stored as the time difference between consecutive samples. Support for other timestamp types (e.g., datetime format) is planned."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n",
"0 0.00000 -0.550718 -0.574163 0.273684 -115.670732 \n",
"1 0.01004 -0.535885 -0.623445 0.254545 -110.609757 \n",
"2 0.02008 -0.504306 -0.651675 0.251675 -103.231708 \n",
"3 0.03012 -0.488517 -0.686603 0.265550 -96.280488 \n",
"4 0.04016 -0.494258 -0.725359 0.278469 -92.560976 \n",
"\n",
" gyroscope_y gyroscope_z \n",
"0 32.012195 -26.097561 \n",
"1 34.634146 -24.695122 \n",
"2 36.768293 -22.926829 \n",
"3 38.719512 -21.158537 \n",
"4 41.280488 -20.304878 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.constants import TimeUnit\n",
"from paradigma.util import transform_time_array\n",
"\n",
"df_imu[DataColumns.TIME] = transform_time_array(\n",
" time_array=df_imu[DataColumns.TIME], \n",
" input_unit_type=TimeUnit.DIFFERENCE_MS, \n",
" output_unit_type=TimeUnit.RELATIVE_S,\n",
")\n",
"\n",
"df_imu.head()"
]
},
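{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, converting delta timestamps to relative seconds is a cumulative sum followed by a change of unit. The sketch below reproduces this with plain NumPy for the first five IMU rows; `delta_ms_to_relative_s` is a hypothetical name and is not how `transform_time_array` is necessarily implemented.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def delta_ms_to_relative_s(delta_ms):\n",
"    # Hypothetical helper: accumulate per-sample deltas (ms) and express\n",
"    # the result in seconds relative to the first sample.\n",
"    delta_ms = np.asarray(delta_ms, dtype=float)\n",
"    elapsed_ms = np.cumsum(delta_ms)\n",
"    return (elapsed_ms - elapsed_ms[0]) / 1000.0\n",
"\n",
"# Raw `time` values of the first five IMU rows (delta milliseconds)\n",
"deltas = [0.0, 10.040039, 10.040039, 10.040039, 10.039795]\n",
"relative_s = delta_ms_to_relative_s(deltas)\n",
"print(np.round(relative_s, 5))\n",
"```\n",
"\n",
"Subtracting the first cumulative value guarantees that the result starts at 0, even if the first stored delta is non-zero."
]
},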
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" time green\n",
"0 0.00000 649511\n",
"1 0.00996 648214\n",
"2 0.01992 646786\n",
"3 0.02988 645334\n",
"4 0.03984 644317"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from paradigma.constants import TimeUnit\n",
"from paradigma.util import transform_time_array\n",
"\n",
"df_ppg[DataColumns.TIME] = transform_time_array(\n",
" time_array=df_ppg[DataColumns.TIME], \n",
" input_unit_type=TimeUnit.DIFFERENCE_MS, \n",
" output_unit_type=TimeUnit.RELATIVE_S,\n",
")\n",
"\n",
"df_ppg.head()"
]
},
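{
"cell_type": "markdown",
"metadata": {},
"source": [
"The introduction mentions a sampling frequency of 100 Hz, while the timestamps above are spaced slightly unevenly. If your own data needs to be placed on a uniform grid, one generic option is linear interpolation, sketched below with `numpy.interp`. This is an assumption-laden illustration: `resample_linear` is a hypothetical helper, not ParaDigMa's own resampling routine, and linear interpolation may not be the most suitable method for every signal.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def resample_linear(df, time_col='time', target_hz=100.0):\n",
"    # Hypothetical helper: linearly interpolate every non-time column\n",
"    # onto a uniform time grid with spacing 1 / target_hz.\n",
"    t = df[time_col].to_numpy(dtype=float)\n",
"    n_samples = int(np.floor((t[-1] - t[0]) * target_hz + 1e-9)) + 1\n",
"    t_new = t[0] + np.arange(n_samples) / target_hz\n",
"    data = {time_col: t_new}\n",
"    for col in df.columns:\n",
"        if col != time_col:\n",
"            data[col] = np.interp(t_new, t, df[col].to_numpy(dtype=float))\n",
"    return pd.DataFrame(data)\n",
"\n",
"# First five PPG rows of this tutorial, with time in relative seconds\n",
"df_example = pd.DataFrame({\n",
"    'time': [0.0, 0.00996, 0.01992, 0.02988, 0.03984],\n",
"    'green': [649511, 648214, 646786, 645334, 644317],\n",
"})\n",
"df_100hz = resample_linear(df_example)\n",
"print(df_100hz)\n",
"```\n",
"\n",
"The grid stops at the last multiple of 10 ms that still lies within the recorded time range, so no values are extrapolated."
]
},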
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These dataframes are ready to be processed by ParaDigMa."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "paradigma-Fn6RLG4_-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}