{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data preparation\n", "ParaDigMa requires the sensor data to be of a specific format. This tutorial provides examples of how to prepare your input data for subsequent analysis. In the end, the input for ParaDigMa is a dataframe consisting of:\n", "* A column `time`, representing the seconds relative to the first row of the dataframe;\n", "* One or multiple of the following sensor column categories:\n", " * Accelerometer: `accelerometer_x`, `accelerometer_y` and `accelerometer_z` in _g_\n", " * Gyroscope: `gyroscope_x`, `gyroscope_y` and `gyroscope_z` in _deg/s_\n", " * PPG: `green` \n", "\n", "The final dataframe should be resampled to 100 Hz, have the correct units for the sensor columns, and the correct format for the `time` column. Also note that the _gait_ pipeline expects a specific orientation of sensor axes, as explained in [Coordinate system](../guides/coordinate_system.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data\n", "This example uses data of the [Personalized Parkinson Project](https://pubmed.ncbi.nlm.nih.gov/31315608/), which is stored in Time Series Data Format (TSDF). Inertial Measurements Units (IMU) and photoplethysmography (PPG) data are sampled at a different sampling frequency and therefore stored separately. Note that ParaDigMa works independent of data storage format; it only requires a `pandas` dataframe as input." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>acceleration_x</th><th>acceleration_y</th><th>acceleration_z</th><th>rotation_x</th><th>rotation_y</th><th>rotation_z</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>-5.402541</td><td>5.632536</td><td>-2.684842</td><td>-115.670732</td><td>-32.012195</td><td>26.097561</td></tr>\n",
"<tr><th>1</th><td>10.040039</td><td>-5.257034</td><td>6.115995</td><td>-2.497091</td><td>-110.609757</td><td>-34.634146</td><td>24.695122</td></tr>\n",
"<tr><th>2</th><td>10.040039</td><td>-4.947244</td><td>6.392928</td><td>-2.468928</td><td>-103.231708</td><td>-36.768293</td><td>22.926829</td></tr>\n",
"<tr><th>3</th><td>10.040039</td><td>-4.792349</td><td>6.735574</td><td>-2.605048</td><td>-96.280488</td><td>-38.719512</td><td>21.158537</td></tr>\n",
"<tr><th>4</th><td>10.039795</td><td>-4.848675</td><td>7.115770</td><td>-2.731780</td><td>-92.560976</td><td>-41.280488</td><td>20.304878</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time acceleration_x acceleration_y acceleration_z rotation_x \\\n", "0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 \n", "1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 \n", "2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 \n", "3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 \n", "4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 \n", "\n", " rotation_y rotation_z \n", "0 -32.012195 26.097561 \n", "1 -34.634146 24.695122 \n", "2 -36.768293 22.926829 \n", "3 -38.719512 21.158537 \n", "4 -41.280488 20.304878 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pathlib import Path\n", "from paradigma.util import load_tsdf_dataframe\n", "\n", "path_to_raw_data = Path('../../tests/data/data_preparation_tutorial')\n", "path_to_imu_data = path_to_raw_data / 'imu'\n", "\n", "df_imu, imu_time, imu_values = load_tsdf_dataframe(\n", " path_to_data=path_to_imu_data, \n", " prefix='IMU'\n", ")\n", "\n", "df_imu.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>green</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>649511</td></tr>\n",
"<tr><th>1</th><td>9.959961</td><td>648214</td></tr>\n",
"<tr><th>2</th><td>9.959961</td><td>646786</td></tr>\n",
"<tr><th>3</th><td>9.959961</td><td>645334</td></tr>\n",
"<tr><th>4</th><td>9.960205</td><td>644317</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time green\n", "0 0.000000 649511\n", "1 9.959961 648214\n", "2 9.959961 646786\n", "3 9.959961 645334\n", "4 9.960205 644317" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "from paradigma.util import load_tsdf_dataframe\n", "\n", "path_to_ppg_data = os.path.join(path_to_raw_data, 'ppg')\n", "\n", "df_ppg, ppg_time, ppg_values = load_tsdf_dataframe(\n", " path_to_data=path_to_ppg_data, \n", " prefix='PPG'\n", ")\n", "\n", "df_ppg.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The timestamps in this dataset correspond to delta milliseconds, and the data is not uniformly distributed as can be observed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare dataframe\n", "\n", "#### Change column names\n", "To safeguard robustness of the pipeline, ParaDigMa fixes column names to a predefined standard." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>accelerometer_x</th><th>accelerometer_y</th><th>accelerometer_z</th><th>gyroscope_x</th><th>gyroscope_y</th><th>gyroscope_z</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>-5.402541</td><td>5.632536</td><td>-2.684842</td><td>-115.670732</td><td>-32.012195</td><td>26.097561</td></tr>\n",
"<tr><th>1</th><td>10.040039</td><td>-5.257034</td><td>6.115995</td><td>-2.497091</td><td>-110.609757</td><td>-34.634146</td><td>24.695122</td></tr>\n",
"<tr><th>2</th><td>10.040039</td><td>-4.947244</td><td>6.392928</td><td>-2.468928</td><td>-103.231708</td><td>-36.768293</td><td>22.926829</td></tr>\n",
"<tr><th>3</th><td>10.040039</td><td>-4.792349</td><td>6.735574</td><td>-2.605048</td><td>-96.280488</td><td>-38.719512</td><td>21.158537</td></tr>\n",
"<tr><th>4</th><td>10.039795</td><td>-4.848675</td><td>7.115770</td><td>-2.731780</td><td>-92.560976</td><td>-41.280488</td><td>20.304878</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n", "0 0.000000 -5.402541 5.632536 -2.684842 -115.670732 \n", "1 10.040039 -5.257034 6.115995 -2.497091 -110.609757 \n", "2 10.040039 -4.947244 6.392928 -2.468928 -103.231708 \n", "3 10.040039 -4.792349 6.735574 -2.605048 -96.280488 \n", "4 10.039795 -4.848675 7.115770 -2.731780 -92.560976 \n", "\n", " gyroscope_y gyroscope_z \n", "0 -32.012195 26.097561 \n", "1 -34.634146 24.695122 \n", "2 -36.768293 22.926829 \n", "3 -38.719512 21.158537 \n", "4 -41.280488 20.304878 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.constants import DataColumns\n", "\n", "accelerometer_columns = [DataColumns.ACCELEROMETER_X, DataColumns.ACCELEROMETER_Y, DataColumns.ACCELEROMETER_Z]\n", "gyroscope_columns = [DataColumns.GYROSCOPE_X, DataColumns.GYROSCOPE_Y, DataColumns.GYROSCOPE_Z]\n", "\n", "# Rename dataframe columns\n", "df_imu = df_imu.rename(columns={\n", " 'time': DataColumns.TIME,\n", " 'acceleration_x': DataColumns.ACCELEROMETER_X,\n", " 'acceleration_y': DataColumns.ACCELEROMETER_Y,\n", " 'acceleration_z': DataColumns.ACCELEROMETER_Z,\n", " 'rotation_x': DataColumns.GYROSCOPE_X,\n", " 'rotation_y': DataColumns.GYROSCOPE_Y,\n", " 'rotation_z': DataColumns.GYROSCOPE_Z,\n", "})\n", "\n", "# Set columns to a fixed order\n", "df_imu = df_imu[[DataColumns.TIME] + accelerometer_columns + gyroscope_columns]\n", "\n", "df_imu.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>green</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>649511</td></tr>\n",
"<tr><th>1</th><td>9.959961</td><td>648214</td></tr>\n",
"<tr><th>2</th><td>9.959961</td><td>646786</td></tr>\n",
"<tr><th>3</th><td>9.959961</td><td>645334</td></tr>\n",
"<tr><th>4</th><td>9.960205</td><td>644317</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time green\n", "0 0.000000 649511\n", "1 9.959961 648214\n", "2 9.959961 646786\n", "3 9.959961 645334\n", "4 9.960205 644317" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.constants import DataColumns\n", "\n", "ppg_columns = [DataColumns.PPG]\n", "\n", "# Rename dataframe columns\n", "df_ppg = df_ppg.rename(columns={\n", " 'time': DataColumns.TIME,\n", " 'ppg': DataColumns.PPG,\n", "})\n", "\n", "# Set columns to a fixed order\n", "df_ppg = df_ppg[[DataColumns.TIME] + ppg_columns]\n", "\n", "df_ppg.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Change units\n", "ParaDigMa expects acceleration to be measured in g, and rotation in deg/s. Units can be converted conveniently using ParaDigMa functionalities." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>accelerometer_x</th><th>accelerometer_y</th><th>accelerometer_z</th><th>gyroscope_x</th><th>gyroscope_y</th><th>gyroscope_z</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>-0.550718</td><td>0.574163</td><td>-0.273684</td><td>-115.670732</td><td>-32.012195</td><td>26.097561</td></tr>\n",
"<tr><th>1</th><td>10.040039</td><td>-0.535885</td><td>0.623445</td><td>-0.254545</td><td>-110.609757</td><td>-34.634146</td><td>24.695122</td></tr>\n",
"<tr><th>2</th><td>10.040039</td><td>-0.504306</td><td>0.651675</td><td>-0.251675</td><td>-103.231708</td><td>-36.768293</td><td>22.926829</td></tr>\n",
"<tr><th>3</th><td>10.040039</td><td>-0.488517</td><td>0.686603</td><td>-0.265550</td><td>-96.280488</td><td>-38.719512</td><td>21.158537</td></tr>\n",
"<tr><th>4</th><td>10.039795</td><td>-0.494258</td><td>0.725359</td><td>-0.278469</td><td>-92.560976</td><td>-41.280488</td><td>20.304878</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n", "0 0.000000 -0.550718 0.574163 -0.273684 -115.670732 \n", "1 10.040039 -0.535885 0.623445 -0.254545 -110.609757 \n", "2 10.040039 -0.504306 0.651675 -0.251675 -103.231708 \n", "3 10.040039 -0.488517 0.686603 -0.265550 -96.280488 \n", "4 10.039795 -0.494258 0.725359 -0.278469 -92.560976 \n", "\n", " gyroscope_y gyroscope_z \n", "0 -32.012195 26.097561 \n", "1 -34.634146 24.695122 \n", "2 -36.768293 22.926829 \n", "3 -38.719512 21.158537 \n", "4 -41.280488 20.304878 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.util import convert_units_accelerometer, convert_units_gyroscope\n", "\n", "accelerometer_units = 'm/s^2'\n", "gyroscope_units = 'deg/s'\n", "\n", "accelerometer_data = df_imu[accelerometer_columns].values\n", "gyroscope_data = df_imu[gyroscope_columns].values\n", "\n", "# Convert units to expected format\n", "df_imu[accelerometer_columns] = convert_units_accelerometer(accelerometer_data, accelerometer_units)\n", "df_imu[gyroscope_columns] = convert_units_gyroscope(gyroscope_data, gyroscope_units)\n", "\n", "df_imu.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Account for watch side\n", "For the Gait & Arm Swing pipeline, it is essential to ensure correct sensor axes orientation. For more information please read [Coordinate System](../guides/coordinate_system.md) and set the axes of the data accordingly." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>accelerometer_x</th><th>accelerometer_y</th><th>accelerometer_z</th><th>gyroscope_x</th><th>gyroscope_y</th><th>gyroscope_z</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.000000</td><td>-0.550718</td><td>-0.574163</td><td>0.273684</td><td>-115.670732</td><td>32.012195</td><td>-26.097561</td></tr>\n",
"<tr><th>1</th><td>10.040039</td><td>-0.535885</td><td>-0.623445</td><td>0.254545</td><td>-110.609757</td><td>34.634146</td><td>-24.695122</td></tr>\n",
"<tr><th>2</th><td>10.040039</td><td>-0.504306</td><td>-0.651675</td><td>0.251675</td><td>-103.231708</td><td>36.768293</td><td>-22.926829</td></tr>\n",
"<tr><th>3</th><td>10.040039</td><td>-0.488517</td><td>-0.686603</td><td>0.265550</td><td>-96.280488</td><td>38.719512</td><td>-21.158537</td></tr>\n",
"<tr><th>4</th><td>10.039795</td><td>-0.494258</td><td>-0.725359</td><td>0.278469</td><td>-92.560976</td><td>41.280488</td><td>-20.304878</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n", "0 0.000000 -0.550718 -0.574163 0.273684 -115.670732 \n", "1 10.040039 -0.535885 -0.623445 0.254545 -110.609757 \n", "2 10.040039 -0.504306 -0.651675 0.251675 -103.231708 \n", "3 10.040039 -0.488517 -0.686603 0.265550 -96.280488 \n", "4 10.039795 -0.494258 -0.725359 0.278469 -92.560976 \n", "\n", " gyroscope_y gyroscope_z \n", "0 32.012195 -26.097561 \n", "1 34.634146 -24.695122 \n", "2 36.768293 -22.926829 \n", "3 38.719512 -21.158537 \n", "4 41.280488 -20.304878 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Change the orientation of the sensor according to the documented coordinate system\n", "df_imu[DataColumns.ACCELEROMETER_Y] *= -1\n", "df_imu[DataColumns.ACCELEROMETER_Z] *= -1\n", "df_imu[DataColumns.GYROSCOPE_Y] *= -1\n", "df_imu[DataColumns.GYROSCOPE_Z] *= -1\n", "\n", "df_imu.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Change time column\n", "ParaDigMa expects the data to be in seconds relative to the first row, which should be equal to 0. The toolbox has the built-in function `transform_time_array` to help users transform their time column to the correct format if the timestamps have been sampled in delta time between timestamps. In the near future, the functionalities for transforming other types (e.g., datetime format) shall be provided." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>accelerometer_x</th><th>accelerometer_y</th><th>accelerometer_z</th><th>gyroscope_x</th><th>gyroscope_y</th><th>gyroscope_z</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.00000</td><td>-0.550718</td><td>-0.574163</td><td>0.273684</td><td>-115.670732</td><td>32.012195</td><td>-26.097561</td></tr>\n",
"<tr><th>1</th><td>0.01004</td><td>-0.535885</td><td>-0.623445</td><td>0.254545</td><td>-110.609757</td><td>34.634146</td><td>-24.695122</td></tr>\n",
"<tr><th>2</th><td>0.02008</td><td>-0.504306</td><td>-0.651675</td><td>0.251675</td><td>-103.231708</td><td>36.768293</td><td>-22.926829</td></tr>\n",
"<tr><th>3</th><td>0.03012</td><td>-0.488517</td><td>-0.686603</td><td>0.265550</td><td>-96.280488</td><td>38.719512</td><td>-21.158537</td></tr>\n",
"<tr><th>4</th><td>0.04016</td><td>-0.494258</td><td>-0.725359</td><td>0.278469</td><td>-92.560976</td><td>41.280488</td><td>-20.304878</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n", "0 0.00000 -0.550718 -0.574163 0.273684 -115.670732 \n", "1 0.01004 -0.535885 -0.623445 0.254545 -110.609757 \n", "2 0.02008 -0.504306 -0.651675 0.251675 -103.231708 \n", "3 0.03012 -0.488517 -0.686603 0.265550 -96.280488 \n", "4 0.04016 -0.494258 -0.725359 0.278469 -92.560976 \n", "\n", " gyroscope_y gyroscope_z \n", "0 32.012195 -26.097561 \n", "1 34.634146 -24.695122 \n", "2 36.768293 -22.926829 \n", "3 38.719512 -21.158537 \n", "4 41.280488 -20.304878 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.constants import TimeUnit\n", "from paradigma.util import transform_time_array\n", "\n", "df_imu[DataColumns.TIME] = transform_time_array(\n", " time_array=df_imu[DataColumns.TIME], \n", " input_unit_type=TimeUnit.DIFFERENCE_MS, \n", " output_unit_type=TimeUnit.RELATIVE_S,\n", ")\n", "\n", "df_imu.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>green</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.00000</td><td>649511</td></tr>\n",
"<tr><th>1</th><td>0.00996</td><td>648214</td></tr>\n",
"<tr><th>2</th><td>0.01992</td><td>646786</td></tr>\n",
"<tr><th>3</th><td>0.02988</td><td>645334</td></tr>\n",
"<tr><th>4</th><td>0.03984</td><td>644317</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " time green\n", "0 0.00000 649511\n", "1 0.00996 648214\n", "2 0.01992 646786\n", "3 0.02988 645334\n", "4 0.03984 644317" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.constants import TimeUnit\n", "from paradigma.util import transform_time_array\n", "\n", "df_ppg[DataColumns.TIME] = transform_time_array(\n", " time_array=df_ppg[DataColumns.TIME], \n", " input_unit_type=TimeUnit.DIFFERENCE_MS, \n", " output_unit_type=TimeUnit.RELATIVE_S,\n", ")\n", "\n", "df_ppg.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These dataframes are ready to be processed by ParaDigMa." ] } ], "metadata": { "kernelspec": { "display_name": "paradigma-Fn6RLG4_-py3.11", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 2 }