{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gait analysis\n", "This tutorial showcases the high-level functions composing the gait pipeline. Before following along, make sure all data preparation steps have been followed in the data preparation tutorial. \n", "\n", "To run the complete gait pipeline, a prerequisite is to have both accelerometer and gyroscope data, although a small part of the pipeline requires only accelerometer data. Roughly, the pipeline can be split into seven segments:\n", "1. Data preprocessing\n", "2. Gait feature extraction\n", "3. Gait detection\n", "4. Arm activity feature extraction\n", "5. Filtering gait\n", "6. Arm swing quantification\n", "7. Aggregation\n", "\n", "Using only accelerometer data, the first three steps can be completed. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[!WARNING] The gait pipeline has been developed on data of the Gait Up Physilog 4, and is currently being validated on the Verily Study Watch. Different sensors and positions on the wrist may affect outcomes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Throughout the tutorial, a small segment of data from a participant of the Personalized Parkinson Project is used to demonstrate the functionalities." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data\n", "Load the prepared data into memory. For example, the following functions can be used depending on the file extension of the data:\n", "- _.csv_: `pandas.read_csv()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html))\n", "- _.json_: `json.load()` ([documentation](https://docs.python.org/3/library/json.html#json.load))\n", "\n", "We use the interally developed `TSDF` ([documentation](https://biomarkersparkinson.github.io/tsdf/)) to load and store data [[1](https://arxiv.org/abs/2211.11294)]. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timeaccelerometer_xaccelerometer_yaccelerometer_zgyroscope_xgyroscope_ygyroscope_z
00.000000.5507180.574163-0.273684-115.67073232.012195-26.097561
10.010040.5358850.623445-0.254545-110.60975734.634146-24.695122
20.020080.5043060.651675-0.251675-103.23170836.768293-22.926829
30.030120.4885170.686603-0.265550-96.28048838.719512-21.158537
40.040160.4942580.725359-0.278469-92.56097641.280488-20.304878
........................
72942730.744680.234928-0.516268-0.8028710.975610-2.2560982.256098
72943730.754720.245455-0.514354-0.8066990.304878-1.7073171.768293
72944730.764760.243541-0.511005-0.8071770.304878-1.5853661.890244
72945730.774800.240191-0.514354-0.8081340.000000-1.2804881.585366
72946730.784840.243541-0.511005-0.808134-0.060976-1.0365851.219512
\n", "

72947 rows × 7 columns

\n", "
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z \\\n", "0 0.00000 0.550718 0.574163 -0.273684 \n", "1 0.01004 0.535885 0.623445 -0.254545 \n", "2 0.02008 0.504306 0.651675 -0.251675 \n", "3 0.03012 0.488517 0.686603 -0.265550 \n", "4 0.04016 0.494258 0.725359 -0.278469 \n", "... ... ... ... ... \n", "72942 730.74468 0.234928 -0.516268 -0.802871 \n", "72943 730.75472 0.245455 -0.514354 -0.806699 \n", "72944 730.76476 0.243541 -0.511005 -0.807177 \n", "72945 730.77480 0.240191 -0.514354 -0.808134 \n", "72946 730.78484 0.243541 -0.511005 -0.808134 \n", "\n", " gyroscope_x gyroscope_y gyroscope_z \n", "0 -115.670732 32.012195 -26.097561 \n", "1 -110.609757 34.634146 -24.695122 \n", "2 -103.231708 36.768293 -22.926829 \n", "3 -96.280488 38.719512 -21.158537 \n", "4 -92.560976 41.280488 -20.304878 \n", "... ... ... ... \n", "72942 0.975610 -2.256098 2.256098 \n", "72943 0.304878 -1.707317 1.768293 \n", "72944 0.304878 -1.585366 1.890244 \n", "72945 0.000000 -1.280488 1.585366 \n", "72946 -0.060976 -1.036585 1.219512 \n", "\n", "[72947 rows x 7 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pathlib import Path\n", "from paradigma.util import load_tsdf_dataframe\n", "\n", "# Set the path to the data file location\n", "path_to_data = Path('../../tests/data')\n", "path_to_prepared_data = path_to_data / '1.prepared_data' / 'imu'\n", "\n", "# Load the data from the file\n", "df_imu, _, _ = load_tsdf_dataframe(path_to_prepared_data, prefix='IMU')\n", "\n", "df_imu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Preprocess data\n", "The single function `preprocess_imu_data` in the cell below runs all necessary preprocessing steps. It requires the loaded dataframe, a configuration object `config` specifying parameters used for preprocessing, and a selection of sensors. 
For the sensors, options include `'accelerometer'`, `'gyroscope'`, or `'both'`.\n", "\n", "The function `preprocess_imu_data` processes the data as follows:\n", "1. Resample the data to ensure a uniform sampling rate\n", "2. Apply filtering to separate the gravity component from the accelerometer signal" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The dataset of 730.79 seconds is automatically resampled to 100 Hz.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timeaccelerometer_xaccelerometer_yaccelerometer_zgyroscope_xgyroscope_ygyroscope_zaccelerometer_x_gravaccelerometer_y_gravaccelerometer_z_grav
00.000.0530780.010040-0.273154-115.67073232.012195-26.0975610.4976390.564123-0.000530
10.010.0383370.058802-0.256899-110.63630134.624710-24.7015370.4976660.5645100.002305
20.020.0068240.086559-0.256739-103.29276636.753000-22.9420020.4976980.5648870.005122
30.03-0.0091560.120855-0.273280-96.34906238.692931-21.1752270.4977330.5652540.007919
40.04-0.0037700.159316-0.289007-92.58573541.237328-20.3115310.4977720.5656100.010696
\n", "
" ], "text/plain": [ " time accelerometer_x accelerometer_y accelerometer_z gyroscope_x \\\n", "0 0.00 0.053078 0.010040 -0.273154 -115.670732 \n", "1 0.01 0.038337 0.058802 -0.256899 -110.636301 \n", "2 0.02 0.006824 0.086559 -0.256739 -103.292766 \n", "3 0.03 -0.009156 0.120855 -0.273280 -96.349062 \n", "4 0.04 -0.003770 0.159316 -0.289007 -92.585735 \n", "\n", " gyroscope_y gyroscope_z accelerometer_x_grav accelerometer_y_grav \\\n", "0 32.012195 -26.097561 0.497639 0.564123 \n", "1 34.624710 -24.701537 0.497666 0.564510 \n", "2 36.753000 -22.942002 0.497698 0.564887 \n", "3 38.692931 -21.175227 0.497733 0.565254 \n", "4 41.237328 -20.311531 0.497772 0.565610 \n", "\n", " accelerometer_z_grav \n", "0 -0.000530 \n", "1 0.002305 \n", "2 0.005122 \n", "3 0.007919 \n", "4 0.010696 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.config import IMUConfig\n", "from paradigma.preprocessing import preprocess_imu_data\n", "\n", "config = IMUConfig()\n", "\n", "df_preprocessed = preprocess_imu_data(\n", " df=df_imu, \n", " config=config,\n", " sensor='both',\n", " watch_side='left',\n", ")\n", "\n", "print(f\"The dataset of {df_preprocessed.shape[0] / config.sampling_frequency} seconds is automatically resampled to {config.sampling_frequency} Hz.\")\n", "df_preprocessed.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting dataframe shown above contains uniformly distributed timestamps with corresponding accelerometer and gyroscope values. Note the for accelerometer values, the following notation is used: \n", "- `accelerometer_x`: the accelerometer signal after filtering out the gravitational component\n", "- `accelerometer_x_grav`: the gravitational component of the accelerometer signal\n", "\n", "The accelerometer data is retained and used to compute gravity-related features for the classification tasks, because the gravity is informative of the position of the arm." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Extract gait features\n", "With the data uniformly resampled and the gravitional component separated from the accelerometer signal, features can be extracted from the time series data. This step does not require gyroscope data. To extract the features, the pipeline executes the following steps:\n", "- Use overlapping windows to group timestamps\n", "- Extract temporal features\n", "- Use Fast Fourier Transform the transform the windowed data into the spectral domain\n", "- Extract spectral features\n", "- Combine both temporal and spectral features into a final dataframe\n", "\n", "These steps are encapsulated in `extract_gait_features` (documentation can be found [here](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/gait_pipeline.py))." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 34 features have been extracted from 725 6-second windows with 5 seconds overlap.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timeaccelerometer_x_grav_meanaccelerometer_y_grav_meanaccelerometer_z_grav_meanaccelerometer_x_grav_stdaccelerometer_y_grav_stdaccelerometer_z_grav_stdaccelerometer_std_normaccelerometer_x_power_below_gaitaccelerometer_y_power_below_gait...accelerometer_mfcc_3accelerometer_mfcc_4accelerometer_mfcc_5accelerometer_mfcc_6accelerometer_mfcc_7accelerometer_mfcc_8accelerometer_mfcc_9accelerometer_mfcc_10accelerometer_mfcc_11accelerometer_mfcc_12
00.00.5275570.485994-0.3140040.0453760.0610730.3125330.1873360.0024370.038174...0.6222300.7006910.1508850.4586760.0335950.2432430.1787620.0660510.0905670.128075
11.00.5323920.466982-0.4268830.0458340.0527770.2614920.1962700.0014980.012917...0.2880840.5813730.0867370.447655-0.1299080.2924780.0286560.1235520.0491430.081229
22.00.5451890.433756-0.5397770.0579790.0800840.1451260.2004610.0018400.001876...0.2775900.4817860.0694450.331342-0.1976690.244441-0.1517600.091923-0.1008240.003800
33.00.5565860.397208-0.6136910.0707650.1225090.0546810.1049500.0032530.001985...0.5213230.3219010.2372570.078296-0.0742400.283644-0.2396990.028845-0.0507430.036805
44.00.5718520.359068-0.6391960.0797650.1448450.0429240.0955470.0027940.002084...0.5382690.1112830.293136-0.069686-0.0594060.356973-0.2669530.0500410.0589630.082503
\n", "

5 rows × 35 columns

\n", "
" ], "text/plain": [ " time accelerometer_x_grav_mean accelerometer_y_grav_mean \\\n", "0 0.0 0.527557 0.485994 \n", "1 1.0 0.532392 0.466982 \n", "2 2.0 0.545189 0.433756 \n", "3 3.0 0.556586 0.397208 \n", "4 4.0 0.571852 0.359068 \n", "\n", " accelerometer_z_grav_mean accelerometer_x_grav_std \\\n", "0 -0.314004 0.045376 \n", "1 -0.426883 0.045834 \n", "2 -0.539777 0.057979 \n", "3 -0.613691 0.070765 \n", "4 -0.639196 0.079765 \n", "\n", " accelerometer_y_grav_std accelerometer_z_grav_std accelerometer_std_norm \\\n", "0 0.061073 0.312533 0.187336 \n", "1 0.052777 0.261492 0.196270 \n", "2 0.080084 0.145126 0.200461 \n", "3 0.122509 0.054681 0.104950 \n", "4 0.144845 0.042924 0.095547 \n", "\n", " accelerometer_x_power_below_gait accelerometer_y_power_below_gait ... \\\n", "0 0.002437 0.038174 ... \n", "1 0.001498 0.012917 ... \n", "2 0.001840 0.001876 ... \n", "3 0.003253 0.001985 ... \n", "4 0.002794 0.002084 ... \n", "\n", " accelerometer_mfcc_3 accelerometer_mfcc_4 accelerometer_mfcc_5 \\\n", "0 0.622230 0.700691 0.150885 \n", "1 0.288084 0.581373 0.086737 \n", "2 0.277590 0.481786 0.069445 \n", "3 0.521323 0.321901 0.237257 \n", "4 0.538269 0.111283 0.293136 \n", "\n", " accelerometer_mfcc_6 accelerometer_mfcc_7 accelerometer_mfcc_8 \\\n", "0 0.458676 0.033595 0.243243 \n", "1 0.447655 -0.129908 0.292478 \n", "2 0.331342 -0.197669 0.244441 \n", "3 0.078296 -0.074240 0.283644 \n", "4 -0.069686 -0.059406 0.356973 \n", "\n", " accelerometer_mfcc_9 accelerometer_mfcc_10 accelerometer_mfcc_11 \\\n", "0 0.178762 0.066051 0.090567 \n", "1 0.028656 0.123552 0.049143 \n", "2 -0.151760 0.091923 -0.100824 \n", "3 -0.239699 0.028845 -0.050743 \n", "4 -0.266953 0.050041 0.058963 \n", "\n", " accelerometer_mfcc_12 \n", "0 0.128075 \n", "1 0.081229 \n", "2 0.003800 \n", "3 0.036805 \n", "4 0.082503 \n", "\n", "[5 rows x 35 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.config import GaitConfig\n", "from 
paradigma.pipelines.gait_pipeline import extract_gait_features\n", "\n", "config = GaitConfig(step='gait')\n", "\n", "df_gait = extract_gait_features(\n", "    df=df_preprocessed, \n", "    config=config\n", ")\n", "\n", "print(f\"A total of {df_gait.shape[1]-1} features have been extracted from {df_gait.shape[0]} {config.window_length_s}-second windows with {config.window_length_s-config.window_step_length_s} seconds overlap.\")\n", "df_gait.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each row in this dataframe corresponds to a single window, with the window length and overlap set in the `config` object. Note that the `time` column has a 1-second interval instead of the 10-millisecond interval before, as it now represents the starting time of the window." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Gait detection\n", "For classification, ParaDigMa uses so-called Classifier Packages which contain a classifier, classification threshold, and a feature scaler as attributes. The classifier is a [random forest](https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.RandomForestClassifier.html) trained on a dataset of people with PD performing a wide range of activities in free-living conditions: [The Parkinson@Home Validation Study](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/). The classification threshold was set to limit the number of false-positive predictions in the original study, i.e., to avoid non-gait being predicted as gait. The classification threshold can be changed by setting `clf_package.threshold` to a different float value. The feature scaler was similarly fitted on the original dataset, ensuring the features fall within the expected range for reliable predictions."
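Conceptually, such a package bundles a fitted scaler, a classifier, and a threshold. The stand-in below (hypothetical names, not the actual `ClassifierPackage` API) sketches how the three parts interact; the real package wraps a fitted scaler and a random forest rather than the toy logistic score used here.

```python
from dataclasses import dataclass
from math import exp
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in illustrating how a Classifier Package's parts interact;
# the real ParaDigMa ClassifierPackage has a different implementation.
@dataclass
class ToyClassifierPackage:
    scale: Callable[[Sequence[float]], List[float]]
    predict_proba: Callable[[List[float]], float]
    threshold: float

    def classify(self, features: Sequence[float]) -> Tuple[float, bool]:
        # Scale first, then score; the threshold turns a probability into a label.
        proba = self.predict_proba(self.scale(features))
        return proba, proba >= self.threshold

pkg = ToyClassifierPackage(
    scale=lambda x: [(v - 0.5) / 0.2 for v in x],        # 'fitted' mean and std
    predict_proba=lambda x: 1.0 / (1.0 + exp(-sum(x))),  # toy logistic score
    threshold=0.6,                                       # tuned to limit false positives
)

proba, is_gait = pkg.classify([0.7, 0.9])
assert is_gait and proba > 0.9
```

Raising the threshold makes the label stricter (fewer false positives, more false negatives), which mirrors the effect of changing `clf_package.threshold`.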
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Out of 725 windows, 53 (7.3%) were predicted as gait, and 672 (92.7%) as non-gait.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timepred_gait_proba
00.00.093240
11.00.093032
22.00.107607
33.00.132656
44.00.142432
\n", "
" ], "text/plain": [ " time pred_gait_proba\n", "0 0.0 0.093240\n", "1 1.0 0.093032\n", "2 2.0 0.107607\n", "3 3.0 0.132656\n", "4 4.0 0.142432" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from importlib.resources import files\n", "from paradigma.classification import ClassifierPackage\n", "from paradigma.pipelines.gait_pipeline import detect_gait\n", "\n", "# Set the path to the classifier package\n", "classifier_package_filename = 'gait_detection_clf_package.pkl'\n", "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n", "\n", "# Load the classifier package\n", "clf_package = ClassifierPackage.load(full_path_to_classifier_package)\n", "\n", "# Detecting gait returns the probability of gait for each window, which is concatenated to\n", "# the original dataframe\n", "df_gait['pred_gait_proba'] = detect_gait(\n", " df=df_gait,\n", " clf_package=clf_package\n", ")\n", "\n", "n_windows = df_gait.shape[0]\n", "n_predictions_gait = df_gait.loc[df_gait['pred_gait_proba'] >= clf_package.threshold].shape[0]\n", "perc_predictions_gait = round(100 * n_predictions_gait / n_windows, 1)\n", "n_predictions_non_gait = df_gait.loc[df_gait['pred_gait_proba'] < clf_package.threshold].shape[0]\n", "perc_predictions_non_gait = round(100 * n_predictions_non_gait / n_windows, 1)\n", "\n", "print(f\"Out of {n_windows} windows, {n_predictions_gait} ({perc_predictions_gait}%) were predicted as gait, and {n_predictions_non_gait} ({perc_predictions_non_gait}%) as non-gait.\")\n", "\n", "# Only the time and the predicted gait probability are shown, but the dataframe also contains\n", "# the extracted features\n", "df_gait[['time', 'pred_gait_proba']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, the `time` column indicates the start time of the window. 
Therefore, note that probabilities are predicted for overlapping windows, not for individual timestamps. The function [`merge_timestamps_with_predictions`](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/util.py) can be used to retrieve predicted probabilities per timestamp by aggregating the predicted probabilities of overlapping windows. This function is included in the next step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Arm activity feature extraction\n", "The extraction of arm swing features is similar to the extraction of gait features, but we use a different window length and step length (`config.window_length_s`, `config.window_step_length_s`) to distinguish between gait segments with and without other arm activities. Therefore, the following steps are conducted sequentially by `extract_arm_activity_features`:\n", "- Start with the preprocessed data of step 1\n", "- Merge the gait predictions into the preprocessed data\n", "- Discard predicted non-gait activities\n", "- Create windows of the time series data and extract features" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 61 features have been extracted from 50 3-second windows with 2.25 seconds overlap.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timeaccelerometer_x_grav_meanaccelerometer_y_grav_meanaccelerometer_z_grav_meanaccelerometer_x_grav_stdaccelerometer_y_grav_stdaccelerometer_z_grav_stdaccelerometer_std_normaccelerometer_x_power_below_gaitaccelerometer_y_power_below_gait...gyroscope_mfcc_3gyroscope_mfcc_4gyroscope_mfcc_5gyroscope_mfcc_6gyroscope_mfcc_7gyroscope_mfcc_8gyroscope_mfcc_9gyroscope_mfcc_10gyroscope_mfcc_11gyroscope_mfcc_12
0484.000.9771340.1054310.0455780.0157220.0268170.0155460.0854540.0002840.000632...0.0428790.0542170.2758170.463253-0.0613820.1445630.243984-0.174514-0.2543180.118443
1484.750.9819500.1021690.0414510.0063730.0192880.0180580.0841790.0004450.000663...-0.139456-0.1954510.3577850.270232-0.250594-0.216808-0.0858420.1970420.0047480.044400
2485.500.9785260.1151410.0304990.0064400.0224740.0150480.0890550.0003280.000408...-0.068866-0.1800710.4746480.314642-0.180227-0.348940-0.1572150.1663440.019357-0.024083
3486.250.9746920.1282190.0250090.0031390.0140350.0079130.0922040.0005940.000150...-0.120759-0.0795690.4821820.455254-0.317632-0.206404-0.0651040.0224600.0652460.070507
4487.000.9727150.1318130.0292180.0008810.0084920.0131290.0786730.0007170.000138...-0.277626-0.0926050.1032490.414886-0.316136-0.200999-0.0906320.0775410.1117590.061460
\n", "

5 rows × 62 columns

\n", "
" ], "text/plain": [ " time accelerometer_x_grav_mean accelerometer_y_grav_mean \\\n", "0 484.00 0.977134 0.105431 \n", "1 484.75 0.981950 0.102169 \n", "2 485.50 0.978526 0.115141 \n", "3 486.25 0.974692 0.128219 \n", "4 487.00 0.972715 0.131813 \n", "\n", " accelerometer_z_grav_mean accelerometer_x_grav_std \\\n", "0 0.045578 0.015722 \n", "1 0.041451 0.006373 \n", "2 0.030499 0.006440 \n", "3 0.025009 0.003139 \n", "4 0.029218 0.000881 \n", "\n", " accelerometer_y_grav_std accelerometer_z_grav_std accelerometer_std_norm \\\n", "0 0.026817 0.015546 0.085454 \n", "1 0.019288 0.018058 0.084179 \n", "2 0.022474 0.015048 0.089055 \n", "3 0.014035 0.007913 0.092204 \n", "4 0.008492 0.013129 0.078673 \n", "\n", " accelerometer_x_power_below_gait accelerometer_y_power_below_gait ... \\\n", "0 0.000284 0.000632 ... \n", "1 0.000445 0.000663 ... \n", "2 0.000328 0.000408 ... \n", "3 0.000594 0.000150 ... \n", "4 0.000717 0.000138 ... \n", "\n", " gyroscope_mfcc_3 gyroscope_mfcc_4 gyroscope_mfcc_5 gyroscope_mfcc_6 \\\n", "0 0.042879 0.054217 0.275817 0.463253 \n", "1 -0.139456 -0.195451 0.357785 0.270232 \n", "2 -0.068866 -0.180071 0.474648 0.314642 \n", "3 -0.120759 -0.079569 0.482182 0.455254 \n", "4 -0.277626 -0.092605 0.103249 0.414886 \n", "\n", " gyroscope_mfcc_7 gyroscope_mfcc_8 gyroscope_mfcc_9 gyroscope_mfcc_10 \\\n", "0 -0.061382 0.144563 0.243984 -0.174514 \n", "1 -0.250594 -0.216808 -0.085842 0.197042 \n", "2 -0.180227 -0.348940 -0.157215 0.166344 \n", "3 -0.317632 -0.206404 -0.065104 0.022460 \n", "4 -0.316136 -0.200999 -0.090632 0.077541 \n", "\n", " gyroscope_mfcc_11 gyroscope_mfcc_12 \n", "0 -0.254318 0.118443 \n", "1 0.004748 0.044400 \n", "2 0.019357 -0.024083 \n", "3 0.065246 0.070507 \n", "4 0.111759 0.061460 \n", "\n", "[5 rows x 62 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.pipelines.gait_pipeline import extract_arm_activity_features\n", "\n", "config = 
GaitConfig(step='arm_activity')\n", "\n", "df_arm_activity = extract_arm_activity_features(\n", "    df_timestamps=df_preprocessed, \n", "    df_predictions=df_gait,\n", "    config=config,\n", "    threshold=clf_package.threshold\n", ")\n", "\n", "print(f\"A total of {df_arm_activity.shape[1]-1} features have been extracted from {df_arm_activity.shape[0]} {config.window_length_s}-second windows with {config.window_length_s-config.window_step_length_s} seconds overlap.\")\n", "df_arm_activity.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The features extracted are similar to the features extracted for gait detection, but the gyroscope has been added to extract additional MFCCs from this sensor. The gyroscope (measuring angular velocity) is relevant for distinguishing between arm activities. Also note that the `time` column no longer starts at 0, since the first timestamps were predicted as non-gait and therefore discarded." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Filtering gait\n", "This classification task is similar to gait detection, although it uses a different classification object. The trained classifier is a logistic regression, similarly trained on the dataset of the [Parkinson@Home Validation Study](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/). Filtering gait is the process of detecting and removing gait segments containing other arm activities. This is an important process since individuals engage in a wide array of arm activities during gait: having hands in pockets, holding a dog leash, or carrying a plate to the kitchen. We trained a classifier to detect these other arm activities during gait, enabling accurate estimations of the arm swing."
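The filtering step then reduces to thresholding these probabilities per window. A minimal sketch of the idea, using plain Python dictionaries instead of the actual dataframes (the probability values and threshold below are illustrative):

```python
# Sketch of the filtering idea: keep only windows whose predicted probability
# of gait-without-other-arm-activity reaches the classifier threshold.
windows = [
    {"time": 484.00, "pred_no_other_arm_activity_proba": 0.000002},
    {"time": 484.75, "pred_no_other_arm_activity_proba": 0.72},
    {"time": 485.50, "pred_no_other_arm_activity_proba": 0.31},
]
threshold = 0.5  # illustrative; the real value ships inside the classifier package

kept = [w for w in windows if w["pred_no_other_arm_activity_proba"] >= threshold]
assert [w["time"] for w in kept] == [484.75]
```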
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Out of 50 windows, 0 (0.0%) were predicted as no_other_arm_activity, and 50 (100.0%) as other_arm_activity.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timepred_no_other_arm_activity_proba
0484.000.000002
1484.750.000004
2485.500.000020
3486.250.000013
4487.000.000011
\n", "
" ], "text/plain": [ " time pred_no_other_arm_activity_proba\n", "0 484.00 0.000002\n", "1 484.75 0.000004\n", "2 485.50 0.000020\n", "3 486.25 0.000013\n", "4 487.00 0.000011" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.classification import ClassifierPackage\n", "from paradigma.pipelines.gait_pipeline import filter_gait\n", "\n", "# Set the path to the classifier package\n", "classifier_package_filename = 'gait_filtering_clf_package.pkl'\n", "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n", "\n", "# Load the classifier package\n", "clf_package = ClassifierPackage.load(full_path_to_classifier_package)\n", "\n", "# Detecting no_other_arm_activity returns the probability of no_other_arm_activity for each window, which is concatenated to\n", "# the original dataframe\n", "df_arm_activity['pred_no_other_arm_activity_proba'] = filter_gait(\n", " df=df_arm_activity,\n", " clf_package=clf_package\n", ")\n", "\n", "n_windows = df_arm_activity.shape[0]\n", "n_predictions_no_other_arm_activity = df_arm_activity.loc[df_arm_activity['pred_no_other_arm_activity_proba']>=clf_package.threshold].shape[0]\n", "perc_predictions_no_other_arm_activity = round(100 * n_predictions_no_other_arm_activity / n_windows, 1)\n", "n_predictions_other_arm_activity = df_arm_activity.loc[df_arm_activity['pred_no_other_arm_activity_proba']\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " segment_nr range_of_motion peak_velocity\n", "0 2 7.511588 66.135460\n", "1 2 19.391911 78.645638\n", "2 2 18.928598 95.738234\n", "3 2 14.612863 57.997025\n", "4 2 14.536349 98.900658" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from paradigma.pipelines.gait_pipeline import quantify_arm_swing\n", "\n", "temporary_clf_threshold = 0.00001\n", "dfs_to_quantify = ['unfiltered', 'filtered']\n", "\n", "print(f\"The original classification threshold of {clf_package.threshold} is for this tutorial temporarily set to {temporary_clf_threshold}.\\n\")\n", "\n", "clf_package.threshold = temporary_clf_threshold\n", "\n", "quantified_arm_swing, segment_meta = quantify_arm_swing(\n", " df_timestamps=df_preprocessed,\n", " df_predictions=df_arm_activity,\n", " classification_threshold=clf_package.threshold,\n", " window_length_s=config.window_length_s,\n", " max_segment_gap_s=config.max_segment_gap_s,\n", " min_segment_length_s=config.min_segment_length_s,\n", " fs=config.sampling_frequency,\n", " dfs_to_quantify=dfs_to_quantify\n", ")\n", "\n", "print(f\"Gait segments are created of minimum {config.min_segment_length_s} seconds and maximum {config.max_segment_gap_s} seconds gap between segments.\\n\")\n", "for df_name in dfs_to_quantify:\n", " print(f\"A total of {quantified_arm_swing[df_name]['segment_nr'].nunique()} {df_name} gait segments have been quantified.\")\n", "\n", "# The arm swing quantification is returned as a dictionary with two dataframes: 'unfiltered' and 'filtered'.\n", "# The 'unfiltered' dataframe contains the quantification of all gait segments, while the 'filtered' dataframe\n", "# contains only the segments that are classified as gait and no_other_arm_activity.\n", "dataset_used = 'filtered'\n", "\n", "print(f\"\\nThe first rows of the {dataset_used} dataset\")\n", "quantified_arm_swing[dataset_used].head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Furthermore, per gait segment the following metadata is stored:\n", "\n", "{'filtered': {2: {'segment_category': 'long', 'time_s': 2.25},\n", " 3: {'segment_category': 'long', 'time_s': 0.75},\n", " 4: {'segment_category': 'moderately_long', 'time_s': 6.75},\n", " 5: {'segment_category': 'moderately_long', 'time_s': 6.0},\n", " 6: {'segment_category': 'moderately_long', 'time_s': 0.75}},\n", " 'unfiltered': {1: {'segment_category': 'long', 'time_s': 12.75},\n", " 2: {'segment_category': 'moderately_long', 'time_s': 9.0},\n", " 3: {'segment_category': 'long', 'time_s': 12.0},\n", " 4: {'segment_category': 'moderately_long', 'time_s': 7.5},\n", " 5: {'segment_category': 'moderately_long', 'time_s': 7.5}}}\n" ] } ], "source": [ "import pprint\n", "\n", "print(f\"Furthermore, per gait segment the following metadata is stored:\\n\")\n", "pprint.pprint(segment_meta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The segment categories are defined as follows:\n", "- short: < 5 seconds\n", "- moderately_long: 5-10 seconds\n", "- long: 10-20 seconds\n", "- very_long: > 20 seconds\n", "\n", "As noted before, the segments (and categories) are determined based on predicted gait (unfiltered gait). Therefore, for the arm swing of filtered gait, a segment may be smaller as parts of the segment were predicted to have other arm activities, yet the category remained the same." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7: Aggregation\n", "Finally, the arm swing estimates can be aggregated across segments." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'moderately_long': {'time_s': 13.5,\n", " 'median_range_of_motion': 14.394662839301768,\n", " '95p_range_of_motion': 22.658748546739268,\n", " 'median_peak_velocity': 59.475147961606005,\n", " '95p_peak_velocity': 105.31356363194875},\n", " 'long': {'time_s': 3.0,\n", " 'median_range_of_motion': 14.612862824791089,\n", " '95p_range_of_motion': 19.29924826375146,\n", " 'median_peak_velocity': 78.64563778043262,\n", " '95p_peak_velocity': 98.26817330011067},\n", " 'all_segment_categories': {'time_s': 16.5,\n", " 'median_range_of_motion': 14.57460571480515,\n", " '95p_range_of_motion': 22.180208543070503,\n", " 'median_peak_velocity': 65.54184788176072,\n", " '95p_peak_velocity': 105.21165557470036}}\n" ] } ], "source": [ "from paradigma.pipelines.gait_pipeline import aggregate_arm_swing_params\n", "\n", "dataset_used = 'filtered'\n", "\n", "arm_swing_aggregations = aggregate_arm_swing_params(\n", " df_arm_swing_params=quantified_arm_swing[dataset_used],\n", " segment_meta=segment_meta[dataset_used],\n", " aggregates=['median', '95p']\n", ")\n", "\n", "pprint.pprint(arm_swing_aggregations, sort_dicts=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of the aggregation step contains the aggregated arm swing parameters per gait segment category. Additionally, the total time in seconds `time_s` is added to inform based on how much data the aggregations were created." ] } ], "metadata": { "kernelspec": { "display_name": "paradigma-Fn6RLG4_-py3.11", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 2 }