{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gait analysis\n",
    "This tutorial showcases the high-level functions composing the gait pipeline. Before following along, make sure you have completed all steps in the data preparation tutorial.\n",
    "\n",
    "In this tutorial, we use two days of data from a participant of the Personalized Parkinson Project to demonstrate the functionalities. Since `ParaDigMa` expects contiguous time series, the collected data was stored in two segments, each with contiguous timestamps. Per segment, we load the data and perform the following steps:\n",
    "1. Data preprocessing\n",
    "2. Gait feature extraction\n",
    "3. Gait detection\n",
    "4. Arm activity feature extraction\n",
    "5. Filtering gait\n",
    "6. Arm swing quantification\n",
    "\n",
    "We then combine the output of the different raw data segments for the final step:\n",
    "\n",
    "7. Aggregation\n",
    "\n",
    "Running the complete gait pipeline requires both accelerometer and gyroscope data, although the first three steps can be completed using only accelerometer data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[!WARNING] The gait pipeline was developed on data from the Gait Up Physilog 4 and is currently being validated on the Verily Study Watch. Different sensors and positions on the wrist may affect outcomes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data\n",
    "Here, we start by loading a single contiguous time series (segment), for which we continue running steps 1-6. [Below](#multiple_segments_cell) we show how to run these steps for multiple raw data segments.\n",
    "\n",
    "We use the internally developed `TSDF` ([documentation](https://biomarkersparkinson.github.io/tsdf/)) to load and store data [[1](https://arxiv.org/abs/2211.11294)]. Depending on the file extension of your time series data, examples of other Python functions for loading the data into memory include:\n",
    "- _.csv_: `pandas.read_csv()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html))\n",
    "- _.json_: `json.load()` ([documentation](https://docs.python.org/3/library/json.html#json.load))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>accelerometer_x</th>\n",
       "      <th>accelerometer_y</th>\n",
       "      <th>accelerometer_z</th>\n",
       "      <th>gyroscope_x</th>\n",
       "      <th>gyroscope_y</th>\n",
       "      <th>gyroscope_z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>-0.474641</td>\n",
       "      <td>-0.379426</td>\n",
       "      <td>0.770335</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.402439</td>\n",
       "      <td>0.243902</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.009933</td>\n",
       "      <td>-0.472727</td>\n",
       "      <td>-0.378947</td>\n",
       "      <td>0.765072</td>\n",
       "      <td>0.426829</td>\n",
       "      <td>0.670732</td>\n",
       "      <td>-0.121951</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.019867</td>\n",
       "      <td>-0.471770</td>\n",
       "      <td>-0.375598</td>\n",
       "      <td>0.766986</td>\n",
       "      <td>1.158537</td>\n",
       "      <td>-0.060976</td>\n",
       "      <td>-0.304878</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.029800</td>\n",
       "      <td>-0.472727</td>\n",
       "      <td>-0.375598</td>\n",
       "      <td>0.770335</td>\n",
       "      <td>1.158537</td>\n",
       "      <td>-0.548780</td>\n",
       "      <td>-0.548780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.039733</td>\n",
       "      <td>-0.475120</td>\n",
       "      <td>-0.379426</td>\n",
       "      <td>0.772249</td>\n",
       "      <td>0.670732</td>\n",
       "      <td>-0.609756</td>\n",
       "      <td>-0.731707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3455326</th>\n",
       "      <td>34339.561333</td>\n",
       "      <td>-0.257895</td>\n",
       "      <td>-0.319139</td>\n",
       "      <td>-0.761244</td>\n",
       "      <td>159.329269</td>\n",
       "      <td>14.634146</td>\n",
       "      <td>-28.658537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3455327</th>\n",
       "      <td>34339.571267</td>\n",
       "      <td>-0.555502</td>\n",
       "      <td>-0.153110</td>\n",
       "      <td>-0.671292</td>\n",
       "      <td>125.060976</td>\n",
       "      <td>-213.902440</td>\n",
       "      <td>-19.329268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3455328</th>\n",
       "      <td>34339.581200</td>\n",
       "      <td>-0.286124</td>\n",
       "      <td>-0.263636</td>\n",
       "      <td>-0.981340</td>\n",
       "      <td>158.658537</td>\n",
       "      <td>-328.170733</td>\n",
       "      <td>-3.170732</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3455329</th>\n",
       "      <td>34339.591133</td>\n",
       "      <td>-0.232536</td>\n",
       "      <td>-0.161722</td>\n",
       "      <td>-0.832536</td>\n",
       "      <td>288.841465</td>\n",
       "      <td>-281.707318</td>\n",
       "      <td>17.073171</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3455330</th>\n",
       "      <td>34339.601067</td>\n",
       "      <td>0.180383</td>\n",
       "      <td>-0.368421</td>\n",
       "      <td>-1.525837</td>\n",
       "      <td>376.219514</td>\n",
       "      <td>-140.853659</td>\n",
       "      <td>37.256098</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3455331 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                 time  accelerometer_x  accelerometer_y  accelerometer_z  \\\n",
       "0            0.000000        -0.474641        -0.379426         0.770335   \n",
       "1            0.009933        -0.472727        -0.378947         0.765072   \n",
       "2            0.019867        -0.471770        -0.375598         0.766986   \n",
       "3            0.029800        -0.472727        -0.375598         0.770335   \n",
       "4            0.039733        -0.475120        -0.379426         0.772249   \n",
       "...               ...              ...              ...              ...   \n",
       "3455326  34339.561333        -0.257895        -0.319139        -0.761244   \n",
       "3455327  34339.571267        -0.555502        -0.153110        -0.671292   \n",
       "3455328  34339.581200        -0.286124        -0.263636        -0.981340   \n",
       "3455329  34339.591133        -0.232536        -0.161722        -0.832536   \n",
       "3455330  34339.601067         0.180383        -0.368421        -1.525837   \n",
       "\n",
       "         gyroscope_x  gyroscope_y  gyroscope_z  \n",
       "0           0.000000     1.402439     0.243902  \n",
       "1           0.426829     0.670732    -0.121951  \n",
       "2           1.158537    -0.060976    -0.304878  \n",
       "3           1.158537    -0.548780    -0.548780  \n",
       "4           0.670732    -0.609756    -0.731707  \n",
       "...              ...          ...          ...  \n",
       "3455326   159.329269    14.634146   -28.658537  \n",
       "3455327   125.060976  -213.902440   -19.329268  \n",
       "3455328   158.658537  -328.170733    -3.170732  \n",
       "3455329   288.841465  -281.707318    17.073171  \n",
       "3455330   376.219514  -140.853659    37.256098  \n",
       "\n",
       "[3455331 rows x 7 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "from paradigma.util import load_tsdf_dataframe\n",
    "\n",
    "# Set the path to where the prepared data is saved and load the data.\n",
    "# Note: the test data is stored in TSDF, but you can load your data in your own way\n",
    "path_to_data = Path('../../example_data')\n",
    "path_to_prepared_data = path_to_data / 'imu'\n",
    "\n",
    "raw_data_segment_nr = '0001'\n",
    "\n",
    "# Load the data from the file\n",
    "df_imu, metadata_time, metadata_values = load_tsdf_dataframe(path_to_prepared_data, prefix=f'IMU_segment{raw_data_segment_nr}')\n",
    "\n",
    "df_imu"
   ]
  },
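  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the `pandas.read_csv()` route mentioned above, the following is a minimal sketch that reads IMU data from CSV text and checks for the column names shown in the dataframe above. The CSV content and the names `csv_text`, `expected_columns` and `df_csv` are hypothetical and only serve this example; with a real file you would pass its path to `read_csv`, and the data preparation tutorial remains the reference for units and column conventions.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical CSV content standing in for a file on disk\n",
    "csv_text = '''time,accelerometer_x,accelerometer_y,accelerometer_z,gyroscope_x,gyroscope_y,gyroscope_z\n",
    "0.00,-0.47,-0.38,0.77,0.00,1.40,0.24\n",
    "0.01,-0.47,-0.38,0.77,0.43,0.67,-0.12\n",
    "0.02,-0.47,-0.38,0.77,1.16,-0.06,-0.30\n",
    "'''\n",
    "\n",
    "# Column names as displayed in the dataframe loaded from TSDF\n",
    "expected_columns = [\n",
    "    'time',\n",
    "    'accelerometer_x', 'accelerometer_y', 'accelerometer_z',\n",
    "    'gyroscope_x', 'gyroscope_y', 'gyroscope_z',\n",
    "]\n",
    "\n",
    "# With a real file, pass its path instead of the StringIO buffer\n",
    "df_csv = pd.read_csv(io.StringIO(csv_text))\n",
    "\n",
    "# Fail early if any of the expected columns is absent\n",
    "missing = [col for col in expected_columns if col not in df_csv.columns]\n",
    "if missing:\n",
    "    raise ValueError(f'Missing columns: {missing}')\n",
    "```\n",
    "\n",
    "Checking the columns right after loading is a choice made for this sketch: it surfaces naming mismatches before any preprocessing is attempted."
   ]
  },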
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Preprocess data\n",
    "The single function `preprocess_imu_data` in the cell below runs all necessary preprocessing steps. It requires the loaded dataframe, a configuration object `config` specifying parameters used for preprocessing, and a selection of sensors. For the sensors, options include `'accelerometer'`, `'gyroscope'`, or `'both'`.\n",
    "\n",
    "The function `preprocess_imu_data` processes the data as follows:\n",
    "1. Resample the data to a uniform sampling rate\n",
    "2. Apply filtering to separate the gravity component from the accelerometer signal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The dataset of 34339.61 seconds is automatically resampled to 100 Hz.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>accelerometer_x</th>\n",
       "      <th>accelerometer_y</th>\n",
       "      <th>accelerometer_z</th>\n",
       "      <th>gyroscope_x</th>\n",
       "      <th>gyroscope_y</th>\n",
       "      <th>gyroscope_z</th>\n",
       "      <th>accelerometer_x_grav</th>\n",
       "      <th>accelerometer_y_grav</th>\n",
       "      <th>accelerometer_z_grav</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.00</td>\n",
       "      <td>-0.002324</td>\n",
       "      <td>-0.001442</td>\n",
       "      <td>-0.002116</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.402439</td>\n",
       "      <td>0.243902</td>\n",
       "      <td>-0.472317</td>\n",
       "      <td>-0.377984</td>\n",
       "      <td>0.772451</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.01</td>\n",
       "      <td>-0.000390</td>\n",
       "      <td>-0.000914</td>\n",
       "      <td>-0.007396</td>\n",
       "      <td>0.432231</td>\n",
       "      <td>0.665526</td>\n",
       "      <td>-0.123434</td>\n",
       "      <td>-0.472326</td>\n",
       "      <td>-0.378012</td>\n",
       "      <td>0.772464</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.02</td>\n",
       "      <td>0.000567</td>\n",
       "      <td>0.002474</td>\n",
       "      <td>-0.005445</td>\n",
       "      <td>1.164277</td>\n",
       "      <td>-0.069584</td>\n",
       "      <td>-0.307536</td>\n",
       "      <td>-0.472336</td>\n",
       "      <td>-0.378040</td>\n",
       "      <td>0.772476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.03</td>\n",
       "      <td>-0.000425</td>\n",
       "      <td>0.002414</td>\n",
       "      <td>-0.002099</td>\n",
       "      <td>1.151432</td>\n",
       "      <td>-0.554928</td>\n",
       "      <td>-0.554223</td>\n",
       "      <td>-0.472346</td>\n",
       "      <td>-0.378068</td>\n",
       "      <td>0.772489</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.04</td>\n",
       "      <td>-0.002807</td>\n",
       "      <td>-0.001408</td>\n",
       "      <td>-0.000218</td>\n",
       "      <td>0.657189</td>\n",
       "      <td>-0.603207</td>\n",
       "      <td>-0.731570</td>\n",
       "      <td>-0.472355</td>\n",
       "      <td>-0.378096</td>\n",
       "      <td>0.772502</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   time  accelerometer_x  accelerometer_y  accelerometer_z  gyroscope_x  \\\n",
       "0  0.00        -0.002324        -0.001442        -0.002116     0.000000   \n",
       "1  0.01        -0.000390        -0.000914        -0.007396     0.432231   \n",
       "2  0.02         0.000567         0.002474        -0.005445     1.164277   \n",
       "3  0.03        -0.000425         0.002414        -0.002099     1.151432   \n",
       "4  0.04        -0.002807        -0.001408        -0.000218     0.657189   \n",
       "\n",
       "   gyroscope_y  gyroscope_z  accelerometer_x_grav  accelerometer_y_grav  \\\n",
       "0     1.402439     0.243902             -0.472317             -0.377984   \n",
       "1     0.665526    -0.123434             -0.472326             -0.378012   \n",
       "2    -0.069584    -0.307536             -0.472336             -0.378040   \n",
       "3    -0.554928    -0.554223             -0.472346             -0.378068   \n",
       "4    -0.603207    -0.731570             -0.472355             -0.378096   \n",
       "\n",
       "   accelerometer_z_grav  \n",
       "0              0.772451  \n",
       "1              0.772464  \n",
       "2              0.772476  \n",
       "3              0.772489  \n",
       "4              0.772502  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from paradigma.config import IMUConfig\n",
    "from paradigma.preprocessing import preprocess_imu_data\n",
    "\n",
    "config = IMUConfig()\n",
    "\n",
    "df_preprocessed = preprocess_imu_data(\n",
    "    df=df_imu, \n",
    "    config=config,\n",
    "    sensor='both',\n",
    "    watch_side='left',\n",
    ")\n",
    "\n",
    "print(f\"The dataset of {df_preprocessed.shape[0] / config.sampling_frequency} seconds is automatically resampled to {config.sampling_frequency} Hz.\")\n",
    "df_preprocessed.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The resulting dataframe shown above contains uniformly distributed timestamps with corresponding accelerometer and gyroscope values. Note that for accelerometer values, the following notation is used:\n",
    "- `accelerometer_x`: the accelerometer signal after filtering out the gravitational component\n",
    "- `accelerometer_x_grav`: the gravitational component of the accelerometer signal\n",
    "\n",
    "The gravitational component is retained and used to compute gravity-related features for the classification tasks, because gravity is informative of the position of the arm."
   ]
  },
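  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for the two preprocessing steps, the sketch below resamples an irregularly sampled synthetic signal onto a uniform 100 Hz grid and splits it into a slowly varying, gravity-like part and a remainder. It is not the implementation used by `preprocess_imu_data`: linear interpolation and a centred moving average are assumptions chosen to keep the example small, and all variable names are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "fs = 100  # target sampling frequency in Hz\n",
    "\n",
    "# Synthetic signal: slightly irregular timestamps, constant offset plus 2 Hz movement\n",
    "rng = np.random.default_rng(0)\n",
    "t_raw = np.cumsum(rng.uniform(0.009, 0.011, size=2000))\n",
    "acc_raw = 0.8 + 0.1 * np.sin(2 * np.pi * 2.0 * t_raw)\n",
    "\n",
    "# 1. Resample onto a uniform time grid by linear interpolation\n",
    "t_uniform = np.arange(t_raw[0], t_raw[-1], 1 / fs)\n",
    "acc_uniform = np.interp(t_uniform, t_raw, acc_raw)\n",
    "\n",
    "# 2. Estimate the slowly varying component with a centred moving average\n",
    "#    (assumed 5-second window; edge padding avoids a drop towards zero at the ends)\n",
    "half = 250\n",
    "window = 2 * half + 1\n",
    "padded = np.pad(acc_uniform, half, mode='edge')\n",
    "acc_grav = np.convolve(padded, np.ones(window) / window, mode='valid')\n",
    "\n",
    "# The remainder is the movement-related part of the signal\n",
    "acc_movement = acc_uniform - acc_grav\n",
    "```\n",
    "\n",
    "Away from the edges, `acc_grav` stays close to the constant offset of the synthetic signal, while `acc_movement` carries the 2 Hz oscillation. In practice a dedicated digital filter would typically replace the moving average."
   ]
  },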
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Extract gait features\n",
    "With the data uniformly resampled and the gravitational component separated from the accelerometer signal, features can be extracted from the time series data. This step does not require gyroscope data. To extract the features, the pipeline executes the following steps:\n",
    "- Use overlapping windows to group timestamps\n",
    "- Extract temporal features\n",
    "- Use the Fast Fourier Transform to transform the windowed data into the spectral domain\n",
    "- Extract spectral features\n",
    "- Combine both temporal and spectral features into a final dataframe\n",
    "\n",
    "These steps are encapsulated in `extract_gait_features` (documentation can be found [here](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/pipelines/gait_pipeline.py))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A total of 34 features have been extracted from 34334 6-second windows with 5 seconds overlap.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>accelerometer_x_grav_mean</th>\n",
       "      <th>accelerometer_y_grav_mean</th>\n",
       "      <th>accelerometer_z_grav_mean</th>\n",
       "      <th>accelerometer_x_grav_std</th>\n",
       "      <th>accelerometer_y_grav_std</th>\n",
       "      <th>accelerometer_z_grav_std</th>\n",
       "      <th>accelerometer_std_norm</th>\n",
       "      <th>accelerometer_x_power_below_gait</th>\n",
       "      <th>accelerometer_y_power_below_gait</th>\n",
       "      <th>...</th>\n",
       "      <th>accelerometer_mfcc_3</th>\n",
       "      <th>accelerometer_mfcc_4</th>\n",
       "      <th>accelerometer_mfcc_5</th>\n",
       "      <th>accelerometer_mfcc_6</th>\n",
       "      <th>accelerometer_mfcc_7</th>\n",
       "      <th>accelerometer_mfcc_8</th>\n",
       "      <th>accelerometer_mfcc_9</th>\n",
       "      <th>accelerometer_mfcc_10</th>\n",
       "      <th>accelerometer_mfcc_11</th>\n",
       "      <th>accelerometer_mfcc_12</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.472967</td>\n",
       "      <td>-0.380588</td>\n",
       "      <td>0.774287</td>\n",
       "      <td>0.000270</td>\n",
       "      <td>0.000818</td>\n",
       "      <td>0.000574</td>\n",
       "      <td>0.003377</td>\n",
       "      <td>0.000003</td>\n",
       "      <td>1.188086e-06</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.101486</td>\n",
       "      <td>0.524288</td>\n",
       "      <td>0.215990</td>\n",
       "      <td>0.429154</td>\n",
       "      <td>0.900923</td>\n",
       "      <td>1.135918</td>\n",
       "      <td>0.673404</td>\n",
       "      <td>-0.128276</td>\n",
       "      <td>-0.335655</td>\n",
       "      <td>-0.060155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>-0.473001</td>\n",
       "      <td>-0.380704</td>\n",
       "      <td>0.774541</td>\n",
       "      <td>0.000235</td>\n",
       "      <td>0.000588</td>\n",
       "      <td>0.000220</td>\n",
       "      <td>0.003194</td>\n",
       "      <td>0.000003</td>\n",
       "      <td>1.210176e-06</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.997314</td>\n",
       "      <td>0.633275</td>\n",
       "      <td>0.327645</td>\n",
       "      <td>0.451613</td>\n",
       "      <td>0.972729</td>\n",
       "      <td>1.120786</td>\n",
       "      <td>0.770134</td>\n",
       "      <td>-0.115916</td>\n",
       "      <td>-0.395856</td>\n",
       "      <td>-0.011206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>-0.473036</td>\n",
       "      <td>-0.380563</td>\n",
       "      <td>0.774578</td>\n",
       "      <td>0.000233</td>\n",
       "      <td>0.000619</td>\n",
       "      <td>0.000195</td>\n",
       "      <td>0.003188</td>\n",
       "      <td>0.000002</td>\n",
       "      <td>6.693551e-07</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.040592</td>\n",
       "      <td>0.404720</td>\n",
       "      <td>0.268514</td>\n",
       "      <td>0.507473</td>\n",
       "      <td>0.944706</td>\n",
       "      <td>1.016282</td>\n",
       "      <td>0.785686</td>\n",
       "      <td>-0.071433</td>\n",
       "      <td>-0.414269</td>\n",
       "      <td>0.020690</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>-0.472952</td>\n",
       "      <td>-0.380310</td>\n",
       "      <td>0.774660</td>\n",
       "      <td>0.000301</td>\n",
       "      <td>0.000526</td>\n",
       "      <td>0.000326</td>\n",
       "      <td>0.003020</td>\n",
       "      <td>0.000002</td>\n",
       "      <td>6.835856e-07</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.075637</td>\n",
       "      <td>0.258352</td>\n",
       "      <td>0.257234</td>\n",
       "      <td>0.506739</td>\n",
       "      <td>0.892823</td>\n",
       "      <td>0.900388</td>\n",
       "      <td>0.706368</td>\n",
       "      <td>-0.080562</td>\n",
       "      <td>-0.302595</td>\n",
       "      <td>0.054805</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>-0.472692</td>\n",
       "      <td>-0.380024</td>\n",
       "      <td>0.774889</td>\n",
       "      <td>0.000468</td>\n",
       "      <td>0.000355</td>\n",
       "      <td>0.000470</td>\n",
       "      <td>0.002869</td>\n",
       "      <td>0.000002</td>\n",
       "      <td>1.097557e-06</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.079496</td>\n",
       "      <td>0.264418</td>\n",
       "      <td>0.237172</td>\n",
       "      <td>0.587941</td>\n",
       "      <td>0.936835</td>\n",
       "      <td>0.763372</td>\n",
       "      <td>0.607845</td>\n",
       "      <td>-0.159721</td>\n",
       "      <td>-0.184856</td>\n",
       "      <td>0.128150</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 35 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   time  accelerometer_x_grav_mean  accelerometer_y_grav_mean  \\\n",
       "0   0.0                  -0.472967                  -0.380588   \n",
       "1   1.0                  -0.473001                  -0.380704   \n",
       "2   2.0                  -0.473036                  -0.380563   \n",
       "3   3.0                  -0.472952                  -0.380310   \n",
       "4   4.0                  -0.472692                  -0.380024   \n",
       "\n",
       "   accelerometer_z_grav_mean  accelerometer_x_grav_std  \\\n",
       "0                   0.774287                  0.000270   \n",
       "1                   0.774541                  0.000235   \n",
       "2                   0.774578                  0.000233   \n",
       "3                   0.774660                  0.000301   \n",
       "4                   0.774889                  0.000468   \n",
       "\n",
       "   accelerometer_y_grav_std  accelerometer_z_grav_std  accelerometer_std_norm  \\\n",
       "0                  0.000818                  0.000574                0.003377   \n",
       "1                  0.000588                  0.000220                0.003194   \n",
       "2                  0.000619                  0.000195                0.003188   \n",
       "3                  0.000526                  0.000326                0.003020   \n",
       "4                  0.000355                  0.000470                0.002869   \n",
       "\n",
       "   accelerometer_x_power_below_gait  accelerometer_y_power_below_gait  ...  \\\n",
       "0                          0.000003                      1.188086e-06  ...   \n",
       "1                          0.000003                      1.210176e-06  ...   \n",
       "2                          0.000002                      6.693551e-07  ...   \n",
       "3                          0.000002                      6.835856e-07  ...   \n",
       "4                          0.000002                      1.097557e-06  ...   \n",
       "\n",
       "   accelerometer_mfcc_3  accelerometer_mfcc_4  accelerometer_mfcc_5  \\\n",
       "0             -1.101486              0.524288              0.215990   \n",
       "1             -0.997314              0.633275              0.327645   \n",
       "2             -1.040592              0.404720              0.268514   \n",
       "3             -1.075637              0.258352              0.257234   \n",
       "4             -1.079496              0.264418              0.237172   \n",
       "\n",
       "   accelerometer_mfcc_6  accelerometer_mfcc_7  accelerometer_mfcc_8  \\\n",
       "0              0.429154              0.900923              1.135918   \n",
       "1              0.451613              0.972729              1.120786   \n",
       "2              0.507473              0.944706              1.016282   \n",
       "3              0.506739              0.892823              0.900388   \n",
       "4              0.587941              0.936835              0.763372   \n",
       "\n",
       "   accelerometer_mfcc_9  accelerometer_mfcc_10  accelerometer_mfcc_11  \\\n",
       "0              0.673404              -0.128276              -0.335655   \n",
       "1              0.770134              -0.115916              -0.395856   \n",
       "2              0.785686              -0.071433              -0.414269   \n",
       "3              0.706368              -0.080562              -0.302595   \n",
       "4              0.607845              -0.159721              -0.184856   \n",
       "\n",
       "   accelerometer_mfcc_12  \n",
       "0              -0.060155  \n",
       "1              -0.011206  \n",
       "2               0.020690  \n",
       "3               0.054805  \n",
       "4               0.128150  \n",
       "\n",
       "[5 rows x 35 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from paradigma.config import GaitConfig\n",
    "from paradigma.pipelines.gait_pipeline import extract_gait_features\n",
    "\n",
    "config = GaitConfig(step='gait')\n",
    "\n",
    "df_gait = extract_gait_features(\n",
    "    df=df_preprocessed, \n",
    "    config=config\n",
    ")\n",
    "\n",
    "print(f\"A total of {df_gait.shape[1]-1} features have been extracted from {df_gait.shape[0]} {config.window_length_s}-second windows with {config.window_length_s-config.window_step_length_s} seconds overlap.\")\n",
    "df_gait.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each row in this dataframe corresponds to a single window, with the window length and overlap set in the `config` object. Note that the `time` column has a 1-second interval instead of the 10-millisecond interval before, as it now represents the starting time of the window."
   ]
  },
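  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The windowing and spectral steps can be sketched with NumPy alone. This is a simplified illustration on a synthetic signal, not the code inside `extract_gait_features`; the two features computed here (standard deviation and dominant frequency) and all variable names are assumptions made for the example. The window length and step mirror the 6-second windows with 5 seconds overlap reported above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "fs = 100               # sampling frequency in Hz\n",
    "window_length_s = 6    # window length in seconds\n",
    "window_step_s = 1      # step between window starts (6 s windows, 5 s overlap)\n",
    "\n",
    "# Synthetic 60-second signal with a single 1.5 Hz component\n",
    "t = np.arange(60 * fs) / fs\n",
    "signal = np.sin(2 * np.pi * 1.5 * t)\n",
    "\n",
    "win = window_length_s * fs\n",
    "step = window_step_s * fs\n",
    "\n",
    "# Group samples into overlapping windows: one row per window\n",
    "starts = np.arange(0, len(signal) - win + 1, step)\n",
    "windows = np.stack([signal[s:s + win] for s in starts])\n",
    "\n",
    "# Temporal feature: standard deviation per window\n",
    "std_per_window = windows.std(axis=1)\n",
    "\n",
    "# Spectral feature: frequency with the highest power per window (real FFT)\n",
    "freqs = np.fft.rfftfreq(win, d=1 / fs)\n",
    "power = np.abs(np.fft.rfft(windows, axis=1)) ** 2\n",
    "dominant_freq = freqs[power.argmax(axis=1)]\n",
    "\n",
    "# Start time of each window in seconds, analogous to the time column\n",
    "window_start_time = starts / fs\n",
    "```\n",
    "\n",
    "With 60 seconds of data this yields 55 windows. The same arithmetic, floor((N - 600) / 100) + 1, is consistent with the 34334 windows reported above for a segment of 34339.61 seconds at 100 Hz."
   ]
  },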
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Gait detection\n",
    "For classification, ParaDigMa uses so-called Classifier Packages, which contain a classifier, a classification threshold, and a feature scaler as attributes. The classifier is a [random forest](https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.RandomForestClassifier.html) trained on a dataset of people with PD performing a wide range of activities in free-living conditions: [The Parkinson@Home Validation Study](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/). The classification threshold was set to limit the number of false-positive predictions in the original study, i.e., to limit non-gait being predicted as gait. The classification threshold can be changed by setting `clf_package.threshold` to a different float value. The feature scaler was similarly fitted on the original dataset, ensuring that the features fall within the value ranges the classifier expects, so that predictions are reliable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Out of 34334 windows, 2753 (8.0%) were predicted as gait, and 31581 (92.0%) as non-gait.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>pred_gait_proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.000024</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   time  pred_gait_proba\n",
       "0   0.0         0.000023\n",
       "1   1.0         0.000024\n",
       "2   2.0         0.000023\n",
       "3   3.0         0.000023\n",
       "4   4.0         0.000023"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from importlib.resources import files\n",
    "from paradigma.classification import ClassifierPackage\n",
    "from paradigma.pipelines.gait_pipeline import detect_gait\n",
    "\n",
    "# Set the path to the classifier package\n",
    "classifier_package_filename = 'gait_detection_clf_package.pkl'\n",
    "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n",
    "\n",
    "# Load the classifier package\n",
    "clf_package_detection = ClassifierPackage.load(full_path_to_classifier_package)\n",
    "\n",
    "# Detecting gait returns the probability of gait for each window, which is concatenated to\n",
    "# the original dataframe\n",
    "df_gait['pred_gait_proba'] = detect_gait(\n",
    "    df=df_gait,\n",
    "    clf_package=clf_package_detection\n",
    ")\n",
    "\n",
    "n_windows = df_gait.shape[0]\n",
    "n_predictions_gait = df_gait.loc[df_gait['pred_gait_proba'] >= clf_package_detection.threshold].shape[0]\n",
    "perc_predictions_gait = round(100 * n_predictions_gait / n_windows, 1)\n",
    "n_predictions_non_gait = df_gait.loc[df_gait['pred_gait_proba'] < clf_package_detection.threshold].shape[0]\n",
    "perc_predictions_non_gait = round(100 * n_predictions_non_gait / n_windows, 1)\n",
    "\n",
    "print(f\"Out of {n_windows} windows, {n_predictions_gait} ({perc_predictions_gait}%) were predicted as gait, and {n_predictions_non_gait} ({perc_predictions_non_gait}%) as non-gait.\")\n",
    "\n",
    "# Only the time and the predicted gait probability are shown, but the dataframe also contains\n",
    "# the extracted features\n",
    "df_gait[['time', 'pred_gait_proba']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Store as TSDF\n",
    "The predicted probabilities (and optionally other features) can be stored in, and loaded from, TSDF as demonstrated below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tsdf\n",
    "from paradigma.util import write_df_data\n",
    "\n",
    "# Set 'path_to_data' to the directory where you want to save the data\n",
    "metadata_time_store = tsdf.TSDFMetadata(metadata_time.get_plain_tsdf_dict_copy(), path_to_data)\n",
    "metadata_values_store = tsdf.TSDFMetadata(metadata_values.get_plain_tsdf_dict_copy(), path_to_data)\n",
    "\n",
    "# Select the columns to be saved \n",
    "metadata_time_store.channels = ['time']\n",
    "metadata_values_store.channels = ['pred_gait_proba']\n",
    "\n",
    "# Set the units\n",
    "metadata_time_store.units = ['Relative seconds']\n",
    "metadata_values_store.units = ['Unitless']\n",
    "metadata_time_store.data_type = float\n",
    "metadata_values_store.data_type = float\n",
    "\n",
    "# Set the filenames\n",
    "meta_store_filename = f'segment{raw_data_segment_nr}_meta.json'\n",
    "values_store_filename = meta_store_filename.replace('_meta.json', '_values.bin')\n",
    "time_store_filename = meta_store_filename.replace('_meta.json', '_time.bin')\n",
    "\n",
    "metadata_values_store.file_name = values_store_filename\n",
    "metadata_time_store.file_name = time_store_filename\n",
    "\n",
    "write_df_data(metadata_time_store, metadata_values_store, path_to_data, meta_store_filename, df_gait)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>pred_gait_proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.000024</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>0.000023</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   time  pred_gait_proba\n",
       "0   0.0         0.000023\n",
       "1   1.0         0.000024\n",
       "2   2.0         0.000023\n",
       "3   3.0         0.000023\n",
       "4   4.0         0.000023"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_gait, _, _ = load_tsdf_dataframe(path_to_data, prefix=f'segment{raw_data_segment_nr}')\n",
    "df_gait.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once again, the `time` column indicates the start time of the window. The probabilities are therefore predicted for overlapping windows, not for individual timestamps. The function [`merge_predictions_with_timestamps`](https://github.com/biomarkersParkinson/paradigma/blob/main/src/paradigma/util.py) retrieves predicted probabilities per timestamp by aggregating the predicted probabilities of overlapping windows. This function is used in the next step."
   ]
  },
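  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea of aggregating overlapping windows concrete, here is a minimal sketch in plain NumPy. It assumes that the aggregation is a simple mean over all windows covering a sample; this is an illustrative assumption and not necessarily what `merge_predictions_with_timestamps` does internally. The helper name `mean_proba_per_timestamp` and the toy values are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def mean_proba_per_timestamp(window_starts_s, window_probas, window_length_s, fs, n_samples):\n",
    "    # Hypothetical helper: average the probabilities of all windows covering each sample\n",
    "    proba_sum = np.zeros(n_samples)\n",
    "    n_covering = np.zeros(n_samples)\n",
    "    samples_per_window = int(round(window_length_s * fs))\n",
    "    for start_s, proba in zip(window_starts_s, window_probas):\n",
    "        start = int(round(start_s * fs))\n",
    "        stop = min(start + samples_per_window, n_samples)\n",
    "        proba_sum[start:stop] += proba\n",
    "        n_covering[start:stop] += 1\n",
    "    # Samples not covered by any window are left as NaN\n",
    "    return np.divide(proba_sum, n_covering, out=np.full(n_samples, np.nan), where=n_covering > 0)\n",
    "\n",
    "# Toy example (assumed values): three 6-second windows starting 1 second apart, sampled at 1 Hz\n",
    "per_sample = mean_proba_per_timestamp(\n",
    "    window_starts_s=[0.0, 1.0, 2.0],\n",
    "    window_probas=[0.2, 0.4, 0.9],\n",
    "    window_length_s=6,\n",
    "    fs=1,\n",
    "    n_samples=8,\n",
    ")\n",
    "print(per_sample)\n",
    "```\n",
    "\n",
    "In this toy example the first sample is covered by a single window, whereas samples in the middle are covered by all three, so their value is the mean of the three window probabilities."
   ]
  },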
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Arm activity feature extraction\n",
    "The extraction of arm activity features is similar to the extraction of gait features, but uses a different window length and step length (`config.window_length_s`, `config.window_step_length_s`), chosen to distinguish between gait segments with and without other arm activities. This step consists of the following sequence, ending with `extract_arm_activity_features`:\n",
    "- Start with the preprocessed data of step 1\n",
    "- Merge the gait predictions into the preprocessed data\n",
    "- Discard predicted non-gait activities\n",
    "- Create windows of the time series data and extract features\n",
    "\n",
    "First, however, the gait predictions must be merged with the preprocessed time series data, so that each timestamp has a corresponding probability of gait. The function `extract_arm_activity_features` expects a time series dataframe containing only predicted gait."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from paradigma.constants import DataColumns\n",
    "from paradigma.util import merge_predictions_with_timestamps\n",
    "\n",
    "# Merge gait predictions into timeseries data\n",
    "if not any(df_gait[DataColumns.PRED_GAIT_PROBA] >= clf_package_detection.threshold):\n",
    "    raise ValueError(\"No gait detected in the input data.\")\n",
    "\n",
    "gait_preprocessing_config = GaitConfig(step='gait')\n",
    "\n",
    "df = merge_predictions_with_timestamps(\n",
    "    df_ts=df_preprocessed, \n",
    "    df_predictions=df_gait, \n",
    "    pred_proba_colname=DataColumns.PRED_GAIT_PROBA,\n",
    "    window_length_s=gait_preprocessing_config.window_length_s,\n",
    "    fs=gait_preprocessing_config.sampling_frequency\n",
    ")\n",
    "\n",
    "# Add a column for predicted gait based on a fitted threshold\n",
    "df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= clf_package_detection.threshold).astype(int)\n",
    "\n",
    "# Filter the DataFrame to only include predicted gait (1)\n",
    "df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A total of 61 features have been extracted from 2749 3 - second windows with 2.25 seconds overlap.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>accelerometer_x_grav_mean</th>\n",
       "      <th>accelerometer_y_grav_mean</th>\n",
       "      <th>accelerometer_z_grav_mean</th>\n",
       "      <th>accelerometer_x_grav_std</th>\n",
       "      <th>accelerometer_y_grav_std</th>\n",
       "      <th>accelerometer_z_grav_std</th>\n",
       "      <th>accelerometer_std_norm</th>\n",
       "      <th>accelerometer_x_power_below_gait</th>\n",
       "      <th>accelerometer_y_power_below_gait</th>\n",
       "      <th>...</th>\n",
       "      <th>gyroscope_mfcc_3</th>\n",
       "      <th>gyroscope_mfcc_4</th>\n",
       "      <th>gyroscope_mfcc_5</th>\n",
       "      <th>gyroscope_mfcc_6</th>\n",
       "      <th>gyroscope_mfcc_7</th>\n",
       "      <th>gyroscope_mfcc_8</th>\n",
       "      <th>gyroscope_mfcc_9</th>\n",
       "      <th>gyroscope_mfcc_10</th>\n",
       "      <th>gyroscope_mfcc_11</th>\n",
       "      <th>gyroscope_mfcc_12</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1463.00</td>\n",
       "      <td>-0.941812</td>\n",
       "      <td>-0.216149</td>\n",
       "      <td>-0.129170</td>\n",
       "      <td>0.031409</td>\n",
       "      <td>0.089397</td>\n",
       "      <td>0.060771</td>\n",
       "      <td>0.166084</td>\n",
       "      <td>0.000596</td>\n",
       "      <td>0.007746</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.555190</td>\n",
       "      <td>0.735644</td>\n",
       "      <td>0.180382</td>\n",
       "      <td>0.044897</td>\n",
       "      <td>-0.645257</td>\n",
       "      <td>-0.255383</td>\n",
       "      <td>0.121998</td>\n",
       "      <td>0.297776</td>\n",
       "      <td>0.326170</td>\n",
       "      <td>0.348648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1463.75</td>\n",
       "      <td>-0.933787</td>\n",
       "      <td>-0.198807</td>\n",
       "      <td>-0.092710</td>\n",
       "      <td>0.045961</td>\n",
       "      <td>0.066987</td>\n",
       "      <td>0.038606</td>\n",
       "      <td>0.363777</td>\n",
       "      <td>0.001216</td>\n",
       "      <td>0.002593</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.722972</td>\n",
       "      <td>0.686450</td>\n",
       "      <td>-0.254451</td>\n",
       "      <td>-0.282469</td>\n",
       "      <td>-0.798232</td>\n",
       "      <td>-0.100043</td>\n",
       "      <td>0.028278</td>\n",
       "      <td>0.114591</td>\n",
       "      <td>0.160311</td>\n",
       "      <td>0.372009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1464.50</td>\n",
       "      <td>-0.882285</td>\n",
       "      <td>-0.265160</td>\n",
       "      <td>-0.080937</td>\n",
       "      <td>0.094924</td>\n",
       "      <td>0.146720</td>\n",
       "      <td>0.021218</td>\n",
       "      <td>0.362434</td>\n",
       "      <td>0.002429</td>\n",
       "      <td>0.001315</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.134321</td>\n",
       "      <td>0.773245</td>\n",
       "      <td>-0.218279</td>\n",
       "      <td>-0.430585</td>\n",
       "      <td>-0.437373</td>\n",
       "      <td>-0.065236</td>\n",
       "      <td>0.014411</td>\n",
       "      <td>0.083823</td>\n",
       "      <td>0.181666</td>\n",
       "      <td>0.079949</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1465.25</td>\n",
       "      <td>-0.794800</td>\n",
       "      <td>-0.405043</td>\n",
       "      <td>-0.094178</td>\n",
       "      <td>0.126863</td>\n",
       "      <td>0.212621</td>\n",
       "      <td>0.034948</td>\n",
       "      <td>0.363425</td>\n",
       "      <td>0.004974</td>\n",
       "      <td>0.008407</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.154252</td>\n",
       "      <td>1.024267</td>\n",
       "      <td>-0.161531</td>\n",
       "      <td>-0.217479</td>\n",
       "      <td>-0.153630</td>\n",
       "      <td>-0.016550</td>\n",
       "      <td>0.119570</td>\n",
       "      <td>0.095287</td>\n",
       "      <td>0.231406</td>\n",
       "      <td>0.015294</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1466.00</td>\n",
       "      <td>-0.691081</td>\n",
       "      <td>-0.578715</td>\n",
       "      <td>-0.118220</td>\n",
       "      <td>0.127414</td>\n",
       "      <td>0.219660</td>\n",
       "      <td>0.035758</td>\n",
       "      <td>0.360352</td>\n",
       "      <td>0.003998</td>\n",
       "      <td>0.004305</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.763188</td>\n",
       "      <td>0.763812</td>\n",
       "      <td>-0.158849</td>\n",
       "      <td>-0.023935</td>\n",
       "      <td>-0.006564</td>\n",
       "      <td>-0.185257</td>\n",
       "      <td>-0.120585</td>\n",
       "      <td>0.090823</td>\n",
       "      <td>0.171506</td>\n",
       "      <td>-0.038381</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 62 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      time  accelerometer_x_grav_mean  accelerometer_y_grav_mean  \\\n",
       "0  1463.00                  -0.941812                  -0.216149   \n",
       "1  1463.75                  -0.933787                  -0.198807   \n",
       "2  1464.50                  -0.882285                  -0.265160   \n",
       "3  1465.25                  -0.794800                  -0.405043   \n",
       "4  1466.00                  -0.691081                  -0.578715   \n",
       "\n",
       "   accelerometer_z_grav_mean  accelerometer_x_grav_std  \\\n",
       "0                  -0.129170                  0.031409   \n",
       "1                  -0.092710                  0.045961   \n",
       "2                  -0.080937                  0.094924   \n",
       "3                  -0.094178                  0.126863   \n",
       "4                  -0.118220                  0.127414   \n",
       "\n",
       "   accelerometer_y_grav_std  accelerometer_z_grav_std  accelerometer_std_norm  \\\n",
       "0                  0.089397                  0.060771                0.166084   \n",
       "1                  0.066987                  0.038606                0.363777   \n",
       "2                  0.146720                  0.021218                0.362434   \n",
       "3                  0.212621                  0.034948                0.363425   \n",
       "4                  0.219660                  0.035758                0.360352   \n",
       "\n",
       "   accelerometer_x_power_below_gait  accelerometer_y_power_below_gait  ...  \\\n",
       "0                          0.000596                          0.007746  ...   \n",
       "1                          0.001216                          0.002593  ...   \n",
       "2                          0.002429                          0.001315  ...   \n",
       "3                          0.004974                          0.008407  ...   \n",
       "4                          0.003998                          0.004305  ...   \n",
       "\n",
       "   gyroscope_mfcc_3  gyroscope_mfcc_4  gyroscope_mfcc_5  gyroscope_mfcc_6  \\\n",
       "0         -0.555190          0.735644          0.180382          0.044897   \n",
       "1         -0.722972          0.686450         -0.254451         -0.282469   \n",
       "2         -1.134321          0.773245         -0.218279         -0.430585   \n",
       "3         -1.154252          1.024267         -0.161531         -0.217479   \n",
       "4         -0.763188          0.763812         -0.158849         -0.023935   \n",
       "\n",
       "   gyroscope_mfcc_7  gyroscope_mfcc_8  gyroscope_mfcc_9  gyroscope_mfcc_10  \\\n",
       "0         -0.645257         -0.255383          0.121998           0.297776   \n",
       "1         -0.798232         -0.100043          0.028278           0.114591   \n",
       "2         -0.437373         -0.065236          0.014411           0.083823   \n",
       "3         -0.153630         -0.016550          0.119570           0.095287   \n",
       "4         -0.006564         -0.185257         -0.120585           0.090823   \n",
       "\n",
       "   gyroscope_mfcc_11  gyroscope_mfcc_12  \n",
       "0           0.326170           0.348648  \n",
       "1           0.160311           0.372009  \n",
       "2           0.181666           0.079949  \n",
       "3           0.231406           0.015294  \n",
       "4           0.171506          -0.038381  \n",
       "\n",
       "[5 rows x 62 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from paradigma.pipelines.gait_pipeline import extract_arm_activity_features\n",
    "\n",
    "config = GaitConfig(step='arm_activity')\n",
    "\n",
    "df_arm_activity = extract_arm_activity_features(\n",
    "    df=df, \n",
    "    config=config,\n",
    ")\n",
    "\n",
    "print(f\"A total of {df_arm_activity.shape[1] - 1} features have been extracted from {df_arm_activity.shape[0]} {config.window_length_s} - second windows with {config.window_length_s - config.window_step_length_s} seconds overlap.\")\n",
    "df_arm_activity.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The extracted features are similar to those extracted for gait detection, with additional MFCCs computed from the gyroscope. The gyroscope, which measures angular velocity, helps to distinguish between arm activities. Also note that the `time` column no longer starts at 0, since the first timestamps were predicted as non-gait and therefore discarded."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Filtering gait\n",
    "This classification task is similar to gait detection, although it uses a different classifier package. The trained classifier is a logistic regression, likewise trained on the dataset of the [Parkinson@Home Validation Study](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/). Filtering gait is the process of detecting and removing gait segments that contain other arm activities. This matters because people perform a wide range of arm activities during gait: keeping their hands in their pockets, holding a dog leash, or carrying a plate to the kitchen. We trained a classifier to detect these other arm activities during gait, enabling more accurate estimates of arm swing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Out of 2749 windows, 916 (33.3%) were predicted as no_other_arm_activity, and 1833 (66.7%) as other_arm_activity.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>pred_no_other_arm_activity_proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1463.00</td>\n",
       "      <td>0.199764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1463.75</td>\n",
       "      <td>0.107982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1464.50</td>\n",
       "      <td>0.138796</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1465.25</td>\n",
       "      <td>0.168050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1466.00</td>\n",
       "      <td>0.033986</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      time  pred_no_other_arm_activity_proba\n",
       "0  1463.00                          0.199764\n",
       "1  1463.75                          0.107982\n",
       "2  1464.50                          0.138796\n",
       "3  1465.25                          0.168050\n",
       "4  1466.00                          0.033986"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from paradigma.classification import ClassifierPackage\n",
    "from paradigma.pipelines.gait_pipeline import filter_gait\n",
    "\n",
    "# Set the path to the classifier package\n",
    "classifier_package_filename = 'gait_filtering_clf_package.pkl'\n",
    "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n",
    "\n",
    "# Load the classifier package\n",
    "clf_package_filtering = ClassifierPackage.load(full_path_to_classifier_package)\n",
    "\n",
    "# Detecting no_other_arm_activity returns the probability of no_other_arm_activity for each window, which is concatenated to\n",
    "# the original dataframe\n",
    "df_arm_activity['pred_no_other_arm_activity_proba'] = filter_gait(\n",
    "    df=df_arm_activity,\n",
    "    clf_package=clf_package_filtering\n",
    ")\n",
    "\n",
    "n_windows = df_arm_activity.shape[0]\n",
    "n_predictions_no_other_arm_activity = df_arm_activity.loc[df_arm_activity['pred_no_other_arm_activity_proba']>=clf_package_filtering.threshold].shape[0]\n",
    "perc_predictions_no_other_arm_activity = round(100 * n_predictions_no_other_arm_activity / n_windows, 1)\n",
    "n_predictions_other_arm_activity = df_arm_activity.loc[df_arm_activity['pred_no_other_arm_activity_proba']<clf_package_filtering.threshold].shape[0]\n",
    "perc_predictions_other_arm_activity = round(100 * n_predictions_other_arm_activity / n_windows, 1)\n",
    "\n",
    "print(f\"Out of {n_windows} windows, {n_predictions_no_other_arm_activity} ({perc_predictions_no_other_arm_activity}%) were predicted as no_other_arm_activity, and {n_predictions_other_arm_activity} ({perc_predictions_other_arm_activity}%) as other_arm_activity.\")\n",
    "\n",
    "# Only the time and predicted probabilities are shown, but the dataframe also contains\n",
    "# the extracted features\n",
    "df_arm_activity[['time', 'pred_no_other_arm_activity_proba']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6: Arm swing quantification\n",
    "The next step is to extract arm swing estimates from the predicted gait segments without other arm activities. Arm swing estimates can be calculated for both filtered and unfiltered gait, with the latter being predicted gait including all arm activities. Specifically, the range of motion (`'range_of_motion'`) and peak angular velocity (`'peak_velocity'`) are extracted.  \n",
    "\n",
    "This step creates gait segments from consecutively predicted gait windows. A new gait segment starts whenever the gap between consecutive gait predictions exceeds `config.max_segment_gap_s`. Furthermore, a gait segment is considered valid only if its length is at least `config.min_segment_length_s`.\n",
    "\n",
    "First, however, as in the arm activity feature extraction step, the predictions of the previous step must be merged with the preprocessed time series data."
   ]
  },
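  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The segmentation rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used by `quantify_arm_swing`: the helper name `split_into_segments` is hypothetical, and segment length is taken here as the difference between the last and first timestamp, which may differ from how ParaDigMa computes durations.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def split_into_segments(time_s, max_gap_s, min_length_s):\n",
    "    # Hypothetical helper: returns (start_time, end_time) pairs of valid segments\n",
    "    time_s = np.asarray(time_s, dtype=float)\n",
    "    if time_s.size == 0:\n",
    "        return []\n",
    "    # A new segment starts wherever the gap to the previous timestamp exceeds max_gap_s\n",
    "    break_idx = np.flatnonzero(np.diff(time_s) > max_gap_s) + 1\n",
    "    segments = np.split(time_s, break_idx)\n",
    "    # Keep only segments spanning at least min_length_s\n",
    "    return [\n",
    "        (float(seg[0]), float(seg[-1]))\n",
    "        for seg in segments\n",
    "        if seg[-1] - seg[0] >= min_length_s\n",
    "    ]\n",
    "\n",
    "# Toy example (assumed values): 1 Hz timestamps with two large gaps and one isolated sample\n",
    "toy_time_s = [0, 1, 2, 3, 10, 20, 21, 22]\n",
    "print(split_into_segments(toy_time_s, max_gap_s=1.5, min_length_s=1.5))\n",
    "```\n",
    "\n",
    "In the toy example, the isolated timestamp at 10 seconds forms a segment of its own, which is then discarded for being shorter than `min_length_s`."
   ]
  },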
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Merge arm activity predictions into timeseries data\n",
    "\n",
    "if not any(df_arm_activity[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA] >= clf_package_filtering.threshold):\n",
    "    raise ValueError(\"No gait without other arm activities detected in the input data.\")\n",
    "\n",
    "config = GaitConfig(step='arm_activity')\n",
    "\n",
    "df = merge_predictions_with_timestamps(\n",
    "    df_ts=df_preprocessed, \n",
    "    df_predictions=df_arm_activity, \n",
    "    pred_proba_colname=DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA,\n",
    "    window_length_s=config.window_length_s,\n",
    "    fs=config.sampling_frequency\n",
    ")\n",
    "\n",
    "# Add a column for predicted gait without other arm activities, based on a fitted threshold\n",
    "df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY] = (df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA] >= clf_package_filtering.threshold).astype(int)\n",
    "\n",
    "# Filter the DataFrame to only include predicted gait without other arm activities (1)\n",
    "df = df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The arm swing quantification is based on the filtered gait segments.\n",
      "\n",
      "Gait segments are created of minimum 1.5 seconds and maximum 1.5 seconds gap between segments.\n",
      "\n",
      "A total of 84 filtered gait segments have been quantified.\n",
      "\n",
      "Metadata of the first gait segment:\n",
      "{'duration_s': 9.0,\n",
      " 'end_time_s': 2230.74,\n",
      " 'segment_category': 'moderately_long',\n",
      " 'start_time_s': 2221.75}\n",
      "\n",
      "Individual arm swings of the first gait segment of the filtered dataset:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>segment_nr</th>\n",
       "      <th>segment_category</th>\n",
       "      <th>range_of_motion</th>\n",
       "      <th>peak_velocity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>19.218491</td>\n",
       "      <td>90.807689</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>21.267287</td>\n",
       "      <td>105.781357</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>23.582098</td>\n",
       "      <td>103.932332</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>23.757712</td>\n",
       "      <td>114.846304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>17.430734</td>\n",
       "      <td>63.297391</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>12.139037</td>\n",
       "      <td>59.740258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>6.681346</td>\n",
       "      <td>36.802784</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>6.293493</td>\n",
       "      <td>30.793498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>7.892546</td>\n",
       "      <td>42.481470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>9.633521</td>\n",
       "      <td>43.837249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>9.679263</td>\n",
       "      <td>38.867993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>9.437900</td>\n",
       "      <td>34.112233</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>moderately_long</td>\n",
       "      <td>9.272199</td>\n",
       "      <td>33.344802</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    segment_nr segment_category  range_of_motion  peak_velocity\n",
       "0            1  moderately_long        19.218491      90.807689\n",
       "1            1  moderately_long        21.267287     105.781357\n",
       "2            1  moderately_long        23.582098     103.932332\n",
       "3            1  moderately_long        23.757712     114.846304\n",
       "4            1  moderately_long        17.430734      63.297391\n",
       "5            1  moderately_long        12.139037      59.740258\n",
       "6            1  moderately_long         6.681346      36.802784\n",
       "7            1  moderately_long         6.293493      30.793498\n",
       "8            1  moderately_long         7.892546      42.481470\n",
       "9            1  moderately_long         9.633521      43.837249\n",
       "10           1  moderately_long         9.679263      38.867993\n",
       "11           1  moderately_long         9.437900      34.112233\n",
       "12           1  moderately_long         9.272199      33.344802"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from paradigma.pipelines.gait_pipeline import quantify_arm_swing\n",
    "from pprint import pprint\n",
    "\n",
    "# Set to True to quantify arm swing based on the filtered gait segments, and False\n",
    "# to quantify arm swing based on all gait segments\n",
    "filtered = True\n",
    "\n",
    "if filtered:\n",
    "    dataset_used = 'filtered'\n",
    "    print(\"The arm swing quantification is based on the filtered gait segments.\\n\")\n",
    "else:\n",
    "    dataset_used = 'unfiltered'\n",
    "    print(\"The arm swing quantification is based on all gait segments.\\n\")\n",
    "\n",
    "quantified_arm_swing, gait_segment_meta = quantify_arm_swing(\n",
    "    df=df,\n",
    "    fs=config.sampling_frequency,\n",
    "    filtered=filtered,\n",
    "    max_segment_gap_s=config.max_segment_gap_s,\n",
    "    min_segment_length_s=config.min_segment_length_s,\n",
    ")\n",
    "\n",
    "print(f\"Gait segments are created of minimum {config.min_segment_length_s} seconds and maximum {config.max_segment_gap_s} seconds gap between segments.\\n\")\n",
    "print(f\"A total of {quantified_arm_swing['segment_nr'].nunique()} {dataset_used} gait segments have been quantified.\")\n",
    "\n",
    "print(\"\\nMetadata of the first gait segment:\")\n",
    "pprint(gait_segment_meta['per_segment'][1])\n",
    "\n",
    "print(f\"\\nIndividual arm swings of the first gait segment of the {dataset_used} dataset:\")\n",
    "quantified_arm_swing.loc[quantified_arm_swing['segment_nr']==1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gait segment categories are defined as follows:\n",
    "- short: < 5 seconds\n",
    "- moderately_long: 5-10 seconds\n",
    "- long: 10-20 seconds\n",
    "- very_long: > 20 seconds\n",
    "\n",
    "As noted before, the gait segments (and their categories) are determined from predicted gait (unfiltered gait). Therefore, when arm swing is quantified on filtered gait, a gait segment may be shorter than its category suggests, because parts of the segment were predicted to contain other arm activities. The category itself remains unchanged."
   ]
  },
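  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The category boundaries listed above can be sketched as a small lookup function. This is an illustrative sketch and not `ParaDigMa`'s implementation: the name `categorize_segment` is hypothetical, and assigning a boundary value (exactly 5, 10 or 20 seconds) to the longer category is an assumption, since the list above does not specify it.\n",
    "\n",
    "```python\n",
    "def categorize_segment(duration_s):\n",
    "    # Map a gait segment duration in seconds to a category label.\n",
    "    # Assumption: boundary values fall into the longer category.\n",
    "    if duration_s < 5:\n",
    "        return 'short'\n",
    "    if duration_s < 10:\n",
    "        return 'moderately_long'\n",
    "    if duration_s < 20:\n",
    "        return 'long'\n",
    "    return 'very_long'\n",
    "\n",
    "\n",
    "# Example durations in seconds (made up for illustration)\n",
    "for duration_s in (3.0, 7.5, 12.0, 45.0):\n",
    "    print(duration_s, categorize_segment(duration_s))\n",
    "```\n",
    "\n",
    "Because the category is derived from the unfiltered gait segment, the duration passed in would be that of the predicted gait segment, not of the part that remains after filtering."
   ]
  },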
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run steps 1-6 for all raw data segments <a id='multiple_segments_cell'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
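    "If your data is also stored in multiple raw data segments, set `raw_data_segments` in the cell below to the list of segment identifiers used in your file names. In this example the identifiers are `'0001'` and `'0002'`, which are inserted into the file prefix `IMU_segment{raw_data_segment_nr}`."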
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pathlib import Path\n",
    "from importlib.resources import files\n",
    "from pprint import pprint\n",
    "\n",
    "from paradigma.util import load_tsdf_dataframe, merge_predictions_with_timestamps\n",
    "from paradigma.config import IMUConfig, GaitConfig\n",
    "from paradigma.preprocessing import preprocess_imu_data\n",
    "from paradigma.pipelines.gait_pipeline import extract_gait_features, detect_gait, extract_arm_activity_features, filter_gait, quantify_arm_swing\n",
    "from paradigma.constants import DataColumns\n",
    "from paradigma.classification import ClassifierPackage\n",
    "\n",
    "# Set the path to where the prepared data is saved\n",
    "path_to_data = Path('../../example_data')\n",
    "path_to_prepared_data = path_to_data / 'imu'\n",
    "\n",
    "# Load the gait detection classifier package\n",
    "classifier_package_filename = 'gait_detection_clf_package.pkl'\n",
    "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n",
    "clf_package_detection = ClassifierPackage.load(full_path_to_classifier_package)\n",
    "\n",
    "# Load the gait filtering classifier package\n",
    "classifier_package_filename = 'gait_filtering_clf_package.pkl'\n",
    "full_path_to_classifier_package = files('paradigma') / 'assets' / classifier_package_filename\n",
    "clf_package_filtering = ClassifierPackage.load(full_path_to_classifier_package)\n",
    "\n",
    "# Set to True to quantify arm swing based on the filtered gait segments, and False\n",
    "# to quantify arm swing based on all gait segments\n",
    "filtered = True\n",
    "\n",
    "# Create a list to store all quantified arm swing segments\n",
    "list_quantified_arm_swing = []\n",
    "\n",
    "raw_data_segments = ['0001', '0002']  # list with all available raw data segments\n",
    "\n",
    "for raw_data_segment_nr in raw_data_segments:\n",
    "    \n",
    "    # Load the data\n",
    "    df_imu, _, _ = load_tsdf_dataframe(path_to_prepared_data, prefix=f'IMU_segment{raw_data_segment_nr}')\n",
    "\n",
    "    # 1: Preprocess the data\n",
    "    config = IMUConfig()\n",
    "\n",
    "    df_preprocessed = preprocess_imu_data(\n",
    "        df=df_imu, \n",
    "        config=config,\n",
    "        sensor='both',\n",
    "        watch_side='left',\n",
    "    )\n",
    "\n",
    "    # 2: Extract gait features\n",
    "    config = GaitConfig(step='gait')\n",
    "\n",
    "    df_gait = extract_gait_features(\n",
    "        df=df_preprocessed, \n",
    "        config=config\n",
    "    )\n",
    "\n",
    "    # 3: Detect gait\n",
    "    df_gait['pred_gait_proba'] = detect_gait(\n",
    "        df=df_gait,\n",
    "        clf_package=clf_package_detection\n",
    "    )\n",
    "\n",
    "    # Merge gait predictions into timeseries data\n",
    "    if not any(df_gait[DataColumns.PRED_GAIT_PROBA] >= clf_package_detection.threshold):\n",
    "        raise ValueError(\"No gait detected in the input data.\")\n",
    "    \n",
    "    df = merge_predictions_with_timestamps(\n",
    "        df_ts=df_preprocessed, \n",
    "        df_predictions=df_gait, \n",
    "        pred_proba_colname=DataColumns.PRED_GAIT_PROBA,\n",
    "        window_length_s=config.window_length_s,\n",
    "        fs=config.sampling_frequency\n",
    "    )\n",
    "\n",
    "    df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= clf_package_detection.threshold).astype(int)\n",
    "    df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)\n",
    "\n",
    "    # 4: Extract arm activity features\n",
    "    config = GaitConfig(step='arm_activity')\n",
    "\n",
    "    df_arm_activity = extract_arm_activity_features(\n",
    "        df=df, \n",
    "        config=config,\n",
    "    )\n",
    "\n",
    "    # 5: Filter gait\n",
    "    df_arm_activity['pred_no_other_arm_activity_proba'] = filter_gait(\n",
    "        df=df_arm_activity,\n",
    "        clf_package=clf_package_filtering\n",
    "    )\n",
    "\n",
    "    # Merge arm activity predictions into timeseries data\n",
    "    if not any(df_arm_activity[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA] >= clf_package_filtering.threshold):\n",
    "        raise ValueError(\"No gait without other arm activities detected in the input data.\")\n",
    "\n",
    "    df = merge_predictions_with_timestamps(\n",
    "        df_ts=df_preprocessed, \n",
    "        df_predictions=df_arm_activity, \n",
    "        pred_proba_colname=DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA,\n",
    "        window_length_s=config.window_length_s,\n",
    "        fs=config.sampling_frequency\n",
    "    )\n",
    "\n",
    "    df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY] = (df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY_PROBA] >= clf_package_filtering.threshold).astype(int)\n",
    "    df = df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].reset_index(drop=True)\n",
    "\n",
    "    # 6: Quantify arm swing\n",
    "    quantified_arm_swing, gait_segment_meta = quantify_arm_swing(\n",
    "        df=df,\n",
    "        fs=config.sampling_frequency,\n",
    "        filtered=filtered,\n",
    "        max_segment_gap_s=config.max_segment_gap_s,\n",
    "        min_segment_length_s=config.min_segment_length_s,\n",
    "    )\n",
    "\n",
    "    # Add the predictions of the current raw data segment to the list\n",
    "    quantified_arm_swing['raw_data_segment_nr'] = raw_data_segment_nr\n",
    "    list_quantified_arm_swing.append(quantified_arm_swing)\n",
    "\n",
    "quantified_arm_swing = pd.concat(list_quantified_arm_swing, ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7: Aggregation\n",
    "Finally, the arm swing estimates can be aggregated across all gait segments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'long': {'duration_s': 60.75,\n",
      "          'median_range_of_motion': 15.78108745792784,\n",
      "          '95p_range_of_motion': 45.16540046751929,\n",
      "          'median_peak_velocity': 86.83257977334745,\n",
      "          '95p_peak_velocity': 219.97254034894718},\n",
      " 'short': {'duration_s': 153.75,\n",
      "           'median_range_of_motion': 14.225382307390944,\n",
      "           '95p_range_of_motion': 40.53847370093226,\n",
      "           'median_peak_velocity': 71.56035976932178,\n",
      "           '95p_peak_velocity': 197.13328716416063},\n",
      " 'very_long': {'duration_s': 1905.75,\n",
      "               'median_range_of_motion': 25.2896510096605,\n",
      "               '95p_range_of_motion': 43.74907398039543,\n",
      "               'median_peak_velocity': 125.9443142903539,\n",
      "               '95p_peak_velocity': 217.80854223601992},\n",
      " 'moderately_long': {'duration_s': 187.5,\n",
      "                     'median_range_of_motion': 15.73004566220565,\n",
      "                     '95p_range_of_motion': 54.55881567144294,\n",
      "                     'median_peak_velocity': 77.94780939826387,\n",
      "                     '95p_peak_velocity': 256.9799773546029},\n",
      " 'all_segment_categories': {'duration_s': 2307.75,\n",
      "                            'median_range_of_motion': 23.100608971051315,\n",
      "                            '95p_range_of_motion': 45.92600123148869,\n",
      "                            'median_peak_velocity': 116.50364930684765,\n",
      "                            '95p_peak_velocity': 219.2008357820751}}\n"
     ]
    }
   ],
   "source": [
    "from paradigma.pipelines.gait_pipeline import aggregate_arm_swing_params\n",
    "\n",
    "arm_swing_aggregations = aggregate_arm_swing_params(\n",
    "    df_arm_swing_params=quantified_arm_swing,\n",
    "    segment_meta=gait_segment_meta['per_segment'],\n",
    "    aggregates=['median', '95p']\n",
    ")\n",
    "\n",
    "pprint(arm_swing_aggregations, sort_dicts=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the aggregation step contains the aggregated arm swing parameters per gait segment category. Additionally, the total duration in seconds (`duration_s`) is included per category to indicate how much data each aggregation is based on."
   ]
  }
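  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `median` and `95p` aggregates more concrete, the sketch below computes them per segment category on a few made-up values, using only the Python standard library. It is an illustration under stated assumptions and not the implementation of `aggregate_arm_swing_params`: the helper names `percentile` and `summarize` are hypothetical, and the linear-interpolation percentile is an assumed definition of `95p`.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "from statistics import median\n",
    "\n",
    "\n",
    "def percentile(values, q):\n",
    "    # Percentile with linear interpolation between the two closest ranks\n",
    "    ordered = sorted(values)\n",
    "    position = (len(ordered) - 1) * q / 100\n",
    "    lower = int(position)\n",
    "    upper = min(lower + 1, len(ordered) - 1)\n",
    "    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)\n",
    "\n",
    "\n",
    "def summarize(values):\n",
    "    # Median and 95th percentile of a list of arm swing estimates\n",
    "    return {'median': median(values), '95p': percentile(values, 95)}\n",
    "\n",
    "\n",
    "# Made-up (segment_category, range_of_motion) pairs, for illustration only\n",
    "toy_arm_swings = [\n",
    "    ('short', 10.0), ('short', 20.0), ('short', 30.0),\n",
    "    ('long', 15.0), ('long', 25.0),\n",
    "]\n",
    "\n",
    "# Group the values per segment category\n",
    "values_per_category = defaultdict(list)\n",
    "for category, range_of_motion in toy_arm_swings:\n",
    "    values_per_category[category].append(range_of_motion)\n",
    "\n",
    "# Aggregate per category, then across all categories\n",
    "summary = {category: summarize(values) for category, values in values_per_category.items()}\n",
    "summary['all_segment_categories'] = summarize([value for _, value in toy_arm_swings])\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "Reporting the amount of data next to each aggregate, as the pipeline does with the duration, helps to judge how much weight a category's estimate can carry."
   ]
  }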
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "paradigma-Fn6RLG4_-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}