Tremor analysis
This tutorial shows how to run the tremor pipeline to obtain aggregated tremor measures from gyroscope sensor data. Before following along, make sure you have completed the steps in the data preparation tutorial.
In this tutorial, we use two days of data from a participant of the Personalized Parkinson Project to demonstrate the functionalities. Since ParaDigMa expects contiguous time series, the collected data was stored in two segments, each with contiguous timestamps. Per segment, we load the data and perform the following steps:
Preprocess the time series data
Extract tremor features
Detect tremor
We then combine the output of the different segments for the final step:
Compute aggregated tremor measures
Load example data
Here, we start by loading a single contiguous time series (segment), on which we run steps 1-3. Further below we show how to run these steps for multiple segments.
We use the internally developed TSDF (documentation) to load and store data [1]. Depending on the file extension of your time series data, examples of other Python functions for loading the data into memory include:
.csv: pandas.read_csv() (documentation)
.json: json.load() (documentation)
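As an illustration, here is a minimal sketch of loading such files with pandas and the json module. The column names and file contents below are hypothetical stand-ins for your own data:

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv
csv_text = (
    "time,gyroscope_x,gyroscope_y,gyroscope_z\n"
    "0.00,0.0,1.40,0.24\n"
    "0.01,0.43,0.66,-0.12\n"
)
df = pd.read_csv(StringIO(csv_text))

# Hypothetical JSON content holding the same samples as a list of records
json_text = json.dumps(df.to_dict(orient="records"))
records = json.loads(json_text)
df_from_json = pd.DataFrame(records)
```

Either way, the goal is a dataframe with one row per sample, which is the shape the pipeline functions below expect.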
from pathlib import Path
from paradigma.util import load_tsdf_dataframe
# Set the path to where the prepared data is saved and load the data.
# Note: the test data is stored in TSDF, but you can load your data in your own way
path_to_data = Path('../../example_data')
path_to_prepared_data = path_to_data / 'imu'
segment_nr = '0001'
df_data, metadata_time, metadata_values = load_tsdf_dataframe(path_to_prepared_data, prefix=f'IMU_segment{segment_nr}')
df_data
Step 1: Preprocess data
IMU sensors are configured to sample at a fixed frequency, but in practice the intervals between consecutive timestamps vary slightly. The preprocess_imu_data function therefore resamples the timestamps to be uniformly distributed, and interpolates the IMU values at these new timestamps from the original timestamps and corresponding IMU values. By setting sensor to 'gyroscope', only the gyroscope data is preprocessed and the accelerometer data is removed from the dataframe. A watch_side should also be provided, although for the tremor analysis it does not matter whether this is the correct side: the tremor features are not influenced by the orientation of the gyroscope axes.
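The resample-then-interpolate idea can be sketched in a few lines of NumPy. The timestamps and values below are hypothetical; preprocess_imu_data handles this (and more) internally:

```python
import numpy as np

# Hypothetical non-uniform timestamps (s) and one gyroscope axis (deg/s)
t_raw = np.array([0.000, 0.012, 0.019, 0.031, 0.042])
v_raw = np.array([0.0,   1.2,   1.9,   3.1,   4.2])

fs = 100  # target sampling frequency in Hz, as in IMUConfig
# Uniformly spaced timestamps covering the original recording
t_uniform = np.arange(t_raw[0], t_raw[-1], 1 / fs)

# Linear interpolation of the sensor values at the uniform timestamps
v_uniform = np.interp(t_uniform, t_raw, v_raw)
```

After this step every pair of consecutive samples is exactly 1/fs seconds apart, which the windowing in Step 2 relies on.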
from paradigma.config import IMUConfig
from paradigma.preprocessing import preprocess_imu_data
config = IMUConfig()
print(f'The data is resampled to {config.sampling_frequency} Hz.')
df_preprocessed_data = preprocess_imu_data(df_data, config, sensor='gyroscope', watch_side='left')
df_preprocessed_data
The data is resampled to 100 Hz.
 | time | gyroscope_x | gyroscope_y | gyroscope_z |
---|---|---|---|---|
0 | 0.00 | 0.000000 | 1.402439 | 0.243902 |
1 | 0.01 | 0.432231 | 0.665526 | -0.123434 |
2 | 0.02 | 1.164277 | -0.069584 | -0.307536 |
3 | 0.03 | 1.151432 | -0.554928 | -0.554223 |
4 | 0.04 | 0.657189 | -0.603207 | -0.731570 |
... | ... | ... | ... | ... |
3433956 | 34339.56 | 130.392434 | 29.491627 | -26.868202 |
3433957 | 34339.57 | 135.771133 | -184.515525 | -21.544211 |
3433958 | 34339.58 | 146.364103 | -324.248909 | -5.248641 |
3433959 | 34339.59 | 273.675024 | -293.011330 | 14.618256 |
3433960 | 34339.60 | 372.878731 | -158.516265 | 35.330770 |
3433961 rows × 4 columns
Step 2: Extract tremor features
The function extract_tremor_features extracts windows from the preprocessed gyroscope data using non-overlapping windows of length config.window_length_s. Next, the tremor features are extracted from these windows: 12 mel-frequency cepstral coefficients (MFCCs), the frequency of the peak in the power spectral density, the power below tremor (0.5-3 Hz), and the power around the tremor peak. The latter is not used for tremor detection, but to compute aggregated measures of tremor power in Step 4.
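The non-overlapping windowing can be sketched with a reshape. The signal here is synthetic; extract_tremor_features does the windowing and the feature computation for you:

```python
import numpy as np

window_length_s = 4   # as in TremorConfig
fs = 100              # sampling frequency after preprocessing (Hz)
samples_per_window = window_length_s * fs

# Hypothetical 10-second single-axis signal: only two full 4 s windows fit
signal = np.arange(10 * fs, dtype=float)
n_windows = len(signal) // samples_per_window
# Drop the incomplete tail, then reshape into (n_windows, samples_per_window)
windows = signal[: n_windows * samples_per_window].reshape(n_windows, samples_per_window)

# Window start times in seconds, matching the 'time' column of df_features
window_starts = np.arange(n_windows) * window_length_s
```

This is why the 'time' column of df_features below advances in steps of 4 seconds.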
from paradigma.config import TremorConfig
from paradigma.pipelines.tremor_pipeline import extract_tremor_features
config = TremorConfig(step='features')
print(f'The window length is {config.window_length_s} seconds')
df_features = extract_tremor_features(df_preprocessed_data, config)
df_features
The window length is 4 seconds
 | time | mfcc_1 | mfcc_2 | mfcc_3 | mfcc_4 | mfcc_5 | mfcc_6 | mfcc_7 | mfcc_8 | mfcc_9 | mfcc_10 | mfcc_11 | mfcc_12 | freq_peak | below_tremor_power | tremor_power |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 5.323582 | 1.179579 | -0.498552 | -0.149152 | -0.063535 | -0.132090 | -0.112380 | -0.044326 | -0.025917 | 0.116045 | 0.169869 | 0.213884 | 3.75 | 0.082219 | 0.471588 |
1 | 4.0 | 5.333162 | 1.205712 | -0.607844 | -0.138371 | -0.039518 | -0.137703 | -0.069552 | -0.008029 | -0.087711 | 0.089844 | 0.152380 | 0.195165 | 3.75 | 0.071260 | 0.327252 |
2 | 8.0 | 5.180974 | 1.039548 | -0.627100 | -0.054816 | -0.016767 | -0.044817 | 0.079859 | -0.023155 | 0.024729 | 0.104989 | 0.126502 | 0.192319 | 7.75 | 0.097961 | 0.114138 |
3 | 12.0 | 5.290298 | 1.183957 | -0.627651 | -0.027235 | 0.095184 | -0.050455 | -0.024654 | 0.029754 | -0.007459 | 0.125700 | 0.146895 | 0.220589 | 7.75 | 0.193237 | 0.180988 |
4 | 16.0 | 5.128074 | 1.066869 | -0.622282 | 0.038557 | -0.034719 | 0.045109 | 0.076679 | 0.057267 | -0.024619 | 0.131755 | 0.177849 | 0.149686 | 7.75 | 0.156469 | 0.090009 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8579 | 34316.0 | 7.071408 | -0.376556 | 0.272322 | 0.068750 | 0.051588 | 0.102012 | 0.055017 | 0.115942 | 0.012746 | 0.117970 | 0.073279 | 0.057367 | 13.50 | 48.930380 | 91.971686 |
8580 | 34320.0 | 1.917642 | 0.307927 | 0.142330 | 0.265357 | 0.285635 | 0.143886 | 0.259636 | 0.195724 | 0.176947 | 0.162205 | 0.147897 | 0.170488 | 11.00 | 0.012123 | 0.000316 |
8581 | 34324.0 | 2.383806 | 0.268580 | 0.151254 | 0.414430 | 0.241540 | 0.244071 | 0.201109 | 0.209611 | 0.097146 | 0.048798 | 0.013239 | 0.035379 | 2.00 | 0.013077 | 0.000615 |
8582 | 34328.0 | 1.883626 | 0.089983 | 0.196880 | 0.300523 | 0.239185 | 0.259342 | 0.277586 | 0.206517 | 0.178499 | 0.215561 | 0.067234 | 0.123958 | 13.75 | 0.011466 | 0.000211 |
8583 | 34332.0 | 2.599103 | 0.286252 | -0.014529 | 0.475488 | 0.229446 | 0.188200 | 0.173689 | 0.033262 | 0.138957 | 0.106176 | 0.036859 | 0.082178 | 12.50 | 0.015068 | 0.000891 |
8584 rows × 16 columns
Step 3: Detect tremor
The function detect_tremor uses a pretrained logistic regression classifier to predict the tremor probability (pred_tremor_proba) for each window, based on the MFCCs. Using a prespecified threshold, a tremor label of 0 (no tremor) or 1 (tremor) is assigned (pred_tremor_logreg). Furthermore, the detected tremor windows are checked for rest tremor in two ways. First, the frequency of the peak should be between 3 and 7 Hz. Second, we want to exclude windows with significant arm movement; a window is considered to have significant arm movement if below_tremor_power exceeds config.movement_threshold. The final tremor label is saved in pred_tremor_checked. A label for the arm being predicted at rest (pred_arm_at_rest, which is 1 when at rest and 0 when not at rest) is also saved; it is used in Step 4 to control for the amount of arm movement during the observed time period when aggregating the amount of tremor time (if a person is moving their arm, they cannot have rest tremor).
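The two rest tremor checks can be reproduced on a toy dataframe as follows. This is only a sketch of the logic described above; the actual implementation lives inside detect_tremor, and the example values are hypothetical:

```python
import pandas as pd

movement_threshold = 50  # deg^2/s^2, as in config.movement_threshold
fmin, fmax = 3, 7        # rest tremor frequency range in Hz

# Hypothetical per-window classifier output and features
df = pd.DataFrame({
    'pred_tremor_logreg': [1, 1, 1, 0],
    'freq_peak':          [4.0, 9.0, 5.0, 4.0],
    'below_tremor_power': [0.1, 0.1, 80.0, 0.1],
})

# Arm at rest (or in stable posture): low non-tremor movement power
df['pred_arm_at_rest'] = (df['below_tremor_power'] <= movement_threshold).astype(int)

# Checked tremor label: detected by the classifier, peak within 3-7 Hz,
# and no significant arm movement
df['pred_tremor_checked'] = (
    (df['pred_tremor_logreg'] == 1)
    & df['freq_peak'].between(fmin, fmax)
    & (df['pred_arm_at_rest'] == 1)
).astype(int)
```

In this toy example only the first window survives both checks: the second has its peak outside 3-7 Hz, the third has too much arm movement, and the fourth was not detected as tremor in the first place.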
from importlib.resources import files
from paradigma.pipelines.tremor_pipeline import detect_tremor
print(f'A threshold of {config.movement_threshold} deg\u00b2/s\u00b2 \
is used to determine whether the arm is at rest or in stable posture.')
# Load the pre-trained logistic regression classifier
tremor_detection_classifier_package_filename = 'tremor_detection_clf_package.pkl'
full_path_to_classifier_package = files('paradigma') / 'assets' / tremor_detection_classifier_package_filename
# Use the logistic regression classifier to detect tremor and check for rest tremor
df_predictions = detect_tremor(df_features, config, full_path_to_classifier_package)
df_predictions[['time', 'pred_tremor_proba', 'pred_tremor_logreg', 'pred_arm_at_rest', 'pred_tremor_checked']]
A threshold of 50 deg²/s² is used to determine whether the arm is at rest or in stable posture.
 | time | pred_tremor_proba | pred_tremor_logreg | pred_arm_at_rest | pred_tremor_checked |
---|---|---|---|---|---|
0 | 0.0 | 0.038968 | 1 | 1 | 1 |
1 | 4.0 | 0.035365 | 1 | 1 | 1 |
2 | 8.0 | 0.031255 | 1 | 1 | 0 |
3 | 12.0 | 0.021106 | 0 | 1 | 0 |
4 | 16.0 | 0.021078 | 0 | 1 | 0 |
... | ... | ... | ... | ... | ... |
8579 | 34316.0 | 0.000296 | 0 | 1 | 0 |
8580 | 34320.0 | 0.000089 | 0 | 1 | 0 |
8581 | 34324.0 | 0.000023 | 0 | 1 | 0 |
8582 | 34328.0 | 0.000053 | 0 | 1 | 0 |
8583 | 34332.0 | 0.000049 | 0 | 1 | 0 |
8584 rows × 5 columns
Store as TSDF
The predicted probabilities (and optionally other features) can be stored and loaded in TSDF as demonstrated below.
import tsdf
from paradigma.util import write_df_data
# Set 'path_to_data' to the directory where you want to save the data
metadata_time_store = tsdf.TSDFMetadata(metadata_time.get_plain_tsdf_dict_copy(), path_to_data)
metadata_values_store = tsdf.TSDFMetadata(metadata_values.get_plain_tsdf_dict_copy(), path_to_data)
# Select the columns to be saved
metadata_time_store.channels = ['time']
metadata_values_store.channels = ['pred_tremor_proba', 'pred_tremor_logreg', 'pred_arm_at_rest', 'pred_tremor_checked']
# Set the units
metadata_time_store.units = ['Relative seconds']
metadata_values_store.units = ['Unitless', 'Unitless', 'Unitless', 'Unitless']
metadata_time_store.data_type = float
metadata_values_store.data_type = float
# Set the filenames
meta_store_filename = f'segment{segment_nr}_meta.json'
values_store_filename = meta_store_filename.replace('_meta.json', '_values.bin')
time_store_filename = meta_store_filename.replace('_meta.json', '_time.bin')
metadata_values_store.file_name = values_store_filename
metadata_time_store.file_name = time_store_filename
write_df_data(metadata_time_store, metadata_values_store, path_to_data, meta_store_filename, df_predictions)
df_predictions, _, _ = load_tsdf_dataframe(path_to_data, prefix=f'segment{segment_nr}')
df_predictions.head()
 | time | pred_tremor_proba | pred_tremor_logreg | pred_arm_at_rest | pred_tremor_checked |
---|---|---|---|---|---|
0 | 0.0 | 0.038968 | 1.0 | 1.0 | 1.0 |
1 | 4.0 | 0.035365 | 1.0 | 1.0 | 1.0 |
2 | 8.0 | 0.031255 | 1.0 | 1.0 | 0.0 |
3 | 12.0 | 0.021106 | 0.0 | 1.0 | 0.0 |
4 | 16.0 | 0.021078 | 0.0 | 1.0 | 0.0 |
Run steps 1 - 3 for all segments
If your data is also stored in multiple segments, you can modify segments in the cell below to a list of the filename prefixes of your segmented data.
import pandas as pd
import datetime
import pytz
from pathlib import Path
from importlib.resources import files
from paradigma.util import load_tsdf_dataframe
from paradigma.config import IMUConfig, TremorConfig
from paradigma.preprocessing import preprocess_imu_data
from paradigma.pipelines.tremor_pipeline import extract_tremor_features, detect_tremor
# Set the path to where the prepared data is saved
path_to_data = Path('../../example_data')
path_to_prepared_data = path_to_data / 'imu'
# Load the pre-trained logistic regression classifier
tremor_detection_classifier_package_filename = 'tremor_detection_clf_package.pkl'
full_path_to_classifier_package = files('paradigma') / 'assets' / tremor_detection_classifier_package_filename
# Create a list of dataframes to store the predictions of all segments
list_df_predictions = []
segments = ['0001','0002'] # list with all available segments
for segment_nr in segments:

    # Load the data
    df_data, metadata_time, _ = load_tsdf_dataframe(path_to_prepared_data, prefix='IMU_segment'+segment_nr)

    # 1: Preprocess the data
    config = IMUConfig()
    df_preprocessed_data = preprocess_imu_data(df_data, config, sensor='gyroscope', watch_side='left')

    # 2: Extract features
    config = TremorConfig(step='features')
    df_features = extract_tremor_features(df_preprocessed_data, config)

    # 3: Detect tremor
    df_predictions = detect_tremor(df_features, config, full_path_to_classifier_package)

    # Create datetime column based on the start time of the segment
    start_time = datetime.datetime.strptime(metadata_time.start_iso8601, '%Y-%m-%dT%H:%M:%SZ')
    start_time = start_time.replace(tzinfo=pytz.timezone('UTC')).astimezone(pytz.timezone('CET'))  # convert to the correct timezone if necessary
    df_predictions['time_dt'] = start_time + pd.to_timedelta(df_predictions['time'], unit='s')

    # Add the predictions of the current segment to the list
    df_predictions['segment_nr'] = segment_nr
    list_df_predictions.append(df_predictions)

df_predictions = pd.concat(list_df_predictions, ignore_index=True)
Step 4: Compute aggregated tremor measures
The final step is to compute the amount of tremor time and tremor power with the function aggregate_tremor, which aggregates over all windows in the input dataframe. Depending on the size of the input dataframe, you can select the hours and days (both optional) that you want to include in this analysis. In this case we use data collected between 8 am and 10 pm (specified as select_hours_start and select_hours_end), and days with at least 10 hours of data (min_hours_per_day). Based on the selected data, we compute aggregated measures for tremor time and tremor power:
Tremor time is calculated as the number of detected tremor windows, as a percentage of the number of windows during which the arm is at rest or in stable posture (i.e., below_tremor_power does not exceed config.movement_threshold). This way the tremor time is controlled for the amount of time the arm is at rest or in stable posture, when rest tremor and re-emergent tremor could occur.
For tremor power, the following aggregates are derived: the mode, median, and 90th percentile of tremor power (specified in config.aggregates_tremor_power). The median and modal tremor power reflect the typical tremor severity, whereas the 90th percentile reflects the maximal tremor severity within the observed timeframe.
The aggregated tremor measures and metadata are stored in a json file.
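Two of these aggregates can be sketched directly in pandas. This is a simplified illustration on hypothetical window labels, assuming tremor power is aggregated over the detected tremor windows; aggregate_tremor additionally computes the modal tremor power and the metadata:

```python
import pandas as pd

# Hypothetical per-window predictions after hour/day selection
df = pd.DataFrame({
    'pred_arm_at_rest':    [1, 1, 1, 1, 0],
    'pred_tremor_checked': [1, 0, 1, 0, 0],
    'tremor_power':        [0.5, 0.0, 1.5, 0.0, 0.0],
})

# Tremor time: tremor windows as a percentage of the windows
# during which the arm is at rest or in stable posture
rest = df[df['pred_arm_at_rest'] == 1]
perc_windows_tremor = 100 * rest['pred_tremor_checked'].sum() / len(rest)

# Tremor power aggregates over the detected tremor windows
tremor_power = df.loc[df['pred_tremor_checked'] == 1, 'tremor_power']
median_tremor_power = tremor_power.median()
p90_tremor_power = tremor_power.quantile(0.9)
```

Here 2 of the 4 rest windows are tremor windows, so the tremor time is 50%, mirroring the perc_windows_tremor and nr_windows_rest fields in the output below.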
import pprint
from paradigma.util import select_hours, select_days
from paradigma.pipelines.tremor_pipeline import aggregate_tremor
select_hours_start = '08:00' # you can specify the hours and minutes here
select_hours_end = '22:00'
min_hours_per_day = 10
print(f'Before aggregation we select data collected between {select_hours_start} \
and {select_hours_end}. We also select days with at least {min_hours_per_day} hours of data.')
print(f'The following tremor power aggregates are derived: {config.aggregates_tremor_power}.')
# Select the hours that should be included in the analysis
df_predictions = select_hours(df_predictions, select_hours_start, select_hours_end)
# Remove days with less than the specified minimum amount of hours
df_predictions = select_days(df_predictions, min_hours_per_day)
# Compute the aggregated measures
config = TremorConfig()
d_tremor_aggregates = aggregate_tremor(df = df_predictions, config = config)
pprint.pprint(d_tremor_aggregates)
Before aggregation we select data collected between 08:00 and 22:00. We also select days with at least 10 hours of data.
The following tremor power aggregates are derived: ['mode', 'median', '90p'].
{'aggregated_tremor_measures': {'90p_tremor_power': 1.3259483071516063,
'median_tremor_power': 0.5143985314908104,
'modal_tremor_power': 0.3,
'perc_windows_tremor': 19.386769676484793},
'metadata': {'nr_valid_days': 1,
'nr_windows_rest': 8284,
'nr_windows_total': 12600}}