{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Device specific data loading\n", "This tutorial demonstrates how to load sensor data from the following devices into memory:\n", "- Axivity AX3 / AX6\n", "- Empatica EmbracePlus\n", "\n", "Note that Paradigma requires further data preparation, as outlined in [the data preparation tutorial](data_preparation)." ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "### Axivity\n", "Axivity sensor data (AX3 & AX6) are stored in `.CWA` format, which must be converted before it can be processed. In this tutorial, we show how to transform `.CWA` files into a workable format in Python using `openmovement`. More information on the `openmovement` package can be found on the [Open Movement GitHub page](https://github.com/openmovementproject/openmovement-python).\n", "\n", "Make sure to install the `master` branch of the `openmovement` package, as this branch contains the code required for preparing `.CWA` data. 
This can be done, for example, using `pip` by running:\n", "\n", "```bash\n", "pip install git+https://github.com/digitalinteraction/openmovement-python.git@master\n", "```\n", "Or, when using Poetry, add the following line to the list of dependencies in `pyproject.toml`:\n", "\n", "```toml\n", "openmovement = { git = \"https://github.com/digitalinteraction/openmovement-python.git\", branch = \"master\" }\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from pprint import pprint\n", "\n", "import pandas as pd\n", "from openmovement.load import CwaData\n", "\n", "# Load data\n", "path_to_input_data = Path('../../example_data/axivity/')\n", "test_data_filename = 'test_data.CWA'\n", "prepared_data_filename = 'test_data.parquet'\n", "\n", "# Note: Set include_gyro to False when using AX3 devices without gyroscope,\n", "# or when gyroscope data is not needed\n", "with CwaData(\n", "    filename=path_to_input_data / test_data_filename,\n", "    include_gyro=True,\n", "    include_temperature=False\n", ") as cwa_data:\n", "    print(\"Data format info:\")\n", "    pprint(cwa_data.data_format)\n", "\n", "    df = cwa_data.get_samples()  # Load all samples into a DataFrame\n", "\n", "# Keep the original timestamps in time_dt, then set time to start at 0 seconds\n", "df['time_dt'] = df['time'].copy()\n", "df['time'] = (df['time'] - df['time'].iloc[0]).dt.total_seconds()\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "### Empatica EmbracePlus\n", "Empatica EmbracePlus sensor data is stored in Apache Avro (`.avro`) format. In short, Empatica automatically writes sensor data to cloud storage every 30 minutes, using the naming convention `[participant_id]_[timestamp].avro`. In this tutorial, we show how to read and prepare a single `.avro` file. 
\n", "\n", "For more detail on using this data format in Python, see [the official Apache Avro documentation](https://avro.apache.org/docs/), in particular the [guide to reading and writing `.avro` files in Python](https://avro.apache.org/docs/++version++/getting-started-python/)." ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "import json\n", "from pathlib import Path\n", "\n", "import pandas as pd\n", "from avro.datafile import DataFileReader\n", "from avro.io import DatumReader\n", "\n", "path_to_input_data = Path('../../example_data/empatica/')\n", "empatica_data_filename = 'test_data.avro'\n", "\n", "# Read the Avro file: parse the embedded schema and take the first record\n", "with open(path_to_input_data / empatica_data_filename, \"rb\") as f:\n", "    reader = DataFileReader(f, DatumReader())\n", "\n", "    schema = json.loads(reader.meta.get(\"avro.schema\").decode(\"utf-8\"))\n", "    empatica_data = next(reader)\n", "\n", "accel_data = empatica_data['rawData']['accelerometer']\n", "\n", "# The example data does not contain gyroscope data, but if it did,\n", "# you could access it like this:\n", "# gyro_data = empatica_data['rawData']['gyroscope']\n", "\n", "# To convert accelerometer and gyroscope data into the correct format, we need to\n", "# check the Avro schema version. This converts accelerometer into g (9.81 m/s²) units,\n", "# and gyroscope into degrees per second (deg/s). 
More info on units and conversion\n", "# can be found in the schema object using: print(schema).\n", "\n", "avro_version = (\n", "    empatica_data[\"schemaVersion\"][\"major\"],\n", "    empatica_data[\"schemaVersion\"][\"minor\"],\n", "    empatica_data[\"schemaVersion\"][\"patch\"],\n", ")\n", "\n", "# Due to changes in the Avro schema, conversion differs between versions\n", "# before 6.5.0 and versions from 6.5.0 onward\n", "if avro_version < (6, 5, 0):\n", "    physical_range = (\n", "        accel_data[\"imuParams\"][\"physicalMax\"]\n", "        - accel_data[\"imuParams\"][\"physicalMin\"]\n", "    )\n", "    digital_range = (\n", "        accel_data[\"imuParams\"][\"digitalMax\"]\n", "        - accel_data[\"imuParams\"][\"digitalMin\"]\n", "    )\n", "    accel_x = [val * physical_range / digital_range for val in accel_data[\"x\"]]\n", "    accel_y = [val * physical_range / digital_range for val in accel_data[\"y\"]]\n", "    accel_z = [val * physical_range / digital_range for val in accel_data[\"z\"]]\n", "else:\n", "    conversion_factor = accel_data[\"imuParams\"][\"conversionFactor\"]\n", "    accel_x = [val * conversion_factor for val in accel_data[\"x\"]]\n", "    accel_y = [val * conversion_factor for val in accel_data[\"y\"]]\n", "    accel_z = [val * conversion_factor for val in accel_data[\"z\"]]\n", "\n", "sampling_frequency = accel_data['samplingFrequency']\n", "nrows = len(accel_x)\n", "\n", "# Timestamps are in microseconds; build one per sample from the start time\n", "t_start = accel_data['timestampStart']\n", "t_array = [t_start + i * (1e6 / sampling_frequency) for i in range(nrows)]\n", "t_from_0_array = [(x - t_array[0]) / 1e6 for x in t_array]\n", "\n", "df = pd.DataFrame({\n", "    'time': t_from_0_array,\n", "    'time_dt': pd.to_datetime(t_array, unit='us'),\n", "    'accel_x': accel_x,\n", "    'accel_y': accel_y,\n", "    'accel_z': accel_z,\n", "})\n", "\n", "print(\n", "    f\"Data loaded from Avro file with {nrows} rows sampled \"\n", "    f\"at {sampling_frequency} Hz.\"\n", ")\n", "print(f\"Start time: {pd.to_datetime(t_start, unit='us')}\")\n", "\n", "df.head()" ] } ], "metadata": {}, "nbformat": 4, 
"nbformat_minor": 5 }