{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Device-specific data loading\n",
    "This tutorial demonstrates how to load sensor data from the following devices into memory:\n",
    "- Axivity AX3 / AX6\n",
    "- Empatica EmbracePlus\n",
    "\n",
    "Note that Paradigma requires further data preparation as outlined in [the data preparation tutorial](data_preparation)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "### Axivity\n",
    "Axivity sensor data (AX3 and AX6) are stored in `.CWA` format, which must be converted before the data can be processed. In this tutorial, we show how to transform `.CWA` files into a workable format in Python using `openmovement`. More information on the `openmovement` package can be found on the [Open Movement GitHub page](https://github.com/openmovementproject/openmovement-python).\n",
    "\n",
    "Make sure to install the `master` branch of the `openmovement` package, as this branch contains the code needed to prepare `.CWA` data. For example, using `pip`:\n",
    "\n",
    "```bash\n",
    "pip install git+https://github.com/digitalinteraction/openmovement-python.git@master\n",
    "```\n",
    "Or, when using Poetry, add the following line to the list of dependencies in `pyproject.toml`:\n",
    "\n",
    "```toml\n",
    "openmovement = { git = \"https://github.com/digitalinteraction/openmovement-python.git\", branch = \"master\" }\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from pprint import pprint\n",
    "\n",
    "import pandas as pd\n",
    "from openmovement.load import CwaData\n",
    "\n",
    "# Paths to the example data\n",
    "path_to_input_data = Path('../../example_data/axivity/')\n",
    "test_data_filename = 'test_data.CWA'\n",
    "prepared_data_filename = 'test_data.parquet'\n",
    "\n",
    "# Note: Set include_gyro to False when using AX3 devices without gyroscope,\n",
    "# or when gyroscope data is not needed\n",
    "with CwaData(\n",
    "    filename=path_to_input_data / test_data_filename,\n",
    "    include_gyro=True,\n",
    "    include_temperature=False\n",
    ") as cwa_data:\n",
    "    print(\"Data format info:\")\n",
    "    pprint(cwa_data.data_format)\n",
    "\n",
    "    df = cwa_data.get_samples()  # Load all samples into a DataFrame\n",
    "\n",
    "# Keep the original timestamps in 'time_dt' and express 'time' in seconds\n",
    "# relative to the first sample\n",
    "df['time_dt'] = df['time'].copy()\n",
    "df['time'] = (df['time'] - df['time'].iloc[0]).dt.total_seconds()\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "### Empatica EmbracePlus\n",
    "Empatica EmbracePlus sensor data are stored in Apache Avro (`.avro`) format. In short, Empatica automatically writes sensor data to cloud storage every 30 minutes, using the naming convention `[participant_id]_[timestamp].avro`. In this tutorial, we show how to read and prepare a single `.avro` file.\n",
    "\n",
    "For more on this data format, see [the official Apache Avro documentation](https://avro.apache.org/docs/), in particular the [getting-started guide for Python](https://avro.apache.org/docs/++version++/getting-started-python/), which covers reading and writing `.avro` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "from avro.datafile import DataFileReader\n",
    "from avro.io import DatumReader\n",
    "\n",
    "path_to_input_data = Path('../../example_data/empatica/')\n",
    "empatica_data_filename = 'test_data.avro'\n",
    "\n",
    "# Read the schema and the first record from the Avro file\n",
    "with open(path_to_input_data / empatica_data_filename, \"rb\") as f:\n",
    "    reader = DataFileReader(f, DatumReader())\n",
    "\n",
    "    schema = json.loads(reader.meta.get(\"avro.schema\").decode(\"utf-8\"))\n",
    "    empatica_data = next(reader)\n",
    "\n",
    "accel_data = empatica_data['rawData']['accelerometer']\n",
    "\n",
    "# The example data does not contain gyroscope data, but if it did,\n",
    "# you could access it like this:\n",
    "# gyro_data = empatica_data['rawData']['gyroscope']\n",
    "\n",
    "# To convert accelerometer and gyroscope data into physical units, we need to\n",
    "# check the Avro schema version. The conversion yields accelerometer values in\n",
    "# g (1 g ≈ 9.81 m/s²) and gyroscope values in degrees per second (deg/s).\n",
    "# More info on units and conversion can be found in the schema object\n",
    "# using: print(schema).\n",
    "\n",
    "avro_version = (\n",
    "    (empatica_data[\"schemaVersion\"][\"major\"]),\n",
    "    (empatica_data[\"schemaVersion\"][\"minor\"]),\n",
    "    (empatica_data[\"schemaVersion\"][\"patch\"]),\n",
    ")\n",
    "\n",
    "# Due to changes in the Avro schema, the conversion differs between versions\n",
    "# before 6.5.0 and versions 6.5.0 and later\n",
    "if avro_version < (6, 5, 0):\n",
    "    physical_range = (\n",
    "        accel_data[\"imuParams\"][\"physicalMax\"]\n",
    "        - accel_data[\"imuParams\"][\"physicalMin\"]\n",
    "    )\n",
    "    digital_range = (\n",
    "        accel_data[\"imuParams\"][\"digitalMax\"]\n",
    "        - accel_data[\"imuParams\"][\"digitalMin\"]\n",
    "    )\n",
    "    accel_x = [val * physical_range / digital_range for val in accel_data[\"x\"]]\n",
    "    accel_y = [val * physical_range / digital_range for val in accel_data[\"y\"]]\n",
    "    accel_z = [val * physical_range / digital_range for val in accel_data[\"z\"]]\n",
    "else:\n",
    "    conversion_factor = accel_data[\"imuParams\"][\"conversionFactor\"]\n",
    "    accel_x = [val * conversion_factor for val in accel_data[\"x\"]]\n",
    "    accel_y = [val * conversion_factor for val in accel_data[\"y\"]]\n",
    "    accel_z = [val * conversion_factor for val in accel_data[\"z\"]]\n",
    "\n",
    "sampling_frequency = accel_data['samplingFrequency']\n",
    "nrows = len(accel_x)\n",
    "\n",
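    "# Timestamps are in microseconds; derive one timestamp per sample from the\n",
    "# start time and the sampling frequency\n",
    "t_start = accel_data['timestampStart']\n",
    "t_array = [t_start + i * (1e6 / sampling_frequency) for i in range(nrows)]\n",
    "t_from_0_array = [(x - t_array[0]) / 1e6 for x in t_array]\n",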
    "\n",
    "df = pd.DataFrame({\n",
    "    'time': t_from_0_array,\n",
    "    'time_dt': pd.to_datetime(t_array, unit='us'),\n",
    "    'accel_x': accel_x,\n",
    "    'accel_y': accel_y,\n",
    "    'accel_z': accel_z,\n",
    "})\n",
    "\n",
    "print(\n",
    "    f\"Data loaded from Avro file with {nrows} rows sampled \"\n",
    "    f\"at {sampling_frequency} Hz.\"\n",
    ")\n",
    "print(f\"Start time: {pd.to_datetime(t_start, unit='us')}\")\n",
    "\n",
    "df.head()"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}