.. |eegdatset_tutorial| replace:: Loading signals with ``EEGDataset``
.. |TUH| replace:: Temple University Seizure Corpus

=============================================
Handling events and EEG dataframes
=============================================

This tutorial shows what you will find in the segments dataframe, how to split
it into clips, and how to load the corresponding parts of EEG recordings with the
:mod:`seiz_eeg` package.

We assume that you have an annotation dataframe, here called
``segments.parquet``, which contains annotations for EEG events in the following
format:

=========== =========== =========== ========= ============== ============ ============== ================= ================
Multiindex                          Columns
----------------------------------- ---------------------------------------------------------------------------------------
``patient`` ``session`` ``segment`` ``label`` ``start_time`` ``end_time`` ``date``       ``sampling_rate`` ``signals_path``
=========== =========== =========== ========= ============== ============ ============== ================= ================
str         str         int         int       float          float        datetime       int               str
=========== =========== =========== ========= ============== ============ ============== ================= ================

The annotations are represented as a :class:`pandas.DataFrame` with hierarchical
indexing. For more information on advanced pandas indexing, we refer you to the
`pandas documentation`_.

.. _`pandas documentation`: https://pandas.pydata.org/docs/user_guide/advanced.html

Reading annotations
===================

We can read an annotation file stored at ``path/to/segments.parquet`` directly
as a dataframe with the following code::

    import pandas as pd

    df = pd.read_parquet("path/to/segments.parquet")

We can then inspect the first 10 rows, sorted by index::

    print(df.head(10).sort_index())

For the annotations of the train set of the |TUH|, this gives the following:

.. image:: ../../figures/segments_head_10.png
   :alt: First 10 lines of annotation dataframe
   :align: center
   :width: 800 px

As you can see, ``signals_path`` points to the same file for all segments
belonging to the same ``session`` (recording). These files contain contiguous
recordings which can span multiple events. One can load them manually, or use
the provided :class:`seiz_eeg.dataset.EEGDataset` class, which makes it easy to
fetch segment signals. We give an example of its usage in the
|eegdatset_tutorial|_ section.

Filtering events
================

The :mod:`seiz_eeg.utils` module provides multiple functions that help
filter the data in the annotation dataframe.

For instance, we might only be interested in sessions that contain either
non-ictal activity (label 0) or generalized seizures (label 3). We can filter
them with the :func:`seiz_eeg.utils.sessions_by_labels` function, and relabel
them to 0 and 1 respectively, by running::

    from seiz_eeg.utils import sessions_by_labels

    df = sessions_by_labels(df, target_labels=[0, 3], relabel=True)

Alternatively, we might want to work only with patients that have at least one
seizure, but no more than 30 of them. For this purpose, we can use the
:func:`seiz_eeg.utils.patients_by_seizures` function::

    from seiz_eeg.utils import patients_by_seizures

    df = patients_by_seizures(df, low=1, high=30)

Note that most of the functions in :mod:`seiz_eeg.utils` take an annotation
dataframe as input and return an object following the same schema. This allows
us to chain multiple preprocessing steps into a streamlined pipeline. For
readability, we suggest using the :meth:`pandas.DataFrame.pipe` method, which
allows us to write::

    import pandas as pd
    from seiz_eeg.utils import patients_by_seizures, sessions_by_labels

    df = (
        pd.read_parquet("path/to/segments.parquet")
        .pipe(sessions_by_labels, target_labels=[0, 3], relabel=True)
        .pipe(patients_by_seizures, low=1, high=30)
    )

Getting clips of the same length
================================

In case you need to work

|eegdatset_tutorial|
===================================

:class:`seiz_eeg.dataset.EEGDataset`.
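
For readers who prefer the manual route mentioned earlier, the following is a
minimal sketch of how one segment could be cut out of a session recording using
only the annotation columns. It is not the :class:`seiz_eeg.dataset.EEGDataset`
implementation. It assumes that ``start_time`` and ``end_time`` are expressed in
seconds from the start of the recording and that ``sampling_rate`` is in Hz.
The toy dataframe, the random array standing in for a recording, and the helper
``slice_segment`` are all hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Toy annotation dataframe following the schema described in this tutorial
# (values are made up for illustration).
index = pd.MultiIndex.from_tuples(
    [("p0", "s0", 0), ("p0", "s0", 1)],
    names=["patient", "session", "segment"],
)
df = pd.DataFrame(
    {
        "label": [0, 3],
        "start_time": [0.0, 4.0],
        "end_time": [4.0, 10.0],
        "sampling_rate": [250, 250],
        "signals_path": ["p0_s0.parquet", "p0_s0.parquet"],
    },
    index=index,
)

# Stand-in for the full recording of session "s0", shaped
# (n_samples, n_channels). A real recording would be read from signals_path.
recording = np.random.default_rng(0).standard_normal((10 * 250, 19))


def slice_segment(signals, start_time, end_time, sampling_rate):
    """Return the samples between start_time and end_time (assumed seconds)."""
    start = int(round(start_time * sampling_rate))
    stop = int(round(end_time * sampling_rate))
    return signals[start:stop]


# Select one segment through the hierarchical index and cut it out.
row = df.loc[("p0", "s0", 1)]
segment = slice_segment(
    recording, row["start_time"], row["end_time"], row["sampling_rate"]
)
print(segment.shape)  # (1500, 19)
```

Because both segments share the same ``signals_path``, the recording only needs
to be read once per session and can then be sliced for each of its segments.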