Handling events and EEG dataframes
This tutorial shows what you will find in the segments dataframe, how to split it into clips, and how to load the corresponding parts of EEG recordings with the seiz_eeg package.
We assume that you have an annotation dataframe, here called segments.parquet, which contains annotations for EEG events in the following format:
Multiindex | Columns |||||||
---|---|---|---|---|---|---|---|---|
str | str | int | int | float | float | datetime | int | str |
The annotations are represented as a pandas.DataFrame
with hierarchical indexing. For more information on advanced pandas indexing, we refer you to the pandas documentation.
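As a quick illustration of hierarchical indexing, the sketch below builds a tiny dataframe with a three-level index and selects rows from it. The level names (`patient`, `session`, `segment`), the two columns, and all values are made up for the example; they are assumptions, not the actual schema of segments.parquet.

```python
import pandas as pd

# Toy annotation table; level names, columns and values are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("p1", "s1", 0), ("p1", "s1", 1), ("p1", "s2", 0), ("p2", "s1", 0)],
    names=["patient", "session", "segment"],
)
toy = pd.DataFrame(
    {"label": [0, 3, 0, 0], "start_time": [0.0, 10.0, 0.0, 0.0]},
    index=index,
)

# All segments of one patient (the selected level is dropped from the result).
patient_rows = toy.loc["p1"]

# All segments of one session of that patient (partial key on the first two levels).
session_rows = toy.loc[("p1", "s1"), :]

# Cross-section on an inner level: the first segment of every session.
first_segments = toy.xs(0, level="segment")
```

Selecting with a partial key keeps only the remaining index levels, which is convenient when iterating over the segments of a single session.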
Reading annotations
We can read an annotation file stored at path/to/segments.parquet
directly as a dataframe with the following code:
import pandas as pd
df = pd.read_parquet("path/to/segments.parquet")
We can then inspect the first 10 rows, sorted by index:
print(df.head(10).sort_index())
For the annotations of the train set of the Temple University Seizure Corpus, this would give the following:
As you can see, the signals_path
points to the same file for all segments belonging to the same session
(recording).
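If you want to verify this property on your own annotations, a check along the following lines may help. It is only a sketch on toy data: the index level names (`patient`, `session`, `segment`) and the file paths are assumptions made for the example, while `signals_path` is the column discussed above.

```python
import pandas as pd

# Toy annotations; level names and paths are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("p1", "s1", 0), ("p1", "s1", 1), ("p2", "s1", 0)],
    names=["patient", "session", "segment"],
)
toy = pd.DataFrame(
    {"signals_path": ["p1/s1.parquet", "p1/s1.parquet", "p2/s1.parquet"]},
    index=index,
)

# Count distinct signal files within each (patient, session) pair.
paths_per_session = toy.groupby(level=["patient", "session"])["signals_path"].nunique()

# True when every session refers to exactly one file.
all_consistent = bool((paths_per_session == 1).all())
```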
These files contain contiguous recordings, which can span multiple events. One can load them manually or use the
provided seiz_eeg.dataset.EEGDataset
class, which makes it easy to fetch segment signals. We give an example of its usage in the Loading signals with EEGDataset
section.
Filtering events
The seiz_eeg.utils
module provides multiple functions that help filter the data in the annotation dataframe.
For instance, we might only be interested in sessions that contain either non-ictal activity (label 0), or generalized
seizures (label 3). We can filter them with the seiz_eeg.utils.sessions_by_labels
function, and relabel them to 0 and 1 respectively, by running:
from seiz_eeg.utils import sessions_by_labels
df = sessions_by_labels(df, target_labels=[0, 3], relabel=True)
Alternatively, we might be interested in working only with patients that have at least one seizure, but no more than 30 of
them. For this purpose, we can use the seiz_eeg.utils.patients_by_seizures
function:
from seiz_eeg.utils import patients_by_seizures
df = patients_by_seizures(df, low=1, high=30)
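To make the idea of such a filter concrete, here is a rough pandas-only sketch. It is not the implementation of patients_by_seizures: the helper name `keep_patients_by_event_count`, the `patient` index level, and the rule that every row with a non-zero `label` counts as one seizure are all assumptions made for illustration.

```python
import pandas as pd


def keep_patients_by_event_count(df, low, high):
    """Keep patients whose number of non-zero-label rows lies in [low, high].

    Hypothetical helper: counting rule and level name are assumptions.
    """
    # Number of rows with a non-zero label, per patient.
    counts = (df["label"] != 0).groupby(level="patient").sum()
    kept = counts[(counts >= low) & (counts <= high)].index
    # Return rows of the kept patients, with index and columns unchanged.
    return df[df.index.get_level_values("patient").isin(kept)]


# Toy annotations: p1 has two events, p2 has none, p3 has one.
index = pd.MultiIndex.from_tuples(
    [("p1", "s1", 0), ("p1", "s1", 1), ("p1", "s1", 2),
     ("p2", "s1", 0), ("p2", "s1", 1), ("p3", "s1", 0)],
    names=["patient", "session", "segment"],
)
toy = pd.DataFrame({"label": [0, 3, 3, 0, 0, 3]}, index=index)

filtered = keep_patients_by_event_count(toy, low=1, high=30)
```

Because the helper returns a dataframe with the same index and columns as its input, it can be followed by further filters of the same kind.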
Note that most of the functions in seiz_eeg.utils
take an annotation dataframe as input and return an object
following the same schema. This makes it possible to chain multiple preprocessing steps into a streamlined pipeline.
For readability, we suggest using the pandas.DataFrame.pipe
method, which allows us to write:
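import pandas as pd
from seiz_eeg.utils import patients_by_seizures, sessions_by_labels

# Each step receives the dataframe returned by the previous one.
df = (
    pd.read_parquet("path/to/segments.parquet")
    .pipe(sessions_by_labels, target_labels=[0, 3], relabel=True)
    .pipe(patients_by_seizures, low=1, high=30)
)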
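The chaining above relies on the fact that `df.pipe(f, **kwargs)` simply calls `f(df, **kwargs)`. The sketch below shows this on toy data with two stand-in filters; `keep_labels`, `min_duration`, and the toy values are hypothetical and are not part of seiz_eeg.

```python
import pandas as pd


def keep_labels(df, labels):
    """Hypothetical filter: keep rows whose label is in `labels`."""
    return df[df["label"].isin(labels)]


def min_duration(df, seconds):
    """Hypothetical filter: keep rows lasting at least `seconds`."""
    return df[(df["end_time"] - df["start_time"]) >= seconds]


toy = pd.DataFrame(
    {
        "label": [0, 1, 3, 3],
        "start_time": [0.0, 0.0, 5.0, 0.0],
        "end_time": [10.0, 4.0, 6.0, 20.0],
    }
)

# Nested calls read inside-out...
nested = min_duration(keep_labels(toy, labels=[0, 3]), seconds=5.0)

# ...while pipe applies the same steps in reading order.
piped = toy.pipe(keep_labels, labels=[0, 3]).pipe(min_duration, seconds=5.0)
```

Both variables hold the same rows; the piped form only changes how the steps are written.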
Getting clips of the same length
In case you need to work