Seizures are the defining signature of epilepsy. Because they manifest as abnormal electrical activity in the brain, they can be detected using electroencephalography (EEG) – in theory. In practice, seizure detection is notoriously difficult. Patterns vary widely, are often subtle, and are frequently obscured by noise. Distinguishing true epileptiform activity from normal EEG variants requires years of training.
Enter machine learning. Seizure detection has become a prime target for automation, and the motivation is obvious: trained EEG specialists are scarce, clinical workloads are growing, and rapid interpretation is often critical. The result is an intense race among EEG manufacturers and software developers to build automated detection systems.
With each new system, the numbers look better – sensitivities climb, false alarm rates drop, algorithms appear increasingly reliable. But look closer, and problems become evident: most of these claims are impossible to compare. Studies use different datasets, different definitions of what counts as a detected seizure, and different evaluation procedures. Without accepted benchmarks or reporting standards, performance metrics can easily create an illusion of progress.
This blog series examines the methodological loopholes behind that illusion. We’re not focused on how seizure detection algorithms work, but on how their performance is measured, reported, and sometimes manipulated. Our goal is to help physicians, researchers, and decision-makers cut through the fog surrounding automated seizure detection claims.
This four-part series will cover the following topics:
- How to Define Seizure Detection
- Basic Concepts of Seizure Detection
- The Critical Role of Training Data in the Development of Seizure Detection Algorithms
- Quantifying Seizure Detection
Blog 1 of 4
How to Define Seizure Detection
Signal Detection Theory: The Foundation
Medical diagnostic tools are typically evaluated using signal detection theory, which assumes an “ideal observer” can optimally separate signal from noise. In seizure detection, the signal is the predominant ‘abnormal’ EEG pattern during true electrographic seizures, while noise encompasses everything else – including ‘normal’ background EEG and other types of abnormal, non-epileptiform activity.

According to this theory, signals can be separated from noise as long as they don’t completely overlap. For example, EEG amplitude generally increases during seizures. By analyzing the distribution of amplitude values during seizure versus non-seizure periods, we can identify patterns that distinguish these two states.
However, reality is more complex. Motion artifacts can produce high-amplitude signals that mimic seizures (see Figure 1), leading to false positive detections. This is why sophisticated algorithms are necessary to achieve high specificity in real-world applications.
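To make the signal-versus-noise idea concrete, here is a toy sketch (synthetic numbers, not real EEG; the amplitude distributions and the threshold are invented for illustration) of how a single amplitude threshold trades sensitivity against specificity when the seizure and non-seizure distributions overlap:

```python
import random

random.seed(0)

# Toy illustration (not real EEG): simulate mean window amplitudes (µV).
# Seizure EEG tends to have higher amplitude than background, but the two
# distributions overlap -- which is exactly what makes detection hard.
background = [random.gauss(30, 10) for _ in range(1000)]  # non-seizure windows
seizure = [random.gauss(60, 15) for _ in range(1000)]     # seizure windows

threshold = 45.0  # flag any window above this amplitude as "seizure"

# Sensitivity: fraction of seizure windows correctly flagged.
sensitivity = sum(a > threshold for a in seizure) / len(seizure)
# Specificity: fraction of non-seizure windows correctly left alone.
specificity = sum(a <= threshold for a in background) / len(background)

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Sliding the threshold up or down trades one error type for the other; because the distributions overlap, no threshold achieves both perfect sensitivity and perfect specificity – and a high-amplitude motion artifact would land on the wrong side of any threshold.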
The Contingency Table

When scanning an EEG and making binary seizure/non-seizure decisions, four outcomes are possible:
- True Positives (TP): Correctly identifying seizures
- True Negatives (TN): Correctly identifying non-seizure segments
- False Positives (FP): Incorrectly flagging non-seizure activity as seizures
- False Negatives (FN): Missing actual seizures
These four categories form the basis for calculating performance metrics. The table is also often called a ‘confusion matrix’, and it provides an indispensable visual snapshot of classifier performance, revealing not just whether an algorithm works, but precisely how and where it succeeds or fails [4].
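The four counts and the metrics derived from them can be computed in a few lines. This is a minimal sketch with hypothetical per-window labels (1 = seizure, 0 = non-seizure), not output from any real detector:

```python
# Hypothetical per-window labels: "truth" is the expert annotation,
# "pred" is the algorithm's output. 1 = seizure, 0 = non-seizure.
truth = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
pred  = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))  # seizures caught
tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))  # quiet windows cleared
fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))  # false alarms
fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))  # missed seizures

sensitivity = tp / (tp + fn)  # fraction of true seizures caught
specificity = tn / (tn + fp)  # fraction of non-seizure windows correctly cleared
precision   = tp / (tp + fp)  # fraction of raised alarms that were real

print(tp, tn, fp, fn)  # prints: 3 5 1 1
```

Note that precision, not specificity, is what governs alarm fatigue: it answers “when the algorithm rings, how often is it right?”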
Impact of False Assessments in Seizure Detection
An ideal seizure detection algorithm would produce a matrix with strong diagonal values (correct classifications) and zeros in all off-diagonal cells (no errors). In reality, the goal is to maximize correct detections while minimizing mistakes – but not all mistakes carry equal weight. False positives and false negatives have different clinical consequences.
False Negatives. Missing several individual seizure events within an EEG study might prompt a medication adjustment but typically won’t alter a patient’s epilepsy diagnosis. Missing ongoing, life-threatening non-convulsive status epilepticus in an ICU patient, however, can be catastrophic.
False Positives. Conversely, false detections can lead to misdiagnosis of epilepsy in healthy individuals, affecting their quality of life, employment, and driving privileges. In operational settings, excessive false alarms create another problem: alarm fatigue. When clinicians are inundated with false alerts, they may become desensitized and slower to respond to real emergencies, which is why precision matters as much as sensitivity.
What Is “Detection”?
One critical yet often overlooked aspect is the definition of “detection” itself. To validate the accuracy of a seizure detection algorithm, expert physicians score a given set of EEGs, and their readings are then compared to the automated results.
The simplest approach uses temporal proximity: counting a ‘detection’ whenever the algorithm-identified event falls within a time window (Δt) of the expert-annotated onset (Figure 2a). The intervals need not overlap completely for the event to receive a 100% score. A more sophisticated class of methods uses the overlap between the machine-detected and expert-annotated seizure intervals. Within this class, most studies use the binary “Overlap” (“OVLP”) method (Figure 2b), which counts any overlap between the detected and the expert-annotated seizure interval as a hit (TP) [3]. The most precise quantification of a match, however, is the percentage overlap between expert-annotated and algorithm-detected seizure intervals, as in the Time-Aligned Event Scoring (TAES) method (Figure 2c). These three approaches can yield dramatically different results [6].
A proximity approach might count any partial detection as a correct “hit”, while an overlap approach reveals a more differentiated picture. For example, a 30-second temporal overlap between a human expert annotation and the algorithm output on a 60-second seizure represents only a 50% overlap. As a result, a study reporting 95% sensitivity using proximity or OVLP detection might show only 70% average overlap under the percentage overlap approach. In all three cases, the algorithm’s output is identical; the metrics simply answer different questions. The proximity approach communicates how well the algorithm finds a similar set of seizure events. The overlap approach additionally speaks to how closely the algorithm matches the expert human read in temporal accuracy. Neither approach is wrong, but ignoring these detection details can lead to widely different assessments of an algorithm’s performance.
A proximity-based detection can thus overinflate an algorithm’s perceived performance simply because the edges of the algorithm’s read align with the expert human assessment. Assessing the same algorithm output using the percent overlap approach would yield lower scores.
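The three scoring flavors can be sketched in a few lines. This is a minimal illustration; the function names, the Δt value, and the example intervals are our own, not any published implementation:

```python
def proximity_hit(expert_onset, det_onset, delta_t=60.0):
    """Count a hit if the detected onset falls within ±delta_t seconds
    of the expert-annotated onset (Figure 2a style)."""
    return abs(det_onset - expert_onset) <= delta_t

def ovlp_hit(expert, det):
    """Binary OVLP: any overlap between the two (start, end) intervals
    counts as a full hit (Figure 2b style)."""
    return max(expert[0], det[0]) < min(expert[1], det[1])

def percent_overlap(expert, det):
    """Overlap duration as a fraction of the expert-annotated seizure
    (Figure 2c style)."""
    overlap = max(0.0, min(expert[1], det[1]) - max(expert[0], det[0]))
    return overlap / (expert[1] - expert[0])

# The example from the text: a 60 s expert-annotated seizure matched by a
# detection that overlaps only its second half.
expert = (0.0, 60.0)  # seconds
det = (30.0, 90.0)

print(proximity_hit(expert[0], det[0]))  # True -> counts as a full "hit"
print(ovlp_hit(expert, det))             # True -> also a full "hit"
print(percent_overlap(expert, det))      # 0.5 -> only 50% temporal agreement
```

The same detection scores 100% under the first two methods and 50% under the third – a compact demonstration of why the choice of “detection” definition matters.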
We will return to the question of proximity-based vs. interval-matching detection in the section “Temporal Considerations: Seizure Onset and Offset.”

While most seizure detection performance reports use proximity or OVLP methods, Zeto’s automated algorithm applies the more conservative percent overlap method to quantify seizure detection precisely. This requires determining not only seizure onset but also seizure offset times – extra information that is not always available from the annotations of open-access EEG seizure databases.
Practical Considerations
When evaluating published metrics, always ask: How was “detection” defined? What time window or overlap threshold was used? Are both onset and offset considered, or only onset? Without this context, confusion matrices and derived metrics lack essential interpretability.
The confusion matrix serves as a diagnostic window into classifier behavior, but its value depends entirely on understanding these methodological choices – distinguishing algorithms that merely look good on paper from those delivering genuine clinical value.
To learn more about quantitative evaluation of seizure detection, please continue reading the next blog: Basic Concepts of Seizure Detection.
Limitations
Throughout this series, we won’t cover the algorithms behind seizure detection methods themselves – several recent reviews do that [1, 7, 9, 12].
We also won’t attempt a systematic comparison of detection systems. Commercial algorithms are proprietary and often patent-protected, making them impossible to evaluate directly. In contrast, academic algorithms are often open-source and more comparable – but with half the landscape hidden behind trade secrecy, any head-to-head comparison would be fundamentally incomplete.
That gap is itself part of the problem. Instead of comparing algorithms, we focus on something more tractable: the loopholes and biases in reporting practices that shape how all of these seizure detection methods are perceived and judged.
References:
1. Bai, L., Litscher, G., & Li, X. (2025). Epileptic Seizure Detection Using Machine Learning: A Systematic Review and Meta-Analysis. Brain Sciences, 15(6), 634. https://doi.org/10.3390/brainsci15060634
2. Baghdadi, A., Fourati, R., Aribi, Y., et al. (2023). A channel-wise attention-based representation learning method for epileptic seizure detection and type classification. Journal of Ambient Intelligence and Humanized Computing, 14, 9403–9418. https://doi.org/10.1007/s12652-023-04609-6
3. Gotman, J., Flanagan, D., Zhang, J., & Rosenblatt, B. (1997). Automatic seizure detection in the newborn: methods and initial evaluation. Electroencephalography and Clinical Neurophysiology, 103(3), 356–362. https://doi.org/10.1016/s0013-4694(97)00003-9
4. Khurshid, D., Wahid, F., Ali, S., Gumaei, A. H., Alzanin, S. M., & Mosleh, M. A. A. (2024). A deep neural network-based approach for seizure activity recognition of epilepsy sufferers. Frontiers in Medicine, 11, 1405848. https://doi.org/10.3389/fmed.2024.1405848
5. Kunekar, P., Gupta, M. K., & Gaur, P. (2024). Detection of epileptic seizure in EEG signals using machine learning and deep learning techniques. Journal of Engineering and Applied Science, 71, 21. https://doi.org/10.1186/s44147-023-00353-y
6. Lee, K., Jeong, H., Kim, S., Yang, D., Kang, H. C., & Choi, E. (2022). Real-time seizure detection using EEG: a comprehensive comparison of recent approaches under a realistic setting. arXiv preprint arXiv:2201.08780.
7. Li, W., Wang, G., Lei, X., Sheng, D., Yu, T., & Wang, G. (2022). Seizure detection based on wearable devices: A review of device, mechanism, and algorithm. Acta Neurologica Scandinavica, 146(6), 723–731. https://doi.org/10.1111/ane.13716
8. Paneru, B. (2025). Epileptic Seizure Detection based on Different Events with XAI and Early Aid System for Patient Aid. Preprint. https://doi.org/10.1101/2025.10.03.25337233
9. Slama, K., Yahyaouy, A., Riffi, J., Mahraz, M. A., & Tairi, H. (2025). Comprehensive review of machine learning and deep learning techniques for epileptic seizure detection and prediction based on neuroimaging modalities. Visual Computing for Industry, Biomedicine, and Art, 8(1), 27. https://doi.org/10.1186/s42492-025-00208-8
10. Torkey, H., Hashish, S., Souissi, S., Hemdan, E. E. D., & Sayed, A. (2025). Seizure Detection in Medical IoT: Hybrid CNN-LSTM-GRU Model with Data Balancing and XAI Integration. Algorithms, 18(2), 77.
11. Wilson, S. B., Scheuer, M. L., Plummer, C., Young, B., & Pacia, S. (2003). Seizure detection: correlation of human experts. Clinical Neurophysiology, 114(11), 2156–2164. https://doi.org/10.1016/s1388-2457(03)00212-8
12. Zhang, X., Zhang, X., Huang, Q., & Chen, F. (2024). A review of epilepsy detection and prediction methods based on EEG signal processing and deep learning. Frontiers in Neuroscience, 18, 1468967. https://doi.org/10.3389/fnins.2024.1468967
