Journal of Applied Bioinformatics & Computational Biology ISSN: 2329-9533

Review Article,  J Appl Bioinformat Computat Biol Vol: 0 Issue: 0

Assessment And Application of EEG: A Literature Review

John Reaves1, Timothy Flavin1, Bhaskar Mitra2, Mahantesh K3 and Vidhyashree Nagaraju1*

1Tandy School of Computer Science, The University of Tulsa, Tulsa, OK 74104 USA

2Idaho National Laboratory, Idaho Falls, ID, USA

3Department of Electronics & Communication Engineering, SJB Institute of Technology, Bengaluru, KA, India

*Corresponding Author: Vidhyashree Nagaraju
Tandy School of Computer Science, The University of Tulsa, Tulsa, OK 74104 USA
E-mail: [email protected]

Received: July 01, 2021 Accepted: July 19, 2021 Published: July 26, 2021

Citation: Reaves J, Flavin T, Mitra B, Mahantesh K, Nagaraju V (2021) Assessment And Application of EEG: A Literature Review. J Appl Bioinforma Comput Biol 10:7.


Advancements in neuroscience have enabled the collection and assessment of neurological data to assist in the detection and treatment of several medical conditions as well as the operation and control of devices through brain-computer interfaces. Existing studies rely heavily on data such as electroencephalography (EEG) because of its ease of data collection and assessment, high temporal resolution, low cost, ease of use, and high computational accuracy compared to other neurological and physiological data such as functional Magnetic Resonance Imaging (fMRI) and heart rate.

This paper provides a comprehensive review of recent literature on EEG data assessment and applications in computer and neuroscience research. Specifically, the paper reviews articles recently published in high impact venues including IEEE to provide a brief insight into ongoing work in this research area. The survey is intended to provide a quick summary for researchers, graduate students, and any interested individuals seeking to advance research on this topic. It should also be beneficial to neuroscientists and professionals wishing to obtain a quick overview of previous work. In addition to summarizing key methodologies on data collection, preprocessing, and algorithms, we identify open data sets, software, and developing trends that would benefit from continued exploration.

Keywords: Electroencephalograph, Brain-Computer Interface, Magnetic Resonance Imaging, Data Preprocessing, Feature Extraction, Algorithms, Machine Learning


Technological advancement in the field of neuroscience has enabled solutions to many problems through the collection and assessment of neurological and physiological data such as Electroencephalography (EEG) [1], functional Magnetic Resonance Imaging (fMRI) [2], Electrocardiogram (ECG) [3], Electromyography (EMG) [4], and Galvanic Skin Response (GSR) [5]. Specifically, neurological signals such as EEG and fMRI have assisted in advancing the detection and treatment of many neurological afflictions, including mental health disorders, through neurofeedback signals specific to the patient. Additional areas include improved focus in learning and design of brain-computer-interface (BCI) [6,7] devices, including brain-wave-controlled prostheses and control of autonomous systems.

Much progress has been made over the years in understanding and applying EEG and other types of data. Other formats, such as electrocorticography (ECoG), sometimes called intracranial EEG, and fMRI are gaining traction. However, these methods of data collection have limitations that restrict their usage [8]. A review of existing literature suggests that applications of EEG ranging from emotion discrimination and seizure detection to motor imagery (MI) in BCI have seen significant increases in accuracy and computation speed. Nevertheless, there are challenges in filtering out noise and acquiring detailed information from the observed signals. Some of the challenges encountered are:

• Limited or lack of communication between neuroscientists and computer scientists.

• Lack of availability of recent datasets to support algorithmic research.

• Limited knowledge of the data collection process and application area that may hinder the ability to develop or apply algorithms without losing important information.

• Availability of a variety of preprocessing pipelines for use in EEG applications to remove artifacts. While there is no quality-assurance standardization between methods as yet, continuing to encourage diverse approaches while having a means to evaluate them may be preferred to a central “gold standard” [9].

• Inherent non-stationarity of EEG signals complicates generalization, as signals vary between subjects and within subjects, depending on their state of mind.

This paper provides a comprehensive review of articles on EEG approaches published in high-impact journals and conferences including IEEE, Springer, and Elsevier. In total, 48 articles from 15 journals and 3 conferences in IEEE, 22 papers from 17 journals in Springer, and 24 articles from 8 journals in Elsevier were selected from a pool of over 700 articles and analyzed. Much attention was paid to high-impact journals, including IEEE Machine Learning and Pattern Recognition, Elsevier Pattern Recognition, and proceedings including the Genetic and Evolutionary Computation Conference and the International Conference on Bioinformatics and Biomedical Science. The shortlisted articles capture the limitations and challenges of EEG research as well as highlight efforts from computer science researchers. The survey focuses on summarizing data collection, preprocessing, feature extraction, and classification/prediction algorithms. It provides a quick review of the existing and latest work for people who would like to pursue research in this area and enhance our understanding. The survey is intended to support graduate students, academic researchers, neuroscientists, and other professionals in the field. The paper also includes a list of devices and open data sets to support continued exploration.

The remainder of the paper is organized as follows: Section II provides a definition of EEG along with a brief description of data and devices, while Section II-B identifies publicly available datasets. Section III outlines some of the applications of EEG. Section IV reviews different stages of data processing and highlights efficient and commonly used algorithms at each stage. Section V provides a summary of the study along with some directions for future research.


This section describes EEG along with its benefits and drawbacks. It then summarizes EEG data collection and visualization with an example, and identifies publicly available datasets. Finally, invasive and non-invasive data collection devices are discussed, and commercially available EEG devices are identified with their technical specifications.

A. Definition and Description

EEG is a measure of electrical activity in the brain that records frequencies observed through the brain’s normal activity. While EEG signals were discovered in 1875 through Richard Caton’s work with animals [9,10], the term ‘EEG’ was coined by Dr. Hans Berger in 1924 after the successful recording of the first human electroencephalograph. Formally, Olejniczak [1] defines EEG as “a graphic representation of the difference in voltage between two different cerebral locations plotted over time,” mostly consisting of synaptic activity, though contaminated with noise from other sources and distorted by being measured through the skull. Initially a novelty, interest in EEG technology increased with the discovery of seizure patterns.

EEG is prominently used in biomedical applications for the detection of neurological disorders such as epilepsy, tumors, sleep disorders, and inflammation or damage in the brain. In addition to this, EEG is extensively used in neuroscience research focused on, but not limited to, motor, cognitive, and sensory imaging. Advances in neuroscience research have enabled the development of brain-computer interfaces, which facilitate the control and use of devices via brain wave interpretation.

EEG is generally preferred over other methods of data collection mainly because of its high temporal resolution, low device cost, and noninvasive and easy data-collection process, as well as the fact that EEG data conversion and interpretation is computationally less expensive than other methods. However, EEG has a low signal-to-noise ratio since brain activity is observed through the skull, and motion can introduce additional noise artifacts. As mentioned before, signals are not always consistent between or within individuals. While individual differences are beneficial in applications such as EEG-based biometric identification, they complicate any kind of generalizable BCI or other algorithms that attempt to understand the functional details of the brain. This complication is further worsened by differences in individuals’ brain waves at any given time, based on emotional state, movement, and so on.

B. Data Collection and Interpretation

The EEG data collection process is typically centered around particular frequencies depending on the specific application, such as a research problem or medical assessment. Collection of EEG data through electrode placement adheres to internationally agreed rules [11], generally classified into the 10-10 or 10-20 system. The numbers refer to the spacing between adjacent electrodes; in the 10-20 system, for instance, adjacent electrodes are separated by either 10% or 20% of the total front-back or left-right distance of the skull. The placement starts with initial marks at four points: between the forehead and nose, at the middle of the back of the skull over the occipital area, and on both sides of the head above the outer part of the ear opening. After these points are marked, the electrodes are placed at specific distances from them. The brain signals can be localized by narrowing down the region through the addition of electrodes.

Figure 1 shows electrode placement based on the 10-20 system to collect EEG data.

Figure 1: Positioning of electrodes in the 10-20 system.

Figure 1 shows connection points for 21 total channels, where each channel corresponds to an electrode and outputs a waveform. The connection points or electrodes are denoted by letters and numbers to easily distinguish them. The letters correspond to lobes, or approximate parts of the brain being analyzed: frontopolar or prefrontal cortex (Fp), frontal (F), temporal (T), parietal lobe (P), occipital (O), auricular (A), and central (C). The ‘Z’ label associated with these letters indicates electrodes along the midline of the head. The left and right hemispheres of the brain are identified by odd and even numbers, respectively.
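The labeling convention described above can be illustrated with a short sketch. The function name `describe_electrode` and the returned region strings are hypothetical choices made for illustration; only the letter-to-region mapping and the odd/even/Z rule come from the text.

```python
import re

# Letter prefixes and the regions they denote, as listed in the text.
REGIONS = {
    "Fp": "frontopolar",
    "F": "frontal",
    "T": "temporal",
    "P": "parietal",
    "O": "occipital",
    "A": "auricular",
    "C": "central",
}


def describe_electrode(label):
    """Split a 10-20 label such as 'C3' or 'Fz' into (region, side).

    Assumes canonical capitalisation ('Fp1', 'Cz'); other spellings
    raise ValueError.
    """
    match = re.fullmatch(r"(Fp|F|T|P|O|A|C)(z|Z|\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised electrode label: {label!r}")
    prefix, suffix = match.groups()
    if suffix in ("z", "Z"):
        side = "midline"              # 'Z' marks the midline of the head
    elif int(suffix) % 2 == 1:
        side = "left"                 # odd numbers: left hemisphere
    else:
        side = "right"                # even numbers: right hemisphere
    return REGIONS[prefix], side
```

For example, `describe_electrode("C3")` yields the central region on the left hemisphere, while `describe_electrode("Pz")` yields the parietal midline.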

Common frequency bands used in EEG analysis include delta (<4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-45 Hz) waves as shown in Figure 2. Figure 2 shows a visualization of brain activity [12,13] in different frequency bands generated using MNE-Python [14].

Figure 2: Brain activity measured by EEG in different frequency bands

Alpha and beta waves in Figure 2 are commonly used in motor imagery applications, with the former correlating with eye-muscle movements and the latter associated with general movement [10]. In emotion and preference research, alpha waves are associated with positive emotions while asymmetrical, and greater beta waves (16-18 Hz) are tied to individual preference. Higher and lower frequency in theta bands indicates positive and negative emotions, respectively [15]. Delta waves in Figure 2 are assumed to be less useful and filtered out during the process of noise reduction.
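As a rough illustration of how the bands above are used in practice, the following sketch estimates the power in each band for a single channel using Welch's method from SciPy. The band edges follow the text, except that the 0.5 Hz lower edge for delta is an assumption (the text only says "<4 Hz"). The function name `band_powers` and the 2-second segment length are likewise illustrative choices.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz, taken from the text; the 0.5 Hz lower edge of
# delta is an assumption made here to exclude the DC component.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(signal, fs):
    """Return the approximate power of a 1-D signal in each EEG band."""
    nperseg = min(len(signal), int(2 * fs))   # 2-second segments
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg)
    df = freqs[1] - freqs[0]                  # frequency resolution
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        # Rectangle-rule integral of the power spectral density.
        powers[name] = float(psd[mask].sum() * df)
    return powers
```

Applied to a channel dominated by a 10 Hz rhythm, the alpha entry would be expected to be the largest value in the returned dictionary.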

Figure 3 is a visualization of an EEG motor movement/imagery dataset [13] collected using a non-invasive device as described in Figure 1 and Figure 2. In Figure 3, the x-axis indicates time in seconds and the y-axis corresponds to the output of each electrode described in Figure 1, sampled 160 times per second. Figure 3 presents unprocessed data with no visible effect of the artifacts, which are commonly introduced during EEG data collection. The color changes represent the transitions between different events, with an approximate length of 4s along with the resting state (T0) between events. In baseline trials T0 is recorded between 60-120s.

Figure 3: Visualization of EEG Data.
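To make the sampling arithmetic concrete: at 160 samples per second, a 4 s event spans 160 × 4 = 640 samples per channel. The sketch below cuts a continuous recording into such fixed-length epochs. It is a minimal illustration using NumPy only; the function name `extract_epochs`, the array layout, and the onset times are hypothetical and not taken from the dataset's own tooling.

```python
import numpy as np


def extract_epochs(data, onsets_s, fs=160, duration_s=4.0):
    """Cut fixed-length epochs out of a continuous recording.

    data       : array of shape (n_channels, n_samples)
    onsets_s   : event onset times in seconds (hypothetical values)
    fs         : sampling rate in Hz (160 in the text)
    duration_s : epoch length in seconds (about 4 s in the text)

    Returns an array of shape (n_epochs, n_channels, n_epoch_samples).
    Events that would run past the end of the recording are skipped.
    """
    n_len = int(round(duration_s * fs))
    epochs = []
    for onset in onsets_s:
        start = int(round(onset * fs))
        stop = start + n_len
        if stop <= data.shape[1]:
            epochs.append(data[:, start:stop])
    if not epochs:
        return np.empty((0, data.shape[0], n_len))
    return np.stack(epochs)
```

With a 21-channel, 20 s recording and onsets at 0 s, 4.2 s, 8.4 s, and 18 s, the last event is dropped because it would extend beyond the recording, leaving three epochs of 640 samples each.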

Additional data formats: Other formats considered as standalone replacements or multimodal supplements in neuroscience research and development include:

• ECoG: This method is similar to an invasive form of an EEG scan. Subdural electrodes measure activity directly from the surface of the cerebral cortex, but the invasive nature of data collection limits its adoption [8].

• fMRI: This approach measures activity more spatially through detection of blood oxygen level-dependent changes in the MRI signal due to neuronal activities as a result of a stimulus or task [2]. While useful, the amount of time required to take a clear picture and potential for noise introduction severely limit the feasibility of its real-time use.

• EMG: An EMG analyzes electrical activity in the muscles. EMG is useful in examining the connection between nerves and muscles in a particular part of the body. Due to spatial limitations, EMG is often used for diagnosis rather than for signal analysis.

In addition to these formats, eye movement, heart rate, and body temperature are sometimes used in combination with EEG and other data formats to improve the accuracy of classification or prediction.

Publicly Available Datasets

Table 1 summarizes some publicly and readily available datasets for EEG that should serve as a helpful resource for subsequent work in the field.

Name | Data type | Reference
OpenNeuro | EEG | [16]
BCI Competition | EEG | [17]
Physionet | EEG | [18], [19]
UCI-ML | EEG | [20], [21]
BNCI Horizon 2020 | EEG/EMG/EOG | [24]
EEGbase | EEG/ERP | [25]

Table 1: Publicly available EEG data repository.

In Table 1, the BCI Competition datasets [17] and DEAP [18-24] are among the most commonly used for MI activities and emotional interpretation, respectively. In addition, Temple University maintains a comprehensive list of EEG databases and complementing software. Additional neurological data sets are maintained by Neuroscience Information Framework [25-27], while MRI and fMRI sources are maintained by Neurodata [28] and OpenfMRI [29], respectively.

C. Devices

EEG data is typically collected by placing electrodes on the skull similar to Figure 1. At a high level, devices used to collect EEG data can be classified as invasive or non-invasive depending on whether the electrodes are placed inside or outside the skull.

1. Non-invasive: Information is gathered from the brain without any surgical procedure. The data is usually collected using a cap or headset.

2. Invasive: Data is gathered from electrodes placed directly on the brain. Collected data will be less susceptible to noise and other interference. Additional benefits include accurate localization of signals.

The non-invasive device is the preferred type of data collection due to its ease of use and low cost. While selecting a non-invasive device, there are additional parameters that should be considered such as:

• Electrode Patterns: Selection of patterns including the original 10-20 or 10-10 system. Different and denser patterns usually allow for greater level of detail [30].

• Channels: The quality of data collection and assessment is dependent on electrode placement and the number of channels analyzed. Modern devices typically feature 24-32 and up to 128 channels [31]. While a greater number of channels can allow for greater level of detail, it also increases the setup time and device cost significantly. However, the cost remains lower than that of fMRI data collection.

• Sensor Technology: EEG electrodes can be wet, dry, or semi-dry. Dry electrodes are easier to set up [32], but can be prone to interference and motion artifacts, unlike wet technology, which uses a conductive cream or paste. Semi-dry technology uses polymer electrolyte, which is usually utilized in devices such as EEG buds to record data for long periods of time.

• Connection: can be wired or wireless. Wireless connection usually involves transmission of data over Bluetooth or similar technology. While wireless is more expensive, such a feature is convenient in a laboratory setting, especially if movement of any sort is involved.

• Device type: could be a cap, headset, headband, or buds depending on the type of use. Caps and headsets are most commonly used in laboratory settings to collect high-quality data, whereas headbands and buds are mostly used in applications involving cognitive studies.

Table 2 lists commercially available EEG devices and their approximate cost. Many devices feature accompanying software to process, visualize, and analyze the data, also listed in Table 2.

Device Name | Data type | Channels | Device Style | Electrode Type | Data Transfer | Freq. Range | Software | Cost | Ref
Emotiv EPOC Flex | EEG | 32 | Cap | Wet | BT | 0.2-45 Hz | EmotivPRO | $1,700 | [33]
Emotiv EPOC X | EEG | 14 | Headset | Wet | USB, BT | 0.16-43 Hz | EmotivPRO | $850 | [34]
OpenBCI | EEG | 8-16 | Cap, Headset | Dry, Wet | Wired, BT | - | OpenBCI (Open-Source) | $1,000 | [35], [36]
NeuroSky MindWave | EEG | 1 | Headset | Dry | BT | 3-100 Hz | NeuroView, NeuroSkyLab | $100 | [37]
Muse 2 | EEG, Multi | 4 | Headset | Dry | BT | 0.5-100 Hz | Muse/Muse Direct Apps | $250 | [38]
MindMedia NeXus 10 | EEG, Multi | 8 | - | - | - | - | BioTrace+ | $1,000 | [39]
ActiveTwo | EEG, Multi | 32 | - | - | USB | None | - | $1,000 | [40]

Table 2: Devices.


This section identifies and describes the applications of EEG data in medical and non-medical categories.

A. Medical Applications

With the increased interest in EEG technology following the discovery of epilepsy spikes in 1930, seizure detection has long been a major biomedical application of EEG data. Identification and prediction of seizures [33-42] in epileptic patients has significantly improved the quality of patients’ lives and enhanced the reliability of medical treatments. Applications for epileptics are getting faster and more cost-effective as researchers continue to uncover new algorithms with high detection [43] and prediction accuracy [41] as well as improved epileptogenic foci localization [44].

Motor imagery [45] is another prominent field that attempts to understand and map human thinking processes into action. MI research is essentially predicated on the premise that whether someone actually moves a limb or simply imagines moving a limb, the brain signals produced are the same. In principle, a functioning MI program could allow amputees [46] to regain movement in robotic versions of their lost limbs or allow anyone to remotely operate such limbs. The present state of the field is limited, and classification of brainwaves is restricted to a few degrees of freedom. Recent studies on robotic prostheses focus on control [47], while some studies have focused on achieving that control with a short training time of approximately 15 minutes [48]. In addition, to improve the classification of signals, physiological data such as facial expression can also be analyzed [49].

Another major application of EEG is in the assessment and development of rehabilitation methods. Recently, a real-time EEG-based MI-BCI system with a virtual reality game [50] was developed as a motivational tool with feedback for patients in stroke rehabilitation. Another study [51] explored the possibility of using personal EEG devices with games to motivate patients to carry out their rehabilitation exercises. Additional applications include individual preference identification through interpretation of emotional information, especially for those with difficulties communicating [52,53]; the study of sleep disorders through sleep-quality assessment [54]; the analysis of EEG signals during pain perception [55]; the classification of potentially alcoholic patients [56]; and the diagnosis of Parkinson’s disease [57].

B. Non-medical Applications

Non-medical applications of EEG are focused on, but not limited to, cognitive studies, robotic prostheses, and security systems. Previous research on cognitive studies involves emotion recognition [52] and classification [23]. In addition, physiological signals were combined with EEG to improve the accuracy of emotion recognition [58]. A few studies also focused on methods to improve focus in e-learning through attention feedback [59], to improve learning for novice programmers through an interface controlled by neurological signals [59,60], and to report experiences [61] of using EEG in the context of software engineering education.

While robotic prostheses have significant use in medical applications, computer researchers are focused on utilizing motor imagery in applications including autonomous system operation and control [62,63], communication, and security research [64]. Recent studies include one focused on the teleoperation of a dual-arm robot carrying a common object in multi-fingered hands [65]. The study is then extended to a controllable multi-directional arm for reaching tasks in three-dimensional environments [7]. Many studies have also improved the classification accuracy of motor-imagery signals [66]. EEG signals are also used in security, specifically in biometric authentication systems. EEG signals are used for personal identification [67], including facial recognition [68]. Most recently, one such study [69] examined network patterns and graph features to understand the distinctiveness of humans’ EEG functional connectivity and provide useful guidance for the design of graph-based EEG biometric systems.

EEG signals are also used in vigilance detection [70,71], which is critical for those who engage in long, demanding tasks such as monitoring systems or driving and is therefore a key field in BCI research. Additional non-medical applications include speech recognition [72], user intention classification [73], driver fatigue evaluation [47] for traffic safety, and mental workload assessment [74] for maintaining mental health and preventing accidents. More unusual examples include using EEG as a lie detector [75] and evaluating one’s confidence in making decisions [76].

Data Processing

This section describes steps in EEG data processing, including the preprocessing and classification stages. The section also summarizes existing studies on each stage of EEG data processing and identifies corresponding machine learning algorithms [77,78]. Figure 4 outlines the stages of EEG data processing in a neurofeedback system. In Figure 4, the stages identified in the dashed box correspond to data preprocessing. Most of the existing studies detail one or two stages of preprocessing rather than all three, due to either lack of information on the collection process or lack of knowledge in the selection and application of preprocessing methods.

Figure 4: Common data processing stages.

A. Denoising

In reducing and removing noise, the most common approaches include using low-pass, high-pass, or bandpass filters. These filters allow the user to pass along frequencies below, above, or between specified values, respectively. The choice of filters and frequency selection is dictated by the specific task at hand; for example, waves above 30 Hz and below 8 Hz will often be filtered out in studies involving motor imagery, as they are less relevant. More generally, very low frequencies are often muscle-movement artifacts, while frequencies in the 50-60 Hz range feature noise from power lines or other electronic signals. There are a variety of more dynamic filters in use, such as Volterra [79], the zero-phase bandpass Butterworth filter [80,81], and even fully adaptive filters for epilepsy detection [82,83].
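A zero-phase Butterworth bandpass of the kind mentioned above can be sketched with SciPy as follows. This is an illustrative example rather than the pipeline of any cited study: the 8-30 Hz passband follows the motor-imagery example in the text, while the filter order and the function name `bandpass` are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, fs, low_hz=8.0, high_hz=30.0, order=4):
    """Zero-phase Butterworth bandpass along the last (time) axis.

    low_hz/high_hz default to the 8-30 Hz motor-imagery range from the
    text; order=4 is an arbitrary illustrative choice.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=fs, output="sos")
    # Forward-backward filtering cancels the phase shift, so features
    # in the filtered signal stay aligned in time with the input.
    return sosfiltfilt(sos, data, axis=-1)
```

Because `sosfiltfilt` runs the filter forward and backward, the effective attenuation is that of the designed filter applied twice, and no phase delay is introduced.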

Figure 5 shows a visualization of power recorded in the EEG data in Figure 3 after passing through low- and high-pass filters.

Figure 5: EEG power data before and after low and high pass filter.

While denoising using filters enables significant noise reduction by selecting only the required frequency range, artifacts in the data are more difficult to remove. Therefore, many studies extend the denoising stage with methods such as Independent Component Analysis (ICA) in attempts to automatically remove such artifacts [84-86], though earlier and modern studies sometimes resort to manual artifact removal [87]. Infrequently, researchers and practitioners use filtered data directly without applying artifact removal procedures, though this has caused significant deviations in the overall assessment.

B. Time-frequency

Considering the non-stationary nature of EEG data, time-frequency analysis indicates when multiple artifacts occur in time, which is helpful for developing controllers and feedback devices. Artifact removal [88] is a critical, yet challenging step in processing EEG data. The challenge is to analyze and isolate the data and remove the influence of other activities, such as prominent eye blink, heart rate, facial movement, and body temperature. Existing studies treat artifact removal as an optional stage. Less than 50% of the studies discuss artifact removal, with over 80% focused on manual removal based on their domain and problem-specific knowledge. ICA and discrete wavelet transform (DWT) are among the most commonly used approaches to remove artifacts. However, application of these approaches is limited to static data analysis in applications such as epileptic seizure detection. Some of these studies have nonetheless highlighted the use of time-frequency analysis in reducing artifacts by assessing the input signal in both the time and frequency domains to achieve better resolution. For EEG data in particular, wavelet transforms (WT) [81] are commonly employed to more easily work with the inherently non-stationary signal. While WT can also help with denoising, it requires a relatively involved reconstruction process that is not needed when using simpler filters; this process in itself can aid in dimensionality reduction, making the information that remains significantly more useful. As such, the Flexible Analytic Wavelet Transform (FAWT) [88,89] and others are extremely common due to their relative effectiveness. Common Spatial Patterns (CSP) [45, 80, 90] are also frequently used to preserve the limited spatial data ascertainable from an EEG, though the computational time that this takes limits its usefulness in real-time applications.
Because of the extent to which the data can be altered by these processes, there are a few ways to estimate the relative quality of data afterwards. These include Harvard’s HAPPE, the LTAPP, and Automagic [91]. Considering how important the quality of the data used to develop a model is, these tools may see wider future use.
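For readers unfamiliar with CSP, the following is a minimal two-class sketch based on the standard generalized-eigenvalue formulation. It is not drawn from any of the cited studies. The function names, array layout, and the choice of one filter pair are assumptions, and the trials are assumed to be bandpass-filtered (zero-mean) beforehand.

```python
import numpy as np
from scipy.linalg import eigh


def class_covariance(trials):
    """Mean trace-normalised spatial covariance of one class.

    trials: array of shape (n_trials, n_channels, n_samples),
    assumed zero-mean per channel (e.g. after bandpass filtering).
    """
    covs = []
    for trial in trials:
        c = trial @ trial.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)


def fit_csp(trials_a, trials_b, n_pairs=1):
    """Return 2*n_pairs spatial filters, shape (2*n_pairs, n_channels)."""
    cov_a = class_covariance(trials_a)
    cov_b = class_covariance(trials_b)
    # Solve cov_a w = lambda (cov_a + cov_b) w; eigenvalues ascend.
    eigvals, eigvecs = eigh(cov_a, cov_a + cov_b)
    # Smallest eigenvalues favour class B variance, largest favour A.
    idx = np.concatenate([np.arange(n_pairs), np.arange(-n_pairs, 0)])
    return eigvecs[:, idx].T


def csp_features(filters, trial):
    """Log of the normalised variance of each spatially filtered signal."""
    projected = filters @ trial
    var = projected.var(axis=1)
    return np.log(var / var.sum())
```

The resulting log-variance features are typically passed to a simple classifier; the eigendecomposition is performed once at training time, so only the matrix product in `csp_features` is needed per trial afterwards.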

Additional time-frequency analysis methods include the Fourier transform (FT), short-time Fourier transform (STFT), Hilbert Huang transform (HHT), etc. FT is used only to deconstruct the received signal into component frequencies, though time information is necessarily lost. Because of the non-stationary nature of EEG signals, this is an uncommon approach and it is mostly coupled with other methods when used. Santoso [92] describes STFT as an extension of FT, which is able to preserve time information through the use of windowing, where FT will be applied to the subset of data in each window. A review of existing studies suggests the broad utilization of WT, which is able to preserve both time and frequency information. Specifically, WT is good in the time resolution of high frequencies, while for slowly varying functions, the frequency resolution is remarkable. On the more complex end, some authors make use of the HHT, which decomposes a signal into intrinsic mode functions (IMF) that also preserve temporal and spatial information [93]. It is highly effective for non-stationary and nonlinear data like EEG, and is complex enough that it could be considered an algorithm rather than a particular tool.
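The windowing idea behind the STFT can be illustrated with SciPy's `stft`. In this sketch, the function name `dominant_frequency_over_time` and the 1-second window are illustrative assumptions. It applies a Fourier transform to each window and reports which frequency carries the most power at each time step, which a plain FT cannot do.

```python
import numpy as np
from scipy.signal import stft


def dominant_frequency_over_time(signal, fs, window_s=1.0):
    """Return (times, dominant_freqs) from a short-time Fourier transform.

    window_s sets the window length in seconds: longer windows give
    finer frequency resolution but coarser time resolution.
    """
    nperseg = int(window_s * fs)
    freqs, times, spectrum = stft(signal, fs=fs, nperseg=nperseg)
    power = np.abs(spectrum) ** 2          # shape (n_freqs, n_times)
    return times, freqs[np.argmax(power, axis=0)]
```

For a signal that switches from 10 Hz to 20 Hz halfway through, the returned dominant frequency changes accordingly over time, whereas a single FT of the whole signal would only show two peaks with no indication of when each occurred.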

C. Feature extraction

Most commonly, regression and similar machine learning methods [94, 95] are used for feature extraction. Some of the most commonly used methods include logistic regression [42], support vector machine (SVM) [71], linear discriminant analysis (LDA) [96], principal component analysis (PCA) [97], evolutionary algorithms [98], and ensemble learning [99] including random forest [94] and XGBoost learning [94].

Recent studies have employed neural networks (NN) as the most common method of feature selection because of their ability to process information despite noise or artifacts. Convolutional [100] and analytical [101] neural networks with some variations, such as recurrent neural networks with long short-term memory (LSTM), are most commonly used. CSP algorithms are also frequently seen [45,80,90], especially in combination with methods of denoising that already decompose the signals. On the opposite end of the spectrum, a few papers [102,103] make use of fusional features, attempting to combine information from different electrodes, frequencies, or both in order to reduce the number of dimensions analyzed. This makes computation simpler and faster but runs the risk of reducing accuracy.

Less-common methods include recurrence quantification analysis (RQA) [104], a quadratic time-frequency distribution (QTFD) and Choi-Williams distribution (CWD) [105], quadratic discriminant analysis [106] to detect changes between states, various types of segmenting [107, 108], and even the NSGA-II genetic algorithm [98]. Many studies, including [98], combine multiple methods of feature selection and machine learning, often using neural networks or k-nearest neighbors in tandem with more complex or uncommon methods.

Preprocessing algorithms: Table 3 lists the most commonly used algorithms and corresponding studies in EEG preprocessing.

Algorithm (stage) | Reference
Wavelet transform (II) | [42]–[44], [70], [74], [81], [84], [86], [87], [89], [96], [97], [100], [101], [109]–[120]
Fourier transform (II) | [53], [86], [87], [92], [99], [120]
Independent component analysis (II-III) | [7], [57], [68], [69], [83]–[87], [91], [96], [119]–[123]
Principal component analysis (II-III) | [47], [58], [83], [89], [97], [120], [121], [124]–[126]
Power spectral density (I-II) | [81], [85], [126]–[128]
Common average reference (I) | [7], [102], [110], [120]
Common spatial pattern (II-III) | [45], [65], [80], [82], [90], [93], [99], [120], [121], [128]–[138]
Hjorth parameters (I) | [75], [87], [112], [123], [139]
Convolutional neural network (III) | [99], [117], [129], [140], [141]
Discrete cosine transform (II) | [86], [99]
Long short-term memory (III) | [56], [142]–[144]
Linear discriminant analysis (II-III) | [96], [128]
Genetic algorithms (III) | [57], [98]
Support vector machine (III) | [71], [109]
Transfer learning (III) | [145], [146]
Empirical mode decomposition (I) | [54], [87], [99]
Finite impulse response filter (I) | [82], [147], [148]
Adaptive auto regression (I) | [87], [120]
Autoregression (III) | [120], [149]
Detrended fluctuation analysis (II) | [54], [109]
Partial directed coherence (III) | [57], [149]
Quadratic time-frequency distribution (II) | [105], [141]
Renyi Entropy (III) | [119], [150]
Symmetric and positive definite matrix (III) | [151], [152]
Least mean squares (II) | [83], [86]
Multiple artifact rejection algorithm (I) | [69], [91]

Table 3: Preprocessing.

Algorithms that correspond to different stages of preprocessing are identified as I, II, III to represent denoising, time-frequency analysis, and feature extraction, respectively. While the wavelet and Fourier transforms as well as CSP are the most commonly used techniques in time-frequency analysis since they preserve spatial information, ICA and PCA seem to be the preferred feature extraction approaches. Neural network algorithms along with transfer learning approaches are gaining attention.
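Two of the simpler stage I entries in Table 3, the common average reference and the Hjorth parameters, are short enough to sketch directly. The versions below use the textbook definitions with NumPy; the function names and array layout are illustrative assumptions and are not taken from any of the cited implementations.

```python
import numpy as np


def common_average_reference(data):
    """Re-reference by subtracting the mean across channels at each sample.

    data: array of shape (n_channels, n_samples).
    """
    return data - data.mean(axis=0, keepdims=True)


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of a 1-D signal.

    activity   : variance of the signal
    mobility   : sqrt(var(first difference) / var(signal))
    complexity : mobility of the first difference / mobility of the signal
    Differences are taken per sample, so mobility is in per-sample units.
    """
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, and it grows as the waveform departs from a sine shape, which is why it is sometimes used as a compact descriptor of signal irregularity.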

D. Classification and Prediction

Classification/prediction is the final stage of EEG data processing and directly precedes inference. The accuracy and computation time of classification depend on the quality and number of features extracted during preprocessing. While smaller sample sizes are easier to compute, many algorithms cannot make accurate predictions with small training sets. To address this, recent research has focused on applications of machine learning algorithms such as neural networks, which enable transfer and active learning so that previously trained knowledge can be reused when making inferences from small samples or in less-studied areas.
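The dependence on training set size can be explored with a toy experiment. The sketch below implements a plain k-nearest neighbor classifier and evaluates it on hypothetical two-dimensional feature vectors for several training set sizes. The class labels, class centers, and sample counts are assumptions chosen only for illustration, and the synthetic features stand in for whatever preprocessing would produce.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def make_features(n_per_class, rng):
    """Hypothetical two-class features (e.g. band powers) drawn around two centers."""
    centers = {"rest": (0.0, 0.0), "imagery": (3.0, 3.0)}
    xs, ys = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            ys.append(label)
    return xs, ys

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

rng = random.Random(0)
test_x, test_y = make_features(200, rng)

# Compare accuracy for small and large training sets (samples per class).
for n_train in (2, 10, 100):
    train_x, train_y = make_features(n_train, rng)
    print(n_train, accuracy(train_x, train_y, test_x, test_y))
```

The printed accuracies depend on the random seed and on how well separated the two classes are, so they should be read as a qualitative illustration rather than a benchmark.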

Table 4 lists the most commonly used algorithms and the corresponding studies in classification and prediction.

| Algorithm | Reference | Field | Hybrid |
|---|---|---|---|
| Support vector machine | [23], [42], [44], [45], [47], [48], [53], [55], [56], [58], [65], [66], [68], [80], [86], [87], [93], [96]–[99], [103], [105], [111], [113], [120], [121], [126], [127], [129], [130], [134], [137], [148], [149], [152]–[156] | MI-BCI, Emotion, Epilepsy, Cognitive | Ensemble Group, Sparse |
| k-Nearest neighbor | [42], [53], [55], [58], [68], [96]–[98], [103], [113], [120], [123], [126], [127], [129], [148] | MI-BCI, Emotion, Epilepsy, Cognitive | CFS+KNN |
| Neural network | [41], [45], [54], [73], [87], [97], [98], [101], [117], [118], [120], [147] | MI-BCI, Epilepsy, Cognitive, Sleep | DNN, ANN, MLP |
| Convolutional NN | [7], [41], [50], [56], [74], [81], [92], [100], [110], [116], [117], [132], [137]–[139], [141], [157]–[165] | MI-BCI, Emotion, Cognitive, Epilepsy | S-EEGNet, R3DCNN, CNN-LSTM |
| Linear discriminant analysis | [48], [58], [82], [87], [89], [93], [98], [113], [120], [121], [129], [131], [135]–[137], [149], [155] | MI-BCI, Sleep | KLDA, LDA, Shrunken |
| Long Short Term Memory | [7], [41], [52], [138], [159], [166]–[168] | MI-BCI, Emotion, Epilepsy, Cognitive | CNN-LSTM |
| Random Forest | [42], [53], [57], [68], [72], [103], [111], [127] | Emotion, Cognitive, Epilepsy | - |
| Naive Bayes | [23], [53], [68], [97], [120], [148] | Emotion, Sleep, Cognitive | Gaussian |
| Recurrent NN | [41], [42], [164], [169] | MI, BCI, Epilepsy, Biometric | R-3DCNN |
| Decision Tree | [42], [58], [68], [97], [113] | MI-BCI, Emotion, Epilepsy, Sleep | - |
| Transfer Learning | [127], [138], [145], [146], [170] | MI-BCI, Emotion | DTL, MFTL, CSP, DNN |
| Logistic Regression | [42], [58], [126], [129] | MI-BCI, Emotion | |
| Common spatial pattern | [155], [170], [171] | MI-BCI | Filter Bank, Sparse Filter Band |
| Deconvolutional NN | [47], [127] | MI-BCI, Emotion | - |
| Clustering | [43], [87] | Epilepsy, Sleep | K-medoids |
| Quadratic discriminant analysis | [106], [113] | MI-BCI | - |
| Principal component analysis | [152], [155] | MI-BCI | KPCA |
| Sparse Representation | [70], [171] | MI-BCI, Cognitive | - |

Table 4: Classification.

Table 4 suggests that its first few entries are the most commonly used algorithms in classification. While the popularity of SVM is less clear, the application of neural networks is a promising path towards knowledge sharing and improved accuracy. In addition, transfer learning appears more often in classification efforts than in preprocessing. Given the inherent differences in EEG signals within and between people, such an approach will be invaluable in shortening learning times and improving future systems. Transfer learning's current lack of popularity should not be mistaken for a lack of importance.
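One simple way to see why transfer across people can shorten learning is parameter transfer by warm start: a model is first trained on a well-sampled source subject and then briefly fine-tuned on a few trials from a new subject. The sketch below uses logistic regression as a deliberately simple stand-in for the neural network and deep transfer learning methods in the cited studies. The synthetic subjects, the feature shift between them, and every name in the code are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, x, y):
    """Mean logistic loss of weights w on features x (bias column included) and 0/1 labels y."""
    p = np.clip(sigmoid(x @ w), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fit_logistic(x, y, w_init, lr=0.1, steps=200):
    """Gradient descent on the logistic loss, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        w -= lr * x.T @ (sigmoid(x @ w) - y) / len(y)
    return w

def make_subject(n, shift, rng):
    """Hypothetical subject: two classes of 4-D features, offset by a subject-specific shift."""
    y = rng.integers(0, 2, size=n).astype(float)
    x = rng.standard_normal((n, 4)) + 1.5 * y[:, None] + shift
    return np.hstack([x, np.ones((n, 1))]), y  # append a bias column

rng = np.random.default_rng(0)
src_x, src_y = make_subject(500, shift=0.0, rng=rng)  # well-sampled source subject
tgt_x, tgt_y = make_subject(10, shift=0.5, rng=rng)   # only a few target trials

# Pretrain on the source subject, then fine-tune briefly on the target subject.
w_source = fit_logistic(src_x, src_y, np.zeros(5))
w_target = fit_logistic(tgt_x, tgt_y, w_source, steps=20)

# Loss on the target trials before and after fine-tuning.
print(log_loss(w_source, tgt_x, tgt_y), log_loss(w_target, tgt_x, tgt_y))
```

Starting from the source weights means the short fine-tuning run only has to correct for the subject-specific shift rather than learn the class boundary from scratch. The comparison here is on the same ten target trials used for fine-tuning, so a proper evaluation would need held-out target data.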

Conclusions and Future Research

This paper presents a comprehensive review of recent research on EEG data processing. The survey summarizes articles shortlisted from high-impact journals and conferences of publishers including IEEE, Elsevier, and Springer. The paper also provides descriptions of EEG data, applications, devices, and the various data processing stages. Comprehensive lists of publicly available datasets, commercially available devices, and algorithms that correspond to different data processing stages are provided for researchers and practitioners interested in advancing the field.

Future research will examine the commonly applied algorithms identified in this article and assess their suitability for EEG data. The development of efficient and generalizable preprocessing approaches that retain temporal and spatial resolution will also be considered [109-171].


ANN Artificial neural network

BCI Brain-computer interface

BT Bluetooth

CFS Correlation-based feature selection

CNN Convolutional neural network

CSP Common spatial pattern

CWD Choi-Williams Distribution

DNN Deep neural network

DTL Deep transfer learning

DWT Discrete wavelet transform

ECG Electrocardiography

ECoG Electrocorticography

EEG Electroencephalography

EMG Electromyography

EOG Electrooculography

ERP Event-related potential

FAWT Flexible analytic wavelet transform

fMRI Functional magnetic resonance imaging

FT Fourier transform

GSR Galvanic skin response

HHT Hilbert-Huang transform

ICA Independent component analysis

IMF Intrinsic mode function


LDA Linear discriminant analysis

LSTM Long short-term memory

MI Motor imagery

MLP Multilayer perceptron

MRI Magnetic resonance imaging

NN Neural network

PCA Principal component analysis

QTFD Quadratic time-frequency distribution

RQA Recurrence quantification analysis

STFT Short-time Fourier transform

SVM Support vector machine

WT Wavelet transform

