ST-SHAP: A hierarchical and explainable attention network for emotional EEG representation learning and decoding

Minmin Miao; Jin Liang; Zhenzhen Sheng; Wenzhe Liu; Baoguo Xu; Wenjun Hu

doi:10.1016/j.jneumeth.2024.110317

J Neurosci Methods. 2024 Nov 12:414:110317. doi: 10.1016/j.jneumeth.2024.110317. Online ahead of print.

Authors

Minmin Miao¹, Jin Liang², Zhenzhen Sheng³, Wenzhe Liu³, Baoguo Xu⁴, Wenjun Hu⁵

Affiliations

¹ School of Information Engineering, Huzhou University, Huzhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China. Electronic address: 02746@zjhu.edu.cn.
² School of Information Engineering, Huzhou University, Huzhou 313000, China.
³ School of Information Engineering, Huzhou University, Huzhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China.
⁴ School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China.
⁵ School of Information Engineering, Huzhou University, Huzhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China. Electronic address: huwenjun@zjhu.edu.cn.

PMID: 39542109
DOI: 10.1016/j.jneumeth.2024.110317

Abstract

Background: Emotion recognition using electroencephalogram (EEG) signals has become a research hotspot in the field of human-computer interaction. However, sufficiently learning complex spatial-temporal representations of emotional EEG data and obtaining explainable model predictions remain major challenges.

New method: In this study, a novel hierarchical and explainable attention network, ST-SHAP, which combines the Swin Transformer (ST) with the SHapley Additive exPlanations (SHAP) technique, is proposed for automatic emotional EEG classification. First, a 3D spatial-temporal feature of the emotional EEG data is generated via frequency band filtering, temporal segmentation, spatial mapping, and interpolation, so that important spatial-temporal-frequency characteristics are preserved. Second, a hierarchical attention network is devised to learn an abstract spatial-temporal representation of emotional EEG and perform classification. Concretely, in this decoding model, the window-based multi-head self-attention (W-MSA) module models correlations within local windows, the shifted-window multi-head self-attention (SW-MSA) module enables information exchange between different local windows, and the patch merging module further facilitates local-to-global multiscale modeling. Finally, the SHAP method is used to identify brain regions that are important for emotion processing and to improve the explainability of the Swin Transformer model.
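The three Swin Transformer operations named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy feature-map size (8 x 8 x 3), the window size of 4, and the function names `window_partition`, `cyclic_shift`, and `patch_merging` are assumptions chosen for illustration, and the attention computation itself, the attention mask that SW-MSA applies to wrapped-around regions, and the linear projection that normally follows patch merging are all omitted.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C); W-MSA would
    compute self-attention independently inside each window.
    """
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def cyclic_shift(x, ws):
    """Roll the map by ws // 2 along both spatial axes.

    Partitioning the rolled map yields windows that straddle the previous
    window borders, which is how SW-MSA lets neighbouring windows interact.
    """
    s = ws // 2
    return np.roll(x, shift=(-s, -s), axis=(0, 1))

def patch_merging(x):
    """Concatenate each 2 x 2 neighbourhood along the channel axis.

    (H, W, C) -> (H / 2, W / 2, 4C). The Swin Transformer follows this
    with a linear layer down to 2C channels, which is left out here.
    """
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )

# Toy input: hypothetical sizes, not taken from the paper.
H, W, C, ws = 8, 8, 3, 4
x = np.arange(H * W * C, dtype=np.float32).reshape(H, W, C)

windows = window_partition(x, ws)                            # W-MSA layout
shifted_windows = window_partition(cyclic_shift(x, ws), ws)  # SW-MSA layout
merged = patch_merging(x)                                    # coarser scale

print(windows.shape, shifted_windows.shape, merged.shape)
```

Alternating the plain and shifted partitions is what allows information to propagate across windows while attention stays local, and each patch merging step halves the spatial resolution so that later stages see a wider context.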

Results: Two benchmark datasets, namely SEED and DREAMER, are used for classification performance evaluation. In the subject-dependent experiments, ST-SHAP achieves an average accuracy of 97.18% on the SEED dataset, and average accuracies of 96.06% and 95.98% on the arousal and valence dimensions of the DREAMER dataset, respectively. In addition, important brain regions that conform to prior knowledge of neurophysiology are discovered via a data-driven approach for both datasets.

Comparison with existing methods: In terms of subject-dependent and subject-independent emotional EEG decoding accuracies, our method outperforms several closely related existing methods.

Conclusion: These experimental results demonstrate the effectiveness of the proposed algorithm and its advantage over the compared methods.

Keywords: EEG; Emotion recognition; Explainability; Self attention; Swin transformer.