Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection

Appl Intell (Dordr). 2023;53(11):13224-13260. doi: 10.1007/s10489-022-04201-z. Epub 2022 Oct 8.

Abstract

This paper proposes new, improved binary versions of the Sine Cosine Algorithm (SCA) for the Feature Selection (FS) problem. FS is an essential machine learning and data mining task: choosing a subset of highly discriminating features from a noisy, high-dimensional feature set that contains irrelevant and redundant features, so that the subset best represents the dataset. SCA is a recent metaheuristic algorithm whose search model is based on the sine and cosine trigonometric functions. It was initially proposed to tackle problems in the continuous domain. The SCA has been modified into a Binary SCA (BSCA) to deal with the binary domain of the FS problem. To improve the performance of BSCA, three cumulative improved variants are proposed (i.e., IBSCA1, IBSCA2, and IBSCA3), of which the last has the best performance. IBSCA1 employs Opposition-Based Learning (OBL) to help ensure a diverse population of candidate solutions. IBSCA2 improves IBSCA1 by adding Variable Neighborhood Search (VNS) and the Laplace distribution to support several mutation methods. IBSCA3 improves IBSCA2 by optimizing the best candidate solution using Refraction Learning (RL), a novel OBL approach based on light refraction. For performance evaluation, 19 real-world datasets, including a COVID-19 dataset, were selected with different numbers of features, classes, and instances. Three performance measures were used to test the IBSCA versions: classification accuracy, number of features, and fitness value. Furthermore, the performance of the final variant, IBSCA3, is compared against 28 existing popular algorithms. Interestingly, IBSCA3 outperformed almost all comparative methods in terms of classification accuracy and fitness values. At the same time, it was ranked 15 out of 19 in terms of number of features. The overall simulation and statistical results indicate that IBSCA3 performs better than the other algorithms.
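As a rough illustration of the building blocks named above, the following Python sketch shows a standard continuous SCA position update, one possible way to map it to a binary feature mask, and the binary form of an opposite solution as used in OBL. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the transfer function or the fitness formula, so the sigmoid (S-shaped) transfer function, the weighted fitness with `alpha = 0.99`, and all function names here are hypothetical choices made for illustration. VNS, the Laplace-based mutations, and refraction learning are not shown.

```python
import numpy as np


def sca_step(X, best, t, T, rng, a=2.0):
    """One continuous SCA update for a population X (n_agents x n_dims).

    best is the best position found so far; t is the current iteration
    and T the iteration budget.
    """
    r1 = a - t * (a / T)                          # shrinks linearly from a to 0
    r2 = rng.uniform(0.0, 2.0 * np.pi, X.shape)   # phase of the sine/cosine move
    r3 = rng.uniform(0.0, 2.0, X.shape)           # random weight on the best position
    r4 = rng.uniform(0.0, 1.0, X.shape)           # switches between sine and cosine
    dist = np.abs(r3 * best - X)
    return np.where(r4 < 0.5,
                    X + r1 * np.sin(r2) * dist,
                    X + r1 * np.cos(r2) * dist)


def binarize(X, rng):
    """Assumed S-shaped transfer: sample each bit with probability sigmoid(x)."""
    prob = 1.0 / (1.0 + np.exp(-np.clip(X, -50.0, 50.0)))
    return (rng.uniform(size=X.shape) < prob).astype(int)


def opposite(B):
    """Opposite of a binary solution: flip every bit (lb + ub - x with lb=0, ub=1)."""
    return 1 - B


def fitness(error_rate, mask, alpha=0.99):
    """Assumed wrapper-style fitness: weighted error plus fraction of kept features."""
    return alpha * error_rate + (1.0 - alpha) * mask.sum() / mask.size


# Small usage example with a random population of 4 agents and 6 features.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(4, 6))
X_next = sca_step(X, best=X[0], t=1, T=10, rng=rng)
masks = binarize(X_next, rng)          # candidate feature subsets
opposite_masks = opposite(masks)       # OBL-style counterparts to evaluate as well
```

In an OBL-style scheme, each mask and its opposite would both be scored with the fitness function and the better of the pair kept, which is one way to spread the population across the search space.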

Keywords: Feature selection; Laplace distribution; Mutation methods; Opposition-based learning; Refraction learning; Sine cosine algorithm.