Objective: The support vector machine (SVM), a statistical learning method, has recently been evaluated for predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of new drugs. However, two problems remain in SVM modeling: feature selection and parameter setting. Both have been shown to strongly affect the efficiency and accuracy of SVM classification. In particular, the choice of feature subset and the optimal SVM parameter settings influence each other, which suggests that they should be dealt with simultaneously. In this paper, we propose an integrated scheme that handles feature subset selection and SVM parameter setting together.
Method: In the proposed scheme, a genetic algorithm (GA) is used for feature selection and the conjugate gradient (CG) method for parameter optimization. Several classification models of ADMET-related properties were built to assess and test the integrated GA-CG-SVM scheme: (1) identification of P-glycoprotein substrates and nonsubstrates, (2) prediction of human intestinal absorption, (3) prediction of compounds inducing torsades de pointes, and (4) prediction of blood-brain barrier penetration.
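The nesting of the two optimizers can be illustrated with a minimal sketch. It is not the authors' implementation: the SVM validation error is replaced by a hypothetical smooth stand-in (`toy_loss`), the two tuned parameters are assumed to be `log_c` and `log_gamma` (as for an RBF kernel), and the Fletcher-Reeves variant of CG with numerical gradients is an assumption, since the abstract does not specify these details. All function and variable names are illustrative.

```python
import math
import random

N_FEATURES = 8
INFORMATIVE = {0, 2, 5}  # hypothetical "useful" descriptors for the toy loss


def toy_loss(mask, params):
    """Hypothetical smooth stand-in for an SVM validation error."""
    log_c, log_gamma = params
    k = sum(mask)
    if k == 0:
        return 10.0  # no features selected: fixed large penalty
    missing = sum(1 for i in INFORMATIVE if not mask[i])
    # The best log_gamma depends on how many features are selected,
    # which mimics the coupling between feature subset and parameters.
    bowl = (log_c - 1.0) ** 2 + (log_gamma + math.log(k)) ** 2
    return missing + 0.05 * k + 0.1 * bowl


def conjugate_gradient(f, x0, iters=50, h=1e-5, tol=1e-8):
    """Minimise f by Fletcher-Reeves nonlinear CG with central-difference gradients."""
    def grad(x):
        g = []
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            g.append((f(xp) - f(xm)) / (2 * h))
        return g

    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(iters):
        gg = sum(gi * gi for gi in g)
        if gg < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0:  # not a descent direction: restart with steepest descent
            d, slope = [-gi for gi in g], -gg
        # Backtracking line search with the Armijo sufficient-decrease test.
        step, fx = 1.0, f(x)
        while step > 1e-12:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        else:
            break  # no acceptable step found
        x = x_new
        g_new = grad(x)
        beta = sum(gi * gi for gi in g_new) / gg  # Fletcher-Reeves update
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, f(x)


def ga_cg_search(loss, n_features, pop_size=20, generations=30,
                 mutation_rate=0.1, seed=0):
    """GA over feature bitmasks; each mask is scored after CG tunes the parameters."""
    rng = random.Random(seed)

    def fitness(mask):
        params, value = conjugate_gradient(lambda p: loss(mask, p), [0.0, 0.0])
        return value, params

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0][0])
        if best is None or scored[0][0][0] < best[0][0]:
            best = scored[0]
        parents = [m for _, m in scored[:pop_size // 2]]  # truncation selection
        children = [parents[0][:]]  # elitism: carry over the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    (value, params), mask = best
    return mask, params, value


if __name__ == "__main__":
    mask, params, value = ga_cg_search(toy_loss, N_FEATURES)
    print([i for i, bit in enumerate(mask) if bit], params, value)
```

In this sketch the fitness of each candidate feature subset is its loss after the parameters have been tuned for that subset, which is one simple way to treat the two problems jointly. In a real setting, `toy_loss` would be replaced by a differentiable estimate of SVM generalization error evaluated on the selected features.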
Results: Compared with the results of previous SVM studies, our GA-CG-SVM approach significantly improves overall prediction accuracy while using fewer input features.
Conclusions: Our results indicate that treating feature selection and parameter optimization simultaneously in SVM modeling can help to develop better predictive models for the ADMET properties of drugs.