Authors
- Peng Li
- Lili Yin
- Bo Zhao
- Yuezhongyi Sun
Abstract
Molecular docking-based virtual screening suffers from an imbalanced data problem: in the docking conformations generated by an actual screening run, the numbers of active and inactive compounds differ greatly. This paper proposes a virtual screening method for drug proteins based on imbalanced data mining, which brings machine learning into the screening process to handle the imbalance and improve screening accuracy. First, to reduce the imbalance, SMOTE is combined with a genetic algorithm to upsample the active compounds by synthesizing new active samples. Second, to improve accuracy, ensemble learning is introduced: random forest (RF), an extension of the Bagging ensemble technique, is combined with a support vector machine (SVM), and the resulting RF-SVM technique is used to screen the molecular docking conformations and improve the prediction accuracy for active compounds. To verify the effectiveness of the proposed technique, HIV-1 protease and SRC kinase were used as test data for the experiments, and CA II was then used to validate the resulting model. On the test dataset, the proposed method improved both the enrichment factor (EF) and the AUC compared with traditional virtual screening. These results indicate that the proposed method can effectively improve the accuracy of drug virtual screening.
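The following Python sketch is an illustration only, not the authors' implementation. It shows plain SMOTE-style interpolation between minority (active) samples, without the genetic algorithm the paper combines it with, and a commonly used definition of the enrichment factor, which the abstract names but does not define. The function names, the parameters (`k`, `seed`, `fraction`), and the use of generic numeric feature vectors are all assumptions made for the example.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Assumption: each sample is a list of numeric features. This is plain
    SMOTE; the genetic-algorithm step described in the abstract is omitted.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def enrichment_factor(scores, labels, fraction=0.1):
    """Active rate in the top-scoring fraction divided by the overall active rate.

    Assumption: labels are 1 for active and 0 for inactive, and a higher
    score means the compound is predicted to be more likely active.
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n_total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)
```

For instance, `smote_oversample([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]], n_new=4, k=2)` returns four synthetic points that lie on segments between existing active samples, and `enrichment_factor` returns a value above 1 when actives are concentrated in the top-ranked fraction of the list.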
Suggested Citation
Peng Li & Lili Yin & Bo Zhao & Yuezhongyi Sun, 2021.
"Virtual Screening of Drug Proteins Based on Imbalance Data Mining,"
Mathematical Problems in Engineering, Hindawi, vol. 2021, pages 1-10, May.
Handle:
RePEc:hin:jnlmpe:5585990
DOI: 10.1155/2021/5585990