iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

Author

Listed:
  • Wei-Zhong Lin
  • Jian-An Fang
  • Xuan Xiao
  • Kuo-Chen Chou

Abstract

DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in genome annotation. Although many efforts have been made in this regard, further work is needed to enhance prediction power. By incorporating features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which no two proteins in the same subset exceed a pairwise sequence identity cutoff. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to get the desired results.
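
The jackknife (leave-one-out) test behind the reported success rate can be illustrated with a minimal sketch. This is not the iDNA-Prot implementation. It assumes plain amino acid composition in place of the grey-model pseudo amino acid composition features, and a nearest-centroid rule in place of the random forest. The sequences and labels are invented toy data, and all function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict(train, query):
    """Stand-in classifier (assumption): label of the nearest class centroid."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    return min(by_label, key=lambda lab: sq_dist(centroid(by_label[lab]), query))


def jackknife_success_rate(samples):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict(train, vec) == label:
            correct += 1
    return correct / len(samples)


# Invented toy data: label 1 = "DNA-binding", label 0 = "non-DNA-binding".
toy = [
    ("KRKRGKRAKK", 1),
    ("RKKRSRKGKR", 1),
    ("KKRRAKRKSG", 1),
    ("DEEDLDEAED", 0),
    ("EDDELEDEAD", 0),
    ("DELEDEDAEE", 0),
]
samples = [(composition(seq), label) for seq, label in toy]
rate = jackknife_success_rate(samples)
print(f"jackknife success rate: {rate:.2%}")
```

Each protein is predicted by a model that never saw it during training, which is why the jackknife test yields a single, reproducible success rate for a given dataset.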

Suggested Citation

  • Wei-Zhong Lin & Jian-An Fang & Xuan Xiao & Kuo-Chen Chou, 2011. "iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model," PLOS ONE, Public Library of Science, vol. 6(9), pages 1-7, September.
  • Handle: RePEc:plo:pone00:0024756
    DOI: 10.1371/journal.pone.0024756

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0024756
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0024756&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Yan Xu & Jun Ding & Ling-Yun Wu & Kuo-Chen Chou, 2013. "iSNO-PseAAC: Predict Cysteine S-Nitrosylation Sites in Proteins by Incorporating Position Specific Amino Acid Propensity into Pseudo Amino Acid Composition," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-7, February.
    2. Bin Liu & Longyun Fang & Fule Liu & Xiaolong Wang & Junjie Chen & Kuo-Chen Chou, 2015. "Identification of Real MicroRNA Precursors with a Pseudo Structure Status Composition Approach," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-20, March.
    3. Xin Ma & Jing Guo & Xiao Sun, 2016. "DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues," PLOS ONE, Public Library of Science, vol. 11(12), pages 1-20, December.
    4. Jianjun He & Hong Gu & Wenqi Liu, 2012. "Imbalanced Multi-Modal Multi-Label Learning for Subcellular Localization Prediction of Human Proteins with Both Single and Multiple Sites," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-10, June.
    5. Bi-Qing Li & Le-Le Hu & Lei Chen & Kai-Yan Feng & Yu-Dong Cai & Kuo-Chen Chou, 2012. "Prediction of Protein Domain with mRMR Feature Selection and Analysis," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-14, June.
    6. Wenzheng Bao & Bin Yang & Rong Bao & Yuehui Chen, 2019. "LipoFNT: Lipoylation Sites Identification with Flexible Neural Tree," Complexity, Hindawi, vol. 2019, pages 1-9, July.
    7. Xiao Wang & Guo-Zheng Li, 2012. "A Multi-Label Predictor for Identifying the Subcellular Locations of Singleplex and Multiplex Eukaryotic Proteins," PLOS ONE, Public Library of Science, vol. 7(5), pages 1-9, May.
    8. Sabit Ahmed & Afrida Rahman & Md Al Mehedi Hasan & Md Khaled Ben Islam & Julia Rahman & Shamim Ahmad, 2021. "predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-17, April.
    9. Wu Zhu & Jian-an Fang & Yang Tang & Wenbing Zhang & Wei Du, 2012. "Digital IIR Filters Design Using Differential Evolution Algorithm with a Controllable Probabilistic Population Size," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-9, July.
