IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0202312.html
   My bibliography  Save this article

Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila

Author

Listed:
  • Zhila Esna Ashari
  • Kelly A Brayton
  • Shira L Broschat

Abstract

Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires’ disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 effector proteins that are deemed highly probable to be effectors and include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors.

Suggested Citation

  • Zhila Esna Ashari & Kelly A Brayton & Shira L Broschat, 2019. "Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-12, January.
  • Handle: RePEc:plo:pone00:0202312
    DOI: 10.1371/journal.pone.0202312
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0202312
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0202312&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0202312?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Kavita Thakur & Navneet Kaur Sandhu & Yogesh Kumar & Hiren Kumar Thakkar, 2024. "An automated multi-classification of communicable diseases using ensemble learning for disease surveillance," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 15(8), pages 3737-3756, August.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0202312. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.