
Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences

Author

Listed:
  • Ilya Plyusnin
  • Liisa Holm
  • Petri Törönen

Abstract

Automated protein annotation using the Gene Ontology (GO) plays an important role in the biosciences. Evaluation has always been considered central to developing novel annotation methods, but little attention has been paid to the evaluation metrics themselves. Evaluation metrics quantify how well an annotation method performs and allow methods to be ranked against one another. Unfortunately, most of these metrics were adopted from the machine learning literature without establishing whether they are appropriate for GO annotations. We propose a novel approach for comparing GO evaluation metrics, called Artificial Dilution Series (ADS). Our approach uses existing annotation data to generate a series of annotation sets with different levels of correctness (referred to as their signal level). We calculate the evaluation metric being tested for each annotation set in the series, allowing us to identify whether it can separate different signal levels. Finally, we contrast these results with several false positive annotation sets, which are designed to expose systematic weaknesses in GO assessment. We compared 37 evaluation metrics for GO annotation using ADS and identified drastic differences between metrics. We show that some metrics struggle to differentiate between signal levels, while others give erroneously high scores to the false positive data sets. Based on our findings, we provide guidelines on which evaluation metrics perform well with the Gene Ontology and propose improvements to several well-known metrics. In general, we argue that evaluation metrics should themselves be tested for their performance, and we provide software for this purpose (https://bitbucket.org/plyusnin/ads/). ADS is applicable to other areas of science where the evaluation of prediction results is non-trivial.

Author summary: In the biosciences, predictive methods are becoming increasingly necessary as novel sequences are generated at an ever-increasing rate. The volume of sequence data necessitates Automated Function Prediction (AFP), as manual curation is often impossible. Unfortunately, selecting the best AFP method is complicated by researchers using different evaluation metrics. Furthermore, many commonly used metrics can give misleading results. We argue that the use of poor metrics in AFP evaluation is a result of the lack of methods for benchmarking the metrics themselves. We propose an approach called Artificial Dilution Series (ADS). ADS uses existing data sets to generate multiple artificial AFP results, where each result has a controlled error rate. We use ADS to test whether different metrics can distinguish between results with known quantities of error. Our results highlight dramatic differences in performance between evaluation metrics.
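The core of the ADS procedure described in the abstract can be illustrated with a short sketch. The code below is an illustrative reconstruction, not the authors' implementation (which is available at the Bitbucket link above); the function names, the dictionary-based data layout, and the simple term-replacement dilution scheme are assumptions made for the example.

    import random

    def dilute_annotations(true_annotations, go_term_pool, signal_level, rng=None):
        """Build one artificial prediction set with a controlled signal level.

        true_annotations: dict mapping protein id -> set of correct GO terms
        go_term_pool:     list of GO terms to draw erroneous replacements from
        signal_level:     fraction of annotations kept correct
                          (1.0 = perfect predictions, 0.0 = fully randomized)
        """
        rng = rng or random.Random(0)
        diluted = {}
        for protein, terms in true_annotations.items():
            predicted = set()
            for term in terms:
                if rng.random() < signal_level:
                    predicted.add(term)                      # keep the correct annotation
                else:
                    predicted.add(rng.choice(go_term_pool))  # inject an erroneous annotation
            diluted[protein] = predicted
        return diluted

    def score_metric_over_series(metric_fn, true_annotations, go_term_pool, signal_levels):
        """Score one evaluation metric across the whole dilution series.

        metric_fn(predicted, true) -> float. A well-behaved metric should return
        scores that decrease monotonically as the signal level drops.
        """
        return {level: metric_fn(dilute_annotations(true_annotations, go_term_pool, level),
                                 true_annotations)
                for level in signal_levels}

Plotting the returned scores against the signal levels shows whether a metric can separate the levels; the same scoring can be applied to deliberately constructed false positive annotation sets, which, per the abstract, a robust metric should not reward with high scores.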

Suggested Citation

  • Ilya Plyusnin & Liisa Holm & Petri Törönen, 2019. "Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-27, November.
  • Handle: RePEc:plo:pcbi00:1007419
    DOI: 10.1371/journal.pcbi.1007419

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007419
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007419&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1007419?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access with your library subscription


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stefan Stieglitz & Christian Meske & Björn Ross & Milad Mirbabaie, 2020. "Going Back in Time to Predict the Future - The Complex Role of the Data Collection Period in Social Media Analytics," Information Systems Frontiers, Springer, vol. 22(2), pages 395-409, April.
    2. Shareen A Iqbal & Joshua D Wallach & Muin J Khoury & Sheri D Schully & John P A Ioannidis, 2016. "Reproducible Research Practices and Transparency across the Biomedical Literature," PLOS Biology, Public Library of Science, vol. 14(1), pages 1-13, January.
    3. Peter Van Schuerbeek & Chris Baeken & Johan De Mey, 2016. "The Heterogeneity in Retrieved Relations between the Personality Trait ‘Harm Avoidance’ and Gray Matter Volumes Due to Variations in the VBM and ROI Labeling Processing Settings," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-15, April.
    4. Suresh H. Moolgavkar & Ellen T. Chang & Heather N. Watson & Edmund C. Lau, 2018. "An Assessment of the Cox Proportional Hazards Regression Model for Epidemiologic Studies," Risk Analysis, John Wiley & Sons, vol. 38(4), pages 777-794, April.
    5. Leonid Tiokhin & Minhua Yan & Thomas J. H. Morgan, 2021. "Competition for priority harms the reliability of science, but reforms can help," Nature Human Behaviour, Nature, vol. 5(7), pages 857-867, July.
    6. Denes Szucs & John P A Ioannidis, 2017. "Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature," PLOS Biology, Public Library of Science, vol. 15(3), pages 1-18, March.
    7. Anne-Laure Boulesteix, 2015. "Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-6, April.
    8. Alexander Frankel & Maximilian Kasy, 2022. "Which Findings Should Be Published?," American Economic Journal: Microeconomics, American Economic Association, vol. 14(1), pages 1-38, February.
    9. Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
    10. Stanley, T. D. & Doucouliagos, Chris, 2019. "Practical Significance, Meta-Analysis and the Credibility of Economics," IZA Discussion Papers 12458, Institute of Labor Economics (IZA).
    11. Karin Langenkamp & Bodo Rödel & Kerstin Taufenbach & Meike Weiland, 2018. "Open Access in Vocational Education and Training Research," Publications, MDPI, vol. 6(3), pages 1-12, July.
    12. Kevin J. Boyle & Mark Morrison & Darla Hatton MacDonald & Roderick Duncan & John Rose, 2016. "Investigating Internet and Mail Implementation of Stated-Preference Surveys While Controlling for Differences in Sample Frames," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 64(3), pages 401-419, July.
    13. Jelte M Wicherts & Marjan Bakker & Dylan Molenaar, 2011. "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-7, November.
    14. Valentine, Kathrene D & Buchanan, Erin Michelle & Scofield, John E. & Beauchamp, Marshall T., 2017. "Beyond p-values: Utilizing Multiple Estimates to Evaluate Evidence," OSF Preprints 9hp7y, Center for Open Science.
    15. Anton, Roman, 2014. "Sustainable Intrapreneurship - The GSI Concept and Strategy - Unfolding Competitive Advantage via Fair Entrepreneurship," MPRA Paper 69713, University Library of Munich, Germany, revised 01 Feb 2015.
    16. Dudek, Thomas & Brenøe, Anne Ardila & Feld, Jan & Rohrer, Julia, 2022. "No Evidence That Siblings' Gender Affects Personality across Nine Countries," IZA Discussion Papers 15137, Institute of Labor Economics (IZA).
    17. Uwe Hassler & Marc‐Oliver Pohle, 2022. "Unlucky Number 13? Manipulating Evidence Subject to Snooping," International Statistical Review, International Statistical Institute, vol. 90(2), pages 397-410, August.
    18. Frederique Bordignon, 2020. "Self-correction of science: a comparative study of negative citations and post-publication peer review," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1225-1239, August.
    19. Omar Al-Ubaydli & John A. List, 2015. "Do Natural Field Experiments Afford Researchers More or Less Control than Laboratory Experiments? A Simple Model," NBER Working Papers 20877, National Bureau of Economic Research, Inc.
    20. Rebeca Buzzo Feltrin & Maria Cristina Rodrigues Guilam & Manoel Barral-Netto & Nísia Trindade Lima & Milton Ozório Moraes, 2018. "For socially engaged science: The dynamics of knowledge production in the Fiocruz graduate program in the framework of the "Brazil Without Extreme Poverty Plan"," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-15, October.
