
Accurate Computation of Survival Statistics in Genome-Wide Studies

Author

Listed:
  • Fabio Vandin
  • Alexandra Papoutsaki
  • Benjamin J Raphael
  • Eli Upfal

Abstract

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. There are two reasons: the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data, where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.

Author Summary: The identification of genetic variants associated with survival time is crucial in genomic studies. To this end, a number of methods have been proposed for computing a p-value that summarizes the difference in survival time between two or more populations. The most widely used of these methods is the log-rank test. Widely used implementations of the log-rank test exhibit a systematic error that emerges in most genome-wide applications, where the two populations have very different sizes and very small p-values must be computed accurately because a large number of candidate variants are evaluated. Considering cancer genomics applications, we show that this systematic error leads to many false positive associations between somatic variants and survival time. We present and analyze a new algorithm, ExaLT, that accurately computes the p-value for the log-rank test under a distribution that is appropriate for the parameters found in genomics. Unlike previous approaches, ExaLT allows the user to control the accuracy of the computation. We use ExaLT to analyze cancer genomics data from The Cancer Genome Atlas (TCGA), identifying several novel associations in addition to well-known associations. In contrast, the standard implementations of the log-rank test report a huge number of presumably false positive associations.
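The contrast drawn above, between an asymptotic approximation and an exact p-value for the log-rank statistic, can be illustrated on a toy data set. The sketch below is not ExaLT and does not reproduce the paper's algorithm. It computes the standard two-group log-rank statistic, its normal-approximation p-value, and a brute-force p-value obtained by enumerating every relabeling of the patients. Using label permutations as the exact null distribution is an assumption made here for illustration, as are the function names, the two-sided definition of "extreme", and the twelve-patient example data.

```python
import math
from itertools import combinations


def logrank_stat(times, events, in_group):
    """Return (U, V): observed minus expected events in the group, and the
    usual hypergeometric variance of U, summed over distinct event times."""
    u = 0.0
    v = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)                                  # patients at risk
        n1 = sum(1 for i in at_risk if in_group[i])       # at risk, in group
        dying = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dying)                                    # events at time t
        d1 = sum(1 for i in dying if in_group[i])         # events in group
        u += d1 - d * n1 / n
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u, v


def asymptotic_p(times, events, in_group):
    """Two-sided p-value from the normal approximation to U / sqrt(V)."""
    u, v = logrank_stat(times, events, in_group)
    if v == 0:
        return 1.0
    return math.erfc(abs(u) / math.sqrt(2 * v))


def permutation_p(times, events, in_group):
    """Two-sided p-value by enumerating all group relabelings (illustrative
    assumption; feasible only for very small samples)."""
    n = len(times)
    k = sum(1 for g in in_group if g)
    u_obs, _ = logrank_stat(times, events, in_group)
    extreme = 0
    total = 0
    for members in combinations(range(n), k):
        chosen = set(members)
        labels = [i in chosen for i in range(n)]
        u, _ = logrank_stat(times, events, labels)
        total += 1
        if abs(u) >= abs(u_obs) - 1e-12:   # tolerance for float round-off
            extreme += 1
    return extreme / total


# Hypothetical toy data: 12 patients, no censoring, and a small "mutated"
# group of 2 patients who have the shortest survival times.
TIMES = list(range(1, 13))
EVENTS = [1] * 12
MUTATED = [True, True] + [False] * 10

if __name__ == "__main__":
    print("asymptotic p :", asymptotic_p(TIMES, EVENTS, MUTATED))
    print("permutation p:", permutation_p(TIMES, EVENTS, MUTATED))
```

With 2 of 12 patients in the small group there are only 66 possible relabelings, so the enumerated p-value cannot fall below 1/66, whereas the normal approximation is not bound by the sample's discreteness. Full enumeration grows combinatorially with sample size, which is why it serves here only as a reference point for small examples.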

Suggested Citation

  • Fabio Vandin & Alexandra Papoutsaki & Benjamin J Raphael & Eli Upfal, 2015. "Accurate Computation of Survival Statistics in Genome-Wide Studies," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-18, May.
  • Handle: RePEc:plo:pcbi00:1004071
    DOI: 10.1371/journal.pcbi.1004071

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004071
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004071&type=printable
    Download Restriction: no



