Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i7p2446-2452.html

How accurate are the extremely small P-values used in genomic research: An evaluation of numerical libraries

Author

Listed:
  • Santosh Bangalore, Sai
  • Wang, Jelai
  • Allison, David B.

Abstract

In the fields of genomics and high-dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Because tail areas of statistical distributions are needed for hypothesis testing, the accuracy of these areas is important to confidently make scientific judgments. Previous work on accuracy was primarily focused on evaluating professionally written statistical software, like SAS, on the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST) and on the accuracy of tail areas in statistical distributions. The goal of this paper is to provide guidance to investigators who are developing their own custom scientific software built upon numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from cumulative distribution functions (CDF) of the Chi-square and t-distribution by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R to widely accepted standards of comparison like ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate up to six significant digits. Amongst the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results herein can be useful to programmers in choosing libraries for CDF accuracy.
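
To illustrate why small tail areas are numerically delicate, here is a minimal Python sketch. It is not the paper's procedure and uses none of the libraries it evaluates. It relies on the identity that the upper tail of a Chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)), and compares that direct computation with the naive 1 - CDF. The function names are hypothetical. Treating the direct value as the reference, and scoring agreement with a log relative error (a common way to estimate the number of correct significant digits), are assumptions made for this example.

```python
import math


def chi2_1df_tail_naive(x):
    """Upper tail of Chi-square (1 df) as 1 - CDF; suffers cancellation when the tail is tiny."""
    return 1.0 - math.erf(math.sqrt(x / 2.0))


def chi2_1df_tail_direct(x):
    """Upper tail of Chi-square (1 df) computed directly with the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def log_relative_error(computed, reference):
    """Approximate count of correct significant digits of `computed` relative to `reference`."""
    if computed == reference:
        return float("inf")
    return -math.log10(abs(computed - reference) / abs(reference))


# Assumption: the direct (erfc-based) value serves as the reference here.
for x in (10.0, 50.0, 100.0):
    naive = chi2_1df_tail_naive(x)
    direct = chi2_1df_tail_direct(x)
    print(x, naive, direct, log_relative_error(naive, direct))
```

At x = 100 the naive form returns exactly zero in double precision, while the direct form still returns a positive value on the order of 1e-23. A library that exposes only the lower-tail CDF can therefore lose every significant digit at the significance levels used in massive multiple testing.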

Suggested Citation

  • Santosh Bangalore, Sai & Wang, Jelai & Allison, David B., 2009. "How accurate are the extremely small P-values used in genomic research: An evaluation of numerical libraries," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2446-2452, May.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:7:p:2446-2452

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00550-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Knusel, Leo, 2005. "On the accuracy of statistical distributions in Microsoft Excel 2003," Computational Statistics & Data Analysis, Elsevier, vol. 48(3), pages 445-449, March.
    2. Keeling, Kellie B. & Pavur, Robert J., 2007. "A comparative study of the reliability of nine statistical software packages," Computational Statistics & Data Analysis, Elsevier, vol. 51(8), pages 3811-3831, May.
    3. Altman, Micah & Gill, Jeff & McDonald, Michael P., 2007. "accuracy: Tools for Accurate and Reliable Statistical Computing," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 21(i01).

    Citations

    Cited by:

    1. McCullough, Bruce D. & Yalta, A. Talha, 2013. "Spreadsheets in the Cloud - Not Ready Yet," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 52(i07).
    2. Shi Yang & Shi Weiping & Wang Mengqiao & Lee Ji-Hyun & Kang Huining & Jiang Hui, 2023. "Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 22(1), pages 1-22, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. A. Talha Yalta & A. Yasemin Yalta, 2010. "Should Economists Use Open Source Software for Doing Research?," Computational Economics, Springer;Society for Computational Economics, vol. 35(4), pages 371-394, April.
    2. Keeling, Kellie B. & Pavur, Robert J., 2007. "A comparative study of the reliability of nine statistical software packages," Computational Statistics & Data Analysis, Elsevier, vol. 51(8), pages 3811-3831, May.
    3. Jason S. Bergtold & Krishna P. Pokharel & Allen M. Featherstone & Lijia Mo, 2018. "On the examination of the reliability of statistical software for estimating regression models with discrete dependent variables," Computational Statistics, Springer, vol. 33(2), pages 757-786, June.
    4. Bergtold, Jason S. & Pokharel, Krishna & Featherstone, Allen, 2015. "On the Examination of the Reliability of Statistical Software for Estimating Logistic Regression Models," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205643, Agricultural and Applied Economics Association.
    5. Yalta, A. Talha & Jenal, Olaf, 2009. "On the importance of verifying forecasting results," International Journal of Forecasting, Elsevier, vol. 25(1), pages 62-73.
    6. McCullough, Bruce D. & Yalta, A. Talha, 2013. "Spreadsheets in the Cloud - Not Ready Yet," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 52(i07).
    7. repec:jss:jstsof:34:i04 is not listed on IDEAS
    8. D. Betsy McCoach & Graham G. Rifenbark & Sarah D. Newton & Xiaoran Li & Janice Kooken & Dani Yomtov & Anthony J. Gambino & Aarti Bellara, 2018. "Does the Package Matter? A Comparison of Five Common Multilevel Modeling Software Packages," Journal of Educational and Behavioral Statistics, vol. 43(5), pages 594-627, October.
    9. Yalta, A. Talha, 2007. "The Numerical Reliability of GAUSS 8.0," The American Statistician, American Statistical Association, vol. 61, pages 262-268, August.
    10. Ignacio Díaz-Emparanza & Petr Mariel & María Victoria Esteban (ed.), 2009. "Econometrics with gretl. Proceedings of the gretl Conference 2009," UPV/EHU Books, Universidad del País Vasco - Facultad de Ciencias Económicas y Empresariales, edition 1, number 01, June.
    11. Yalta, A. Talha, 2008. "The accuracy of statistical distributions in Microsoft® Excel 2007," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4579-4586, June.
    12. A. Talha Yalta & A. Yasemin Yalta, 2009. "Wilkinson Tests and gretl," EHUCHAPS, in: Ignacio Díaz-Emparanza & Petr Mariel & María Victoria Esteban (ed.), Econometrics with gretl. Proceedings of the gretl Conference 2009, edition 1, chapter 16, pages 243-251, Universidad del País Vasco - Facultad de Ciencias Económicas y Empresariales.
    13. Ozier, Owen, 2012. "Perils of simulation : parallel streams and the case of stata's rnormal command," Policy Research Working Paper Series 6278, The World Bank.
    14. A. Talha Yalta, 2010. "The Accuracy of Statistical Distributions in Microsoft (R) Excel 2007," Working Papers 1006, TOBB University of Economics and Technology, Department of Economics.
    15. McCullough, B.D. & Heiser, David A., 2008. "On the accuracy of statistical procedures in Microsoft Excel 2007," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4570-4578, June.
    16. Izquierdo, Segismundo S. & Hernández, Cesáreo & del Hoyo, Juan, 2006. "Forecasting VARMA processes using VAR models and subspace-based state space models," MPRA Paper 4235, University Library of Munich, Germany.
    17. Tao Ge & Jinye Li & Cang Wang, 2023. "Econometric analysis of the impact of innovative city pilots on CO2 emissions in China," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 25(9), pages 9359-9386, September.
