
Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method

Author

Listed:
  • Shi Yang

    (Division of Biostatistics and Data Science, Department of Population Health Sciences and Department of Neuroscience and Regenerative Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA)

  • Shi Weiping

    (College of Mathematics, Jilin University, Changchun, 130012, China)

  • Wang Mengqiao

    (Department of Epidemiology and Biostatistics, School of Public Health, Chengdu Medical College, Chengdu, 610500, China)

  • Lee Ji-Hyun

    (Division of Quantitative Sciences, University of Florida Health Cancer Center and Department of Biostatistics, University of Florida, Gainesville, FL 32610, USA)

  • Kang Huining

    (University of New Mexico Comprehensive Cancer Center Biostatistics Shared Resource, University of New Mexico, Albuquerque, NM 87131, USA)

  • Jiang Hui

    (Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA)

Abstract

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in applying permutation tests in genomic studies is that an enormous number of permutations is often needed to obtain reliable estimates of very small p-values, which makes the computation intensive. To address this issue, we develop algorithms for the accurate and efficient estimation of small p-values in permutation tests for paired and independent two-group genomic data. Our approaches leverage a novel framework that parameterizes the permutation sample spaces of these two types of data using the Bernoulli and conditional Bernoulli distributions, respectively, combined with the cross-entropy method. We demonstrate the performance of the proposed algorithms on two simulated datasets and two real-world gene expression datasets generated by microarray and RNA-Seq technologies, with comparisons to existing methods such as crude permutations and SAMC. The results show that our approaches can achieve computational efficiency gains of orders of magnitude in estimating small p-values. Our approaches offer promising solutions for improving the computational efficiency of existing permutation test procedures and for developing new permutation-based testing methods in genomic data analysis.
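
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea for paired data, not the authors' implementation. It assumes a one-sided test whose statistic is the sum of the paired differences, treats the permutation space as independent sign flips (one Bernoulli variable per pair, with probability 1/2 under the null), and uses a textbook cross-entropy update to tilt those probabilities toward the rare event before a final importance-sampling estimate. The names `signed_sum` and `ce_paired_pvalue` and the tuning parameters `n_pilot`, `n_final`, `rho`, `clip`, and `max_iter` are hypothetical choices made for illustration.

```python
import math
import random


def signed_sum(diffs, keep):
    """Sum of paired differences, negating entry i when keep[i] == 0."""
    return sum(d if k else -d for d, k in zip(diffs, keep))


def ce_paired_pvalue(diffs, n_pilot=2000, n_final=10000, rho=0.1,
                     clip=0.02, max_iter=50, seed=0):
    """Estimate P(T >= t_obs) under random sign flips by importance sampling."""
    rng = random.Random(seed)
    n = len(diffs)
    t_obs = sum(diffs)   # statistic of the observed (unflipped) data
    p = [0.5] * n        # null model: every sign is kept with probability 1/2

    def draw(size):
        # Sample sign patterns from Bernoulli(p); return (statistic, weight, pattern),
        # where weight = null probability / proposal probability of the pattern.
        out = []
        for _ in range(size):
            keep = [1 if rng.random() < pi else 0 for pi in p]
            log_w = sum(math.log(0.5 / (pi if k else 1.0 - pi))
                        for pi, k in zip(p, keep))
            out.append((signed_sum(diffs, keep), math.exp(log_w), keep))
        return out

    for _ in range(max_iter):
        sample = draw(n_pilot)
        stats = sorted(s for s, _, _ in sample)
        idx = min(int((1.0 - rho) * n_pilot), n_pilot - 1)
        level = min(stats[idx], t_obs)   # raise the threshold, capped at t_obs
        elite = [(w, keep) for s, w, keep in sample if s >= level]
        total = sum(w for w, _ in elite)
        # Cross-entropy update: weighted frequency of "keep" among elite samples,
        # clipped away from 0 and 1 so every sign pattern stays reachable.
        p = [min(1.0 - clip, max(clip, sum(w * keep[i] for w, keep in elite) / total))
             for i in range(n)]
        if level >= t_obs:
            break

    # Final importance-sampling estimate using the tilted probabilities.
    final = draw(n_final)
    return sum(w for s, w, _ in final if s >= t_obs) / n_final
```

Calling `ce_paired_pvalue(diffs)` on a list of paired differences returns the estimated upper-tail p-value. For small inputs the estimate can be checked against exhaustive enumeration of all 2^n sign patterns, whereas crude permutation sampling would need on the order of 1/p draws just to observe the event once.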

Suggested Citation

  • Shi Yang & Shi Weiping & Wang Mengqiao & Lee Ji-Hyun & Kang Huining & Jiang Hui, 2023. "Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 22(1), pages 1-22, January.
  • Handle: RePEc:bpj:sagmbi:v:22:y:2023:i:1:p:22:n:1
    DOI: 10.1515/sagmb-2021-0067

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2021-0067
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Rubinstein, Reuven Y., 1997. "Optimization of computer simulation models with rare events," European Journal of Operational Research, Elsevier, vol. 99(1), pages 89-112, May.
    2. Brian D. Segal & Thomas Braun & Michael R. Elliott & Hui Jiang, 2018. "Fast approximation of small p-values in permutation tests by partitioning the permutations," Biometrics, The International Biometric Society, vol. 74(1), pages 196-206, March.
    3. Hu, Jiaqiao & Su, Zheng, 2008. "Bootstrap quantile estimation via importance resampling," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5136-5142, August.
    4. Reuven Rubinstein, 1999. "The Cross-Entropy Method for Combinatorial and Continuous Optimization," Methodology and Computing in Applied Probability, Springer, vol. 1(2), pages 127-190, September.
    5. Chen, Sean X., 2000. "General Properties and Estimation of Conditional Bernoulli Models," Journal of Multivariate Analysis, Elsevier, vol. 74(1), pages 69-87, July.
    6. Santosh Bangalore, Sai & Wang, Jelai & Allison, David B., 2009. "How accurate are the extremely small P-values used in genomic research: An evaluation of numerical libraries," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2446-2452, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mattrand, C. & Bourinet, J.-M., 2014. "The cross-entropy method for reliability assessment of cracked structures subjected to random Markovian loads," Reliability Engineering and System Safety, Elsevier, vol. 123(C), pages 171-182.
    2. K.-P. Hui & N. Bean & M. Kraetzl & Dirk Kroese, 2005. "The Cross-Entropy Method for Network Reliability Estimation," Annals of Operations Research, Springer, vol. 134(1), pages 101-118, February.
    3. Fahimnia, Behnam & Sarkis, Joseph & Eshragh, Ali, 2015. "A tradeoff model for green supply chain planning: A leanness-versus-greenness analysis," Omega, Elsevier, vol. 54(C), pages 173-190.
    4. Joshua C. C. Chan & Liana Jacobi & Dan Zhu, 2022. "An automated prior robustness analysis in Bayesian model comparison," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(3), pages 583-602, April.
    5. Ad Ridder, 2004. "Importance Sampling Simulations of Markovian Reliability Systems using Cross Entropy," Tinbergen Institute Discussion Papers 04-018/4, Tinbergen Institute.
    6. Masoud Esmaeilikia & Behnam Fahimnia & Joseph Sarkis & Kannan Govindan & Arun Kumar & John Mo, 2016. "A tactical supply chain planning model with multiple flexibility options: an empirical evaluation," Annals of Operations Research, Springer, vol. 244(2), pages 429-454, September.
    7. Fahimnia, Behnam & Sarkis, Joseph & Choudhary, Alok & Eshragh, Ali, 2015. "Tactical supply chain planning under a carbon tax policy scheme: A case study," International Journal of Production Economics, Elsevier, vol. 164(C), pages 206-215.
    8. Ali Eshragh & Jerzy Filar & Michael Haythorpe, 2011. "A hybrid simulation-optimization algorithm for the Hamiltonian cycle problem," Annals of Operations Research, Springer, vol. 189(1), pages 103-125, September.
    9. Joshua Chan & Eric Eisenstat & Xuewen Yu, 2022. "Large Bayesian VARs with Factor Stochastic Volatility: Identification, Order Invariance and Structural Analysis," Papers 2207.03988, arXiv.org.
    10. Qun Niu & Ming You & Zhile Yang & Yang Zhang, 2021. "Economic Emission Dispatch Considering Renewable Energy Resources—A Multi-Objective Cross Entropy Optimization Approach," Sustainability, MDPI, vol. 13(10), pages 1-33, May.
    11. L. Margolin, 2005. "On the Convergence of the Cross-Entropy Method," Annals of Operations Research, Springer, vol. 134(1), pages 201-214, February.
    12. J Morio & R Pastel, 2012. "Plug-in estimation of d-dimensional density minimum volume set of a rare event in a complex system," Journal of Risk and Reliability, vol. 226(3), pages 337-345, June.
    13. Morio, Jérôme, 2011. "Non-parametric adaptive importance sampling for the probability estimation of a launcher impact position," Reliability Engineering and System Safety, Elsevier, vol. 96(1), pages 178-183.
    14. Sze Him Leung & Ji Meng Loh & Chun Yip Yau & Zhengyuan Zhu, 2021. "Spatial Sampling Design Using Generalized Neyman–Scott Process," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(1), pages 105-127, March.
    15. Nguyen, Hoa T.M. & Chow, Andy H.F. & Ying, Cheng-shuo, 2021. "Pareto routing and scheduling of dynamic urban rail transit services with multi-objective cross entropy method," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 156(C).
    16. Hao Su & Qun Niu & Zhile Yang, 2023. "Optimal Power Flow Using Improved Cross-Entropy Method," Energies, MDPI, vol. 16(14), pages 1-33, July.
    17. Agbeyegbe, Terence D., 2020. "Bayesian analysis of output gap in Barbados," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 1(1).
    18. Chan, Joshua C.C., 2023. "Comparing stochastic volatility specifications for large Bayesian VARs," Journal of Econometrics, Elsevier, vol. 235(2), pages 1419-1446.
    19. Benham, Tim & Duan, Qibin & Kroese, Dirk P. & Liquet, Benoît, 2017. "CEoptim: Cross-Entropy R Package for Optimization," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i08).
    20. Ferdinand Bollwein & Stephan Westphal, 2022. "Oblique decision tree induction by cross-entropy optimization based on the von Mises–Fisher distribution," Computational Statistics, Springer, vol. 37(5), pages 2203-2229, November.
