IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v14y2015i6p551-573n4.html

An Empirical Bayes risk prediction model using multiple traits for sequencing data

Author

Listed:
  • Li Gengxin

    (Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH 45435, USA)

  • Cui Yuehua

    (Department of Statistics and Probability, Michigan State University, 619 Red Cedar Rd, East Lansing, MI 48824, USA)

  • Zhao Hongyu

    (Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06520, USA)

Abstract

Rapidly developing sequencing technologies have led to improved disease risk prediction through the identification of many novel genes. Many prediction methods have been proposed that use rich genomic information to predict binary disease outcomes. It is intuitive that these methods can be further improved by making efficient use of the rich information in measured quantitative traits that are correlated with the binary outcomes. In this article, we propose a novel Empirical Bayes prediction model that uses information from both quantitative traits and binary disease status to improve risk prediction. Our method is built on a new statistic that better infers the gene effect on multiple traits, and it also enjoys good theoretical properties. We then consider sequencing data, combining information from multiple rare variants in individual genes to strengthen the signals of causal genetic effects. In a simulation study, we find that our proposed Empirical Bayes approach is superior to other existing methods in terms of feature selection and risk prediction. We further evaluate the effectiveness of our proposed method through its application to the sequencing data provided by the Genetic Analysis Workshop 18.
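
The abstract mentions combining information from multiple rare variants within a gene. As a rough illustration only, the Python sketch below shows one common way this kind of aggregation is done, a simple burden (collapsing) score. It is not the authors' statistic, which the abstract does not specify. The function name `gene_burden_scores`, the genotype coding (minor-allele counts 0, 1, 2), and the 5% frequency threshold are all assumptions made for the example.

```python
from typing import Dict, List


def gene_burden_scores(
    genotypes: List[List[int]],
    variant_gene: List[str],
    maf_threshold: float = 0.05,
) -> Dict[str, List[int]]:
    """Collapse rare variants into one burden score per gene and individual.

    genotypes[i][j] is the assumed minor-allele count (0, 1, or 2) of
    individual i at variant j; variant_gene[j] names the gene holding variant j.
    Hypothetical helper for illustration, not the method of the article.
    """
    n = len(genotypes)
    scores: Dict[str, List[int]] = {}
    for j, gene in enumerate(variant_gene):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2 * n)  # sample minor-allele frequency
        if maf == 0 or maf > maf_threshold:
            continue  # skip monomorphic and common variants
        gene_scores = scores.setdefault(gene, [0] * n)
        for i, count in enumerate(column):
            gene_scores[i] += count  # add this variant's allele count
    return scores


if __name__ == "__main__":
    # Toy data: 4 individuals, 3 variants; threshold raised for the tiny sample.
    toy = [[1, 0, 2], [0, 0, 1], [0, 1, 2], [0, 0, 1]]
    print(gene_burden_scores(toy, ["A", "A", "B"], maf_threshold=0.2))
```

In a sketch like this, the per-gene scores would then serve as the features passed to whatever prediction model is used. Summing allele counts assumes all rare variants in a gene act in the same direction, which is a simplification.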

Suggested Citation

  • Li Gengxin & Cui Yuehua & Zhao Hongyu, 2015. "An Empirical Bayes risk prediction model using multiple traits for sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(6), pages 551-573, December.
  • Handle: RePEc:bpj:sagmbi:v:14:y:2015:i:6:p:551-573:n:4
    DOI: 10.1515/sagmb-2015-0060

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2015-0060
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Senn, Stephen, 2008. "A Note Concerning a Selection Paradox of Dawid's," The American Statistician, American Statistical Association, vol. 62, pages 206-210, August.
    2. Efron, Bradley, 2009. "Empirical Bayes Estimates for Large-Scale Prediction Problems," Journal of the American Statistical Association, American Statistical Association, vol. 104(487), pages 1015-1028.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
    2. Andrew Y Chen & Tom Zimmermann & Jeffrey Pontiff, 2020. "Publication Bias and the Cross-Section of Stock Returns," The Review of Asset Pricing Studies, Society for Financial Studies, vol. 10(2), pages 249-289.
    3. Li Gengxin & Hou Lin & Liu Xiaoyu & Wu Cen, 2020. "A weighted empirical Bayes risk prediction model using multiple traits," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(3), pages 1-14, June.
    4. Stanley, T. D. & Doucouliagos, Hristos, 2011. "Meta-regression approximations to reduce publication selection bias," Working Papers eco_2011_4, Deakin University, Department of Economics.
    5. David Amar & Ron Shamir & Daniel Yekutieli, 2017. "Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate," PLOS Computational Biology, Public Library of Science, vol. 13(8), pages 1-22, August.
    6. David R. Bickel, 2014. "Small-scale Inference: Empirical Bayes and Confidence Methods for as Few as a Single Comparison," International Statistical Review, International Statistical Institute, vol. 82(3), pages 457-476, December.
    7. She, Yiyuan, 2012. "An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors," Computational Statistics & Data Analysis, Elsevier, vol. 56(10), pages 2976-2990.
    8. Lu, Jiannan & Deng, Alex, 2016. "Demystifying the bias from selective inference: A revisit to Dawid’s treatment selection problem," Statistics & Probability Letters, Elsevier, vol. 118(C), pages 8-15.
    9. Chen Xu & Jiahua Chen, 2014. "The Sparse MLE for Ultrahigh-Dimensional Feature Screening," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1257-1269, September.
    10. van Iterson Maarten & van de Wiel Mark A. & Boer Judith M. & de Menezes Renée X., 2013. "General power and sample size calculations for high-dimensional genomic data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 449-467, August.
    11. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    12. Maharaj, Elizabeth Ann & Alonso, Andrés M., 2014. "Discriminant analysis of multivariate time series: Application to diagnosis based on ECG signals," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 67-87.
    13. Park, Junyong, 2018. "Simultaneous estimation based on empirical likelihood and general maximum likelihood estimation," Computational Statistics & Data Analysis, Elsevier, vol. 117(C), pages 19-31.
    14. Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
