Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH

Authors
  • Oscar M Rueda
  • Ramón Díaz-Uriarte

Abstract

Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, "What is the probability that this gene/region has CNAs?" Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.

Author Summary

As a consequence of problems during cell division, the number of copies of a gene in a chromosome can either increase or decrease. These copy-number alterations (CNAs) can play a crucial role in the emergence of complex multigenic diseases. For example, in cancer, amplification of oncogenes can drive tumor activation, and CNAs are associated with metastasis development and patient survival. Studies on the relationship between CNAs and disease have been recently fueled by the widespread use of array-based comparative genomic hybridization (aCGH), a technique with much finer resolution than previous experimental approaches. Detection of CNAs from these data depends on methods of analysis that do not impose biologically unrealistic assumptions and that provide direct answers to fundamental research questions. We have developed a statistical method, using a Bayesian approach, that returns estimates of the probabilities of CNAs from aCGH data, the most direct and valuable answer to the key biological question: "What is the probability that this gene/region has an altered copy number?" The output of the method can therefore be immediately used in different settings from clinical to basic research scenarios, and is applicable over a wide variety of aCGH technologies.
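The abstract combines two ideas: a nonhomogeneous hidden Markov model whose transitions depend on interprobe distance, and Bayesian model averaging across models with different numbers of hidden states to obtain a per-probe probability of alteration. The Python sketch below illustrates only those two ideas, under loudly stated assumptions. The exponential-decay transition form, the Gaussian emissions, every parameter value, and the model weights are hypothetical stand-ins, and the helper names (`transition`, `posterior_states`, `prob_altered`) are illustrative. It does not implement reversible jump MCMC, which in the described method samples the number of states and the parameters rather than fixing them, and it is not the authors' parameterization.

```python
import math


def normal_pdf(y, mean, sd):
    """Gaussian density, used here as the (assumed) emission model."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def transition(d, pi, scale):
    """Hypothetical distance-dependent transition matrix (not RJaCGH's form):
    P(d) = exp(-d/scale) * I + (1 - exp(-d/scale)) * 1 pi^T.
    Nearby probes tend to share a state; distant probes approach independence."""
    keep = math.exp(-d / scale)
    k = len(pi)
    return [[keep * (1.0 if i == j else 0.0) + (1.0 - keep) * pi[j]
             for j in range(k)] for i in range(k)]


def posterior_states(y, dist, model):
    """Scaled forward-backward for a nonhomogeneous HMM.
    dist[t] is the normalized distance between probes t and t+1.
    Returns gamma[t][s] = P(state at probe t is s | y, model)."""
    means, sds = model["means"], model["sds"]
    pi, scale = model["pi"], model["scale"]
    k, n = len(means), len(y)
    emit = [[normal_pdf(y[t], means[s], sds[s]) for s in range(k)]
            for t in range(n)]
    # Forward pass; pi is the initial distribution (it is stationary for P(d)).
    alpha, c = [], []
    a = [pi[s] * emit[0][s] for s in range(k)]
    for t in range(n):
        if t > 0:
            P = transition(dist[t - 1], pi, scale)
            a = [sum(alpha[t - 1][i] * P[i][j] for i in range(k)) * emit[t][j]
                 for j in range(k)]
        c.append(sum(a))
        alpha.append([v / c[t] for v in a])
    # Backward pass, rescaled with the same constants c.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        P = transition(dist[t], pi, scale)
        beta[t] = [sum(P[i][j] * emit[t + 1][j] * beta[t + 1][j]
                       for j in range(k)) / c[t + 1] for i in range(k)]
    gamma = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(g)
        gamma.append([v / z for v in g])
    return gamma


def prob_altered(y, dist, models, weights):
    """Bayesian model averaging over models with different numbers of states:
    P(altered at t | y) = sum_m w_m * P(altered at t | y, m)."""
    out = [0.0] * len(y)
    for model, w in zip(models, weights):
        gamma = posterior_states(y, dist, model)
        altered = [s for s, lab in enumerate(model["labels"]) if lab != "normal"]
        for t, g in enumerate(gamma):
            out[t] += w * sum(g[s] for s in altered)
    return out


if __name__ == "__main__":
    # Hypothetical log2 ratios with a gained segment at probes 3-5 (0-indexed).
    y = [0.02, -0.05, 0.01, 0.55, 0.61, 0.48, 0.03, -0.02]
    dist = [0.1] * 7  # hypothetical normalized interprobe distances
    two_state = {"means": [0.0, 0.58], "sds": [0.1, 0.1], "pi": [0.8, 0.2],
                 "scale": 0.5, "labels": ["normal", "gain"]}
    three_state = {"means": [-0.58, 0.0, 0.58], "sds": [0.1, 0.1, 0.1],
                   "pi": [0.1, 0.7, 0.2], "scale": 0.5,
                   "labels": ["loss", "normal", "gain"]}
    # Hypothetical posterior model probabilities, e.g. the share of
    # RJMCMC iterations spent in each number of states.
    probs = prob_altered(y, dist, [two_state, three_state], [0.6, 0.4])
    print([round(p, 3) for p in probs])
```

Averaging over the two- and three-state models, instead of committing to one of them, is what lets the returned values be read directly as per-probe probabilities of alteration, the kind of output the abstract describes.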

Suggested Citation

  • Oscar M Rueda & Ramón Díaz-Uriarte, 2007. "Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH," PLOS Computational Biology, Public Library of Science, vol. 3(6), pages 1-8, June.
  • Handle: RePEc:plo:pcbi00:0030122
    DOI: 10.1371/journal.pcbi.0030122

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030122
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0030122&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. S. P. Brooks & P. Giudici & G. O. Roberts, 2003. "Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(1), pages 3-39, January.
    2. Sylvia Richardson & Peter J. Green, 1997. "On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(4), pages 731-792.
    3. C. P. Robert & T. Rydén & D. M. Titterington, 2000. "Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(1), pages 57-75.
    4. Giovanni Parmigiani & Elizabeth S. Garrett & Ramaswamy Anbazhagan & Edward Gabrielson, 2002. "A statistical framework for expression‐based molecular classification in cancer," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 717-736, October.
    5. A. E. Raftery & Y. Zheng, 2003. "Discussion: Performance of Bayesian Model Averaging," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 931-938, January.
    6. Jane Fridlyand & Antoine M. Snijders & Dan Pinkel & Donna G. Albertson & Ajay N. Jain, 2004. "Hidden Markov models approach to the analysis of array CGH data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 132-153, July.

    Citations



    Cited by:

    1. Ao Yuan & Guanjie Chen & Juan Xiong & Wenqing He & Wen Jin & Charles Rotimi, 2011. "Bayesian-frequentist hybrid model with application to the analysis of gene copy number changes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(5), pages 987-1005, February.
    2. James R Wagner & Bing Ge & Dmitry Pokholok & Kevin L Gunderson & Tomi Pastinen & Mathieu Blanchette, 2010. "Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-12, July.
    3. Michael Seifert & André Gohr & Marc Strickert & Ivo Grosse, 2012. "Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-15, January.
    4. Erick da Conceição Amorim & Vinícius Diniz Mayrink, 2020. "Clustering non-linear interactions in factor analysis," METRON, Springer;Sapienza Università di Roma, vol. 78(3), pages 329-352, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. McGrory, C.A. & Pettitt, A.N. & Faddy, M.J., 2009. "A fully Bayesian approach to inference for Coxian phase-type distributions with covariate dependent mean," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4311-4321, October.
    2. Ho, Remus K.W. & Hu, Inchi, 2008. "Flexible modelling of random effects in linear mixed models: A Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1347-1361, January.
    3. McVinish, R. & Mengersen, K., 2008. "Semiparametric Bayesian circular statistics," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4722-4730, June.
    4. Gagnon, Philippe & Bédard, Mylène & Desgagné, Alain, 2019. "Weak convergence and optimal tuning of the reversible jump algorithm," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 161(C), pages 32-51.
    5. Liu, Hefei & Song, Xinyuan, 2021. "Bayesian analysis of hidden Markov structural equation models with an unknown number of hidden states," Econometrics and Statistics, Elsevier, vol. 18(C), pages 29-43.
    6. Panagiotis Papastamoulis & George Iliopoulos, 2013. "On the Convergence Rate of Random Permutation Sampler and ECR Algorithm in Missing Data Models," Methodology and Computing in Applied Probability, Springer, vol. 15(2), pages 293-304, June.
    7. J. Vermaak & C. Andrieu & A. Doucet & S. J. Godsill, 2004. "Reversible Jump Markov Chain Monte Carlo Strategies for Bayesian Model Selection in Autoregressive Processes," Journal of Time Series Analysis, Wiley Blackwell, vol. 25(6), pages 785-809, November.
    8. McGrory, C.A. & Titterington, D.M., 2007. "Variational approximations in Bayesian model selection for finite mixture distributions," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5352-5367, July.
    9. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    10. Shuang Zhang & Xingdong Feng, 2022. "Distributed identification of heterogeneous treatment effects," Computational Statistics, Springer, vol. 37(1), pages 57-89, March.
    11. Li, Feng & Kang, Yanfei, 2018. "Improving forecasting performance using covariate-dependent copula models," International Journal of Forecasting, Elsevier, vol. 34(3), pages 456-476.
    12. Sik-Yum Lee, 2006. "Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data," Psychometrika, Springer;The Psychometric Society, vol. 71(3), pages 541-564, September.
    13. León-González, Roberto & Montolio, Daniel, 2015. "Endogeneity and panel data in growth regressions: A Bayesian model averaging approach," Journal of Macroeconomics, Elsevier, vol. 46(C), pages 23-39.
    14. Fisher, Mark & Jensen, Mark J., 2022. "Bayesian nonparametric learning of how skill is distributed across the mutual fund industry," Journal of Econometrics, Elsevier, vol. 230(1), pages 131-153.
    15. Cai, Jing-Heng & Song, Xin-Yuan & Lam, Kwok-Hap & Ip, Edward Hak-Sing, 2011. "A mixture of generalized latent variable models for mixed mode and heterogeneous data," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2889-2907, November.
    16. N. T. Longford & Pierpaolo D'Urso, 2011. "Mixture models with an improper component," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(11), pages 2511-2521, January.
    17. Ungolo, Francesco & Kleinow, Torsten & Macdonald, Angus S., 2020. "A hierarchical model for the joint mortality analysis of pension scheme data with missing covariates," Insurance: Mathematics and Economics, Elsevier, vol. 91(C), pages 68-84.
    18. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    19. Zhengyi Zhou & David S. Matteson & Dawn B. Woodard & Shane G. Henderson & Athanasios C. Micheas, 2015. "A Spatio-Temporal Point Process Model for Ambulance Demand," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 6-15, March.
    20. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
