Printed from https://ideas.repec.org/a/taf/jnlasa/v111y2016i515p967-987.html

Template-Based Models for Genome-Wide Analysis of Next-Generation Sequencing Data at Base-Pair Resolution

Authors

Listed:
  • Alexander W. Blocker
  • Edoardo M. Airoldi

Abstract

We consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates to control for the variability along the sequence of read counts associated with nucleosomal DNA due to enzymatic digestion and other sample preparation steps, and we develop a calibrated Bayesian method to detect local concentrations of nucleosome positions. We also introduce a set of estimands that provides rich, interpretable summaries of nucleosome positioning. Inference is carried out via a distributed Hamiltonian Monte Carlo algorithm that can scale linearly with the length of the genome being analyzed. We provide MPI-based Python implementations of the proposed methods, stand-alone and on Amazon EC2, which can provide inferences on an entire Saccharomyces cerevisiae genome in less than 1 hr on EC2. We evaluate the accuracy and reproducibility of the inferences leveraging a factorially designed simulation study and experimental replicates. The template-based approach we develop here is also applicable to single-end sequencing data by using alternative sources of fragment length information, and to ordered and sequential data more generally. It provides a flexible and scalable alternative to mixture models, hidden Markov models, and Parzen-window methods. Supplementary materials for this article are available online.

Suggested Citation

  • Alexander W. Blocker & Edoardo M. Airoldi, 2016. "Template-Based Models for Genome-Wide Analysis of Next-Generation Sequencing Data at Base-Pair Resolution," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 967-987, July.
  • Handle: RePEc:taf:jnlasa:v:111:y:2016:i:515:p:967-987
    DOI: 10.1080/01621459.2016.1141095



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Moser Carlee & Gupta Mayetri, 2012. "A Generalized Hidden Markov Model for Determining Sequence-based Predictors of Nucleosome Positioning," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-23, January.
    2. Wolfram Möbius & Ulrich Gerland, 2010. "Quantitative Test of the Barrier Nucleosome Model for Statistical Positioning of Nucleosomes Up- and Downstream of Transcription Start Sites," PLOS Computational Biology, Public Library of Science, vol. 6(8), pages 1-11, August.
    3. Ji-Ping Wang & Yvonne Fondufe-Mittendorf & Liqun Xi & Guei-Feng Tsai & Eran Segal & Jonathan Widom, 2008. "Preferentially Quantized Linker DNA Lengths in Saccharomyces cerevisiae," PLOS Computational Biology, Public Library of Science, vol. 4(9), pages 1-10, September.
    4. Zing Tsung-Yeh Tsai & Shin-Han Shiu & Huai-Kuang Tsai, 2015. "Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast," PLOS Computational Biology, Public Library of Science, vol. 11(8), pages 1-22, August.
    5. Guo-Cheng Yuan & Jun S Liu, 2008. "Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion," PLOS Computational Biology, Public Library of Science, vol. 4(1), pages 1-11, January.
    6. Wei Chen & Hao Lin & Peng-Mian Feng & Chen Ding & Yong-Chun Zuo & Kuo-Chen Chou, 2012. "iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    7. Segal Mark R, 2008. "Re-Cracking the Nucleosome Positioning Code," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-24, April.
    8. Monica Naughtin & Zofia Haftek-Terreau & Johan Xavier & Sam Meyer & Maud Silvain & Yan Jaszczyszyn & Nicolas Levy & Vincent Miele & Mohamed Salah Benleulmi & Marc Ruff & Vincent Parissi & Cédric Vaill, 2015. "DNA Physical Properties and Nucleosome Positions Are Major Determinants of HIV-1 Integrase Selectivity," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-28, June.
    9. Anthony Mathelier & Wyeth W Wasserman, 2013. "The Next Generation of Transcription Factor Binding Site Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(9), pages 1-18, September.
    10. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    11. Matti Annala & Kirsti Laurila & Harri Lähdesmäki & Matti Nykter, 2011. "A Linear Model for Transcription Factor Binding Affinity Prediction in Protein Binding Microarrays," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-13, May.
    12. Joke J F A van Vugt & Martijn de Jager & Magdalena Murawska & Alexander Brehm & John van Noort & Colin Logie, 2009. "Multiple Aspects of ATP-Dependent Nucleosome Translocation by RSC and Mi-2 Are Directed by the Underlying DNA Sequence," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-14, July.
    13. Fang Liu & Eivind Tøstesen & Jostein K Sundet & Tor-Kristian Jenssen & Christoph Bock & Geir Ivar Jerstad & William G Thilly & Eivind Hovig, 2007. "The Human Genomic Melting Map," PLOS Computational Biology, Public Library of Science, vol. 3(5), pages 1-13, May.
    14. Jiayi Fan & Andrew T. Moreno & Alexander S. Baier & Joseph J. Loparo & Craig L. Peterson, 2022. "H2A.Z deposition by SWR1C involves multiple ATP-dependent steps," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    15. Iksoo Huh & Isabel Mendizabal & Taesung Park & Soojin V Yi, 2018. "Functional conservation of sequence determinants at rapidly evolving regulatory regions across mammals," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-21, October.
    16. Harsh Nagpal & Ahmad Ali-Ahmad & Yasuhiro Hirano & Wei Cai & Mario Halic & Tatsuo Fukagawa & Nikolina Sekulić & Beat Fierz, 2023. "CENP-A and CENP-B collaborate to create an open centromeric chromatin state," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    17. Kroll, K.M. & Ferrantini, A. & Domany, E., 2010. "Introduction to biology and chromosomal instabilities in cancer," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(20), pages 4374-4388.
    18. Xianfu Yi & Yu-Dong Cai & Zhisong He & WeiRen Cui & Xiangyin Kong, 2010. "Prediction of Nucleosome Positioning Based on Transcription Factor Binding Sites," PLOS ONE, Public Library of Science, vol. 5(9), pages 1-7, September.
    19. Behrouz Eslami-Mossallam & Raoul D Schram & Marco Tompitak & John van Noort & Helmut Schiessel, 2016. "Multiplexing Genetic and Nucleosome Positioning Codes: A Computational Approach," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-14, June.
    20. Shuxiang Li & Tiejun Wei & Anna R. Panchenko, 2023. "Histone variant H2A.Z modulates nucleosome dynamics to promote DNA accessibility," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
