
Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains

Author

Listed:
  • Singer Meromit
  • Engström Alexander
  • Schönhuth Alexander
  • Pachter Lior

Abstract

Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account.

We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate.

We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.
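The two steps named in the abstract (exact tail probabilities for CpG counts under a Markov chain, then greedy selection of islands) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a plain first-order Markov chain on single nucleotides rather than a codon-aware model. It also uses a fixed p-value cutoff `alpha` as a stand-in for false discovery rate control, and selects non-overlapping regions in order of increasing p-value as a stand-in for the paper's selection criterion. All function and parameter names (`cpg_tail_probability`, `greedy_islands`, `min_len`, `alpha`) are hypothetical.

```python
ALPHABET = "ACGT"


def cpg_tail_probability(n, k, init, trans):
    """Exact P(at least k CpGs in a length-n sequence) under a first-order
    Markov chain. init[a] is the probability of the first nucleotide and
    trans[a][b] is P(next = b | current = a). Both are assumed inputs."""
    if n <= 0:
        return 1.0 if k <= 0 else 0.0
    cap = max(k, 0)  # counts above k need not be distinguished
    # state[(last, c)] = probability of a prefix ending in `last` with c CpGs
    state = {(a, 0): init[a] for a in ALPHABET}
    for _ in range(n - 1):
        nxt = {}
        for (last, c), p in state.items():
            for b in ALPHABET:
                hit = 1 if (last == "C" and b == "G") else 0
                key = (b, min(cap, c + hit))
                nxt[key] = nxt.get(key, 0.0) + p * trans[last][b]
        state = nxt
    return sum(p for (_, c), p in state.items() if c >= cap)


def count_cpg(seq):
    """Number of CG dinucleotides in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def greedy_islands(seq, init, trans, min_len=8, alpha=1e-3):
    """Score every region of length >= min_len, then greedily keep the most
    significant regions that do not overlap an already chosen one.
    The fixed cutoff alpha is an illustrative simplification, not the
    paper's false discovery rate procedure."""
    n = len(seq)
    candidates = []
    for start in range(n):
        for end in range(start + min_len, n + 1):
            k = count_cpg(seq[start:end])
            p = cpg_tail_probability(end - start, k, init, trans)
            if p <= alpha:
                candidates.append((p, start, end))
    candidates.sort()  # smallest p-value first
    chosen = []
    for p, s, e in candidates:
        if all(e <= cs or s >= ce for _, cs, ce in chosen):
            chosen.append((p, s, e))
    return sorted(chosen, key=lambda r: r[1])  # report in genomic order
```

Scoring every region this way costs a separate dynamic program per region, which is only practical for short sequences. The sketch is meant to show the shape of the computation, not an efficient way to scan whole exons.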

Suggested Citation

  • Singer Meromit & Engström Alexander & Schönhuth Alexander & Pachter Lior, 2011. "Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-27, September.
  • Handle: RePEc:bpj:sagmbi:v:10:y:2011:i:1:n:43
    DOI: 10.2202/1544-6115.1677

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1677
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Ryan Lister & Mattia Pelizzola & Robert H. Dowen & R. David Hawkins & Gary Hon & Julian Tonti-Filippini & Joseph R. Nery & Leonard Lee & Zhen Ye & Que-Minh Ngo & Lee Edsall & Jessica Antosiewicz-Bourg, 2009. "Human DNA methylomes at base resolution show widespread epigenomic differences," Nature, Nature, vol. 462(7271), pages 315-322, November.
    2. Nuel Gregory, 2006. "Numerical Solutions for Patterns Statistics on Markov Chains," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-45, October.
    3. Vergne Nicolas, 2008. "Drifting Markov Models with Polynomial Drift and Applications to DNA Sequences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-45, February.
    4. Hsieh Fushing & Chen Shu-Chun & Pollard Katherine, 2009. "A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes," The International Journal of Biostatistics, De Gruyter, vol. 5(1), pages 1-24, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zengyu Shao & Jiuwei Lu & Nelli Khudaverdyan & Jikui Song, 2024. "Multi-layered heterochromatin interaction as a switch for DIM2-mediated DNA methylation," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    2. Ana Helena Tavares & Jakob Raymaekers & Peter J. Rousseeuw & Paula Brito & Vera Afreixo, 2020. "Clustering genomic words in human DNA using peaks and trends of distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 57-76, March.
    3. Xuelong Yao & Zongyang Lu & Zhanying Feng & Lei Gao & Xin Zhou & Min Li & Suijuan Zhong & Qian Wu & Zhenbo Liu & Haofeng Zhang & Zeyuan Liu & Lizhi Yi & Tao Zhou & Xudong Zhao & Jun Zhang & Yong Wang, 2022. "Comparison of chromatin accessibility landscapes during early development of prefrontal cortex between rhesus macaque and human," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Rakesh Chettier & Lesa Nelson & James W Ogilvie & Hans M Albertsen & Kenneth Ward, 2015. "Haplotypes at LBX1 Have Distinct Inheritance Patterns with Opposite Effects in Adolescent Idiopathic Scoliosis," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-11, February.
    5. Xue Yue & Zhiyuan Xie & Moran Li & Kai Wang & Xiaojing Li & Xiaoqing Zhang & Jian Yan & Yimeng Yin, 2022. "Simultaneous profiling of histone modifications and DNA methylation via nanopore sequencing," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    6. Travis J. Berge & Shu-Chun Chen & Hsieh Fushing & Òscar Jordà, 2010. "A chronology of international business cycles through non-parametric decoding," Research Working Paper RWP 11-13, Federal Reserve Bank of Kansas City.
    7. Anyou Wang & Ying Du & Qianchuan He & Chunxiao Zhou, 2013. "A Quantitative System for Discriminating Induced Pluripotent Stem Cells, Embryonic Stem Cells and Somatic Cells," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-10, February.
    8. Sébastien Massoni & Madalina Olteanu & Patrick Rousset, 2010. "Career-path analysis using drifting Markov models (DMM) and self-organizing maps," Post-Print hal-00443530, HAL.
    9. Yu Xiaoqing & Sun Shuying, 2016. "Comparing five statistical methods of differential methylation identification using bisulfite sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 173-191, April.
    10. Jian Fang & Jianjun Jiang & Sarah M. Leichter & Jie Liu & Mahamaya Biswal & Nelli Khudaverdyan & Xuehua Zhong & Jikui Song, 2022. "Mechanistic basis for maintenance of CHG DNA methylation in plants," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Allegra Angeloni & Skye Fissette & Deniz Kaya & Jillian M. Hammond & Hasindu Gamaarachchi & Ira W. Deveson & Robert J. Klose & Weiming Li & Xiaotian Zhang & Ozren Bogdanovic, 2024. "Extensive DNA methylome rearrangement during early lamprey embryogenesis," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    12. Jason A. Carter & Léonie Strömich & Matthew Peacey & Sarah R. Chapin & Lars Velten & Lars M. Steinmetz & Benedikt Brors & Sheena Pinto & Hannah V. Meyer, 2022. "Transcriptomic diversity in human medullary thymic epithelial cells," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    13. Yousif Alyousifi & Kamarulzaman Ibrahim & Mahmod Othamn & Wan Zawiah Wan Zin & Nicolas Vergne & Abdullah Al-Yaari, 2022. "Bayesian Information Criterion for Fitting the Optimum Order of Markov Chain Models: Methodology and Application to Air Pollution Data," Mathematics, MDPI, vol. 10(13), pages 1-16, June.
    14. Xusheng Zhang & Xintong Gao & Zhen Liu & Fei Shao & Dou Yu & Min Zhao & Xiwen Qin & Shuo Wang, 2024. "Microbiota regulates the TET1-mediated DNA hydroxymethylation program in innate lymphoid cell differentiation," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    15. Lacey Michelle R. & Baribault Carl & Ehrlich Melanie, 2013. "Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(6), pages 723-742, December.
    16. Ruth V. Nichols & Brendan L. O’Connell & Ryan M. Mulqueen & Jerushah Thomas & Ashley R. Woodfin & Sonia Acharya & Gail Mandel & Dmitry Pokholok & Frank J. Steemers & Andrew C. Adey, 2022. "High-throughput robust single-cell DNA methylation profiling with sciMETv2," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    17. Ihab Ansari & Llorenç Solé-Boldo & Meshi Ridnik & Julian Gutekunst & Oliver Gilliam & Maria Korshko & Timur Liwinski & Birgit Jickeli & Noa Weinberg-Corem & Michal Shoshkes-Carmel & Eli Pikarsky & Era, 2023. "TET2 and TET3 loss disrupts small intestine differentiation and homeostasis," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    18. Jamie L. Endicott & Paula A. Nolte & Hui Shen & Peter W. Laird, 2022. "Cell division drives DNA methylation loss in late-replicating domains in primary human cells," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    19. Guodong Wu & Nengjun Yi & Devin Absher & Degui Zhi, 2011. "Statistical Quantification of Methylation Levels by Next-Generation Sequencing," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-12, June.
    20. Brendan Evano & Diljeet Gill & Irene Hernando-Herraez & Glenda Comai & Thomas M Stubbs & Pierre-Henri Commere & Wolf Reik & Shahragim Tajbakhsh, 2020. "Transcriptome and epigenome diversity and plasticity of muscle stem cells following transplantation," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-21, October.
