
A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes

Author

Listed:
  • Hsieh Fushing

    (University of California, Davis)

  • Chen Shu-Chun

    (Academia Sinica)

  • Pollard Katherine

    (University of California, San Francisco)

Abstract

CpG islands are genome subsequences with an unexpectedly high number of CG dinucleotides. They are typically identified using filtering criteria (e.g., G+C%, observed vs. expected CpG ratio, and length) and are computed using sliding window methods. Most such studies mistakenly assume that an exhaustive search for CpG islands is achieved on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that sliding window methods frequently fail to identify a large percentage of subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device, to overcome the incompleteness and non-operational drawbacks and to achieve efficient computation for identifying CpG islands. The concept of a CpG island "core" is introduced and computed using the HFS algorithm, which is independent of any specific filtering criteria. From such a CpG island "core," a CpG island is constructed using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study that realistically mimics CpG island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false negatives and false positives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
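
To make the contrast between a sliding window scan and an exhaustive search concrete, the following is a minimal Python sketch. It is not the authors' HFS algorithm or their Lexis diagram construction. The threshold values (G+C fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are assumptions in the style of commonly used filtering criteria; the abstract does not state specific values. The function names and the toy sequence are hypothetical. The observed/expected ratio is taken as (number of CG) x (length) / (number of C x number of G).

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def meets_criteria(seq: str, min_len: int = 200,
                   min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the three filters; the default thresholds are assumed values."""
    return (len(seq) >= min_len
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_oe)


def sliding_window_hits(seq: str, window: int = 200, step: int = 1,
                        min_gc: float = 0.5,
                        min_oe: float = 0.6) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals of fixed-length windows that pass."""
    return [(s, s + window)
            for s in range(0, len(seq) - window + 1, step)
            if meets_criteria(seq[s:s + window], window, min_gc, min_oe)]


def exhaustive_hits(seq: str, min_len: int = 200, min_gc: float = 0.5,
                    min_oe: float = 0.6) -> List[Tuple[int, int]]:
    """Every subsequence of length >= min_len that passes the filters.

    The number of candidate intervals grows quadratically with the
    sequence length, which is why this is impractical on a chromosome.
    """
    n = len(seq)
    return [(s, e)
            for s in range(n - min_len + 1)
            for e in range(s + min_len, n + 1)
            if meets_criteria(seq[s:e], min_len, min_gc, min_oe)]


if __name__ == "__main__":
    # Toy sequence: a CG-rich stretch flanked by AT repeats (illustrative only).
    toy = "AT" * 10 + "CG" * 10 + "AT" * 10
    fixed = sliding_window_hits(toy, window=8)
    full = exhaustive_hits(toy, min_len=8)
    print(len(fixed), "fixed-length windows pass;", len(full), "subsequences pass")
```

On the toy sequence, every fixed-length window that passes is also found by the exhaustive scan, but the exhaustive scan additionally returns longer qualifying subsequences, such as the whole CG-rich stretch, that a single fixed window length cannot report.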

Suggested Citation

  • Hsieh Fushing & Chen Shu-Chun & Pollard Katherine, 2009. "A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes," The International Journal of Biostatistics, De Gruyter, vol. 5(1), pages 1-24, May.
  • Handle: RePEc:bpj:ijbist:v:5:y:2009:i:1:n:14
    DOI: 10.2202/1557-4679.1158

    Download full text from publisher

    File URL: https://doi.org/10.2202/1557-4679.1158
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    Citations

    Cited by:

    1. Travis J. Berge & Shu-Chun Chen & Hsieh Fushing & Òscar Jordà, 2010. "A chronology of international business cycles through non-parametric decoding," Research Working Paper RWP 11-13, Federal Reserve Bank of Kansas City.
    2. Shu-Chun Chen & Hsieh Fushing & Chii-Ruey Hwang, 2013. "Discovering focal regions of slightly-aggregated sparse signals," Computational Statistics, Springer, vol. 28(5), pages 2295-2308, October.
    3. Singer Meromit & Engström Alexander & Schönhuth Alexander & Pachter Lior, 2011. "Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-27, September.
    4. Hsieh Fushing & Shu-Chun Chen & Chii-Ruey Hwang, 2014. "Single Stock Dynamics on High-Frequency Data: From a Compressed Coding Perspective," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
