Population size estimation for quality control of ChIP-Seq datasets

Author

Listed:
  • Semyon K Kolmykov
  • Yury V Kondrakhin
  • Ivan S Yevshin
  • Ruslan N Sharipov
  • Anna S Ryabova
  • Fedor A Kolpakov

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a widely used experimental technology for identifying functional protein-DNA interactions. Databases such as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate large numbers of ChIP-Seq datasets. Comprehensive quality control of these datasets is indispensable for selecting the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that make it possible to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we adapted a well-known population size estimate to determine the unknown number of genuine transcription factor binding regions. The proposed metrics are computed from the overlap of distinct binding sites obtained by processing one ChIP-Seq experiment with different peak callers. The metrics can also be useful for assessing the quality of datasets obtained by processing distinct ChIP-Seq experiments with a given peak caller. We also show that these metrics appear to be useful not only for dataset selection but also for comparing peak callers and identifying site motifs based on ChIP-Seq datasets. The algorithm for determining the false positive control metric and the false negative control metric for ChIP-Seq datasets was implemented as a plugin for the BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
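
The abstract does not state which population size estimate was adapted or how the two metrics are defined. As a rough illustration of the general idea only, the sketch below applies the classical Chao1 lower-bound estimator from the species-richness literature to counts of how many peak callers reported each binding region. The function name `chao1_estimate`, the input layout, and the choice of Chao1 itself are assumptions made for this example and are not taken from the paper. Merging overlapping peaks from different callers into regions is assumed to have been done beforehand.

```python
from collections import Counter


def chao1_estimate(caller_hits):
    """Estimate the total number of binding regions, seen and unseen.

    caller_hits: one integer per observed region, giving how many peak
    callers reported that region (each value >= 1). This input layout
    is an assumption made for this sketch.
    """
    freq = Counter(caller_hits)
    s_obs = len(caller_hits)   # regions reported by at least one caller
    f1 = freq.get(1, 0)        # regions reported by exactly one caller
    f2 = freq.get(2, 0)        # regions reported by exactly two callers
    if f2 > 0:
        # Classical Chao1 lower bound: S_obs + f1^2 / (2 * f2)
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used when no region was reported exactly twice
    return s_obs + f1 * (f1 - 1) / 2.0


# Hypothetical counts for four peak callers, not data from the paper:
# 40 regions found by one caller, 20 by two, 25 by three, 15 by all four.
hits = [1] * 40 + [2] * 20 + [3] * 25 + [4] * 15
estimate = chao1_estimate(hits)
print(f"observed regions: {len(hits)}, estimated total: {estimate:.1f}")
```

Comparing the estimated total with the number of observed regions gives a rough sense of how many genuine regions all callers may have missed. The published metrics may be defined differently.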

Suggested Citation

  • Semyon K Kolmykov & Yury V Kondrakhin & Ivan S Yevshin & Ruslan N Sharipov & Anna S Ryabova & Fedor A Kolpakov, 2019. "Population size estimation for quality control of ChIP-Seq datasets," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-17, August.
  • Handle: RePEc:plo:pone00:0221760
    DOI: 10.1371/journal.pone.0221760

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221760
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0221760&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Anne Chao & John Bunge, 2002. "Estimating the Number of Species in a Stochastic Abundance Model," Biometrics, The International Biometric Society, vol. 58(3), pages 531-539, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:jss:jstsof:40:i09 is not listed on IDEAS
    2. Zhang, Hongmei & Ghosh, Kaushik & Ghosh, Pulak, 2012. "Sampling designs via a multivariate hypergeometric-Dirichlet process model for a multi-species assemblage with unknown heterogeneity," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2562-2573.
    3. Dankmar Böhning & Rattana Lerdsuwansri & Patarawan Sangnawakij, 2023. "Modeling COVID‐19 contact‐tracing using the ratio regression capture–recapture approach," Biometrics, The International Biometric Society, vol. 79(4), pages 3818-3830, December.
    4. Balabdaoui, Fadoua & Kulagina, Yulia, 2020. "Completely monotone distributions: Mixing, approximation and estimation of number of species," Computational Statistics & Data Analysis, Elsevier, vol. 150(C).
    5. Chun-Huo Chiu, 2023. "A Richness Estimator Based on Integrated Data," Mathematics, MDPI, vol. 11(17), pages 1-24, September.
    6. Maria De Angelis & Maria Piccolo & Lucia Vannini & Sonya Siragusa & Andrea De Giacomo & Diana Isabella Serrazzanetti & Fernanda Cristofori & Maria Elisabetta Guerzoni & Marco Gobbetti & Ruggiero Franc, 2013. "Fecal Microbiota and Metabolome of Children with Autism and Pervasive Developmental Disorder Not Otherwise Specified," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-1, October.
    7. David R Blair & Kanix Wang & Svetlozar Nestorov & James A Evans & Andrey Rzhetsky, 2014. "Quantifying the Impact and Extent of Undocumented Biomedical Synonymy," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-17, September.
    8. Dankmar Böhning & Herwig Friedl, 2021. "Population size estimation based upon zero-truncated, one-inflated and sparse count data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1197-1217, October.
    9. Maria De Angelis & Eustacchio Montemurno & Maria Piccolo & Lucia Vannini & Gabriella Lauriero & Valentina Maranzano & Giorgia Gozzi & Diana Serrazanetti & Giuseppe Dalfino & Marco Gobbetti & Loreto Ge, 2014. "Microbiota and Metabolome Associated with Immunoglobulin A Nephropathy (IgAN)," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-15, June.
    10. Chang Xuan Mao & Nan Yang & Jinhua Zhong, 2013. "On Population Size Estimators in the Poisson Mixture Model," Biometrics, The International Biometric Society, vol. 69(3), pages 758-765, September.
    11. Jérôme A. Dupuis & Michel Goulard, 2011. "Estimating Species Richness from Quadrat Sampling Data: A General Approach," Biometrics, The International Biometric Society, vol. 67(4), pages 1489-1497, December.
    12. Dankmar Böhning & Ekkehart Dietz & Ronny Kuhnert & Dieter Schön, 2005. "Mixture models for capture-recapture count data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 14(1), pages 29-43, February.
    13. Xinyue Zhao & Mengjie Zhang & Zhilin Sun & Huabao Zheng & Qifa Zhou, 2023. "Anaerobic Storage Completely Removes Suspected Fungal Pathogens but Increases Antibiotic Resistance Gene Levels in Swine Wastewater High in Sulfonamides," IJERPH, MDPI, vol. 20(4), pages 1-11, February.
