
Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA

Author

Listed:
  • Shengen Shawn Hu

    (University of Virginia)

  • Lin Liu

    (Shanghai Jiao Tong University)

  • Qi Li

    (Tsinghua University)

  • Wenjing Ma

    (University of Virginia
    Emory University)

  • Michael J. Guertin

    (University of Connecticut)

  • Clifford A. Meyer

    (Dana-Farber Cancer Institute
    Harvard T.H. Chan School of Public Health)

  • Ke Deng

    (Tsinghua University)

  • Tingting Zhang

    (University of Pittsburgh)

  • Chongzhi Zang

    (University of Virginia)

Abstract

Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases and are not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can utilize internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating estimated biases using SELMA. Furthermore, we show strong effects of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data.
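To make "intrinsic cleavage bias" concrete, the sketch below computes a plain observed-over-background k-mer ratio around cleavage sites. It is a generic baseline for illustration only and is not SELMA's simplex-encoded linear model. The function names (`kmer_counts`, `naive_kmer_bias`), the toy sequence, and the convention that a cut position sits at the centre of the k-mer are all assumptions introduced here, not details taken from the article.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq, skipping k-mers with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def naive_kmer_bias(genome, cut_sites, k=4):
    """Observed / background k-mer frequency around cleavage sites.

    cut_sites holds 0-based positions (hypothetical convention): the k-mer
    starts k // 2 bases upstream of each position. Values above 1 mean the
    enzyme cuts that sequence context more often than its abundance predicts.
    """
    genome = genome.upper()
    half = k // 2
    observed = Counter()
    for pos in cut_sites:
        start = pos - half
        if start < 0 or start + k > len(genome):
            continue  # k-mer would run off the sequence
        kmer = genome[start:start + k]
        if set(kmer) <= VALID_BASES:
            observed[kmer] += 1

    background = kmer_counts(genome, k)
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    if n_obs == 0 or n_bg == 0:
        return {}

    return {
        kmer: (observed[kmer] / n_obs) / (bg / n_bg)
        for kmer, bg in background.items()
    }


if __name__ == "__main__":
    toy_genome = "ACGTACGTAC"   # made-up sequence
    toy_cuts = [1, 5]           # made-up cleavage positions
    for kmer, score in sorted(naive_kmer_bias(toy_genome, toy_cuts, k=2).items()):
        print(kmer, round(score, 3))
```

A ratio like this treats every k-mer as an independent parameter, so rare k-mers get noisy estimates; the abstract presents SELMA's linear model as a more accurate and robust alternative to such estimates.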

Suggested Citation

  • Shengen Shawn Hu & Lin Liu & Qi Li & Wenjing Ma & Michael J. Guertin & Clifford A. Meyer & Ke Deng & Tingting Zhang & Chongzhi Zang, 2022. "Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-33194-z
    DOI: 10.1038/s41467-022-33194-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-33194-z
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alan Yue Yang Teo & Jordan W. Squair & Gregoire Courtine & Michael A. Skinnider, 2024. "Best practices for differential accessibility analysis in single-cell epigenomics," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    3. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    4. Kathleen Shah & Muralidhara Rao Maradana & M. Joaquina Delàs & Amina Metidji & Frederike Graelmann & Miriam Llorian & Probir Chakravarty & Ying Li & Mauro Tolaini & Michael Shapiro & Gavin Kelly & Chr, 2022. "Cell-intrinsic Aryl Hydrocarbon Receptor signalling is required for the resolution of injury-induced colonic stem cells," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    5. Lei Xiong & Kang Tian & Yuzhe Li & Weixi Ning & Xin Gao & Qiangfeng Cliff Zhang, 2022. "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    6. Samir Rachid Zaim & Mark-Phillip Pebworth & Imran McGrath & Lauren Okada & Morgan Weiss & Julian Reading & Julie L. Czartoski & Troy R. Torgerson & M. Juliana McElrath & Thomas F. Bumol & Peter J. Ske, 2024. "MOCHA’s advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts," Nature Communications, Nature, vol. 15(1), pages 1-24, December.
    7. Yuyan Cheng & Yuqin Yin & Alice Zhang & Alexander M. Bernstein & Riki Kawaguchi & Kun Gao & Kyra Potter & Hui-Ya Gilbert & Yan Ao & Jing Ou & Catherine J. Fricano-Kugler & Jeffrey L. Goldberg & Zhigan, 2022. "Transcription factor network analysis identifies REST/NRSF as an intrinsic regulator of CNS regeneration in mice," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    8. Alexendar R. Perez & Laura Sala & Richard K. Perez & Joana A. Vidigal, 2021. "CSC software corrects off-target mediated gRNA depletion in CRISPR-Cas9 essentiality screens," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    9. Jaakko Nevalainen & Somnath Datta & Hannu Oja, 2014. "Inference on the marginal distribution of clustered data with informative cluster size," Statistical Papers, Springer, vol. 55(1), pages 71-92, February.
    10. Ying Huang & Brian Leroux, 2011. "Informative Cluster Sizes for Subcluster-Level Covariates and Weighted Generalized Estimating Equations," Biometrics, The International Biometric Society, vol. 67(3), pages 843-851, September.
    11. Sally Hunsberger & Lori Long & Sarah E. Reese & Gloria H. Hong & Ian A. Myles & Christa S. Zerbe & Pleonchan Chetchotisakd & Joanna H. Shih, 2022. "Rank correlation inferences for clustered data with small sample size," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 76(3), pages 309-330, August.
    12. Somnath Datta & Glen A. Satten, 2008. "A Signed-Rank Test for Clustered Data," Biometrics, The International Biometric Society, vol. 64(2), pages 501-507, June.
    13. You-Gan Wang & Yudong Zhao, 2008. "Weighted Rank Regression for Clustered Data Analysis," Biometrics, The International Biometric Society, vol. 64(1), pages 39-45, March.
    14. Aaron T L Lun & Hervé Pagès & Mike L Smith, 2018. "beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-15, May.
    15. Seong Kyu Han & Michelle T. McNulty & Christopher J. Benway & Pei Wen & Anya Greenberg & Ana C. Onuchic-Whitford & Dongkeun Jang & Jason Flannick & Noël P. Burtt & Parker C. Wilson & Benjamin D. Humph, 2023. "Mapping genomic regulation of kidney disease and traits through high-resolution and interpretable eQTLs," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    16. Kaiwen Wang & Yuqiu Yang & Fangjiang Wu & Bing Song & Xinlei Wang & Tao Wang, 2023. "Comparative analysis of dimension reduction methods for cytometry by time-of-flight data," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    17. Felicia Lazure & Rick Farouni & Korin Sahinyan & Darren M. Blackburn & Aldo Hernández-Corchado & Gabrielle Perron & Tianyuan Lu & Adrien Osakwe & Jiannis Ragoussis & Colin Crist & Theodore J. Perkins , 2023. "Transcriptional reprogramming of skeletal muscle stem cells by the niche environment," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    18. Yuki Matsushita & Jialin Liu & Angel Ka Yan Chu & Chiaki Tsutsumi-Arai & Mizuki Nagata & Yuki Arai & Wanida Ono & Kouhei Yamamoto & Thomas L. Saunders & Joshua D. Welch & Noriaki Ono, 2023. "Bone marrow endosteal stem cells dictate active osteogenesis and aggressive tumorigenesis," Nature Communications, Nature, vol. 14(1), pages 1-23, December.
    19. Chun Yin Lee & Kin Yau Wong & Kwok Fai Lam & Dipankar Bandyopadhyay, 2023. "A semiparametric joint model for cluster size and subunit‐specific interval‐censored outcomes," Biometrics, The International Biometric Society, vol. 79(3), pages 2010-2022, September.
    20. Jaakko Nevalainen & Denis Larocque & Hannu Oja, 2007. "A weighted spatial median for clustered data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 15(3), pages 355-379, February.
