
Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals

Author

Listed:
  • Jin Woo Oh

    (Johns Hopkins University)

  • Michael A. Beer

    (Johns Hopkins University)

Abstract

Gene regulatory elements drive complex biological phenomena and their mutations are associated with common human diseases. The impacts of human regulatory variants are often tested using model organisms such as mice. However, mapping human enhancers to conserved elements in mice remains a challenge, due to both rapid enhancer evolution and limitations of current computational methods. We analyze distal enhancers across 45 matched human/mouse cell/tissue pairs from a comprehensive dataset of DNase-seq experiments, and show that while cell-specific regulatory vocabulary is conserved, enhancers evolve more rapidly than promoters and CTCF binding sites. Enhancer conservation rates vary across cell types, in part explainable by tissue-specific transposable element activity. We present an improved genome alignment algorithm using gapped-kmer features, called gkm-align, and make genome-wide predictions for 1,401,803 orthologous regulatory elements. We show that gkm-align discovers 23,660 novel human/mouse conserved enhancers missed by previous algorithms, with strong evidence of conserved functional activity.
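
To give a rough sense of what "gapped-kmer features" means, the following Python sketch counts gapped k-mers (length-l words in which only k positions are informative and the rest are masked) and compares two sequences by the cosine similarity of their count vectors. This is an illustration of the general feature idea only, not the gkm-align algorithm or the authors' code; the function names, the parameters `l` and `k`, the `-` gap character, and the choice of cosine similarity are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gapped_kmer_counts(seq, l=4, k=3, gap="-"):
    """Count gapped k-mers: each length-l window with l-k positions masked.

    Hypothetical helper for illustration; not part of any published tool.
    """
    seq = seq.upper()
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Skip windows containing ambiguous or non-DNA characters.
        if any(base not in "ACGT" for base in window):
            continue
        # Enumerate every choice of k informative positions in the window.
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            key = "".join(
                window[i] if i in keep_set else gap for i in range(l)
            )
            counts[key] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two gapped k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    human = gapped_kmer_counts("ACGT")
    mouse = gapped_kmer_counts("ACGA")
    print(cosine_similarity(human, mouse))
```

With these toy inputs the two 4-mers differ at one base, so they share exactly one of their four gapped 3-mers (`ACG-`) and the similarity is 0.25, whereas an exact 4-mer match would score them as unrelated. Tolerance to isolated mismatches of this kind is the general motivation for gapped features when comparing diverged regulatory sequence.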

Suggested Citation

  • Jin Woo Oh & Michael A. Beer, 2024. "Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-50708-z
    DOI: 10.1038/s41467-024-50708-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-50708-z
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Samuel S. Kim & Buu Truong & Karthik Jagadeesh & Kushal K. Dey & Amber Z. Shen & Soumya Raychaudhuri & Manolis Kellis & Alkes L. Price, 2024. "Leveraging single-cell ATAC-seq and RNA-seq to identify disease-critical fetal and adult brain cell types," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    2. Andrea Wilderman & Eva D’haene & Machteld Baetens & Tara N. Yankee & Emma Wentworth Winchester & Nicole Glidden & Ellen Roets & Jo Dorpe & Sandra Janssens & Danny E. Miller & Miranda Galey & Kari M. B, 2024. "A distant global control region is essential for normal expression of anterior HOXA genes during mouse and human craniofacial development," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    3. Kashi Raj Bhattarai & Robert J. Mobley & Kelly R. Barnett & Daniel C. Ferguson & Baranda S. Hansen & Jonathan D. Diedrich & Brennan P. Bergeron & Satoshi Yoshimura & Wenjian Yang & Kristine R. Crews &, 2024. "Investigation of inherited noncoding genetic variation impacting the pharmacogenomics of childhood acute lymphoblastic leukemia treatment," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Victor Lopez Soriano & Alfredo Dueñas Rey & Rajarshi Mukherjee & Frauke Coppieters & Miriam Bauwens & Andy Willaert & Elfride De Baere, 2024. "Multi-omics analysis in human retina uncovers ultraconserved cis-regulatory elements at rare eye disease loci," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    5. Botong Zhou & Ping Hu & Guichun Liu & Zhou Chang & Zhiwei Dong & Zihe Li & Yuan Yin & Zunzhe Tian & Ge Han & Wen Wang & Xueyan Li, 2024. "Evolutionary patterns and functional effects of 3D chromatin structures in butterflies with extensive genome rearrangements," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    6. Orshay Gabay & Yoav Shoshan & Eli Kopel & Udi Ben-Zvi & Tomer D. Mann & Noam Bressler & Roni Cohen‐Fultheim & Amos A. Schaffer & Shalom Hillel Roth & Ziv Tzur & Erez Y. Levanon & Eli Eisenberg, 2022. "Landscape of adenosine-to-inosine RNA recoding across human tissues," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    7. Christopher Chase Bolt & Lucille Lopez-Delisle & Aurélie Hintermann & Bénédicte Mascrez & Antonella Rauseo & Guillaume Andrey & Denis Duboule, 2022. "Context-dependent enhancer function revealed by targeted inter-TAD relocation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    8. Matthieu Santos & Stéphanie Backer & Frédéric Auradé & Matthew Man-Kin Wong & Maud Wurmser & Rémi Pierre & Francina Langa & Marcio Cruzeiro & Alain Schmitt & Jean-Paul Concordet & Athanassia Sotiropou, 2022. "A fast Myosin super enhancer dictates muscle fiber phenotype through competitive interactions with Myosin genes," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    9. Alan Y. Du & Jason D. Chobirko & Xiaoyu Zhuo & Cédric Feschotte & Ting Wang, 2024. "Regulatory transposable elements in the encyclopedia of DNA elements," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    10. Long Jin & Danyang Wang & Jiaman Zhang & Pengliang Liu & Yujie Wang & Yu Lin & Can Liu & Ziyin Han & Keren Long & Diyan Li & Yu Jiang & Guisen Li & Yu Zhang & Jingyi Bai & Xiaokai Li & Jing Li & Lu Lu, 2023. "Dynamic chromatin architecture of the porcine adipose tissues with weight gain and loss," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    11. Arthur S. Lee & Lauren J. Ayers & Michael Kosicki & Wai-Man Chan & Lydia N. Fozo & Brandon M. Pratt & Thomas E. Collins & Boxun Zhao & Matthew F. Rose & Alba Sanchis-Juan & Jack M. Fu & Isaac Wong & X, 2024. "A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders," Nature Communications, Nature, vol. 15(1), pages 1-26, December.
    12. Abrar Aljahani & Peng Hua & Magdalena A. Karpinska & Kimberly Quililan & James O. J. Davies & A. Marieke Oudelaar, 2022. "Analysis of sub-kilobase chromatin topology reveals nano-scale regulatory interactions with variable dependence on cohesin and CTCF," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    13. Sandra Kessler & Maryline Minoux & Onkar Joshi & Yousra Zouari & Sebastien Ducret & Fiona Ross & Nathalie Vilain & Adwait Salvi & Joachim Wolff & Hubertus Kohler & Michael B. Stadler & Filippo M. Rijl, 2023. "A multiple super-enhancer region establishes inter-TAD interactions and controls Hoxa function in cranial neural crest," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    14. Markus Götz & Olivier Messina & Sergio Espinola & Jean-Bernard Fiche & Marcelo Nollmann, 2022. "Multiple parameters shape the 3D chromatin structure of single nuclei at the doc locus in Drosophila," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    15. Zhen-Hui Wang & Xin-Feng Wang & Tianyuan Lu & Ming-Rui Li & Peng Jiang & Jing Zhao & Si-Tong Liu & Xue-Qi Fu & Jonathan F. Wendel & Yves Peer & Bao Liu & Lin-Feng Li, 2022. "Reshuffling of the ancestral core-eudicot genome shaped chromatin topology and epigenetic modification in Panax," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    16. Matthias Wielscher & Pooja R. Mandaviya & Brigitte Kuehnel & Roby Joehanes & Rima Mustafa & Oliver Robinson & Yan Zhang & Barbara Bodinier & Esther Walton & Pashupati P. Mishra & Pascal Schlosser & Ro, 2022. "DNA methylation signature of chronic low-grade inflammation and its role in cardio-respiratory diseases," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    17. Liang-Yu Fu & Tao Zhu & Xinkai Zhou & Ranran Yu & Zhaohui He & Peijing Zhang & Zhigui Wu & Ming Chen & Kerstin Kaufmann & Dijun Chen, 2022. "ChIP-Hub provides an integrative platform for exploring plant regulome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    18. Bhuwan Khatri & Kandice L. Tessneer & Astrid Rasmussen & Farhang Aghakhanian & Tove Ragna Reksten & Adam Adler & Ilias Alevizos & Juan-Manuel Anaya & Lara A. Aqrawi & Eva Baecklund & Johan G. Brun & S, 2022. "Genome-wide association study identifies Sjögren’s risk loci with functional implications in immune and glandular cells," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    19. Siobhan Rice & Thomas Jackson & Nicholas T. Crump & Nicholas Fordham & Natalina Elliott & Sorcha O’Byrne & Maria del Mar Lara Fanego & Dilys Addy & Trisevgeni Crabb & Carryl Dryden & Sarah Inglott & D, 2021. "A human fetal liver-derived infant MLL-AF4 acute lymphoblastic leukemia model reveals a distinct fetal gene expression program," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    20. Leon D. Lotter & Amin Saberi & Justine Y. Hansen & Bratislav Misic & Casey Paquola & Gareth J. Barker & Arun L. W. Bokde & Sylvane Desrivières & Herta Flor & Antoine Grigis & Hugh Garavan & Penny Gowl, 2024. "Regional patterns of human cortex development correlate with underlying neurobiology," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
