Printed from https://ideas.repec.org/a/nat/natcom/v12y2021i1d10.1038_s41467-021-23143-7.html

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author

Listed:
  • Mathys Grapotte

    (Institut de Biologie Computationnelle
    University of Montpellier, CNRS
    Translational Sciences)

  • Manu Saraswat

    (Institut de Biologie Computationnelle
    University of Montpellier, CNRS)

  • Chloé Bessière

    (Institut de Biologie Computationnelle
    University of Montpellier, CNRS)

  • Christophe Menichelli

    (Institut de Biologie Computationnelle
    Univ Montpellier, CNRS)

  • Jordan A. Ramilowski

    (RIKEN Center for Integrative Medical Sciences)

  • Jessica Severin

    (RIKEN Center for Integrative Medical Sciences)

  • Yoshihide Hayashizaki

    (RIKEN Preventive Medicine and Diagnosis Innovation Program)

  • Masayoshi Itoh

    (RIKEN Preventive Medicine and Diagnosis Innovation Program)

  • Michihira Tagami

    (RIKEN Center for Integrative Medical Sciences)

  • Mitsuyoshi Murata

    (RIKEN Center for Integrative Medical Sciences)

  • Miki Kojima-Ishiyama

    (RIKEN Center for Integrative Medical Sciences)

  • Shohei Noma

    (RIKEN Center for Integrative Medical Sciences)

  • Shuhei Noguchi

    (RIKEN Center for Integrative Medical Sciences)

  • Takeya Kasukawa

    (RIKEN Center for Integrative Medical Sciences)

  • Akira Hasegawa

    (RIKEN Center for Integrative Medical Sciences)

  • Harukazu Suzuki

    (RIKEN Center for Integrative Medical Sciences)

  • Hiromi Nishiyori-Sueki

    (RIKEN Center for Integrative Medical Sciences)

  • Martin C. Frith

    (AIST
    University of Tokyo
    AIST)

  • Clément Chatelain

    (Translational Sciences)

  • Piero Carninci

    (RIKEN Center for Integrative Medical Sciences)

  • Michiel J. L. de Hoon

    (RIKEN Center for Integrative Medical Sciences)

  • Wyeth W. Wasserman

    (University of British Columbia)

  • Laurent Bréhélin

    (Institut de Biologie Computationnelle
    Univ Montpellier, CNRS)

  • Charles-Henri Lecellier

    (Institut de Biologie Computationnelle
    University of Montpellier, CNRS
    Univ Montpellier, CNRS)

Abstract

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
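As a rough illustration of what "sequence-based" input means here, the sketch below locates tandem repeats in a DNA string and one-hot encodes a window of sequence around a chosen position, which is a common way to feed DNA to a neural network. It is not the authors' pipeline: the function names `find_strs` and `one_hot_window`, the repeat-unit bounds, the minimum repeat count, and the window size are all assumptions made for this example.

```python
import re

# Hypothetical helper: a simplistic STR finder, not the method used in the paper.
# Assumption: an STR is a unit of min_unit..max_unit bases repeated in tandem
# at least min_repeats times.
def find_strs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return a list of (start, end, unit) tuples for tandem repeats in seq."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]

# One-hot codes in A, C, G, T order; any other symbol maps to all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Hypothetical helper: encode `flank` bases on each side of `center`.
# Positions that fall outside the sequence are zero-padded.
def one_hot_window(seq, center, flank):
    """One-hot encode seq[center - flank : center + flank + 1]."""
    rows = []
    for i in range(center - flank, center + flank + 1):
        base = seq[i] if 0 <= i < len(seq) else "N"
        rows.append(ONE_HOT.get(base, (0, 0, 0, 0)))
    return rows

seq = "TTGACACACACACAGGT"
strs = find_strs(seq)
start, end, unit = strs[0]
# Encode two bases on each side of the position just past the repeat.
window = one_hot_window(seq, end, 2)
```

A matrix like `window` (one row per base, four columns) would then serve as the input to a model predicting a transcription signal; the real flank length and the choice of anchor position are modelling decisions that this record does not specify.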

Suggested Citation

  • Mathys Grapotte & Manu Saraswat & Chloé Bessière & Christophe Menichelli & Jordan A. Ramilowski & Jessica Severin & Yoshihide Hayashizaki & Masayoshi Itoh & Michihira Tagami & Mitsuyoshi Murata & Miki, 2021. "Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-23143-7
    DOI: 10.1038/s41467-021-23143-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-23143-7
    File Function: Abstract
    Download Restriction: no


    Citations

    Cited by:

    1. Yirong Shi & Yiwei Niu & Peng Zhang & Huaxia Luo & Shuai Liu & Sijia Zhang & Jiajia Wang & Yanyan Li & Xinyue Liu & Tingrui Song & Tao Xu & Shunmin He, 2023. "Characterization of genome-wide STR variation in 6487 human genomes," Nature Communications, Nature, vol. 14(1), pages 1-18, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.