Printed from https://ideas.repec.org/a/nat/nature/v585y2020i7825d10.1038_s41586-020-2689-7.html

Identification of the human DPR core promoter element using machine learning

Author

Listed:
  • Long Vo ngoc

    (University of California, San Diego)

  • Cassidy Yunjing Huang

    (University of California, San Diego)

  • California Jack Cassidy

    (University of California, San Diego)

  • Claudia Medrano

    (University of California, San Diego)

  • James T. Kadonaga

    (University of California, San Diego)

Abstract

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription [1–5], but the downstream core promoter in humans has been difficult to understand [1–3]. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.

Suggested Citation

  • Long Vo ngoc & Cassidy Yunjing Huang & California Jack Cassidy & Claudia Medrano & James T. Kadonaga, 2020. "Identification of the human DPR core promoter element using machine learning," Nature, Nature, vol. 585(7825), pages 459-463, September.
  • Handle: RePEc:nat:nature:v:585:y:2020:i:7825:d:10.1038_s41586-020-2689-7
    DOI: 10.1038/s41586-020-2689-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-020-2689-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.