
A method to build extended sequence context models of point mutations and indels

Author

Listed:
  • Jörn Bethune

    (Aarhus University Hospital
    Aarhus University)

  • April Kleppe

    (Aarhus University Hospital
    Aarhus University)

  • Søren Besenbacher

    (Aarhus University Hospital
    Aarhus University)

Abstract

The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a separate rate for each possible k-mer, however, works only for small values of k, because the data become too sparse as k grows. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. Using a large set of human de novo mutations, we show that this method improves the prediction of mutation rates and makes it possible to build models with wider sequence contexts than previous studies. As the first method of its kind, it predicts rates not only for point mutations but also for insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of mutations of each functional type (synonymous, missense, and others) for each gene. Using this software, we show that the resulting mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint.
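
The idea of pooling sparse k-mer counts into groups can be illustrated with a small Python sketch. It is not the kmerPaPa or Genovo implementation and does not use their APIs: the function names, the hand-picked two-pattern partition, the pseudocount scheme, and all counts below are assumptions made up for illustration. Patterns are written with IUPAC nucleotide codes, each k-mer is assumed to match exactly one pattern, and one rate is estimated per pattern instead of one per k-mer. The sketch takes the partition as given and does not search for a good one. The last function sums per-site rates over a sequence as a simplified stand-in for an expected mutation count, without separating functional mutation types.

```python
from collections import Counter

# IUPAC nucleotide codes: each symbol stands for a set of concrete bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, kmer):
    """Return True if every base of kmer is allowed by the pattern symbol at that position."""
    return len(pattern) == len(kmer) and all(
        base in IUPAC[symbol] for symbol, base in zip(pattern, kmer)
    )


def pattern_rates(partition, mutated, background, pseudocount=1.0):
    """Estimate one rate per pattern by pooling the counts of all k-mers it matches.

    The pseudocount is an illustrative smoothing choice, not taken from the paper.
    """
    rates = {}
    for pattern in partition:
        m = sum(c for kmer, c in mutated.items() if matches(pattern, kmer))
        n = sum(c for kmer, c in background.items() if matches(pattern, kmer))
        rates[pattern] = (m + pseudocount) / (n + 2 * pseudocount)
    return rates


def rate_for(kmer, partition, rates):
    """Look up the rate of the single pattern that matches kmer."""
    hits = [p for p in partition if matches(p, kmer)]
    if len(hits) != 1:
        raise ValueError(f"{kmer} matches {len(hits)} patterns; expected exactly 1")
    return rates[hits[0]]


def expected_mutations(sequence, partition, rates, k=3):
    """Sum the per-site rates over all k-mers in sequence that the partition covers."""
    half = k // 2
    total = 0.0
    for i in range(half, len(sequence) - half):
        kmer = sequence[i - half:i + half + 1]
        if any(matches(p, kmer) for p in partition):
            total += rate_for(kmer, partition, rates)
    return total


# Hypothetical partition of all 3-mers with a central C:
# "NCG" covers CpG contexts, "NCH" covers every other context (H = A, C, or T).
partition = ["NCG", "NCH"]

# Made-up counts: how often each 3-mer occurs, and how often its central base mutated.
background = Counter({"ACG": 1000, "TCG": 1000, "ACA": 4000, "GCT": 4000})
mutated = Counter({"ACG": 12, "TCG": 8, "ACA": 5, "GCT": 3})

rates = pattern_rates(partition, mutated, background, pseudocount=0.0)
expected = expected_mutations("ACGTCA", partition, rates)
print(rates, expected)
```

With only two patterns, a 3-mer that was never observed in the mutation data still receives a rate from its group, which is the property that lets this kind of model extend to wider contexts where most individual k-mers have few or no observations.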

Suggested Citation

  • Jörn Bethune & April Kleppe & Søren Besenbacher, 2022. "A method to build extended sequence context models of point mutations and indels," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-35596-5
    DOI: 10.1038/s41467-022-35596-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-35596-5
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kian Hong Kock & Patrick K. Kimes & Stephen S. Gisselbrecht & Sachi Inukai & Sabrina K. Phanor & James T. Anderson & Gayatri Ramakrishnan & Colin H. Lipper & Dongyuan Song & Jesse V. Kurland & Julia M, 2024. "DNA binding analysis of rare variants in homeodomains reveals homeodomain specificity-determining residues," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Scott D. Findlay & Lindsay Romo & Christopher B. Burge, 2024. "Quantifying negative selection in human 3ʹ UTRs uncovers constrained targets of RNA-binding proteins," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Bian Li & Dan M. Roden & John A. Capra, 2022. "The 3D mutational constraint on amino acid sites in the human proteome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Stephanie M. Bilinovich & Kristy Lewis & Barbara L. Thompson & Jeremy W. Prokop & Daniel B. Campbell, 2020. "Environmental Epigenetics of Diesel Particulate Matter Toxicogenomics," IJERPH, MDPI, vol. 17(20), pages 1-13, October.
    5. Matt C. Danzi & Maike F. Dohrn & Sarah Fazal & Danique Beijer & Adriana P. Rebelo & Vivian Cintra & Stephan Züchner, 2023. "Deep structured learning for variant prioritization in Mendelian diseases," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    6. Asmundur Oddsson & Patrick Sulem & Gardar Sveinbjornsson & Gudny A. Arnadottir & Valgerdur Steinthorsdottir & Gisli H. Halldorsson & Bjarni A. Atlason & Gudjon R. Oskarsson & Hannes Helgason & Henriet, 2023. "Deficit of homozygosity among 1.52 million individuals and genetic causes of recessive lethality," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    7. Vincent Michaud & Eulalie Lasseaux & David J. Green & Dave T. Gerrard & Claudio Plaisant & Tomas Fitzgerald & Ewan Birney & Benoît Arveiler & Graeme C. Black & Panagiotis I. Sergouniotis, 2022. "The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    8. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    9. Natalie DeForest & Yuqi Wang & Zhiyi Zhu & Jacqueline S. Dron & Ryan Koesterer & Pradeep Natarajan & Jason Flannick & Tiffany Amariuta & Gina M. Peloso & Amit R. Majithia, 2024. "Genome-wide discovery and integrative genomic characterization of insulin resistance loci using serum triglycerides to HDL-cholesterol ratio as a proxy," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    10. Laura M. Mueller & Abigail Isaacson & Heather Wilson & Anna Salowka & Isabel Tay & Maolian Gong & Nancy Samir Elbarbary & Klemens Raile & Francesca M. Spagnoli, 2024. "Heterozygous missense variant in GLI2 impairs human endocrine pancreas development," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    11. Juraj Bergman & Rasmus Ø. Pedersen & Erick J. Lundgren & Rhys T. Lemoine & Sophie Monsarrat & Elena A. Pearce & Mikkel H. Schierup & Jens-Christian Svenning, 2023. "Worldwide Late Pleistocene and Early Holocene population declines in extant megafauna are associated with Homo sapiens expansion rather than climate change," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    12. Fiona A. Hagenbeek & Jana S. Hirzinger & Sophie Breunig & Susanne Bruins & Dmitry V. Kuznetsov & Kirsten Schut & Veronika V. Odintsova & Dorret I. Boomsma, 2023. "Maximizing the value of twin studies in health and behaviour," Nature Human Behaviour, Nature, vol. 7(6), pages 849-860, June.
    13. Per Unneberg & Mårten Larsson & Anna Olsson & Ola Wallerman & Anna Petri & Ignas Bunikis & Olga Vinnere Pettersson & Chiara Papetti & Astthor Gislason & Henrik Glenner & Joan E. Cartes & Leocadio Blan, 2024. "Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins," Nature Communications, Nature, vol. 15(1), pages 1-29, December.
    14. Alexendar R. Perez & Laura Sala & Richard K. Perez & Joana A. Vidigal, 2021. "CSC software corrects off-target mediated gRNA depletion in CRISPR-Cas9 essentiality screens," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    15. Gaëlle Odelin & Adèle Faucherre & Damien Marchese & Amélie Pinard & Hager Jaouadi & Solena Scouarnec & Raphaël Chiarelli & Younes Achouri & Emilie Faure & Marine Herbane & Alexis Théron & Jean-Françoi, 2023. "Variations in the poly-histidine repeat motif of HOXA1 contribute to bicuspid aortic valve in mouse and zebrafish," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    16. Matthew Tegtmeyer & Jatin Arora & Samira Asgari & Beth A. Cimini & Ajay Nadig & Emily Peirent & Dhara Liyanage & Gregory P. Way & Erin Weisbart & Aparna Nathan & Tiffany Amariuta & Kevin Eggan & Marzi, 2024. "High-dimensional phenotyping to define the genetic basis of cellular morphology," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    17. Erik Schoenmakers & Federica Marelli & Helle F. Jørgensen & W. Edward Visser & Carla Moran & Stefan Groeneweg & Carolina Avalos & Sean J. Jurgens & Nichola Figg & Alison Finigan & Neha Wali & Maura Ag, 2023. "Selenoprotein deficiency disorder predisposes to aortic aneurysm formation," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    18. Sarah E. Garnish & Katherine R. Martin & Maria Kauppi & Victoria E. Jackson & Rebecca Ambrose & Vik Ven Eng & Shene Chiou & Yanxiang Meng & Daniel Frank & Emma C. Tovey Crutchfield & Komal M. Patel & , 2023. "A common human MLKL polymorphism confers resistance to negative regulation by phosphorylation," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    19. Matthew J. O’Neill & Tao Yang & Julie Laudeman & Maria E. Calandranis & M. Lorena Harvey & Joseph F. Solus & Dan M. Roden & Andrew M. Glazer, 2024. "ParSE-seq: a calibrated multiplexed assay to facilitate the clinical classification of putative splice-altering variants," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    20. Xiaoyi Raymond Gao & Marion Chiariglione & Alexander J. Arch, 2022. "Whole-exome sequencing study identifies rare variants and genes associated with intraocular pressure and glaucoma," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
