
A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets

Authors

  • Dalton T. Ham (Schulich School of Medicine and Dentistry)
  • Tyler S. Browne (Schulich School of Medicine and Dentistry)
  • Pooja N. Banglorewala (Schulich School of Medicine and Dentistry)
  • Tyler L. Wilson (Tesseraqt Optimization Inc)
  • Richard K. Michael (Tesseraqt Optimization Inc)
  • Gregory B. Gloor (Schulich School of Medicine and Dentistry)
  • David R. Edgell (Schulich School of Medicine and Dentistry)

Abstract

The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications.

Suggested Citation

  • Dalton T. Ham & Tyler S. Browne & Pooja N. Banglorewala & Tyler L. Wilson & Richard K. Michael & Gregory B. Gloor & David R. Edgell, 2023. "A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-41143-7
    DOI: 10.1038/s41467-023-41143-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-41143-7
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Grace N. Hibshman & Jack P. K. Bravo & Matthew M. Hooper & Tyler L. Dangerfield & Hongshan Zhang & Ilya J. Finkelstein & Kenneth A. Johnson & David W. Taylor, 2024. "Unraveling the mechanisms of PAMless DNA interrogation by SpRY-Cas9," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Jian Wang & Yuxi Teng & Ruihua Zhang & Yifei Wu & Lei Lou & Yusong Zou & Michelle Li & Zhong-Ru Xie & Yajun Yan, 2021. "Engineering a PAM-flexible SpdCas9 variant as a universal gene repressor," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    3. Giulia I. Corsi & Kunli Qu & Ferhat Alkan & Xiaoguang Pan & Yonglun Luo & Jan Gorodkin, 2022. "CRISPR/Cas9 gRNA activity depends on free energy changes and on the target PAM context," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    4. Yang Liu & Filipe Pinto & Xinyi Wan & Zhugen Yang & Shuguang Peng & Mengxi Li & Jonathan M. Cooper & Zhen Xie & Christopher E. French & Baojun Wang, 2022. "Reprogrammed tracrRNAs enable repurposing of RNAs as crRNAs and sequence-specific RNA biosensors," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    5. Shiran Abadi & Winston X Yan & David Amar & Itay Mayrose, 2017. "A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-24, October.
    6. Daniel C. Volke & Román A. Martino & Ekaterina Kozaeva & Andrea M. Smania & Pablo I. Nikel, 2022. "Modular (de)construction of complex bacterial phenotypes by CRISPR/nCas9-assisted, multiplex cytidine base-editing," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    7. Sundaram Acharya & Asgar Hussain Ansari & Prosad Kumar Das & Seiichi Hirano & Meghali Aich & Riya Rauthan & Sudipta Mahato & Savitri Maddileti & Sajal Sarkar & Manoj Kumar & Rhythm Phutela & Sneha Gul, 2024. "PAM-flexible Engineered FnCas9 variants for robust and ultra-precise genome editing and diagnostics," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    8. Fang Liang & Yu Zhang & Lin Li & Yexin Yang & Ji-Feng Fei & Yanmei Liu & Wei Qin, 2022. "SpG and SpRY variants expand the CRISPR toolbox for genome editing in zebrafish," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    9. Raed Ibraheim & Phillip W. L. Tai & Aamir Mir & Nida Javeed & Jiaming Wang & Tomás C. Rodríguez & Suk Namkung & Samantha Nelson & Eraj Shafiq Khokhar & Esther Mintzer & Stacy Maitland & Zexiang Chen &, 2021. "Self-inactivating, all-in-one AAV vectors for precision Cas9 genome editing via homology-directed repair in vivo," Nature Communications, Nature, vol. 12(1), pages 1-17, December.
    10. Ulaganathan, Kandasamy & Goud, Sravanthi & Reddy, Madhavi & Kayalvili, Ulaganathan, 2017. "Genome engineering for breaking barriers in lignocellulosic bioethanol production," Renewable and Sustainable Energy Reviews, Elsevier, vol. 74(C), pages 1080-1107.
    11. Zhaohui Zhong & Guanqing Liu & Zhongjie Tang & Shuyue Xiang & Liang Yang & Lan Huang & Yao He & Tingting Fan & Shishi Liu & Xuelian Zheng & Tao Zhang & Yiping Qi & Jian Huang & Yong Zhang, 2023. "Efficient plant genome engineering using a probiotic sourced CRISPR-Cas9 system," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    12. Bin Zhu & David J. Edwards & Katherine M. Spaine & Laahirie Edupuganti & Andrey Matveyev & Myrna G. Serrano & Gregory A. Buck, 2024. "The association of maternal factors with the neonatal microbiota and health," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    13. Margot Karlikow & Evan Amalfitano & Xiaolong Yang & Jennifer Doucet & Abigail Chapman & Peivand Sadat Mousavi & Paige Homme & Polina Sutyrina & Winston Chan & Sofia Lemak & Alexander F. Yakunin & Adam, 2023. "CRISPR-induced DNA reorganization for multiplexed nucleic acid detection," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    14. Sarah J Vancuren & Scott J Dos Santos & Janet E Hill & the Maternal Microbiome Legacy Project Team, 2020. "Evaluation of variant calling for cpn60 barcode sequence-based microbiome profiling," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-14, July.
    15. Sehrish Khan & Muhammad Shahid Mahmood & Sajjad ur Rahman & Farzana Rizvi & Aftab Ahmad, 2020. "Evaluation of the CRISPR/Cas9 system for the development of resistance against Cotton leaf curl virus in model plants," Plant Protection Science, Czech Academy of Agricultural Sciences, vol. 56(3), pages 154-162.
    16. Kazuki Kato & Sae Okazaki & Soumya Kannan & Han Altae-Tran & F. Esra Demircioglu & Yukari Isayama & Junichiro Ishikawa & Masahiro Fukuda & Rhiannon K. Macrae & Tomohiro Nishizawa & Kira S. Makarova & , 2022. "Structure of the IscB–ωRNA ribonucleoprotein complex, the likely ancestor of CRISPR-Cas9," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    17. Xu Feng & Ruyi Xu & Jianglan Liao & Jingyu Zhao & Baochang Zhang & Xiaoxiao Xu & Pengpeng Zhao & Xiaoning Wang & Jianyun Yao & Pengxia Wang & Xiaoxue Wang & Wenyuan Han & Qunxin She, 2024. "Flexible TAM requirement of TnpB enables efficient single-nucleotide editing with expanded targeting scope," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Péter István Kulcsár & András Tálas & Zoltán Ligeti & Eszter Tóth & Zsófia Rakvács & Zsuzsa Bartos & Sarah Laura Krausz & Ágnes Welker & Vanessza Laura Végi & Krisztina Huszár & Ervin Welker, 2023. "A cleavage rule for selection of increased-fidelity SpCas9 variants with high efficiency and no detectable off-targets," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    19. Jiongyu Zhang & Chengyu Hou & Changchun Liu, 2024. "CRISPR-powered quantitative keyword search engine in DNA data storage," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    20. Aaron A. Smargon & Assael A. Madrigal & Brian A. Yee & Kevin D. Dong & Jasmine R. Mueller & Gene W. Yeo, 2022. "Crosstalk between CRISPR-Cas9 and the human transcriptome," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
