Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-56745-6.html

FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking

Authors

  • Sophia Vincoff (Duke University)
  • Shrey Goel (Duke University)
  • Kseniia Kholina (Duke University)
  • Rishab Pulugurta (Duke University)
  • Pranay Vure (Duke University)
  • Pranam Chatterjee (Duke University)

Abstract

Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%–40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.
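The abstract gives only the range of the cosine-scheduled masking rate (15% to 40%), not its exact form. The Python sketch below illustrates one plausible reading under stated assumptions: the rate rises monotonically from 15% to 40% along a half-cosine curve over training, and every selected residue is replaced by a single mask token (no BERT-style random or keep substitutions). The function names `cosine_mask_rate` and `mask_sequence`, the `<mask>` token, and the toy sequence are hypothetical illustrations, not the FusOn-pLM implementation.

```python
import math
import random


def cosine_mask_rate(step: int, total_steps: int,
                     min_rate: float = 0.15, max_rate: float = 0.40) -> float:
    """Masking rate at `step`, rising from min_rate to max_rate along a half-cosine.

    Assumption: the schedule increases monotonically over training; the
    abstract states only the 15%-40% range, not the direction or exact form.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so steps past the end stay at max_rate.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # 0.5 * (1 - cos(pi * t)) rises smoothly from 0 at t = 0 to 1 at t = 1.
    weight = 0.5 * (1.0 - math.cos(math.pi * progress))
    return min_rate + (max_rate - min_rate) * weight


def mask_sequence(seq: str, rate: float, rng: random.Random,
                  mask_token: str = "<mask>") -> tuple[list[str], list[int]]:
    """Replace round(rate * len(seq)) randomly chosen residues with mask_token.

    Hypothetical helper: returns the masked token list and the sorted masked
    positions, i.e. the residues a masked language model would be trained
    to recover.
    """
    tokens = list(seq)
    n_mask = round(rate * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    for i in positions:
        tokens[i] = mask_token
    return tokens, positions


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "MDPSSHSSNMANTQMKAEHRL"  # hypothetical toy sequence, not from FusOn-DB
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        rate = cosine_mask_rate(step, total)
        masked, pos = mask_sequence(seq, rate, rng)
        print(f"step {step:4d}: rate={rate:.3f}, masked {len(pos)}/{len(seq)}")
```

Run as a script, this prints the rate climbing from 0.150 to 0.400 while the number of masked residues grows accordingly. A half-cosine ramp changes the rate slowly near the start and end of training and fastest in the middle; a linear ramp, or a schedule running in the opposite direction, could be swapped into `cosine_mask_rate` if the paper's actual schedule differs.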

Suggested Citation

  • Sophia Vincoff & Shrey Goel & Kseniia Kholina & Rishab Pulugurta & Pranay Vure & Pranam Chatterjee, 2025. "FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56745-6
    DOI: 10.1038/s41467-025-56745-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-56745-6
    File Function: Abstract
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Z. Faidon Brotzakis & Shengyu Zhang & Mhd Hussein Murtada & Michele Vendruscolo, 2025. "AlphaFold prediction of structural ensembles of disordered proteins," Nature Communications, Nature, vol. 16(1), pages 1-9, December.
    2. Paloma García Casas & Michela Rossini & Linnea Påvénius & Mezida Saeed & Nikita Arnst & Sonia Sonda & Tânia Fernandes & Irene D’Arsiè & Matteo Bruzzone & Valeria Berno & Andrea Raimondi & Maria Livia , 2024. "Simultaneous detection of membrane contact dynamics and associated Ca2+ signals by reversible chemogenetic reporters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    3. Huiyu Cai & Zuobai Zhang & Mingkai Wang & Bozitao Zhong & Quanxiao Li & Yuxuan Zhong & Yanling Wu & Tianlei Ying & Jian Tang, 2024. "Pretrainable geometric graph neural network for antibody affinity maturation," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Nathalie Béchon & Nitzan Tal & Avigail Stokar-Avihail & Alon Savidor & Meital Kupervaser & Sarah Melamed & Gil Amitai & Rotem Sorek, 2024. "Diversification of molecular pattern recognition in bacterial NLR-like proteins," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    5. Amika Singla & Daniel J. Boesch & Ho Yee Joyce Fung & Chigozie Ngoka & Avery S. Enriquez & Ran Song & Daniel A. Kramer & Yan Han & Esther Banarer & Andrew Lemoff & Puneet Juneja & Daniel D. Billadeau , 2024. "Structural basis for Retriever-SNX17 assembly and endosomal sorting," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    6. Veda Sheersh Boorla & Costas D. Maranas, 2025. "CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    7. Aika Iwama & Ryoji Kise & Hiroaki Akasaka & Fumiya K. Sano & Hidetaka S. Oshima & Asuka Inoue & Wataru Shihoya & Osamu Nureki, 2024. "Structure and dynamics of the pyroglutamylated RF-amide peptide QRFP receptor GPR103," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    8. Michael Kugler & Felix J. Metzner & Gregor Witte & Karl-Peter Hopfner & Katja Lammens, 2024. "Phosphorylation-mediated conformational change regulates human SLFN11," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    9. Helena E. Sverak & Luke N. Yaeger & Liam J. Worrall & Condurache M. Vacariu & Amy J. Glenwright & Marija Vuckovic & Zayni-Dean Al Azawi & Ryan P. Lamers & Victoria A. Marko & Clarissa Skorupski & Arvi, 2024. "Cryo-EM characterization of the anydromuropeptide permease AmpG central to bacterial fitness and β-lactam antibiotic resistance," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    10. Fabian Ries & Jasmin Gorlt & Sabrina Kaiser & Vanessa Scherer & Charlotte Seydel & Sandra Nguyen & Andreas Klingl & Julia Legen & Christian Schmitz-Linneweber & Hinrik Plaggenborg & Jediael Z. Y. Ng &, 2025. "A truncated variant of the ribosome-associated trigger factor specifically contributes to plant chloroplast ribosome biogenesis," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    11. Kevin E. Wu & Kevin K. Yang & Rianne Berg & Sarah Alamdari & James Y. Zou & Alex X. Lu & Ava P. Amini, 2024. "Protein structure generation via folding diffusion," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    12. Reilly Pidgeon & Sacha Mitchell & Michael Shamash & Layan Suleiman & Lharbi Dridi & Corinne F. Maurice & Bastien Castagner, 2025. "Diet-derived urolithin A is produced by a dehydroxylase encoded by human gut Enterocloster species," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    13. Luc Provencher & Wilson Nartey & Peter M. Brownlee & Austin W. Atkins & Jean-Philippe Gagné & Lou Baudrier & Nicholas S. Y. Ting & Cortt G. Piett & Shujuan Fang & Dustin D. Pearson & Shaun Moore & Pie, 2025. "CHD6 has poly(ADP-ribose)- and DNA-binding domains and regulates PARP1/2-trapping inhibitor sensitivity via abasic site repair," Nature Communications, Nature, vol. 16(1), pages 1-24, December.
    14. Marius Klein & Klemens Wild & Irmgard Sinning, 2024. "Multi-protein assemblies orchestrate co-translational enzymatic processing on the human ribosome," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    15. Lucien F. Krapp & Fernando A. Meireles & Luciano A. Abriata & Jean Devillard & Sarah Vacle & Maria J. Marcaida & Matteo Dal Peraro, 2024. "Context-aware geometric deep learning for protein sequence design," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    16. Kenneth Bødkter Schou & Samuel Mandacaru & Muhammad Tahir & Nikola Tom & Ann-Sofie Nilsson & Jens S. Andersen & Matteo Tiberti & Elena Papaleo & Jiri Bartek, 2024. "Exploring the structural landscape of DNA maintenance proteins," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    17. Fan Zhang & Shaobai Li & Hao Wu & Shanshuang Chen, 2025. "Cryo-EM structure and oligomerization of the human planar cell polarity core protein Vangl1," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    18. Patrick Bryant & Atharva Kelkar & Andrea Guljas & Cecilia Clementi & Frank Noé, 2024. "Structure prediction of protein-ligand complexes from sequence information with Umol," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    19. Amir Pandi & David Adam & Amir Zare & Van Tuan Trinh & Stefan L. Schaefer & Marie Burt & Björn Klabunde & Elizaveta Bobkova & Manish Kushwaha & Yeganeh Foroughijabbari & Peter Braun & Christoph Spahn , 2023. "Cell-free biosynthesis combined with deep learning accelerates de novo-development of antimicrobial peptides," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    20. Devlina Chakravarty & Joseph W. Schafer & Ethan A. Chen & Joseph F. Thole & Leslie A. Ronish & Myeongsang Lee & Lauren L. Porter, 2024. "AlphaFold predictions of fold-switched conformations are driven by structure memorization," Nature Communications, Nature, vol. 15(1), pages 1-13, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.