Multilingual translation for zero-shot biomedical classification using BioTranslator

Author

Listed:
  • Hanwen Xu

    (University of Washington)

  • Addie Woicik

    (University of Washington)

  • Hoifung Poon

    (Microsoft Research)

  • Russ B. Altman

    (Stanford University
    Chan Zuckerberg Biohub)

  • Sheng Wang

    (University of Washington)

Abstract

Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined set of controlled vocabularies. This paradigm restricts the analysis to concepts that are known and well-characterized. Here, we present the novel multilingual translation method BioTranslator to address this problem. BioTranslator takes a user-written textual description of a new concept and then translates this description to a non-text biological data instance. The key idea of BioTranslator is to develop a multilingual translation framework, where multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description and how BioTranslator can be further generalized to protein function prediction and drug target identification. Our tool frees scientists from limiting their analyses within predefined controlled vocabularies, enabling them to interact with biological data using free text.
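
The zero-shot idea described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that text descriptions and non-text data instances have already been embedded into one shared vector space. The encoders are not shown, and the toy vectors, labels, and function names below are illustrative assumptions, not BioTranslator's actual model or API. Each instance is assigned to the description it is most similar to, so a new class can be added by writing a new description rather than by retraining on labeled examples.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return pairwise cosine similarities between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def zero_shot_classify(instance_embeddings, description_embeddings, labels):
    """Assign each instance the label of its most similar text description."""
    sims = cosine_similarity_matrix(instance_embeddings, description_embeddings)
    best = sims.argmax(axis=1)
    return [labels[i] for i in best]


# Hypothetical embeddings of two free-text class descriptions.
labels = ["cell type A", "cell type B"]
description_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Hypothetical embeddings of three data instances (e.g. cells) in the same space.
instance_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.6, 0.4, 0.3],
])

predictions = zero_shot_classify(instance_embeddings, description_embeddings, labels)
print(predictions)
```

In this toy setup, supporting a previously unseen class only requires appending one more row to `description_embeddings` and one more entry to `labels`.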

Suggested Citation

  • Hanwen Xu & Addie Woicik & Hoifung Poon & Russ B. Altman & Sheng Wang, 2023. "Multilingual translation for zero-shot biomedical classification using BioTranslator," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-36476-2
    DOI: 10.1038/s41467-023-36476-2

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-36476-2
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-023-36476-2?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Sheng Wang & Angela Oliveira Pisco & Aaron McGeever & Maria Brbic & Marinka Zitnik & Spyros Darmanis & Jure Leskovec & Jim Karkanias & Russ B. Altman, 2021. "Leveraging the Cell Ontology to classify unseen cell types," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    2. Mathew J. Garnett & Elena J. Edelman & Sonja J. Heidorn & Chris D. Greenman & Anahita Dastur & King Wai Lau & Patricia Greninger & I. Richard Thompson & Xi Luo & Jorge Soares & Qingsong Liu & Francesc, 2012. "Systematic identification of genomic markers of drug sensitivity in cancer cells," Nature, Nature, vol. 483(7391), pages 570-575, March.
    3. Kevin W Boyack & David Newman & Russell J Duhon & Richard Klavans & Michael Patek & Joseph R Biberstine & Bob Schijvenaars & André Skupin & Nianli Ma & Katy Börner, 2011. "Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-11, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    2. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    3. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    4. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    5. Manuel A. Vázquez & Jorge Pereira-Delgado & Jesús Cid-Sueiro & Jerónimo Arenas-García, 2022. "Validation of scientific topic models using graph analysis and corpus metadata," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5441-5458, September.
    6. G. Gambardella & G. Viscido & B. Tumaini & A. Isacchi & R. Bosotti & D. di Bernardo, 2022. "A single-cell analysis of breast cancer cell lines to study tumour heterogeneity and drug response," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    7. Shi, Chengchun & Xu, Tianlin & Bergsma, Wicher & Li, Lexin, 2021. "Double generative adversarial networks for conditional independence testing," LSE Research Online Documents on Economics 112550, London School of Economics and Political Science, LSE Library.
    8. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    9. Ballester, Omar & Penner, Orion, 2022. "Robustness, replicability and scalability in topic modelling," Journal of Informetrics, Elsevier, vol. 16(1).
    10. L. Mathur & B. Szalai & N. H. Du & R. Utharala & M. Ballinger & J. J. M. Landry & M. Ryckelynck & V. Benes & J. Saez-Rodriguez & C. A. Merten, 2022. "Combi-seq for multiplexed transcriptome-based profiling of drug combinations using deterministic barcoding in single-cell droplets," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    11. Milad Dehghani & Ki Joon Kim, 2019. "Past and Present Research on Wearable Technologies: Bibliometric and Cluster Analyses of Published Research from 2000 to 2016," International Journal of Innovation and Technology Management (IJITM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 1-21, February.
    12. Juste Raimbault, 2019. "Exploration of an interdisciplinary scientific landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 617-641, May.
    13. Renchu Guan & Chen Yang & Maurizio Marchese & Yanchun Liang & Xiaohu Shi, 2014. "Full Text Clustering and Relationship Network Analysis of Biomedical Publications," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-9, September.
    14. Nishanth Ulhas Nair & Patricia Greninger & Xiaohu Zhang & Adam A. Friedman & Arnaud Amzallag & Eliane Cortez & Avinash Das Sahu & Joo Sang Lee & Anahita Dastur & Regina K. Egan & Ellen Murchie & Miche, 2023. "A landscape of response to drug combinations in non-small cell lung cancer," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    15. Johanna Zerbib & Marica Rosaria Ippolito & Yonatan Eliezer & Giuseppina Feudis & Eli Reuveni & Anouk Savir Kadmon & Sara Martin & Sonia Viganò & Gil Leor & James Berstler & Julia Muenzner & Michael Mü, 2024. "Human aneuploid cells depend on the RAF/MEK/ERK pathway for overcoming increased DNA damage," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    16. Shixuan Liu & Camille Ezran & Michael F. Z. Wang & Zhengda Li & Kyle Awayan & Jonathan Z. Long & Iwijn De Vlaminck & Sheng Wang & Jacques Epelbaum & Christin S. Kuo & Jérémy Terrien & Mark A. Krasnow, 2024. "An organism-wide atlas of hormonal signaling based on the mouse lemur single-cell transcriptome," Nature Communications, Nature, vol. 15(1), pages 1-27, December.
    17. Seonghun Kim & Seockhun Bae & Yinhua Piao & Kyuri Jo, 2021. "Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data," Mathematics, MDPI, vol. 9(7), pages 1-17, April.
    18. Wesley Tansey & Yixin Wang & Raul Rabadan & David Blei, 2020. "Double Empirical Bayes Testing," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 91-113, December.
    19. Min Pan & William C. Wright & Richard H. Chapple & Asif Zubair & Manbir Sandhu & Jake E. Batchelder & Brandt C. Huddle & Jonathan Low & Kaley B. Blankenship & Yingzhe Wang & Brittney Gordon & Payton A, 2021. "The chemotherapeutic CX-5461 primarily targets TOP2B and exhibits selective activity in high-risk neuroblastoma," Nature Communications, Nature, vol. 12(1), pages 1-20, December.
    20. Hyeong-Min Lee & William C. Wright & Min Pan & Jonathan Low & Duane Currier & Jie Fang & Shivendra Singh & Stephanie Nance & Ian Delahunty & Yuna Kim & Richard H. Chapple & Yinwen Zhang & Xueying Liu, 2023. "A CRISPR-drug perturbational map for identifying compounds to combine with commonly used chemotherapeutics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-36476-2. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.