Printed from https://ideas.repec.org/a/plo/pcbi00/1008288.html

A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection

Authors

  • Oliver M Crook
  • Aikaterini Geladaki
  • Daniel J H Nightingale
  • Owen L Vennard
  • Kathryn S Lilley
  • Laurent Gatto
  • Paul D W Kirk

Abstract

The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein’s sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry-based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.

Suggested Citation

  • Oliver M Crook & Aikaterini Geladaki & Daniel J H Nightingale & Owen L Vennard & Kathryn S Lilley & Laurent Gatto & Paul D W Kirk, 2020. "A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-21, November.
  • Handle: RePEc:plo:pcbi00:1008288
    DOI: 10.1371/journal.pcbi.1008288

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008288
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008288&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Oliver M. Crook & Colin T. R. Davies & Lisa M. Breckels & Josie A. Christopher & Laurent Gatto & Paul D. W. Kirk & Kathryn S. Lilley, 2022. "Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    2. Nicola M. Moloney & Konstantin Barylyuk & Eelco Tromer & Oliver M. Crook & Lisa M. Breckels & Kathryn S. Lilley & Ross F. Waller & Paula MacGregor, 2023. "Mapping diversity in African trypanosomes using high resolution spatial proteomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Oliver M. Crook & Colin T. R. Davies & Lisa M. Breckels & Josie A. Christopher & Laurent Gatto & Paul D. W. Kirk & Kathryn S. Lilley, 2022. "Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    2. Nicola M. Moloney & Konstantin Barylyuk & Eelco Tromer & Oliver M. Crook & Lisa M. Breckels & Kathryn S. Lilley & Ross F. Waller & Paula MacGregor, 2023. "Mapping diversity in African trypanosomes using high resolution spatial proteomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    3. Jordan Currie & Vyshnavi Manda & Sean K. Robinson & Celine Lai & Vertica Agnihotri & Veronica Hidalgo & R. W. Ludwig & Kai Zhang & Jay Pavelka & Zhao V. Wang & June-Wha Rhee & Maggie P. Y. Lam & Edwar, 2024. "Simultaneous proteome localization and turnover analysis reveals spatiotemporal features of protein homeostasis disruptions," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    4. Oliver M Crook & Claire M Mulvey & Paul D W Kirk & Kathryn S Lilley & Laurent Gatto, 2018. "A Bayesian mixture modelling approach for spatial proteomics," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-29, November.
    5. Ying Zhu & Kerem Can Akkaya & Julia Ruta & Nanako Yokoyama & Cong Wang & Max Ruwolt & Diogo Borges Lima & Martin Lehmann & Fan Liu, 2024. "Cross-link assisted spatial proteomics to map sub-organelle proteomes and membrane protein topologies," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    6. Ana Martinez-Val & Dorte B. Bekker-Jensen & Sophia Steigerwald & Claire Koenig & Ole Østergaard & Adi Mehta & Trung Tran & Krzysztof Sikorski & Estefanía Torres-Vega & Ewa Kwasniewicz & Sólveig Hlín B, 2021. "Spatial-proteomics reveals phospho-signaling dynamics at subcellular resolution," Nature Communications, Nature, vol. 12(1), pages 1-17, December.
    7. Julia P. Schessner & Vincent Albrecht & Alexandra K. Davies & Pavel Sinitcyn & Georg H. H. Borner, 2023. "Deep and fast label-free Dynamic Organellar Mapping," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    8. Yudong Gao & Daichi Shonai & Matthew Trn & Jieqing Zhao & Erik J. Soderblom & S. Alexandra Garcia-Moreno & Charles A. Gersbach & William C. Wetsel & Geraldine Dawson & Dmitry Velmeshev & Yong-hui Jian, 2024. "Proximity analysis of native proteomes reveals phenotypic modifiers in a mouse model of autism and related neurodevelopmental conditions," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    9. Arthur Fischbach & Angela Johns & Kara L. Schneider & Xinxin Hao & Peter Tessarz & Thomas Nyström, 2023. "Artificial Hsp104-mediated systems for re-localizing protein aggregates," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    10. Louis-François Handfield & Yolanda T Chong & Jibril Simmons & Brenda J Andrews & Alan M Moses, 2013. "Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    11. Lisa M Breckels & Sean B Holden & David Wojnar & Claire M Mulvey & Andy Christoforou & Arnoud Groen & Matthew W B Trotter & Oliver Kohlbacher & Kathryn S Lilley & Laurent Gatto, 2016. "Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-26, May.
    12. Maya Dinur-Mills & Merav Tal & Ophry Pines, 2008. "Dual Targeted Mitochondrial Proteins Are Characterized by Lower MTS Parameters and Total Net Charge," PLOS ONE, Public Library of Science, vol. 3(5), pages 1-8, May.
    13. Eva Maria Wenzel & Nina Marie Pedersen & Liv Anker Elfmark & Ling Wang & Ingrid Kjos & Espen Stang & Lene Malerød & Andreas Brech & Harald Stenmark & Camilla Raiborg, 2024. "Intercellular transfer of cancer cell invasiveness via endosome-mediated protease shedding," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    14. Octavio R. Salazar & Ke Chen & Vanessa J. Melino & Muppala P. Reddy & Eva Hřibová & Jana Čížková & Denisa Beránková & Juan Pablo Arciniegas Vega & Lina María Cáceres Leal & Manuel Aranda & Lukasz Jare, 2024. "SOS1 tonoplast neo-localization and the RGG protein SALTY are important in the extreme salinity tolerance of Salicornia bigelovii," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    15. Md. Abdulla Al Mamun & Wei Cao & Shugo Nakamura & Jun-ichi Maruyama, 2023. "Large-scale identification of genes involved in septal pore plugging in multicellular fungi," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    16. Verena Kohler & Andreas Kohler & Lisa Larsson Berglund & Xinxin Hao & Sarah Gersing & Axel Imhof & Thomas Nyström & Johanna L. Höög & Martin Ott & Claes Andréasson & Sabrina Büttner, 2024. "Nuclear Hsp104 safeguards the dormant translation machinery during quiescence," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    17. Nebojsa Jukic & Alma P. Perrino & Frédéric Humbert & Aurélien Roux & Simon Scheuring, 2022. "Snf7 spirals sense and alter membrane curvature," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    18. Jian Cui & Jinghua Liu & Yuhua Li & Tieliu Shi, 2011. "Integrative Identification of Arabidopsis Mitochondrial Proteome and Its Function Exploitation through Protein Interaction Network," PLOS ONE, Public Library of Science, vol. 6(1), pages 1-16, January.
    19. Xiaomei Wu & Erli Pang & Kui Lin & Zhen-Ming Pei, 2013. "Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-11, May.
    20. Kiyan Shabestary & Cinzia Klemm & Benedict Carling & James Marshall & Juline Savigny & Marko Storch & Rodrigo Ledesma-Amaro, 2024. "Phenotypic heterogeneity follows a growth-viability tradeoff in response to amino acid identity," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
