
Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks

Author

Listed:
  • Meghana Venkata Palukuri
  • Edward M Marcotte

Abstract

Characterization of protein complexes, i.e., sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells, such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms are unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, of which 234 are linked to SARS-CoV-2, the virus that causes COVID-19; 111 uncharacterized proteins are present in 103 learned complexes. Super.Complex is generalizable, with the ability to improve results by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
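
The search step described in the abstract (score a candidate community with a fitness function, then grow it greedily while the score improves) can be illustrated with a minimal sketch. This is not the Super.Complex implementation: the fitness function here is a hand-written stand-in (total internal edge weight divided by community size), whereas the paper learns its fitness function from known complexes with AutoML. The toy network and the names build_graph, average_internal_weight, and grow_community are all hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build a symmetric adjacency map {node: {neighbor: weight}} from an edge list."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def average_internal_weight(nodes, graph):
    """Stand-in fitness (an assumption, not the learned model):
    sum of edge weights inside the community divided by its size."""
    if len(nodes) < 2:
        return 0.0
    total = sum(graph[u].get(v, 0.0) for u, v in combinations(sorted(nodes), 2))
    return total / len(nodes)


def grow_community(graph, seed, fitness, max_size=10):
    """Greedy local search: starting from a seed, repeatedly add the
    neighboring node that gives the highest fitness, and stop as soon as
    no addition improves the current score."""
    community = set(seed)
    score = fitness(community, graph)
    while len(community) < max_size:
        neighbors = {v for u in community for v in graph[u]} - community
        if not neighbors:
            break
        # sorted() makes tie-breaking deterministic
        best = max(sorted(neighbors), key=lambda v: fitness(community | {v}, graph))
        best_score = fitness(community | {best}, graph)
        if best_score <= score:
            break
        community.add(best)
        score = best_score
    return community, score


# Toy weighted network: two triangles joined by one weak edge (C-D).
EDGES = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("C", "D", 0.1),
    ("D", "E", 0.9), ("D", "F", 0.8), ("E", "F", 0.9),
]
graph = build_graph(EDGES)

for seed in [("A", "B"), ("D", "E")]:
    members, value = grow_community(graph, seed, average_internal_weight)
    print(sorted(members), round(value, 3))
```

Because the fitness function is passed in as a plain callable, a trained model's scoring function could be substituted for the stand-in without changing the search loop. Running the search from many seeds and keeping distinct results is one simple way to obtain overlapping communities, since a node may be reached from more than one seed.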

Suggested Citation

  • Meghana Venkata Palukuri & Edward M Marcotte, 2021. "Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks," PLOS ONE, Public Library of Science, vol. 16(12), pages 1-23, December.
  • Handle: RePEc:plo:pone00:0262056
    DOI: 10.1371/journal.pone.0262056

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262056
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0262056&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Taha Y. Taha & Irene P. Chen & Jennifer M. Hayashi & Takako Tabata & Keith Walcott & Gabriella R. Kimmerly & Abdullah M. Syed & Alison Ciling & Rahul K. Suryawanshi & Hannah S. Martin & Bryan H. Bach , 2023. "Rapid assembly of SARS-CoV-2 genomes reveals attenuation of the Omicron BA.1 variant through NSP6," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. David Gomez-Zepeda & Danielle Arnold-Schild & Julian Beyrle & Arthur Declercq & Ralf Gabriels & Elena Kumm & Annica Preikschat & Mateusz Krzysztof Łącki & Aurélie Hirschler & Jeewan Babu Rijal & Chris, 2024. "Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    3. Christine E. Peters & Ursula Schulze-Gahmen & Manon Eckhardt & Gwendolyn M. Jang & Jiewei Xu & Ernst H. Pulido & Conner Bardine & Charles S. Craik & Melanie Ott & Or Gozani & Kliment A. Verba & Ruth H, 2022. "Structure-function analysis of enterovirus protease 2A in complex with its essential host factor SETD3," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Scotland E. Farley & Jennifer E. Kyle & Hans C. Leier & Lisa M. Bramer & Jules B. Weinstein & Timothy A. Bates & Joon-Yong Lee & Thomas O. Metz & Carsten Schultz & Fikadu G. Tafesse, 2022. "A global lipid map reveals host dependency factors conserved across SARS-CoV-2 variants," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    5. Andrea M. Chiariello & Alex Abraham & Simona Bianco & Andrea Esposito & Andrea Fontana & Francesca Vercellone & Mattia Conte & Mario Nicodemi, 2024. "Multiscale modelling of chromatin 4D organization in SARS-CoV-2 infected cells," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    6. Gabriela Dias Noske & Yun Song & Rafaela Sachetto Fernandes & Rod Chalk & Haitem Elmassoudi & Lizbé Koekemoer & C. David Owen & Tarick J. El-Baba & Carol V. Robinson & Glaucius Oliva & Andre Schutzer , 2023. "An in-solution snapshot of SARS-COV-2 main protease maturation process and inhibition," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    7. Haofeng Wang & Qi Yang & Xiaoce Liu & Zili Xu & Maolin Shao & Dongxu Li & Yinkai Duan & Jielin Tang & Xianqiang Yu & Yumin Zhang & Aihua Hao & Yajie Wang & Jie Chen & Chenghao Zhu & Luke Guddat & Hong, 2023. "Structure-based discovery of dual pathway inhibitors for SARS-CoV-2 entry," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    8. Ayan Chatterjee & Robin Walters & Zohair Shafi & Omair Shafi Ahmed & Michael Sebek & Deisy Gysi & Rose Yu & Tina Eliassi-Rad & Albert-László Barabási & Giulia Menichetti, 2023. "Improving the generalizability of protein-ligand binding predictions with AI-Bind," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    9. Sara Sunshine & Andreas S. Puschnik & Joseph M. Replogle & Matthew T. Laurie & Jamin Liu & Beth Shoshana Zha & James K. Nuñez & Janie R. Byrum & Aidan H. McMorrow & Matthew B. Frieman & Juliane Winkle, 2023. "Systematic functional interrogation of SARS-CoV-2 host factors using Perturb-seq," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    10. Xiaopan Gao & Huabin Tian & Kaixiang Zhu & Qing Li & Wei Hao & Linyue Wang & Bo Qin & Hongyu Deng & Sheng Cui, 2022. "Structural basis for Sarbecovirus ORF6 mediated blockage of nucleocytoplasmic transport," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    11. Thomas Kruse & Caroline Benz & Dimitriya H. Garvanska & Richard Lindqvist & Filip Mihalic & Fabian Coscia & Raviteja Inturi & Ahmed Sayadi & Leandro Simonetti & Emma Nilsson & Muhammad Ali & Johanna K, 2021. "Large scale discovery of coronavirus-host factor protein interaction motifs reveals SARS-CoV-2 specific mechanisms and vulnerabilities," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    12. Filip Mihalič & Leandro Simonetti & Girolamo Giudice & Marie Rubin Sander & Richard Lindqvist & Marie Berit Akpiroro Peters & Caroline Benz & Eszter Kassa & Dilip Badgujar & Raviteja Inturi & Muhammad, 2023. "Large-scale phage-based screening reveals extensive pan-viral mimicry of host short linear motifs," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    13. Hanbaek Lyu & Yacoub H. Kureh & Joshua Vendrow & Mason A. Porter, 2024. "Learning low-rank latent mesoscale structures in networks," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    14. Ma’ayan Israeli & Yaara Finkel & Yfat Yahalom-Ronen & Nir Paran & Theodor Chitlaru & Ofir Israeli & Inbar Cohen-Gihon & Moshe Aftalion & Reut Falach & Shahar Rotem & Uri Elia & Ital Nemet & Limor Klik, 2022. "Genome-wide CRISPR screens identify GATA6 as a proviral host factor for SARS-CoV-2 via modulation of ACE2," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    15. Ramiz Salama & Fadi Al-Turjman, 2023. "Sustainable Energy Production in Smart Cities," Sustainability, MDPI, vol. 15(22), pages 1-25, November.
    16. Charulata Jindal & Sandeep Kumar & Sunil Sharma & Yuk Ming Choi & Jimmy T. Efird, 2020. "The Prevention and Management of COVID-19: Seeking a Practical and Timely Solution," IJERPH, MDPI, vol. 17(11), pages 1-11, June.
    17. Kelsey M. Haas & Michael J. McGregor & Mehdi Bouhaddou & Benjamin J. Polacco & Eun-Young Kim & Thong T. Nguyen & Billy W. Newton & Matthew Urbanowski & Heejin Kim & Michael A. P. Williams & Veronica V, 2023. "Proteomic and genetic analyses of influenza A viruses identify pan-viral host targets," Nature Communications, Nature, vol. 14(1), pages 1-27, December.
    18. Matthias M. Zimmer & Anuja Kibe & Ulfert Rand & Lukas Pekarek & Liqing Ye & Stefan Buck & Redmond P. Smyth & Luka Cicin-Sain & Neva Caliskan, 2021. "The short isoform of the host antiviral protein ZAP acts as an inhibitor of SARS-CoV-2 programmed ribosomal frameshifting," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    19. Emilie Murigneux & Laurent Softic & Corentin Aubé & Carmen Grandi & Delphine Judith & Johanna Bruce & Morgane Le Gall & François Guillonneau & Alain Schmitt & Vincent Parissi & Clarisse Berlioz-Torren, 2024. "Proteomic analysis of SARS-CoV-2 particles unveils a key role of G3BP proteins in viral assembly," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    20. Jiakai Hou & Yanjun Wei & Jing Zou & Roshni Jaffery & Long Sun & Shaoheng Liang & Ningbo Zheng & Ashley M. Guerrero & Nicholas A. Egan & Ritu Bohat & Si Chen & Caishang Zheng & Xiaobo Mao & S. Stephen, 2024. "Integrated multi-omics analyses identify anti-viral host factors and pathways controlling SARS-CoV-2 infection," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
