
SMAP is a pipeline for sample matching in proteogenomics

Author

Listed:
  • Ling Li (University of North Dakota)
  • Mingming Niu (Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital)
  • Alyssa Erickson (University of North Dakota)
  • Jie Luo (State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Zhejiang Academy of Agricultural Sciences)
  • Kincaid Rowbotham (University of North Dakota)
  • Kai Guo (University of Michigan)
  • He Huang (University of North Dakota)
  • Yuxin Li (Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital)
  • Yi Jiang (School of Public Health, Tongji Medical College, Huazhong University of Science and Technology)
  • Junguk Hur (School of Medicine and Health Sciences, University of North Dakota)
  • Chunyu Liu (SUNY Upstate Medical University)
  • Junmin Peng (Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital)
  • Xusheng Wang (University of North Dakota)

Abstract

The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulated data indicates that SMAP can uniquely match proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
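
The matching idea in the abstract can be illustrated with a small sketch. The abstract does not say how SMAP computes its two discriminant scores, so everything below is an assumption made for illustration: genotypes are encoded as alternate-allele counts (0, 1, 2, with -1 for a missing call), the first score is a simple genotype concordance, and the second is the margin between the best and second-best candidate. The function names `concordance`, `match_samples`, and `flag_mismatches` are hypothetical and are not taken from the SMAP code base.

```python
from typing import Dict, List, Tuple

MISSING = -1  # assumed marker for a genotype that could not be called


def concordance(inferred: List[int], reference: List[int]) -> float:
    """Fraction of sites called in both vectors where the genotypes agree."""
    pairs = [(a, b) for a, b in zip(inferred, reference)
             if a != MISSING and b != MISSING]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def match_samples(
    proteomic: Dict[str, List[int]],
    genomic: Dict[str, List[int]],
) -> Dict[str, Tuple[str, float, float]]:
    """Map each proteomic sample to (best genomic ID, best score, margin).

    The margin (best score minus runner-up score) is an illustrative
    stand-in for a second discriminant score, not SMAP's definition.
    """
    results = {}
    for pid, inferred in proteomic.items():
        ranked = sorted(
            ((concordance(inferred, g), gid) for gid, g in genomic.items()),
            reverse=True,
        )
        best_score, best_id = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        results[pid] = (best_id, best_score, best_score - runner_up)
    return results


def flag_mismatches(
    results: Dict[str, Tuple[str, float, float]],
    labels: Dict[str, str],
    min_margin: float = 0.2,  # arbitrary illustrative threshold
) -> Dict[str, str]:
    """Return {proteomic ID: suggested genomic ID} for confident relabels."""
    return {
        pid: best_id
        for pid, (best_id, _score, margin) in results.items()
        if labels.get(pid) != best_id and margin >= min_margin
    }


if __name__ == "__main__":
    # Toy data: six variant sites, three genomic samples, two proteomic samples.
    genomic = {
        "G1": [0, 1, 2, 0, 1, 2],
        "G2": [2, 1, 0, 0, 0, 2],
        "G3": [1, 1, 1, 2, 2, 0],
    }
    proteomic = {
        "P1": [0, 1, 2, MISSING, 1, 2],
        "P2": [2, 1, 0, 0, 0, 1],
    }
    recorded_labels = {"P1": "G1", "P2": "G3"}  # P2 is deliberately mislabeled

    matches = match_samples(proteomic, genomic)
    print(matches)
    print(flag_mismatches(matches, recorded_labels))
```

In this toy run, P1 agrees with its recorded label, while P2 is flagged and reassigned to G2. Requiring a minimum margin before relabeling is one way to avoid acting on ambiguous matches, which is likely to matter when only a small fraction of genotypes can be inferred from the MS data.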

Suggested Citation

  • Ling Li & Mingming Niu & Alyssa Erickson & Jie Luo & Kincaid Rowbotham & Kai Guo & He Huang & Yuxin Li & Yi Jiang & Junguk Hur & Chunyu Liu & Junmin Peng & Xusheng Wang, 2022. "SMAP is a pipeline for sample matching in proteogenomics," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-28411-8
    DOI: 10.1038/s41467-022-28411-8

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-28411-8
    File Function: Abstract
    Download Restriction: no


    Citations


    Cited by:

    1. Cerniauskas, Simonas & Fulton, Lewis & Ogden, Joan, 2023. "Tech Brief: Pipelines for a Hydrogen System in California," Institute of Transportation Studies, Working Paper Series qt1z0325v2, Institute of Transportation Studies, UC Davis.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yiqun Zhang & Fengju Chen & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2022. "Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    2. Hailiang Zhang & Lin Bai & Xin-Qiang Wu & Xi Tian & Jinwen Feng & Xiaohui Wu & Guo-Hai Shi & Xiaoru Pei & Jiacheng Lyu & Guojian Yang & Yang Liu & Wenhao Xu & Aihetaimujiang Anwaier & Yu Zhu & Da-Long, 2023. "Proteogenomics of clear cell renal cell carcinoma response to tyrosine kinase inhibitor," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    3. Fengju Chen & Yiqun Zhang & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2023. "Global impact of somatic structural variation on the cancer proteome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    4. Isabelle Rose Leo & Luay Aswad & Matthias Stahl & Elena Kunold & Frederik Post & Tom Erkers & Nona Struyf & Georgios Mermelekas & Rubin Narayan Joshi & Eva Gracia-Villacampa & Päivi Östling & Olli P. , 2022. "Integrative multi-omics and drug response profiling of childhood acute lymphoblastic leukemia cell lines," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    5. Kertcher, Zack & Venkatraman, Rohan & Coslor, Erica, 2020. "Pleasingly parallel: Early cross-disciplinary work for innovation diffusion across boundaries in grid computing," Journal of Business Research, Elsevier, vol. 116(C), pages 581-594.
    6. S. Vickovic & B. Lötstedt & J. Klughammer & S. Mages & Å Segerstolpe & O. Rozenblatt-Rosen & A. Regev, 2022. "SM-Omics is an automated platform for high-throughput spatial multi-omics," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    7. Naomi S Hachiya, 2017. "Unfoldin, A Novel Tool for the Analysis of Protein Misfolding or Neurodegenerative Diseases," Open Access Journal of Neurology & Neurosurgery, Juniper Publishers Inc., vol. 6(3), pages 40-44, October.
    8. Katrin Stuber & Tobias Schneider & Jill Werner & Michael Kovermann & Andreas Marx & Martin Scheffner, 2021. "Structural and functional consequences of NEDD8 phosphorylation," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    9. Alexander Kaever & Manuel Landesfeind & Kirstin Feussner & Burkhard Morgenstern & Ivo Feussner & Peter Meinicke, 2014. "Meta-Analysis of Pathway Enrichment: Combining Independent and Dependent Omics Data Sets," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    10. Dayle L Sampson & Tony J Parker & Zee Upton & Cameron P Hurst, 2011. "A Comparison of Methods for Classifying Clinical Samples Based on Proteomics Data: A Case Study for Statistical and Machine Learning Approaches," PLOS ONE, Public Library of Science, vol. 6(9), pages 1-11, September.
    11. Jiang Tan & Hui-Zhen Fu & Yuh-Shan Ho, 2014. "A bibliometric analysis of research on proteomics in Science Citation Index Expanded," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1473-1490, February.
    12. Chengxin Dai & Anja Füllgrabe & Julianus Pfeuffer & Elizaveta M. Solovyeva & Jingwen Deng & Pablo Moreno & Selvakumar Kamatchinathan & Deepti Jaiswal Kundu & Nancy George & Silvie Fexova & Björn Grüni, 2021. "A proteomics sample metadata representation for multiomics integration and big data analysis," Nature Communications, Nature, vol. 12(1), pages 1-8, December.
    13. Jacques Colinge & Keiryn L Bennett, 2007. "Introduction to Computational Proteomics," PLOS Computational Biology, Public Library of Science, vol. 3(7), pages 1-10, July.
    14. Guler, Arzu Tugce & Waaijer, Cathelijn J.F. & Mohammed, Yassene & Palmblad, Magnus, 2016. "Automating bibliometric analyses using Taverna scientific workflows: A tutorial on integrating Web Services," Journal of Informetrics, Elsevier, vol. 10(3), pages 830-841.
    15. Lei Xin & Rui Qiao & Xin Chen & Hieu Tran & Shengying Pan & Sahar Rabinoviz & Haibo Bian & Xianliang He & Brenton Morse & Baozhen Shan & Ming Li, 2022. "A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    16. Alla D. Fedorova & Stephen J. Kiniry & Dmitry E. Andreev & Jonathan M. Mudge & Pavel V. Baranov, 2022. "Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    17. S. Mouron & M. J. Bueno & A. Lluch & L. Manso & I. Calvo & J. Cortes & J. A. Garcia-Saenz & M. Gil-Gil & N. Martinez-Janez & J. V. Apala & E. Caleiras & Pilar Ximénez-Embún & J. Muñoz & L. Gonzalez-Co, 2022. "Phosphoproteomic analysis of neoadjuvant breast cancer suggests that increased sensitivity to paclitaxel is driven by CDK4 and filamin A," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    18. Jonathan J. Swietlik & Stefanie Bärthel & Chiara Falcomatà & Diana Fink & Ankit Sinha & Jingyuan Cheng & Stefan Ebner & Peter Landgraf & Daniela C. Dieterich & Henrik Daub & Dieter Saur & Felix Meissn, 2023. "Cell-selective proteomics segregates pancreatic cancer subtypes by extracellular proteins in tumors and circulation," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    19. Paul A Stewart & Katja Parapatics & Eric A Welsh & André C Müller & Haoyun Cao & Bin Fang & John M Koomen & Steven A Eschrich & Keiryn L Bennett & Eric B Haura, 2015. "A Pilot Proteogenomic Study with Data Integration Identifies MCT1 and GLUT1 as Prognostic Markers in Lung Adenocarcinoma," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-18, November.
    20. Jennifer G. Abelin & Erik J. Bergstrom & Keith D. Rivera & Hannah B. Taylor & Susan Klaeger & Charles Xu & Eva K. Verzani & C. Jackson White & Hilina B. Woldemichael & Maya Virshup & Meagan E. Olive &, 2023. "Workflow enabling deepscale immunopeptidome, proteome, ubiquitylome, phosphoproteome, and acetylome analyses of sample-limited tissues," Nature Communications, Nature, vol. 14(1), pages 1-22, December.