MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

My bibliography Save this article

MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

Author

Listed:

Kristen M Naegle
Roy E Welsch
Michael B Yaffe
Forest M White
Douglas A Lauffenburger

Registered:

Abstract

Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology (‘MCAM’) employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERRB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. Author Summary: Proteomic measurements, especially modification measurements, are greatly expanding the current knowledge of the state of proteins under various conditions. Harnessing these measurements to understand how these modifications are enzymatically regulated and their subsequent function in cellular signaling and physiology is a challenging new problem. Clustering has been very useful in reducing the dimensionality of many types of high-throughput biological data, as well inferring function of poorly understood molecular species. However, its implementation requires a great deal of technical expertise since there are a large number of parameters one must decide on in clustering, including data transforms, distance metrics, and algorithms. Previous knowledge of useful parameters does not exist for measurements of a new type. In this work we address two issues. First, we develop a framework that incorporates any number of possible parameters of clustering to produce a suite of clustering solutions. These solutions are then judged on their ability to infer biological information through statistical enrichment of existing biological annotations. Second, we apply this framework to dynamic phosphorylation measurements of the ERBB network, constructing the first extensive analysis of clustering of phosphoproteomic data and generating insight into novel components and novel functions of known components of the ERBB network.

Suggested Citation

Kristen M Naegle & Roy E Welsch & Michael B Yaffe & Forest M White & Douglas A Lauffenburger, 2011. "MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets," PLOS Computational Biology, Public Library of Science, vol. 7(7), pages 1-15, July.

Handle: RePEc:plo:pcbi00:1002119
DOI: 10.1371/journal.pcbi.1002119

Download full text from publisher

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Mark L Grimes & Wan-Jui Lee & Laurens van der Maaten & Paul Shannon, 2013. "Wrangling Phosphoproteomic Data to Elucidate Cancer Signaling Pathways," PLOS ONE, Public Library of Science, vol. 8(1), pages 1-16, January.

More about this item

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1002119. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

We have no bibliographic references for this item. You can help adding them by using this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

Author

Abstract

Suggested Citation

Download full text from publisher

Citations

More about this item

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data