Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories

Author

Listed:

Jong Victor L.
(Biostatistics and Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3508 GA Utrecht, The Netherlands; Erasmus Medical Center Rotterdam, Department of Viroscience, ‘s Gravendijkwal 230, 3015 CE Rotterdam, The Netherlands)
Novianti Putri W.
(Biostatistics and Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3508 GA Utrecht, The Netherlands)
Roes Kit C.B.
(Biostatistics and Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3508 GA Utrecht, The Netherlands)
Eijkemans Marinus J.C.
(Biostatistics and Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3508 GA Utrecht, The Netherlands)

Abstract

The literature shows that classifiers perform differently across datasets and that correlations within datasets affect classifier performance. The question that arises is whether the correlation structure within datasets differs significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories: inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of two filtering methods, detection call filtering and variance filtering, on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using Box’s M statistic on permuted samples. We found that correlation structures differ significantly between datasets, whether they belong to the same or to different etiological disease categories, and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.
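As a rough illustration of the kind of test the abstract describes, the sketch below computes Box's M statistic on the correlation matrices of several sample groups and obtains a p-value by permuting samples across groups. This is not the authors' code: the function names (`box_m_corr`, `permutation_pvalue`), the within-group standardization before permuting, and the number of permutations are all assumptions made for illustration, and the sketch assumes each group has more samples than variables so that the correlation matrices are non-singular.

```python
import numpy as np


def box_m_corr(groups):
    """Box's M statistic computed on the correlation matrices of the groups.

    groups: list of (n_i x p) arrays, samples in rows, variables in columns.
    Hypothetical helper; assumes n_i > p for every group.
    """
    dfs = np.array([g.shape[0] - 1 for g in groups])
    corrs = [np.corrcoef(g, rowvar=False) for g in groups]
    # Pooled matrix: degrees-of-freedom weighted average of the group matrices.
    pooled = sum(df * r for df, r in zip(dfs, corrs)) / dfs.sum()

    def logdet(a):
        return np.linalg.slogdet(a)[1]

    # M = (N - k) ln|R_pooled| - sum_i (n_i - 1) ln|R_i|
    return dfs.sum() * logdet(pooled) - sum(
        df * logdet(r) for df, r in zip(dfs, corrs)
    )


def permutation_pvalue(groups, n_perm=999, seed=0):
    """Permutation p-value for Box's M (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Assumption: standardize within each group so that differences in
    # means and variances do not drive the permutation distribution.
    z = [(g - g.mean(axis=0)) / g.std(axis=0, ddof=1) for g in groups]
    split_points = np.cumsum([g.shape[0] for g in z])[:-1]
    pooled = np.vstack(z)
    observed = box_m_corr(z)

    exceed = 0
    for _ in range(n_perm):
        shuffled = pooled[rng.permutation(pooled.shape[0])]
        if box_m_corr(np.split(shuffled, split_points)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    factor = rng.normal(size=(100, 1))
    correlated = factor + 0.3 * rng.normal(size=(100, 3))  # strongly correlated
    independent = rng.normal(size=(100, 3))                # uncorrelated
    m, p = permutation_pvalue([correlated, independent])
    print(f"Box's M = {m:.2f}, permutation p-value = {p:.3f}")
```

A permutation reference distribution is used here instead of the usual chi-square approximation because that approximation is known to be sensitive to departures from multivariate normality. With thousands of probesets and few samples, the determinants above would not be usable directly, so in practice the statistic would have to be computed on small subsets of variables.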

Suggested Citation

Jong Victor L. & Novianti Putri W. & Roes Kit C.B. & Eijkemans Marinus J.C., 2014. "Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(6), pages 717-732, December.

Handle: RePEc:bpj:sagmbi:v:13:y:2014:i:6:p:16:n:6
DOI: 10.1515/sagmb-2014-0003

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Sahra Uygun & Cheng Peng & Melissa D Lehti-Shiu & Robert L Last & Shin-Han Shiu, 2016. "Utility and Limitations of Using Gene Expression Data to Identify Functional Associations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-27, December.
Min Jin Ha & Wei Sun, 2014. "Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation," Biometrics, The International Biometric Society, vol. 70(3), pages 762-770, September.
Frénay, Benoît & Doquire, Gauthier & Verleysen, Michel, 2014. "Estimating mutual information for feature selection in the presence of label noise," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 832-848.
Thiemo Fetzer & Samuel Marden, 2017. "Take What You Can: Property Rights, Contestability and Conflict," Economic Journal, Royal Economic Society, vol. 0(601), pages 757-783, May.
- Thiemo, Fetzer & Marden, Samuel, 2016. "Take what you can: property rights, contestability and conflict," CAGE Online Working Paper Series 285, Competitive Advantage in the Global Economy (CAGE).
- Thiemo Fetzer & Samuel Marden, 2016. "Take what you can: property rights, contestability and conflict," HiCN Working Papers 214, Households in Conflict Network.
- Thiemo Fetzer & Samuel Marden, 2016. "Take What You Can: Property Rights, Contestability and Conflict," SERC Discussion Papers 0194, Centre for Economic Performance, LSE.
- Thiemo Fetzer & Samuel Marden, 2016. "Take what you can: property rights, contestability and conflict," Working Paper Series 09216, Department of Economics, University of Sussex Business School.
- Fetzer, Thiemo & Marden, Samuel, 2016. "Take what you can: property rights, contestability and conflict," LSE Research Online Documents on Economics 66534, London School of Economics and Political Science, LSE Library.
Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," CESifo Working Paper Series 9567, CESifo.
- Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2023. "Valuing the Time of the Self-Employed," Working Papers 310, Princeton University, Department of Economics, Center for Economic Policy Studies.
- Daniel J. Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," NBER Working Papers 29752, National Bureau of Economic Research, Inc.
- Agness, Daniel & Baseler, Travis & Chassang, Sylvain & Dupas, Pascaline & Snowberg, Erik, 2022. "Valuing the Time of the Self-Employed," CEPR Discussion Papers 17017, C.E.P.R. Discussion Papers.
- Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," Working Papers 2022-2, Princeton University. Economics Department.
Khanh Duong, 2024. "Is meritocracy just? New evidence from Boolean analysis and Machine learning," Journal of Computational Social Science, Springer, vol. 7(2), pages 1795-1821, October.
Batool, Fatima & Hennig, Christian, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
Yixuan Qiu & Jing Lei & Kathryn Roeder, 2023. "Gradient-based sparse principal component analysis with extensions to online learning," Biometrika, Biometrika Trust, vol. 110(2), pages 339-360.
Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
Ruiz Vargas, E. & Mitchell, D.G.V. & Greening, S.G. & Wahl, L.M., 2014. "Topology of whole-brain functional MRI networks: Improving the truncated scale-free model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 405(C), pages 151-158.
Ahrum Son & Hyunsoo Kim & Jolene K. Diedrich & Casimir Bamberger & Daniel B. McClatchy & Stuart A. Lipton & John R. Yates, 2024. "Using in vivo intact structure for system-wide quantitative analysis of changes in proteins," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
Orietta Nicolis & Jean Paul Maidana & Fabian Contreras & Danilo Leal, 2024. "Analyzing the Impact of COVID-19 on Economic Sustainability: A Clustering Approach," Sustainability, MDPI, vol. 16(4), pages 1-30, February.
Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
Forzani, Liliana & Gieco, Antonella & Tolmasky, Carlos, 2017. "Likelihood ratio test for partial sphericity in high and ultra-high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 18-38.
Yujia Li & Xiangrui Zeng & Chien‐Wei Lin & George C. Tseng, 2022. "Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis," Biometrics, The International Biometric Society, vol. 78(2), pages 574-585, June.
Yan Guo & Hui Yu & Haocan Song & Jiapeng He & Olufunmilola Oyebamiji & Huining Kang & Jie Ping & Scott Ness & Yu Shyr & Fei Ye, 2021. "MetaGSCA: A tool for meta-analysis of gene set differential coexpression," PLOS Computational Biology, Public Library of Science, vol. 17(5), pages 1-15, May.
Vojtech Blazek & Michal Petruzela & Tomas Vantuch & Zdenek Slanina & Stanislav Mišák & Wojciech Walendziuk, 2020. "The Estimation of the Influence of Household Appliances on the Power Quality in a Microgrid System," Energies, MDPI, vol. 13(17), pages 1-21, August.
Xue Jiang & Han Zhang & Xiongwen Quan & Zhandong Liu & Yanbin Yin, 2017. "Disease-related gene module detection based on a multi-label propagation clustering algorithm," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
Mandel, Antoine & Landini, Simone & Gallegati, Mauro & Gintis, Herbert, 2015. "Price dynamics, financial fragility and aggregate volatility," Journal of Economic Dynamics and Control, Elsevier, vol. 51(C), pages 257-277.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2013. "Price Dynamics, financial fragility and aggregate volatility," Post-Print halshs-00917892, HAL.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2013. "Price Dynamics, financial fragility and aggregate volatility," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00917892, HAL.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2015. "Price dynamics, financial fragility and aggregate volatility," Post-Print halshs-01152302, HAL.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2015. "Price dynamics, financial fragility and aggregate volatility," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01152302, HAL.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2013. "Price dynamics, financial fragility and aggregate volatility," Documents de travail du Centre d'Economie de la Sorbonne 13076, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
- Antoine Mandel & Simone Landini & Mauro Gallegati & Herbert Gintis, 2015. "Price dynamics, financial fragility and aggregate volatility," PSE-Ecole d'économie de Paris (Postprint) halshs-01152302, HAL.

More about this item

Keywords

clustering on correlation; gene expression data; homogeneity of correlation structures; microarray analysis;