Author
Listed:
- Ivo D Dinov
- Ben Heavner
- Ming Tang
- Gustavo Glusman
- Kyle Chard
- Mike Darcy
- Ravi Madduri
- Judy Pa
- Cathie Spino
- Carl Kesselman
- Ian Foster
- Eric W Deutsch
- Nathan D Price
- John D Van Horn
- Joseph Ames
- Kristi Clark
- Leroy Hood
- Benjamin M Hampstead
- William Dauer
- Arthur W Toga
Abstract
Background: A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions: Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
Suggested Citation
Ivo D Dinov & Ben Heavner & Ming Tang & Gustavo Glusman & Kyle Chard & Mike Darcy & Ravi Madduri & Judy Pa & Cathie Spino & Carl Kesselman & Ian Foster & Eric W Deutsch & Nathan D Price & John D Van H, 2016.
"Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations,"
PLOS ONE, Public Library of Science, vol. 11(8), pages 1-28, August.
Handle:
RePEc:plo:pone00:0157077
DOI: 10.1371/journal.pone.0157077
Download full text from publisher
Citations
Citations are extracted by the
CitEc Project, subscribe to its
RSS feed for this item.
Cited by:
- Wei-wei Zhang & Jyoti Bhola & Rajeev Kumar & Nitin Saluja, 2022.
"Study and analysis of big data for characterization of user association in large scale,"
International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(1), pages 375-384, March.
- Tiwana Varrecchia & Stefano Filippo Castiglia & Alberto Ranavolo & Carmela Conte & Antonella Tatarelli & Gianluca Coppola & Cherubino Di Lorenzo & Francesco Draicchio & Francesco Pierelli & Mariano Se, 2021.
"An artificial neural network approach to detect presence and severity of Parkinson’s disease via gait parameters,"
PLOS ONE, Public Library of Science, vol. 16(2), pages 1-16, February.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0157077. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.