Variable selection by Random Forests using data with missing values
Author
Abstract
Suggested Citation
DOI: 10.1016/j.csda.2014.06.017
Download full text from publisher
As the access to this document is restricted, you may want to search for a different version of it.
References listed on IDEAS
- van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
- Strobl, Carolin & Boulesteix, Anne-Laure & Augustin, Thomas, 2007. "Unbiased split selection for classification trees based on the Gini Index," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 483-501, September.
- Kung, Yi-Hung & Lin, Chang-Ting & Shih, Yu-Shan, 2012. "Split variable selection for tree modeling on rank data," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2830-2836.
- Horton, Nicholas J. & Kleinman, Ken P., 2007. "Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models," The American Statistician, American Statistical Association, vol. 61, pages 79-90, February.
- Doove, L.L. & Van Buuren, S. & Dusseldorp, E., 2014. "Recursive partitioning for missing data imputation in the presence of interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 72(C), pages 92-104.
- Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.
- Hapfelmeier, A. & Hothorn, T. & Ulm, K., 2012. "Recursive partitioning on incomplete data using surrogate decisions and multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1552-1565.
- Lee, Tzu-Haw & Shih, Yu-Shan, 2006. "Unbiased variable selection for classification trees with multivariate responses," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 659-667, November.
- Shih, Yu-Shan & Tsai, Hsin-Wen, 2004. "Variable selection bias in regression trees with constant fits," Computational Statistics & Data Analysis, Elsevier, vol. 45(3), pages 595-607, April.
- Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
Citations
Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
Cited by:
- Hapfelmeier, Alexander & Hornung, Roman & Haller, Bernhard, 2023. "Efficient permutation testing of variable importance measures by the example of random forests," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
- Ha, Tran Vinh & Asada, Takumi & Arimura, Mikiharu, 2019. "Determination of the influence factors on household vehicle ownership patterns in Phnom Penh using statistical and machine learning methods," Journal of Transport Geography, Elsevier, vol. 78(C), pages 70-86.
- Sachin Kumar & T. Gopi & N. Harikeerthana & Munish Kumar Gupta & Vidit Gaur & Grzegorz M. Krolczyk & ChuanSong Wu, 2023. "Machine learning techniques in additive manufacturing: a state of the art review on design, processes and production control," Journal of Intelligent Manufacturing, Springer, vol. 34(1), pages 21-55, January.
- Lee, Min Cherng & Mitra, Robin, 2016. "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 24-38.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.- Hapfelmeier, Alexander & Hornung, Roman & Haller, Bernhard, 2023. "Efficient permutation testing of variable importance measures by the example of random forests," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
- Liangyuan Hu & Lihua Li, 2022. "Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series," IJERPH, MDPI, vol. 19(23), pages 1-13, December.
- Youngjoo Cho & Debashis Ghosh, 2021. "Quantile-Based Subgroup Identification for Randomized Clinical Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(1), pages 90-128, April.
- Ollech, Daniel & Webel, Karsten, 2020. "A random forest-based approach to identifying the most informative seasonality tests," Discussion Papers 55/2020, Deutsche Bundesbank.
- Thelma Dede Baddoo & Zhijia Li & Samuel Nii Odai & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni & Samuel Ato Andam-Akorful, 2021. "Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation," IJERPH, MDPI, vol. 18(16), pages 1-26, August.
- Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
- Burim Ramosaj & Markus Pauly, 2019. "Predicting missing values: a comparative study on non-parametric approaches for imputation," Computational Statistics, Springer, vol. 34(4), pages 1741-1764, December.
- Kristian Kleinke & Mark Stemmler & Jost Reinecke & Friedrich Lösel, 2011. "Efficient ways to impute incomplete panel data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 95(4), pages 351-373, December.
- Wei, Pengfei & Lu, Zhenzhou & Song, Jingwen, 2015. "Variable importance analysis: A comprehensive review," Reliability Engineering and System Safety, Elsevier, vol. 142(C), pages 399-432.
- Hayes, Timothy & McArdle, John J., 2017. "Should we impute or should we weight? Examining the performance of two CART-based techniques for addressing missing data in small sample research with nonnormal variables," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 35-52.
- Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
- Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
- Saurabh Saxena & Darius Roman & Valentin Robu & David Flynn & Michael Pecht, 2021. "Battery Stress Factor Ranking for Accelerated Degradation Test Planning Using Machine Learning," Energies, MDPI, vol. 14(3), pages 1-17, January.
- Fellinghauer, Bernd & Bühlmann, Peter & Ryffel, Martin & von Rhein, Michael & Reinhardt, Jan D., 2013. "Stable graphical model estimation with Random Forests for discrete, continuous, and mixed variables," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 132-152.
- Yi-Sheng Chao & Hsing-Chien Wu & Chao-Jung Wu & Wei-Chih Chen, 2018. "Index or illusion: The case of frailty indices in the Health and Retirement Study," PLOS ONE, Public Library of Science, vol. 13(7), pages 1-19, July.
- Gerhard Tutz & Moritz Berger, 2016. "Item-focussed Trees for the Identification of Items in Differential Item Functioning," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 727-750, September.
- Daniel L. Chen & Markus Loecher, 2022. "Mood and the Malleability of Moral Reasoning: The Impact of Irrelevant Factors on Judicial Decisions," Working Papers hal-03864854, HAL.
- Ingrida Vaiciulyte & Zivile Kalsyte & Leonidas Sakalauskas & Darius Plikynas, 2017. "Assessment of market reaction on the share performance on the basis of its visualization in 2D space," Journal of Business Economics and Management, Taylor & Francis Journals, vol. 18(2), pages 309-318, March.
- Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
- Göran Kauermann & Mehboob Ali, 2021. "Semi-parametric regression when some (expensive) covariates are missing by design," Statistical Papers, Springer, vol. 62(4), pages 1675-1696, August.
More about this item
Keywords
Random Forests; Variable importance; Variable selection; Missing data; Multiple imputation; Complete case analysis;All these keywords.
Statistics
Access and download statisticsCorrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:80:y:2014:i:c:p:129-139. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.