
Nonparametric regression and classification with functional, categorical, and mixed covariates

Authors

  • Leonie Selk (Helmut-Schmidt-University)

  • Jan Gertheiss (Helmut-Schmidt-University; University of Pennsylvania)

Abstract

We consider nonparametric prediction with multiple covariates, in particular categorical or functional predictors, or a mixture of both. The proposed method is based on an extension of the Nadaraya-Watson estimator in which a kernel function is applied to a linear combination of distance measures, each calculated on a single covariate, with weights estimated from the training data. The dependent variable can be categorical (binary or multi-class) or continuous, so we consider both classification and regression problems. The methodology is illustrated and evaluated on artificial and real-world data. In particular, we observe that prediction accuracy can be increased and that irrelevant noise variables can be identified or removed by ‘downgrading’ the corresponding distance measures in a completely data-driven way.
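The estimator described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Gaussian kernel, the two distance functions, the leave-one-out squared-error criterion, the coarse grid search, and all function names are assumptions chosen for illustration, since the abstract does not specify any of them.

```python
from itertools import product

import numpy as np


def gaussian_kernel(u):
    """Gaussian-type kernel on [0, inf); an assumed choice, other kernels would work."""
    return np.exp(-0.5 * u ** 2)


def functional_distance(A, B):
    """Pairwise root-mean-square distance between curves on a common grid (rows = curves)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=2))


def categorical_distance(a, b):
    """Pairwise 0/1 mismatch distance between category labels."""
    return (a[:, None] != b[None, :]).astype(float)


def nw_predict(dists, weights, y_train):
    """Nadaraya-Watson prediction with a kernel applied to a weighted sum of distances.

    dists   : list of (n_new, n_train) distance matrices, one per covariate
    weights : nonnegative weights, one per covariate (0 switches a covariate off)
    """
    combined = sum(w * d for w, d in zip(weights, dists))
    k = gaussian_kernel(combined)
    return k @ y_train / np.maximum(k.sum(axis=1), 1e-300)


def loo_error(dists_train, weights, y_train):
    """Leave-one-out mean squared error on the training data (assumed criterion)."""
    combined = sum(w * d for w, d in zip(weights, dists_train))
    k = gaussian_kernel(combined)
    np.fill_diagonal(k, 0.0)  # an observation must not predict itself
    pred = k @ y_train / np.maximum(k.sum(axis=1), 1e-300)
    return float(np.mean((y_train - pred) ** 2))


def estimate_weights(dists_train, y_train, grid=(0.0, 0.5, 1.0, 2.0, 4.0, 8.0)):
    """Pick weights by exhaustive grid search; a stand-in for a proper optimiser."""
    best = min(product(grid, repeat=len(dists_train)),
               key=lambda w: loo_error(dists_train, w, y_train))
    return np.array(best)


# Toy data: one informative functional covariate, one irrelevant categorical one.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
amplitude = rng.normal(size=60)
curves = amplitude[:, None] * np.sin(2 * np.pi * t)[None, :]
category = rng.integers(0, 3, size=60)
y = amplitude + 0.1 * rng.normal(size=60)

train_dists = [functional_distance(curves, curves),
               categorical_distance(category, category)]
weights = estimate_weights(train_dists, y)
print("estimated weights (functional, categorical):", weights)
print("leave-one-out error:", loo_error(train_dists, weights, y))
```

A weight of zero removes a covariate's distance from the kernel argument entirely, which is one way to read the 'downgrading' of noise variables. For classification, the same `nw_predict` can be applied to 0/1 class indicators in place of `y`, in which case the output is a weighted class frequency between 0 and 1.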

Suggested Citation

  • Leonie Selk & Jan Gertheiss, 2023. "Nonparametric regression and classification with functional, categorical, and mixed covariates," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 519-543, June.
  • Handle: RePEc:spr:advdac:v:17:y:2023:i:2:d:10.1007_s11634-022-00513-7
    DOI: 10.1007/s11634-022-00513-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-022-00513-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arribas Ivan & Perez Francisco & Tortosa-Ausina Emili, 2010. "The Determinants of International Financial Integration Revisited: The Role of Networks and Geographic Neutrality," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 15(1), pages 1-55, December.
    2. James Bugden & Robert Waschik & Iain Fraser & Jeffrey S. Racine, 2016. "Parametric and non-parametric analysis of tax changes," Global Business and Economics Review, Inderscience Enterprises Ltd, vol. 18(5), pages 533-549.
    3. Henderson, Daniel J. & Papageorgiou, Chris & Parmeter, Christopher F., 2013. "Who benefits from financial development? New methods, new evidence," European Economic Review, Elsevier, vol. 63(C), pages 47-67.
    4. Ekpeno L. Effiong & Emmanuel E. Asuquo, 2017. "Migrants' Remittances, Governance and Heterogeneity," International Economic Journal, Taylor & Francis Journals, vol. 31(4), pages 535-554, October.
    5. Li, Degui & Simar, Léopold & Zelenyuk, Valentin, 2016. "Generalized nonparametric smoothing with mixed discrete and continuous data," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 424-444.
    6. Simar, Leopold & Zelenyuk, Valentin, 2011. "To Smooth or Not to Smooth? The Case of Discrete Variables in Nonparametric Regressions," LIDAM Discussion Papers ISBA 2011042, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    7. Xibin Zhang & Maxwell L. King & Han Lin Shang, 2016. "Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors," Econometrics, MDPI, vol. 4(2), pages 1-27, April.
    8. repec:jss:jstsof:27:i05 is not listed on IDEAS
    9. Zongwu Cai & Qi Li, 2013. "Some Recent Developments on Nonparametric Econometrics," Working Papers 2013-10-14, Wang Yanan Institute for Studies in Economics (WISE), Xiamen University.
    10. Byeong Park & Léopold Simar & Valentin Zelenyuk, 2015. "Categorical data in local maximum likelihood: theory and applications to productivity analysis," Journal of Productivity Analysis, Springer, vol. 43(2), pages 199-214, April.
    11. Hayfield, Tristen & Racine, Jeffrey S., 2008. "Nonparametric Econometrics: The np Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(i05).
    12. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    13. Jeffrey Racine, 2008. "Nonparametric econometrics: a primer (in Russian)," Quantile, Quantile, issue 4, pages 7-56, March.
    14. Hans R. A. Koster & Jos N. van Ommeren & Piet Rietveld, 2016. "Historic amenities, income and sorting of households," Journal of Economic Geography, Oxford University Press, vol. 16(1), pages 203-236.
    15. Qi Li & Juan Lin & Jeffrey S. Racine, 2013. "Optimal Bandwidth Selection for Nonparametric Conditional Distribution and Quantile Functions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 31(1), pages 57-65, January.
    16. Park, Byeong U. & Simar, Léopold & Zelenyuk, Valentin, 2017. "Nonparametric estimation of dynamic discrete choice models for time series data," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 97-120.
    17. Michael S. Delgado & Daniel J. Henderson & Christopher F. Parmeter, 2014. "Does Education Matter for Economic Growth?," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 76(3), pages 334-359, June.
    18. Nolwenn Roudaut & Anne Vanhems, 2012. "Explaining firms efficiency in the Ivorian manufacturing sector: a robust nonparametric approach," Journal of Productivity Analysis, Springer, vol. 37(2), pages 155-169, April.
    19. Daniel J. Henderson & Andrew Houtenville & Le Wang, 2017. "The Distribution of Returns to Education for People with Disabilities," Journal of Labor Research, Springer, vol. 38(3), pages 261-282, September.
    20. Chen, Xirong & Li, Degui & Li, Qi & Li, Zheng, 2019. "Nonparametric estimation of conditional quantile functions in the presence of irrelevant covariates," Journal of Econometrics, Elsevier, vol. 212(2), pages 433-450.
