
Supervised learning via smoothed Polya trees

Authors

  • William Cipolli (Colgate University)
  • Timothy Hanson (University of South Carolina)

Abstract

We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol) 20:215–242, 1958) and Linear Discriminant Analysis (LDA) (Fisher in Ann Eugen 7:179–188, 1936; Rao in J R Stat Soc Ser B 10:159–203, 1948) to the Bayesian nonparametric setting, providing a competitor to MclustDA (Fraley and Raftery in J Am Stat Assoc 97:611–631, 2002). The approach models the data distribution for each class using a multivariate Polya tree and achieves strong results in simulations and real data analyses. The flexibility gained by further relaxing the distributional assumptions of QDA can greatly improve the classification of new observations when the data deviate severely from parametric distributional assumptions, while still performing well when those assumptions hold. The proposed method is fast compared to other supervised classifiers and simple to implement: there are no kernel tricks or initialization steps, perhaps making it one of the more user-friendly approaches to supervised learning. The absence of tuning is a significant feature of the proposed methodology, because suboptimal tuning can greatly hamper classification performance; e.g., SVMs fit with non-optimal kernels perform significantly worse.
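
The generative structure described in the abstract (one density model per class, combined through Bayes' rule) can be illustrated with a minimal sketch. The code below is not the authors' Polya tree method: it uses a one-dimensional Gaussian density with a class-specific variance, which is the parametric QDA baseline that the paper relaxes. The function names `fit_class_models`, `log_gaussian`, and `classify`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pvariance


def fit_class_models(samples_by_class):
    """Estimate a prior, mean, and variance per class (hypothetical helper)."""
    total = sum(len(xs) for xs in samples_by_class.values())
    models = {}
    for label, xs in samples_by_class.items():
        models[label] = {
            "prior": len(xs) / total,  # class proportion in the training data
            "mean": mean(xs),
            "var": pvariance(xs),      # class-specific variance, as in QDA
        }
    return models


def log_gaussian(x, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def classify(x, models):
    """Return the class maximizing log prior + log class-conditional density."""
    def score(label):
        m = models[label]
        return math.log(m["prior"]) + log_gaussian(x, m["mean"], m["var"])
    return max(models, key=score)


# Toy training data (assumed for illustration): a tight class and a wide class.
train = {
    "a": [-1.2, -0.8, -1.0, -1.1, -0.9],
    "b": [1.5, 2.5, 0.5, 3.0, 2.0],
}
models = fit_class_models(train)
print(classify(-1.0, models), classify(2.0, models))
```

Swapping `log_gaussian` for a more flexible class-conditional density estimate leaves the decision rule in `classify` unchanged, which is the sense in which a nonparametric density model can stand in for the Gaussian assumption of QDA.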

Suggested Citation

  • William Cipolli & Timothy Hanson, 2019. "Supervised learning via smoothed Polya trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 877-904, December.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:4:d:10.1007_s11634-018-0344-z
    DOI: 10.1007/s11634-018-0344-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0344-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Marco Di Marzio & Charles C. Taylor, 2005. "On boosting kernel density methods for multivariate data: density estimation and classification," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 14(2), pages 163-178, November.
    2. Chen, Yuhui & Hanson, Timothy E., 2014. "Bayesian nonparametric k-sample tests for censored and uncensored data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 335-346.
    3. Bergé, Laurent & Bouveyron, Charles & Girard, Stéphane, 2012. "HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 46(i06).
    4. Mukhopadhyay, Subhadeep & Ghosh, Anil K., 2011. "Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2344-2353, July.
    5. Hanson, Timothy E., 2006. "Inference for Mixtures of Finite Polya Tree Models," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1548-1565, December.
    6. Adriano Z. Zambom & Ronaldo Dias, 2013. "A Review of Kernel Density Estimation with Applications to Econometrics," International Econometric Review (IER), Econometric Research Association, vol. 5(1), pages 20-42, April.
    7. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.

    Citations

    Cited by:

    1. Mahsa Samsami & Ralf Wagner, 2021. "Investment Decisions with Endogeneity: A Dirichlet Tree Analysis," JRFM, MDPI, vol. 14(7), pages 1-19, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ma, Zichen & Hanson, Timothy E., 2020. "Bayesian nonparametric test for independence between random vectors," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    2. Han, Qinkai & Wang, Tianyang & Chu, Fulei, 2022. "Nonparametric copula modeling of wind speed-wind shear for the assessment of height-dependent wind energy in China," Renewable and Sustainable Energy Reviews, Elsevier, vol. 161(C).
    3. Luping Zhao & Timothy E. Hanson, 2011. "Spatially Dependent Polya Tree Modeling for Survival Data," Biometrics, The International Biometric Society, vol. 67(2), pages 391-403, June.
    4. Han, Qinkai & Chu, Fulei, 2021. "Directional wind energy assessment of China based on nonparametric copula models," Renewable Energy, Elsevier, vol. 164(C), pages 1334-1349.
    5. Miśkiewicz, Janusz, 2016. "Improving quality of sample entropy estimation for continuous distribution probability functions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 450(C), pages 473-485.
    6. Mairech, Hanene & López-Bernal, Álvaro & Moriondo, Marco & Dibari, Camilla & Regni, Luca & Proietti, Primo & Villalobos, Francisco J. & Testi, Luca, 2020. "Is new olive farming sustainable? A spatial comparison of productive and environmental performances between traditional and new olive orchards with the model OliveCan," Agricultural Systems, Elsevier, vol. 181(C).
    7. Luz Adriana Pereira & Daniel Taylor‐Rodríguez & Luis Gutiérrez, 2020. "A Bayesian nonparametric testing procedure for paired samples," Biometrics, The International Biometric Society, vol. 76(4), pages 1133-1146, December.
    8. Zhuang, Haoxin & Diao, Liqun & Yi, Grace Y., 2023. "Polya tree Monte Carlo method," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    9. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    10. Rafael Carvalho Ceregatti & Rafael Izbicki & Luis Ernesto Bueno Salasar, 2021. "WIKS: a general Bayesian nonparametric index for quantifying differences between two populations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 274-291, March.
    11. Jiajia Zhang & Timothy Hanson & Haiming Zhou, 2019. "Bayes factors for choosing among six common survival models," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(2), pages 361-379, April.
    12. Bin, Peng, 2015. "Regional Disparity and Dynamic Development of China: a Multidimensional Index," MPRA Paper 61849, University Library of Munich, Germany.
    13. Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2022. "Gaussian mixture model with an extended ultrametric covariance structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 399-427, June.
    14. Philip S. Boonstra & Bhramar Mukherjee & Jeremy M. G. Taylor & Mef Nilbert & Victor Moreno & Stephen B. Gruber, 2011. "Bayesian Modeling for Genetic Anticipation in Presence of Mutational Heterogeneity: A Case Study in Lynch Syndrome," Biometrics, The International Biometric Society, vol. 67(4), pages 1627-1637, December.
    15. Chen, Yuhui & Hanson, Timothy E., 2014. "Bayesian nonparametric k-sample tests for censored and uncensored data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 335-346.
    16. Sugawara, Shinya, 2012. "A nonparametric Bayesian approach for counterfactual prediction with an application to the Japanese private nursing home market," MPRA Paper 42154, University Library of Munich, Germany.
    17. Vaghefi, A. & Farzan, Farbod & Jafari, Mohsen A., 2015. "Modeling industrial loads in non-residential buildings," Applied Energy, Elsevier, vol. 158(C), pages 378-389.
    18. Swen Kuh & Grace S. Chiu & Anton H. Westveld, 2020. "Latent Causal Socioeconomic Health Index," Papers 2009.12217, arXiv.org, revised Oct 2023.
    19. Timothy Hanson & Mingan Yang, 2007. "Bayesian Semiparametric Proportional Odds Models," Biometrics, The International Biometric Society, vol. 63(1), pages 88-95, March.
    20. Li, Li & Hanson, Timothy E., 2014. "A Bayesian semiparametric regression model for reliability data using effective age," Computational Statistics & Data Analysis, Elsevier, vol. 73(C), pages 177-188.
