
Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks

Author

Listed:
  • Eric Golinko

    (Florida Atlantic University)

  • Xingquan Zhu

    (Florida Atlantic University)

Abstract

Feature embedding is an emerging research area that aims to transform features from the original space into a new space that supports effective learning. Many feature embedding algorithms exist, but they often suffer from major drawbacks: (1) they handle only a single feature type, or require users to separate features into distinct feature views and supply that information to the embedding learner; (2) they are designed for either supervised or unsupervised learning tasks, but not both; and (3) embeddings for new out-of-training samples must be obtained through a retraining phase, which makes them unsuitable for online learning tasks. In this paper, we propose a generalized feature embedding algorithm, GEL, for supervised, unsupervised, and online learning tasks. GEL learns a feature embedding from data of any type, including data with mixed feature types. For supervised learning tasks with class label information, GEL uses a Class Partitioned Instance Representation (CPIR) process to arrange instances, based on their labels, as a dense binary representation via row and feature vectors for feature embedding learning. If class labels are unavailable, CPIR naturally degenerates and treats all instances as one class. Based on the CPIR representation, GEL uses eigenvector decomposition to convert the proximity matrix into a low-dimensional space. For new out-of-training samples, the low-dimensional representations are derived through a direct conversion, without a retraining phase. The learned numerical embedding features can be used directly to represent instances for effective learning. Experiments and comparisons on 28 datasets, including categorical, numerical, and ordinal features, demonstrate that embedding features learned by GEL can effectively represent the original instances for clustering, classification, and online learning.
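The pipeline described in the abstract (binary representation, proximity matrix, eigenvector decomposition, direct conversion of new samples) can be illustrated with a minimal sketch. This is not the authors' GEL implementation: the abstract does not specify how CPIR lays out class partitions or how the proximity matrix is defined, so the sketch covers only the unlabeled case (all instances treated as one class) and assumes a one-hot binary encoding with a feature co-occurrence matrix B^T B as the proximity matrix. The function names, the toy data, and the choice of k are all hypothetical.

```python
import numpy as np

def fit_binary_encoder(rows):
    """Assign one binary column to each (feature index, value) pair seen in training."""
    vocab = {}
    for row in rows:
        for j, value in enumerate(row):
            vocab.setdefault((j, value), len(vocab))
    return vocab

def encode(rows, vocab):
    """Encode rows as a dense binary matrix; values unseen in training are ignored."""
    B = np.zeros((len(rows), len(vocab)))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            col = vocab.get((j, value))
            if col is not None:
                B[i, col] = 1.0
    return B

def fit_embedding(B, k):
    """Eigendecompose an assumed proximity matrix and keep the top-k eigenvectors."""
    P = B.T @ B                           # assumption: co-occurrence counts as proximity
    eigvals, eigvecs = np.linalg.eigh(P)  # P is symmetric; eigenvalues come back ascending
    top = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    return eigvecs[:, top]

def transform(B, V):
    """Project binary rows into the learned space; no refit is needed for new rows."""
    return B @ V

# Toy categorical data (hypothetical), treated as a single class because no labels are used.
train = [
    ("red", "small", "yes"),
    ("red", "large", "no"),
    ("blue", "small", "yes"),
    ("blue", "large", "no"),
    ("green", "small", "no"),
]
vocab = fit_binary_encoder(train)
B_train = encode(train, vocab)
V = fit_embedding(B_train, k=2)
Z_train = transform(B_train, V)

# A new out-of-training sample is embedded by direct conversion with the stored V.
new_sample = [("green", "large", "yes")]
Z_new = transform(encode(new_sample, vocab), V)
print(Z_train.shape, Z_new.shape)
```

Because the eigenvectors are computed once and reused, embedding a new sample costs a single matrix product, which is the property the abstract relies on for online learning. Numerical or ordinal features would first need to be mapped to binary columns (for example by binning), a step this sketch leaves out.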

Suggested Citation

  • Eric Golinko & Xingquan Zhu, 2019. "Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks," Information Systems Frontiers, Springer, vol. 21(1), pages 125-142, February.
  • Handle: RePEc:spr:infosf:v:21:y:2019:i:1:d:10.1007_s10796-018-9850-y
    DOI: 10.1007/s10796-018-9850-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-018-9850-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10796-018-9850-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Harry Crane, 2015. "Clustering from Categorical Data Sequences," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 810-823, June.
    2. Bates, Douglas & Eddelbuettel, Dirk, 2013. "Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 52(i05).
    3. Chao Chen & Mei-Ling Shyu & Shu-Ching Chen, 2016. "Weighted subspace modeling for semantic concept retrieval using gaussian mixture models," Information Systems Frontiers, Springer, vol. 18(5), pages 877-889, October.
    4. Roy Gelbard, 2013. "“Padding” bitmaps to support similarity and mining," Information Systems Frontiers, Springer, vol. 15(1), pages 99-110, March.
    5. Lixin Shen & Hong Wang & Li Da Xu & Xue Ma & Sohail Chaudhry & Wu He, 2016. "Identity management based on PCA and SVM," Information Systems Frontiers, Springer, vol. 18(4), pages 711-716, August.
    6. Nenadic, Oleg & Greenacre, Michael, 2007. "Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 20(i03).

    Citations



    Cited by:

    1. Thouraya Bouabana-Tebibel & Stuart H. Rubin & Lydia Bouzar-Benlabiod, 2019. "Guest Editorial: Recent Trends in Reuse and Integration," Information Systems Frontiers, Springer, vol. 21(1), pages 1-3, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Greenacre, 2012. "Fuzzy coding in constrained ordinations," Economics Working Papers 1325, Department of Economics and Business, Universitat Pompeu Fabra.
    2. Tilman Schmider & Anne Grethe Hestnes & Julia Brzykcy & Hannes Schmidt & Arno Schintlmeister & Benjamin R. K. Roller & Ezequiel Jesús Teran & Andrea Söllinger & Oliver Schmidt & Martin F. Polz & Andre, 2024. "Physiological basis for atmospheric methane oxidation and methanotrophic growth on air," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Richards, Greg & van der Ark, L. Andries, 2013. "Dimensions of cultural consumption among tourists: Multiple correspondence analysis," Tourism Management, Elsevier, vol. 37(C), pages 71-76.
    4. Michael Greenacre, 2008. "Correspondence analysis of raw data," Economics Working Papers 1112, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 2009.
    5. Belém Barbosa & José Ramón Saura & Dag Bennett, 2024. "How do entrepreneurs perform digital marketing across the customer journey? A review and discussion of the main uses," The Journal of Technology Transfer, Springer, vol. 49(1), pages 69-103, February.
    6. M. L. M. Souza & R. R. Bastos & M. D. T. Vieira, 2022. "Calculating weighted scores from a multiple correspondence analysis solution," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4841-4854, December.
    7. Pötscher, Benedikt M. & Preinerstorfer, David, 2023. "How Reliable Are Bootstrap-Based Heteroskedasticity Robust Tests?," Econometric Theory, Cambridge University Press, vol. 39(4), pages 789-847, August.
    8. Martinetti, Davide & Geniaux, Ghislain, 2017. "Approximate likelihood estimation of spatial probit models," Regional Science and Urban Economics, Elsevier, vol. 64(C), pages 30-45.
    9. Thouraya Bouabana-Tebibel & Stuart H. Rubin, 2016. "Towards common reusable semantics," Information Systems Frontiers, Springer, vol. 18(5), pages 819-823, October.
    10. Aaron T L Lun & Hervé Pagès & Mike L Smith, 2018. "beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-15, May.
    11. Michael Braun & Paul Damien, 2016. "Scalable Rejection Sampling for Bayesian Hierarchical Models," Marketing Science, INFORMS, vol. 35(3), pages 427-444, May.
    12. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.
    13. Gangl, Katharina & Kastlunger, Barbara & Kirchler, Erich & Voracek, Martin, 2012. "Confidence in the economy in times of crisis: Social representations of experts and laypeople," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 41(5), pages 603-614.
    14. Martina Zámková & Martin Prokop, 2014. "Comparison of Consumer Behavior of Slovaks and Czechs in the Market of Organic Products by Using Correspondence Analysis," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 62(4), pages 783-795.
    15. Mengyue Wang & Xin Li & Patrick Y. K. Chau, 2021. "Leveraging Image-Processing Techniques for Empirical Research: Feasibility and Reliability in Online Shopping Context," Information Systems Frontiers, Springer, vol. 23(3), pages 607-626, June.
    16. C. J. Torrecilla-Salinas & O. Troyer & M. J. Escalona & M. Mejías, 2019. "A Delphi-based expert judgment method applied to the validation of a mature Agile framework for Web development projects," Information Technology and Management, Springer, vol. 20(1), pages 9-40, March.
    17. Chao Wu & Xiaofang Guo & Jun Zhao & Quan Lv & Hongbin Li & Edward B. McNeil & Virasakdi Chongsuvivatwong & Hongning Zhou, 2017. "Behaviors Related to Mosquito-Borne Diseases among Different Ethnic Minority Groups along the China-Laos Border Areas," IJERPH, MDPI, vol. 14(10), pages 1-11, October.
    18. Marchese, Scott & Diao, Guoqing, 2018. "Joint regression analysis of mixed-type outcome data via efficient scores," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 156-170.
    19. Márquez, Laura Andreina Matos & Rezende, Eva Caroline Nunes & Machado, Karine Borges & Nascimento, Emilly Layne Martins do & Castro, Joana D'arc Bardella & Nabout, João Carlos, 2023. "Trends in valuation approaches for cultural ecosystem services: A systematic literature review," Ecosystem Services, Elsevier, vol. 64(C).
    20. Øystein Sørensen & Anders M. Fjell & Kristine B. Walhovd, 2023. "Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 456-486, June.
