
Analysis of Web Visit Histories, Part II: Predicting Navigation by Nested STUMP Regression Trees

Authors

  • Roberta Siciliano (University of Naples Federico II)
  • Antonio D’Ambrosio (University of Naples Federico II)
  • Massimo Aria (University of Naples Federico II)
  • Sonia Amodio (Leiden University Medical Center)

Abstract

This paper is Part II of a contribution to the analysis of web visit histories through a new methodological framework for web usage-structure mining based on association rules theory. The aim is to explore, through a tree structure, the sequences of direct rules (i.e., paths) that characterize a web navigator who stays longer on a web page, compared with the paths characterizing navigators who leave the web earlier. A novel tree-based structure is introduced to account for the fact that the learning sample changes click by click, because navigators who drop off from the web after any click are left out. The response variable at each time point is the remaining number of clicks before leaving the web. The split is induced by the predictors that describe the preferred web sections. The methodology results in a Nested Stump Regression Tree, that is, a hierarchy of stump trees, where a stump is a tree with only one split or, equivalently, with only two terminal nodes. Suitable properties are outlined. As in the first part of the contribution, a methodological description is provided by considering a web portal with a fixed set of web sections, namely a data set from the UCI Machine Learning Repository.
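
To make the idea in the abstract concrete, the following is a minimal Python sketch of a sequence of stumps fitted click by click. It is not the authors' algorithm. The function names (`sse`, `fit_stump`, `nested_stumps`), the toy sessions, the choice of the section visited at the current click as the only predictor, and the one-section-versus-rest split rule are all assumptions made for illustration. The abstract only states that the learning sample shrinks after each click, that the response is the remaining number of clicks, and that each stump has a single split.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Find the best one-section-versus-rest split (assumed split rule).

    xs: categorical predictor values (web sections), ys: numeric responses.
    Returns None when no split yields two non-empty child nodes.
    """
    best = None
    for section in sorted(set(xs)):
        inside = [y for x, y in zip(xs, ys) if x == section]
        outside = [y for x, y in zip(xs, ys) if x != section]
        if not inside or not outside:
            continue
        cost = sse(inside) + sse(outside)
        if best is None or cost < best["sse"]:
            best = {
                "section": section,
                "sse": cost,
                "mean_in": mean(inside),    # predicted remaining clicks
                "mean_out": mean(outside),  # for each terminal node
            }
    return best


def nested_stumps(sessions, max_clicks):
    """Fit one stump per click on the navigators still active at that click."""
    stumps = []
    for t in range(1, max_clicks + 1):
        # Learning sample at click t: sessions with at least t clicks.
        active = [s for s in sessions if len(s) >= t]
        if len(active) < 2:
            break
        xs = [s[t - 1] for s in active]    # section visited at click t
        ys = [len(s) - t for s in active]  # remaining clicks before leaving
        stump = fit_stump(xs, ys)
        if stump is None:
            break
        stumps.append({"click": t, "n": len(active), **stump})
    return stumps


# Hypothetical toy sessions: each list is the ordered sections one visitor saw.
sessions = [
    ["news", "sports", "news", "weather"],
    ["news", "sports"],
    ["weather"],
    ["sports", "news", "news"],
    ["weather", "news"],
]
stumps = nested_stumps(sessions, max_clicks=3)
for stump in stumps:
    print(stump)
```

Each dictionary in `stumps` describes one level of the hierarchy: the click index, the size of the learning sample at that click, the section chosen for the split, and the mean remaining clicks in the two terminal nodes. A fuller treatment would search over all binary partitions of the sections and use more predictors, as a standard regression tree does.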

Suggested Citation

  • Roberta Siciliano & Antonio D’Ambrosio & Massimo Aria & Sonia Amodio, 2017. "Analysis of Web Visit Histories, Part II: Predicting Navigation by Nested STUMP Regression Trees," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 473-493, October.
  • Handle: RePEc:spr:jclass:v:34:y:2017:i:3:d:10.1007_s00357-017-9239-5
    DOI: 10.1007/s00357-017-9239-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-017-9239-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Antonio D’Ambrosio & Massimo Aria & Roberta Siciliano, 2012. "Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 227-258, July.
    2. Cappelli, Carmela & Mola, Francesco & Siciliano, Roberta, 2002. "A statistical approach to growing a reliable honest tree," Computational Statistics & Data Analysis, Elsevier, vol. 38(3), pages 285-299, January.
    3. Fu, Wei & Simonoff, Jeffrey S., 2015. "Unbiased regression trees for longitudinal and clustered data," Computational Statistics & Data Analysis, Elsevier, vol. 88(C), pages 53-74.
    4. Roberta Siciliano & Antonio D’Ambrosio & Massimo Aria & Sonia Amodio, 2016. "Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 298-324, July.
    5. Marjolein Fokkema & Niels Smits & Achim Zeileis & Torsten Hothorn & Henk Kelderman, 2015. "Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees," Working Papers 2015-10, Faculty of Economics and Statistics, Universität Innsbruck.
    6. Roberta Siciliano & Antonio D’Ambrosio & Massimo Aria & Sonia Amodio, 2016. "Erratum to: Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 325-325, July.
    7. Siciliano, Roberta & Mola, Francesco, 2000. "Multivariate data analysis and modeling through classification and regression trees," Computational Statistics & Data Analysis, Elsevier, vol. 32(3-4), pages 285-301, January.

    Citations


    Cited by:

    1. Carmela Iorio & Giuseppe Pandolfo & Antonio D’Ambrosio & Roberta Siciliano, 2020. "Mining big data in tourism," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(5), pages 1655-1669, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carmela Iorio & Giuseppe Pandolfo & Antonio D’Ambrosio & Roberta Siciliano, 2020. "Mining big data in tourism," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(5), pages 1655-1669, December.
    2. George Petrakos & Claudio Conversano & Gregory Farmakis & Francesco Mola & Roberta Siciliano & Photis Stavropoulos, 2004. "New ways of specifying data edits," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 167(2), pages 249-274, May.
    3. Amodio, S. & D’Ambrosio, A. & Siciliano, R., 2016. "Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach," European Journal of Operational Research, Elsevier, vol. 249(2), pages 667-676.
    4. Massimo Aria & Antonio D’Ambrosio & Carmela Iorio & Roberta Siciliano & Valentina Cozza, 2020. "Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images," Statistical Papers, Springer, vol. 61(4), pages 1645-1661, August.
    5. Ivan Miguel Pires & Faisal Hussain & Nuno M. Garcia & Eftim Zdravevski, 2020. "Improving Human Activity Monitoring by Imputation of Missing Sensory Data: Experimental Study," Future Internet, MDPI, vol. 12(9), pages 1-18, September.
    6. Mariangela Sciandra & Antonella Plaia & Vincenza Capursi, 2017. "Classification trees for multivariate ordinal response: an application to Student Evaluation Teaching," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(2), pages 641-655, March.
    7. Karolis Matikonis & Matthew Gobey, 2024. "Small Business Property Tax Reductions and Firm Productivity," Small Business Economics, Springer, vol. 62(1), pages 307-324, January.
    8. Claudio Conversano & Francesco Mola & Roberta Siciliano, 2001. "Partitioning Algorithms and Combined Model Integration for Data Mining," Computational Statistics, Springer, vol. 16(3), pages 323-339, September.
    9. Thomas Bassetti & Raul Caruso & Friedrich Schneider, 2018. "The tree of political violence: a GMERT analysis," Empirical Economics, Springer, vol. 54(2), pages 839-850, March.
    10. Raval, Devesh & Rosenbaum, Ted & Wilson, Nathan E., 2021. "How do machine learning algorithms perform in predicting hospital choices? evidence from changing environments," Journal of Health Economics, Elsevier, vol. 78(C).
    11. Piccarreta, Raffaella, 2010. "Binary trees for dissimilarity data," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1516-1524, June.
    12. Noh, Hyun Gon & Song, Moon Sup & Park, Sung Hyun, 2004. "An unbiased method for constructing multilabel classification trees," Computational Statistics & Data Analysis, Elsevier, vol. 47(1), pages 149-164, August.
    13. Claudio Conversano & Roberta Siciliano, 2009. "Incremental Tree-Based Missing Data Imputation with Lexicographic Ordering," Journal of Classification, Springer;The Classification Society, vol. 26(3), pages 361-379, December.
    14. Lukasz Struski & Marek Śmieja & Jacek Tabor, 2020. "Pointed Subspace Approach to Incomplete Data," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 42-57, April.
    15. repec:hal:journl:hal-04178278 is not listed on IDEAS
    16. Tsionas, Mike, 2022. "Efficiency estimation using probabilistic regression trees with an application to Chilean manufacturing industries," International Journal of Production Economics, Elsevier, vol. 249(C).
    17. Pier Perri & Peter Heijden, 2012. "A Property of the CHAID Partitioning Method for Dichotomous Randomized Response Data and Categorical Predictors," Journal of Classification, Springer;The Classification Society, vol. 29(1), pages 76-90, April.
    18. Steffen Nestler & Sarah Humberg, 2022. "A Lasso and a Regression Tree Mixed-Effect Model with Random Effects for the Level, the Residual Variance, and the Autocorrelation," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 506-532, June.
    19. Lee, Tzu-Haw & Shih, Yu-Shan, 2006. "Unbiased variable selection for classification trees with multivariate responses," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 659-667, November.
    20. Seung Yeoun Choi & Sean Hay Kim, 2022. "Selection of a Transparent Meta-Model Algorithm for Feasibility Analysis Stage of Energy Efficient Building Design: Clustering vs. Tree," Energies, MDPI, vol. 15(18), pages 1-25, September.
    21. Manhal Ali & Reza Salehnejad & Mohaimen Mansur, 2018. "Hospital heterogeneity: what drives the quality of health care," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 19(3), pages 385-408, April.
