
Real-time user clickstream behavior analysis based on apache storm streaming

Author

Listed:
  • Gautam Pal

    (The University of Liverpool)

  • Katie Atkinson

    (The University of Liverpool)

  • Gangmin Li

    (University of Bedfordshire)

Abstract

This paper presents an approach to analyzing consumers’ e-commerce site usage and browsing motifs through pattern mining and surfing behavior. The user-generated clickstream is first stored in a client-side browser. We build an ingestion pipeline to capture the high-velocity data stream from the client-side browser through Apache Storm, Kafka, and Cassandra. Given the consumer’s usage pattern, we uncover the user’s browsing intent through n-gram and collocation methods. An innovative clustering technique is constructed through the Expectation-Maximization algorithm with a Gaussian Mixture Model. We discuss a framework for predicting a user’s clicks based on past click sequences through higher-order Markov chains. We developed our model on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup for an optimized Storm topology and enhanced Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated with several evaluations in a Microsoft Azure HDInsight Apache Storm deployment and in the Datastax distribution of Cassandra. The paper demonstrates that the proposed techniques help with user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
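
The click-prediction idea mentioned in the abstract can be illustrated with a minimal sketch of a higher-order Markov chain. This is not the authors' implementation: the function names `fit_markov` and `predict_next`, the page labels, and the toy sessions are all assumptions made for illustration, and the paper's actual model may use smoothing, fallback to lower orders, or a distributed setup that this sketch omits.

```python
from collections import Counter, defaultdict

def fit_markov(sessions, order=2):
    """Count next-click frequencies for each context of `order` preceding clicks.

    `sessions` is a list of click sequences (hypothetical page labels).
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(order, len(session)):
            context = tuple(session[i - order:i])
            counts[context][session[i]] += 1
    return counts

def predict_next(counts, recent_clicks, order=2):
    """Return the most frequent next click after the last `order` clicks,
    or None if that context was never observed."""
    context = tuple(recent_clicks[-order:])
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

# Toy sessions, invented for illustration only.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "reviews"],
    ["home", "search", "product", "cart", "checkout"],
]
model = fit_markov(sessions, order=2)
print(predict_next(model, ["home", "search"]))
print(predict_next(model, ["search", "product"]))
```

With `order=2`, each prediction conditions on the two most recent clicks rather than only the last one, which is what distinguishes a higher-order chain from a first-order one. An unseen context returns `None` here; a practical system would more likely back off to a shorter context.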

Suggested Citation

  • Gautam Pal & Katie Atkinson & Gangmin Li, 2023. "Real-time user clickstream behavior analysis based on apache storm streaming," Electronic Commerce Research, Springer, vol. 23(3), pages 1829-1859, September.
  • Handle: RePEc:spr:elcore:v:23:y:2023:i:3:d:10.1007_s10660-021-09518-4
    DOI: 10.1007/s10660-021-09518-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10660-021-09518-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cristian Preda & Quentin Grimonprez & Vincent Vandewalle, 2021. "Categorical Functional Data Analysis. The cfda R Package," Mathematics, MDPI, vol. 9(23), pages 1-31, November.
    2. Pavlos Delias & Vassilios Zoumpoulidis & Ioannis Kazanidis, 2019. "Visualizing and exploring event databases: a methodology to benefit from process analytics," Operational Research, Springer, vol. 19(4), pages 887-908, December.
