IDEAS home Printed from https://ideas.repec.org/a/bla/jorssc/v70y2021i3p558-578.html

Clustering based on Kolmogorov–Smirnov statistic with application to bank card transaction data

Author

Listed:
  • Yingqiu Zhu
  • Qiong Deng
  • Danyang Huang
  • Bingyi Jing
  • Bo Zhang

Abstract

Rapid developments in third‐party online payment platforms now make it possible to record massive bank card transaction data. Clustering on such transaction data is of great importance for the analysis of merchant behaviours. However, traditional methods based on generated features inevitably lead to much loss of information. To make better use of bank card transaction data, this study investigates the possibility of using the empirical cumulative distribution of transaction amounts. As the distance between two merchants can be measured using the two‐sample Kolmogorov–Smirnov test statistic, we propose the Kolmogorov–Smirnov K‐means clustering approach based on this distance measure. An approximation step is conducted to ensure the feasibility of the proposed method even for large‐scale transaction data, and the associated theoretical properties are investigated. Both simulations and an empirical study demonstrate that our method outperforms feature‐based methods and is computationally efficient for large‐scale data sets.
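As a rough illustration of the idea in the abstract, the sketch below computes the two-sample Kolmogorov–Smirnov statistic between the empirical distributions of two merchants' transaction amounts and uses it as the distance in a K-means-style loop. It is not the authors' algorithm: the abstract does not specify the centroid update or the approximation step, so the names `ks_distance` and `ks_kmeans`, the random initialisation, and the choice of a pooled sample as each cluster's centroid are all assumptions made only for this example.

```python
import numpy as np

def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max over t of |F_x(t) - F_y(t)|."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # The maximum gap between two step functions is attained at an observed value.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))

def ks_kmeans(samples, k, n_iter=20, seed=0):
    """K-means-style clustering of samples (one array of amounts per merchant).

    ASSUMPTION: a centroid is represented by the pooled transactions of its
    members; the paper's actual update and approximation step may differ.
    """
    rng = np.random.default_rng(seed)
    # ASSUMPTION: initialise with k randomly chosen merchants.
    init = rng.choice(len(samples), size=k, replace=False)
    centroids = [np.asarray(samples[i], dtype=float) for i in init]
    labels = np.full(len(samples), -1)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the KS distance.
        new_labels = np.array([
            int(np.argmin([ks_distance(s, c) for c in centroids]))
            for s in samples
        ])
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Update step: pool the members' transactions (old centroid kept if empty).
        for j in range(k):
            members = [np.asarray(samples[i], dtype=float)
                       for i in np.flatnonzero(labels == j)]
            if members:
                centroids[j] = np.concatenate(members)
    return labels
```

Here `samples` is a list with one array of transaction amounts per merchant, and the returned array gives a cluster label for each merchant. Because the distance depends only on the empirical distribution functions, no hand-crafted features (means, counts, quantiles) are needed, which is the point the abstract makes against feature-based methods.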

Suggested Citation

  • Yingqiu Zhu & Qiong Deng & Danyang Huang & Bingyi Jing & Bo Zhang, 2021. "Clustering based on Kolmogorov–Smirnov statistic with application to bank card transaction data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 558-578, June.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:3:p:558-578
    DOI: 10.1111/rssc.12471

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12471
    Download Restriction: no


    References listed on IDEAS

    1. Holger Dannenberg & Dirk Zupancic, 2009. "Customer segmentation," Springer Books, in: Excellence in Sales, chapter 7, pages 85-93, Springer.
    2. Mahmood Alborzi & Mohammad Khanbabaei, 2016. "Using data mining and neural networks techniques to propose a new hybrid customer behaviour analysis and credit scoring model in banking services based on a developed RFM analysis method," International Journal of Business Information Systems, Inderscience Enterprises Ltd, vol. 23(1), pages 1-22.
    3. McCarty, John A. & Hastak, Manoj, 2007. "Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression," Journal of Business Research, Elsevier, vol. 60(6), pages 656-662, June.
    4. Holger Dannenberg & Dirk Zupancic, 2009. "Excellence in Sales," Springer Books, Springer, number 978-3-8349-8782-2, March.
    5. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    6. Robert Tibshirani & Guenther Walther & Trevor Hastie, 2001. "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 411-423.
    7. Holger Dannenberg & Dirk Zupancic, 2009. "Definition of sales process goals for customer segments," Springer Books, in: Excellence in Sales, chapter 8, pages 95-100, Springer.
    8. Peppard, Joe, 2000. "Customer Relationship Management (CRM) in financial services," European Management Journal, Elsevier, vol. 18(3), pages 312-327, June.
    9. Jan Roelf Bult & Tom Wansbeek, 1995. "Optimal Selection for Direct Mail," Marketing Science, INFORMS, vol. 14(4), pages 378-394.

    Citations


    Cited by:

    1. Thao Nguyen‐Trang & Tai Vo‐Van & Ha Che‐Ngoc, 2024. "An efficient automatic clustering algorithm for probability density functions and its applications in surface material classification," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 78(1), pages 244-260, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Danijel Bratina & Armand Faganel, 2023. "Using Supervised Machine Learning Methods for RFM Segmentation: A Casino Direct Marketing Communication Case," Tržište/Market, Faculty of Economics and Business, University of Zagreb, vol. 35(1), pages 7-22.
    2. Philippe Baecke & Dirk Van Den Poel, 2010. "Improving Purchasing Behavior Predictions By Data Augmentation With Situational Variables," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 9(06), pages 853-872.
    3. Keren Naa Abeka Arthur & Richard Owen, 2019. "A Micro-ethnographic Study of Big Data-Based Innovation in the Financial Services Sector: Governance, Ethics and Organisational Practices," Journal of Business Ethics, Springer, vol. 160(2), pages 363-375, December.
    4. Sipan Aslan & Ceylan Yozgatligil & Cem Iyigun, 2018. "Temporal clustering of time series via threshold autoregressive models: application to commodity prices," Annals of Operations Research, Springer, vol. 260(1), pages 51-77, January.
    5. Hache, Emmanuel & Leboullenger, Déborah & Mignon, Valérie, 2017. "Beyond average energy consumption in the French residential housing market: A household classification approach," Energy Policy, Elsevier, vol. 107(C), pages 82-95.
    6. Thiemo Fetzer & Samuel Marden, 2017. "Take What You Can: Property Rights, Contestability and Conflict," Economic Journal, Royal Economic Society, vol. 0(601), pages 757-783, May.
    7. Francesco Trebbi & Eric Weese, 2019. "Insurgency and Small Wars: Estimation of Unobserved Coalition Structures," Econometrica, Econometric Society, vol. 87(2), pages 463-496, March.
    8. Sadaf Ajmal & Sana-Ur-Rehman, 2019. "An Implementation of Customer Relationship Management and Customer satisfaction in Banking Sector of Quetta, Balochistan," International Business Research, Canadian Center of Science and Education, vol. 12(10), pages 26-37, October.
    9. Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," Working Papers 2022-2, Princeton University, Economics Department.
    10. Khanh Duong, 2024. "Is meritocracy just? New evidence from Boolean analysis and Machine learning," Journal of Computational Social Science, Springer, vol. 7(2), pages 1795-1821, October.
    11. Thomas, Suman Ann & Feng, Shanfei & Krishnan, Trichy V., 2015. "To retain? To upgrade? The effects of direct mail on regular donation behavior," International Journal of Research in Marketing, Elsevier, vol. 32(1), pages 48-63.
    12. Hayk Manucharyan, 2020. "How do managers actually choose suppliers? Evidence from revealed preference data," Working Papers 2020-12, Faculty of Economic Sciences, University of Warsaw.
    13. Batool, Fatima & Hennig, Christian, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    14. Chen, Yanhong & Liu, Luning & Zheng, Dequan & Li, Bin, 2023. "Estimating travellers’ value when purchasing auxiliary services in the airline industry based on the RFM model," Journal of Retailing and Consumer Services, Elsevier, vol. 74(C).
    15. Alexandra-Nicoleta Ciucu-Durnoi & Camelia Delcea, 2024. "Ecosystem Degradation in Romania: Exploring the Core Drivers," Stats, MDPI, vol. 7(1), pages 1-16, January.
    16. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    17. Audrey Mauguen & Emily C. Zabor & Nancy E. Thomas & Marianne Berwick & Venkatraman E. Seshan & Colin B. Begg, 2017. "Defining Cancer Subtypes With Distinctive Etiologic Profiles: An Application to the Epidemiology of Melanoma," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 54-63, January.
    18. Noh, Heeyong & Song, Young-Keun & Lee, Sungjoo, 2016. "Identifying emerging core technologies for the future: Case study of patents published by leading telecommunication organizations," Telecommunications Policy, Elsevier, vol. 40(10), pages 956-970.
    19. Dost, Florian & Wilken, Robert & Eisenbeiss, Maik & Skiera, Bernd, 2014. "On the Edge of Buying: A Targeting Approach for Indecisive Buyers Based on Willingness-to-Pay Ranges," Journal of Retailing, Elsevier, vol. 90(3), pages 393-407.
    20. Jie Sun & Jie Li & Hamido Fujita & Wenguo Ai, 2023. "Multiclass financial distress prediction based on one‐versus‐one decomposition integrated with improved decision‐directed acyclic graph," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(5), pages 1167-1186, August.
