IDEAS home Printed from https://ideas.repec.org/a/mup/actaun/actaun_2013061040973.html

Data pre-processing for web log mining: Case study of commercial bank website usage analysis

Author

Listed:
  • Jozef Kapusta

    (Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia)

  • Anna Pilková

    (Department of Strategy and Entrepreneurship, Comenius University in Bratislava, Šafárikovo nám. 6, 818 06 Bratislava, Slovakia)

  • Michal Munk

    (Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia)

  • Peter Švec

    (Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia)

Abstract

Data cleaning, integration, reduction and conversion methods are used at the pre-processing stage of data analysis. These techniques improve the overall quality of the patterns mined. This paper describes the use of standard pre-processing methods to prepare data from a commercial bank website, in the form of a log file obtained from the web server. Data cleaning, although usually the simplest step of data pre-processing, is non-trivial here because the analysed content is highly specific. We had to deal with frequent changes to the content and even to the structure of the website. Regular changes in the structure make it impossible to use a sitemap. We present approaches to this problem: we were able to create the sitemap dynamically, based solely on the content of the log file. In this case study we also examined only one part of the website, rather than performing the standard analysis of an entire website, because for security reasons we did not have access to all log files. As a result, traditional practices had to be adapted to this special case. Analysing only a small fraction of the website resulted in short session times for regular visitors. We were not able to use the recommended methods to determine the optimal session time. We therefore propose new methods, based on outlier identification, for improving the accuracy of the session length.
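The idea of deriving the session time from outliers can be illustrated with a minimal sketch. The abstract does not say which outlier rule the authors apply, so the code below assumes Tukey's upper fence (Q3 + 1.5 × IQR) on the gaps between consecutive requests of the same visitor. The record layout `(visitor, timestamp)`, the function names and the sample data are hypothetical and only serve the illustration; this is not the authors' implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import quantiles


def gaps_by_visitor(records):
    """Return the gaps (in seconds) between consecutive requests of each visitor."""
    by_visitor = defaultdict(list)
    for visitor, ts in records:
        by_visitor[visitor].append(ts)
    gaps = []
    for stamps in by_visitor.values():
        stamps.sort()
        gaps.extend((b - a).total_seconds() for a, b in zip(stamps, stamps[1:]))
    return gaps


def outlier_threshold(gaps, k=1.5):
    """Assumed rule: upper Tukey fence Q3 + k*IQR; larger gaps count as outliers."""
    q1, _, q3 = quantiles(gaps, n=4)
    return q3 + k * (q3 - q1)


def split_sessions(records, threshold):
    """Number the sessions of each visitor, starting a new one after an outlying gap."""
    sessions = []
    last_seen = {}
    session_no = defaultdict(int)
    for visitor, ts in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(visitor)
        if prev is not None and (ts - prev).total_seconds() > threshold:
            session_no[visitor] += 1
        last_seen[visitor] = ts
        sessions.append((visitor, session_no[visitor], ts))
    return sessions


# Hypothetical sample: one visitor, six requests, a long pause, three more requests.
start = datetime(2013, 1, 1, 9, 0, 0)
offsets = [0, 10, 25, 40, 60, 75, 3600, 3615, 3630]
records = [("10.0.0.1", start + timedelta(seconds=s)) for s in offsets]

threshold = outlier_threshold(gaps_by_visitor(records))
sessions = split_sessions(records, threshold)
session_ids = [sid for _, sid, _ in sessions]
```

In this sample only the long pause exceeds the fence, so the requests split into two sessions. Because the cut-off is computed from the observed gaps, it adapts to the short visits that arise when only part of a website is logged, rather than relying on a fixed timeout.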

Suggested Citation

  • Jozef Kapusta & Anna Pilková & Michal Munk & Peter Švec, 2013. "Data pre-processing for web log mining: Case study of commercial bank website usage analysis," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 61(4), pages 973-979.
  • Handle: RePEc:mup:actaun:actaun_2013061040973
    DOI: 10.11118/actaun201361040973

    Download full text from publisher

    File URL: http://acta.mendelu.cz/doi/10.11118/actaun201361040973.html
    Download Restriction: free of charge

    File URL: http://acta.mendelu.cz/doi/10.11118/actaun201361040973.pdf
    Download Restriction: free of charge


    References listed on IDEAS

    1. Stephanou, Constantinos, 2010. "Rethinking market discipline in banking : lessons from the financial crisis," Policy Research Working Paper Series 5227, The World Bank.
    2. Jiří Fejfar & Jiří Šťastný, 2011. "Time series clustering in large data sets," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 59(2), pages 75-80.

    Citations



    Cited by:

    1. Michal Munk & Anna Pilkova & Lubomir Benko & Petra Blažeková, 2017. "Pillar 3: market discipline of the key stakeholders in CEE commercial bank and turbulent times," Journal of Business Economics and Management, Taylor & Francis Journals, vol. 18(5), pages 954-973, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Apanard P. Prabha & Clas Wihlborg & Thomas D. Willett, 2012. "Market Discipline for Financial Institutions and Markets for Information," Chapters, in: James R. Barth & Chen Lin & Clas Wihlborg (ed.), Research Handbook on International Banking and Governance, chapter 13, Edward Elgar Publishing.
    2. Joohyung Ha, 2021. "Bank accounting conservatism and bank loan quality," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 48(3-4), pages 498-532, March.
    3. MITOI Elena & ACHIM Luminita & DESPA Madalin & TURLEA Codrut, 2020. "IFRS 9 and the Interaction with Basel III Regulation Pillars," Annals of Faculty of Economics, University of Oradea, Faculty of Economics, vol. 1(2), pages 213-222, December.
    4. Gaëtan Le Quang, 2019. "Discretionary loan loss provisions and market discipline," Economics Bulletin, AccessEcon, vol. 39(4), pages 2931-2941.
    5. Martin Drlík & Anna Pilková & Michal Munk & Peter Švec, 2013. "Modelling of domestic and foreign visitors' behaviour at commercial bank website during the recent financial crisis," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 61(7), pages 2065-2070.
    6. Andreas G.F. Hoepner & John O.S. Wilson, 2012. "Social, Environmental, Ethical and Trust (SEET) Issues in Banking: An Overview," Chapters, in: James R. Barth & Chen Lin & Clas Wihlborg (ed.), Research Handbook on International Banking and Governance, chapter 24, Edward Elgar Publishing.
    7. Barbara Casu & Claudia Girardone & Philip Molyneux, 2012. "Is There a Conflict between Competition and Financial Stability?," Chapters, in: James R. Barth & Chen Lin & Clas Wihlborg (ed.), Research Handbook on International Banking and Governance, chapter 3, Edward Elgar Publishing.
    8. Malgorzata Olszak & Mateusz Pipien & Sylwia Roszkowska & Iwona Kowalska, 2014. "The effects of capital on bank lending in large EU banks – the role of procyclicality, income smoothing, regulations and supervision," Faculty of Management Working Paper Series 52014, University of Warsaw, Faculty of Management.
    9. José Alves Dantas & Otávio Ribeiro de Medeiros & Paulo Roberto Barbosa Lustosa, 2013. "The Role of economic variables and credit portfolio attributes for estimating discretionary loan loss provisions in Brazilian banks," Brazilian Business Review, Fucape Business School, vol. 10(4), pages 65-90, October.
    10. Delis, Manthos D. & Hasan, Iftekhar & Iosifidi, Maria & Li, Lingxiang, 2018. "Accounting quality in banking: The role of regulatory interventions," Journal of Banking & Finance, Elsevier, vol. 97(C), pages 297-317.
    11. Hou, Xiaohui & Gao, Zhixian & Wang, Qing, 2016. "Internet finance development and banking market discipline: Evidence from China," Journal of Financial Stability, Elsevier, vol. 22(C), pages 88-100.
    12. Tito Cordella & Giovanni Dell’Ariccia & Robert Marquez, 2018. "Government Guarantees, Transparency, and Bank Risk Taking," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 66(1), pages 116-143, March.
    13. Bushman, Robert M., 2014. "Thoughts on financial accounting and the banking industry," Journal of Accounting and Economics, Elsevier, vol. 58(2), pages 384-395.
    14. Mies, Michael, 2024. "Bank opacity, systemic risk and financial stability," Journal of Financial Stability, Elsevier, vol. 70(C).
    15. Kim, Jinyong & Kim, Mingook & Lee, Jeong Hwan, 2019. "The effect of TARP on loan loss provisions and bank transparency," Journal of Banking & Finance, Elsevier, vol. 102(C), pages 79-99.
    16. Jiří Fejfar & Jiří Šťastný & Martin Pokorný & Jiří Balej & Petr Zach, 2013. "Analysis of sound data streamed over the network," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 61(7), pages 2105-2110.
    17. Edoardo Martino, 2022. "Getting bank governance right," Journal of Banking Regulation, Palgrave Macmillan, vol. 23(3), pages 302-321, September.
    18. Hong Liu & Phil Molyneux & John O. S. Wilson, 2013. "Competition And Stability In European Banking: A Regional Analysis," Manchester School, University of Manchester, vol. 81(2), pages 176-201, March.
    19. Di Fabio, Costanza & Ramassa, Paola & Quagli, Alberto, 2021. "Income smoothing in European banks: The contrasting effects of monitoring mechanisms," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 43(C).
    20. Elfers, Ferdinand & Koenraadt, Jeroen, 2022. "What you don’t know won’t hurt you: Market monitoring and bank supervisors’ preference for private information," Journal of Banking & Finance, Elsevier, vol. 143(C).
