
Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data

Author

Listed:
  • Gang Kou
  • Chunwei Lou

Abstract

The development of the World Wide Web and advances in digital data collection and storage technologies over the last two decades allow companies and organizations to store and share huge amounts of electronic documents. Organizing, analyzing and presenting these documents manually is hard and inefficient. Search engines help users find relevant information by presenting a list of web pages in response to queries. Helping users efficiently find the most relevant web pages in vast text collections remains a big challenge. The purpose of this study is to propose a hierarchical clustering method that combines multiple factors to identify clusters of web pages that can satisfy users’ information needs. The clusters are primarily envisioned to be used for search and navigation and potentially for some form of visualization as well. An experiment on clickstream data from a professional search engine was conducted to examine the method; the results show that the clustering method is effective and efficient in terms of both objective and subjective measures. Copyright Springer Science+Business Media, LLC 2012

Suggested Citation

  • Gang Kou & Chunwei Lou, 2012. "Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data," Annals of Operations Research, Springer, vol. 197(1), pages 123-134, August.
  • Handle: RePEc:spr:annopr:v:197:y:2012:i:1:p:123-134:10.1007/s10479-010-0704-3
    DOI: 10.1007/s10479-010-0704-3

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10479-010-0704-3
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Jongwon Lee & Heeseok Lee, 2008. "Strategic Agent Based Web System Development Methodology," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(02), pages 309-337.
    2. Sergio Alejandro Gómez & Carlos Iván Chesñevar & Guillermo Ricardo Simari, 2008. "Defeasible Reasoning In Web-Based Forms Through Argumentation," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(01), pages 71-101.
    3. Soongoo Hong & Pairin Katerattanakul & Seok Jeong Joo, 2008. "Evaluating Government Website Accessibility: A Comparative Study," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(03), pages 491-515.
    4. Raid Al-Aomar & Fikri Dweiri, 2008. "A Customer-Oriented Decision Agent For Product Selection In Web-Based Services," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(01), pages 35-52.
    5. Yong Shi, 2009. "Current Research Trend: Information Technology And Decision Making In 2008," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 8(01), pages 1-5.
    6. Yong Shi & Yi Peng & Gang Kou & Zhengxin Chen, 2005. "Classifying Credit Card Accounts For Business Intelligence And Decision Making: A Multiple-Criteria Quadratic Programming Approach," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 4(04), pages 581-599.
    7. Jia Hu & Ning Zhong, 2008. "Web Farming With Clickstream," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(02), pages 291-308.
    8. Sang-Sung Park & Kwang-Kyu Seo & Dong-Sik Jang, 2007. "Fuzzy Art-Based Image Clustering Method For Content-Based Image Retrieval," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 6(02), pages 213-233.
    9. Qingyu Zhang & Richard S. Segall, 2008. "Web Mining: A Survey Of Current Research, Techniques, And Software," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(04), pages 683-720.
    10. Wen Zhang & Taketoshi Yoshida & Xijin Tang, 2009. "Distribution Of Multi-Words In Chinese And English Documents," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 8(02), pages 249-265.
    11. Yi Peng & Gang Kou & Yong Shi & Zhengxin Chen, 2008. "A Descriptive Framework For The Field Of Data Mining And Knowledge Discovery," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(04), pages 639-682.
    12. Gang Kou & Yi Peng & Yong Shi & Morgan Wise & Weixuan Xu, 2005. "Discovering Credit Cardholders’ Behavior by Multiple Criteria Linear Programming," Annals of Operations Research, Springer, vol. 135(1), pages 261-274, March.

    Citations

    Cited by:

    1. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    2. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2017. "A multidisciplinary perspective of big data in management research," International Journal of Production Economics, Elsevier, vol. 191(C), pages 97-112.
    3. Borja Ena & Alberto Gomez & Borja Ponte & Paolo Priore & Diego Diaz, 2022. "Homogeneous grouping of non-prime steel products for online auctions: a case study," Annals of Operations Research, Springer, vol. 315(1), pages 591-621, August.
    4. Ziyun Deng & Tingqin He, 2018. "A Method for Filtering Pages by Similarity Degree based on Dynamic Programming," Future Internet, MDPI, vol. 10(12), pages 1-12, December.
    5. Idil Yavuz & Orrin Cooper, 2017. "A dynamic clustering method to improve the coherency of an ANP Supermatrix," Annals of Operations Research, Springer, vol. 254(1), pages 507-531, July.
    6. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2019. "Technology in the 21st century: New challenges and opportunities," Technological Forecasting and Social Change, Elsevier, vol. 143(C), pages 321-335.
    7. R. Sujatha & T. M. Rajalaxmi, 2016. "Hierarchical Fuzzy Hidden Markov Chain for Web Applications," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(01), pages 83-118, January.
    8. John A. Aloysius & Hartmut Hoehle & Soheil Goodarzi & Viswanath Venkatesh, 2018. "Big data initiatives in retail environments: Linking service process perceptions to shopping outcomes," Annals of Operations Research, Springer, vol. 270(1), pages 25-51, November.
    9. Roman Vavrek, 2019. "Evaluation of the Impact of Selected Weighting Methods on the Results of the TOPSIS Technique," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(06), pages 1821-1843, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrea Ko & Saira Gillani, 2020. "A Research Review and Taxonomy Development for Decision Support and Business Analytics Using Semantic Text Mining," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 97-126, January.
    2. Wikil Kwak & Yong Shi & Gang Kou, 2012. "Bankruptcy prediction for Korean firms after the 1997 financial crisis: using a multiple criteria linear programming data mining approach," Review of Quantitative Finance and Accounting, Springer, vol. 38(4), pages 441-453, May.
    3. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    4. Chun-Hao Chen & Tzung-Pei Hong & Yeong-Chyi Lee & Vincent S. Tseng, 2015. "Finding Active Membership Functions for Genetic-Fuzzy Data Mining," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 14(06), pages 1215-1242, November.
    5. Ginger Saltos & Mihaela Cocea, 2017. "An Exploration of Crime Prediction Using Data Mining on Open Data," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(05), pages 1155-1181, September.
    6. Philippe Baecke & Dirk Van Den Poel, 2010. "Improving Purchasing Behavior Predictions By Data Augmentation With Situational Variables," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 9(06), pages 853-872.
    7. Yugang Yu & Chengbin Chu & Haoxun Chen & Feng Chu, 2012. "Large scale stochastic inventory routing problems with split delivery and service level constraints," Annals of Operations Research, Springer, vol. 197(1), pages 135-158, August.
    8. Lean Yu & Shouyang Wang & Fenghua Wen & Kin Lai, 2012. "Genetic algorithm-based multi-criteria project portfolio selection," Annals of Operations Research, Springer, vol. 197(1), pages 71-86, August.
    9. Francisco Luna & David Quintana & Sandra García & Pedro Isasi, 2016. "Enhancing Financial Portfolio Robustness with an Objective Based on ϵ-Neighborhoods," Post-Print cea-01849801, HAL.
    10. Po-Lung Yu & Yen-Chu Chen, 2012. "Dynamic multiple criteria decision making in changeable spaces: from habitual domains to innovation dynamics," Annals of Operations Research, Springer, vol. 197(1), pages 201-220, August.
    11. Parishani, Maede & Rasti-Barzoki, Morteza, 2024. "CWBCM method to determine the importance of classification performance evaluation criteria in machine learning: Case studies of COVID-19, Diabetes, and Thyroid Disease," Omega, Elsevier, vol. 127(C).
    12. Jie Wu & Liang Liang, 2012. "A multiple criteria ranking method based on game cross-evaluation approach," Annals of Operations Research, Springer, vol. 197(1), pages 191-200, August.
    13. Rahime Ceylan & Hasan Koyuncu, 2016. "A New Breakpoint in Hybrid Particle Swarm-Neural Network Architecture: Individual Boundary Adjustment," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(06), pages 1313-1343, November.
    14. Maria A. S. Xavier & Fernando A. F. Ferreira & José P. Esperança, 2021. "An intuition-based evaluation framework for social credit applications," Annals of Operations Research, Springer, vol. 296(1), pages 571-590, January.
    15. Jianfeng Xu & Yuanjian Zhang & Peng Zhang & Azhar Mahmood & Yu Li & Shaheen Khatoon, 2017. "Data Mining on ICU Mortality Prediction Using Early Temporal Data: A Survey," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 117-159, January.
    16. Peng, Yi & Kou, Gang & Wang, Guoxun & Shi, Yong, 2011. "FAMCDM: A fusion approach of MCDM methods to rank multiclass classification algorithms," Omega, Elsevier, vol. 39(6), pages 677-689, December.
    17. Fenghua Wen & Xin Yang & Xu Gong & Kin Keung Lai, 2017. "Multi-Scale Volatility Feature Analysis and Prediction of Gold Price," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 205-223, January.
    18. Zhou, Kaile & Yang, Shanlin & Shao, Zhen, 2016. "Energy Internet: The business perspective," Applied Energy, Elsevier, vol. 178(C), pages 212-222.
    19. H. Tolga Kahraman & Seref Sagiroglu & Ilhami Colak, 2016. "Novel User Modeling Approaches for Personalized Learning Environments," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(03), pages 575-602, May.
    20. Lean Yu & Xinxie Li & Ling Tang & Zongyi Zhang & Gang Kou, 2015. "Social credit: a comprehensive literature review," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 1(1), pages 1-18, December.
