
A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge

Authors

  • Qi Liu

    (Xi’an Jiaotong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

  • Gengzhong Feng

    (Xi’an Jiaotong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

  • Nengmin Wang

    (Xi’an Jiaotong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

  • Giri Kumar Tayi

    (SUNY at Albany)

Abstract

Discovering knowledge from data means finding useful patterns in data; this process has created both opportunities and challenges for businesses in the big data era. Meanwhile, improving the quality of the discovered knowledge is important for making correct decisions in an unpredictable environment. Various models have been developed in the past; however, few have used both data quality and prior knowledge to control the quality of the discovery process and its results. In this paper, a multi-objective model of knowledge discovery in databases is developed, which aids the discovery process by utilizing prior process knowledge and different measures of data quality. To illustrate the model, association rule mining is formulated as a multi-objective problem, rather than a single-objective one, that takes into account data quality measures and prior process knowledge. Measures such as confidence, support, comprehensibility and interestingness are used. A Pareto-based integrated multi-objective Artificial Bee Colony (IMOABC) algorithm is developed to solve the problem. Using well-known, publicly available databases, experiments are carried out to compare the performance of IMOABC with that of the NSGA-II, MOPSO and Apriori algorithms. The computational results show that IMOABC outperforms NSGA-II, MOPSO and Apriori on different measures, and that it can easily be tailored to user requirements while still generating high-quality association rules.
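
The abstract treats association rule mining as a search over rules X → Y scored on several objectives at once, keeping the Pareto-optimal ones. The Python sketch below illustrates only that scoring-and-filtering idea; it is not the authors' IMOABC algorithm and includes none of the paper's data-quality or prior-knowledge terms. The four formulas are assumptions: they follow one common set of definitions from the multi-objective rule-mining literature, and the paper may define them differently. The function names `count`, `objectives`, `dominates` and `pareto_front` are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent X, consequent Y) for the rule X -> Y
Scores = Tuple[float, float, float, float]


def count(itemset: Itemset, transactions: Sequence[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def objectives(rule: Rule, transactions: Sequence[Itemset]) -> Scores:
    """Return (support, confidence, comprehensibility, interestingness), all to be maximized.

    Assumed definitions (a common choice, not necessarily the paper's), where
    c(S) is the number of transactions containing S and N the number of transactions:
      support           = c(X u Y) / N
      confidence        = c(X u Y) / c(X)
      comprehensibility = log(1 + |Y|) / log(1 + |X u Y|)   (|.| = number of items)
      interestingness   = c(X u Y)/c(X) * c(X u Y)/c(Y) * (1 - c(X u Y)/N)
    """
    x, y = rule
    n = len(transactions)
    cx, cy, cxy = count(x, transactions), count(y, transactions), count(x | y, transactions)
    support = cxy / n
    confidence = cxy / cx if cx else 0.0
    # Shorter rules (fewer antecedent items) score higher, i.e. are easier to read.
    comprehensibility = math.log(1 + len(y)) / math.log(1 + len(x | y))
    interestingness = (cxy / cx) * (cxy / cy) * (1 - cxy / n) if cx and cy else 0.0
    return (support, confidence, comprehensibility, interestingness)


def dominates(a: Scores, b: Scores) -> bool:
    """Pareto dominance for maximization: a is no worse everywhere and strictly better somewhere."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))


def pareto_front(rules: Sequence[Rule], transactions: Sequence[Itemset]) -> List[Rule]:
    """Keep the rules whose score vectors are not dominated by any other rule's."""
    scored = [(r, objectives(r, transactions)) for r in rules]
    return [r for r, s in scored if not any(dominates(o, s) for _, o in scored)]


# Toy data (hypothetical): five transactions over the items a, b, c.
transactions = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("bc"), frozenset("abc")]
r1 = (frozenset("a"), frozenset("b"))   # a -> b
r2 = (frozenset("a"), frozenset("c"))   # a -> c
r3 = (frozenset("ab"), frozenset("c"))  # a, b -> c
front = pareto_front([r1, r2, r3], transactions)
print(front)
```

On this toy data, a → c and a → b score identically, and both beat {a, b} → c on all four objectives, so only the two single-item rules survive. A metaheuristic such as the one the abstract describes would generate candidate rules rather than enumerate them, but it would still need a dominance test of this kind to rank candidates.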

Suggested Citation

  • Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
  • Handle: RePEc:spr:infosf:v:20:y:2018:i:2:d:10.1007_s10796-016-9690-6
    DOI: 10.1007/s10796-016-9690-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-016-9690-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. David Corne & Clarisse Dhaenens & Laetitia Jourdan, 2012. "Synergies between operations research and data mining: The emerging use of multi-objective approaches," European Journal of Operational Research, Elsevier, vol. 221(3), pages 469-479.
    2. Riyaz Sikora & Selwyn Piramuthu, 2007. "Framework for efficient feature selection in genetic algorithm based data mining," European Journal of Operational Research, Elsevier, vol. 180(2), pages 723-737, July.
    3. Ian Davidson & Giri Tayi, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    4. Jinwook Lee & András Prékopa, 2013. "Properties and calculation of multivariate risk measures: MVaR and MCVaR," Annals of Operations Research, Springer, vol. 211(1), pages 225-254, December.
    5. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2004. "Assessing Data Quality for Information Products: Impact of Selection, Projection, and Cartesian Product," Management Science, INFORMS, vol. 50(7), pages 967-982, July.
    6. Tomo Popovic & Mladen Kezunovic & Bozo Krstajic, 2015. "Smart grid data analytics for digital protective relay event recordings," Information Systems Frontiers, Springer, vol. 17(3), pages 591-600, June.
    7. Atanu Lahiri & Debabrata Dey, 2013. "Effects of Piracy on Quality of Information Goods," Management Science, INFORMS, vol. 59(1), pages 245-264, June.
    8. Qiang Yang & Xindong Wu, 2006. "10 Challenging Problems In Data Mining Research," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 5(04), pages 597-604.
    9. César Guerra-García & Ismael Caballero & Mario Piattini, 2013. "Capturing data quality requirements for web applications by means of DQ_WebRE," Information Systems Frontiers, Springer, vol. 15(3), pages 433-445, July.
    10. B. de la Iglesia & G. Richards & M.S. Philpott & V.J. Rayward-Smith, 2006. "The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification," European Journal of Operational Research, Elsevier, vol. 169(3), pages 898-917, March.
    11. W.Y. Szeto & Yongzhong Wu & Sin C. Ho, 2011. "An artificial bee colony algorithm for the capacitated vehicle routing problem," European Journal of Operational Research, Elsevier, vol. 215(1), pages 126-135, November.
    12. Naeem Khalid Janjua & Farookh Khadeer Hussain & Omar Khadeer Hussain, 2013. "Semantic information and knowledge integration through argumentative reasoning to support intelligent decision making," Information Systems Frontiers, Springer, vol. 15(2), pages 167-192, April.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Yalcin, Ahmet Selcuk & Kilic, Huseyin Selcuk & Delen, Dursun, 2022. "The use of multi-criteria decision-making methods in business analytics: A comprehensive literature review," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    2. Manuela Svoboda, 2022. "Evaluation of Motivation, Expectation, and Present Situation in 3rd Year Undergraduate Students of German Language and Literature at the University of Rijeka, Croatia," European Journal of Education Articles, Revistia Research and Publishing, vol. 5, ejed_v5_i.
    3. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    4. Lu, Jizhou & Feng, Gengzhong & Shum, Stephen & Lai, Kin Keung, 2021. "On the value of information sharing in the presence of information errors," European Journal of Operational Research, Elsevier, vol. 294(3), pages 1139-1152.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, online-first version (year and volume not assigned), pages 1-16.
    2. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    3. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    4. Van Nguyen, Truong & Zhang, Jie & Zhou, Li & Meng, Meng & He, Yong, 2020. "A data-driven optimization of large-scale dry port location using the hybrid approach of data mining and complex network theory," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 134(C).
    5. Clarisse Dhaenens & Laetitia Jourdan, 2022. "Metaheuristics for data mining: survey and opportunities for big data," Annals of Operations Research, Springer, vol. 314(1), pages 117-140, July.
    6. James Fan & Christopher Griffin, 2014. "Optimal Digital Product Maintenance with a Continuous Revenue Stream," Papers 1412.8624, arXiv.org, revised Feb 2017.
    7. Can Sun & Yonghua Ji & Xianjun Geng, 2023. "Which Enemy to Dance with? A New Role of Software Piracy in Influencing Antipiracy Strategies," Information Systems Research, INFORMS, vol. 34(4), pages 1711-1727, December.
    8. De Cnudde, Sofie & Martens, David & Evgeniou, Theodoros & Provost, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Business and Economics.
    9. Raeesi, Ramin & Sahebjamnia, Navid & Mansouri, S. Afshin, 2023. "The synergistic effect of operational research and big data analytics in greening container terminal operations: A review and future directions," European Journal of Operational Research, Elsevier, vol. 310(3), pages 943-973.
    10. Dan Wu & Guofang Nan & Minqiang Li, 2020. "Optimal Piracy Control: Should a Firm Implement Digital Rights Management?," Information Systems Frontiers, Springer, vol. 22(4), pages 947-960, August.
    11. Liao, Jui-Jung & Shih, Ching-Hui & Chen, Tai-Feng & Hsu, Ming-Fu, 2014. "An ensemble-based model for two-class imbalanced financial problem," Economic Modelling, Elsevier, vol. 37(C), pages 175-183.
    12. Marcelo Brutti Righi & Paulo Sergio Ceretta, 2015. "Shortfall Deviation Risk: An alternative to risk measurement," Papers 1501.02007, arXiv.org, revised May 2016.
    13. Hu, Shu & Fu, Ke & Wu, Tong, 2021. "The role of consumer behavior and power structures in coping with shoddy goods," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 155(C).
    14. Sordo, Miguel A., 2016. "A multivariate extension of the increasing convex order to compare risks," Insurance: Mathematics and Economics, Elsevier, vol. 68(C), pages 224-230.
    15. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.
    16. Claudia Diamantini & Paolo Lo Giudice & Domenico Potena & Emanuele Storti & Domenico Ursino, 2021. "An Approach to Extracting Topic-guided Views from the Sources of a Data Lake," Information Systems Frontiers, Springer, vol. 23(1), pages 243-262, February.
    17. Peng, Shuxia & Li, Bo & Wu, Shuang, 2023. "Presence of piracy and legal protection: Decisions in the digital goods market under different contracts," European Journal of Operational Research, Elsevier, vol. 309(2), pages 578-596.
    18. Guo, Jiaqi & Long, Jiancheng & Xu, Xiaoming & Yu, Miao & Yuan, Kai, 2022. "The vehicle routing problem of intercity ride-sharing between two cities," Transportation Research Part B: Methodological, Elsevier, vol. 158(C), pages 113-139.
    19. Terrence August & Duy Dao & Hyoduk Shin, 2015. "Optimal Timing of Sequential Distribution: The Impact of Congestion Externalities and Day-and-Date Strategies," Marketing Science, INFORMS, vol. 34(5), pages 755-774, September.
    20. Prékopa, András & Lee, Jinwook, 2018. "Risk tomography," European Journal of Operational Research, Elsevier, vol. 265(1), pages 149-168.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.