
Data preparation using data quality matrices for classification mining

Author

Listed:
  • Davidson, Ian
  • Tayi, Giri

Abstract

Data mining aims to find patterns in organizational databases. However, most mining techniques do not take into account knowledge of the quality of the database. In this work, we show how to incorporate into classification mining recent advances in the data quality field that view a database as the product of an imprecise manufacturing process, where the flaws/defects are captured in quality matrices. We develop a general-purpose method of incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques: whereas other approaches detect and fix errors to ensure consistency with the entire data set, our work makes use of a priori knowledge of how the data is produced/manufactured.

Suggested Citation

  • Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
  • Handle: RePEc:eee:ejores:v:197:y:2009:i:2:p:764-772

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377-2217(08)00560-2
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    2. Donald Ballou & Richard Wang & Harold Pazer & Giri Kumar Tayi, 1998. "Modeling Information Manufacturing Systems to Determine Information Product Quality," Management Science, INFORMS, vol. 44(4), pages 462-484, April.

    Citations



    Cited by:

    1. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    2. Perko, Igor, 2017. "Behaviour-based short-term invoice probability of default evaluation," European Journal of Operational Research, Elsevier, vol. 257(3), pages 1045-1054.
    3. Farnè, Matteo & Vouldis, Angelos T., 2018. "A methodology for automised outlier detection in high-dimensional datasets: an application to euro area banks' supervisory data," Working Paper Series 2171, European Central Bank.
    4. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    2. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    3. Juha-Miikka Nurmilaakso, 2014. "Coordination costs and ICT investments: an economic analysis," Netnomics, Springer, vol. 15(2), pages 57-67, September.
    4. Even, Adir & Shankaranarayanan, G. & Berger, Paul D., 2010. "Managing the Quality of Marketing Data: Cost/benefit Tradeoffs and Optimal Configuration," Journal of Interactive Marketing, Elsevier, vol. 24(3), pages 209-221.
    5. Daniel Gartner & Yiye Zhang & Rema Padman, 2018. "Cognitive workload reduction in hospital information systems," Health Care Management Science, Springer, vol. 21(2), pages 224-243, June.
    6. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.
    7. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    8. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    9. Jesse G. Wales & Alexander J. Zolan & William T. Hamilton & Alexandra M. Newman & Michael J. Wagner, 2023. "Combining simulation and optimization to derive operating policies for a concentrating solar power plant," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 119-150, March.
    10. Klein, B. D. & Rossin, D. F., 1999. "Data quality in neural network models: effect of error rate and magnitude of error on predictive accuracy," Omega, Elsevier, vol. 27(5), pages 569-582, October.
    11. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    12. Geuens, Stijn & Coussement, Kristof & De Bock, Koen W., 2018. "A framework for configuring collaborative filtering-based recommendations derived from purchase data," European Journal of Operational Research, Elsevier, vol. 265(1), pages 208-218.
    13. Saridakis, Charalampos & Katsikeas, Constantine S. & Angelidou, Sofia & Oikonomidou, Maria & Pratikakis, Polyvios, 2023. "Mining Twitter lists to extract brand-related associative information for celebrity endorsement," European Journal of Operational Research, Elsevier, vol. 311(1), pages 316-332.
    14. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    15. Filom, Siyavash & Amiri, Amir M. & Razavi, Saiedeh, 2022. "Applications of machine learning methods in port operations – A systematic literature review," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 161(C).
    16. Bonney, Maurice & Jaber, Mohamad Y., 2013. "Developing an input–output activity matrix (IOAM) for environmental and economic analysis of manufacturing systems and logistics chains," International Journal of Production Economics, Elsevier, vol. 143(2), pages 589-597.
    17. Mareček, Jakub & Richtárik, Peter & Takáč, Martin, 2017. "Matrix completion under interval uncertainty," European Journal of Operational Research, Elsevier, vol. 256(1), pages 35-43.
    18. Asunur Cezar & Srinivasan Raghunathan & Sumit Sarkar, 2020. "Adversarial Classification: Impact of Agents’ Faking Cost on Firms and Agents," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2789-2807, December.
    19. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    20. André Marie Mbakop & Joseph Voufo & Florent Biyeme & Louise Angèle Ngozag & Lucien Meva’a, 2021. "Analysis of Information Flow Characteristics in Shop Floor: State-of-the-Art and Future Research Directions for Developing Countries," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 22(1), pages 43-53, March.
