Printed from https://ideas.repec.org/a/gam/jijerp/v16y2019i7p1138-d218304.html

Leveraging Data Quality to Better Prepare for Process Mining: An Approach Illustrated Through Analysing Road Trauma Pre-Hospital Retrieval and Transport Processes in Queensland

Authors
  • Robert Andrews

    (School of Information Systems, Queensland University of Technology (QUT), Brisbane 4000, Australia)

  • Moe T. Wynn

    (School of Information Systems, Queensland University of Technology (QUT), Brisbane 4000, Australia)

  • Kirsten Vallmuur

    (Institute of Health and Biomedical Innovation and School of Public Health and Social Work, Queensland University of Technology (QUT), Brisbane 4059, Australia;
    Jamieson Trauma Institute, Royal Brisbane and Women’s Hospital, Metro North Hospital and Health Service, Brisbane 4029, Australia)

  • Arthur H. M. ter Hofstede

    (School of Information Systems, Queensland University of Technology (QUT), Brisbane 4000, Australia)

  • Emma Bosley

    (Queensland Ambulance Service (QAS), Brisbane 4034, Australia)

  • Mark Elcock

    (Retrieval Services Queensland (RSQ), Brisbane 4000, Australia)

  • Stephen Rashford

    (Queensland Ambulance Service (QAS), Brisbane 4034, Australia)

Abstract

While noting the importance of data quality, existing process mining methodologies (i) do not provide details on how to assess the quality of event data, (ii) do not consider how the identification of data quality issues can be exploited in the planning, data extraction and log building phases of any process mining analysis, and (iii) do not highlight potential impacts of poor-quality data on different types of process analyses. As our key contribution, we develop a process-centric, data quality-driven approach to preparing for a process mining analysis which can be applied to any existing process mining methodology. Our approach, adapted from elements of the well-known CRISP-DM data mining methodology, includes conceptual data modeling, quality assessment at both attribute and event level, and trial discovery and conformance to develop understanding of system processes and data properties to inform data extraction. We illustrate our approach in a case study involving the Queensland Ambulance Service (QAS) and Retrieval Services Queensland (RSQ). We describe the detailed preparation for a process mining analysis of retrieval and transport processes (ground and aero-medical) for road-trauma patients in Queensland. Sample datasets obtained from QAS and RSQ are utilised to show how quality metrics, data models and exploratory process mining analyses can be used to (i) identify data quality issues, (ii) anticipate and explain certain observable features in process mining analyses, (iii) distinguish between systemic and occasional quality issues, and (iv) reason about the mechanisms by which identified quality issues may have arisen in the event log. We contend that this knowledge can be used to guide the data extraction and pre-processing stages of a process mining case study to properly align the data with the case study research questions.
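
As a rough illustration of what quality assessment at attribute and event level might look like in practice, the following minimal Python sketch computes per-attribute completeness and flags timestamps shared by several events of the same case. It is not the authors' method or metric set: the toy event log, its field names (`case_id`, `activity`, `timestamp`, `resource`), the activity labels and both function names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy event log. Field names and activity labels are
# illustrative assumptions, not taken from the QAS or RSQ datasets.
EVENTS = [
    {"case_id": "c1", "activity": "Call received",
     "timestamp": "2019-01-01T10:00:00", "resource": "op1"},
    {"case_id": "c1", "activity": "Unit dispatched",
     "timestamp": "2019-01-01T10:05:00", "resource": None},
    {"case_id": "c1", "activity": "Arrived at scene",
     "timestamp": "2019-01-01T10:05:00", "resource": "u7"},
    {"case_id": "c2", "activity": "Call received",
     "timestamp": None, "resource": "op2"},
]


def attribute_completeness(events):
    """Attribute-level check: fraction of events with a non-missing value."""
    total = len(events)
    if total == 0:
        return {}
    attributes = sorted({key for event in events for key in event})
    return {
        attr: sum(1 for e in events if e.get(attr) not in (None, "")) / total
        for attr in attributes
    }


def shared_timestamps(events):
    """Event-level check: (case, timestamp) pairs used by more than one event."""
    counts = defaultdict(int)
    for e in events:
        if e.get("timestamp"):
            counts[(e["case_id"], e["timestamp"])] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    print(attribute_completeness(EVENTS))
    print(shared_timestamps(EVENTS))
```

A low completeness score across most cases would hint at a systemic issue, whereas an isolated missing value is more likely occasional. Events sharing a timestamp within a case can make their ordering ambiguous in a discovered process model, which is one way a quality issue may surface as an observable feature of an analysis.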

Suggested Citation

  • Robert Andrews & Moe T. Wynn & Kirsten Vallmuur & Arthur H. M. ter Hofstede & Emma Bosley & Mark Elcock & Stephen Rashford, 2019. "Leveraging Data Quality to Better Prepare for Process Mining: An Approach Illustrated Through Analysing Road Trauma Pre-Hospital Retrieval and Transport Processes in Queensland," IJERPH, MDPI, vol. 16(7), pages 1-25, March.
  • Handle: RePEc:gam:jijerp:v:16:y:2019:i:7:p:1138-:d:218304

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/16/7/1138/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/16/7/1138/
    Download Restriction: no

    Cited by:

    1. Robert Andrews & Moe T. Wynn & Kirsten Vallmuur & Arthur H. M. ter Hofstede & Emma Bosley, 2020. "A Comparative Process Mining Analysis of Road Trauma Patient Pathways," IJERPH, MDPI, vol. 17(10), pages 1-22, May.
    2. Jonghyeon Ko & Marco Comuzzi, 2023. "A Systematic Review of Anomaly Detection for Business Process Event Logs," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 65(4), pages 441-462, August.
    3. Hiroki Horita & Yuta Kurihashi & Nozomi Miyamori, 2020. "Extraction of Missing Tendency Using Decision Tree Learning in Business Process Event Log," Data, MDPI, vol. 5(3), pages 1-12, September.
