Printed from https://ideas.repec.org/a/gam/jdataj/v9y2024i2p24-d1327257.html

Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems

Authors
  • Henrik tom Wörden

    (Indiscale GmbH, 37083 Göttingen, Germany)

  • Florian Spreckelsen

    (Indiscale GmbH, 37083 Göttingen, Germany)

  • Stefan Luther

    (Max Planck Institute for Dynamics and Self-Organization, 37077 Göttingen, Germany
    Institute for the Dynamics of Complex Systems, Georg-August-Universität, 37077 Göttingen, Germany
    German Center for Cardiovascular Research (DZHK), Partner Site Göttingen, 37075 Göttingen, Germany
    Institute of Pharmacology and Toxicology, University Medical Center Göttingen, 37075 Göttingen, Germany)

  • Ulrich Parlitz

    (Max Planck Institute for Dynamics and Self-Organization, 37077 Göttingen, Germany
    Institute for the Dynamics of Complex Systems, Georg-August-Universität, 37077 Göttingen, Germany
    German Center for Cardiovascular Research (DZHK), Partner Site Göttingen, 37075 Göttingen, Germany)

  • Alexander Schlemmer

    (Max Planck Institute for Dynamics and Self-Organization, 37077 Göttingen, Germany
    German Center for Cardiovascular Research (DZHK), Partner Site Göttingen, 37075 Göttingen, Germany)

Abstract

Although other methods exist to store and manage data in modern information technology, file systems remain the standard solution. Keeping file structures and file system layouts well organized can therefore be key to a sustainable research data management infrastructure. However, file structures alone lack several capabilities that are important for FAIR data management, the two most significant being adequate visualization of data and adequate means of searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and the RDMS. This simultaneous use can have many benefits, but keeping the data in an RDMS synchronized with the file structure is challenging. Here, we present concepts for keeping file structures and semantic data models (in an RDMS) synchronized. Furthermore, we propose a specification in YAML format that allows for a structured and extensible declaration and implementation of a mapping between the file system and the data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications across multiple use cases. In addition, the specification can serve as documentation of specific file system structures that is both machine-readable and human-readable. We demonstrate our work using the open-source RDMS LinkAhead (previously named “CaosDB”).
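
To make the idea of a declarative mapping more concrete, the following is a minimal sketch and not the specification proposed in the paper or the LinkAhead crawler's actual format. It assumes a hypothetical nested specification (written here as a Python dictionary; in practice it could be a YAML document) in which each node matches a directory or file name with a regular expression and turns the named groups into properties of a typed record. The names `SPEC`, `crawl`, and the record types `Project` and `Measurement` are illustrative assumptions.

```python
import re
from pathlib import Path

# Hypothetical mapping specification (an assumption for illustration only).
# Each node matches one directory or file name; named regex groups become
# properties of a record of the given type.
SPEC = {
    "type": "Project",
    "match": r"(?P<year>\d{4})_(?P<name>.+)",
    "children": [
        {
            "type": "Measurement",
            "match": r"(?P<date>\d{4}-\d{2}-\d{2})\.csv",
            "children": [],
        }
    ],
}


def crawl(path, spec, parent=None):
    """Return a list of records for `path` and its matching descendants."""
    match = re.fullmatch(spec["match"], Path(path).name)
    if match is None:
        # Names that do not fit the specification are simply skipped.
        return []
    record = {
        "type": spec["type"],
        "path": str(path),
        "parent": parent,
        **match.groupdict(),
    }
    records = [record]
    if Path(path).is_dir():
        # Try every child specification on every directory entry.
        for child in sorted(Path(path).iterdir()):
            for child_spec in spec["children"]:
                records.extend(crawl(child, child_spec, parent=str(path)))
    return records
```

Calling `crawl(Path("2024_cardio"), SPEC)` on a directory laid out this way would yield one `Project` record plus one `Measurement` record per matching CSV file. Because the specification is plain data, the same dictionary can also be read as documentation of the expected layout.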

Suggested Citation

  • Henrik tom Wörden & Florian Spreckelsen & Stefan Luther & Ulrich Parlitz & Alexander Schlemmer, 2024. "Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems," Data, MDPI, vol. 9(2), pages 1-15, January.
  • Handle: RePEc:gam:jdataj:v:9:y:2024:i:2:p:24-:d:1327257

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/9/2/24/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/9/2/24/
    Download Restriction: no

    References listed on IDEAS

    1. Florian Spreckelsen & Baltasar Rüchardt & Jan Lebert & Stefan Luther & Ulrich Parlitz & Alexander Schlemmer, 2020. "Guidelines for a Standardized Filesystem Layout for Scientific Data," Data, MDPI, vol. 5(2), pages 1-13, April.
    2. Panos Vassiliadis, 2009. "A Survey of Extract–Transform–Load Technology," International Journal of Data Warehousing and Mining (IJDWM), IGI Global, vol. 5(3), pages 1-27, July.
    3. Koenraad De Smedt & Dimitris Koureas & Peter Wittenburg, 2020. "FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units," Publications, MDPI, vol. 8(2), pages 1-17, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhan, Yuanzhu & Tan, Kim Hua, 2020. "An analytic infrastructure for harvesting big data to enhance supply chain performance," European Journal of Operational Research, Elsevier, vol. 281(3), pages 559-574.
    2. Johannes Schneider & Stefan Seidel & Marcus Basalla & Jan Brocke, 2023. "Reuse, Reduce, Support: Design Principles for Green Data Mining," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 65(1), pages 65-83, February.
    3. Lawson, James G. & Street, Daniel A., 2021. "Detecting dirty data using SQL: Rigorous house insurance case," Journal of Accounting Education, Elsevier, vol. 55(C).
    4. David Gil & Magnus Johnsson & Higinio Mora & Julian Szymanski, 2019. "Advances in Architectures, Big Data, and Machine Learning Techniques for Complex Internet of Things Systems," Complexity, Hindawi, vol. 2019, pages 1-3, March.
    5. Benedict Bender & Clementine Bertheau & Tim Körppen & Hannah Lauppe & Norbert Gronau, 2022. "A proposal for future data organization in enterprise systems—an analysis of established database approaches," Information Systems and e-Business Management, Springer, vol. 20(3), pages 441-494, September.
    6. Jan Schweikert & Karl-Uwe Stucky & Wolfgang Süß & Veit Hagenmeyer, 2023. "A Photovoltaic System Model Integrating FAIR Digital Objects and Ontologies," Energies, MDPI, vol. 16(3), pages 1-21, February.
    7. Runsheng Miao & Yuchen Huang & Zhenyu Zhang, 2023. "A Social Media Knowledge Retrieval Method Based on Knowledge Demands and Knowledge Supplies," Mathematics, MDPI, vol. 11(14), pages 1-27, July.
