An Approach to Extracting Topic-guided Views from the Sources of a Data Lake

Authors
  • Claudia Diamantini

    (DII, Polytechnic University of Marche)

  • Paolo Lo Giudice

    (DIIES, University “Mediterranea” of Reggio Calabria)

  • Domenico Potena

    (DII, Polytechnic University of Marche)

  • Emanuele Storti

    (DII, Polytechnic University of Marche)

  • Domenico Ursino

    (DII, Polytechnic University of Marche)

Abstract

In recent years, data lakes have emerged as an effective and efficient support for information and knowledge extraction from large amounts of highly heterogeneous and quickly changing data sources. Data lake management requires new techniques, very different from those adopted in the past for data warehouses. In this scenario, one of the most challenging issues is the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to “structuring” unstructured data, at least partially. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.

Suggested Citation

  • Claudia Diamantini & Paolo Lo Giudice & Domenico Potena & Emanuele Storti & Domenico Ursino, n.d. "An Approach to Extracting Topic-guided Views from the Sources of a Data Lake," Information Systems Frontiers, Springer, pages 1-20.
  • Handle: RePEc:spr:infosf:v::y::i::d:10.1007_s10796-020-10010-x
    DOI: 10.1007/s10796-020-10010-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-020-10010-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Claudia Diamantini & Paolo Lo Giudice & Domenico Potena & Emanuele Storti & Domenico Ursino, 2021. "An Approach to Extracting Topic-guided Views from the Sources of a Data Lake," Information Systems Frontiers, Springer, vol. 23(1), pages 243-262, February.
    2. Abdallah Khelil & Amin Mesmoudi & Jorge Galicia & Ladjel Bellatreche & Mohand-Saïd Hacid & Emmanuel Coquery, 2021. "Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing," Information Systems Frontiers, Springer, vol. 23(1), pages 165-183, February.
    3. Marijn Janssen & David Konopnicki & Jane L. Snowdon & Adegboyega Ojo, 2017. "Driving public sector innovation using big and open linked data (BOLD)," Information Systems Frontiers, Springer, vol. 19(2), pages 189-195, April.
    4. Marijn Janssen & David Konopnicki & Jane L. Snowdon & Adegboyega Ojo, n.d. "Driving public sector innovation using big and open linked data (BOLD)," Information Systems Frontiers, Springer, pages 1-7.
    5. Ladjel Bellatreche & Patrick Valduriez & Tadeusz Morzy, 2018. "Advances in Databases and Information Systems," Information Systems Frontiers, Springer, vol. 20(1), pages 1-6, February.
    6. Huseyin C. Ozmutlu, 2009. "Markovian analysis for automatic new topic identification in search engine transaction logs," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 25(6), pages 737-768, November.
    7. Thouraya Bouabana-Tebibel & Stuart H. Rubin, 2016. "Towards common reusable semantics," Information Systems Frontiers, Springer, vol. 18(5), pages 819-823, October.
    8. Chang-Gyu Yang & Hee-Jun Lee, 2016. "A study on the antecedents of healthcare information protection intention," Information Systems Frontiers, Springer, vol. 18(2), pages 253-263, April.
    9. Richard K. Lomotey & Ralph Deters, 2018. "Middleware for mobile medical data management with minimal latency," Information Systems Frontiers, Springer, vol. 20(6), pages 1281-1296, December.
    10. Sinziana Spiridon, 2010. "Patterns In Query Reformulation In Online Searching Behavior," Analele Stiintifice ale Universitatii "Alexandru Ioan Cuza" din Iasi - Stiinte Economice (1954-2015), Alexandru Ioan Cuza University, Faculty of Economics and Business Administration, vol. 2010, pages 407-416, July.
    11. Veda C. Storey & Andrew Burton-Jones & Vijayan Sugumaran & Sandeep Purao, 2008. "CONQUER: A Methodology for Context-Aware Query Processing on the World Wide Web," Information Systems Research, INFORMS, vol. 19(1), pages 3-25, March.
    12. Richard K. Lomotey & Ralph Deters, n.d. "Middleware for mobile medical data management with minimal latency," Information Systems Frontiers, Springer, pages 1-16.
    13. Rajesh Chandwani & Rahul De’, 2017. "Doctor-patient interaction in telemedicine: Logic of choice and logic of care perspectives," Information Systems Frontiers, Springer, vol. 19(4), pages 955-968, August.
    14. Cédric Argenton & Jens Prüfer, 2012. "Search Engine Competition With Network Externalities," Journal of Competition Law and Economics, Oxford University Press, vol. 8(1), pages 73-105.
    15. Jens Weber-Jahnke & Liam Peyton & Thodoros Topaloglou, 2012. "eHealth system interoperability," Information Systems Frontiers, Springer, vol. 14(1), pages 1-3, March.
    16. Eric Golinko & Xingquan Zhu, 2019. "Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks," Information Systems Frontiers, Springer, vol. 21(1), pages 125-142, February.
    17. Rajesh Chandwani & Rahul De’, n.d. "Doctor-patient interaction in telemedicine: Logic of choice and logic of care perspectives," Information Systems Frontiers, Springer, pages 1-14.
    18. Jie Sheng & Joseph Amankwah-Amoah & Xiaojun Wang, 2017. "A multidisciplinary perspective of big data in management research," International Journal of Production Economics, Elsevier, vol. 191(C), pages 97-112.
    19. Youngseok Choi & Habin Lee, n.d. "Data properties and the performance of sentiment classification for electronic commerce applications," Information Systems Frontiers, Springer, pages 1-20.
    20. Kevin Wong & Geoff Walton & Gavin Bailey, 2021. "Using information science to enhance educational preventing violent extremism programs," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(3), pages 362-376, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.