
HYBRIDJOIN for Near-Real-Time Data Warehousing

Authors
  • M. Asif Naeem

    (The University of Auckland, New Zealand)

  • Gillian Dobbie

    (The University of Auckland, New Zealand)

  • Gerald Weber

    (The University of Auckland, New Zealand)

Abstract

An important component of near-real-time data warehouses is the near-real-time integration layer. One important element of near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, the performance of MESHJOIN is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be set up to process stream input and can deal with intermittences in the update stream, but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors present performance measurements of their implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution and in general performs in accordance with the theoretical model, while the other two algorithms are unacceptably slow under different settings.
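The abstract says that HYBRIDJOIN combines INLJ and MESHJOIN but does not spell out how. The Python sketch below is a simplified, hypothetical illustration of one way such a combination can work, not the authors' implementation: a bounded buffer of stream tuples is kept in a hash table plus an arrival-ordered queue; the oldest buffered tuple selects a page of the disk-based relation R through an index (INLJ-like), and every tuple on that page is then probed against all buffered stream tuples (MESHJOIN-like), so one page read can serve many stream tuples. All names (`hybrid_stream_join`, `build_index`, the list-of-pages model of R, the `capacity` parameter) are assumptions for illustration, and R is assumed to have unique join keys.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical model of the disk-based relation R: a list of "pages", each a
# list of (key, r_value) tuples. Reading one page stands in for one disk read.
Page = List[Tuple[Hashable, str]]


def build_index(pages: List[Page]) -> Dict[Hashable, int]:
    """Map each join key of R to the page holding it (assumes unique keys in R)."""
    return {key: pno for pno, page in enumerate(pages) for key, _ in page}


def hybrid_stream_join(stream: Iterable[Tuple[Hashable, str]],
                       pages: List[Page],
                       capacity: int) -> List[Tuple[Hashable, str, str]]:
    """Illustrative stream-relation join mixing INLJ- and MESHJOIN-style access.

    Buffers up to `capacity` stream tuples, uses the OLDEST buffered tuple to
    choose a page of R through the index (INLJ-like), then probes every tuple
    on that page against all buffered stream tuples (MESHJOIN-like), so a
    single page read can serve many stream tuples.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    index = build_index(pages)
    hash_table: Dict[Hashable, List[int]] = defaultdict(list)  # key -> tuple ids
    queue: Dict[int, Tuple[Hashable, str]] = {}  # dicts keep insertion order: oldest first
    output: List[Tuple[Hashable, str, str]] = []
    source = iter(stream)
    next_id = 0
    exhausted = False

    while True:
        # 1. Refill the buffer with new stream tuples up to the memory capacity.
        while not exhausted and len(queue) < capacity:
            try:
                key, s_value = next(source)
            except StopIteration:
                exhausted = True
                break
            queue[next_id] = (key, s_value)
            hash_table[key].append(next_id)
            next_id += 1
        if not queue:
            return output

        # 2. The oldest buffered stream tuple decides which page of R to read.
        oldest_key = queue[next(iter(queue))][0]
        if oldest_key not in index:
            # No R tuple can match: drop every buffered tuple with this key.
            for tid in hash_table.pop(oldest_key):
                del queue[tid]
            continue
        page = pages[index[oldest_key]]  # stands in for one disk read

        # 3. Probe each R tuple on the page against the whole hash table;
        #    matched stream tuples are emitted and leave the buffer.
        for r_key, r_value in page:
            for tid in hash_table.pop(r_key, []):
                output.append((r_key, queue.pop(tid)[1], r_value))


if __name__ == "__main__":
    r_pages = [[(1, "r1"), (2, "r2")], [(3, "r3"), (4, "r4")]]
    s = [(3, "a"), (1, "b"), (3, "c"), (4, "d"), (9, "e"), (2, "f")]
    print(hybrid_stream_join(s, r_pages, capacity=4))
    # [(3, 'a', 'r3'), (3, 'c', 'r3'), (4, 'd', 'r4'), (1, 'b', 'r1'), (2, 'f', 'r2')]
```

In the sample run, the five matching stream tuples are joined with two page reads, and the unmatched tuple with key 9 is discarded after an index miss; a tuple-at-a-time INLJ would instead perform one index lookup per stream tuple. The sketch deliberately ignores the paper's cost model, concurrency, and real disk I/O.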

Suggested Citation

  • M. Asif Naeem & Gillian Dobbie & Gerald Weber, 2011. "HYBRIDJOIN for Near-Real-Time Data Warehousing," International Journal of Data Warehousing and Mining (IJDWM), IGI Global, vol. 7(4), pages 21-42, October.
  • Handle: RePEc:igg:jdwm00:v:7:y:2011:i:4:p:21-42

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/jdwm.2011100102
    Download Restriction: no

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Rashed Salem & Omar Boussaïd & Jérôme Darmont, 2013. "Active XML-based Web data integration," Information Systems Frontiers, Springer, vol. 15(3), pages 371-398, July.
    2. M. Asif Naeem, 2019. "Optimization and Extension of Stream-Relation Joins," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(04), pages 1289-1315, July.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:igg:jdwm00:v:7:y:2011:i:4:p:21-42. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the journal editor. General contact details of provider: https://www.igi-global.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.