
Storm-based distributed sampling system for multi-source stream environment

Author

Listed:
  • Wonhyeong Cho
  • Myeong-Seon Gil
  • Mi-Jung Choi
  • Yang-Sae Moon

Abstract

As large volumes of data streams are generated rapidly in many recent applications such as social network services, the Internet of Things, and smart factories, sampling techniques have attracted much attention as a way to handle such data streams efficiently. In this article, we address the performance improvement of binary Bernoulli sampling in the multi-source stream environment. Binary Bernoulli sampling has an n:1 structure in which n sites transmit data to one coordinator. However, as the number of sites increases or the input stream grows explosively, binary Bernoulli sampling may cause a severe bottleneck at the coordinator. In addition, bidirectional communication over different networks between the coordinator and the sites may incur excessive communication overhead. In this article, we propose a novel distributed processing model of binary Bernoulli sampling to solve these coordinator bottleneck and communication overhead problems. We first present a multiple-coordinator structure to solve the coordinator bottleneck. We then present a new sampling model with an integrated framework and shared memory to alleviate the communication overhead. To verify the effectiveness and scalability of the proposed model, we implement it on Apache Storm, a real-time distributed stream processing system. Experimental results show that our Storm-based binary Bernoulli sampling improves performance by up to 1.8 times compared with the legacy method and maintains high performance even when the input stream increases substantially. These results indicate that the proposed distributed processing model effectively solves the performance degradation problem of binary Bernoulli sampling, as verified through the actual implementation on Apache Storm.
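
The n:1 structure described in the abstract can be illustrated with a small single-process sketch. This is not the authors' Storm implementation, and the abstract does not spell out the algorithm; the code below assumes one simplified, level-based formulation of binary Bernoulli sampling, in which each item receives a random level (a count of consecutive fair-coin successes) and is kept only while its level is at least the coordinator's current round, so that each item survives with probability 2^-round. The names `Coordinator`, `Site`, and `run_demo` are hypothetical, and the direct read of `coordinator.round` stands in for the coordinator-to-site broadcast that would cross the network in a real deployment.

```python
import random


class Coordinator:
    """Single coordinator holding a bounded sample (assumed simplified model)."""

    def __init__(self, sample_size):
        self.sample_size = sample_size
        self.round = 0      # items survive with probability 2 ** -round
        self.sample = []    # list of (item, level) pairs
        self.messages = 0   # number of site-to-coordinator messages received

    def receive(self, item, level):
        self.messages += 1
        if level < self.round:
            return  # stale message from an earlier round
        self.sample.append((item, level))
        # On overflow, halve the sampling probability and prune the sample.
        while len(self.sample) > self.sample_size:
            self.round += 1
            self.sample = [(x, l) for x, l in self.sample if l >= self.round]


class Site:
    """One of the n sites; forwards an item only if it passes the current round."""

    MAX_LEVEL = 64  # cap so the coin-flip loop always terminates

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng

    def observe(self, item):
        # Level = number of consecutive fair-coin successes for this item.
        level = 0
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        # Assumption: reading coordinator.round models the round broadcast.
        if level >= self.coordinator.round:
            self.coordinator.receive(item, level)


def run_demo(num_sites=4, items_per_site=1000, sample_size=32, seed=7):
    """Feed interleaved streams from several sites into one coordinator."""
    rng = random.Random(seed)
    coordinator = Coordinator(sample_size)
    sites = [Site(coordinator, rng) for _ in range(num_sites)]
    for i in range(items_per_site):
        for site_id, site in enumerate(sites):
            site.observe((site_id, i))
    return coordinator


if __name__ == "__main__":
    c = run_demo()
    print("round:", c.round)
    print("sample size:", len(c.sample))
    print("messages to coordinator:", c.messages)
```

In this sketch every forwarded item passes through the single `Coordinator.receive` method, which is where the bottleneck named in the abstract would appear as the number of sites or the input rate grows. The multiple-coordinator structure and shared-memory model proposed in the article are not reproduced here.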

Suggested Citation

  • Wonhyeong Cho & Myeong-Seon Gil & Mi-Jung Choi & Yang-Sae Moon, 2018. "Storm-based distributed sampling system for multi-source stream environment," International Journal of Distributed Sensor Networks, vol. 14(11), pages 15501477188, November.
  • Handle: RePEc:sae:intdis:v:14:y:2018:i:11:p:1550147718812698
    DOI: 10.1177/1550147718812698

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/1550147718812698
    Download Restriction: no


    Citations



    Cited by:

    1. Youngkuk Kim & Siwoon Son & Yang-Sae Moon, 2019. "SPMgr: Dynamic workflow manager for sampling and filtering data streams over Apache Storm," International Journal of Distributed Sensor Networks, vol. 15(7), pages 15501477198, July.
