
Variable size sampling to support high uniformity confidence in sensor data streams

Author

Listed:
  • Hajin Kim
  • Myeong-Seon Gil
  • Yang-Sae Moon
  • Mi-Jung Choi

Abstract

Rapid processing of large volumes of sensor stream data benefits from samples that faithfully reflect the characteristics and patterns of the stream. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in a stream environment. We first analyze the uniformity confidence of KSample and identify two degradation problems: (1) initial degradation, in which the uniformity confidence drops rapidly in the initial stage, and (2) continuous degradation, in which it declines gradually in later stages. Initial degradation is caused by the sample range limitation and the past sample invariance, and continuous degradation by the sampling range increase. We present a solution for each cause: sample range extension for the sample range limitation, past sample change for the past sample invariance, and a UC-window for the sampling range increase. Combining these solutions, we propose a novel sampling method, UC-KSample, which substantially improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average and always keeps it above the user-specified threshold. The sampling accuracy of UC-KSample is also higher than that of KSample on both numeric sensor data and text data. Uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply it to KSample. We believe that UC-KSample is an excellent approach that retains an advantage of KSample, dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence.
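To make "dynamic sampling over a fixed sampling ratio" concrete, the following is a minimal Python sketch of a toy ratio-based stream sampler. It is not the KSample or UC-KSample algorithm, whose details the abstract does not give; the class name `RatioSampler`, the `percent` parameter, and the replacement rule are all assumptions made for illustration. The sketch only shows how a sample can grow with the stream so that its size tracks a fixed fraction of the elements seen.

```python
import random


class RatioSampler:
    """Toy fixed-ratio stream sampler (illustrative only, NOT KSample)."""

    def __init__(self, percent, seed=None):
        # `percent` is a hypothetical parameter: the target sample size
        # as an integer percentage of the elements seen so far.
        if not 1 <= percent <= 100:
            raise ValueError("percent must be an integer in [1, 100]")
        self.percent = percent
        self.seen = 0
        self.sample = []
        self._rng = random.Random(seed)

    def target_size(self):
        # ceil(percent * seen / 100), done in integer arithmetic
        # to avoid floating-point rounding surprises.
        return (self.percent * self.seen + 99) // 100

    def offer(self, item):
        """Feed one stream element to the sampler."""
        self.seen += 1
        if len(self.sample) < self.target_size():
            # A new slot has opened: the sample grows by one element.
            self.sample.append(item)
        elif self._rng.random() < self.percent / 100:
            # Assumed rule: otherwise, with probability percent/100,
            # overwrite a randomly chosen element of the sample.
            victim = self._rng.randrange(len(self.sample))
            self.sample[victim] = item


sampler = RatioSampler(percent=10, seed=42)
for reading in range(1000):
    sampler.offer(reading)
print(len(sampler.sample), "of", sampler.seen, "readings kept")
```

With a 10 percent ratio, the sample holds 100 of the 1,000 readings. In this toy version, the element that arrives when a slot opens is always admitted, so not every subset of the stream is equally likely to be chosen; deviations of this kind from uniform random sampling are what a uniformity metric is meant to expose.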

Suggested Citation

  • Hajin Kim & Myeong-Seon Gil & Yang-Sae Moon & Mi-Jung Choi, 2018. "Variable size sampling to support high uniformity confidence in sensor data streams," International Journal of Distributed Sensor Networks, vol. 14(4), pages 15501477187, April.
  • Handle: RePEc:sae:intdis:v:14:y:2018:i:4:p:1550147718773999
    DOI: 10.1177/1550147718773999

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/1550147718773999
    Download Restriction: no


    References listed on IDEAS

    1. A. I. McLeod & D. R. Bellhouse, 1983. "A Convenient Algorithm for Drawing a Simple Random Sample," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 32(2), pages 182-184, June.

    Citations


    Cited by:

    1. Youngkuk Kim & Siwoon Son & Yang-Sae Moon, 2019. "SPMgr: Dynamic workflow manager for sampling and filtering data streams over Apache Storm," International Journal of Distributed Sensor Networks, vol. 15(7), pages 15501477198, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ley, Eduardo & Steel, Mark F. J., 2007. "On the effect of prior assumptions in Bayesian model averaging with applications to growth regression," Policy Research Working Paper Series 4238, The World Bank.
    2. J. N. K. Rao, 2021. "On Making Valid Inferences by Integrating Data from Surveys and Other Sources," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 242-272, May.
    3. Eduardo Ley & Mark F.J. Steel, 2009. "On the effect of prior assumptions in Bayesian model averaging with applications to growth regression," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 24(4), pages 651-674.
    4. Park, Byung-Hoon & Ostrouchov, George & Samatova, Nagiza F., 2007. "Sampling streaming data with replacement," Computational Statistics & Data Analysis, Elsevier, vol. 52(2), pages 750-762, October.
