Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

Author

Listed:

Hahsler, Michael
Bolaños, Matthew
Forrest, John

Abstract

In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.
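To make the abstract's terms concrete, here is a minimal sketch of a non-stationary data stream and a single-pass clustering algorithm that consumes it. It is written in plain Python and is not the stream package's API. The names `drifting_stream` and `SequentialKMeans`, the drift rate, and the two starting centers are all hypothetical choices made for illustration, and sequential k-means stands in for the stream clustering algorithms the paper covers.

```python
import itertools
import random


def drifting_stream(centers, drift=0.001, sd=0.05, seed=42):
    """Yield 2-D points forever from Gaussian clusters whose centers drift.

    Illustrative assumption: every center moves along the x-axis by `drift`
    after each emitted point, so the generating process is non-stationary.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in centers]
    while True:
        c = rng.choice(centers)
        yield (rng.gauss(c[0], sd), rng.gauss(c[1], sd))
        for center in centers:
            center[0] += drift


class SequentialKMeans:
    """Online k-means: each point is used once to update a center, then discarded."""

    def __init__(self, k):
        self.k = k
        self.centers = []
        self.counts = []

    def update(self, point):
        # The first k points seed the centers.
        if len(self.centers) < self.k:
            self.centers.append(list(point))
            self.counts.append(1)
            return
        # Find the nearest center by squared Euclidean distance.
        j = min(
            range(self.k),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, self.centers[i])),
        )
        # Move that center toward the point with a step size of 1 / count.
        self.counts[j] += 1
        rate = 1.0 / self.counts[j]
        self.centers[j] = [c + rate * (p - c) for p, c in zip(point, self.centers[j])]


# The stream is unbounded, so take a finite prefix for this demonstration.
stream = drifting_stream(centers=[(0.2, 0.2), (0.8, 0.8)])
model = SequentialKMeans(k=2)
for point in itertools.islice(stream, 1000):
    model.update(point)

print(model.centers)
```

Because the step size shrinks as counts grow, this sketch adapts more and more slowly to drift. Stream clustering algorithms typically counter this with fading or windowing, which the sketch leaves out for brevity.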

Suggested Citation

Hahsler, Michael & Bolaños, Matthew & Forrest, John, 2017. "Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i14).

Handle: RePEc:jss:jstsof:v:076:i14
DOI: http://hdl.handle.net/10.18637/jss.v076.i14

References listed on IDEAS

Xie, Yihui, 2013. "animation: An R Package for Creating Animations and Demonstrating Statistical Methods," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 53(i01).
Hahsler, Michael & Dunham, Margaret H., 2010. "rEMM: Extensible Markov Model for Data Stream Clustering in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 35(i05).
Hornik, Kurt, 2005. "A CLUE for CLUster Ensembles," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 14(i12).

Citations

Cited by:

Krzysztof Gajowniczek & Marcin Bator & Tomasz Ząbkowski & Arkadiusz Orłowski & Chu Kiong Loo, 2020. "Simulation Study on the Electricity Data Streams Time Series Clustering," Energies, MDPI, vol. 13(4), pages 1-25, February.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Marcin Pełka, 2012. "Ensemble approach for clustering of interval-valued symbolic data," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 13(2), pages 335-342, June.
Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
Hornik, Kurt & Grün, Bettina, 2014. "movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 58(i10).
Wei, Kun & Zhang, Youxin & Luo, Yi, 2018. "Variance-mediated multifractal analysis of group participation in chasing a single dangerous prey," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 1275-1287.
Fišar, Miloš & Greiner, Ben & Huber, Christoph & Katok, Elena & Ozkes, Ali & Collaboration, Management Science Reproducibility, 2023. "Reproducibility in Management Science," OSF Preprints mydzv_v1, Center for Open Science.
Luis Lorenzo & Javier Arroyo, 2022. "Analysis of the cryptocurrency market using different prototype-based clustering techniques," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-46, December.
Varma, Jayanth R. & Virmani, Vineet, 2017. "Shiny Alternative for Finance in the Classroom," IIMA Working Papers WP 2017-03-05, Indian Institute of Management Ahmedabad, Research and Publication Department.
Boztug, Yasemin & Reutterer, Thomas, 2008. "A combined approach for segment-specific market basket analysis," European Journal of Operational Research, Elsevier, vol. 187(1), pages 294-312, May.
Hornik, Kurt & Feinerer, Ingo & Kober, Martin & Buchta, Christian, 2012. "Spherical k-Means Clustering," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 50(i10).
Fionn Murtagh, 2009. "The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 26(3), pages 249-277, December.
Pełka Marcin, 2018. "Analysis of Innovations in the European Union Via Ensemble Symbolic Density Clustering," Econometrics. Advances in Applied Data Analysis, Sciendo, vol. 22(3), pages 84-98, September.
Apostolos Bozikas & Georgios Pitselis, 2018. "An Empirical Study on Stochastic Mortality Modelling under the Age-Period-Cohort Framework: The Case of Greece with Applications to Insurance Pricing," Risks, MDPI, vol. 6(2), pages 1-34, April.
Fišar, Miloš & Greiner, Ben & Huber, Christoph & Katok, Elena & Ozkes, Ali & Management Science Reproducibility Collaboration, 2023. "Reproducibility in Management Science," Department for Strategy and Innovation Working Paper Series 03/2023, WU Vienna University of Economics and Business.
- Miloš Fišar & Ben Greiner & Christoph Huber & Elena Katok & Ali I Ozkes & The Management Science Reproducibility Collaboration, 2024. "Reproducibility in Management Science," Post-Print hal-04370984, HAL.
- Fišar, Miloš & Greiner, Ben & Huber, Christoph & Katok, Elena & Ozkes, Ali & Collaboration, Management Science Reproducibility, 2023. "Reproducibility in Management Science," OSF Preprints mydzv, Center for Open Science.
Pełka Marcin, 2019. "Analysis of Happiness in EU Countries Using the Multi-Model Classification based on Models of Symbolic Data," Econometrics. Advances in Applied Data Analysis, Sciendo, vol. 23(3), pages 15-25, September.
Axel Strauß & François Guilhaumon & Roger Daniel Randrianiaina & Katharina C Wollenberg Valero & Miguel Vences & Julian Glos, 2016. "Opposing Patterns of Seasonal Change in Functional and Phylogenetic Diversity of Tadpole Assemblages," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-18, March.
Meyer, Sebastian & Held, Leonhard & Höhle, Michael, 2017. "Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i11).
Thomas Reutterer & Kurt Hornik & Nicolas March & Kathrin Gruber, 2017. "A data mining framework for targeted category promotions," Journal of Business Economics, Springer, vol. 87(3), pages 337-358, April.
Juan José Fernández-Durán & María Mercedes Gregorio-Domínguez, 2021. "Consumer Segmentation Based on Use Patterns," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 72-88, April.
Pełka Marcin, 2019. "Assessment of the Development of the European Oecd Countries with the Application of Linear Ordering and Ensemble Clustering of Symbolic Data," Folia Oeconomica Stetinensia, Sciendo, vol. 19(2), pages 117-133, December.
Linda Vidman & David Källberg & Patrik Rydén, 2019. "Cluster analysis on high dimensional RNA-seq data with applications to cancer research - An evaluation study," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-21, December.
