Authors listed:
- Edson Ramiro Lucas Filho
(Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
- George Savva
(Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
- Lun Yang
(Huawei Technologies Co., Ltd., Shenzhen 518100, China)
- Kebo Fu
(Huawei Technologies Co., Ltd., Shenzhen 518100, China)
- Jianqiang Shen
(Huawei Technologies Co., Ltd., Shenzhen 518100, China)
- Herodotos Herodotou
(Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
Abstract
Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system’s performance. Recently, different Machine-Learning (ML) algorithms have been used to model access patterns from complex workloads. Yet, current approaches train their models offline in a batch-based approach, even though storage systems are processing a stream of file requests with dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems as it introduces various advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and used as features. Streaming ML models are developed, trained, and tested on different file system traces for making two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation is performed with production traces provided by Huawei Technologies, showing that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average).
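As a rough illustration of the streaming paradigm the abstract contrasts with offline batch training, the sketch below runs a test-then-train (prequential) loop: each incoming instance is first used to score the model and only then to update it. Everything here is an assumption made for illustration and is not taken from the paper: the `OnlineLogisticRegression` class, the two features (`recency`, `frequency`), the synthetic labelling rule for "hot" files, and the learning rate. The authors' actual models, features, and traces are not reproduced.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical minimal online classifier updated one instance at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One stochastic gradient step on the log-loss for a single instance.
        err = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


def synthetic_stream(n, seed=0):
    """Yield (features, label) pairs; an assumed stand-in for a file request trace."""
    rng = random.Random(seed)
    for _ in range(n):
        recency = rng.random()    # normalized time since last access (assumed feature)
        frequency = rng.random()  # normalized recent access count (assumed feature)
        label = 1 if frequency > recency else 0  # assumed rule for a "hot" file
        yield [recency, frequency], label


def prequential_accuracy(model, stream):
    """Test-then-train: predict on each instance before learning from it."""
    correct = 0
    total = 0
    for x, y in stream:
        pred = 1 if model.predict_proba(x) >= 0.5 else 0
        correct += int(pred == y)
        total += 1
        model.learn_one(x, y)
    return correct / total


model = OnlineLogisticRegression(n_features=2, lr=0.5)
accuracy = prequential_accuracy(model, synthetic_stream(5000))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because the model never sees an instance before being scored on it, the running accuracy reflects online behavior, and the per-instance update keeps memory use constant regardless of stream length. A production setup would more likely rely on an established streaming ML library than on a hand-written learner like this one.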
Suggested Citation
Edson Ramiro Lucas Filho & George Savva & Lun Yang & Kebo Fu & Jianqiang Shen & Herodotos Herodotou, 2025.
"Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems,"
Future Internet, MDPI, vol. 17(4), pages 1-37, April.
Handle:
RePEc:gam:jftint:v:17:y:2025:i:4:p:170-:d:1633050