Printed from https://ideas.repec.org/a/eee/reensy/v164y2017icp55-65.html

Hard drive failure prediction using Decision Trees

Author

Listed:
  • Li, Jing
  • Stones, Rebecca J.
  • Wang, Gang
  • Liu, Xiaoguang
  • Li, Zhongwei
  • Xu, Ming

Abstract

This paper proposes two hard drive failure prediction models, based on Decision Trees (DTs) and Gradient Boosted Regression Trees (GBRTs), that combine strong prediction performance with stability and interpretability. The models are evaluated on a real-world dataset containing 121,698 drives in total. Experimental results show that the DT model predicts over 93% of failures at a false alarm rate under 0.01%, and that the GBRT model can achieve a failure detection rate of about 90% without any false alarms. Moreover, the GBRT model estimates drive health (or fault probability), which provides a quantitative indicator of failure urgency. This enables operators to allocate system resources accordingly for pre-warning migrations while maintaining the quality of user services.
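
The two metrics quoted in the abstract can be made concrete with a small sketch. It is not the authors' code: it assumes the usual drive-level definitions (failure detection rate = failed drives flagged / all failed drives; false alarm rate = good drives flagged / all good drives), and the function names, drive IDs, scores, and threshold below are all hypothetical. It also shows how a per-drive fault probability, such as the one the GBRT model is said to produce, could be used to order pre-warning migrations by urgency.

```python
from typing import Dict, List, Tuple


def detection_and_false_alarm_rates(
    scores: Dict[str, float],
    failed: Dict[str, bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (failure detection rate, false alarm rate) at drive level.

    scores    -- hypothetical fault probability per drive ID, in [0, 1]
    failed    -- ground truth: True if the drive actually failed
    threshold -- drives scoring at or above this value are flagged
    """
    flagged = {d for d, s in scores.items() if s >= threshold}
    failed_drives = {d for d, f in failed.items() if f}
    good_drives = set(failed) - failed_drives
    # Guard against empty classes so the ratios are always defined.
    fdr = len(flagged & failed_drives) / len(failed_drives) if failed_drives else 0.0
    far = len(flagged & good_drives) / len(good_drives) if good_drives else 0.0
    return fdr, far


def migration_order(scores: Dict[str, float], threshold: float) -> List[str]:
    """List flagged drives, most urgent (highest fault probability) first."""
    flagged = [d for d, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda d: scores[d], reverse=True)


# Made-up example data, for illustration only.
example_scores = {"d1": 0.95, "d2": 0.40, "d3": 0.70, "d4": 0.05, "d5": 0.10}
example_failed = {"d1": True, "d2": True, "d3": False, "d4": False, "d5": False}

fdr, far = detection_and_false_alarm_rates(example_scores, example_failed, 0.5)
order = migration_order(example_scores, 0.5)
```

With this toy data, one of the two failed drives is flagged and one of the three good drives is flagged. Raising the threshold trades detection rate for fewer false alarms, which is the trade-off the reported figures describe.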

Suggested Citation

  • Li, Jing & Stones, Rebecca J. & Wang, Gang & Liu, Xiaoguang & Li, Zhongwei & Xu, Ming, 2017. "Hard drive failure prediction using Decision Trees," Reliability Engineering and System Safety, Elsevier, vol. 164(C), pages 55-65.
  • Handle: RePEc:eee:reensy:v:164:y:2017:i:c:p:55-65
    DOI: 10.1016/j.ress.2017.03.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832016301569
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ress.2017.03.004?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Muhammad Zafran Muhammad Zaly Shah & Anazida Zainal & Taiseer Abdalla Elfadil Eisa & Hashim Albasheer & Fuad A. Ghaleb, 2023. "A Semisupervised Concept Drift Adaptation via Prototype-Based Manifold Regularization Approach with Knowledge Transfer," Mathematics, MDPI, vol. 11(2), pages 1-30, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Moradi, Ramin & Groth, Katrina M., 2020. "Modernizing risk assessment: A systematic integration of PRA and PHM techniques," Reliability Engineering and System Safety, Elsevier, vol. 204(C).
    2. Chen, Zhen & Li, Yaping & Xia, Tangbin & Pan, Ershun, 2019. "Hidden Markov model with auto-correlated observations for remaining useful life prediction and optimal maintenance policy," Reliability Engineering and System Safety, Elsevier, vol. 184(C), pages 123-136.
    3. Li, Rui & Verhagen, Wim J.C. & Curran, Richard, 2020. "A systematic methodology for Prognostic and Health Management system architecture definition," Reliability Engineering and System Safety, Elsevier, vol. 193(C).
    4. Liu, Xingheng & Matias, José & Jäschke, Johannes & Vatn, Jørn, 2022. "Gibbs sampler for noisy Transformed Gamma process: Inference and remaining useful life estimation," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    5. Li, Naipeng & Gebraeel, Nagi & Lei, Yaguo & Fang, Xiaolei & Cai, Xiao & Yan, Tao, 2021. "Remaining useful life prediction based on a multi-sensor data fusion model," Reliability Engineering and System Safety, Elsevier, vol. 208(C).
    6. Lewis, Austin D. & Groth, Katrina M., 2022. "Metrics for evaluating the performance of complex engineering system health monitoring models," Reliability Engineering and System Safety, Elsevier, vol. 223(C).
    7. Belkacem, Lobna & Simeu-Abazi, Zineb & Dhouibi, Hedi & Gascard, Eric & Messaoud, Hassani, 2017. "Diagnostic and prognostic of hybrid dynamic systems: Modeling and RUL evaluation for two maintenance policies," Reliability Engineering and System Safety, Elsevier, vol. 164(C), pages 98-109.
    8. Huang, Xucong & Peng, Zhaoqin & Tang, Diyin & Chen, Juan & Zio, Enrico & Zheng, Zaiping, 2024. "A physics-informed autoencoder for system health state assessment based on energy-oriented system performance," Reliability Engineering and System Safety, Elsevier, vol. 242(C).
    9. Blancke, Olivier & Tahan, Antoine & Komljenovic, Dragan & Amyot, Normand & Lévesque, Mélanie & Hudon, Claude, 2018. "A holistic multi-failure mode prognosis approach for complex equipment," Reliability Engineering and System Safety, Elsevier, vol. 180(C), pages 136-151.
    10. Compare, Michele & Bellani, Luca & Zio, Enrico, 2017. "Reliability model of a component equipped with PHM capabilities," Reliability Engineering and System Safety, Elsevier, vol. 168(C), pages 4-11.
    11. Kim, Hyeonmin & Kim, Jung Taek & Heo, Gyunyoung, 2018. "Failure rate updates using condition-based prognostics in probabilistic safety assessments," Reliability Engineering and System Safety, Elsevier, vol. 175(C), pages 225-233.
    12. Hazra, Indranil & Pandey, Mahesh D. & Manzana, Noldainerick, 2020. "Approximate Bayesian computation (ABC) method for estimating parameters of the gamma process using noisy data," Reliability Engineering and System Safety, Elsevier, vol. 198(C).
    13. Xu, Ancha & Shen, Lijuan, 2018. "Improved on-line estimation for gamma process," Statistics & Probability Letters, Elsevier, vol. 143(C), pages 67-73.
    14. Wu, Shaomin, 2021. "Two methods to approximate the superposition of imperfect failure processes," Reliability Engineering and System Safety, Elsevier, vol. 207(C).
    15. Xiaojie Ke & Zhengguo Xu & Wenhai Wang & Youxian Sun, 2017. "Remaining useful life prediction for non-stationary degradation processes with shocks," Journal of Risk and Reliability, vol. 231(5), pages 469-480, October.
    16. Arslan, Suayb S. & Peng, James & Goker, Turguy, 2020. "A data-assisted reliability model for carrier-assisted cold data storage systems," Reliability Engineering and System Safety, Elsevier, vol. 196(C).
    17. Slimacek, Vaclav & Lindqvist, Bo Henry, 2016. "Nonhomogeneous Poisson process with nonparametric frailty," Reliability Engineering and System Safety, Elsevier, vol. 149(C), pages 14-23.
    18. Lewis, Austin D. & Groth, Katrina M., 2023. "A comparison of DBN model performance in SIPPRA health monitoring based on different data stream discretization methods," Reliability Engineering and System Safety, Elsevier, vol. 236(C).
    19. Zhang, Nan & Fouladirad, Mitra & Barros, Anne & Zhang, Jun, 2020. "Condition-based maintenance for a K-out-of-N deteriorating system under periodic inspection with failure dependence," European Journal of Operational Research, Elsevier, vol. 287(1), pages 159-167.
    20. Tao, Xin & Mårtensson, Jonas & Warnquist, Håkan & Pernestål, Anna, 2022. "Short-term maintenance planning of autonomous trucks for minimizing economic risk," Reliability Engineering and System Safety, Elsevier, vol. 220(C).
