Printed from https://ideas.repec.org/a/eee/reensy/v249y2024ics0951832024003028.html

Cost optimization and reliability analysis of fault tolerant system with service interruption and reboot

Author

Listed:
  • Jain, Madhu
  • Kumar, Pankaj
  • Singh, Mayank
  • Gupta, Ritu

Abstract

Because fault-tolerant systems are widely used in real-time applications, their reliability modeling and cost optimization have drawn the attention of practitioners. Fault tolerance in these systems is provided by maintenance support and redundant components, which keep the system operating smoothly despite the failure of some active components. This investigation deals with the performance modeling of a fault-tolerant system consisting of a finite number of active (online) and standby components. During switching between active and standby components, a recovery procedure is performed, which may be imperfect. If recovery is imperfect, the system reboots. All components are maintained by a repairman (server) who is subject to failure. When the server's service is interrupted, functioning does not stop, because the system switches over from perfect working mode to working breakdown mode. The system keeps working even when the server is on working vacation, during which the server continues to repair failed components. A machine repair model based on a Markov process is developed, and the transient probabilities and other performance indices of the fault-tolerant system are derived using Laplace transforms and a matrix-analytic method. A cost-benefit analysis is carried out using a direct search strategy and particle swarm optimization. The optimal design of the control parameters for the fault-tolerant system is obtained by framing a cost-effective ratio function. The model is examined computationally through numerical simulation and cost optimization.
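
To make the idea of transient probabilities in a Markovian machine repair model concrete, here is a minimal sketch. It is not the authors' model: it assumes a plain birth-death chain with M active units, S warm standbys, and one always-available repairman, and it leaves out imperfect recovery, reboot, server breakdown, and working vacation. The function names, the rates lam, nu, and mu, and the numerical values are all hypothetical. It also uses uniformization instead of the Laplace transform approach named in the abstract.

```python
import math


def build_generator(M, S, lam, nu, mu):
    """Generator matrix Q for states n = 0..M+S (number of failed components).

    Assumed rates (illustrative only): lam per active unit, nu per warm
    standby, mu for the single repairman.
    """
    K = M + S
    Q = [[0.0] * (K + 1) for _ in range(K + 1)]
    for n in range(K + 1):
        if n < K:
            # While spares remain, M units are active and S - n are on standby;
            # afterwards only K - n active units are left.
            Q[n][n + 1] = M * lam + (S - n) * nu if n < S else (K - n) * lam
        if n > 0:
            Q[n][n - 1] = mu  # one repair in progress at a time
        Q[n][n] = -sum(Q[n])  # diagonal makes each row sum to zero
    return Q


def transient_probs(Q, p0, t, tol=1e-12, max_terms=100000):
    """State probabilities at time t via uniformization.

    p(t) = sum_k Poisson(rate * t; k) * p0 * P^k, with P = I + Q / rate.
    """
    size = len(Q)
    rate = max(-Q[i][i] for i in range(size))
    if rate == 0.0 or t == 0.0:
        return list(p0)
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(size)]
         for i in range(size)]
    weight = math.exp(-rate * t)  # Poisson weight for k = 0
    vec = list(p0)                # holds p0 * P^k
    probs = [weight * x for x in vec]
    covered = weight              # Poisson mass accumulated so far
    for k in range(1, max_terms + 1):
        if 1.0 - covered <= tol:
            break
        vec = [sum(vec[i] * P[i][j] for i in range(size)) for j in range(size)]
        weight *= rate * t / k
        probs = [p + weight * x for p, x in zip(probs, vec)]
        covered += weight
    return probs


# Hypothetical example: 2 active units, 1 warm standby, all units up at t = 0.
Q = build_generator(M=2, S=1, lam=0.1, nu=0.05, mu=1.0)
p = transient_probs(Q, [1.0, 0.0, 0.0, 0.0], t=5.0)
print([round(x, 4) for x in p])
```

Uniformization is chosen here only because it needs nothing beyond the standard library and stays numerically stable for moderate values of rate * t. Performance indices such as the expected number of failed components follow directly from the returned vector.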

Suggested Citation

  • Jain, Madhu & Kumar, Pankaj & Singh, Mayank & Gupta, Ritu, 2024. "Cost optimization and reliability analysis of fault tolerant system with service interruption and reboot," Reliability Engineering and System Safety, Elsevier, vol. 249(C).
  • Handle: RePEc:eee:reensy:v:249:y:2024:i:c:s0951832024003028
    DOI: 10.1016/j.ress.2024.110229

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832024003028
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ress.2024.110229?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Yuan, Zixia & Xiong, Guojiang & Fu, Xiaofan & Mohamed, Ali Wagdy, 2023. "Improving fault tolerance in diagnosing power system failures with optimal hierarchical extreme learning machine," Reliability Engineering and System Safety, Elsevier, vol. 236(C).
    2. Ke, Jau-Chuan & Liu, Tzu-Hsin & Yang, Dong-Yuh, 2018. "Modeling of machine interference problem with unreliable repairman and standbys imperfect switchover," Reliability Engineering and System Safety, Elsevier, vol. 174(C), pages 12-18.
    3. Yen, Tseng-Chang & Wang, Kuo-Hsiung, 2020. "Cost benefit analysis of four retrial systems with warm standby units and imperfect coverage," Reliability Engineering and System Safety, Elsevier, vol. 202(C).
    4. Huang, Shuang & Zhou, Chunjie & Yang, Lili & Qin, Yuanqing & Huang, Xiongfeng & Hu, Bowen, 2016. "Transient fault tolerant control for vehicle brake-by-wire systems," Reliability Engineering and System Safety, Elsevier, vol. 149(C), pages 148-163.
    5. Shekhar, Chandra & Kumar, Amit & Varshney, Shreekant, 2020. "Load sharing redundant repairable systems with switching and reboot delay," Reliability Engineering and System Safety, Elsevier, vol. 193(C).
    6. Liang, Qingzhu & Yang, Yinghao & Zhang, Hang & Peng, Changhong & Lu, Jianchao, 2022. "Analysis of simplification in Markov state-based models for reliability assessment of complex safety systems," Reliability Engineering and System Safety, Elsevier, vol. 221(C).
    7. Shekhar, Chandra & Kumar, Neeraj & Gupta, Amit & Kumar, Amit & Varshney, Shreekant, 2020. "Warm-spare provisioning computing network with switching failure, common cause failure, vacation interruption, and synchronized reneging," Reliability Engineering and System Safety, Elsevier, vol. 199(C).
    8. Kumar, Pankaj & Jain, Madhu, 2020. "Reliability analysis of a multi-component machining system with service interruption, imperfect coverage, and reboot," Reliability Engineering and System Safety, Elsevier, vol. 202(C).
    9. Gao, Shan & Wang, Jinting, 2021. "Reliability and availability analysis of a retrial system with mixed standbys and an unreliable repair facility," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    10. Juybari, Mohammad N. & Hamadani, Ali Zeinal & Ardakan, Mostafa Abouei, 2023. "Availability analysis and cost optimization of a repairable system with a mix of active and warm-standby components in a shock environment," Reliability Engineering and System Safety, Elsevier, vol. 237(C).
    11. Gao, Shan & Wang, Jinting & Zhang, Jie, 2023. "Reliability analysis of a redundant series system with common cause failures and delayed vacation," Reliability Engineering and System Safety, Elsevier, vol. 239(C).
    12. Liu, Baoliang & Cui, Lirong & Wen, Yanqing & Shen, Jingyuan, 2015. "A cold standby repairable system with working vacations and vacation interruption following Markovian arrival process," Reliability Engineering and System Safety, Elsevier, vol. 142(C), pages 1-8.
    13. Jia Kang & Linmin Hu & Rui Peng & Yan Li & Ruiling Tian, 2023. "Availability and cost-benefit evaluation for a repairable retrial system with warm standbys and priority," Statistical Theory and Related Fields, Taylor & Francis Journals, vol. 7(2), pages 164-175, April.
    14. Wang, Guanjun & Peng, Rui & Xing, Liudong, 2018. "Reliability evaluation of unrepairable k-out-of-n: G systems with phased-mission requirements based on record values," Reliability Engineering and System Safety, Elsevier, vol. 178(C), pages 191-197.
    15. Wang, Yan & Hu, Linmin & Yang, Li & Li, Jing, 2022. "Reliability modeling and analysis for linear consecutive-k-out-of-n: F retrial systems with two maintenance activities," Reliability Engineering and System Safety, Elsevier, vol. 226(C).
    16. Ritu Gupta & Divya Agarwal, 2021. "Cost analysis of N-policy vacation machine repair problem with optional repair," International Journal of Mathematics in Operational Research, Inderscience Enterprises Ltd, vol. 19(3), pages 354-374.
    17. Wu, Chia-Huang & Yen, Tseng-Chang & Wang, Kuo-Hsiung, 2021. "Availability and Comparison of Four Retrial Systems with Imperfect Coverage and General Repair Times," Reliability Engineering and System Safety, Elsevier, vol. 212(C).
    18. Hsu, Ying-Lin & Lee, Ssu-Lang & Ke, Jau-Chuan, 2009. "A repairable system with imperfect coverage and reboot: Bayesian and asymptotic estimation," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 79(7), pages 2227-2239.
    19. Yang, Dong-Yuh & Wu, Chia-Huang, 2021. "Evaluation of the availability and reliability of a standby repairable system incorporating imperfect switchovers and working breakdowns," Reliability Engineering and System Safety, Elsevier, vol. 207(C).
    20. Hsu, Ying-Lin & Ke, Jau-Chuan & Liu, Tzu-Hsin, 2011. "Standby system with general repair, reboot delay, switching failure and unreliable repair facility—A statistical standpoint," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 81(11), pages 2400-2413.
    21. Yang, Dong-Yuh & Tsao, Chih-Lung, 2019. "Reliability and availability analysis of standby systems with working vacations and retrial of failed components," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 46-55.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gao, Shan & Wang, Jinting & Zhang, Jie, 2023. "Reliability analysis of a redundant series system with common cause failures and delayed vacation," Reliability Engineering and System Safety, Elsevier, vol. 239(C).
    2. Yang, Dong-Yuh & Wu, Chia-Huang, 2021. "Evaluation of the availability and reliability of a standby repairable system incorporating imperfect switchovers and working breakdowns," Reliability Engineering and System Safety, Elsevier, vol. 207(C).
    3. Wang, Yan & Hu, Linmin & Yang, Li & Li, Jing, 2022. "Reliability modeling and analysis for linear consecutive-k-out-of-n: F retrial systems with two maintenance activities," Reliability Engineering and System Safety, Elsevier, vol. 226(C).
    4. Wu, Chia-Huang & Yen, Tseng-Chang & Wang, Kuo-Hsiung, 2021. "Availability and Comparison of Four Retrial Systems with Imperfect Coverage and General Repair Times," Reliability Engineering and System Safety, Elsevier, vol. 212(C).
    5. Wang, Kuo-Hsiung & Wu, Chia-Huang & Yen, Tseng-Chang, 2022. "Comparative cost-benefit analysis of four retrial systems with preventive maintenance and unreliable service station," Reliability Engineering and System Safety, Elsevier, vol. 221(C).
    6. Gao, Shan, 2023. "Reliability analysis and optimization for a redundant system with dependent failures and variable repair rates," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 208(C), pages 637-659.
    7. Li, Mingjia & Hu, Linmin & Peng, Rui & Bai, Zhuoxin, 2021. "Reliability modeling for repairable circular consecutive-k-out-of-n: F systems with retrial feature," Reliability Engineering and System Safety, Elsevier, vol. 216(C).
    8. Zhou, Siwei & Ye, Luyao & Xiong, Shengwu & Xiang, Jianwen, 2022. "Reliability analysis of dynamic fault trees with Priority-AND gates based on irrelevance coverage model," Reliability Engineering and System Safety, Elsevier, vol. 224(C).
    9. Shekhar, Chandra & Kumar, Neeraj & Gupta, Amit & Kumar, Amit & Varshney, Shreekant, 2020. "Warm-spare provisioning computing network with switching failure, common cause failure, vacation interruption, and synchronized reneging," Reliability Engineering and System Safety, Elsevier, vol. 199(C).
    10. Yu, Xiaoyun & Hu, Linmin & Ma, Mengrao, 2023. "Reliability measures of discrete time k-out-of-n: G retrial systems based on Bernoulli shocks," Reliability Engineering and System Safety, Elsevier, vol. 239(C).
    11. Cheng, Dawei & Lu, Zhong & Zhou, Jia & Liang, Xihui, 2023. "An optimizing maintenance policy for airborne redundant systems operating with faults by using Markov process and NSGA-II," Reliability Engineering and System Safety, Elsevier, vol. 236(C).
    12. Yang, Dong-Yuh & Tsao, Chih-Lung, 2019. "Reliability and availability analysis of standby systems with working vacations and retrial of failed components," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 46-55.
    13. Zhang, Changzhen & Yang, Jun & Li, Mingjia & Wang, Ning, 2024. "Reliability analysis of a two-dimensional linear consecutive-(r,s)-out-of-(m,n): F repairable system," Reliability Engineering and System Safety, Elsevier, vol. 242(C).
    14. Gao, Shan & Wang, Jinting, 2021. "Reliability and availability analysis of a retrial system with mixed standbys and an unreliable repair facility," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    15. de Araujo, Matheus Soares & da Silva, Leandro Dias & Sobrinho, Álvaro & Cunha, Paulo & Montecchi, Leonardo, 2022. "Reliability analysis of multi-parameter monitoring systems for Intensive Care Units," Reliability Engineering and System Safety, Elsevier, vol. 226(C).
    16. Kumar, Pankaj & Jain, Madhu, 2020. "Reliability analysis of a multi-component machining system with service interruption, imperfect coverage, and reboot," Reliability Engineering and System Safety, Elsevier, vol. 202(C).
    17. Rohit Patawa & Pramendra Singh Pundir & Alok Kumar Sigh & Abhinav Singh, 2022. "Some inferences on reliability measures of two-non-identical units cold standby system waiting for repair," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(1), pages 172-188, February.
    18. Finkelstein, Maxim & Cha, Ji Hwan & Langston, Amy, 2022. "Optimal preventive switching of components in degrading systems," Reliability Engineering and System Safety, Elsevier, vol. 219(C).
    19. Li, Yan & Zhang, Wei & Liu, Baoliang & Wang, Xiaofeng, 2024. "Availability and maintenance strategy under time-varying environments for redundant repairable systems with PH distributions," Reliability Engineering and System Safety, Elsevier, vol. 246(C).
    20. Quintanilha, Igor M. & Elias, Vitor R.M. & da Silva, Felipe B. & Fonini, Pedro A.M. & da Silva, Eduardo A.B. & Netto, Sergio L. & Apolinário, José A. & de Campos, Marcello L.R. & Martins, Wallace A., 2021. "A fault detector/classifier for closed-ring power generators using machine learning," Reliability Engineering and System Safety, Elsevier, vol. 212(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:reensy:v:249:y:2024:i:c:s0951832024003028. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: https://www.journals.elsevier.com/reliability-engineering-and-system-safety .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.