Cost minimization of real-time mission for software systems with rejuvenation

Authors

  • Levitin, Gregory
  • Xing, Liudong
  • Xiang, Yanping

Abstract

This paper considers the optimal rejuvenation policy problem for software systems performing real-time tasks. Because of software aging, system performance deteriorates over time and eventually leads to a system crash, which can be catastrophic in critical applications. Software rejuvenation is widely used to counteract aging and thereby prevent a crash or reduce its probability, but at the cost of extra system overhead and downtime. We derive the optimal state-based rejuvenation policy that minimizes the total expected mission cost for aging software systems with multiple performance degradation states. The solution uses an event transition-based numerical method to assess the total expected mission cost of a real-time task, which comprises the penalty cost of mission failure, the operation cost of running the mission software, and the expected rejuvenation cost. The proposed cost evaluation model accommodates arbitrary state transition time distributions. It also allows simultaneous evaluation of other system performance metrics, including the probability of successful task completion (i.e., mission reliability) and the conditional expected mission completion time given that the mission succeeds. Examples demonstrate the proposed methodology and optimization, and the effects of different parameters on the optimal rejuvenation solution are investigated.
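
As a rough illustration of the cost structure the abstract describes (failure penalty, operation cost, rejuvenation cost), the Python sketch below estimates the expected mission cost of a state-based rejuvenation policy. It uses plain Monte Carlo simulation, not the paper's event transition-based numerical method. Everything in it is an assumption made for illustration and is not taken from the paper: the parameter values, the Weibull sojourn times, the rule that completed work is kept across a rejuvenation, and the function names `simulate_mission`, `expected_cost`, and `best_policy`.

```python
import random

# Hypothetical parameters, chosen only for illustration (not from the paper).
WORK = 100.0                 # amount of work the task requires
DEADLINE = 150.0             # real-time deadline of the mission
RATES = (1.0, 0.8, 0.5)      # work rate in each degradation state
SCALES = (40.0, 30.0, 20.0)  # Weibull scale of the sojourn time in each state
SHAPE = 1.5                  # Weibull shape (any distribution could be sampled)
T_REJ = 5.0                  # downtime of one rejuvenation
C_OP, C_REJ, C_FAIL = 1.0, 10.0, 1000.0  # per run-time unit, per rejuvenation, per failure


def simulate_mission(r, rng, work=WORK):
    """Simulate one mission and return (cost, success).

    The system is rejuvenated on entering state r; r == len(RATES) means
    it is never rejuvenated.
    """
    t = done = cost = 0.0
    state = 0
    while True:
        if state == len(RATES):      # left the last degraded state: crash
            return cost + C_FAIL, False
        if state == r:               # state-based rejuvenation trigger
            t += T_REJ
            cost += C_REJ
            state = 0                # assumption: as good as new, work is kept
            if t >= DEADLINE:
                return cost + C_FAIL, False
            continue
        sojourn = rng.weibullvariate(SCALES[state], SHAPE)
        need = (work - done) / RATES[state]   # time to finish at this rate
        run = min(sojourn, need, DEADLINE - t)
        t += run
        done += run * RATES[state]
        cost += C_OP * run
        if done >= work - 1e-9:      # task completed in time
            return cost, True
        if t >= DEADLINE - 1e-9:     # deadline missed
            return cost + C_FAIL, False
        state += 1                   # degrade to the next state


def expected_cost(r, n=20000, seed=0):
    """Return (mean mission cost, estimated mission reliability) for policy r."""
    rng = random.Random(seed)
    total = 0.0
    successes = 0
    for _ in range(n):
        cost, ok = simulate_mission(r, rng)
        total += cost
        successes += ok
    return total / n, successes / n


def best_policy(n=20000, seed=0):
    """Return (r, mean cost) for the trigger state with the lowest estimated cost."""
    costs = {r: expected_cost(r, n, seed)[0] for r in range(1, len(RATES) + 1)}
    r_best = min(costs, key=costs.get)
    return r_best, costs[r_best]
```

Calling `best_policy()` compares the trigger states 1 to 3 under these made-up parameters. The second value returned by `expected_cost` is the simulated counterpart of the mission reliability mentioned in the abstract.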

Suggested Citation

  • Levitin, Gregory & Xing, Liudong & Xiang, Yanping, 2020. "Cost minimization of real-time mission for software systems with rejuvenation," Reliability Engineering and System Safety, Elsevier, vol. 193(C).
  • Handle: RePEc:eee:reensy:v:193:y:2020:i:c:s0951832019303503
    DOI: 10.1016/j.ress.2019.106593

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832019303503
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Machida, Fumio & Miyoshi, Naoto, 2017. "Analysis of an optimal stopping problem for software rejuvenation in a deteriorating job processing system," Reliability Engineering and System Safety, Elsevier, vol. 168(C), pages 128-135.
    2. Dohi, Tadashi & Zheng, Junjun & Okamura, Hiroyuki & Trivedi, Kishor S., 2018. "Optimal periodic software rejuvenation policies based on interval reliability criteria," Reliability Engineering and System Safety, Elsevier, vol. 180(C), pages 463-475.
    3. Levitin, Gregory & Xing, Liudong & Huang, Hong-Zhong, 2019. "Optimization of partial software rejuvenation policy," Reliability Engineering and System Safety, Elsevier, vol. 188(C), pages 289-296.
    4. Levitin, Gregory & Xing, Liudong & Ben-Haim, Hanoch, 2018. "Optimizing software rejuvenation policy for real time tasks," Reliability Engineering and System Safety, Elsevier, vol. 176(C), pages 202-208.
    5. Levitin, Gregory & Xing, Liudong & Luo, Liang, 2019. "Joint optimal checkpointing and rejuvenation policy for real-time computing tasks," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 63-72.
    6. Levitin, Gregory & Xing, Liudong & Dai, Yuanshun, 2014. "Cold vs. hot standby mission operation cost minimization for 1-out-of-N systems," European Journal of Operational Research, Elsevier, vol. 234(1), pages 155-162.

    Citations

    Cited by:

    1. Khastgir, Siddartha & Brewerton, Simon & Thomas, John & Jennings, Paul, 2021. "Systems Approach to Creating Test Scenarios for Automated Driving Systems," Reliability Engineering and System Safety, Elsevier, vol. 215(C).
    2. Levitin, Gregory & Xing, Liudong & Xiang, Yanping, 2020. "Optimizing software rejuvenation policy for tasks with periodic inspections and time limitation," Reliability Engineering and System Safety, Elsevier, vol. 197(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Levitin, Gregory & Xing, Liudong & Xiang, Yanping, 2020. "Optimizing software rejuvenation policy for tasks with periodic inspections and time limitation," Reliability Engineering and System Safety, Elsevier, vol. 197(C).
    2. Levitin, Gregory & Xing, Liudong & Huang, Hong-Zhong, 2019. "Optimization of partial software rejuvenation policy," Reliability Engineering and System Safety, Elsevier, vol. 188(C), pages 289-296.
    3. Junjun Zheng & Hiroyuki Okamura & Tadashi Dohi, 2021. "Availability Analysis of Software Systems with Rejuvenation and Checkpointing," Mathematics, MDPI, vol. 9(8), pages 1-15, April.
    4. Wen, Tao & Deng, Yong, 2020. "The vulnerability of communities in complex networks: An entropy approach," Reliability Engineering and System Safety, Elsevier, vol. 196(C).
    5. Nan Zhang & Sen Tian & Le Li & Zhongbin Wang & Jun Zhang, 2023. "Maintenance analysis of a partial observable K-out-of-N system with load sharing units," Journal of Risk and Reliability, vol. 237(4), pages 703-713, August.
    6. Amirhossain Chambari & Javad Sadeghi & Fakhri Bakhtiari & Reza Jahangard, 2016. "A note on a reliability redundancy allocation problem using a tuned parameter genetic algorithm," OPSEARCH, Springer;Operational Research Society of India, vol. 53(2), pages 426-442, June.
    7. Chatwattanasiri, Nida & Coit, David W. & Wattanapongsakorn, Naruemon, 2016. "System redundancy optimization with uncertain stress-based component reliability: Minimization of regret," Reliability Engineering and System Safety, Elsevier, vol. 154(C), pages 73-83.
    8. Levitin, Gregory & Finkelstein, Maxim & Dai, Yuanshun, 2018. "Optimizing availability of heterogeneous standby systems exposed to shocks," Reliability Engineering and System Safety, Elsevier, vol. 170(C), pages 137-145.
    9. Wu, Shaomin & Do, Phuc, 2017. "Editorial," Reliability Engineering and System Safety, Elsevier, vol. 168(C), pages 1-3.
    10. Kim, Heungseob, 2018. "Maximization of system reliability with the consideration of component sequencing," Reliability Engineering and System Safety, Elsevier, vol. 170(C), pages 64-72.
    11. Kim, Heungseob & Kim, Pansoo, 2017. "Reliability models for a nonrepairable system with heterogeneous components having a phase-type time-to-failure distribution," Reliability Engineering and System Safety, Elsevier, vol. 159(C), pages 37-46.
    12. Chen, Ying & Wang, Ze & Li, YingYi & Kang, Rui & Mosleh, Ali, 2018. "Reliability analysis of a cold-standby system considering the development stages and accumulations of failure mechanisms," Reliability Engineering and System Safety, Elsevier, vol. 180(C), pages 1-12.
    13. Zhang, Nan & Fouladirad, Mitra & Barros, Anne, 2019. "Reliability-based measures and prognostic analysis of a K-out-of-N system in a random environment," European Journal of Operational Research, Elsevier, vol. 272(3), pages 1120-1131.
    14. Caserta, Marco & Voß, Stefan, 2015. "An exact algorithm for the reliability redundancy allocation problem," European Journal of Operational Research, Elsevier, vol. 244(1), pages 110-116.
    15. Levitin, Gregory & Xing, Liudong & Luo, Liang, 2019. "Joint optimal checkpointing and rejuvenation policy for real-time computing tasks," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 63-72.
    16. Ruiz-Castro, Juan Eloy & Dawabsha, Mohammed & Alonso, Francisco Javier, 2018. "Discrete-time Markovian arrival processes to model multi-state complex systems with loss of units and an indeterminate variable number of repairpersons," Reliability Engineering and System Safety, Elsevier, vol. 174(C), pages 114-127.
    17. Heping Jia & Rui Peng & Yi Ding & Yonghua Song, 2019. "Reliability of demand-based warm standby system with common bus performance sharing," Journal of Risk and Reliability, vol. 233(4), pages 580-592, August.
    18. Coit, David W. & Zio, Enrico, 2019. "The evolution of system reliability optimization," Reliability Engineering and System Safety, Elsevier, vol. 192(C).
    19. Tadashi Dohi & Hiroyuki Okamura & Cun-Hua Qian, 2022. "Computation algorithms for workload-dependent optimal checkpoint placement," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(2), pages 788-796, June.
    20. Ruiz-Castro, Juan Eloy, 2016. "Complex multi-state systems modelled through marked Markovian arrival processes," European Journal of Operational Research, Elsevier, vol. 252(3), pages 852-865.
