
Optimizing software rejuvenation policy for tasks with periodic inspections and time limitation

Author

Listed:
  • Levitin, Gregory
  • Xing, Liudong
  • Xiang, Yanping

Abstract

Software aging has been observed in diverse types of software systems, causing gradual performance degradation with time and/or load and, eventually, system failures. To mitigate aging effects and prevent the serious losses caused by system failure, software rejuvenation can be performed proactively to restore system performance. This paper models and optimizes a state-based rejuvenation policy for software systems performing real-time computing tasks and undergoing periodic inspections. During each scheduled inspection, the system state is evaluated, and the decision whether to rejuvenate is made from the evaluated state and a rejuvenation decision function. The duration of each rejuvenation procedure (corresponding to the system downtime) depends on the system state as well as on the amount of task operations accomplished before the decision to rejuvenate. Because the rejuvenation policy determines the timing and number of rejuvenations performed during task processing, it can significantly affect the probability that the system accomplishes the real-time task by a given deadline. In this work, we optimize the state-based rejuvenation policy to maximize the probability of task completion (PTC) of periodically inspected software systems. The methodology combines an event transition-based iterative method for quantifying the PTC with a genetic algorithm for deriving the optimal rejuvenation policy. Examples demonstrate the proposed methodology and the influence of several parameters (e.g., inspection interval, rejuvenation time) on the optimization results.
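
The kind of policy described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's event transition-based method, and it uses a plain exhaustive search over thresholds rather than a genetic algorithm. Everything in it is an assumption made for illustration: the discrete aging levels, the per-interval aging probability `p_age`, the slowdown factor, the linear rejuvenation-time formula, the threshold form of the decision function, and all default parameter values.

```python
import random


def simulate_task(threshold, rng, work=10.0, deadline=16.0, interval=1.0,
                  fail_state=4, p_age=0.35, slowdown=0.15,
                  rejuv_base=0.3, rejuv_per_state=0.2, rejuv_per_work=0.02):
    """Return True if one random run finishes `work` units before `deadline`.

    Toy model (all assumed): the aging state is an integer level that may
    rise by one per inspection interval; reaching `fail_state` is a failure.
    """
    t, done, state = 0.0, 0.0, 0
    while t < deadline:
        # Process until the next inspection; an aged system works more slowly.
        rate = 1.0 / (1.0 + slowdown * state)
        step = min(interval, deadline - t)
        if done + rate * step >= work:
            return True  # task finishes inside this interval
        done += rate * step
        t += step
        # Aging is applied at the end of the interval (a simplification).
        if rng.random() < p_age:
            state += 1
        if state >= fail_state:
            return False  # system failure, task lost
        # Inspection: hypothetical threshold-type decision function.
        if state >= threshold:
            # Downtime grows with the state and with the work done so far.
            t += rejuv_base + rejuv_per_state * state + rejuv_per_work * done
            state = 0
    return False


def estimate_ptc(threshold, runs=5000, seed=0, **params):
    """Estimate the probability of task completion (PTC) by simulation."""
    rng = random.Random(seed)
    hits = sum(simulate_task(threshold, rng, **params) for _ in range(runs))
    return hits / runs


def best_threshold(fail_state=4, **kwargs):
    """Exhaustive search over thresholds (stand-in for the paper's GA)."""
    candidates = range(fail_state + 1)
    return max(candidates,
               key=lambda th: estimate_ptc(th, fail_state=fail_state, **kwargs))


if __name__ == "__main__":
    for th in range(5):
        print(th, estimate_ptc(th))
    print("best threshold:", best_threshold())
```

A threshold of 0 rejuvenates at every inspection, while a threshold equal to `fail_state` never rejuvenates; comparing the estimated PTC across thresholds shows the trade-off between downtime and failure risk that the abstract refers to.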

Suggested Citation

  • Levitin, Gregory & Xing, Liudong & Xiang, Yanping, 2020. "Optimizing software rejuvenation policy for tasks with periodic inspections and time limitation," Reliability Engineering and System Safety, Elsevier, vol. 197(C).
  • Handle: RePEc:eee:reensy:v:197:y:2020:i:c:s0951832019309457
    DOI: 10.1016/j.ress.2019.106776

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832019309457
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Levitin, Gregory & Xing, Liudong & Xiang, Yanping, 2020. "Cost minimization of real-time mission for software systems with rejuvenation," Reliability Engineering and System Safety, Elsevier, vol. 193(C).
    2. Levitin, Gregory & Xing, Liudong & Huang, Hong-Zhong, 2019. "Optimization of partial software rejuvenation policy," Reliability Engineering and System Safety, Elsevier, vol. 188(C), pages 289-296.
    3. Levitin, Gregory & Xing, Liudong & Ben-Haim, Hanoch, 2018. "Optimizing software rejuvenation policy for real time tasks," Reliability Engineering and System Safety, Elsevier, vol. 176(C), pages 202-208.
    4. Levitin, Gregory & Xing, Liudong & Dai, Yuanshun, 2018. "Co-residence based data vulnerability vs. security in cloud computing system with random server assignment," European Journal of Operational Research, Elsevier, vol. 267(2), pages 676-686.
    5. Levitin, Gregory & Xing, Liudong & Luo, Liang, 2019. "Joint optimal checkpointing and rejuvenation policy for real-time computing tasks," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 63-72.
    6. Junjun Zheng & Hiroyuki Okamura & Tadashi Dohi, 2021. "Availability Analysis of Software Systems with Rejuvenation and Checkpointing," Mathematics, MDPI, vol. 9(8), pages 1-15, April.
    7. Wen, Tao & Deng, Yong, 2020. "The vulnerability of communities in complex networks: An entropy approach," Reliability Engineering and System Safety, Elsevier, vol. 196(C).
    8. Amirhossain Chambari & Javad Sadeghi & Fakhri Bakhtiari & Reza Jahangard, 2016. "A note on a reliability redundancy allocation problem using a tuned parameter genetic algorithm," OPSEARCH, Springer;Operational Research Society of India, vol. 53(2), pages 426-442, June.
    9. Levitin, Gregory & Xing, Liudong & Dai, Yuanshun, 2022. "Optimal sequencing of elements activation in 1-out-of-n warm standby system with storage," Reliability Engineering and System Safety, Elsevier, vol. 221(C).
    10. Levitin, Gregory & Finkelstein, Maxim & Dai, Yuanshun, 2018. "Optimizing availability of heterogeneous standby systems exposed to shocks," Reliability Engineering and System Safety, Elsevier, vol. 170(C), pages 137-145.
    11. Wu, Hui & Li, Yan-Fu & Bérenguer, Christophe, 2020. "Optimal inspection and maintenance for a repairable k-out-of-n: G warm standby system," Reliability Engineering and System Safety, Elsevier, vol. 193(C).
    12. Levitin, Gregory & Xing, Liudong & Haim, Hanoch Ben & Dai, Yuanshun, 2019. "Optimal structure of series system with 1-out-of-n warm standby subsystems performing operation and rescue functions," Reliability Engineering and System Safety, Elsevier, vol. 188(C), pages 523-531.
    13. Kim, Heungseob & Kim, Pansoo, 2017. "Reliability models for a nonrepairable system with heterogeneous components having a phase-type time-to-failure distribution," Reliability Engineering and System Safety, Elsevier, vol. 159(C), pages 37-46.
    14. Caserta, Marco & Voß, Stefan, 2015. "An exact algorithm for the reliability redundancy allocation problem," European Journal of Operational Research, Elsevier, vol. 244(1), pages 110-116.
    15. Ruiz-Castro, Juan Eloy & Dawabsha, Mohammed & Alonso, Francisco Javier, 2018. "Discrete-time Markovian arrival processes to model multi-state complex systems with loss of units and an indeterminate variable number of repairpersons," Reliability Engineering and System Safety, Elsevier, vol. 174(C), pages 114-127.
    16. Heping Jia & Rui Peng & Yi Ding & Yonghua Song, 2019. "Reliability of demand-based warm standby system with common bus performance sharing," Journal of Risk and Reliability, vol. 233(4), pages 580-592, August.
    17. Coit, David W. & Zio, Enrico, 2019. "The evolution of system reliability optimization," Reliability Engineering and System Safety, Elsevier, vol. 192(C).
    18. Tadashi Dohi & Hiroyuki Okamura & Cun-Hua Qian, 2022. "Computation algorithms for workload-dependent optimal checkpoint placement," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(2), pages 788-796, June.
    19. Zhenya Liu & Yuhao Mu, 2022. "Optimal Stopping Methods for Investment Decisions: A Literature Review," IJFS, MDPI, vol. 10(4), pages 1-23, October.
    20. Levitin, Gregory & Xing, Liudong & Peng, Sun & Dai, Yuanshun, 2015. "Optimal choice of standby modes in 1-out-of-N system with respect to mission reliability and cost," Applied Mathematics and Computation, Elsevier, vol. 258(C), pages 587-596.
