Printed from https://ideas.repec.org/a/eee/reensy/v182y2019icp107-119.html

Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs

Authors

  • Hoque, Khaza Anuarul
  • Ait Mohamed, Otmane
  • Savaria, Yvon

Abstract

SRAM-based FPGAs are popular in the aerospace industry for their field programmability and low cost. However, they suffer from cosmic radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR) is a well-known technique to mitigate SEUs in FPGAs and is often used with another SEU mitigation technique known as configuration scrubbing. Traditional TMR provides protection against a single fault at a time, while partitioned TMR provides improved reliability and availability. In this paper, we present a methodology to analyze TMR partitioning at an early design stage using probabilistic model checking. The proposed formal model can capture both single and multiple-cell upset scenarios without assuming equal partition sizes. Starting with a high-level description of a design, a Markov model is constructed from the Data Flow Graph (DFG) using a specified number of partitions, a component characterization library, and a user-defined scrub rate. Such a model, together with exhaustive analysis, captures all the considered failures and repairs possible in the system within the radiation environment. Various reliability and availability properties are then verified automatically using the PRISM model checker, exploring the relationship between the scrub frequency and the number of TMR partitions required to meet the design requirements. The reported results also show that, given a known voter failure rate, it is possible to find an optimal number of partitions at early design stages using our proposed method.
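
The trade-off between the number of partitions and the voter failure rate can be illustrated with a textbook closed-form calculation. The sketch below is not the paper's Markov model or its PRISM encoding. It assumes equal-sized partitions (which the paper explicitly does not require), no scrubbing or repair, one non-redundant voter per partition, and independent exponential failures. The function names and the parameters `lam`, `lam_voter`, `k`, and `t` are hypothetical and chosen only for this illustration.

```python
import math


def tmr_reliability(r: float) -> float:
    """Probability that at least 2 of 3 identical replicas work,
    given each replica works with probability r (perfect voter)."""
    return 3 * r**2 - 2 * r**3


def partitioned_tmr_reliability(lam: float, lam_voter: float, k: int, t: float) -> float:
    """Reliability at time t of a design split into k equal TMR partitions.

    Assumptions (illustrative only, not taken from the paper):
      - each replica of the whole design fails at rate lam, so each
        partition replica fails at rate lam / k;
      - each partition has one voter failing at rate lam_voter;
      - no scrubbing/repair; all failures are independent.
    """
    r_part = math.exp(-lam * t / k)       # one replica of one partition
    r_voter = math.exp(-lam_voter * t)    # that partition's voter
    # Partitions are in series: every partition (and its voter) must work.
    return (tmr_reliability(r_part) * r_voter) ** k


def best_partition_count(lam: float, lam_voter: float, t: float, max_k: int = 64) -> int:
    """Partition count in 1..max_k that maximizes reliability at time t."""
    return max(
        range(1, max_k + 1),
        key=lambda k: partitioned_tmr_reliability(lam, lam_voter, k, t),
    )


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the shape of the trade-off.
    lam, lam_voter, t = 1e-3, 1e-6, 1000.0
    k_best = best_partition_count(lam, lam_voter, t)
    print(k_best, partitioned_tmr_reliability(lam, lam_voter, k_best, t))
```

Under these assumptions, finer partitioning lowers the chance that two replicas of the same partition fail, while every extra partition adds a voter that can itself fail, so a finite optimum appears once the voter failure rate is nonzero. Capturing scrubbing, multiple-cell upsets, and unequal partitions requires a state-based model such as the one the abstract describes.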

Suggested Citation

  • Hoque, Khaza Anuarul & Ait Mohamed, Otmane & Savaria, Yvon, 2019. "Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs," Reliability Engineering and System Safety, Elsevier, vol. 182(C), pages 107-119.
  • Handle: RePEc:eee:reensy:v:182:y:2019:i:c:p:107-119
    DOI: 10.1016/j.ress.2018.10.011

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0951832018304034
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations


    Cited by:

    1. Ramezani, Reza & Ghavidel, Abolfazl & Sedaghat, Yasser, 2021. "Exact and efficient reliability and performance optimization of synchronous task graphs," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    2. Wang, Xiaoyue & Zhao, Xian & Wang, Siqi & Sun, Leping, 2020. "Reliability and maintenance for performance-balanced systems operating in a shock environment," Reliability Engineering and System Safety, Elsevier, vol. 195(C).
    3. Yang, Shunkun & Shao, Qi & Bian, Chong, 2022. "Reliability analysis of ensemble fault tolerance for soft error mitigation against complex radiation effect," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    4. Granig, Wolfgang & Faller, Lisa-Marie & Hammerschmidt, Dirk & Zangl, Hubert, 2019. "Dependability considerations of redundant sensor systems," Reliability Engineering and System Safety, Elsevier, vol. 190(C), pages 1-1.
    5. Cheng, Yao & Elsayed, E.A. & Chen, Xi, 2021. "Random Multi Hazard Resilience Modeling of Engineered Systems and Critical Infrastructure," Reliability Engineering and System Safety, Elsevier, vol. 209(C).
    6. Jung, Sejin & Yoo, Junbeom & Lee, Young-Jun, 2020. "A practical application of NUREG/CR-6430 software safety hazard analysis to FPGA software," Reliability Engineering and System Safety, Elsevier, vol. 202(C).
    7. Ramezani, Reza & Clemente, Juan Antonio & Franco, Francisco J., 2020. "Analytical reliability estimation of SRAM-based FPGA designs against single-bit and multiple-cell upsets," Reliability Engineering and System Safety, Elsevier, vol. 202(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jung, Seunghwa & Choi, Jihwan P., 2019. "Predicting system failure rates of SRAM-based FPGA on-board processors in space radiation environments," Reliability Engineering and System Safety, Elsevier, vol. 183(C), pages 374-386.
    2. Ramezani, Reza & Ghavidel, Abolfazl & Sedaghat, Yasser, 2021. "Exact and efficient reliability and performance optimization of synchronous task graphs," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    3. Yang, Shunkun & Shao, Qi & Bian, Chong, 2022. "Reliability analysis of ensemble fault tolerance for soft error mitigation against complex radiation effect," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    4. Ramezani, Reza & Sedaghat, Yasser & Naghibzadeh, Mahmoud & Clemente, Juan Antonio, 2018. "A decomposition-based reliability and makespan optimization technique for hardware task graphs," Reliability Engineering and System Safety, Elsevier, vol. 180(C), pages 13-24.
    5. Ramezani, Reza & Clemente, Juan Antonio & Franco, Francisco J., 2020. "Analytical reliability estimation of SRAM-based FPGA designs against single-bit and multiple-cell upsets," Reliability Engineering and System Safety, Elsevier, vol. 202(C).
    6. Villalta, Igor & Bidarte, Unai & Gómez-Cornejo, Julen & Jiménez, Jaime & Lázaro, Jesús, 2018. "SEU emulation in industrial SoCs combining microprocessor and FPGA," Reliability Engineering and System Safety, Elsevier, vol. 170(C), pages 53-63.
