
Spatiotemporal Modeling of Node Temperatures in Supercomputers

Author

Listed:
  • Curtis B. Storlie
  • Brian J. Reich
  • William N. Rust
  • Lawrence O. Ticknor
  • Amanda M. Bonnie
  • Andrew J. Montoya
  • Sarah E. Michalak

Abstract

Los Alamos National Laboratory is home to many large supercomputing clusters. These clusters require an enormous amount of power (∼500–2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently, a project was initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines that were housed there. Coupled with this goal was the aim to develop a general good practice for characterizing the effect of cooling changes and monitoring machine node temperatures in this and other machine rooms. This article focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1600 nodes that run a variety of jobs during general use. Because extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. The same approach can easily be applied to monitor and investigate cooling systems at other data centers as well. Supplementary materials for this article are available online.
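
To make the marginal model mentioned in the abstract more concrete, the following is a minimal sketch of one common way to splice a Normal body with a generalized Pareto upper tail, and of the probability-integral transform that a Gaussian copula relies on. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gpd_cdf`, `spliced_cdf`, `to_gaussian_score`), the threshold-based splicing rule, and all parameter values are hypothetical and do not come from the article.

```python
import math
from statistics import NormalDist


def gpd_cdf(y, shape, scale):
    """CDF of the generalized Pareto distribution for an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if abs(shape) < 1e-12:
        # Limiting case shape -> 0 is the exponential distribution.
        return 1.0 - math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    if base <= 0.0:
        # For shape < 0 the distribution has a finite upper endpoint.
        return 1.0
    return 1.0 - base ** (-1.0 / shape)


def spliced_cdf(x, mu, sigma, threshold, shape, scale):
    """Normal CDF up to the threshold, rescaled GPD tail above it (assumed splicing rule)."""
    body = NormalDist(mu, sigma)
    p_u = body.cdf(threshold)  # probability mass at or below the threshold
    if x <= threshold:
        return body.cdf(x)
    return p_u + (1.0 - p_u) * gpd_cdf(x - threshold, shape, scale)


def to_gaussian_score(x, **params):
    """Map a temperature to a standard Normal score via the spliced CDF.

    In a Gaussian copula, dependence is modeled on these scores rather than
    on the raw temperatures.
    """
    p = spliced_cdf(x, **params)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(p)


# Hypothetical parameter values, chosen only for illustration.
params = dict(mu=60.0, sigma=5.0, threshold=70.0, shape=0.1, scale=2.0)

if __name__ == "__main__":
    for temp in (55.0, 60.0, 70.0, 75.0, 85.0):
        print(temp, spliced_cdf(temp, **params), to_gaussian_score(temp, **params))
```

With this construction the CDF is continuous at the threshold, and readings above the threshold are governed by the tail parameters alone, which is what lets the extremes be modeled separately from typical temperatures.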

Suggested Citation

  • Curtis B. Storlie & Brian J. Reich & William N. Rust & Lawrence O. Ticknor & Amanda M. Bonnie & Andrew J. Montoya & Sarah E. Michalak, 2017. "Spatiotemporal Modeling of Node Temperatures in Supercomputers," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 92-108, January.
  • Handle: RePEc:taf:jnlasa:v:112:y:2017:i:517:p:92-108
    DOI: 10.1080/01621459.2016.1195271

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2016.1195271
    Download Restriction: Access to full text is restricted to subscribers.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Padoan, Simone A., 2013. "Extreme dependence models based on event magnitude," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 1-19.
    2. Samuel A. Morris & Brian J. Reich & Emeric Thibaud & Daniel Cooley, 2017. "A space-time skew-t model for threshold exceedances," Biometrics, The International Biometric Society, vol. 73(3), pages 749-758, September.
    3. Håvard Rue & Sara Martino & Nicolas Chopin, 2009. "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 319-392, April.
    4. A. Abu-Awwad & V. Maume-Deschamps & P. Ribereau, 2021. "Semiparametric estimation for space-time max-stable processes: an F-madogram-based approach," Statistical Inference for Stochastic Processes, Springer, vol. 24(2), pages 241-276, July.
    5. A. Abu-Awwad & V. Maume-Deschamps & P. Ribereau, 2020. "Fitting spatial max-mixture processes with unknown extremal dependence class: an exploratory analysis tool," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(2), pages 479-522, June.
    6. Raphaël Huser & Marc G. Genton, 2016. "Non-Stationary Dependence Structures for Spatial Extremes," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(3), pages 470-491, September.
    7. Gijbels, Irène & Sznajder, Dominik, 2013. "Testing tail monotonicity by constrained copula estimation," Insurance: Mathematics and Economics, Elsevier, vol. 52(2), pages 338-351.
    8. Ziqiang Xing & Denghua Yan & Cheng Zhang & Gang Wang & Dongdong Zhang, 2015. "Spatial Characterization and Bivariate Frequency Analysis of Precipitation and Runoff in the Upper Huai River Basin, China," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(9), pages 3291-3304, July.
    9. R de Fondeville & A C Davison, 2018. "High-dimensional peaks-over-threshold inference," Biometrika, Biometrika Trust, vol. 105(3), pages 575-592.
    10. Marta Ferreira & Helena Ferreira, 2017. "Analyzing the Gaver—Lewis Pareto Process under an Extremal Perspective," Risks, MDPI, vol. 5(3), pages 1-12, June.
    11. repec:jss:jstsof:21:i08 is not listed on IDEAS
    12. Liu, Wei-han, 2016. "A re-examination of maturity effect of energy futures price from the perspective of stochastic volatility," Energy Economics, Elsevier, vol. 56(C), pages 351-362.
    13. Robert, Christian Y., 2013. "Some new classes of stationary max-stable random fields," Statistics & Probability Letters, Elsevier, vol. 83(6), pages 1496-1503.
    14. Mohamad Haytham Klaho & Hamid R. Safavi & Mohammad H. Golmohammadi & Maamoun Alkntar, 2022. "Comparison between bivariate and trivariate flood frequency analysis using the Archimedean copula functions, a case study of the Karun River in Iran," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 112(2), pages 1589-1610, June.
    15. Chang, Meng-Shiuh & Wu, Ximing, 2015. "Transformation-based nonparametric estimation of multivariate densities," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 71-88.
    16. Lee Fawcett & David Walshaw, 2014. "Estimating the probability of simultaneous rainfall extremes within a region: a spatial approach," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(5), pages 959-976, May.
    17. Matteo Coronese & Francesco Lamperti & Francesca Chiaromonte & Andrea Roventini, 2018. "Natural Disaster Risk and the Distributional Dynamics of Damages," LEM Papers Series 2018/22, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    18. Krupskii, Pavel & Huser, Raphaël, 2024. "Max-convolution processes with random shape indicator kernels," Journal of Multivariate Analysis, Elsevier, vol. 203(C).
    19. Hongxiang Yan & Hamid Moradkhani, 2016. "Toward more robust extreme flood prediction by Bayesian hierarchical and multimodeling," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(1), pages 203-225, March.
    20. Tjøstheim, Dag & Hufthammer, Karl Ove, 2013. "Local Gaussian correlation: A new measure of dependence," Journal of Econometrics, Elsevier, vol. 172(1), pages 33-48.
    21. A Ford Ramsey, 2020. "Probability Distributions of Crop Yields: A Bayesian Spatial Quantile Regression Approach," American Journal of Agricultural Economics, John Wiley & Sons, vol. 102(1), pages 220-239, January.
