Printed from https://ideas.repec.org/a/gam/jeners/v15y2022i2p474-d721486.html

Cost Efficient GPU Cluster Management for Training and Inference of Deep Learning

Author

Listed:
  • Dong-Ki Kang

    (Division of Electronic and Information, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju 54896, Korea)

  • Ki-Beom Lee

    (Division of Electronic and Information, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju 54896, Korea)

  • Young-Chon Kim

    (Division of Electronic and Information, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju 54896, Korea)

Abstract

Expanding GPU-based deep learning (DL) clusters accelerates AI services but also incurs significant energy consumption costs. In this paper, we propose a cost-efficient deep learning job allocation (CE-DLA) approach that minimizes the energy consumption cost of DL cluster operation while guaranteeing the performance requirements of user requests. First, we categorize DL jobs into two classes: training jobs and inference jobs. Through architecture-agnostic modeling, CE-DLA can carefully map heterogeneous DL jobs to GPU computing nodes. Second, we design an electricity price-aware DL job allocation that minimizes the energy consumption cost of the cluster. We show that, by using a mixed-integer nonlinear problem (MINLP) formulation, our approach efficiently avoids running the GPU computing nodes during peak-rate time slots. We additionally integrate the dynamic right-sizing (DRS) method with CE-DLA to minimize the energy consumption of idle nodes that have no running job. To investigate the realistic behavior of our approach, we measure the actual output of NVIDIA GPU devices running well-known deep neural network (DNN) models. Using real electricity price trace data, we show that CE-DLA outperforms competing approaches in terms of both energy consumption cost and DL job processing performance.
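
The idea of electricity price-aware allocation can be illustrated with a minimal sketch. This is not the paper's MINLP formulation: it is a brute-force toy that assumes a single deferrable training job, one-hour time slots, constant node power draw, and a completion deadline. The function names, prices, and power figure are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def energy_cost(prices: List[float], power_kw: float, start: int, duration: int) -> float:
    """Cost of running one node at power_kw for `duration` one-hour slots from `start`."""
    return sum(prices[t] * power_kw for t in range(start, start + duration))


def cheapest_start(
    prices: List[float], power_kw: float, duration: int, deadline: int
) -> Tuple[int, float]:
    """Try every feasible start slot and keep the cheapest one.

    The job must occupy `duration` contiguous slots and finish by `deadline`
    (an exclusive slot index), which stands in for the performance requirement.
    """
    if duration > deadline or deadline > len(prices):
        raise ValueError("job cannot finish within the deadline")
    candidates = range(0, deadline - duration + 1)
    best = min(candidates, key=lambda s: energy_cost(prices, power_kw, s, duration))
    return best, energy_cost(prices, power_kw, best, duration)


# Hypothetical hourly prices ($/kWh) with a peak in slots 2-4.
prices = [0.10, 0.12, 0.30, 0.32, 0.28, 0.11, 0.09, 0.10]
start, cost = cheapest_start(prices, power_kw=2.0, duration=3, deadline=8)
immediate = energy_cost(prices, power_kw=2.0, start=0, duration=3)
print(f"start slot {start}: cost {cost:.2f} vs. {immediate:.2f} if started immediately")
```

With these made-up prices, deferring the job past the peak slots is cheaper than starting it immediately. A real allocator would also have to handle many concurrent jobs, heterogeneous nodes, latency-bound inference requests, and the power drawn by idle nodes, which is where an optimization formulation and a right-sizing policy come in.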

Suggested Citation

  • Dong-Ki Kang & Ki-Beom Lee & Young-Chon Kim, 2022. "Cost Efficient GPU Cluster Management for Training and Inference of Deep Learning," Energies, MDPI, vol. 15(2), pages 1-20, January.
  • Handle: RePEc:gam:jeners:v:15:y:2022:i:2:p:474-:d:721486

    Download full text from publisher

    File URL: https://www.mdpi.com/1996-1073/15/2/474/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1996-1073/15/2/474/
    Download Restriction: no

    References listed on IDEAS

    1. Kwon, Soongeol, 2020. "Ensuring renewable energy utilization with quality of service guarantee for energy-efficient data center operations," Applied Energy, Elsevier, vol. 276(C).

    Citations



    Cited by:

    1. Jakub Suder & Kacper Podbucki & Tomasz Marciniak, 2023. "Power Requirements Evaluation of Embedded Devices for Real-Time Video Line Detection," Energies, MDPI, vol. 16(18), pages 1-20, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Fengjuan & Lv, Chengwei & Xu, Jiuping, 2023. "Carbon awareness oriented data center location and configuration: An integrated optimization method," Energy, Elsevier, vol. 278(C).
    2. Han, Ouzhu & Ding, Tao & Zhang, Xiaosheng & Mu, Chenggang & He, Xinran & Zhang, Hongji & Jia, Wenhao & Ma, Zhoujun, 2023. "A shared energy storage business model for data center clusters considering renewable energy uncertainties," Renewable Energy, Elsevier, vol. 202(C), pages 1273-1290.
    3. Jiawen Yu & Yanqiu Yan & Yiqiang Jiang & Jie Ge, 2022. "Renewable energy configuration scheme of data center in cold area. A case study [An overview of renewable energy resources and grid integration for commercial building applications]," International Journal of Low-Carbon Technologies, Oxford University Press, vol. 17, pages 411-420.
    4. Han, Ouzhu & Ding, Tao & Yang, Miao & Jia, Wenhao & He, Xinran & Ma, Zhoujun, 2024. "A novel 4-level joint optimal dispatch for demand response of data centers with district autonomy realization," Applied Energy, Elsevier, vol. 358(C).
    5. Yu, Chin-Hsien & Wu, Xiuqin & Lee, Wen-Chieh & Zhao, Jinsong, 2021. "Resource misallocation in the Chinese wind power industry: The role of feed-in tariff policy," Energy Economics, Elsevier, vol. 98(C).
    6. Chen, Xiaoyuan & Jiang, Shan & Chen, Yu & Lei, Yi & Zhang, Donghui & Zhang, Mingshun & Gou, Huayu & Shen, Boyang, 2022. "A 10 MW class data center with ultra-dense high-efficiency energy distribution: Design and economic evaluation of superconducting DC busbar networks," Energy, Elsevier, vol. 250(C).
    7. Wang, Kaifeng & Ye, Lin & Yang, Shihui & Deng, Zhanfeng & Song, Jieying & Li, Zhuo & Zhao, Yongning, 2023. "A hierarchical dispatch strategy of hybrid energy storage system in internet data center with model predictive control," Applied Energy, Elsevier, vol. 331(C).
    8. Ye, Guisen & Gao, Feng & Fang, Jingyang, 2022. "A mission-driven two-step virtual machine commitment for energy saving of modern data centers through UPS and server coordinated optimizations," Applied Energy, Elsevier, vol. 322(C).
    9. Li, Weiwei & Qian, Tong & Zhang, Yin & Shen, Yueqing & Wu, Chenghu & Tang, Wenhu, 2023. "Distributionally robust chance-constrained planning for regional integrated electricity–heat systems with data centers considering wind power uncertainty," Applied Energy, Elsevier, vol. 336(C).
    10. Bian, Yifan & Xie, Lirong & Ye, Jiahao & Ma, Lan, 2024. "A new shared energy storage business model for data center clusters considering energy storage degradation," Renewable Energy, Elsevier, vol. 225(C).
    11. Mustapha Mukhtar & Victor Adebayo & Nasser Yimen & Olusola Bamisile & Emmanuel Osei-Mensah & Humphrey Adun & Qinxiu Zhang & Gexin Luo, 2022. "Towards Global Cleaner Energy and Hydrogen Production: A Review and Application ORC Integrality with Multigeneration Systems," Sustainability, MDPI, vol. 14(9), pages 1-25, April.
    12. Lin, Boqiang & Huang, Chenchen, 2023. "Promoting variable renewable energy integration: The moderating effect of digitalization," Applied Energy, Elsevier, vol. 337(C).
    13. Chen, Xiaoyuan & Jiang, Shan & Chen, Yu & Zou, Zhice & Shen, Boyang & Lei, Yi & Zhang, Donghui & Zhang, Mingshun & Gou, Huayu, 2022. "Energy-saving superconducting power delivery from renewable energy source to a 100-MW-class data center," Applied Energy, Elsevier, vol. 310(C).
    14. Wang, Jiangjiang & Deng, Hongda & Liu, Yi & Guo, Zeqing & Wang, Yongzhen, 2023. "Coordinated optimal scheduling of integrated energy system for data center based on computing load shifting," Energy, Elsevier, vol. 267(C).
    15. Daria Gritsenko & Jon Aaen & Bent Flyvbjerg, 2024. "Rethinking Digitalization and Climate: Don't Predict, Mitigate," Papers 2407.15016, arXiv.org.
    16. Liu, Wenyu & Yan, Yuejun & Sun, Yimeng & Mao, Hongju & Cheng, Ming & Wang, Peng & Ding, Zhaohao, 2023. "Online job scheduling scheme for low-carbon data center operation: An information and energy nexus perspective," Applied Energy, Elsevier, vol. 338(C).
    17. Xihao Wang & Xiaojun Wang & Yuqing Liu & Chun Xiao & Rongsheng Zhao & Ye Yang & Zhao Liu, 2022. "A Sustainability Improvement Strategy of Interconnected Data Centers Based on Dispatching Potential of Electric Vehicle Charging Stations," Sustainability, MDPI, vol. 14(11), pages 1-19, June.
