Multi-stage resource-aware scheduling for data centers with heterogeneous servers

Authors

  • Tony T. Tran (University of Toronto)
  • Meghana Padmanabhan (University of Toronto)
  • Peter Yun Zhang (Massachusetts Institute of Technology)
  • Heyse Li (University of Toronto)
  • Douglas G. Down (McMaster University)
  • J. Christopher Beck (University of Toronto)

Abstract

This paper presents a three-stage algorithm for resource-aware scheduling of computational jobs in a large-scale heterogeneous data center. The algorithm aims to allocate job classes to machine configurations to attain an efficient mapping between job resource request profiles and machine resource capacity profiles. The first stage uses a queueing model that treats the system in an aggregated manner with pooled machines and jobs represented as a fluid flow. The latter two stages use combinatorial optimization techniques to solve a shorter-term, more accurate representation of the problem using the first-stage, long-term solution for heuristic guidance. In the second stage, jobs and machines are discretized. A linear programming model is used to obtain a solution to the discrete problem that maximizes the system capacity given a restriction on the job class and machine configuration pairings based on the solution of the first stage. The final stage is a scheduling policy that uses the solution from the second stage to guide the dispatching of arriving jobs to machines. We present experimental results of our algorithm on both Google workload trace data and generated data and show that it outperforms existing schedulers. These results illustrate the importance of considering heterogeneity of both job and machine configuration profiles in making effective scheduling decisions.
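The final dispatching stage described above can be illustrated with a minimal sketch. The abstract does not specify the dispatch rule, so this is not the authors' policy: it assumes a simple first-fit rule, and the names `Machine`, `fits`, `dispatch`, and `allowed`, along with the resource names and capacities, are hypothetical. The `allowed` mapping stands in for the job class and machine configuration pairings that the second stage would produce.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    config: str  # machine configuration label (hypothetical)
    free: dict   # remaining capacity per resource, e.g. {"cpu": 8, "mem": 4}


def fits(request, free):
    """Return True if every requested resource amount is still available."""
    return all(free.get(res, 0) >= amount for res, amount in request.items())


def dispatch(job_class, request, machines, allowed):
    """First-fit dispatch restricted to permitted class/configuration pairs.

    Assumption: `allowed` maps each job class to the set of machine
    configurations it may run on. Returns the chosen machine name, or
    None if the job has to wait.
    """
    permitted = allowed.get(job_class, set())
    for machine in machines:
        if machine.config in permitted and fits(request, machine.free):
            # Reserve the requested resources on the chosen machine.
            for res, amount in request.items():
                machine.free[res] -= amount
            return machine.name
    return None


# Illustrative data only; none of these values come from the paper.
machines = [
    Machine("m1", "cpu_heavy", {"cpu": 8, "mem": 4}),
    Machine("m2", "mem_heavy", {"cpu": 2, "mem": 16}),
]
allowed = {"batch": {"cpu_heavy"}, "cache": {"mem_heavy"}}
```

Calling `dispatch("cache", {"cpu": 1, "mem": 8}, machines, allowed)` places the job on a machine whose configuration is permitted for the `cache` class and whose residual capacity covers the request on every resource. A job that fits nowhere gets `None`, which a caller could treat as a signal to queue it.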

Suggested Citation

  • Tony T. Tran & Meghana Padmanabhan & Peter Yun Zhang & Heyse Li & Douglas G. Down & J. Christopher Beck, 2018. "Multi-stage resource-aware scheduling for data centers with heterogeneous servers," Journal of Scheduling, Springer, vol. 21(2), pages 251-267, April.
  • Handle: RePEc:spr:jsched:v:21:y:2018:i:2:d:10.1007_s10951-017-0537-x
    DOI: 10.1007/s10951-017-0537-x


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emmett J. Lodree & Nezih Altay & Robert A. Cook, 2019. "Staff assignment policies for a mass casualty event queuing network," Annals of Operations Research, Springer, vol. 283(1), pages 411-442, December.
    2. Zhao, Yaping & Xu, Xiaoyun & Li, Haidong & Liu, Yanni, 2016. "Prioritized customer order scheduling to maximize throughput," European Journal of Operational Research, Elsevier, vol. 255(2), pages 345-356.
    3. Aili (Alice) Zou & Douglas G. Down, 2018. "Asymptotically Maximal Throughput in Tandem Systems with Flexible and Dedicated Servers," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 35(05), pages 1-15, October.
    4. Sigrún Andradóttir & Hayriye Ayhan & Douglas G. Down, 2007. "Compensating for Failures with Flexible Servers," Operations Research, INFORMS, vol. 55(4), pages 753-768, August.
    5. Eser Kırkızlar & Sigrún Andradóttir & Hayriye Ayhan, 2010. "Robustness of efficient server assignment policies to service time distributions in finite‐buffered lines," Naval Research Logistics (NRL), John Wiley & Sons, vol. 57(6), pages 563-582, September.
    6. Otis B. Jennings, 2008. "Heavy-Traffic Limits of Queueing Networks with Polling Stations: Brownian Motion in a Wedge," Mathematics of Operations Research, INFORMS, vol. 33(1), pages 12-35, February.
    7. Naumov, Valeriy & Martikainen, Olli, 2011. "Method for Throughput Maximization of Multiclass Networks with Flexible Servers," Discussion Papers 1261, The Research Institute of the Finnish Economy.
    8. J. G. Dai & Wuqin Lin, 2005. "Maximum Pressure Policies in Stochastic Processing Networks," Operations Research, INFORMS, vol. 53(2), pages 197-218, April.
    9. Yi‐Chun Tsai & Nilay Tanık Argon, 2008. "Dynamic server assignment policies for assembly‐type queues with flexible servers," Naval Research Logistics (NRL), John Wiley & Sons, vol. 55(3), pages 234-251, April.
    10. Eugene Furman & Adam Diamant & Murat Kristal, 2021. "Customer Acquisition and Retention: A Fluid Approach for Staffing," Production and Operations Management, Production and Operations Management Society, vol. 30(11), pages 4236-4257, November.
    11. S.M.R. Iravani & J.A. Buzacott & M.J.M. Posner, 2005. "A robust policy for serial agile production systems," Naval Research Logistics (NRL), John Wiley & Sons, vol. 52(1), pages 58-73, February.
    12. Yarmand, Mohammad H. & Down, Douglas G., 2013. "Server allocation for zero buffer tandem queues," European Journal of Operational Research, Elsevier, vol. 230(3), pages 596-603.
    13. Cong Shi & Yehua Wei & Yuan Zhong, 2019. "Process Flexibility for Multiperiod Production Systems," Operations Research, INFORMS, vol. 67(5), pages 1300-1320, September.
    14. Nilay Tanık Argon & Sigrún Andradóttir, 2017. "Pooling in tandem queueing networks with non-collaborative servers," Queueing Systems: Theory and Applications, Springer, vol. 87(3), pages 345-377, December.
    15. Sigrún Andradóttir & Hayriye Ayhan & Douglas G. Down, 2022. "Synchronous resource allocation: modeling, capacity, and optimization," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 44(4), pages 1287-1310, December.
    16. Tuğçe Işık & Sigrún Andradóttir & Hayriye Ayhan, 2016. "Optimal control of queueing systems with non-collaborating servers," Queueing Systems: Theory and Applications, Springer, vol. 84(1), pages 79-110, October.
    17. Peng Wang & Kai Pan & Zhenzhen Yan & Yun Fong Lim, 2022. "Managing Stochastic Bucket Brigades on Discrete Work Stations," Production and Operations Management, Production and Operations Management Society, vol. 31(1), pages 358-373, January.
    18. Eser Kırkızlar & Sigrún Andradóttir & Hayriye Ayhan, 2012. "Flexible Servers in Understaffed Tandem Lines," Production and Operations Management, Production and Operations Management Society, vol. 21(4), pages 761-777, July.
    19. J. G. Dai & O. B. Jennings, 2004. "Stabilizing Queueing Networks with Setups," Mathematics of Operations Research, INFORMS, vol. 29(4), pages 891-922, November.
    20. Gabriel Zayas-Cabán & Jingui Xie & Linda V. Green & Mark E. Lewis, 2016. "Dynamic control of a tandem system with abandonments," Queueing Systems: Theory and Applications, Springer, vol. 84(3), pages 279-293, December.
