SPECIAL SECTION PAPERS

Failure Analysis of Direct Liquid Cooling System in Data Centers

Author and Article Information
Sami Alkharabsheh

Department of Mechanical Engineering,
Binghamton University,
Binghamton, NY 13902
e-mail: salkhar1@binghamton.edu

Udaya L. N. Puvvadi, Kanad Ghose

Department of Computer Science,
Binghamton University,
Binghamton, NY 13902

Bharath Ramakrishnan, Bahgat Sammakia

Department of Mechanical Engineering,
Binghamton University,
Binghamton, NY 13902

¹Corresponding author.

Contributed by the Electronic and Photonic Packaging Division of ASME for publication in the JOURNAL OF ELECTRONIC PACKAGING. Manuscript received September 23, 2017; final manuscript received December 4, 2017; published online May 9, 2018. Assoc. Editor: Reza Khiabani.

J. Electron. Packag. 140(2), 020902 (May 09, 2018) (8 pages) Paper No: EP-17-1094; doi: 10.1115/1.4039137 History: Received September 23, 2017; Revised December 04, 2017

In this paper, the impact of direct liquid cooling (DLC) system failure on the information technology (IT) equipment is studied experimentally. The main factors anticipated to affect the IT equipment response during failure are the central processing unit (CPU) utilization, the coolant set point temperature (SPT), and the server type. These factors are varied experimentally, and the IT equipment response is studied in terms of chip temperature and power, CPU utilization, and total server power. It was found that failure of this cooling system is hazardous and can lead to data center shutdown in less than a minute. Additionally, the CPU frequency throttling mechanism was found to be vital to understanding the changes in chip temperature, power, and utilization. Other mechanisms associated with high temperatures, such as increased leakage power and changes in fan speed, were also observed. Finally, possible remedies are proposed to reduce both the probability and the consequences of cooling system failure.
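The failure response summarized above (junction temperature climbing after loss of coolant flow until frequency throttling sheds power, or until a thermal trip shuts the server down) can be reasoned about with a minimal lumped-capacitance sketch. The Python snippet below is only an illustration under assumed conditions, not the authors' experimental setup or model; every numeric value (thermal capacitance, chip-to-coolant resistances, chip power, throttle and trip thresholds) is a hypothetical placeholder chosen to show the qualitative behavior.

# Illustrative sketch (not the paper's model): a lumped-capacitance chip with a
# crude frequency-throttling rule, used to reason qualitatively about the
# response to a coolant-flow failure. All parameter values are hypothetical.

C_TH = 60.0        # J/K, lumped thermal capacitance of chip + cold plate (assumed)
R_HEALTHY = 0.05   # K/W, chip-to-coolant resistance with full coolant flow (assumed)
R_FAILED = 0.60    # K/W, effective resistance after loss of coolant flow (assumed)
T_COOLANT = 45.0   # degC, coolant set point temperature (SPT)
P_FULL = 120.0     # W, chip power at 100% utilization (assumed)
T_THROTTLE = 95.0  # degC, junction temperature where throttling begins (assumed)
T_TRIP = 105.0     # degC, thermal trip / shutdown temperature (assumed)

def simulate(t_fail=30.0, dt=0.1, t_end=120.0):
    """Integrate dT/dt = (P - (T - T_coolant)/R) / C_th and report key events."""
    temp = T_COOLANT + P_FULL * R_HEALTHY   # steady state before failure
    t = 0.0
    while t < t_end:
        r = R_HEALTHY if t < t_fail else R_FAILED            # pumps fail at t_fail
        # crude throttling rule: shed power once the throttle threshold is crossed
        power = P_FULL * 0.4 if temp >= T_THROTTLE else P_FULL
        temp += dt * (power - (temp - T_COOLANT) / r) / C_TH  # explicit Euler step
        if temp >= T_TRIP:
            print(f"thermal trip (shutdown) at t = {t:.1f} s")
            return
        t += dt
    print(f"no trip within {t_end:.0f} s; final T = {temp:.1f} C")

if __name__ == "__main__":
    simulate()

With these placeholder numbers the throttling rule holds the junction near the throttle threshold at reduced power; with a larger post-failure resistance or a higher SPT the trip temperature is reached instead, which is the kind of trade-off the experiments examine.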

Copyright © 2018 by ASME


Figures

Fig. 1: The three modules of the secondary loop: (a) CDM, (b) manifold module, and (c) server module; (d) the microchannel cold plate component
Fig. 2: Liquid cooled rack inside the Binghamton University data center laboratory
Fig. 3: Schematic of the cooling system and the installed sensors
Fig. 4: The flow rate scheme simulating failure in the secondary loop: (a) complete loss of flow (all pumps fail) and (b) partial loss of flow (single pump fails)
Fig. 5: Effect of failure at 45 °C SPT on (a) chip temperature, (b) chip power, (c) CPU utilization, (d) fan speed, and (e) server power (line indicates when failure was initiated)
Fig. 6: Effect of failure at 20 °C SPT on (a) chip temperature, (b) chip power, (c) CPU utilization, (d) fan speed, and (e) server power (line indicates when failure was initiated)
Fig. 7: Effect of failure on different types of servers (SPT = 45 °C, 100% utilization)
Fig. 8: Effect of partial failure on (a) chip temperature and utilization and (b) chip power, server power, and fan speed
Fig. 9: (a) Sensible power measurement (TC: thermocouple, FM: microflow meter, BV: ball valve) and (b) the fraction of chip power removed by the liquid coolant and the chip junction temperature at different liquid coolant flow rates (the underlying energy balance is sketched after this list)
Fig. 10: Proposed remedy for failure using load migration: (a) chip temperature and (b) CPU utilization and chip power
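For context on the sensible power measurement in Fig. 9, the coolant-side energy balance below is the standard relation for such a measurement; it is not transcribed from the paper. The symbols follow conventional definitions: $\dot{m}$ is the coolant mass flow rate from the microflow meter (FM), $c_p$ is the coolant specific heat, and $T_{\mathrm{in}}$, $T_{\mathrm{out}}$ are the cold plate inlet and outlet temperatures from the thermocouples (TC).

% Standard coolant-side energy balance (not copied from the paper):
% Q_sensible is the heat carried away by the coolant, and the fraction of
% chip power removed by the liquid is its ratio to the measured chip power.
\begin{align}
  Q_{\mathrm{sensible}} &= \dot{m}\, c_p \bigl(T_{\mathrm{out}} - T_{\mathrm{in}}\bigr), &
  f_{\mathrm{liquid}} &= \frac{Q_{\mathrm{sensible}}}{P_{\mathrm{chip}}}.
\end{align}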
