Managing Thermal Emergencies in Disk-Based Storage Systems

Youngjae Kim

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802; e-mail: youkim@cse.psu.edu

Jeonghwan Choi

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802; e-mail: jechoi@cse.psu.edu

Sudhanva Gurumurthi

Department of Computer Science, University of Virginia, Charlottesville, VA 22904; e-mail: gurumurthi@cs.virginia.edu

Anand Sivasubramaniam

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802; e-mail: anand@cse.psu.edu

J. Electron. Packag. 130(4), 041105 (Nov 14, 2008) (8 pages); doi:10.1115/1.2993152. History: Received September 30, 2007; Revised May 11, 2008; Published November 14, 2008.

Thermal-aware design of disk-drives is important because high temperatures can cause reliability problems. Dynamic thermal management (DTM) techniques have been proposed that let the disk operate at the average-case temperature rather than the worst-case temperature, modulating disk activity to avoid thermal emergencies caused by unexpected events such as fan failures or increased inlet air temperature. A delay-based approach that adjusts disk seek activity is one such DTM solution for disk-drives. Although such an approach can overcome thermal emergencies without stopping disk activity, it suffers from long delays when servicing requests. In this paper, we investigate the possibility of implementing DTM with a multispeed disk-drive (called dynamic rotations per minute (DRPM)), which dynamically modulates the rotational speed of the platter. Using a detailed performance and thermal simulator of a storage system, we evaluate two possible DTM policies, time-based and watermark-based, with a DRPM disk-drive and observe that dynamic RPM modulation is effective in avoiding thermal emergencies. However, we find that the time taken to transition between different rotational speeds of the disk is critical to the effectiveness of this DTM technique.

Copyright © 2008 by American Society of Mechanical Engineers



Figure 1

(a) shows a side view of the mechanical components of a disk-drive and (b) shows a top view.

Figure 2

Temperature distribution over the IBM Ultrastar 146Z10. The dotted line denotes the ambient temperature (29°C). Each label on the x-axis denotes a component of the disk-drive. “Inlet Air” denotes the external inlet air temperature. Descriptions of all other labels can be found in Fig. 1.

Figure 3

Each bar denotes the average steady-state temperature of each component for Max (where the VCM is on at all times while the disk platters are spinning) and Idle (where the VCM is simply turned off). (a) and (b) are the steady-state base temperatures of the disk for different disk dimensions (such as platter size and rotational speed) and different power consumption modes under various ambient temperatures. The horizontal line in each graph is the thermal envelope (60°C).

Figure 4

Performance degradation of DRPMsimple for the server workloads. The value in parentheses for each graph denotes the cooling unit time (the delay applied once the temperature comes close to the thermal envelope of 60°C).

Figure 5

Thermal profiles of the real workloads for DRPMopt under the scenarios of Table 1. All profiles are for disk0 of the disk arrays, each disk of which is a 10–20 K multispeed disk with an RPM transition time of 7 s and a cooling unit time of 400 s.

Figure 6

Correlation between cooling unit time, RPM transition time, and performance (i.e., response time). Each bar denotes the average value, in milliseconds, across the disks of a disk array.

Figure 7

Experimental results of DRPMopt using the watermark-based policy for HPL Openmail and TPC-C. “Thermal Safety Line” in the graphs denotes the temperature at which the disk-drive has cooled down sufficiently to operate.
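
The two DTM policies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' simulator or their exact algorithm; it is one plausible reading of the captions above. The 60°C thermal envelope and the 400 s cooling unit time come from the captions, and the two speeds assume that "10–20 K" means 10,000 and 20,000 RPM. The trigger temperature (59°C), the low watermark used as the "thermal safety line" (55°C), and all class and field names are hypothetical. The sketch also ignores the RPM transition time, which the abstract identifies as critical in practice.

```python
from dataclasses import dataclass

THERMAL_ENVELOPE_C = 60.0  # thermal envelope from the figure captions
HIGH_RPM = 20_000          # assumed reading of "10-20 K multispeed disk"
LOW_RPM = 10_000


@dataclass
class TimeBasedPolicy:
    """Drop to low RPM for a fixed cooling unit time once the disk runs hot."""
    trigger_c: float = 59.0         # hypothetical: just below the envelope
    cooling_unit_s: float = 400.0   # cooling unit time from the Figure 5 caption
    rpm: int = HIGH_RPM
    cool_until_s: float = 0.0

    def step(self, now_s: float, temp_c: float) -> int:
        # Restore full speed once the cooling period has elapsed.
        if self.rpm == LOW_RPM and now_s >= self.cool_until_s:
            self.rpm = HIGH_RPM
        # Start (or restart) a cooling period if the disk is too hot.
        if self.rpm == HIGH_RPM and temp_c >= self.trigger_c:
            self.rpm = LOW_RPM
            self.cool_until_s = now_s + self.cooling_unit_s
        return self.rpm


@dataclass
class WatermarkPolicy:
    """Drop to low RPM at a high watermark; speed up again at a low one."""
    high_watermark_c: float = 59.0  # hypothetical trigger temperature
    low_watermark_c: float = 55.0   # hypothetical "thermal safety line"
    rpm: int = HIGH_RPM

    def step(self, temp_c: float) -> int:
        if self.rpm == HIGH_RPM and temp_c >= self.high_watermark_c:
            self.rpm = LOW_RPM
        elif self.rpm == LOW_RPM and temp_c <= self.low_watermark_c:
            self.rpm = HIGH_RPM
        return self.rpm


if __name__ == "__main__":
    wm = WatermarkPolicy()
    for temp in [50.0, 58.9, 59.0, 57.0, 55.1, 55.0, 58.0]:
        print(f"{temp:5.1f} C -> {wm.step(temp)} RPM")
```

The gap between the two watermarks acts as hysteresis: the disk stays at low speed until it has actually cooled, rather than for a fixed interval as in the time-based policy.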



