Why is thermal characterization of SSD important?

Thanks to faster boot-up times and enhanced reliability and performance, solid-state drives (SSD) have grown in popularity with consumers in the past decade. From laptops to portable drives to telecommunications applications, solid-state drives are eclipsing hard disk drives (HDD), particularly as the price of SSD technology falls, making it more cost-effective for system designers.

What makes the SSD different? Solid-state drives have no moving parts. There is no mechanical arm to read and write data. Instead, SSD use embedded processors to control the processes related to storing, retrieving, caching, encrypting, and cleaning up data.

Thermal Characterization of SSD

Apple is one of many laptop companies that have turned to solid-state drives for laptops because of improved boot times and better battery efficiency. (Wikimedia Commons)

As explained by Storage Review, “Conversely, a hard disk drive uses a mechanical arm with a read/write head to move around and read information from the right location on a storage platter. This difference is what makes SSD so much faster. As an analogy, what’s quicker? Having to walk across the room to retrieve a book to get information or simply magically having that book open in front of you when you need it? That’s how an HDD compares to an SSD; it simply requires more physical labor (mechanical movement) to get information.” [1]

Consumers and engineers alike are turning to SSD, and that has prompted a number of companies to jump into the market. According to a recent report that counted global sales through 2016, Samsung (21 percent) and Kingston (16 percent) remain the largest sellers of SSD in the world. The other companies listed, including Intel, SanDisk, and Toshiba, each had shares in the single digits. [2]

The benefits of SSD are well-known: system boot times that are typically one-third to one-quarter those of HDD, half the power draw for longer battery life, much larger storage capacity, reduced noise during use, and greater mean time between failures (MTBF).

Another byproduct of having no moving parts is that SSD generally produce less heat than HDD. For instance, you are less likely to burn your lap while working on your laptop if it has a solid-state drive. Thermal issues remain for SSD, as they do for any electronic device, but compared to HDD they have fewer cooling requirements. [3]

A recent study out of Carnegie Mellon University (Pittsburgh, Pa.) in collaboration with Facebook, Inc. analyzed the reliability of flash-based SSD. One of the external factors that the researchers considered was temperature. In examining three distinct groups of SSD in a Facebook data center, the study described similar failure rates at a range of 30-40°C, but the failure rates varied greatly as temperatures increased beyond that operating range. Failure rates are explained in the following tables. [4]

This chart from researchers at Carnegie Mellon demonstrates the instability that temperature spikes cause in SSD performance. [4]

The researchers concluded, “In general, we find techniques like throttling, which may be employed to reduce SSD temperature, to be effective at reducing the failure rate of SSDs. We also find that SSD temperature is correlated with the power used to transmit data across the PCIe bus, which can potentially be used as a proxy for temperature in the absence of SSD temperature sensors.”

Temperature is an increasingly important factor for SSD. As in the rest of the electronics industry, engineers are designing SSD with more chips, more channels, more cores, and more controllers to deliver greater processing capability. A study from the Computer Architecture and Memory Systems Lab at the University of Texas at Dallas (UT Dallas), presented at HotStorage 2014, reported that there were 64 times as many chips and channels in SSD as there were just 12 years before. Just like the Carnegie Mellon study, the UT-Dallas researchers determined that “device-level protection mechanisms dynamically reduce heat output.” [5]

This chart from UT-Dallas shows the performance degradation that comes from overheating of SSD. [5]

The UT-Dallas study concluded that overheating led to malfunctions in the SSD and that devices with larger data sizes reached the overheating point more quickly. The report found “significant performance degradation at the overheating points” and said that overheating and its requisite power throttling “hinder SSD from integrating more resources.” This problem, the study concluded, was “holding back state-of-the-art SSD from achieving potential performance gains.”

SSD continue to strengthen their hold on the market. Intel, for example, recently announced that it was going to accelerate the deployment of its SSD technology throughout its product line to “enhance user productivity and mobility while reducing IT total cost of ownership.” [6] As adoption grows, thermal characterization of SSD and thermal management of systems with SSD are primary concerns for the industry.

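There is also a real cost to bad data, and SSD are not immune to that risk. From the CPU to the PCB to SSD storage, inaccuracies and outright errors adversely impact device reliability and system design. One model that is commonly used in the characterization of SSD and other data storage products is the Arrhenius Model, which in its common acceleration-factor form is written:

$$AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$

where:

$AF$ is the acceleration factor between the stress condition and the use condition
$E_a$ is the activation energy of the failure mechanism (eV)
$k$ is Boltzmann's constant ($8.617 \times 10^{-5}$ eV/K)
$T_{use}$ and $T_{stress}$ are the use and stress temperatures (K)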

This reliability model has temperature as a key component. Such models can be applied with pen and paper, in a spreadsheet, or through computer modeling. But is the Arrhenius model a good model for NAND (negative-AND) flash memory?
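As a rough illustration of the computer-modeling route, the sketch below evaluates the commonly used acceleration-factor form of the Arrhenius model, exp[(Ea/k)(1/T_use - 1/T_stress)]. The function name, the 1.1 eV activation energy, and the 40°C and 85°C temperatures are assumptions chosen for illustration only; they are not values taken from any of the studies cited here.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress temperature relative to a use temperature.

    ea_ev      -- assumed activation energy in eV (illustrative input)
    t_use_c    -- use temperature in degrees Celsius
    t_stress_c -- stress (test) temperature in degrees Celsius
    """
    # The Arrhenius model works in absolute temperature, so convert to kelvin.
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical inputs: Ea = 1.1 eV, use at 40 C, stress at 85 C.
    af = arrhenius_acceleration_factor(ea_ev=1.1, t_use_c=40.0, t_stress_c=85.0)
    print(f"Acceleration factor: {af:.1f}")
```

Because the temperature sits inside an exponential, the result is highly sensitive to both the assumed activation energy and the measured temperatures, which is why the choice of inputs deserves scrutiny.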

Data presented by IBM at the Flash Memory Summit in 2014 showed that reliability models such as Arrhenius are not necessarily accurate for characterizing SSD. Indeed, such a model can end up providing a correct match for just two data points. The presentation recommends that acceleration models be validated and that testing be performed on a full device, because doing so best allows measurement of the device's total behavior. [7]
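The two-data-point concern can be illustrated with a short sketch. Any two (temperature, failure rate) points can be fitted exactly by an Arrhenius curve, so the fit only says something once it is checked against a third point. The helper names below are hypothetical, and the observations are placeholder numbers invented for illustration, not measurements from the IBM presentation or any other source.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def to_kelvin(temp_c):
    return temp_c + 273.15


def fit_activation_energy(t1_c, rate1, t2_c, rate2):
    """Solve the Arrhenius relation for Ea (eV) from exactly two points.

    With rate = A * exp(-Ea / (k * T)), the slope of ln(rate) versus 1/T
    is -Ea / k, so two points always determine Ea with zero residual.
    """
    slope = (math.log(rate2) - math.log(rate1)) / (
        1.0 / to_kelvin(t2_c) - 1.0 / to_kelvin(t1_c)
    )
    return -slope * BOLTZMANN_EV_PER_K


def predict_rate(ea_ev, ref_temp_c, ref_rate, temp_c):
    """Predict the rate at temp_c from a reference point and a fitted Ea."""
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / to_kelvin(ref_temp_c) - 1.0 / to_kelvin(temp_c)
    )
    return ref_rate * math.exp(exponent)


def holdout_relative_error(points):
    """Fit on the first and last points, then check the middle point."""
    (t_lo, r_lo), (t_mid, r_mid), (t_hi, r_hi) = points
    ea = fit_activation_energy(t_lo, r_lo, t_hi, r_hi)
    predicted = predict_rate(ea, t_lo, r_lo, t_mid)
    return ea, (predicted - r_mid) / r_mid


if __name__ == "__main__":
    # Placeholder (temperature in C, relative failure rate) values, NOT measurements.
    observations = [(40.0, 1.0), (55.0, 1.4), (70.0, 6.0)]
    ea, err = holdout_relative_error(observations)
    print(f"Ea fitted from the two end points: {ea:.2f} eV")
    print(f"Relative error at the held-out point: {err:+.0%}")
```

A large error at the held-out point would suggest that a single activation energy does not describe the device over that temperature range, which is the kind of check the validation recommendation calls for.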

The Storage Networking Industry Association (SNIA) also released a test methodology, test suite, and reporting format for SSD to ensure the accuracy and reliability of the data being reported. [8]

Measuring the thermal characteristics of SSD is similar to the process of characterizing a standard semiconductor. A typical test setup for characterization would include a closed-loop wind tunnel, preferably with a heater; sensors, thermocouples, or RTDs; and an analog-to-digital capture system or hot-wire anemometer. A closed-loop wind tunnel with a heater provides an environment for controlled temperatures from ambient to 85°C or more (although most testing will take place below 70°C to avoid damaging the device). Thermocouples or sensors give important data about the junction temperature of the device as it operates within the system, and additional instruments and sensors can be incorporated to see how the SSD reacts to external factors.

It is critical that thermal management is considered in the design phase and that products are also tested to determine the impact of temperature on data storage, in order to avoid errors, lost information, and device failure. To make sure that the data is accurate and reliable, and to save time and money in the long run, it is imperative to use research-quality instruments during the test phase.

As an article from Qpedia Thermal eMagazine explained, “Small errors in temperature and air flow measurements can have a significant effect on reliability predictions. The origin of these errors lies in the measurement process or the use of inaccurate instruments.”

The article continued, “Accurate and high-quality instruments are not only essential for any engineering practice, their absence will adversely impact reliability predictions of a product at hand. No company wants to have its products returned, especially because of thermally induced failures.” [9]
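The effect of small temperature errors on a reliability prediction can be sketched with the acceleration-factor form of the Arrhenius model. The example below shows how an offset in the measured stress temperature shifts the computed acceleration factor. The function names, the 1.1 eV activation energy, and the temperatures are illustrative assumptions, not values from the Qpedia article.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor: exp[(Ea/k) * (1/T_use - 1/T_stress)]."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def relative_af_error(ea_ev, t_use_c, t_stress_c, offset_c):
    """Relative change in the acceleration factor when the stress
    temperature reading is off by offset_c degrees Celsius."""
    true_af = acceleration_factor(ea_ev, t_use_c, t_stress_c)
    measured_af = acceleration_factor(ea_ev, t_use_c, t_stress_c + offset_c)
    return (measured_af - true_af) / true_af


if __name__ == "__main__":
    # Hypothetical inputs: Ea = 1.1 eV, use at 40 C, stress at 85 C.
    for offset in (-2.0, -1.0, 1.0, 2.0):
        err = relative_af_error(1.1, 40.0, 85.0, offset)
        print(f"Reading off by {offset:+.0f} C -> acceleration factor off by {err:+.1%}")
```

Under these assumed inputs, each degree of error in the stress temperature reading moves the computed acceleration factor by roughly 10 percent, so a thermocouple that reads a couple of degrees off can noticeably skew a lifetime estimate.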

Advanced Thermal Solutions, Inc. (ATS) has an array of state-of-the-art thermal instruments that can be used to study the impact of temperature on SSD performance, from closed-loop and open-loop wind tunnels to highly accurate, portable hot-wire anemometer systems such as the ATVS-2020 (pictured below), as well as next-generation sensors, including a handheld surface probe designed for measuring the surface temperature of solids.

The ATVS-2020™ Automatic Temperature & Velocity Scanner is a patented, multi-channel hot wire anemometer system for single or multi-point measuring of air temperature and velocity. (Advanced Thermal Solutions, Inc.)

References
1. http://www.storagereview.com/ssd_vs_hdd
2. https://www.kitguru.net/components/ssd-drives/matthew-wilson/kingston-samsung-and-are-dominating-the-global-ssd-market/
3. http://www.tomsitpro.com/articles/enterprise-ssd-testing,2-863.html
4. https://users.ece.cmu.edu/~omutlu/pub/flash-memory-failures-in-the-field-at-facebook_sigmetrics15.pdf
5. https://www.usenix.org/sites/default/files/conference/protected-files/hotstorage14_slides_zhang.pdf
6. https://www.intel.com/content/dam/doc/white-paper/intel-it-mobile-computing-ssd-accelerating-deployment-paper.pdf
7. https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140806_T1_Hetzler.pdf
8. http://www.snia.org/sites/default/files/SSS_PTS_Enterprise_v1.1.pdf
9. https://www.qats.com/cms/2013/05/28/why-use-research-quality-instruments/

For more information about Advanced Thermal Solutions, Inc. (ATS) thermal characterization capabilities, visit https://www.qats.com/Consulting/Lab-Capabilities or contact ATS at 781.769.2800 or ats-hq@qats.com.
