
Why is thermal characterization of SSD important?

Thanks to faster boot-up times and enhanced reliability and performance, solid-state drives (SSD) have grown in popularity with consumers in the past decade. From laptops to portable hard drives to telecommunications applications, solid-state drives are eclipsing hard disk drives, particularly as the price of SSD technology lowers, making it more cost-effective for system designers.

What makes the SSD different? Solid-state drives have no moving parts. There is no mechanical arm to read and write data. Instead, SSD use embedded processors to control the processes related to storing, retrieving, caching, encrypting, and cleaning up data.


Apple is one of many laptop companies that have turned to solid-state drives for laptops because of improved boot times and better battery efficiency. (Wikimedia Commons)

As explained by Storage Review, “Conversely, a hard disk drive uses a mechanical arm with a read/write head to move around and read information from the right location on a storage platter. This difference is what makes SSD so much faster. As an analogy, what’s quicker? Having to walk across the room to retrieve a book to get information or simply magically having that book open in front of you when you need it? That’s how an HDD compares to an SSD; it simply requires more physical labor (mechanical movement) to get information.” [1]

Consumers and engineers alike are turning to SSD, and that has drawn a number of companies into the market. According to a recent report that counted global sales through 2016, Samsung (21 percent) and Kingston (16 percent) remain the largest retailers of SSD in the world. The other companies listed, including Intel, SanDisk, and Toshiba, all had percentages in the single digits. [2]

The benefits of SSD are well known: system boot times that are typically one-third to one-quarter those of HDD, half the power draw for longer battery life, much larger storage capacity, reduced noise during use, and greater mean time between failures (MTBF).

Another byproduct of having no moving parts is that SSD generally produce less heat than HDD. For instance, you are less likely to burn your lap while working on your laptop if it has a solid-state drive. Thermal issues remain for SSD, as they do for any electronic device, but compared to HDD they have fewer cooling requirements. [3]

A recent study out of Carnegie Mellon University (Pittsburgh, Pa.) in collaboration with Facebook, Inc. analyzed the reliability of flash-based SSD. One of the external factors that the researchers considered was temperature. In examining three distinct groups of SSD in a Facebook data center, the study described similar failure rates at a range of 30-40°C, but the failure rates varied greatly as temperatures increased beyond that operating range. Failure rates are explained in the following tables. [4]

This chart from researchers at Carnegie Mellon demonstrates the instability that temperature spikes cause in SSD performance. [4]

The researchers concluded, “In general, we find techniques like throttling, which may be employed to reduce SSD temperature, to be effective at reducing the failure rate of SSDs. We also find that SSD temperature is correlated with the power used to transmit data across the PCIe bus, which can potentially be used as a proxy for temperature in the absence of SSD temperature sensors.”

Temperature is an increasing factor for SSD. Like the rest of the electronics industry, engineers are designing SSD with more chips, more channels, more cores, and more controllers to deliver greater processing capability. A study from the Computer Architecture and Memory Systems Lab at the University of Texas – Dallas (UT Dallas), presented at HotStorage 2014, reported that there were 64 times as many chips and channels in SSD as there were just 12 years before. Just like the Carnegie Mellon study, the UT-Dallas researchers determined that “device-level protection mechanisms dynamically reduce heat output.” [5]

This chart from UT-Dallas shows the performance degradation that comes from overheating of SSD. [5]

The UT-Dallas study concluded that overheating led to malfunctions in the SSD and that devices with larger data sizes reached the overheating point quicker. The report found “significant performance degradation at the overheating points” and stated that overheating and its requisite power throttling “hinder SSD from integrating more resources.” This problem, the study concluded, was “holding back state-of-the-art SSD from achieving potential performance gains.”

As SSD continues to gain a stronghold in the market, including Intel’s recent announcement that it was going to accelerate the deployment of its SSD technology throughout its product line to “enhance user productivity and mobility while reducing IT total cost of ownership,” [6] it is obvious that thermal characterization of SSD and thermal management of systems with SSD are primary concerns for the industry.

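There is also a real cost to bad data, and SSD are not immune to that risk. From the CPU to the PCB to SSD storage, inaccuracies and outright errors adversely impact device reliability and system design. One model that is commonly used in the characterization of SSD and other data storage products is the Arrhenius Model:

$$ k = A \exp\left(-\frac{E_a}{k_B T}\right) $$

where $k$ is the reaction (failure) rate, $A$ is a constant prefactor, $E_a$ is the activation energy of the failure mechanism, $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature in Kelvin.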

This reliability model has temperature as a key component. These models can be used via pen and paper, computer modeling or on a spreadsheet. But is the Arrhenius model a good model for NAND (negative-AND) flash memory?

Data presented at the Flash Memory Summit in 2014 by IBM showed that reliability models such as Arrhenius are not necessarily accurate for characterizing SSD. Indeed, such a model can end up creating a correct match for just two data points. The presentation recommends that acceleration models be validated. It also recommends testing a full device, since doing so best allows measurement of the total behavior. [7]
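The two-data-point caveat can be illustrated with a minimal sketch. It assumes the standard Arrhenius form, rate = A·exp(−Ea/(kB·T)); the function names and the three data points are hypothetical values chosen for illustration and are not taken from the IBM presentation. Because the model has two free parameters (Ea and the prefactor A), it passes exactly through any two measurements, so agreement at two temperatures says nothing about whether the model is valid at a third.

```python
import math

K_B_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def fit_two_points(t_a_c, rate_a, t_b_c, rate_b, k_b=K_B_EV_PER_K):
    """Solve for (Ea, prefactor) so the Arrhenius curve passes through both points.

    Temperatures are given in deg C and converted to Kelvin internally.
    """
    t_a_k = t_a_c + 273.15
    t_b_k = t_b_c + 273.15
    ea_ev = k_b * math.log(rate_b / rate_a) / (1.0 / t_a_k - 1.0 / t_b_k)
    prefactor = rate_a * math.exp(ea_ev / (k_b * t_a_k))
    return ea_ev, prefactor


def predict_rate(t_c, ea_ev, prefactor, k_b=K_B_EV_PER_K):
    """Arrhenius rate at temperature t_c (deg C) for the fitted parameters."""
    return prefactor * math.exp(-ea_ev / (k_b * (t_c + 273.15)))


# Hypothetical (temperature in deg C, relative failure rate) measurements.
POINTS = [(40.0, 1.0), (85.0, 6.0), (125.0, 12.0)]

if __name__ == "__main__":
    (t_a, r_a), (t_b, r_b), (t_c, r_c) = POINTS
    ea, pre = fit_two_points(t_a, r_a, t_b, r_b)
    print(f"fitted Ea = {ea:.3f} eV")
    for temp, measured in POINTS:
        model = predict_rate(temp, ea, pre)
        print(f"T = {temp:5.1f} C  measured = {measured:6.2f}  model = {model:6.2f}")
```

The first two rows match by construction; only the third, held-out point tests the model, which is why validation needs more than two temperatures.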

The Storage Networking Industry Association (SNIA) also released a test methodology, test suite, and reporting format for SSD to ensure the accuracy and reliability of the data being reported. [8]

Measuring the thermal characteristics of SSD is similar to the process of characterizing a standard semiconductor. A typical test setup for characterization would include a closed-loop wind tunnel, preferably with a heater, sensors, thermocouples or RTDs, and an analog-to-digital capture system or hot-wire anemometer. A closed-loop wind tunnel with a heater provides an environment for controlled temperatures from ambient to 85°C or more (although most testing will take place below 70°C to avoid damaging the device). Thermocouples or sensors will give important data about the junction temperature of the device as it operates within the system. Additional instruments and sensors can be incorporated to see how the SSD reacts to external factors.

It is critical that thermal management is considered not only in the design phase, but that the products are tested to determine the impact of temperature on data storage to avoid errors, lost information, and device failure. To make sure that the data is accurate and reliable and to save time and money in the long run, it is imperative to use research-quality instruments during the test phase.

As an article from Qpedia Thermal eMagazine explained, “Small errors in temperature and air flow measurements can have a significant effect on reliability predictions. The origin of these errors lies in the measurement process or the use of inaccurate instruments.”

The article continued, “Accurate and high-quality instruments are not only essential for any engineering practice, their absence will adversely impact reliability predictions of a product at hand. No company wants to have its products returned, especially because of thermally induced failures.” [9]

Advanced Thermal Solutions, Inc. (ATS) has an array of state-of-the-art thermal instruments that can be used to study the impact of temperature on SSD performance from closed-loop and open-loop wind tunnels to highly-accurate, portable hot-wire anemometer systems, such as the ATVS-2020 (pictured below), as well as next-generation sensors, including the handheld surface probe that is designed for measuring the surface temperature of solids.


The ATVS-2020™ Automatic Temperature & Velocity Scanner is a patented, multi-channel hot wire anemometer system for single or multi-point measuring of air temperature and velocity. (Advanced Thermal Solutions, Inc.)


References
1. http://www.storagereview.com/ssd_vs_hdd
2. https://www.kitguru.net/components/ssd-drives/matthew-wilson/kingston-samsung-and-are-dominating-the-global-ssd-market/
3. http://www.tomsitpro.com/articles/enterprise-ssd-testing,2-863.html
4. https://users.ece.cmu.edu/~omutlu/pub/flash-memory-failures-in-the-field-at-facebook_sigmetrics15.pdf
5. https://www.usenix.org/sites/default/files/conference/protected-files/hotstorage14_slides_zhang.pdf
6. https://www.intel.com/content/dam/doc/white-paper/intel-it-mobile-computing-ssd-accelerating-deployment-paper.pdf
7. https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140806_T1_Hetzler.pdf
8. http://www.snia.org/sites/default/files/SSS_PTS_Enterprise_v1.1.pdf
9. https://www.qats.com/cms/2013/05/28/why-use-research-quality-instruments/

For more information about Advanced Thermal Solutions, Inc. (ATS) thermal characterization capabilities, visit https://www.qats.com/Consulting/Lab-Capabilities or contact ATS at 781.769.2800 or ats-hq@qats.com.

In the ATS Labs – Where Thermal Solutions Advance to Meet Industry Demands

Thermal management innovations need to match the rapid pace at which the electronics industry is advancing. As consumers demand new and more powerful devices or greater amounts of information at faster speeds, cooling solutions of the past will not be enough. Today’s cooling solutions must be smaller, lighter, and offer higher performance, but also need to be cost-effective, meet demanding project specifications, and be reliable for many years.

Advanced Thermal Solutions, Inc. (ATS) understands the importance of creating cutting-edge thermal solutions for its customers and has geared its thermal design capability and its research and development to match the innovations taking place in electronics design.


An ATS engineer assembles a rig for testing cold plates in one of ATS’ six state-of-the-art labs. (Advanced Thermal Solutions, Inc.)

To meet the need for innovative solutions, ATS engineers are hard at work in the company’s six state-of-the-art laboratories at the ATS headquarters, located in Norwood, Mass. (south of Boston). Thermal issues of all kinds are recognized, broken down, and resolved, and cooling solutions are designed, simulated, prototyped, and rigorously tested in these research-grade facilities.

When someone thinks of a research lab, the initial picture is scientists in white coats working for major corporations, such as IBM, Microsoft, or Google, but the development of new ideas is an essential tool for any company in the technology field. Working with empirical tests in a lab environment pushes concepts from the white board or the computer screen to reality. There comes a time when engineers need to produce tangible data to ensure that a design works as planned.

ATS thermal engineers are no different. They use state-of-the-art instruments and software in each of the six labs to conduct a long list of characterization, quality-assurance, and validation tests. In addition to finding custom cooling solutions for customers, ATS engineers produce thermal management products for commercial uses, including a variety of next generation heat sink, heat pipe, vapor chamber, and liquid cooling designs.

Engineers test ATS instruments using a wind tunnel and sensors in the Characterization Lab. (Advanced Thermal Solutions, Inc.)

Among the most common tests performed in the ATS labs are:

• Measurements of air velocity, direction, pressure, and temperature
• Characterization of heat sink designs, fans, and cold plates
• Flow visualization of liquid and air flow
• Image visualization characterization using infrared and liquid crystal thermography

Many of the instruments used to perform these tests were designed and fabricated by ATS. That includes open-loop, closed-loop, and bench-top wind tunnels; the award-winning iQ-200™, which measures air temperature, velocity, and pressure with one instrument; and the thermVIEW™ liquid crystal thermography system. Engineers also use specially designed sensors, such as the ATS Candlestick Sensor, to get the most accurate analysis possible.

Smoke flow visualization tests run in ATS wind tunnels demonstrate how air flows through a system. (Advanced Thermal Solutions, Inc.)

Heat pipes and vapor chambers are increasingly common cooling solutions, particularly for mobile devices and other consumer electronics, and ATS engineers are working to expand the company’s offerings for these solutions and to develop next generation technology that optimizes the thermal performance of these products. This research involves advanced materials, new fabrication methods, performance testing, and innovative designs that are ready for mass production.

ATS engineer Vineet Barot sets up a thermal imaging camera for temperature mapping studies in the lab. (Advanced Thermal Solutions, Inc.)

ATS has also developed products to meet the growing demand across the electronics industry for liquid cooling systems. From new designs for recirculating and immersion chillers to multi-channel cold plates to tube-to-fin heat exchangers, ATS is continuing to expand its line of liquid cooling solutions to maximize the transfer of heat from liquid to air and researching new manufacturing methods, advanced materials, and other methods of enhancing the technology.

As liquid cooling technology has grown, ATS has met this demand with new instruments and lab capabilities, such as the iFLOW-200™, which measures a cold plate’s thermal and hydraulic characteristics, and full liquid loops to test ATS products under real-world conditions.


ATS engineer Reza Azizian (right) works with intern Vladislav Blyakhman on a liquid cooling loop in the lab. (Advanced Thermal Solutions, Inc.)

The labs at ATS are up to even the toughest electronics cooling challenges that the company’s global customers present. Thanks to its extensive lab facilities, ATS has provided thousands of satisfied customers with the state-of-the-art thermal solutions that they demand.

For more information about Advanced Thermal Solutions, Inc. (ATS) thermal management consulting and design services, visit www.qats.com/consulting or contact ATS at 781.769.2800 or ats-hq@qats.com.

Why Use Research Quality Instruments?

The life expectancy of most products is estimated at some point prior to their introduction. Reliability analyses are an integral part of the design cycle of a product. In all reliability calculations, temperature is the key driver. The predicted life span from these calculations is often the deciding factor for introducing the product or investing more resources in redesign.

The questions that linger are: to what level of accuracy can we determine the temperature magnitude, and what is the impact of temperature uncertainty on the predicted reliability (i.e., the expected life of the product)?

When a system is operating, it incessantly experiences temperature and power cycling. Such fluctuations, resulting from system design and operation or from complex thermal transport in electronic systems, create large bandwidths in temperature response. Whether it happens in the course of an analysis or during compliance/stress testing, we often overlook the accuracy with which temperature is measured or calculated. Yet to truly obtain an adequate measure of a system’s reliability in the field, such temperature data is essential.

The CLWT-115 wind tunnel produces warm air flows for thermal studies

To demonstrate the impact of temperature on reliability, consider the two models commonly used in practice. The Arrhenius model [1], often referred to as Erroneous, is perhaps the most broadly used model in the field. Equation 1 shows the reaction rate (failure rate) k and the acceleration factor AT. kB is the Boltzmann constant (8.617 × 10^-5 eV/K) and Ea is the activation energy. All temperatures are in Kelvin. Activation energy depends on the failure mechanism and the materials (for example, 0.3 to 0.5 eV for oxide defects, and 1.0 eV for contamination).

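$$ k = A \exp\left(-\frac{E_a}{k_B T}\right), \qquad A_T = \frac{k(T_2)}{k(T_1)} = \exp\left[\frac{E_a}{k_B}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right] \tag{1} $$

Here A is a constant prefactor, and T1 and T2 are the two temperatures being compared.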

The second model, Eyring, often referred to as More Erroneous, is shown by Equation 2.

[Equation 2: Eyring model]

The data shows that the uncertainty band is between 7 and 51%. These numbers by themselves are alarming, yet they are commonly encountered in the field. In either case, Stand-Alone or Device-In-System, being able to accurately determine the temperature or air velocity in a highly three-dimensional thermal transport environment is not a task to be treated casually.

To measure the impact of such uncertainty on the reliability prediction, it’s best to calculate its impact on the Acceleration factor AT.

Let us consider the case when:

T1 = 40°C

T2 = 150°C

Ea = 0.4 eV

kB = 8.6 × 10^-5 eV/K

This results in AT = 48. Now, let us impose a 10% and 35% uncertainty on the temperature measurement of T2. Table 1 shows the result of this error on the acceleration factor.

[Table 1: Effect of T2 measurement error on the acceleration factor AT]

Table 1 clearly demonstrates how a small degree of uncertainty in temperature measurement can distort the Acceleration Factor and, thus, the reliability predictions where AT is often used. The first row shows the correct temperature. The second row shows the result of a 10% error in temperature measurement (i.e., 165°C instead of 150°C). The last row shows the impact of a 35% error (i.e., 202°C vs. the 150°C that the device is actually experiencing). The end result of this error in measurement is a 230% error in the Acceleration Factor.
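The calculation behind these numbers can be sketched in a few lines. This is a minimal illustration that assumes the standard Arrhenius acceleration factor, AT = exp[(Ea/kB)(1/T1 − 1/T2)] with temperatures converted to Kelvin, and that applies the percentage error to the Celsius reading of T2, as the text does. The function names are hypothetical, and the constants are the rounded values from the example above.

```python
import math

K_B_EV_PER_K = 8.6e-5  # Boltzmann constant as rounded in the example (eV/K)


def acceleration_factor(t1_c, t2_c, ea_ev, k_b=K_B_EV_PER_K):
    """Arrhenius acceleration factor between t1_c and t2_c (both in deg C)."""
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    return math.exp((ea_ev / k_b) * (1.0 / t1_k - 1.0 / t2_k))


def error_table(t1_c=40.0, t2_c=150.0, ea_ev=0.4, errors=(0.0, 0.10, 0.35)):
    """Return rows of (error, measured T2, AT, relative error in AT)."""
    nominal = acceleration_factor(t1_c, t2_c, ea_ev)
    rows = []
    for err in errors:
        measured_t2 = t2_c * (1.0 + err)  # error applied to the Celsius reading
        at = acceleration_factor(t1_c, measured_t2, ea_ev)
        rows.append((err, measured_t2, at, (at - nominal) / nominal))
    return rows


if __name__ == "__main__":
    for err, measured_t2, at, rel in error_table():
        print(f"error = {err:4.0%}  T2 = {measured_t2:6.1f} C  "
              f"AT = {at:6.1f}  error in AT = {rel:5.0%}")
```

Under these assumptions the three rows give AT of about 48, 69, and 160, so the 35% row is in line with the roughly 230% error quoted above; the small difference comes from rounding T2 to 202°C and AT to 48.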

One may think such an error is rare, but the contrary is true! In a simple device-case-temperature measurement, the temperature gradient could be in excess of 20°C from the die to the edge of the device. Or the air temperature variation in a channel formed by two PCBs could exceed 30°C. Of course, there are variations due to geometry, material, and power dissipation that are observed in any electronics system. If we add to these the effects of improperly designed instruments, the combination of physical variation and instrument error could certainly be detrimental to a product’s launch.
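To see why gradients of this size matter, the sensitivity of the acceleration factor to an error in T2 can be estimated to first order. The derivation below is a sketch that assumes the standard Arrhenius acceleration factor and reuses the example values Ea = 0.4 eV, kB = 8.6 × 10^-5 eV/K, and T2 = 150°C (423.15 K); the symbol δT2 for the temperature error is introduced here for illustration.

```latex
\begin{align*}
\ln A_T &= \frac{E_a}{k_B}\left(\frac{1}{T_1} - \frac{1}{T_2}\right) \\
\frac{\partial \ln A_T}{\partial T_2} &= \frac{E_a}{k_B T_2^{2}} \\
\frac{\delta A_T}{A_T} &\approx \frac{E_a}{k_B T_2^{2}}\,\delta T_2 \\
\frac{E_a}{k_B T_2^{2}} &= \frac{0.4}{\left(8.6\times10^{-5}\right)\left(423.15\right)^{2}}
  \approx 0.026\ \mathrm{K^{-1}}
\end{align*}
```

In other words, near 150°C each kelvin of temperature error shifts AT by roughly 2.6 percent, so a 20°C gradient corresponds to an error on the order of 50 percent. The linearization only holds for small δT2; for errors of tens of degrees the exponential should be evaluated directly.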

Longevity and life cycle in the market are keys for a product’s success. Therefore, to determine system performance, a reliability analysis must be performed. Since time is of the essence, and first-to-market is advantageous, the quickest reliability prediction models (analysis in general) will continue to be popular. To make such models, the use of Equations 1 and 2, or others more meaningful, must include accurate component and fluid temperature data. Measurement is heavily relied upon for temperature and air velocity determination. It is imperative to employ instruments designed for use in electronics systems with the highest level of accuracy and repeatability. High-grade instruments with quality output will enhance the reliability of the product you are working on.

SUMMARY

Small errors in temperature and air flow measurements can have a significant effect on reliability predictions. The origin of these errors lies in the measurement process or the use of inaccurate instruments. The former depends on the knowledge-base of the experimenter. That is why a good experimentalist is even a better analyst. You must know where to measure and the variations that exist in the field of measurement. Electronics system environments are notorious for such variations. It is repeatedly seen that, in one square centimeter of air flow passage between two PCBs, you can have temperature variations in excess of 30°C. Therefore, measurement practices and instrument selection must address these changes and not introduce further errors because of inferior design. Besides its design, an instrument’s construction and calibration should not introduce more errors. Accurate and high-quality instruments are not only essential for any engineering practice, their absence will adversely impact reliability predictions of a product at hand. No company wants to have its products returned, especially because of thermally induced failures.

References:

1. Klinger, D., Nakada, Y., and Menendez, M., AT&T Reliability Manual, Van Nostrand Reinhold, 1990.

2. Azar, K., The Effect of Uncertainty Analysis on Temperature Prediction, Therminic Conference, 2002.