Why Temperature is Critical in Thermal Management

Last week in this series on How to Select a Heat Sink for an OEM Project, we talked about why thermal management is a challenge to design for. This week we want to talk about why temperature is critical in thermal management.

That temperature is critical in thermal management would seem self-evident, but, as with many things in thermal management, there's more here than meets the eye.

Power dissipation from electronics is on the rise, and so is its effect on temperature. As a result, accurate temperature measurement is playing a larger role in the successful launch of new products.

Figure 1 shows a typical flow chart for a product's design cycle [1]. Thermal engineering is required in three distinct areas: Concept, Prototype, and Verification.

[Figure 1: The Role of Thermal Management in the Product Design Cycle]

At the Concept level, thermal analysis is performed to ascertain the design feasibility and move the product to the Prototype stage. At the Prototype stage, the product is assembled and temperature measurements are taken to ensure the design meets the intended specs while there is a level of system functionality. If the design passes this stage, the product is fabricated. Then, the fully functional system is checked for adherence to its intended design features while it is stressed at elevated temperatures. This is part of the Verification stage.

At all three stages, accurate knowledge of temperature is needed to ensure the system meets expected performance levels. An engineer designing an MRI system certainly wouldn't want excess temperature to introduce nonexistent features into the digital image. Nor, by analogy, would someone transferring funds in a bank transaction want their electronics to place extra zeros in the transfer amount because of temperature overshoots in the system.

Equally important is the expected life of an electronics system. Every company wants to ensure that its products will function successfully in their intended environment. And every company wants its products to be first on the market. To support these goals, a fair number of reliability calculations must be performed.

To fully appreciate the role of temperature in reliability calculations, and how important it is to accurately measure or estimate temperature, consider the following equations. Arrhenius and Eyring models are commonly used in reliability calculations, as shown in Equations 1 and 2, respectively [2].

[Equations 1 and 2: Arrhenius and Eyring models]

The temperature acceleration factor is given by Equation 3:

[Equation 3: Temperature acceleration factor]

where:

[Key for the temperature acceleration equation]

Experts in the field of reliability argue about the accuracy and consistency of these models. Nevertheless, they are used extensively for reliability predictions. More importantly, in all these equations temperature plays a pivotal role in the magnitude of the expected life. Temperature directly impacts an engineer's decisions on cooling system selection, product launch, or product redesign.

Temperature has a direct role in the fail-proof operation of electronic components, and subsequently of the complete system. Thermally induced stresses will create failures in the system if they are not managed. Device and fluid temperatures directly impact the magnitude of such stresses. This is best demonstrated by Equation 4, which describes the stress in a lead wire.

[Equation 4: Stress in a lead wire]

where:

[Equation 4 legend]

Equation 4 shows a linear relationship between stress and temperature. Any misprediction of temperature will result in the under- or over-prediction of stresses that factor into an engineer's decision-making process.

Let's bring this issue to the device level. Consider the following devices:

  1. A stand-alone device, i.e., a single device residing on a board. This is the typical arrangement for device-level characterization.
  2. A system in-situ device, i.e., a device that resides in a system on a PCB, is surrounded by other components, and sits on a board with signal traffic going through it.

To effectively characterize these components we need to know the following temperatures:

  • Case
  • Junction
  • Approach fluid (e.g., the air temperature measured at half device length, upstream, at the channel center formed by two PCBs).

In the first situation, the stand-alone device, the heat transfer is highly three-dimensional, but the boundary condition for the device is very clear and easily quantified. In the in-situ example, the thermal boundary condition for the device is rather complex. The heat transport dynamics can be affected by the functionality and the traffic on the PCBs in the system. Therefore, the measurement error can differ widely between these two cases. Table 1 shows the magnitude of such errors based on field observations.

[Table 1: Errors in Temperature Measurement and Calculation]

The Temperature Variation column in Table 1 shows typical data seen in the field during device characterization. In both stand-alone and in-situ conditions, device case temperature measurement is very much a function of location on the case. The variations can range from a couple of degrees to several tens of degrees depending on device packaging and power dissipation. The same applies to the junction temperature measurement.

The reason for these wide temperature ranges stems from the structure of the die. The die is a miniature PCB with significant heat flux variations running across it. It is often very difficult to ascertain the location of the hottest point, i.e., the junction. Therefore, variations of 30°C or higher are readily seen.

The other common error when performing temperature characterization and measurement is found with the approach fluid temperature. In the case of a stand-alone device, because the work is typically done in a wind tunnel, the approach air temperature is impacted by the free ambient. Thus, 2°C is a reasonable variation. However, with the in-situ device, since the component resides on a PCB, the air temperature variations are rather large, depending on the upstream power dissipation, and on component layout and geometry. Because of the board layout, one can often see 20°C air temperature variations in an area less than 5 cm².

The third and fourth columns in Table 1 are typical uncertainties observed in temperature calculation and measurement of the aforementioned parameters. As shown, this uncertainty is especially high when we focus on in-situ measurement with respect to approach fluid (air temperature and velocity). Considering the complex environment of a PCB, this variation clearly results from the geometry, power, heat flux density, and the tool type used for calculation and measurement.

As Table 1 shows, cumulative uncertainties for in-situ and stand-alone conditions can create a rather large variation in the resultant temperature. Let's look at an example.

Referring back to Equation 3, the temperature acceleration factor often used in reliability calculations, consider the following:

[Equation 3 legend]

This results in a temperature acceleration factor (A_T) of 48 for the normal, non-error condition. Now, based on Table 1, we will impose 10% and 35% uncertainties on the temperature measurement of T2. Table 2 below shows the effect of these errors on the acceleration factor.

[Table 2: Effects of Uncertainty in Temperature Measurement on Acceleration Factor A_T]

Although the 10% and 35% errors in this example are significantly lower than the extreme shown in Table 1, the resulting temperature and the value of A_T are rather daunting.
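To make the sensitivity concrete, here is a minimal Python sketch. It assumes the temperature acceleration factor takes the standard Arrhenius form, A_T = exp[(Ea/k)(1/T1 - 1/T2)], with temperatures in kelvin. The activation energy and both temperatures below are hypothetical placeholders, not the values from the article's example, so the output will not reproduce the figure of 48 or the entries in Table 2. Applying the error as a fraction of the Celsius reading of T2 is also an assumption.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant, eV/K


def acceleration_factor(ea_ev, t1_c, t2_c):
    """Arrhenius acceleration factor between temperatures T1 and T2 (deg C)."""
    t1_k = t1_c + 273.15  # convert to kelvin
    t2_k = t2_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t1_k - 1.0 / t2_k))


# Hypothetical inputs for illustration only, NOT the article's values.
EA_EV = 0.7    # activation energy, eV (assumed)
T1_C = 50.0    # reference temperature, deg C (assumed)
T2_C = 100.0   # nominal elevated temperature, deg C (assumed)

nominal = acceleration_factor(EA_EV, T1_C, T2_C)

# Assumption: the error is a fraction of the Celsius reading of T2.
for error in (0.0, 0.10, 0.35):
    t2_measured = T2_C * (1.0 + error)
    factor = acceleration_factor(EA_EV, T1_C, t2_measured)
    print(f"T2 error {error:4.0%}: T2 = {t2_measured:6.1f} C, "
          f"A_T = {factor:8.1f}, ratio to nominal = {factor / nominal:5.2f}")
```

Because the factor is exponential in inverse temperature, its relative change grows much faster than the relative error in T2. That is why a modest temperature error can dominate a reliability estimate.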

Taking this to the next level, an engineer may need to make a decision based on the numbers generated in Table 2 without realizing the potential errors in temperature determination, i.e., quick calculations or measurements.

Let us define a factor of goodness for the design, given by Equation 5.

[Equation 5]

where:

[Equation 5 legend]

Equation 5 implies that once the design is complete, at least a 10% margin should be allowed to ensure its safe function, as not all thermal contributions to the device can be accurately quantified. If we calculate this factor for these cases, with T2,spec = 155°C and Ta = 55°C, we see the results in Table 3.

[Table 3: Impact on Design Quality as a Result of Uncertainty in Measurement Calculations]

If there were no errors, the T2 calculation shows that the design is not satisfactory, but the margin of 5% is small enough that, with minimal additional effort, it can be remedied. The 10% and 35% uncertainties create results that are off target by 20-58%. Were these results not the product of bad engineering practices, they would point to a design that requires significant effort to fix. Taken at face value, they may force the engineer to consider a higher-capacity cooling system with little-to-no market appeal and significantly higher cost, or to recommend a total redesign of the system.

It is thus shown that uncertainty and error in measurement or calculation (including numerical simulation, e.g., CFD) may lead to decisions with product-stopping consequences. In the case of temperature, a significantly more costly cooling solution may be considered. In the case of reliability calculations, more costly components may be specified so the product will meet the expected life. And this is all because of poor measurement practices, or inappropriate sensors or analysis tools. If being first-to-market with reliable and functional products is important, it is prudent to understand the tools we use for our analyses (measurement or simulations) and consider the uncertainties associated with such tools before making a decision.

References

  1. Azar, K., Reliability Analysis and Uncertainty in Temperature Prediction, Therminic Conference, 2002.
  2. Klinger, D., Nakada, Y., and Menendez, M., AT&T Reliability Manual, Van Nostrand Reinhold, 1990.
