This is part 2 of our two-part series on server cooling. Part 1 is available here: Thermal Solutions for Servers Part 1
Part Two – Racks, Cards and Components
When the components inside a server get too hot, onboard logic may turn them off to avoid damage to the server. But not all components are protected like this. Along with the effects of an unscheduled server shutdown, if and when a hot server does not power down, its output could be compromised and its lifespan could be shorter than expected.
While much infrastructure and wide area cooling is employed in server installations, there is often a need for additional cooling of individual cards or select components on some cards. There are different ways to provide this localized cooling. Some are proven and traditional; others are less conventional but have been shown to be effective.
The following are some current examples of cooling solutions at the rack, card and individual component level.
Liquid Cooling for Racks
Direct Contact Liquid Cooling (DCLC) is a cooling solution from CoolIT Systems that uses the thermal conductivity of liquid to provide concentrated cooling to select surface areas, such as server racks and their components. DCLC systems can replace or minimize the need for system air conditioners and fans, and they allow higher rack densities. Any server in any rack can be liquid cooled by DCLC.
Figure 1. A Liquid-to-Liquid heat exchanger module for cooling board components in cabinets. [CoolIT Systems]
An example of the DCLC technology is CoolIT’s CHx40 Liquid-to-Liquid heat exchanger for single cabinets. The 2U CHx40 module distributes clean, treated coolant, and can manage 40kW of processor load per rack, or 50 servers per rack.
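The rack-level figures above can be turned into rough per-server and coolant-flow numbers. The following is a minimal sketch, not CoolIT's sizing method. It assumes that the 40kW and 50-server figures describe the same rack, that the coolant behaves like water, and that the coolant warms by 10 K across the rack; none of these assumptions come from the source, and the function names are hypothetical.

```python
# Assumed coolant properties (plain water near room temperature).
# The article does not state the coolant's actual properties.
WATER_CP_J_PER_KG_K = 4186.0    # specific heat capacity
WATER_DENSITY_KG_PER_L = 0.998  # density


def per_server_budget_w(rack_load_w: float, servers: int) -> float:
    """Average heat load per server if the rack load is shared evenly."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return rack_load_w / servers


def coolant_flow_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Coolant flow needed to carry heat_w with a temperature rise of delta_t_k.

    Uses the steady-state energy balance Q = m_dot * cp * delta_T.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_per_s = heat_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


print(per_server_budget_w(40_000, 50))              # average watts per server
print(round(coolant_flow_l_per_min(40_000, 10), 1)) # litres per minute, assumed 10 K rise
```

Under these assumptions the average budget is 800 W per server, and the loop would need to move roughly 57 litres of coolant per minute. A smaller allowed temperature rise increases the required flow proportionally.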
The main benefit of this system is higher achievable density for CPUs and GPUs. CoolIT Systems server modules can cool any combination of CPU, GPU and memory components, with customization available for VR, ASIC, FPGA and other devices. Servers with these modules remain hot-swappable and simple to service.
The Fujitsu Liquid Loop Cooling technology uses a hybrid cooling model. To remove heat, there are six pumps for each processor, which circulate coolant over hot spots and back to an air-cooled radiator. The cooling system is sealed and maintenance-free.
Figure 2. Liquid Loop Cooling Allows Heat Sinks to be Smaller and Located Away from Hot Spots. [Fujitsu]
To cool hot spots with air, heat sinks and fans must be located at the hot spots. However, this can push components like memory or IO farther away from the processors. The distance between a CPU and its memory chips strongly affects memory access time, so greater separation degrades overall performance. In liquid cooling, heat sinks can be located away from hot spots because heat is carried through liquid-filled tubes to the heat sinks. This allows processors and other components like memory and IO to be placed close enough together to reduce access time.
Eliminating these server design restrictions and integrating the memory controllers into the processor reduces memory access time to one-fifth that of the previous SPARC server. The Liquid Loop Cooling system has also reduced heat sink sizes. Further, this efficient cooling allows lower fan rotation speeds, which helps reduce noise.
Submersion cooling is a less recent but quickly developing method for thermally managing racks of server cards. Typically, a series of boards and all their components are immersed in a non-conductive liquid. The liquid absorbs the component heat and transfers it away from the card.
Figure 3. ElectroSafe coolant flowing over servers in a CarnoJet System also protects components from environmental contaminants. [Green Revolution Cooling]
An example is the CarnoJet System from Green Revolution Cooling, which is reported to increase overall server performance. The system features inert ElectroSafe coolant, which keeps server components an average of 20°C cooler than is obtainable in an air-cooled environment. This improved cooling allows for faster processor clock speeds and higher density racks. Servers immersed in the coolant are protected from dust, moisture, and oxidation, which are major sources of equipment failure. The coolant is constantly filtered and in motion. This provides a rinsing effect that prevents particulate accumulation.
Other benefits of submersion cooling include cost savings on large scale air conditioning systems and their maintenance, and on power requirements. Despite these advantages, liquid cooling still faces obstacles to becoming more widely used. The concept itself can cause apprehension among some engineers. But at high densities of server hardware, liquid cooling may be a smart solution.
There are many active server rack accessories to help improve cooling. Among them are enclosure blowers, rack air conditioners, and cooling fans. Here we’ll look at some fans.
Fans can be used to cool rack setups where larger AC units may not be practical. Internal fans typically take up as little as 1U of rack space and can be installed directly into a rack or enclosure.
Figure 4. A rack mounted fan unit that provides 90 cubic feet of air per minute to cool hotspots in a server cabinet. [Data Comms Direct]
For lower density racks, fans can be a viable solution when increased cooling is needed. Fans facilitate the movement of ambient air through the rack. They can be mounted on a cabinet’s ceiling, sides, and doors, or incorporated directly in the rack as slide-out fan trays. To improve the effectiveness of any fans, proper airflow management is needed. This may require the use of blanking panels, gaskets or sealing tape.
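To judge whether a fan of a given airflow rating is adequate, it helps to estimate how much heat that airflow can carry away. The sketch below applies the standard energy balance Q = ρ · cp · V · ΔT to the 90 cubic feet per minute fan unit shown in Figure 4. The air properties and the 10 K air temperature rise are assumptions chosen for illustration, not values from the fan's datasheet, and the function name is hypothetical.

```python
# 1 cubic foot per minute expressed in cubic metres per second.
CFM_TO_M3_PER_S = 0.0004719474432

# Assumed properties of air near room temperature at sea level.
AIR_DENSITY_KG_PER_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0


def air_heat_removal_w(airflow_cfm: float, delta_t_k: float) -> float:
    """Heat (watts) carried away by airflow_cfm of air that warms by delta_t_k.

    Ideal case: all of the rated airflow passes over the heat sources,
    with no leakage or recirculation.
    """
    volume_flow_m3_per_s = airflow_cfm * CFM_TO_M3_PER_S
    mass_flow_kg_per_s = AIR_DENSITY_KG_PER_M3 * volume_flow_m3_per_s
    return mass_flow_kg_per_s * AIR_CP_J_PER_KG_K * delta_t_k


# 90 CFM with an assumed 10 K rise between intake and exhaust air.
print(round(air_heat_removal_w(90, 10)))
```

With these assumptions the result is on the order of 500 W, which suggests why fans alone suit lower density racks. Real cabinets will remove less heat than this ideal figure if air bypasses the equipment, which is where blanking panels and gaskets help.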
Figure 5. An integral fan helps cool hot components on a PCIe card. [Pentek]
Many server processors require active heat sinks – where flowing air or liquid enhances the sink’s cooling performance. A common example is incorporating a small fan within a sink’s cooling fins. As an example, a PCIe carrier card, found in some server schemes, has a cutout opening directly under the warmest components. This allows the use of a fan-equipped heat sink that removes heat effectively from those components on the inside surface of an attached XMC module. The embedded fan forces air across the fins for active cooling.
Figure 6. A Candlestick sensor measures air temperature and velocity at different locations on server cards. [Advanced Thermal Solutions, Inc.]
Sensors can be helpful for finding dead spots in cooling airflow. They can help determine if moving a fan or using a different fan with a higher CFM rating improves cooling results. Since air flow is hard to visualize, the best solution is often found by trial and error.
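When several sensor readings are collected across a card, a short script can flag likely dead spots and compare one fan arrangement against another. This is a minimal sketch with made-up data; the `Reading` structure, the 0.5 m/s velocity threshold and the location names are all hypothetical and do not describe the output format of any particular sensor.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    location: str        # where on the card the sensor was placed
    air_temp_c: float    # measured air temperature
    velocity_m_s: float  # measured air velocity


def find_dead_spots(readings, min_velocity_m_s=0.5):
    """Return locations whose air velocity falls below the assumed threshold."""
    return [r.location for r in readings if r.velocity_m_s < min_velocity_m_s]


def temperature_change(before, after):
    """Per-location air temperature change (after minus before) between two trials."""
    before_by_location = {r.location: r.air_temp_c for r in before}
    return {
        r.location: r.air_temp_c - before_by_location[r.location]
        for r in after
        if r.location in before_by_location
    }


# Hypothetical measurements before and after moving a fan.
trial_a = [Reading("CPU inlet", 32.0, 2.1), Reading("behind DIMMs", 47.0, 0.2)]
trial_b = [Reading("CPU inlet", 33.0, 1.8), Reading("behind DIMMs", 39.0, 0.9)]

print(find_dead_spots(trial_a))
print(find_dead_spots(trial_b))
print(temperature_change(trial_a, trial_b))
```

Recording each trial this way makes the trial-and-error process repeatable, since the effect of moving or swapping a fan shows up as a per-location difference rather than an impression.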
Component Level Cooling
A new active heat sink solution, the DualFLOW, is designed for server CPUs, including the Intel Xeon (Haswell and Skylake), AMD Opteron series and Cavium ThunderX ARM CPUs. The thermal solution integrates a heat sink, heat pipes or vapor chamber, and blower. The DualFLOW heat sink has a low profile for use in 1U and 2U rack servers. It has rows of embedded heat pipes in its base to help transfer large volumes of component heat into a dense fin field. The blower on top draws air inward, across the many fin surfaces, then pushes this warmed air out and away.
Figure 7. A DualFLOW heat sink with heat pipes embedded in its base and a PWM-enabled blower on top. [Advanced Thermal Solutions, Inc.]
The DualFLOW heat sink can be mounted onto PCBs using PEMs or spring-loaded screws. Both kinds of hardware are adjustable, so the heat sink can be mounted for optimum performance. Its initial mounting pressure is 9.2 PSI. A thermal grease interface provides good thermal transfer from the component to the heat sink’s base. For added cooling performance, a PWM-controllable fan can be mounted above the heat sink fin field.
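A PWM-controlled fan or blower is normally driven by a fan curve that maps a measured temperature to a duty cycle. The sketch below shows one common approach, a linear ramp between two temperature limits. The temperature limits and duty-cycle range are illustrative assumptions, not DualFLOW specifications, and the function name is hypothetical.

```python
def pwm_duty_percent(
    temp_c: float,
    t_min_c: float = 40.0,    # assumed: at or below this, run at minimum duty
    t_max_c: float = 80.0,    # assumed: at or above this, run at full duty
    duty_min: float = 30.0,   # assumed minimum duty cycle that keeps the fan spinning
    duty_max: float = 100.0,
) -> float:
    """Map a component temperature to a PWM duty cycle using a linear ramp."""
    if t_max_c <= t_min_c:
        raise ValueError("t_max_c must be greater than t_min_c")
    if temp_c <= t_min_c:
        return duty_min
    if temp_c >= t_max_c:
        return duty_max
    fraction = (temp_c - t_min_c) / (t_max_c - t_min_c)
    return duty_min + fraction * (duty_max - duty_min)


for temp in (30, 60, 90):
    print(temp, pwm_duty_percent(temp))
```

Keeping the fan at a reduced duty cycle when the component is cool lowers noise and power draw, while the ramp ensures full airflow is available as the component approaches its upper temperature limit.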