Category Archives: Liquid Cooling

Cooling AI Data Centers

How important are AI data centers? In just months, Elon Musk’s xAI team converted a factory outside Memphis into a cutting-edge, 100,000-GPU center for training the Colossus supercomputer—home to the Grok chatbot.

Initially powered by temporary gas turbines (later replaced by grid power), Colossus installed its first 100,000 chips in only 19 days, drawing praise from NVIDIA CEO Jensen Huang. Today, it operates 200,000 GPUs, with plans to reach 1 million GPUs by the end of 2025. [1]

Figure 1 – Elon Musk’s 1 Million Sq Ft xAI Colossus Supercomputer Facility near Memphis, TN. [1]

There are about 12,000 data centers throughout the world, nearly half of them in the United States. Now, more and more of these are being built or retrofitted for AI-specific workloads. Leaders include Musk’s xAI, Microsoft, Meta, Google, Amazon, OpenAI, and others.

Such operations demand enormous power, and, as with computational electronics of all sizes, the resulting heat must be managed.

GenAI

A key driver of data center growth is Generative AI (GenAI)—AI that creates text, images, audio, video, and code using deep learning. Chatbots built on large language models, such as ChatGPT, are examples of GenAI, along with text-to-image models that generate images from written descriptions.

Managing these workloads requires new generations of processors, mainly GPUs, which draw more power and generate more heat than their predecessors.

Figure 2 – Advanced AI Processor, the NVIDIA GH200 Grace Hopper Superchip with Integrated CPU to Increase Speed and Performance. [2,3]

AI data centers prioritize HPC hardware: GPUs, FPGAs, ASICs, and ultra-fast networking. Compared to CPUs (150–200 W), today’s AI GPUs often run above 1,000 W. To handle massive datasets and complex computations in real time, they need significant power and cooling infrastructure.

Data Center Cooling Basics

Traditional HVAC was sufficient for older CPU-driven data centers. Today’s AI GPUs demand far more cooling, both at the chip level and facility-wide. This has propelled a need for more efficient thermal management systems at both the micro (server board and chip) and macro (server rack and facility) levels. [4]

Figure 3 – The Colossus AI Supercomputer Now Runs 200,000 GPUs. It Operates at 150MW Power, Equivalent to 80,000 Households. [5]

At Colossus, Supermicro 4U servers house NVIDIA Hopper GPUs cooled by:

  • Cold plates
  • Coolant distribution manifolds (1U between each server)
  • Coolant distribution units (CDUs) with redundant pumps at each rack base [6]

Each 4U server is equipped with eight NVIDIA H100 Tensor Core GPUs. Each rack contains eight 4U servers, totaling 64 GPUs per rack.

Between every pair of servers is a 1U manifold for liquid cooling. The manifolds connect to heat-exchanging Coolant Distribution Units (CDUs) at the base of each rack, each with a redundant pumping system. The choice of coolant is determined by a range of hardware and environmental factors.
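The rack arithmetic above translates directly into a heat load that each rack's CDU must reject. A minimal sketch, using assumed figures that are not from the article (roughly 700 W per H100-class GPU and about 1 kW of per-server overhead for CPUs, memory, NICs, and power conversion losses):

```python
# Rough per-rack heat-load estimate for the configuration described above.
# GPU_TDP_W and SERVER_OVERHEAD_W are illustrative assumptions, not specs.
SERVERS_PER_RACK = 8
GPUS_PER_SERVER = 8
GPU_TDP_W = 700           # assumed H100 SXM-class TDP
SERVER_OVERHEAD_W = 1000  # assumed non-GPU load per server

gpus_per_rack = SERVERS_PER_RACK * GPUS_PER_SERVER
rack_heat_w = SERVERS_PER_RACK * (GPUS_PER_SERVER * GPU_TDP_W + SERVER_OVERHEAD_W)

print(gpus_per_rack)       # 64 GPUs per rack
print(rack_heat_w / 1000)  # ~52.8 kW of heat for the rack's cooling loop
```

Even with conservative assumptions, a single rack lands in the tens of kilowatts, well beyond what room air handling alone can remove.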

Figure 4 – Each Colossus Rack Contains Eight 4U Servers, Totaling 64 GPUs Per Rack. Between Each Server is a 1U Manifold for Liquid Cooling. [7]
Figure 5 – The Base of Each Rack Has a 4U CDU Pumping System with Redundant Liquid Cooling. [7]

Role of Cooling Fans

Fans remain essential for DIMMs, power supplies, controllers, and NICs.

Figure 6 – Rear Door Liquid-Cooled Heat Exchangers. [7]

At Colossus, fans in the servers pull cooler air in from the front of the rack and exhaust it at the rear of the server. From there, the air passes through rear-door heat exchangers, where a liquid-cooled, finned radiator lowers its temperature before it exits the rack.
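The capacity of that air path is set by a simple sensible-heat balance, Q = ρ · V̇ · cp · ΔT. A sketch with illustrative numbers (the flow rate and temperature rise below are assumptions, not Colossus figures):

```python
# Sensible-heat check for a rear-door heat exchanger: how much heat a
# given airflow carries for a given air-side temperature rise.
AIR_DENSITY = 1.2  # kg/m^3, air at ~20 C
AIR_CP = 1005.0    # J/(kg*K)

def air_heat_w(flow_m3_s: float, delta_t_k: float) -> float:
    """Heat carried by an air stream: Q = rho * V_dot * cp * dT."""
    return AIR_DENSITY * flow_m3_s * AIR_CP * delta_t_k

# Example: 2 m^3/s through the rack with a 15 K air-side temperature rise.
print(air_heat_w(2.0, 15.0))  # ~36 kW
```

Comparing this against the rack's total heat load shows why the rear-door exchanger handles only the residual air-cooled fraction, while the cold plates carry the bulk of the GPU heat.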

Direct-to-Chip Cooling

NVIDIA’s DGX H100 and H200 server systems feature eight GPUs and two CPUs that must run between 5°C and 30°C. An AI data center with a high rack density houses thousands of these systems performing HPC tasks at maximum load. Direct liquid cooling solutions are required.

Figure 7 – An NVIDIA DGX H100/H200 System Featuring Eight GPUs [8]
Figure 8 – The NVIDIA H100 SmartPlate Connects to a Liquid Cooling System to Bring Microconvective Chip-Level Cooling That Outperforms Air Cooling by 82%. [9]

Direct liquid cooling (cold plates contacting the GPU die) is the most effective method—outperforming air cooling by 82%. It is preferred for high-density deployments of the H100 or GH200.
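A first-order estimate shows why the cold plate wins: case temperature is roughly coolant inlet temperature plus power times case-to-fluid thermal resistance. The resistance values below are illustrative assumptions, not vendor specifications:

```python
# First-order estimate for direct-to-chip cooling:
# T_case ~ T_coolant_in + Q * R_case_to_fluid.
def case_temp_c(coolant_in_c: float, chip_power_w: float, r_c_per_w: float) -> float:
    """Estimated case temperature for a given thermal resistance (C/W)."""
    return coolant_in_c + chip_power_w * r_c_per_w

# 700 W chip with 30 C supply coolant; assumed resistances:
# ~0.02 C/W for a liquid cold plate vs ~0.10 C/W for an air heat sink.
print(case_temp_c(30.0, 700.0, 0.02))  # ~44 C with a cold plate
print(case_temp_c(30.0, 700.0, 0.10))  # ~100 C with an air cooler
```

With air-cooler-class resistance, a 700 W device simply cannot stay inside a 5–30 °C inlet, tight-margin envelope; direct liquid cooling keeps the case within a workable range.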

Scalable Cooling Modules

Colossus represents the world’s largest liquid-cooled AI cluster, using NVIDIA + Supermicro technology. For smaller AI data centers, Cooling Distribution Modules (CDMs) provide a compact, self-contained solution.

Figure 9 – The iCDM-X Cooling Distribution Module from ATS Includes Pumps, Heat Exchanger and Liquid Coolant for Managing Heat from AI GPUs and Other Components. [10]

Most AI data centers are smaller, with lower, but still essential, power and cooling needs. Many of their heat issues can be resolved using self-contained Cooling Distribution Modules.

The compact iCDM-X cooling distribution module provides up to 1.6 MW of cooling for a wide range of AI GPUs and other chips. The module measures and logs all important liquid cooling parameters, uses just 3 kW of power, and requires no external coolant.
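A capacity figure like 1.6 MW implies a corresponding coolant flow, via ṁ = Q / (cp · ΔT). A sketch using approximate water properties and an assumed 10 K liquid-side temperature rise (the ΔT is an assumption for illustration, not an iCDM-X specification):

```python
# Coolant flow required to carry a given heat load at a given liquid-side
# temperature rise. Properties approximate water at room temperature.
WATER_CP = 4186.0   # J/(kg*K)
WATER_RHO = 998.0   # kg/m^3

def flow_lpm(heat_w: float, delta_t_k: float) -> float:
    """Volumetric flow (L/min) needed to carry heat_w at delta_t_k rise."""
    m_dot = heat_w / (WATER_CP * delta_t_k)   # mass flow, kg/s
    return m_dot / WATER_RHO * 60_000.0       # convert m^3/s to L/min

print(round(flow_lpm(1_600_000, 10.0)))  # ~2300 L/min at full capacity
```

The pump and manifold sizing inside a CDM follows directly from this kind of balance: halving the allowed ΔT doubles the required flow.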

These modules include:

  • Pumps
  • Heat exchangers
  • Cold plates
  • Digital monitoring (temp, pressure, flow)

Their sole external component is one or more cold plates removing heat from AI chips. ATS provides an industry-leading selection of custom and standard cold plates, including the high-performing ICEcrystal series.

Figure 10 – The ICEcrystal Cold Plates Series from ATS Provide 1.5 kW of Jet Impingement Liquid Cooling Directly onto AI Chip Hotspots.

Cooling Edge AI and Embedded Applications

AI isn’t just for big data centers—edge AI, robotics, and embedded systems (e.g., NVIDIA Jetson Orin, AMD Kria K26) use processors running under 100 W. These are effectively cooled with heat sinks and fan sinks from suppliers like Advanced Thermal Solutions. [11]

Figure 11 – High Performance Heat Sinks for NVIDIA and AMD AI Processors in Embedded and Edge Applications. [11]

NVIDIA also partners with Lenovo, whose 6th-gen Neptune cooling system enables full liquid cooling (fanless) across its ThinkSystem SC777 V4 servers—targeting enterprise deployments with NVIDIA Blackwell + GB200 GPUs. [12]

Figure 12 – Lenovo’s Neptune Direct Water Cooling Removes Heat from Power Supplies, for Completely Fanless Operation. [12]

Benefits gained from the Neptune system include:

  • Full system cooling (GPUs, CPUs, memory, I/O, storage, regulators)
  • Efficient for 10-trillion-parameter models
  • Improved performance, energy efficiency, and reliability

Conclusion

With surging demand, AI data centers are now a major construction focus. Historically, cooling problems are the #2 cause of data center downtime (behind power issues). Given the high power needed for AI computing, these builds should be planned carefully around their local communities’ electrical needs and sources, and water consumption. [13]

AI workloads are projected to increase U.S. data center power demand by 165% by 2030 (Goldman Sachs), reaching nearly double 2022 levels (IBM/Newmark). Sustainable design and resource-conscious cooling are essential for the next wave of AI infrastructure. [14,15]

References

1. The Guardian, https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphis

2. Fibermall, https://www.fibermall.com/blog/gh200-nvidia.htm

3. NVIDIA, https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip?ncid=no-ncid

4. IDTechEx, https://www.idtechex.com/en/research-report/thermal-management-for-data-centers-2025-2035-technologies-markets-and-opportunities/1036

5. Data Center Frontier, https://www.datacenterfrontier.com/machine-learning/article/55244139/the-colossus-ai-supercomputer-elon-musks-drive-toward-data-center-ai-technology-domination

6. Supermicro, https://learn-more.supermicro.com/data-center-stories/how-supermicro-built-the-xai-colossus-supercomputer

7. Serve The Home, https://www.servethehome.com/inside-100000-nvidia-gpu-xai-colossus-cluster-supermicro-helped-build-for-elon-musk/2/

8. Naddod, https://www.naddod.com/blog/introduction-to-nvidia-dgx-h100-h200-system

9. Flex, https://flex.com/resources/flex-and-jetcool-partner-to-develop-liquid-cooling-ready-servers-for-ai-and-high-density-workloads

10. Advanced Thermal Solutions, https://www.qats.com/Products/Liquid-Cooling/iCDM

11. Advanced Thermal Solutions, https://www.qats.com/Heat-Sinks/Device-Specific-Freescale

12. Lenovo, https://www.lenovo.com/us/en/servers-storage/neptune/?orgRef=https%253A%252F%252Fwww.google.com%252F

13. Deloitte, https://www2.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2025/genai-power-consumption-creates-need-for-more-sustainable-data-centers.html

14. Goldman Sachs, https://www.goldmansachs.com/insights/articles/ai-to-drive-165-increase-in-data-center-power-demand-by-2030

15. Newmark, https://www.nmrk.com/insights/market-report/2023-u-s-data-center-market-overview-market-clusters

Manufacturing services for cold plates for electronics cooling

Many companies we work with prefer to design their own cold plates and have another company do the DFMA review and the manufacturing. ATS’ extensive capability for manufacturing cold plates is demonstrated in the wide variety of tubed cold plates it produces to meet each customer’s specific requirements. Copper, stainless steel, aluminum, 4 or 48 passes, ATS has the manufacturing capability to make nearly any tubed cold plate. Different examples of these custom cold plates, made in the USA, are described on the ATS website at the link: https://www.qats.com/Products/Liquid-Cooling/Custom-Cold-Plates

Cold plate manufacturing for OEM and ODM applications: both tubed and finned cold plates in copper, stainless steel, and other materials.

Understanding coolants and components’ materials of construction and their interaction.

CPC has a very helpful white paper and a funny but very useful video on understanding coolants, components’ materials of construction, and their interaction. This is one of those niche topics that some may not consider, but it is key to avoiding performance and reliability issues in liquid cooling systems. Both are free, with no email registration required.

==> Click here for the technical guide: https://lnkd.in/e73BvyFM

==> Click here for the video: https://lnkd.in/eWFet_BY


Nanoparticles to Enhance the Thermal Management of Electronics

The addition of nanoparticles to a coolant is an alternative approach that can be considered to improve the performance of a liquid cooled system, or perhaps to further reduce its size. But nanoparticles are not necessarily well known by engineers engaged in thermal management. This list of materials may help.


First, a new paper by Moita, Moreira, and Pereira does an excellent job of reviewing nanofluids for the next generation of thermal management. It contributes to the body of knowledge in this space by examining the typical nanoparticle/base fluid mixtures used and combined in technical and functional solutions, covering both the science of nanofluids and their practical application. You can download this open-access paper from the Multidisciplinary Digital Publishing Institute at this link (download is a PDF): Nanofluids for the Next Generation Thermal Management of Electronics: A Review
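A common baseline in the nanofluid literature is Maxwell's classical effective-medium model for the thermal conductivity of a dilute particle suspension. A minimal sketch, with illustrative property values (water base fluid, alumina-like particles; these numbers are assumptions for demonstration):

```python
# Maxwell effective-medium model for a dilute nanoparticle suspension:
# k_eff/k_f = (k_p + 2k_f + 2*phi*(k_p - k_f)) / (k_p + 2k_f - phi*(k_p - k_f))
def maxwell_k_eff(k_fluid: float, k_particle: float, phi: float) -> float:
    """Effective thermal conductivity at particle volume fraction phi."""
    num = k_particle + 2 * k_fluid + 2 * phi * (k_particle - k_fluid)
    den = k_particle + 2 * k_fluid - phi * (k_particle - k_fluid)
    return k_fluid * num / den

# Water (~0.6 W/m-K) with 2 vol% alumina-like particles (~40 W/m-K):
print(maxwell_k_eff(0.6, 40.0, 0.02))  # ~0.64 W/m-K, roughly a 6% gain
```

This is the dilute-limit prediction; measured nanofluid enhancements often deviate from it, which is part of what the reviews above discuss.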

Second, ATS was fortunate enough to have had Dr. Reza Azizian on our research staff. He and others authored a white paper titled “Nanofluids in Electronics Cooling Applications,” which discusses the theory and use of nanofluids for thermal management. We’ve posted that paper on the ATS blog here: Nanofluids in Electronics Cooling Applications.

We hope you find these resources helpful. Like always, if you have trouble accessing them, drop us a comment and we’ll get you a copy.

Nanoparticle Shapes & Forms. Image used by permission from the artist normaals.

How Chillers Are Used in the Liquid Loop and How to Choose the Right Fluid

Chillers can be a key component in the liquid loop, conditioning the coolant before it heads back into the cold plate. The standard refrigeration cycle of recirculating chillers is displayed below in Fig. 1.

Heat Exchanger
An example of a standard liquid cooling loop using a heat exchanger to transfer heat from the liquid to the ambient. (Advanced Thermal Solutions, Inc.)

The choice of the chiller and the fluid are an important part of the creation of the liquid loop. ATS has some resources to help engineers in this work.

First, engineers can get help identifying the right fluid for their liquid loop with our article “Engineering How-To: Choosing the Right Fluid to Use with Cold Plates“. While water is the most common fluid, the article provides a specification grid showing which fluid to use for different applications.
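One reason fluid choice matters is that volumetric heat capacity (ρ · cp) sets the flow needed for a given heat load and temperature rise. A sketch comparing water with a 50/50 ethylene glycol-water mix, using typical room-temperature property values (illustrative figures, not from the articles above):

```python
# Flow required for a given heat load depends on the fluid's rho * cp.
# Property values below are typical approximations for illustration.
FLUIDS = {
    "water":          {"rho": 998.0,  "cp": 4186.0},  # kg/m^3, J/(kg*K)
    "50/50 EG-water": {"rho": 1070.0, "cp": 3300.0},
}

def required_flow_lpm(fluid: str, heat_w: float, delta_t_k: float) -> float:
    """Volumetric flow (L/min) to carry heat_w at a delta_t_k rise."""
    p = FLUIDS[fluid]
    m3_per_s = heat_w / (p["rho"] * p["cp"] * delta_t_k)
    return m3_per_s * 60_000.0

# 5 kW load with a 10 K coolant temperature rise:
for name in FLUIDS:
    print(name, round(required_flow_lpm(name, 5000.0, 10.0), 2))
# water needs ~7.2 L/min; the glycol mix needs ~8.5 L/min for the same load
```

The glycol mix buys freeze protection at the cost of roughly 18% more flow (and more pumping power) for the same heat load, exactly the kind of trade-off a fluid-selection grid captures.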

Another helpful resource for engineers is our article, “Cold Plates and Recirculating Chillers for Liquid Cooling Systems“. This article helps engineers understand the use of both cold plates and chillers deployed in the liquid loop. We also include a comparison of ATS and other industry chillers for quick reference for engineers.

But what if you’re new to how the liquid cooling loop works? Our two-minute video, “What is a Cold Plate and How Does it Work,” on the ATS YouTube Channel walks engineers through how the liquid loop works.

The liquid cooling loop for thermal management of electronics, and some of its key features.

Finally, ATS has a line of recirculating, immersion, and TEC-based chillers that engineers can deploy in their liquid loops to efficiently cool high power electronics. You can learn about them on our web site here: “ATS Family of Recirculating, Immersion and TEC Based Chillers“.

ATS Family of Recirculating, TEC Based and Immersion Chillers