Discussion of Thermal Solution for Stratix 10 FPGA

An Advanced Thermal Solutions, Inc. (ATS) client was planning to upgrade an existing board by adding Altera’s high-powered Stratix 10 FPGAs, with estimated dissipation of up to 90 watts each for two of the devices and 40 watts for the third. The client was using ATS heat sinks on the original iteration of the board and wanted ATS to determine whether the same heat sinks would work at the higher power levels.

In the end, the original heat sinks proved effective, keeping case temperatures below the required maximum. Through a combination of analytical modeling and CFD simulations, ATS demonstrated that the heat sinks could cool the new, more powerful components.

ATS Field Application Engineer Vineet Barot recently spoke with Marketing Director John O’Day and Marketing Communications Specialist Josh Perry about the process he undertook to meet the requirements of the client and to test the heat sinks under these new conditions.

JP: Thanks again for sitting down with us to talk about the project, Vineet. What was the challenge that this client presented to us?
VB: They had a previous-generation PCB on which they were using ATS heat sinks, ATS 1634-C2-R1, and they wanted to know: if they switched to the next-gen design with three Altera Stratix 10 FPGAs, two of them relatively high-powered, could they still use the same heat sinks?

The board given to ATS engineers to determine whether the original ATS heat sinks would be effective with the new, high-powered Stratix 10 FPGAs from Altera. (Advanced Thermal Solutions, Inc.)

They didn’t know the exact power of the FPGAs, but they gave us these parameters: a 40°C ambient, with junction temperatures no higher than 100°C. Even though the package can handle higher temperatures, they wanted this limit, which translates to a 90°C case temperature. You have the silicon chip, the actual component with the gates and everything, and you have a package that puts all that together, and there’s typically a thermal path from the chip to the lid, which is either metal or plastic. So there’s some temperature drop from the junction to the case.

The junction-to-case thermal resistance is constant, so for any given power you know what the maximum case temperature can be. The power they specified for FPGAs 1 and 2, which are down at the bottom, was 90 watts each (again, this is an estimate), and for the third one it was 40 watts.
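The limits above amount to a simple series thermal-resistance budget. A minimal sketch, assuming a one-dimensional heat path where temperature rises by power times resistance: the 40°C, 90°C, and 100°C values and the 90 W and 40 W estimates come from the interview, while the junction-to-case resistance is only inferred from them (not taken from an Altera datasheet), and the function names are hypothetical.

```python
# Hedged sketch of a series thermal-resistance budget: T_hot = T_cold + P * R.
# Temperature limits are the ones quoted in the interview; the junction-to-case
# resistance below is inferred from those limits, not from a datasheet.

T_AMBIENT_C = 40.0        # inlet air temperature specified by the client
T_JUNCTION_MAX_C = 100.0  # client's junction limit
T_CASE_MAX_C = 90.0       # case limit that the junction limit translates to


def implied_theta_jc(power_w: float) -> float:
    """Junction-to-case resistance (degC/W) implied by the two limits at power_w."""
    return (T_JUNCTION_MAX_C - T_CASE_MAX_C) / power_w


def max_case_to_ambient(power_w: float, t_air_c: float = T_AMBIENT_C) -> float:
    """Largest case-to-air resistance (degC/W, interface material plus heat sink)
    that keeps the case at or below its limit when the local air is t_air_c."""
    return (T_CASE_MAX_C - t_air_c) / power_w


for name, power in [("FPGA 1", 90.0), ("FPGA 2", 90.0), ("FPGA 3", 40.0)]:
    print(f"{name}: {power:.0f} W -> case-to-ambient <= "
          f"{max_case_to_ambient(power):.3f} degC/W")
print(f"implied junction-to-case at 90 W: {implied_theta_jc(90.0):.3f} degC/W")
```

For the 90 W parts this works out to about 0.56°C/W from case to air, and for the 40 W part 1.25°C/W, a much looser target. A component sitting in preheated air should be evaluated with its local air temperature passed as `t_air_c` rather than the 40°C inlet value.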

JP: How did you get started working towards a solution?
VB: The first thing we did was identify the worst-case scenario. Overall, the board layout is well done because you have nice, linear flow. The fans are relatively powerful, with lots of good flow going through there. It’s a well-designed board, and they wanted to know what we could do with it.

I said, let’s start with the heat sinks that you’re already using, the 1634s, and go from there. Here are the fan specs. They wanted to use the most powerful fan, this top curve here. This is flow rate versus pressure: the more pressure the fan has to push against, the less air it can move, and this curve describes that relationship.

Fan operating points on the board, determined by CFD simulations. (Advanced Thermal Solutions, Inc.)

This little area here is sometimes called the knee of the fan curve. Say we’re in this other area, where flow rate and pressure are related roughly linearly: if I increase the pressure, say by putting my hand in front of the fan, the flow rate goes down. With no pressure, I have my maximum flow rate, and as the pressure increases, the flow rate drops. The same thing happens in the knee, but much more sharply: a slight increase in pressure, from 0.59 to 0.63, reduces the flow rate quite a bit.
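Where a fan actually runs is set by the intersection of its fan curve with the system’s impedance curve. The following is a minimal sketch, assuming a quadratic system curve (pressure drop proportional to flow squared) and a made-up piecewise-linear fan curve with a flat knee; the curve points, the impedance coefficients, and the function names are hypothetical illustrations, not data from this project.

```python
# Hedged sketch: a fan's operating point is where its fan curve meets the
# system-impedance curve, modeled here as dP = k * Q**2.
# The fan-curve points and k values are HYPOTHETICAL, chosen only to give a
# flat "knee"; they are not data for the client's fans or the ATS CFD model.
from bisect import bisect_right

FAN_Q = [0.00, 0.05, 0.15, 0.30]  # flow rate, m^3/s
FAN_P = [0.83, 0.63, 0.59, 0.00]  # static pressure, in H2O (0.05-0.15 is the knee)


def fan_pressure(q: float) -> float:
    """Piecewise-linear fan curve; zero pressure at and beyond free delivery."""
    if q <= FAN_Q[0]:
        return FAN_P[0]
    if q >= FAN_Q[-1]:
        return 0.0
    i = bisect_right(FAN_Q, q) - 1
    frac = (q - FAN_Q[i]) / (FAN_Q[i + 1] - FAN_Q[i])
    return FAN_P[i] + frac * (FAN_P[i + 1] - FAN_P[i])


def operating_point(k: float, tol: float = 1e-9) -> float:
    """Bisect for the flow Q at which fan pressure equals the system drop k*Q^2."""
    lo, hi = 0.0, FAN_Q[-1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fan_pressure(mid) > k * mid * mid:
            lo = mid  # fan still pushes harder than the system resists
        else:
            hi = mid
    return 0.5 * (lo + hi)


# A 10% rise in impedance (dust, a changed open area) in each region.
for label, k in [("knee", 61.0), ("linear region", 9.8)]:
    q0 = operating_point(k)
    q1 = operating_point(1.10 * k)
    print(f"{label:14s} Q = {q0:.4f} -> {q1:.4f} m^3/s "
          f"({100.0 * (q1 / q0 - 1.0):+.1f}% flow)")
```

With these invented numbers, the same 10% rise in impedance costs noticeably more flow when the operating point sits in the knee than when it sits on the steeper linear part of the curve, which is the unpredictability described here.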

CFD simulations showed that the fans were operating in the “knee,” where it is hard to judge the impact of pressure changes on flow rate and vice versa. (Advanced Thermal Solutions, Inc.)

So, in the linear region, a 0.1 change in flow rate (in cubic meters per second) took a 0.4 inches of water pressure difference, whereas in the knee the same 0.1 change in flow rate took only a 0.04 inch increase in pressure. That’s why there’s a circle there. It’s a danger area, because in that range it’s harder to predict what the flow will be: any pressure change, any dust build-up, or any change in the estimated open area can shift your flow rate.
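The two figures quoted above can be turned into an average flow-to-pressure slope for each region. A short sketch using only those numbers; treating a 0.04 in. H2O disturbance (for example from dust build-up) as the test case is an illustrative assumption.

```python
# Hedged sketch: average slope |dQ/dP| of the fan curve in the two regions,
# using the deltas quoted in the interview (0.1 m^3/s of flow against
# 0.4 vs 0.04 in H2O of pressure).


def flow_per_pressure(delta_q_m3s: float, delta_p_inh2o: float) -> float:
    """Average |dQ/dP| over a fan-curve segment, in (m^3/s) per in H2O."""
    return abs(delta_q_m3s / delta_p_inh2o)


linear = flow_per_pressure(0.1, 0.4)   # linear region
knee = flow_per_pressure(0.1, 0.04)    # knee region

# The same small pressure disturbance applied in each region (assumed value).
disturbance = 0.04  # in H2O
print(f"linear region: {linear:.2f} (m^3/s)/inH2O, "
      f"flow change ~{linear * disturbance:.3f} m^3/s")
print(f"knee:          {knee:.2f} (m^3/s)/inH2O, "
      f"flow change ~{knee * disturbance:.3f} m^3/s")
print(f"the knee is about {knee / linear:.0f}x as sensitive")
```

On these figures the knee is about ten times as sensitive, so a disturbance that barely registers in the linear region can move the flow substantially there.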

The 1634 is what they were using previously. It’s a copper, straight-fin heat sink with a heat pipe, mounted with a hardware kit and a backing plate that they have. It’s a custom heat sink that we made for them, and the next revision, C2-R1, we also made for them for the previous generation of their board. They originally wanted us to add heat pipes to this copper heat sink, but I took the latest version and said, let’s see what this one will do. For the third heat sink, I did some analytical modeling to see what thermal performance would be required, and because that FPGA was only 40 watts, I chose one of our off-the-shelf pushPIN™ heat sinks.

JO: Is the pushPIN heat sink downstream of the 1634, so it’s getting preheated air?
VB: Yes. This is a pull system, so the air is going out towards the fans.
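How much preheating the downstream heat sink sees can be estimated with an energy balance on the air: the temperature rise equals power divided by mass flow times specific heat. A minimal sketch, assuming all 90 W from one upstream FPGA mixes into the air stream reaching the downstream part (a worst case) and a hypothetical channel flow of 0.02 m³/s; neither assumption comes from the CFD model, and the air properties are textbook values near 40°C.

```python
# Hedged sketch: bulk air temperature rise from an upstream heat source,
#   dT = P / (rho * cp * Q).
# Assumes all upstream heat goes into the air stream that reaches the
# downstream heat sink (worst case) and uses a HYPOTHETICAL channel flow.

RHO_AIR = 1.127   # kg/m^3, dry air near 40 degC and 1 atm
CP_AIR = 1007.0   # J/(kg*K)


def air_temperature_rise(power_w: float, flow_m3s: float) -> float:
    """Mixed-mean temperature rise (degC) of air absorbing power_w at flow_m3s."""
    mass_flow = RHO_AIR * flow_m3s  # kg/s
    return power_w / (mass_flow * CP_AIR)


t_inlet = 40.0         # degC, client's ambient
upstream_power = 90.0  # W, one high-power FPGA upstream
channel_flow = 0.02    # m^3/s, hypothetical flow through that channel
t_local = t_inlet + air_temperature_rise(upstream_power, channel_flow)
print(f"air reaching the downstream heat sink: ~{t_local:.1f} degC")
```

With these assumptions the air arrives at roughly 44°C instead of 40°C, which shrinks the 40 W part’s case-to-air budget from (90 - 40)/40 = 1.25°C/W to about 1.15°C/W. Halving the flow doubles the rise, so the estimate is only as good as the flow figure fed into it.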

CFD simulations done with FloTherm, which uses a rectilinear grid. (Advanced Thermal Solutions, Inc.)

This is the CFD modeling that ATS thermal engineer Sridevi Iyengar did in FloTherm. This is a big board, with a lot of different nodes and a lot of different cells, and FloTherm uses rectilinear grids to avoid waviness. You can adjust the grid lines depending on where you need detail. Sri is also really good at modeling; she was able to turn it around in a day.
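A rectilinear grid keeps every grid line straight and parallel to an axis, but the spacing can vary, so cells can be concentrated where detail matters. A minimal sketch of that idea along one axis, assuming a hypothetical 400 mm board with a heat sink spanning 150 to 210 mm; the function name and dimensions are illustrative, and this does not reproduce FloTherm’s own grid controls.

```python
# Hedged sketch: grid-line positions along one axis of a rectilinear grid,
# coarse across the board and finer over a region of interest such as a
# heat sink. Dimensions are HYPOTHETICAL; this is not FloTherm's gridder.
import math


def axis_grid(length: float, coarse: float, start: float, end: float,
              fine: float) -> list:
    """Sorted grid-line positions on [0, length]: spacing <= coarse overall
    and <= fine between start and end."""
    n = math.ceil(length / coarse)
    m = math.ceil((end - start) / fine)
    lines = {round(i * length / n, 9) for i in range(n + 1)}
    lines |= {round(start + j * (end - start) / m, 9) for j in range(m + 1)}
    return sorted(lines)


# A 400 mm board with a heat sink spanning 150-210 mm along the flow direction.
grid = axis_grid(400.0, 20.0, 150.0, 210.0, 2.0)
spacings = [b - a for a, b in zip(grid, grid[1:])]
print(f"{len(grid)} grid lines, spacing {min(spacings):.1f}"
      f" to {max(spacings):.1f} mm")
```

Applying the same idea on each axis gives a grid that stays rectilinear while putting most of its cells around the components that matter.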

Flow vectors at the cut plane, as determined by CFD simulations. (Advanced Thermal Solutions, Inc.)

These are the different fans, and she marked their operating points on the fan curves. She’s pointing out that fan 5 is operating around the knee. In fact, all of the fans operate around this area, which is not the best area to operate in. You want to operate down here, where you have a lot of flow. If you look at the case temperatures, remember the max was 90°C, and we’re at 75°C. We’re 15°C below, a 15°C margin. This was a pushPIN heat sink on this one up here and 1634s on the high-powered FPGAs down here.

JP: Was there more analysis that you did before deciding the original heat sinks were the solution?
VB: I built analytical models using the flow rates and fan operating points from CFD, because it’s relatively hard to predict what the flow is going to be. Using that flow in a thermal analysis with HSM (heat sink modeling tool), we were within five percent of the CFD results. Sri also used FloTherm to test alternatives: since the copper heat sink with the heat pipe was working so well, we tried copper without the heat pipe, and you can see the temperature increased from 74° to 76°C here, still well under the 90°C case limit. Aluminum with the heat pipe was 77°C and aluminum without the heat pipe was 81°C, so you’re still under.
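The simulated case temperatures above can be checked against the 90°C case limit in a few lines. A minimal sketch using only the values quoted in the interview; the variable names are illustrative.

```python
# Hedged sketch: margin to the 90 degC case limit for the heat-sink variants
# simulated in FloTherm, using the case temperatures quoted in the interview.

CASE_LIMIT_C = 90.0

case_temps_c = {
    "copper, heat pipe": 74.0,
    "copper, no heat pipe": 76.0,
    "aluminum, heat pipe": 77.0,
    "aluminum, no heat pipe": 81.0,
}

margins = {name: CASE_LIMIT_C - t for name, t in case_temps_c.items()}
for name, margin in margins.items():
    verdict = "meets limit" if margin >= 0 else "exceeds limit"
    print(f"{name:24s} case {case_temps_c[name]:.0f} degC, "
          f"margin {margin:.0f} degC ({verdict})")
print(f"smallest margin: {min(margins.values()):.0f} degC")
```

Even the weakest variant, aluminum without a heat pipe, keeps 9°C of margin, which is what leaves room to consider a downgraded heat sink.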

Basically, there was enough margin that you could go to smaller fans, since there’s some concern about operating in the knee region, or you could downgrade the heat sink if the customer wanted. We presented this and they were very happy with the results. They weren’t too worried about operating in the knee region, because other factors would likely shift the curve a little anyway, and they didn’t want to downgrade the heat sink given the power being dissipated.

Final case temperatures determined by CFD simulations and backed up by analytical modeling. (Advanced Thermal Solutions, Inc.)

JO: What were some of the challenges in this design work that surprised you?
VB: The biggest challenge was translating their board into a model that’s workable for CFD. It’s tricky to simplify it without losing the important details, so we had to decide which details matter and need to be simulated. Take the single-board computer and the power supply. The single-board computer is this relatively complex-looking piece here with the heat sink, and we simplified it into one dummy heat sink, just to see whether it would get too hot. Its cooling comes with it, so we didn’t have to work on it.

The power supply was even harder. I didn’t model it in detail because I didn’t know what its power would be or how hot it would get. Instead I put in a dummy component that doesn’t block the airflow too much but still has some effect, so you can see the pressure drop from it; thermally, it doesn’t affect anything.

JO: It really shows that we know how to cool Stratix FPGAs from Altera: we have clear solutions, both custom and off-the-shelf, and we understand how to model them in two different ways, with CFD and with analytical modeling. We have pretty much a full complement of capabilities for dealing with this technology.

JP: Are there times when we want to create a TLB (thermal load board) or prototype and test this in a wind tunnel or in our lab?
VB: For the most part, customers do that part themselves. They have the capability and the rack, and if the fans are built into the rack, they can just test it. For an individual heat sink, it’s not necessary, because CFD and analytical modeling are so well established. You want two independent solutions to make sure you’re in the right ballpark, but you’re not too concerned that the result will be far off from the theoretical. For another client, for example, we had to make load boards, but even then they did all the testing.

To learn more about Advanced Thermal Solutions, Inc. consulting services, visit https://www.qats.com/consulting or contact ATS at 781.769.2800 or ats-hq@qats.com.
