Technology: Cascade Lake-AP, Rome, and why all is not well

In a recent tweet storm, I decided to run some numbers against Cascade Lake-AP to determine the sweet spot for density, cores, and power.  After a quick “back of the napkin” run at the figures, some clandestine confirmations of power figures, and a reminder that my tweets should probably be a blog post…well, here we are.  And it’s time to figure out how the sausage is made, folks.  Let’s begin with some baselines.

Cascade Lake-AP: A TDP Monster

Somewhere along the line, Intel decided they needed to scale up core counts per socket.  Not content with the XCC-based Xeon SP 8180 Platinum (28c/56t), Intel decided to take a page from AMD’s book and build a multichip module (MCM) using two of these XCC dies, a bit of UPI interconnect trickery, a new socket (>LGA3647), and presto!  48 cores, 96 threads, all in a socket that comes in at a hefty 350w TDP.  Wait, what?!  Yes, you read that correctly.  Three hundred and fifty watts.

Now, for perspective, let’s quickly determine HOW they got to this number (and why the alternative would’ve been much, much worse).  A single XCC-based Xeon SP 8180 Platinum is rated at 205w.  That’s 28 cores at 205w.  So, simply conjoining two of those dies would have come to about 410w (linear scaling) of power consumption and requisite thermal dissipation.  That’s a LOT to deal with in a single socket, especially since your suddenly nimble competitor, AMD, can offer 64 cores in a single socket for considerably less.  In an ideal world, then, you need some way to balance the power requirements against the serviceability of the platform.  In this case, by simply dropping frequency, Intel would be able to measurably lower overall consumption without harming the IPC of their XCC cores.  In the case of Cascade Lake-AP, it is entirely within reason that core clocks could be around 1.8-2.0GHz versus the standard 2.5GHz of the Xeon SP 8180 Platinum.
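If you want to check that napkin math yourself, here is a minimal sketch in Python.  It only uses the TDP figures quoted above; the variable names are mine, and the "naive" number simply assumes power scales linearly with the number of dies.

```python
# TDP figures as quoted in this post (watts).
XEON_8180_TDP_W = 205          # one 28-core XCC part
CASCADE_LAKE_AP_TDP_W = 350    # the two-die MCM package
DIES_PER_PACKAGE = 2

# Assumption: gluing two dies together scales power linearly.
naive_tdp_w = DIES_PER_PACKAGE * XEON_8180_TDP_W

# How much had to be trimmed to land on the 350w package rating.
saved_w = naive_tdp_w - CASCADE_LAKE_AP_TDP_W
saved_pct = 100 * saved_w / naive_tdp_w

print(f"Naive two-die TDP:   {naive_tdp_w} W")
print(f"Cascade Lake-AP TDP: {CASCADE_LAKE_AP_TDP_W} W")
print(f"Trimmed:             {saved_w} W ({saved_pct:.1f}%)")
```

This says nothing about HOW the power was trimmed (clocks, voltage, disabled cores); it only shows the size of the gap between the linear estimate and the rated figure.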

When you start to do the baseline math, then, Cascade Lake-AP starts to look like this:

  • 7.29w per core
  • 3.64w per thread (assuming two threads per core)

On the surface, this doesn’t seem too bad.  Again, 350w per socket (almost assuredly meaning only single socket boards due to power/thermal constraints).  But, how about AMD’s Rome?

  • 3.51w per core
  • 1.75w per thread

Notice anything special there?  First off, Rome is operating at an assumed 225w TDP per socket.  This is only a slight uptick from 205w and doesn’t measurably impact power/cooling like Cascade Lake-AP does (and will).  Secondly, Rome is 64 cores and 128 threads per socket.  This is a 33% increase in cores over Cascade Lake-AP (64 vs. 48) and a 100% increase over AMD’s previous generation, Naples (the first-generation Epyc).  Of course, these figures are pre-release so, until Rome is released, they’re to be taken with a grain of salt.  But, it’s already telling a story that you should pay attention to.
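For anyone who wants to reproduce the per-core and per-thread figures, here is a rough sketch.  The Socket class and its field names are purely illustrative, the Rome TDP is the assumed pre-release figure from this post, and I am assuming two threads per core on both parts.  The script rounds to two decimals, so the last digit can differ by one from the truncated figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """Illustrative container for one CPU socket's headline numbers."""
    name: str
    tdp_w: float
    cores: int
    threads_per_core: int = 2  # assumption: SMT/Hyper-Threading enabled

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def watts_per_core(self) -> float:
        return self.tdp_w / self.cores

    @property
    def watts_per_thread(self) -> float:
        return self.tdp_w / self.threads


# Figures as quoted in this post; Rome's TDP is an assumed pre-release value.
cascade_lake_ap = Socket("Cascade Lake-AP", tdp_w=350, cores=48)
rome = Socket("Rome", tdp_w=225, cores=64)

for s in (cascade_lake_ap, rome):
    print(f"{s.name}: {s.watts_per_core:.2f} W/core, "
          f"{s.watts_per_thread:.2f} W/thread ({s.cores}c/{s.threads}t)")
```

Swap in the real TDP once Rome ships and the comparison updates itself.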

Why does this matter?

Not everything inside of the data center is about Instructions per Clock (IPC).  Arguably, the EFFICIENCY of a computational object measured in power consumption has a very real and tangible return on investment.  With that in mind, let’s set up a hypothetical situation using the numbers above.  

A Dell C6420 compute node with 2 x Intel Xeon SP processors

Our sample solution is built on the Dell C6420 (and a hypothetical AMD alternative, the C6425) using 1,600w, 2,000w, and 2,400w redundant power supplies.  The C6420 can typically support two Xeon SP Platinum processors at a limit of 205w TDP each.  Consequently, the Cascade Lake-AP’s power envelope at 350w exceeds what a single sled can support from a dual socket standpoint.  So, we will be limiting the C6420 to one Cascade Lake-AP socket apiece.  On the AMD side, since the footprint of Rome is identical to Naples/Epyc, we will use the standard 2 socket configuration.

From a power supply rating standpoint, the Cascade Lake-AP will be given the advantage of using the 2,000w power supply and the Rome unit, 2,400w.  This isn’t a noticeable disadvantage, but it does inhibit the overall scale per rack somewhat based on the exercise I’m going to put in front of you.

Finally, we are assuming the server power values are “all in.”  This includes disk, memory, etc.   Let’s take a look at what the numbers show, then.

Rome vs Cascade Lake-AP Power to core ratio
The Power to Core Ratio

Before we begin the numbers discussion, let me AGAIN clarify something.  These products do NOT exist in such a state.  The Dell C6420 is used as a baseline (and the C6425 as a hypothetical AMD build, nothing more.  It doesn’t exist).  OK?  

From the start, Intel is at a considerable disadvantage on cores per watt.  While Intel can scale to more SERVERS in a given rack footprint, they’re constrained by the actual POWER they consume on a per-sled basis.  Not only is AMD providing almost 3x as many cores per server (128 vs. 48 actual cores, not threads), it’s doing it with roughly 100% more efficiency than Intel within the SAME footprint.  Now, this isn’t an IPC conversation and one could reasonably argue that the devil is in the details there.  Potentially, those Cascade Lake-AP cores could be massive IPC monsters.  However, if AMD manages a 1:1 IPC parity with Intel’s Skylake (i.e. Purley) Xeons, they’re still in a performance per watt lead versus Intel.  And the math?  Well, it’s damning.
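The per-sled comparison can be sketched the same way.  This is a rough illustration only: it counts CPU TDP and ignores the "all in" server power (disk, memory, etc.), the C6425 is my hypothetical AMD build, and the function and dictionary names are made up for this example.

```python
# (sockets per sled, cores per socket, TDP per socket in watts)
# Figures as used in this post; the C6425 and Rome's TDP are hypothetical/assumed.
SLEDS = {
    "C6420 + Cascade Lake-AP": (1, 48, 350),
    "C6425 + Rome": (2, 64, 225),
}


def sled_summary(sockets, cores_per_socket, tdp_w):
    """Return (total cores, total CPU TDP in watts, cores per watt) for one sled."""
    cores = sockets * cores_per_socket
    cpu_power_w = sockets * tdp_w
    return cores, cpu_power_w, cores / cpu_power_w


summaries = {name: sled_summary(*cfg) for name, cfg in SLEDS.items()}

for name, (cores, power_w, cores_per_watt) in summaries.items():
    print(f"{name}: {cores} cores, {power_w} W CPU TDP, "
          f"{cores_per_watt:.3f} cores/W")

intel_cores, _, intel_cpw = summaries["C6420 + Cascade Lake-AP"]
amd_cores, _, amd_cpw = summaries["C6425 + Rome"]

core_ratio = amd_cores / intel_cores      # cores per sled, AMD vs Intel
efficiency_ratio = amd_cpw / intel_cpw    # cores per watt, AMD vs Intel

print(f"Cores per sled: {core_ratio:.2f}x in AMD's favor")
print(f"Cores per watt: {efficiency_ratio:.2f}x in AMD's favor")
```

On these inputs the core ratio lands a bit under 3x and the cores-per-watt ratio a bit over 2x, which is where the "almost 3x" and "100% more efficiency" figures come from.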

What are your thoughts on this conversation?  Does power matter as much as I’ve made it out to be here?  What are your experiences in your data centers?  Let me know.  Respectful comments welcome!
