By: someone (someone.delete@this.somewhere.com), November 24, 2010 8:59 am
Room: Moderated Discussions
AM (myname4rwt@jee-male.com) on 11/24/10 wrote:
---------------------------
>
>Much higher mtp vs Tuk is a must; it would be beyond ridiculous otherwise. Tell
>us one thing though: are you saying it will have much higher stp as well? How much higher then?
That is hard to say. Simply going from 2 bundle issue
to 4 bundle issue width would likely give a single thread
IPC increase of 5-10% for SPECint type apps and maybe
20 to 40% for SPECfp apps with recompilation, presuming
execution resources also double.
However changes to the pipeline, cache latencies, and
issue rules to improve frequency headroom, multithreading,
and power efficiency will likely cost Poulson some IPC
compared to the I2 core. Perhaps on the order of 10% on
the SPECint side and 5% on the SPECfp side.
Clock frequency is a tough one. With Tukwila the cores
get only about half of the 185W power budget. Going to
a ring bus, reduction in cache leakage from HK/MG, and
far greater process headroom to implement the QPI/SMI
interface circuits will likely lead to a larger portion of the
185W budget going to the cores, but the number of cores
doubles to 8. So how does going from a ~23W 65 nm 2 banger
core to a ~15W 32 nm 4 banger core affect clock frequency?
Depends strongly on the changes to the pipeline. My WAG
is the pipeline won't grow too much and Poulson top bin
base frequency will probably be on the order of 2.5 GHz.
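For anyone who wants to check the per-core power numbers, here is a quick back-of-envelope sketch in Python. It uses only the figures from this post (185W socket, about half of it to Tukwila's 4 cores, a guessed ~15W for each of Poulson's 8 cores); the variable names are just mine for illustration.

```python
# Back-of-envelope per-core power, using only figures quoted in the post.
SOCKET_TDP_W = 185.0

# Tukwila: 4 cores share roughly half of the socket budget.
tukwila_cores = 4
tukwila_core_share = 0.5
tukwila_w_per_core = SOCKET_TDP_W * tukwila_core_share / tukwila_cores

# Poulson: 8 cores at the ~15 W per core guessed in the post.
poulson_cores = 8
poulson_w_per_core = 15.0
poulson_core_share = poulson_cores * poulson_w_per_core / SOCKET_TDP_W

print(f"Tukwila: ~{tukwila_w_per_core:.1f} W per core")
print(f"Poulson: cores take ~{poulson_core_share:.0%} of the socket budget")
```

So ~15W per core across 8 cores would mean the cores take roughly two thirds of the socket budget instead of half, which is consistent with the "larger portion" argument above.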
Put it all together and my *uncertainty cubed* WAG for
the Poulson single thread performance increase over Tukwila
with recompilation is 35% for SPECint-like apps and 60
to 90% for SPECfp-like apps. Not recompiling probably
costs 5-10% in performance.
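The way these guesses compound can be sketched as a simple product of ratios. Note that the Tukwila base clock of 1.73 GHz below is an assumption on my part (it is not stated anywhere in this post), and treating the IPC gain, the IPC loss, and the clock ratio as independent multiplicative factors is a simplification, not a model of the actual cores.

```python
# Multiply the post's factors together to sanity check the overall guess.
TUKWILA_GHZ = 1.73  # ASSUMPTION: Tukwila top-bin base clock, not given in the post
POULSON_GHZ = 2.5   # the post's guess for Poulson top-bin base frequency


def speedup(width_gain, pipeline_cost):
    """Single-thread ratio = issue-width IPC gain * pipeline IPC loss * clock ratio."""
    clock_ratio = POULSON_GHZ / TUKWILA_GHZ
    return (1 + width_gain) * (1 - pipeline_cost) * clock_ratio


# (low, high) ends of the ranges quoted in the post.
estimates = {
    "SPECint": (speedup(0.05, 0.10), speedup(0.10, 0.10)),
    "SPECfp": (speedup(0.20, 0.05), speedup(0.40, 0.05)),
}

for name, (low, high) in estimates.items():
    print(f"{name}: +{low - 1:.0%} to +{high - 1:.0%}")
```

With that assumed Tukwila clock the product lands in the same neighborhood as the 35% and 60 to 90% figures above. Change TUKWILA_GHZ or POULSON_GHZ and the whole estimate scales linearly, which is why the frequency guess matters so much.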
Keep in mind that with promiscuous use of auto-parallelization
across many cores within a socket, or even across many
sockets, SPECint2006 and SPECfp2006 scores these days
are increasingly poor measurements of single thread CPU
performance. Hopefully SPEC CPU 201X will address this
issue in its run rules.