By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), November 17, 2010 8:27 am
Room: Moderated Discussions
someone (someone@somewhere.com) on 11/17/10 wrote:
>
>I think Atom may be a preview to the type of MT used
>in Poulson. Atom is a two-issue wide in-order design
>that employs SMT. It will try to issue two instructions
>from the primary thread but if it can't it will try to fill in
>the second slot from the secondary thread.
That would clearly work much better than the braindamaged
SoEMT in current Itanium implementations that is almost
entirely useless and wouldn't fill in any extra units at
all. But if the L1 latency has increased, I don't think
it's really enough.
Because with just two threads, each of them usually
only able to schedule two bundles, and a cycle or two
of L1 latency added on top, the four-banger looks like
it's idle a big portion of the time.
Now, if they have four threads, that might be more
reasonable. That way there really are enough "extra"
bundles to schedule that you can not only fill the units,
but also fill the dead slots due to waiting for L1. Then
you could still stay in-order.
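A rough way to see the slot arithmetic is a toy issue model. This is a hypothetical sketch, not a description of how Poulson or Atom actually schedule: it assumes the "four-banger" has four bundle slots per cycle, that each thread can offer at most two bundles per cycle, that a thread then waits a fixed number of `stall` cycles on an L1 load, and that free slots are filled from any other ready thread. The function `slot_utilization` and its parameters are made up for illustration.

```python
def slot_utilization(threads: int, width: int = 4, per_thread: int = 2,
                     stall: int = 1, cycles: int = 1000) -> float:
    """Toy model: fraction of `width` bundle slots filled per cycle.

    Hypothetical assumptions (not a model of any real core):
    - each thread can issue at most `per_thread` bundles in a cycle;
    - after issuing, a thread waits `stall` extra cycles on an L1 load;
    - free slots are filled from other ready threads, rotating priority.
    """
    ready_at = [0] * threads  # first cycle each thread may issue again
    filled = 0
    for cycle in range(cycles):
        free = width
        for i in range(threads):
            t = (cycle + i) % threads  # rotate priority to avoid starvation
            if free == 0:
                break
            if ready_at[t] <= cycle:
                n = min(per_thread, free)
                free -= n
                filled += n
                # the thread's next bundle depends on a load: wait out L1
                ready_at[t] = cycle + 1 + stall
    return filled / (width * cycles)


if __name__ == "__main__":
    for threads in (2, 4):
        for stall in (0, 1, 2):
            u = slot_utilization(threads, stall=stall)
            print(f"{threads} threads, {stall}-cycle L1 stall: {u:.0%} of slots used")
```

Under these assumptions, two threads with a one-cycle L1 stall fill only half the slots, while four threads fill all of them; with a two-cycle stall even four threads leave about a third of the slots empty.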
Of course, I still think that would be a pretty
boring design.
I'd much rather hear something more radical than "more
threads, and done better", with OoO at the top of my list.
But I think "four in-order threads and increased frequency
by making the L1 latencies more realistic" at least sounds
like a halfway sane design.
And sanity has long been lacking in the Itanium world.
Linus