By: Dummond D. Slow (mental.delete@this.protozoa.us), November 19, 2020 4:10 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on November 19, 2020 9:13 am wrote:
> Adrian (a.delete@this.acm.org) on November 19, 2020 1:50 am wrote:
> > Maynard Handley (name99.delete@this.name99.org) on November 18, 2020 4:46 pm wrote:
> > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 3:17 pm wrote:
> > > > Jon Masters (jcm.delete@this.jonmasters.org) on November 18, 2020 11:46 am wrote:
> > > > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 17, 2020 11:18 am wrote:
> > > > >
> > > > > > It's simple: SMT. Had Apple implemented it, it would run away in Cinebench.
> > > > > > Lesson to the people saying it's a pointless/stupid/doomed feature.
> > > > > > Seems Renoir is able to bridge its "worse single-thread" and "worse
> > > > > > manufacturing node" disadvantages pretty much thanks to SMT.
> > > > > >
> > > > > > Which also tells you where the biggest threat from Apple is. It pretty much caught
> > > > > > up with state of the art x86's single core performance AND has process advantage.
> > > > > > It could shoot ahead in performance in two areas if it chose to:
> > > > > > 1) SMT as discussed. Not having SMT leaves massive multithread performance
> > > > > > gains (and energy efficiency gains, more importantly) on the table.
> > > > >
> > > > > SMT is a *terrible* idea for new designs. Sure, it gets you performance uplift, but it comes
> > > > > at a cost, particularly in terms of side-channel exposure. That is a game of whack-a-mole that
> > > > > doesn't end any time in the near future. Real cores that can be fed properly give you more
> > > > > deterministic performance without all of the other downsides from sharing resources.
> > > > >
> > > > > Jon.
> > > > >
> > > >
> > > > Everything has side channels; I stopped counting and stopped reporting on the new attacks being described. So I
> > > > think it makes no sense to ditch SMT on the grounds of side-channel vulns, because there are 10-20x more
> > > > in other parts of the CPU, discovered each year. You just sacrifice something good and then you find
> > > > it didn't help with anything. The solution is to handle it in CPU scheduling (don't give the two threads
> > > > of one core to different VMs on a server, and so on, plus whatever more is needed for security).
> > > > You don't want shared L3 cache or multicore CPUs being abandoned either, do you?
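
To make that scheduling point concrete: this is roughly what Linux core scheduling does today; tasks tagged with different cookies are never co-scheduled on the SMT siblings of one physical core. A minimal sketch, assuming a 5.14+ kernel built with CONFIG_SCHED_CORE; the constants are my reading of linux/prctl.h, so check them against your own headers:

#include <stdio.h>
#include <sys/prctl.h>

/* Fallback definitions for headers that predate core scheduling;
   values as in linux/prctl.h on 5.14+ kernels. */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE        62
#define PR_SCHED_CORE_CREATE  1
#endif
#define PIDTYPE_TGID          1  /* kernel enum pid_type: whole thread group */

int main(void)
{
    /* Give this thread group its own core-scheduling cookie.  Tasks with
       different cookies never run at the same time on the two hardware
       threads of one physical core, so two mutually distrusting VMs or
       processes never share a core's execution units. */
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0, PIDTYPE_TGID, 0) != 0) {
        perror("PR_SCHED_CORE_CREATE");  /* e.g. kernel lacks CONFIG_SCHED_CORE */
        return 1;
    }
    puts("core-scheduling cookie installed for this thread group");
    return 0;
}

A VM manager or container runtime doing this once per guest is essentially the "don't give the two threads of one core to different VMs" policy in practice.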
> > >
> > > So now you're starting to concede my point.
> > > If you're going to do SMT right, you don't share threads between unrelated entities.
> > > That means
> > > (a) you don't have to waste any effort worrying about security
> > > (b) you don't have to worry at all about fairness
> > >
> > > But then if the entities are related, why do you need the OS (except to do the context swapping correctly)?
> > > Provide user-level primitives to launch, kill, and synchronize with the other thread. At which point
> > > - the two threads are closely related, so resource contention is PROBABLY not an issue and
> > > - the app is in control, so to the extent that contention is an
> > > issue, the app can choose not to launch a companion thread
> > >
> > > There is a right way and a wrong way of doing things. The wrong way, in this case, is treating
> > > SMT as a fake core, rather than something like a "co-routine acceleration engine".
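
No shipping x86 or ARM core exposes a user-level "launch/kill/synchronize the sibling thread" primitive, so the closest sketch I can offer fakes the control structure with ordinary pthreads pinned to the two hardware threads of one core. It still goes through the OS for creation, unlike the primitive being argued for, and the assumption that logical CPUs 0 and 1 are SMT siblings is mine (check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine). What it shows is the shape of the idea: the app decides whether a companion thread is worth launching and synchronizes with it directly.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* ASSUMPTION: logical CPUs 0 and 1 are the SMT siblings of one physical core. */
enum { MAIN_CPU = 0, SIBLING_CPU = 1 };

static void pin_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* The companion work the app may or may not choose to launch. */
static void *companion(void *arg)
{
    pin_to(SIBLING_CPU);
    long *sum = arg;
    for (long i = 0; i < 100000000L; i++)  /* stand-in for the co-routine's work */
        *sum += i & 7;
    return NULL;
}

int main(void)
{
    pin_to(MAIN_CPU);
    long companion_sum = 0;
    pthread_t tid;
    /* "The app is in control": launch the companion only if it wants to. */
    int launched = (pthread_create(&tid, NULL, companion, &companion_sum) == 0);

    long main_sum = 0;
    for (long i = 0; i < 100000000L; i++)  /* the main thread's own work */
        main_sum += i & 3;

    if (launched)
        pthread_join(tid, NULL);           /* synchronize with the other thread */
    printf("main=%ld companion=%ld\n", main_sum, companion_sum);
    return 0;
}

Build with -pthread; with a real user-level primitive the create/join pair would be a couple of instructions rather than syscalls.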
> >
> >
> >
> > Yes, I completely agree that the method to extract maximum performance from SMT is not to
> > treat "SMT as a fake core", but as 'something like a "co-routine acceleration engine"'.
> >
> >
> > Nevertheless, there are some kinds of applications, not many, but quite
> > important, where using the SMT threads as fake cores works very well.
> >
> > These applications have threads that do many I/O or large memory access operations
> > and the threads are well decorrelated in their access patterns.
> >
> > The typical examples are the compilation of large projects using many concurrent
> > threads, and database servers that serve many concurrent requests.
> >
> >
> > These applications, where SMT fake cores are OK, rely on randomness
> > to increase the average occupancy of the execution units.
> >
> > Optimized applications, which use SMT as a "co-routine acceleration engine", can achieve
> > much better occupancies, while other applications, whose threads are correlated in their
> > resource needs, achieve worse performance with SMT, due to contention.
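
A toy back-of-the-envelope version of that occupancy argument (my numbers, purely illustrative): if one compile or database thread keeps the execution units busy only p = 40% of the time and a second thread's busy cycles are independent of the first's, the units have work in 1 - (1 - 0.4)^2 = 64% of cycles, roughly a 1.6x occupancy win from the "fake core". If the two threads are perfectly correlated, both wanting the same units in the same cycles, the second thread adds contention rather than occupancy, which is the last case described above.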
>
> At this point we hit the constant running through every aspect of this discussion:
> - do you make the one-time effort to do a job properly, or do you just half-ass it this
> year, then half-ass it again next year, then half-ass it again the following year?
>
> Yes, you can half-ass SMT. Just like you can half-ass your CPU design for ten years.
> It appears that 90% of the people on this forum prefer to half-ass everything. Thank god there
> are a few companies like ARM and Apple that don't think this way. I just want to ensure that same
> level of not "half-assing, just because that's what everyone else did" applies to Apple's implementation
> of SMT (and, in an ideal world, to ARM's canonical version, when that comes).
>
And that "half-assed" narrative is based on ...?
>
> To add to the list of ARM SMT implementations (using the half-assed worldview)
> for people really interested in this, look at the MIPS MT ASE:
> https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00452-2B-MIPSMT-WHP-01.02.pdf
>
> MIPS got a lot of this wrong (in particular they too were in love with that eternal darling of the young computer
> engineer, the promise of virtualization), but they at least came closer to the ideal than anyone else,
> with the *possibility* that what they were providing was not
> in fact vCPUs, and with some adequate user-level instructions.