By: Brendan (btrotter.delete@this.gmail.com), April 11, 2021 4:18 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on April 11, 2021 3:34 am wrote:
> Brendan (btrotter.delete@this.gmail.com) on April 10, 2021 11:27 pm wrote:
> >
> > As the number of CPUs involved increases (and as transaction get larger),
> > the risk of contention/aborts increases (and the risk of wasting time/power
> > doing work that must be thrown away increases) so the benefits decrease.
> >
>
> No, it does not.
> When # of cores is small, "one global lock" type of design works very well, so fancier solutions
> for scalability are not needed. Even when 100% of AcquireLock operations cause cache line
> bouncing, on CPU with Skylake-client cache hierarchy is not really that costly.
>
> And if you somehow ended up in situation of high lock contention with just 4-6 cores, it means
> that your whole design is non-scalable and replacement of lock by TM is not going to help you.
Erm. It looks like you're trying to say I'm wrong while providing examples that prove that I'm right.
When the # of cores/threads is small, "one global lock" can work well (but it sucks badly when you increase the number of cores/threads fighting over that one global lock). When the # of cores/threads is medium (4 to 8, I'd guess), "100 locks" can work well (but "100 locks" will probably start to suck when you increase the number of cores/threads to 100 or more). Do you see any kind of pattern here?
"As the number of CPUs involved increases (but the number of locks that those CPUs are fighting for stays the same) the chance of contention increases" is like saying "water is wet".
Note: I also have no idea why you think cache line bouncing is relevant while CPUs are wasting hundreds of cycles spinning on a spinlock, thousands of cycles on task switches while waiting for a mutex, or who knows how much time computing the discarded result of an aborted transaction. Suppose I load one cache line, spend 200+ cycles calculating results (in registers), then store one cache line; if the transaction gets aborted, am I supposed to ignore the 200+ cycles of wasted work that had nothing to do with memory accesses at all?
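A back-of-the-envelope sketch of that waste (my own numbers, under the assumption that each attempt costs a fixed W cycles and aborts are independent with probability p): the expected number of attempts per commit is 1/(1-p), so the expected wasted cycles per committed transaction come to W * p/(1-p).

```python
# Back-of-the-envelope (assumptions: fixed-cost attempts, independent aborts):
# with abort probability p, expected attempts per commit = 1 / (1 - p),
# so the wasted work per commit = work_cycles * p / (1 - p).

def wasted_cycles(work_cycles: int, abort_prob: float) -> float:
    """Expected cycles thrown away per committed transaction."""
    return work_cycles * abort_prob / (1.0 - abort_prob)

for p in (0.1, 0.5, 0.9):
    print(f"abort probability {p:.1f}: "
          f"{wasted_cycles(200, p):.0f} wasted cycles per commit")
```

At a 50% abort rate the 200-cycle transaction above wastes another 200 cycles on average per commit, and at 90% it wastes 1800 cycles, none of which shows up as cache line traffic.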
- Brendan