Article: Knights Landing CPU Speculation
By: Patrick Chase (patrickjchase.delete@this.gmail.com), November 25, 2013 11:52 am
Room: Moderated Discussions
Bhima (Bhima.Pandava.delete@this.gmail.com) on November 25, 2013 8:01 am wrote:
> Patrick Chase (patrickjchase.delete@this.hp.com) on November 23, 2013 3:37 pm wrote:
> > The real benefit is that with TM you can often get close to that performance with
> > a "naive", coarse-grained locking implementation that takes an order of magnitude
> > less effort to develop. Real world HPC folks care deeply about that sort of thing...
>
> I'm really glad you brought this up because when I read this I was a bit confused with
> that statement as it was my understanding that the motivations to include transactional
> memory in the other Xeons were in part to simplify memory management for the many
> parallel processes that having all those cores & threads (in the regular Xeons) enables.
I think that's about right, though I would change "simplify memory management" to "simplify synchronization". The difference in complexity between:
1. A single big lock that protects everything
and:
2. Fine-grained locking such that no two processes take the same lock unless they're actually going to perform conflicting accesses
is *huge*. If you can use TM-based speculation to get adequate performance from (1), then you're way ahead of the game, both in terms of development effort and likelihood of correctness.
For example, a single lock is inherently deadlock-free, while fine-grained locking tends to be a disaster on that front. If you always know what set of locks you'll need ahead of time, then there's a simple fix (impose a fixed order on all locks in the system and always take the set of locks you need in that order - you can't have a cycle in an ordered graph if you never go backwards), but that's often not the case.
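To make the lock-ordering fix concrete, here's a minimal sketch in Python. Everything in it (the Account class, the ident field used as the global lock order, transfer() and run_demo()) is made up for illustration and isn't taken from any real codebase. Two threads transfer in opposite directions, which is the classic way to deadlock if each thread simply grabs "its" lock first:

```python
import threading


class Account:
    """Hypothetical example object with its own fine-grained lock."""

    def __init__(self, ident, balance):
        self.ident = ident      # unique id; assumed to double as the lock's rank in the global order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move amount from src to dst, always locking in ascending ident order."""
    if src is dst:
        return  # same account: nothing to do, and re-taking a plain Lock would hang
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo(rounds=1000):
    """Run opposing transfers on two threads and return the final balances."""
    a, b = Account(1, 1000), Account(2, 1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

Note that this only works because transfer() knows both locks before it takes either one. If the second lock depended on data you could only read while holding the first, you couldn't sort up front, which is exactly the case where the simple fix stops applying.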
The catch is, of course, that TM has limited "capacity" (a transaction can only read/write so much data before it's forced to abort), so you probably couldn't use it to mitigate the impact of really "huge" locks such as the dearly departed Linux BKL or the Python GIL.
> Hopefully, this topic will be included in the upcoming article and more fully discussed.
> I find the concept of transactional memory intriguing but clearly I must be
> misunderstanding something important.
I don't see anything in your comments that indicates lack of understanding. I think that a lot of OTHER people made the mistake of evaluating it as though it were a performance enhancement rather than a simplification.