By: dmcq (dmcq.delete@this.fano.co.uk), September 1, 2018 4:28 am
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on August 31, 2018 1:50 am wrote:
> Adrian (a.delete@this.acm.org) on August 30, 2018 11:37 am wrote:
> > Konrad Schwarz (no.spam.delete@this.no.spam) on August 30, 2018 6:37 am wrote:
> > > Maynard Handley (name99.delete@this.name99.org) on August 18, 2018 4:22 pm wrote:
> > > > - you offer an atomic remote zero (or remote fill) instruction that is given a line address
> > > > plus count. The first thing it does is flush the line from all other caches, then it marks
> > > > the set of lines locally as in some sort of intermediate state, then, fast as possible, it
> > > > starts zeroing each line and, once zeroed, it goes back to a standard "valid" state.
> > >
> > > Power(PC) Data Cache Block Allocate (dcba) and Data Cache Block
> > > Zero (dcbz) instructions do this, albeit on a single cache line.
> >
> >
> > Now also AMD Zen has introduced CLZERO (i.e. the equivalent of dcbz).
> >
>
> For Apple, dcbz turned into a liability when they moved to 64 bits. It wasn't the 64 bit ISA that
> was at fault, but the cache line size of the 64 bit capable CPU model(s) differed. All prior 32
> bit PPC processors had used 32 byte sized cache blocks, but the new one used 128 bytes.
>
> The dcbz instruction was architected to clear a cache block at whatever native size the machine
> had, but programmers everywhere simply assumed the instruction to clear 32 bytes.
>
> Thus the new 64 bit machine re-defined the instruction to mean "clear 32 bytes",
> and it was rather slow and inefficient. Additionally, a new instruction dcbzl
> was introduced to _really_ mean "clear a cache line of native size".
>
> I guess a future extension would have had to redefine the meaning of dcbzl to "clear 128 bytes".
>
>
> Lesson learned: to safely deploy a user-land instruction which manipulates a cache block, you have
> to pair it with a user-land instruction to obtain the current actual block size. And you have to
> have varying hardware models concurrently out in the field widely available for testing.
>
> Otherwise programmers will use the instruction not for what
> you intended it to do, but for whatever they want.
>
> Chances are that CLZERO will trigger another wave of painful learning.
Or have an instruction where you give the start and limit and it increments the start address if it doesn't do the whole range. Or something like one of the x86 string instructions.
> Adrian (a.delete@this.acm.org) on August 30, 2018 11:37 am wrote:
> > Konrad Schwarz (no.spam.delete@this.no.spam) on August 30, 2018 6:37 am wrote:
> > > Maynard Handley (name99.delete@this.name99.org) on August 18, 2018 4:22 pm wrote:
> > > > - you offer an atomic remote zero (or remote fill) instruction that is given a line address
> > > > plus count. The first thing it does is flush the line from all other caches, then it marks
> > > > the set of lines locally as in some sort of intermediate state, then, fast as possible, it
> > > > starts zeroing each line and, once zeroed, it goes back to a standard "valid" state.
> > >
> > > Power(PC) Data Cache Block Allocate (dcba) and Data Cache Block
> > > Zero (dcbz) instructions do this, albeit on a single cache line.
> >
> >
> > Now also AMD Zen has introduced CLZERO (i.e. the equivalent of dcbz).
> >
>
> For Apple, dcbz turned into a liability when they moved to 64 bits. It wasn't the 64 bit ISA that
> was at fault, but the cache line size of the 64 bit capable CPU model(s) differed. All prior 32
> bit PPC processors had used 32 byte sized cache blocks, but the new one used 128 bytes.
>
> The dcbz instruction was architected to clear a cache block at whatever native size the machine
> had, but programmers everywhere simply assumed the instruction to clear 32 bytes.
>
> Thus the new 64 bit machine re-defined the instruction to mean "clear 32 bytes",
> and it was rather slow and inefficient. Additionally, a new instruction dcbzl
> was introduced to _really_ mean "clear a cache line of native size".
>
> I guess a future extension would have had to redefine the meaning of dcbzl to "clear 128 bytes".
>
>
> Lesson learned: to safely deploy a user-land instruction which manipulates a cache block, you have
> to pair it with a user-land instruction to obtain the current actual block size. And you have to
> have varying hardware models concurrently out in the field widely available for testing.
>
> Otherwise programmers will use the instruction not for what
> you intended it to do, but for whatever they want.
>
> Chances are that CLZERO will trigger another wave of painful learning.
Or have an instruction where you give the start and limit and it increments the start address if it doesn't do the whole range. Or something like one of the x86 string instructions.
Topic | Posted By | Date |
---|---|---|
ARM turns to a god and a hero | AM | 2018/08/16 08:32 AM |
ARM turns to a god and a hero | Maynard Handley | 2018/08/16 08:41 AM |
ARM turns to a god and a hero | Doug S | 2018/08/16 10:11 AM |
ARM turns to a god and a hero | Geoff Langdale | 2018/08/16 10:59 PM |
ARM turns to a god and a hero | dmcq | 2018/08/17 04:12 AM |
ARM is somewhat misleading | Adrian | 2018/08/16 10:56 PM |
It's marketing material | Gabriele Svelto | 2018/08/17 12:00 AM |
It's marketing material | Michael S | 2018/08/17 02:13 AM |
It's marketing material | dmcq | 2018/08/17 04:23 AM |
It's marketing material | Andrei Frumusanu | 2018/08/17 06:25 AM |
It's marketing material | Linus Torvalds | 2018/08/17 10:20 AM |
It's marketing material | Groo | 2018/08/17 12:44 PM |
It's marketing material | Doug S | 2018/08/17 01:14 PM |
promises and deliveries | AM | 2018/08/17 01:32 PM |
promises and deliveries | Passing Through | 2018/08/17 02:02 PM |
Just by way of clarification | Passing Through | 2018/08/17 02:15 PM |
Just by way of clarification | AM | 2018/08/18 11:49 AM |
Just by way of clarification | Passing Through | 2018/08/18 12:34 PM |
This ain't the nineties any longer | Passing Through | 2018/08/18 12:54 PM |
This ain't the nineties any longer | Maynard Handley | 2018/08/18 01:50 PM |
This ain't the nineties any longer | Passing Through | 2018/08/18 02:57 PM |
This ain't the nineties any longer | Passing Through | 2018/09/06 01:42 PM |
This ain't the nineties any longer | Maynard Handley | 2018/09/07 03:10 PM |
This ain't the nineties any longer | Passing Through | 2018/09/07 03:48 PM |
This ain't the nineties any longer | Maynard Handley | 2018/09/07 04:22 PM |
Just by way of clarification | Wilco | 2018/08/18 12:26 PM |
Just by way of clarification | Passing Through | 2018/08/18 12:39 PM |
Just by way of clarification | none | 2018/08/18 09:52 PM |
Just by way of clarification | dmcq | 2018/08/19 07:32 AM |
Just by way of clarification | none | 2018/08/19 07:54 AM |
Just by way of clarification | dmcq | 2018/08/19 10:24 AM |
Just by way of clarification | none | 2018/08/19 10:52 AM |
Just by way of clarification | Gabriele Svelto | 2018/08/19 05:41 AM |
Just by way of clarification | Passing Through | 2018/08/19 08:25 AM |
Whiteboards at Gatwick airport anyone? | Passing Through | 2018/08/20 03:24 AM |
It's marketing material | Michael S | 2018/08/18 10:12 AM |
It's marketing material | Brett | 2018/08/18 04:22 PM |
It's marketing material | Brett | 2018/08/18 04:33 PM |
It's marketing material | Adrian | 2018/08/19 12:21 AM |
A76 | AM | 2018/08/17 01:45 PM |
A76 | Michael S | 2018/08/18 10:20 AM |
A76 | AM | 2018/08/18 11:39 AM |
A76 | Michael S | 2018/08/18 11:49 AM |
A76 | AM | 2018/08/18 12:06 PM |
A76 | Doug S | 2018/08/18 12:43 PM |
A76 | Maynard Handley | 2018/08/18 01:42 PM |
A76 | Maynard Handley | 2018/08/18 03:22 PM |
Why write zeros when one can use metadata? | Paul A. Clayton | 2018/08/18 05:19 PM |
Why write zeros when one can use metadata? | Maynard Handley | 2018/08/19 10:12 AM |
Dictionary compress might apply to memcopy | Paul A. Clayton | 2018/08/19 12:45 PM |
Instructions for zeroing | Konrad Schwarz | 2018/08/30 05:37 AM |
Instructions for zeroing | Maynard Handley | 2018/08/30 07:41 AM |
Instructions for zeroing | Adrian | 2018/08/30 10:37 AM |
dcbz -> dcbzl (was: Instructions for zeroing) | hobold | 2018/08/31 12:50 AM |
dcbz -> dcbzl (was: Instructions for zeroing) | dmcq | 2018/09/01 04:28 AM |
A76 | Travis | 2018/08/19 10:36 AM |
A76 | Maynard Handley | 2018/08/19 11:22 AM |
A76 | Travis | 2018/08/19 01:07 PM |
A76 | Maynard Handley | 2018/08/19 05:24 PM |
Remote atomics | matthew | 2018/08/19 11:51 AM |
Remote atomics | Michael S | 2018/08/19 12:58 PM |
Remote atomics | matthew | 2018/08/19 01:32 PM |
Remote atomics | Michael S | 2018/08/19 01:36 PM |
Remote atomics | matthew | 2018/08/19 01:48 PM |
Remote atomics | Michael S | 2018/08/19 02:16 PM |
Remote atomics | Ricardo B | 2018/08/20 09:05 AM |
Remote atomics | dmcq | 2018/08/19 01:33 PM |
Remote atomics | Travis | 2018/08/19 01:32 PM |
Remote atomics | Michael S | 2018/08/19 01:46 PM |
Remote atomics | Travis | 2018/08/19 04:35 PM |
Remote atomics | Michael S | 2018/08/20 02:29 AM |
Remote atomics | matthew | 2018/08/19 06:58 PM |
Remote atomics | anon | 2018/08/19 11:59 PM |
Remote atomics | Travis | 2018/08/20 09:26 AM |
Remote atomics | Travis | 2018/08/20 08:57 AM |
Remote atomics | Linus Torvalds | 2018/08/20 03:29 PM |
Fitting time slices to execution phases | Paul A. Clayton | 2018/08/21 08:09 AM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 01:34 PM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 02:31 PM |
Fitting time slices to execution phases | Gabriele Svelto | 2018/08/21 02:54 PM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 03:26 PM |
Fitting time slices to execution phases | Travis | 2018/08/21 03:21 PM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 03:39 PM |
Fitting time slices to execution phases | Travis | 2018/08/21 03:59 PM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 04:13 PM |
Fitting time slices to execution phases | anon | 2018/08/21 03:27 PM |
Fitting time slices to execution phases | Linus Torvalds | 2018/08/21 05:02 PM |
Fitting time slices to execution phases | Etienne | 2018/08/22 01:28 AM |
Fitting time slices to execution phases | Gabriele Svelto | 2018/08/22 02:07 PM |
Fitting time slices to execution phases | Travis | 2018/08/22 03:00 PM |
Fitting time slices to execution phases | anon | 2018/08/22 05:52 PM |
Fitting time slices to execution phases | Travis | 2018/08/21 03:37 PM |
Is preventing misuse that complex? | Paul A. Clayton | 2018/08/23 04:42 AM |
Is preventing misuse that complex? | Linus Torvalds | 2018/08/23 11:46 AM |
Is preventing misuse that complex? | Travis | 2018/08/23 12:29 PM |
Is preventing misuse that complex? | Travis | 2018/08/23 12:33 PM |
Is preventing misuse that complex? | Jeff S. | 2018/08/24 06:57 AM |
Is preventing misuse that complex? | Travis | 2018/08/24 07:47 AM |
Is preventing misuse that complex? | Linus Torvalds | 2018/08/23 01:30 PM |
Is preventing misuse that complex? | Travis | 2018/08/23 02:11 PM |
Is preventing misuse that complex? | Linus Torvalds | 2018/08/24 12:00 PM |
Is preventing misuse that complex? | Gabriele Svelto | 2018/08/24 12:25 PM |
Is preventing misuse that complex? | Linus Torvalds | 2018/08/24 12:33 PM |
Fitting time slices to execution phases | Travis | 2018/08/21 02:54 PM |
rseq: holy grail rwlock? | Travis | 2018/08/21 02:18 PM |
rseq: holy grail rwlock? | Linus Torvalds | 2018/08/21 02:59 PM |
rseq: holy grail rwlock? | Travis | 2018/08/21 03:27 PM |
rseq: holy grail rwlock? | Linus Torvalds | 2018/08/21 04:10 PM |
rseq: holy grail rwlock? | Travis | 2018/08/21 05:21 PM |
ARM design houses | Michael S | 2018/08/21 04:07 AM |
ARM design houses | Wilco | 2018/08/22 11:38 AM |
ARM design houses | Michael S | 2018/08/22 01:21 PM |
ARM design houses | Wilco | 2018/08/22 02:23 PM |
ARM design houses | Michael S | 2018/08/29 12:58 AM |
Qualcomm's core naming scheme really, really sucks | Heikki Kultala | 2018/08/29 01:19 AM |
A76 | Maynard Handley | 2018/08/18 01:07 PM |
A76 | Michael S | 2018/08/18 01:32 PM |
A76 | Maynard Handley | 2018/08/18 01:52 PM |
A76 | Michael S | 2018/08/18 02:04 PM |
ARM is somewhat misleading | juanrga | 2018/08/17 12:20 AM |
Surprised?? | Alberto | 2018/08/17 12:52 AM |
Surprised?? | Alberto | 2018/08/17 01:10 AM |
Surprised?? | none | 2018/08/17 01:46 AM |
Garbage talk | Andrei Frumusanu | 2018/08/17 06:30 AM |
Garbage talk | Michael S | 2018/08/17 06:43 AM |
Garbage talk | Andrei Frumusanu | 2018/08/17 08:51 AM |
Garbage talk | Michael S | 2018/08/18 10:29 AM |
Garbage talk | Adrian | 2018/08/17 07:28 AM |
Garbage talk | Alberto | 2018/08/17 08:20 AM |
Garbage talk | Andrei Frumusanu | 2018/08/17 08:48 AM |
Garbage talk | Adrian | 2018/08/17 09:17 AM |
Garbage talk | Andrei Frumusanu | 2018/08/17 09:36 AM |
Garbage talk | Adrian | 2018/08/17 01:53 PM |
Garbage talk | Andrei Frumusanu | 2018/08/17 11:17 PM |
More like a religion he?? ARM has an easy life :) | Alberto | 2018/08/17 08:13 AM |
More like a religion he?? ARM has an easy life :) | Andrei Frumusanu | 2018/08/17 08:34 AM |
More like a religion he?? ARM has an easy life :) | Alberto | 2018/08/17 09:03 AM |
More like a religion he?? ARM has an easy life :) | Andrei Frumusanu | 2018/08/17 09:43 AM |
More like a religion he?? ARM has an easy life :) | Doug S | 2018/08/17 01:17 PM |
15W phone SoCs | AM | 2018/08/17 02:04 PM |
More like a religion he?? ARM has an easy life :) | Maynard Handley | 2018/08/17 11:29 AM |
my future stuff will be better than your old stuff, hey I'm a god at last (NT) | Eric Bron | 2018/08/18 02:34 AM |
my future stuff will be better than your old stuff, hey I'm a god at last | none | 2018/08/18 07:34 AM |