By: Wilco (Wilco.dijkstra.delete@this.ntlworld.com), December 22, 2018 8:22 am
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on December 22, 2018 7:03 am wrote:
> Wilco (Wilco.dijkstra.delete@this.ntlworld.com) on December 22, 2018 4:58 am wrote:
> > Travis Downs (travis.downs.delete@this.gmail.com) on December 21, 2018 5:49 pm wrote:
> > > Wilco (Wilco.dijkstra.delete@this.ntlworld.com) on December 21, 2018 3:33 pm wrote:
> > >
> > > > You have a dependency in both cases. However autoincrement actually gives fewer
> > > > dependencies and enables more reordering between different accesses.
> > >
> > > Can you elaborate? Yes, there is (at least one) dependency chain in both cases, but in the loop
> > > counter + indexing case there is only a single dependency chain associated with the loop counter,
> > > and all the accesses hang off that chain (they aren't part of any carried chain).
> > >
> > > In the auto-increment case, each access that uses auto-increment forms a new dependency chain,
> > > so for a loop with N accesses you'll have N carried dependency chains (and possibly the loop
> > > counter chain as well if you are still using a loop counter). I'm struggling to see that
> > > is "fewer dependencies and enables more reordering between different accesses".
> >
> > If you have N accesses and say 2 autoincrements split evenly you'd get 2 chains of size N/2
> > which would be independent of each other if there are no other dependencies. Now this would
> > be equally fast on most OoO cores. However on a core with partitioned resources (e.g. POWER9)
> > it would be able to run the 2 chains in the partitions independently while the single increment
> > case has more dependencies and slows down due to cross-partition penalties.
>
> Sure, introducing additional dependencies for the sake of partitioning on such uarches might speed
> things up if everything gets grouped correctly, but that's very different than the original claim
> of fewer dependencies and more reordering.
There are fewer dependencies and more reordering just like I said. Each of the 2 chains is executed independently and they can be reordered.
> You can of course use two loop counters or whatever
> if you want two separate (but just as long) dependency chains on such an architecture.
>
> Certainly it doesn't help on most OoO arches.
Absolutely, but the claim was that autoincrement creates dependencies between iterations which loop counters don't. And that's just false.
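To make the two loop shapes concrete, here is a minimal sketch in C. The function names and the summing workload are hypothetical, picked only for illustration, and a compiler is free to turn either form into the other, so the dependency structure described in the comments is only guaranteed at the source level. In both versions the accumulator `total` is a carried chain of its own; the difference is in how the addresses are produced.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop counter + indexing: the only address-related carried chain is i.
 * Both loads compute their address from i and hang off that one chain. */
int64_t sum_indexed(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (int64_t)a[i] + b[i];
    return total;
}

/* Autoincrement style: pa and pb are each bumped per access, giving two
 * carried chains that do not depend on each other. There is no separate
 * loop counter; the exit test compares pa against the end pointer. */
int64_t sum_autoinc(const int32_t *a, const int32_t *b, size_t n)
{
    const int32_t *pa = a;
    const int32_t *pb = b;
    const int32_t *end = a + n;
    int64_t total = 0;
    while (pa != end)
        total += (int64_t)*pa++ + *pb++;
    return total;
}
```

Both functions return the same sum for the same inputs. Which one runs faster depends on the core and on what the compiler emits, so any comparison needs measuring on the target.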
Wilco