By: Travis Downs (travis.downs.delete@this.gmail.com), December 22, 2018 7:03 am
Room: Moderated Discussions
Wilco (Wilco.dijkstra.delete@this.ntlworld.com) on December 22, 2018 4:58 am wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on December 21, 2018 5:49 pm wrote:
> > Wilco (Wilco.dijkstra.delete@this.ntlworld.com) on December 21, 2018 3:33 pm wrote:
> >
> > > You have a dependency in both cases. However autoincrement actually gives fewer
> > > dependencies and enables more reordering between different accesses.
> >
> > Can you elaborate? Yes, there is (at least one) dependency chain in both cases, but in the loop
> > counter + indexing case there is only a single dependency chain associated with the loop counter,
> > and all the accesses hang off that chain (they aren't part of any carried chain).
> >
> > In the auto-increment case, each access that uses auto-increment forms a new dependency chain,
> > so for a loop with N accesses you'll have N carried dependency chains (and possibly the loop
> > counter chain as well if you are still using a loop counter). I'm struggling to see that
> > is "fewer dependencies and enables more reordering between different accesses".
>
> If you have N accesses and say 2 autoincrements split evenly you'd get 2 chains of size N/2
> which would be independent of each other if there are no other dependencies. Now this would
> be equally fast on most OoO cores. However on a core with partitioned resources (eg. POWER
> 9) it would be able to run the 2 chains in the partitions independently while the single increment
> case has more dependencies and slows down due to cross-partition penalties.
Sure, introducing additional dependencies for the sake of partitioning on such uarches might speed things up if everything gets grouped correctly, but that's very different from the original claim of "fewer dependencies and more reordering". You can of course use two loop counters (or similar) with plain indexing if you want two separate (but just as long) dependency chains on such an architecture.
Certainly the split doesn't help on most OoO arches.
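
To make the chain counting concrete, here is a rough C sketch of the two loop shapes being compared. The function names and the summing workload are made up for illustration, and a compiler is free to turn either form into the other, so treat this as a picture of the dependency structure rather than a statement about the code any particular compiler or core ends up with.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: the only address-related loop-carried chain is i.
 * Both loads compute their address from i and hang off that chain;
 * neither load address depends on the other access.
 * (The accumulator `total` is a separate carried chain in both versions.) */
uint64_t sum_indexed(const uint32_t *a, const uint32_t *b, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (uint64_t)a[i] + b[i];
    }
    return total;
}

/* Pointer-bump form, the C analogue of auto-increment addressing:
 * each pointer is updated by its own access, so a and b each form
 * their own loop-carried chain (two address chains instead of one). */
uint64_t sum_bumped(const uint32_t *a, const uint32_t *b, size_t n) {
    uint64_t total = 0;
    const uint32_t *end = a + n;
    while (a != end) {
        total += (uint64_t)*a++ + *b++;
    }
    return total;
}
```

Both functions return the same result. The difference is only in how many address registers are carried from one iteration to the next: one (`i`) in the first, two (`a` and `b`) in the second, which is the "N accesses, N carried chains" point scaled down to N = 2.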