By: Maynard Handley (name99.delete@this.name99.org), August 18, 2020 3:46 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 18, 2020 1:48 pm wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on August 18, 2020 1:00 pm wrote:
> > hobold (hobold.delete@this.vectorizer.org) on August 18, 2020 11:58 am wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on August 18, 2020 4:48 am wrote:
> > > [...]
> > > > BTW, Power ISA™ Version 3.1 manual is here https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0
> > > >
> > >
> > > Thanks for the link!
> > >
> > > Looks like they are really serious about autovectorization. Quite a few SIMD
> > > bits to round out functionality, and more interoperation between scalar and
> > > SIMD. They even have "vector centrifuge" now ... that is ... interesting.
> >
> > Gah! You can't split a prefixed instruction over a 64 byte boundary.
> > I wonder why they felt it necessary to impose that.
> >
>
> Yes, quite strange, esp. considering that the 1st implementation likely has 128-byte cache lines.
Given what I said (marking pairs in pre-decode), possibly:
(a) they want the flexibility (for who knows what future reason) to be able to use 64-byte lines;
(b) their 128-byte lines are implemented as pulling in half a line at a time, critical half first, and they want to be able to predecode that half while they wait on the other half;
(c) they want to run two predecoders in parallel, one on each half of the line
?
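
To make the constraint concrete, here is a minimal sketch of the check a toolchain would have to apply. It assumes a prefixed instruction is 8 bytes (a 4-byte prefix plus a 4-byte suffix) and that instructions are word-aligned; the function names are hypothetical, purely for illustration.

```python
PREFIXED_LEN = 8   # assumed: 4-byte prefix word + 4-byte suffix word
BOUNDARY = 64      # prefixed instructions may not straddle this

def crosses_boundary(addr: int) -> bool:
    """True if a prefixed instruction starting at addr straddles a 64-byte boundary."""
    if addr % 4:
        raise ValueError("instructions are assumed to be word-aligned")
    first_block = addr // BOUNDARY
    last_block = (addr + PREFIXED_LEN - 1) // BOUNDARY
    return first_block != last_block

def bad_offsets(line_size: int = 128) -> list:
    """Word-aligned offsets in a cache line where a prefixed instruction may not start."""
    return [off for off in range(0, line_size, 4) if crosses_boundary(off)]

print(bad_offsets())
```

Under these assumptions only the last word slot of each 64-byte block is off limits, so within a 128-byte line the restriction adds just one forbidden start offset (the middle one) beyond the end-of-line case. Presumably the assembler or linker pads with a no-op when an instruction would land there.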
> > I like that centrifuge! It'd be an interesting little design
> > challenge to make something that implements it efficiently.
>
> If I am not mistaken, it's the same as x86 PDEP. On Intel it is reasonably fast (lat=3, thr=1).
> People complained, including on this board, that on AMD it is slow.
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=525,3401,4155,2719,4152&othertechs=BMI2
>
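
For comparison, a bit-level sketch of the three operations in question. The `pext` and `pdep` models follow the BMI2 definitions. The `centrifuge` model is an assumption based on my reading of the ISA 3.1 text: bits of the source whose mask bit is 1 are packed to the right, the remaining bits are packed to the left, and both groups keep their relative order. Check it against the manual before relying on it.

```python
WIDTH = 64
ALL_ONES = (1 << WIDTH) - 1

def pext(x: int, m: int) -> int:
    """Gather the bits of x selected by m into the low-order bits, preserving order."""
    out, k = 0, 0
    for i in range(WIDTH):
        if (m >> i) & 1:
            out |= ((x >> i) & 1) << k
            k += 1
    return out

def pdep(x: int, m: int) -> int:
    """Scatter the low-order bits of x to the positions where m is set."""
    out, k = 0, 0
    for i in range(WIDTH):
        if (m >> i) & 1:
            out |= ((x >> k) & 1) << i
            k += 1
    return out

def centrifuge(x: int, m: int) -> int:
    """Assumed semantics: mask-1 bits packed right, mask-0 bits packed left."""
    ones = bin(m & ALL_ONES).count("1")
    left = pext(x, ~m & ALL_ONES)   # bits whose mask bit is 0
    right = pext(x, m)              # bits whose mask bit is 1
    return ((left << ones) | right) & ALL_ONES

x, m = 0b10110110, 0b11001010
print(hex(pext(x, m)), hex(pdep(x, m)), hex(centrifuge(x, m)))
```

If that reading is right, a centrifuge behaves like two PEXT-style gathers glued together (the "sheep and goats" operation) rather than like a PDEP scatter, which is part of what makes an efficient implementation an interesting design problem.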