By: juanrga (nospam.delete@this.juanrga.com), October 30, 2015 10:37 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on October 30, 2015 6:54 am wrote:
> juanrga (nospam.delete@this.juanrga.com) on October 29, 2015 2:16 am wrote:
> > Contrarian (Contrarian.delete@this.hotmail.com) on October 25, 2015 12:25 pm wrote:
> > > mpx (mpx.delete@this.nomail.pl) on October 5, 2015 9:37 am wrote:
> > > > The 128-bit choice could also stem from having the same architecture for both x86 and ARM.
> > > > As ARM has no 256 bit vector processing, so a common denominator is 128-bit units.
> > >
> > > Intel has 2 read and 1 write ports to cache, same as all other high end CPU's, but Intel has three AGU
> > > units and thus can use all of those ports.
> >
> > Power8 cache supports four reads and one write in every cycle, when there is no bank conflicts.
>
> I don't think it's that much.
I reported what IBM claims.
> It has 2 "LU" and 2 "LSU" units, so I think there is a maximum combination of
> 4 per cycle. However the mix I'm not entirely sure of, and there is conflicting information I have not yet
> dug through enough. Some say 2 loads + 2 loads-or-stores per cycle. Some 4 loads or 1 store per cycle. Some
> says a store uses a slot in both LSU and LU, which would be 4 loads, 2 loads + 1 store, or 2 stores.
Different combinations are possible depending on the working mode (ST or SMT2 to SMT8) and on the type of operation. For instance, loads that update general-purpose registers can execute in both the LUs and the LSUs, but loads that update a floating-point register execute only in the LUs.
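To make the routing rule concrete, here is a rough Python sketch of how many loads could issue in one cycle under that constraint. It is only a toy model: the counts of 2 LUs and 2 LSUs are taken from the quoted post, the function name and parameters are made up for illustration, and it ignores stores, the SMT mode, and bank conflicts.

```python
def max_loads_per_cycle(gpr_loads, fpr_loads, n_lu=2, n_lsu=2):
    """Toy model: return (gpr_issued, fpr_issued) for a single cycle.

    Assumptions (not from IBM documentation): 2 LUs and 2 LSUs,
    one load per unit per cycle, no stores, no bank conflicts.
    """
    # Floating-point loads are the more constrained kind: LUs only.
    fpr_issued = min(fpr_loads, n_lu)
    free_lu = n_lu - fpr_issued
    # General-register loads can use every LSU plus any LU left over.
    gpr_issued = min(gpr_loads, n_lsu + free_lu)
    return gpr_issued, fpr_issued


if __name__ == "__main__":
    for gpr, fpr in [(4, 0), (0, 4), (2, 2), (3, 1)]:
        print(gpr, fpr, "->", max_loads_per_cycle(gpr, fpr))
```

In this model, four loads per cycle are reachable only when at most two of them target floating-point registers, which is one reason a single "N loads per cycle" figure can be misleading.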