By: David Kanter (dkanter.delete@this.realworldtech.com), May 20, 2017 7:55 am
Room: Moderated Discussions
pocak (pocak100.delete@this.gmail.com) on May 18, 2017 4:19 pm wrote:
> steve m (steve.marton.delete@this.gmail.com) on May 17, 2017 8:35 pm wrote:
> > Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on May 17, 2017 5:43 am wrote:
> > > steve m (steve.marton.delete@this.gmail.com) on May 16, 2017 11:49 am wrote:
> > > Possibly load-balancing, to keep the rasterizers from sitting idle while the pixels in the previous
> > > tile are being shaded. If the L2 can hold two tiles at the same time, then it should be fine.
> >
> > Yeah, I still don't quite understand. Why not keep the same rasterizers busy filling in holes
> > for the current tile? Why dispatch pixel shaders to future tiles, and finish them, all the way
> > through ROP, when you still have holes in your earlier tile? Shouldn't your rasterizer prioritize
> > pixels in the current tile first? It seems that the rasterizer has dispatched pixel shader work
> > to future tiles before it finished dispatching the first tile. Or the work queue was not consumed
> > in the intended order (the left hand doesn't know what the right is doing?).
> >
> > The cache is a shared resource. Sure, more tiles will fit, but you don't want to thrash it
> > unnecessarily, because you'll be slowing down your texture units and everything else.
> >
> > I don't see why you would NOT want to dispatch and process all pixel work in a tile before you
> > start dispatching the next tile. If you did that, presumably the work would be done in a much
> > cleaner progression by tile, rather than what we're seeing in the video. Of course there would
> > be some bleed between tiles, but I especially don't see any reason to get more than one tile
> > ahead. I mean, the rasterizer output queue has no reason to be THAT big, does it?...
>
> The key you're missing is that ROPs are statically assigned to portions of the framebuffer in squares
> of about 8x8 pixels. One ROP/rasterizer (I'm assuming those are tied together) doesn't move on to
> the next big tile until its portion is completed in the previous; this is clear in the video.
>
> Don't think of it as ROPs sharing a pool of 512KiB of cache - think of it
> as each of four ROPs using a distinct 128KiB of cache. (example numbers)
Yes, that's right. The L2 cache is partitioned across the ROPs/memory controllers, so any graphics buffer cached in L2 is probably partitioned the same way.
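To make the "distinct slice per ROP" point concrete, here is a rough sketch of what a static assignment of 8x8 squares to four ROP/L2 partitions could look like. Everything in it is an assumption for illustration: the diagonal interleave in the hypothetical `rop_for_pixel`, the 64x64 binning tile, and the 8 bytes per pixel are made up, and real GPUs use undocumented (often hashed) address mappings. The 512KiB / 128KiB figures are the "example numbers" from the quote above.

```python
from collections import Counter

# Hypothetical "example numbers", following the post; not measured values.
NUM_ROPS = 4                             # ROP / memory-controller partitions
SQUARE = 8                               # ownership granularity: ~8x8 pixel squares
L2_TOTAL_KIB = 512                       # total L2
L2_SLICE_KIB = L2_TOTAL_KIB // NUM_ROPS  # 128 KiB slice per partition


def rop_for_pixel(x: int, y: int) -> int:
    """Return the ROP (and its L2 slice) that owns pixel (x, y).

    Hypothetical mapping: a simple diagonal interleave of 8x8 squares.
    Real hardware mappings are undocumented and often hashed.
    """
    sx, sy = x // SQUARE, y // SQUARE
    return (sx + sy) % NUM_ROPS


def pixels_per_rop(x0: int, y0: int, w: int, h: int) -> dict[int, int]:
    """Count how many pixels of a w x h screen tile each ROP owns."""
    counts = Counter(
        rop_for_pixel(x, y)
        for y in range(y0, y0 + h)
        for x in range(x0, x0 + w)
    )
    return dict(counts)


if __name__ == "__main__":
    TILE = 64  # hypothetical binning-tile edge in pixels
    BPP = 8    # hypothetical: 4 B color + 4 B depth/stencil per pixel
    counts = pixels_per_rop(0, 0, TILE, TILE)
    print(counts)  # each of the 4 ROPs owns 1024 of the 4096 pixels
    kib_per_slice = counts[0] * BPP / 1024
    print(f"{kib_per_slice} KiB per slice per tile, "
          f"~{L2_SLICE_KIB / kib_per_slice:.0f} tiles fit in one {L2_SLICE_KIB} KiB slice")
```

Under these made-up numbers each ROP owns a quarter of every tile, so one tile costs each partition about 8KiB and a 128KiB slice could hold on the order of 16 tiles' worth of color and depth. The exact figures don't matter; the point is that a ROP finishing its part of tile N before touching tile N+1 only competes for its own slice, not for the whole 512KiB.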
Anyway, I'm comfortable with the fact that I might be wrong in parts; I'm just happy to see I got the overall analysis correct.
David