By: wumpus (lost.delete@this.in-a.cave.net), August 2, 2016 7:57 am
Room: Moderated Discussions
Simon Farnsworth (simon.delete@this.farnz.org.uk) on August 1, 2016 1:01 pm wrote:
> wumpus (lost.delete@this.in-a.cave.net) on August 1, 2016 11:29 am wrote:
> > Rob Clark (robdclark.delete@this.gmail.com) on August 1, 2016 9:49 am wrote:
> > > vvid (no.delete@this.thanks.com) on August 1, 2016 9:45 am wrote:
> > > > Nvidia uses tiles since ~NV20.
> > > >
> > > > These small rectangles on video are ROP tiles (collection of pixels placed at
> > > > adjacent location in the same memory bank) and can be compressed (nv40+).
> > > >
> > > > http://www.google.ch/patents/US7545382
> > > > http://www.freepatentsonline.com/y2015/0154733.html
> > > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/mchehab/linux-media/+/media/v4.7-2/drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv40.c
> > > >
> > > > Specific ordering pattern is likely a result of non-linear (swizzled)
> > > > memory layout of ROP tiles grouped in a second level structure.
> > > >
> > > > AMD uses 8x8 tiles. It is highly integrated with the HSR system.
> > > >
> > >
> > > "tile" is a bit of an overloaded term. What you are describing above is tiled format (ie. layout
> > > of pixels in memory), which is a different thing from an internal tile buffer (ie. tiler gpu)
> > >
> >
> > I'll have to watch the video, but it seems to me that "tiling" is largely a means of increasing
> > cache hits while rendering (if not Nvidia's method, at least it can be used that way). Note
> > that even when not deferred, unless the API/engine is specifically designed to spit out tiles
> > (and likely even then) it is going to add roughly one frame of latency (because you presumably
> > have to collect enough polygons to bother with each tile). This isn't a terribly good long
> > term thing to do with VR on the horizon (which appears to want latency above all else).
>
> I don't see how you get the added frame of latency; both OpenGL and Vulkan have concepts that effectively delimit
> individual frames, and even a full-frame IMR is allowed to batch the drawing up until you hit the "end of rendering"
> command (be it glFlush(), glSwapBuffers(), or the more powerful Vulkan synchronization primitives).
I don't see how "allowed" == "required". From the demo it appears that the ATI board simply draws the triangles as they appear, no latency involved (of course they could be waiting to receive all the triangles first, but that seems weird).
Note that this is only true for current output definitions. Should Nvidia create something like "G-Sync 2.0", or more accurately "G-Sync VR", and allow the card to send each line to the LCD (or OLED, or whatever) display as it is produced, it would get rid of much of this problem, since the GPU would essentially "chase the beam", or at least work its way down the screen. You would then have to rewrite your graphics engine to chase the beam as well, or more likely simply chop the screen into four horizontal stripes and render them separately (from a high-level design standpoint this should be easy; finding and removing any "out of sight, out of mind" memory allocation/caching routines is another story).
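Something like the following is what I have in mind for the stripe approach; it is purely hypothetical, since it assumes a display link that could accept partial scanout (no such mode exists today) and a draw_scene callback that culls to the current stripe:

/* Hypothetical sketch of the "four horizontal stripes" idea, assuming a
 * plain OpenGL renderer.  Each stripe is rendered and flushed on its
 * own so it could, in principle, be scanned out while the next stripe
 * is still being drawn. */
#include <GLFW/glfw3.h>

void render_in_stripes(GLFWwindow *win, int width, int height,
                       void (*draw_scene)(void))
{
    const int stripes = 4;
    const int stripe_h = height / stripes;

    glEnable(GL_SCISSOR_TEST);
    for (int i = 0; i < stripes; i++) {
        /* Top stripe first, working down the screen like the "beam". */
        int y = height - (i + 1) * stripe_h;
        glScissor(0, y, width, stripe_h);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        draw_scene();            /* engine must cull to this stripe */
        glFlush();               /* hand this stripe to the hardware */
    }
    glDisable(GL_SCISSOR_TEST);

    glfwSwapBuffers(win);
}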
All of this should only matter for VR, as it simply hasn't been an issue with current screens (although the G-Sync feature seems to improve things). By all accounts, latency is the biggest issue for VR, and it seems to be solved largely by throwing transistors at it (i.e. simply increasing the framerate to force conventional GPUs to get each frame onto the display more quickly).
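Back-of-the-envelope numbers for that (illustrative only): one refresh interval bounds how long a finished frame waits to reach the panel, so higher refresh rates shrink it.

/* Scanning out a whole frame takes one refresh interval, so higher
 * refresh rates shrink the worst-case wait for a pixel to reach the
 * panel.  Figures are illustrative, not measurements. */
#include <stdio.h>

int main(void)
{
    const double rates_hz[] = { 60.0, 90.0, 120.0, 144.0 };
    for (int i = 0; i < 4; i++)
        printf("%6.1f Hz -> %5.2f ms per refresh\n",
               rates_hz[i], 1000.0 / rates_hz[i]);
    return 0;   /* 60 Hz: 16.67 ms, 90 Hz: 11.11 ms, 144 Hz: 6.94 ms */
}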