By: Philip Taylor (philip.delete@this.zaynar.co.uk), August 3, 2016 1:36 pm
Room: Moderated Discussions
Philip Taylor (philip.delete@this.zaynar.co.uk) on August 2, 2016 4:46 pm wrote:
> [...]
> The framebuffer is split into four partitions, one containing 4/13 of all the tiles (where each tile is 16x16
> pixels, and contains sub-tiles of 4x8 pixels), the others containing 3/13 each, in a sort of interleaved diagonal
> stripe pattern. (This presumably comes from the GTX 970 having 13 SMs, which are grouped into 4 SMMs.)
Got the terminology wrong: it has 13 SMMs, grouped into 4 GPCs.
Incidentally, I think this test demonstrates some mildly interesting variation in GTX 970s.
On my device, I see a clear 4:3:3:3 ratio in tile coverage per partition (i.e. per GPC), which I measured by inserting delays in the shaders to make one of the partitions lag behind the rest. When running the original code with no delays, the partition with 4/13 of the tiles naturally takes longer to complete since it's doing more work, so I see 256x512-pixel regions that have 9/13 of their tiles rendered at once, with the remaining 4/13 filled in some time later.
David's video shows quite different behaviour for his GTX 970, e.g. around 11:20. The rightmost region only has 2/13 of its tiles rendered. (It's a repeating pattern of 13x13 tiles, with 2 rendered on each row of that pattern). The third from the right has 2/13 in one colour and 3/13 in another colour.
That indicates a 4:4:3:2 ratio.
The GTX 970 architecture diagram shows a 4:3:3:3 assignment of SMMs to GPCs, which is consistent with my device. But it looks like David's has one GPC with only 2 SMMs, so the SMMs are less evenly balanced against the other per-GPC resources, which affects this test and might result in very slightly different performance characteristics.
I guess in theory you could have a 4:4:4:1 ratio too, but perhaps that's too unbalanced, and rare enough, that those chips just get rejected.
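To check that those three ratios are the only candidates, here's a quick sketch that enumerates every way to spread 13 enabled SMMs over 4 GPCs. It assumes each GPC can hold at most 4 SMMs (taken from the 16-SMMs-in-4-GPCs figure for the GTX 980, assuming the 970 is a cut-down version of the same chip) and that no GPC is disabled entirely. The function name and parameters are mine, purely for illustration.

```python
from itertools import combinations_with_replacement


def smm_splits(total_smms=13, gpcs=4, max_per_gpc=4):
    """List the unordered ways to spread enabled SMMs across GPCs.

    Assumptions (not confirmed by any official source): every GPC keeps
    at least one SMM, and no GPC holds more than max_per_gpc.
    """
    splits = []
    # Candidate per-GPC counts, largest first, so tuples come out sorted.
    candidates = range(max_per_gpc, 0, -1)
    for combo in combinations_with_replacement(candidates, gpcs):
        if sum(combo) == total_smms:
            splits.append(combo)
    return splits


print(smm_splits())  # [(4, 4, 4, 1), (4, 4, 3, 2), (4, 3, 3, 3)]
```

Under those assumptions only 4:4:4:1, 4:4:3:2 and 4:3:3:3 come out, which matches the two configurations seen so far plus the hypothetical extreme one.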
The GTX 1070 (15:45 in the video) has 3 GPCs, which appear to partition the screen into vertical stripes, so their tile coverage has the ratio 11:11:10 in each 512x512 region. (They're uneven just because a 512-pixel-wide region is 32 tiles across, and 32 isn't divisible by 3.) The one with 10/32 of the tiles completes a bit sooner than the others, which is why it gets to draw the red stripes in the next region before the others have caught up.
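The 11:11:10 figure falls out of simple arithmetic. A minimal sketch, assuming the stripes are one 16-pixel tile wide and are handed to the GPCs round-robin from left to right (that assignment order is my guess from the video, not something documented):

```python
def stripe_counts(region_width_px=512, tile_width_px=16, gpcs=3):
    """Count how many tile columns each GPC gets in one region.

    Assumption: tile columns are assigned to GPCs round-robin,
    i.e. column c goes to GPC (c % gpcs).
    """
    columns = region_width_px // tile_width_px  # 512 / 16 = 32
    counts = [0] * gpcs
    for col in range(columns):
        counts[col % gpcs] += 1
    return counts


print(stripe_counts())  # [11, 11, 10]
```

With 32 columns and a stride of 3, two GPCs get 11 columns and one gets 10, so the last one has slightly less work per region.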
(I hypothesise that the GTX 980 (16 SMMs in 4 GPCs), 960 (8 SMMs in 2 GPCs), 1080 (20 SMMs in 4 GPCs) and 1060 (10 SMMs in 2 GPCs) would give vertical stripes like the 1070, though with a stride of 4 or 2 instead of 3, since all their GPCs are uniform and they don't need a funny partition pattern to balance the load well enough. But their partitions will all contain exactly the same number of tiles, so they'll largely stay in sync and the stripes wouldn't be visible: you should just see each 256x512/512x512 region filling up solidly from top to bottom.)
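As a sanity check on that hypothesis, here's a small sketch that tests whether each of those chips divides evenly, both in SMMs per GPC and in 16-pixel tile columns per GPC for 256- and 512-pixel-wide regions. The SMM and GPC counts are the ones quoted above; the helper name and the one-tile-wide stripe assumption are mine.

```python
# (SMMs, GPCs) as quoted in the post.
GPUS = {
    "GTX 980": (16, 4),
    "GTX 960": (8, 2),
    "GTX 1080": (20, 4),
    "GTX 1060": (10, 2),
}


def is_balanced(smms, gpcs, region_widths_px=(256, 512), tile_width_px=16):
    """True if SMMs and tile columns both split evenly across GPCs.

    Assumption: stripes are one tile wide, so a region of width w has
    w // tile_width_px columns to share out.
    """
    even_smms = smms % gpcs == 0
    even_columns = all(
        (width // tile_width_px) % gpcs == 0 for width in region_widths_px
    )
    return even_smms and even_columns


for name, (smms, gpcs) in GPUS.items():
    print(name, is_balanced(smms, gpcs))
```

All four come out balanced, whereas any 3-GPC layout fails the column test because neither 16 nor 32 columns is divisible by 3.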