Wavefronts

Article: AMD's Cayman GPU Architecture
By: Seni (seniike.delete@this.hotmail.com), December 20, 2010 1:07 pm
Room: Moderated Discussions
Moritz (better@not.tell) on 12/20/10 wrote:
---------------------------
>Did I understand it correctly that a wavefront is only one single VLIW with a bunch of data to apply it to?
>Are the instructions in the VLIW different or usually the same? I suppose the VLIW

I think it's a SIMD of VLIWs: a VLIW bundle is issued once, and every pixel lane executes it on its own data. So a wavefront is not a single VLIW; the same bundle is applied across many pixels at once.
The instructions within a VLIW bundle can be different but often aren't.
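To make the "SIMD of VLIWs" picture concrete, here is a toy Python model. It is a sketch under my own assumptions, not how the hardware or driver actually works: `issue_bundle` and `Slot` are made-up names, the wavefront is shrunk to 4 lanes for readability (AMD wavefronts of this generation have 64 work-items), and the lanes run one after another here instead of in lockstep. The bundle holds three identical multiplies and one different add, which is the "can be different but often aren't" case.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model, not a real API.
Registers = Dict[str, float]                     # one lane's (pixel's) registers
Slot = Tuple[str, Callable[[Registers], float]]  # (destination, operation)

def issue_bundle(bundle: List[Slot], lanes: List[Registers]) -> None:
    """Issue one VLIW bundle to every lane of a wavefront."""
    for regs in lanes:                           # SIMD: same bundle, every lane
        # VLIW: all slots read the old values, then all results are written.
        results = [(dst, op(regs)) for dst, op in bundle]
        for dst, value in results:
            regs[dst] = value

# A 4-slot bundle: three identical multiplies plus one different op.
bundle: List[Slot] = [
    ("r", lambda v: v["r"] * 2.0),
    ("g", lambda v: v["g"] * 2.0),
    ("b", lambda v: v["b"] * 2.0),
    ("a", lambda v: v["r"] + v["g"]),            # reads r and g before the doubling
]

# Toy wavefront of 4 lanes (real ones are wider); lane i starts with r = i.
lanes: List[Registers] = [{"r": float(i), "g": 1.0, "b": 0.5, "a": 0.0} for i in range(4)]
issue_bundle(bundle, lanes)
print(lanes[3])  # {'r': 6.0, 'g': 2.0, 'b': 1.0, 'a': 4.0}
```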

>consists of compile-time established ILP? So are the instructions in the VLIW likely
>to be successive instructions in a serial formulation of the code?

The vector-level parallelism is present in the source code and is passed through largely untouched to the hardware.
No serial formulation need ever be constructed.
Most of it consists of vector ops of length 3 and 4, mixed with a few scalar ops.
A pure vector hardware implementation is possible, but it would leave lanes idle whenever a vector shorter than 4 (or a scalar) is used.
Radeons use VLIW instead, packing other independent instructions into the lanes that would otherwise go unused.
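A rough way to see why the packing helps is to count issue slots for a short run of ops. This Python sketch is my own simplification with hypothetical function names: it assumes 4 slots per bundle (Cayman's VLIW width; earlier Radeons used 5), packs whole ops into bundles by first-fit, and assumes all ops are independent of each other, which a real shader compiler cannot assume. The op mix follows the description above: vec3/vec4 ops with a few scalars.

```python
from typing import List

SLOTS = 4  # lanes per vector instruction, or slots per VLIW bundle

def vector_instructions(widths: List[int]) -> int:
    """Pure 4-wide vector hardware: every op occupies a whole instruction."""
    return len(widths)

def vliw_bundles(widths: List[int]) -> int:
    """First-fit packing of op widths into 4-slot bundles.

    Assumes every op is independent of the others, so any ops may share
    a bundle. A real compiler must respect data dependencies."""
    free: List[int] = []                  # free slots left in each bundle
    for w in widths:
        for i, left in enumerate(free):
            if left >= w:                 # op fits into an existing bundle
                free[i] -= w
                break
        else:                             # no bundle had room: open a new one
            free.append(SLOTS - w)
    return len(free)

def utilization(widths: List[int], instructions: int) -> float:
    """Fraction of issued lanes/slots that do useful work."""
    return sum(widths) / (instructions * SLOTS)

# vec3, scalar, vec3, scalar, vec4, vec3, scalar
program = [3, 1, 3, 1, 4, 3, 1]
v, b = vector_instructions(program), vliw_bundles(program)
print(v, round(utilization(program, v), 2))   # 7 0.57
print(b, round(utilization(program, b), 2))   # 4 1.0
```

With this mix, the pure vector machine issues 7 instructions with about 57% of its lanes busy, while the VLIW machine fits the same work into 4 full bundles.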

>How are the wavefronts data-wise generated? I don't understand how the right numbers
>that might be anywhere in the memory are collected, aligned and then grouped. Is
>the memory a special type that can be accessed in strides?

They aren't collected and grouped on the fly. Problems like "alignment" and "grouping" don't arise, because the data in memory must be laid out to fit the shader, not the other way around.
Memory access is expensive because it requires off-chip bandwidth, and because unrestricted writes would break the parallelism between pixels.
Traditionally, all work is done in the register file, except for texture lookups and the final output write.
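As an illustration of laying data out to fit the shader, here is one common GPGPU approach, sketched in Python: pack a 1-D array into a 2-D texture so that the pixel at (x, y) owns element y * width + x. The function names and the zero padding are my own choices for the example. With this layout, each pixel's shader knows from its own position which element it works on, so nothing has to be collected or aligned at run time.

```python
from typing import List, Tuple

def to_texture(data: List[float], width: int, pad: float = 0.0) -> List[List[float]]:
    """Reshape a 1-D array into rows of `width` texels, padding the last row."""
    height = -(-len(data) // width)                 # ceiling division
    padded = data + [pad] * (height * width - len(data))
    return [padded[y * width:(y + 1) * width] for y in range(height)]

def texel_of(index: int, width: int) -> Tuple[int, int]:
    """Pixel (x, y) that holds array element `index` in that layout."""
    return index % width, index // width

data = [float(i) for i in range(10)]
tex = to_texture(data, width=4)
print(tex)                # [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 0.0, 0.0]]
x, y = texel_of(9, width=4)
print((x, y), tex[y][x])  # (1, 2) 9.0
```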

To read from memory, you do a texture lookup.
To write to memory, you use the shader's final output, which is a 4-vector written to a predetermined location in the framebuffer as the shader ends.
To write more than 4 numbers, you use "multiple render targets": up to 4 framebuffers, with one 4-vector written to the same predetermined location in each.
This gives a total of 16 numbers per shader, but multiple render targets are slower.
To read what you wrote to memory and use it you need to do a separate rendering pass with a separate shader.
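The read/write rules above can be modelled as a toy render pass in Python. This is a sketch, not any real graphics API: `render_pass`, `fetch`, `Texture`, and the clamp-to-edge addressing are my own assumptions, and the limit of 4 render targets is taken from the description above (real limits depend on the API and hardware). Each shader invocation may read any texel (gather) but writes only one 4-vector per render target, at its own pixel (no scatter), and the results of pass 1 can only be read in pass 2.

```python
from typing import Callable, List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]
Texture = List[List[Vec4]]                 # indexed [y][x]
MAX_RENDER_TARGETS = 4                     # limit as described in the post
Shader = Callable[[int, int, Sequence[Texture]], Sequence[Vec4]]

def fetch(tex: Texture, x: int, y: int) -> Vec4:
    """Texture lookup: any texel may be read; coordinates clamp to the edge."""
    y = min(max(y, 0), len(tex) - 1)
    x = min(max(x, 0), len(tex[0]) - 1)
    return tex[y][x]

def render_pass(shader: Shader, inputs: Sequence[Texture], width: int,
                height: int, num_targets: int = 1) -> List[Texture]:
    """Run `shader` once per pixel. Each call returns one Vec4 per render
    target, stored only at that call's own (x, y): there is no scatter."""
    if not 1 <= num_targets <= MAX_RENDER_TARGETS:
        raise ValueError("unsupported number of render targets")
    blank: Vec4 = (0.0, 0.0, 0.0, 0.0)
    targets = [[[blank] * width for _ in range(height)] for _ in range(num_targets)]
    for y in range(height):
        for x in range(width):
            outputs = shader(x, y, inputs)
            if len(outputs) != num_targets:
                raise ValueError("shader must return one Vec4 per render target")
            for t, value in enumerate(outputs):
                targets[t][y][x] = value
    return targets

def pass1(x: int, y: int, inputs: Sequence[Texture]) -> Sequence[Vec4]:
    here = fetch(inputs[0], x, y)
    right = fetch(inputs[0], x + 1, y)     # gather from a neighbouring texel
    scaled = tuple(2.0 * c for c in here)
    summed = tuple(a + b for a, b in zip(here, right))
    return (scaled, summed)                # two render targets (MRT)

def pass2(x: int, y: int, inputs: Sequence[Texture]) -> Sequence[Vec4]:
    # Pass-1 results can only be read here, as input textures of a new pass.
    a, b = fetch(inputs[0], x, y), fetch(inputs[1], x, y)
    return (tuple(p - q for p, q in zip(a, b)),)

src: Texture = [[(1.0, 1.0, 1.0, 1.0), (3.0, 3.0, 3.0, 3.0)]]
scaled, summed = render_pass(pass1, [src], width=2, height=1, num_targets=2)
(diff,) = render_pass(pass2, [scaled, summed], width=2, height=1)
print(diff)  # [[(-2.0, -2.0, -2.0, -2.0), (0.0, 0.0, 0.0, 0.0)]]
```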

Newer GPUs have more kinds of memory instructions, but they are almost as awkward and slow.
Basically, you avoid touching memory as much as possible.

>It would likely help if I had a detailed explanation of what shaders do, (data-structures)

Shaders can do many different things, so "what do shaders do?" has no single answer.
As for data structures, it's basically just whatever floating-point numbers you can stuff into a tiny register file.

>to understand how GPUs are set up. (Link, search-terms(render-pipeline, shader, 3d-engine, texturing) ?)