Zero copy matters

Article: AMD Fusion Architecture and Llano
By: Groo, August 25, 2011 2:11 pm
Room: Moderated Discussions
David Kanter on 8/25/11 wrote:
>Gionatan Danti on 8/24/11 wrote:
>>Hi David, thank you for this great article.
>>I would like to ask you a question about zero copy: from my understanding, a discrete
>>graphics card can use textures directly from system memory, without the need
>>to copy them to local memory. This is achieved using the GART mapping table found in the AGP and PCI-E specifications.
>I am not familiar with this, but it sounds reasonable.
>>Sure, this kind of texture access is way slower than accessing local memory, but
>>(to me) it closely resembles the zero-copy concept.
>As you described it, yes, this is very similar to the zero copy that Intel and AMD are implementing.
>>So, when AMD presents zero copy as one of Llano's innovations, what exactly does it mean? Does it
>>refer to a zero-copy capability between CPU (system) memory and GPU-reserved memory?
>>Does it apply to both graphics mode and compute mode?
>There is a fairly important distinction from a physical perspective (i.e. what
>the electrons and bits are doing), but they seem to be very similar from a logical (i.e. programmer's) perspective.
>The right way to understand the differences is to look at the actual data flow
>for a read operation. Here I use --> to indicate on-die data flow and ==> to indicate off-die data flow.
>Zero copy discrete:
>GPU read-->PCI-E ==> CPU memory controller ==> DRAM
>The important part here is that you are using PCI-E as an external interface.
>The data must flow from the CPU die to the GPU over PCI-E, which costs latency, power and bandwidth.
>Llano zero copy:
>GPU read-->CPU/GPU memory controller==>DRAM
>Sandy Bridge zero copy:
>GPU read-->L3 cache OR
>GPU read-->CPU memory controller==>DRAM
>If you look at Llano, their memory controller can theoretically read 30GB/s (2
>* 8B * 1.866GT/s). The fastest PCI-E interface is theoretically 8GB/s in a single
>direction (2B * 4GT/s). Intel's L3 cache bandwidth to the GPU is ~100GB/s (32B * ~3GT/s).
>So the key to zero copy for AMD or Intel GPUs is that they eliminate an off-die data transfer and achieve:
>1. Vastly higher bandwidth (3-4X for memory, 12X for cache)
>2. Lower latency
>3. Lower power consumption
>So they are similar, but the difference is that using zero copy on a discrete GPU
>tends to lower performance by reducing bandwidth. Zero copy on an integrated GPU
>raises performance, power efficiency, etc.
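For reference, the peak figures quoted above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the helper name peak_gb_per_s is made up for illustration, the widths and transfer rates are the ones given in the quoted post, and the Sandy Bridge rate is taken as exactly 3GT/s although the post only says "~3GT/s".

```python
# Peak theoretical bandwidth = channels * bytes per transfer * GT/s.
# All inputs come from the quoted post; peak_gb_per_s is a hypothetical helper.

def peak_gb_per_s(bytes_per_transfer, gigatransfers_per_s, channels=1):
    """Return peak bandwidth in GB/s for the given interface width and rate."""
    return channels * bytes_per_transfer * gigatransfers_per_s

llano_dram = peak_gb_per_s(8, 1.866, channels=2)  # 2 * 8B * 1.866GT/s
pcie_one_way = peak_gb_per_s(2, 4.0)              # 2B * 4GT/s, one direction
snb_l3 = peak_gb_per_s(32, 3.0)                   # 32B * ~3GT/s (assumed 3.0)

print(f"Llano DRAM:      {llano_dram:.1f} GB/s")
print(f"PCI-E, one way:  {pcie_one_way:.1f} GB/s")
print(f"Sandy Bridge L3: {snb_l3:.1f} GB/s")
print(f"DRAM vs PCI-E:   {llano_dram / pcie_one_way:.1f}X")
print(f"L3 vs PCI-E:     {snb_l3 / pcie_one_way:.1f}X")
```

These are theoretical peaks only; sustained bandwidth on real hardware will be lower in every case.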
I am pretty sure that zero copy doesn't copy anything, just tweaks the pointer to memory with the MMU. If so, it seems way more efficient than Intel's scheme. That said, I might be thinking of something that hasn't been released yet. :)
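
As a loose software analogy only (this is not how a GPU or its MMU actually works, and the buffer contents are made up), the difference between copying a buffer and just handing over a reference to it looks like this in Python:

```python
# Analogy only: copy vs. sharing the same underlying memory.
buf = bytearray(b"texture data")   # stand-in for a buffer in system memory

copied = bytes(buf)                # "copy" path: a second, independent buffer
view = memoryview(buf)             # "zero copy" path: same memory, no data moved

buf[0:7] = b"TEXTURE"              # the producer updates the original buffer

print(copied)                      # the copy still holds the old contents
print(view.tobytes())              # the view sees the update immediately
```

The copy costs a full pass over the data and goes stale; the view costs nothing up front and always reflects the current contents, which is the property I have in mind when I say the pointer just gets tweaked.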

