By: Patrick Chase (patrickjchase.delete@this.gmail.com), July 2, 2013 10:01 am
Room: Moderated Discussions
Etienne (etienne_lorrain.delete@this.yahoo.fr) on July 2, 2013 4:36 am wrote:
> Isn't the GPGPU a lot quicker mainly because it does not have to do what the CPU does,
> i.e. manage virtual memory and memory protection for every bytes (all the TLB work and
> delays),
Modern GPUs have PMMUs and virtual memory. That's how they implement a unified virtual address space (across all CPUs and GPUs in the system) and do zero-copy I/O to pinned host memory.
> manage cache lines shared in between CPUs
> (copying written cache lines to other caches),
True. GPUs implement a limited form of release consistency, which is weaker and less HW-intensive than, say, the x86 memory ordering model.
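To make the distinction concrete, here is a rough CPU-side sketch using C++11 atomics. It is only an analogy and not GPU code: under a release-consistency-style model, ordering is guaranteed only at explicit release/acquire points, whereas x86 provides comparable store ordering on every access whether the program needs it or not. The function name `run_message_passing` and the variables are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: message passing with release/acquire ordering.
// Ordering is only requested at the two marked synchronization points.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;  // ordinary write, no ordering requested here
        // Release: all writes above become visible to a thread that
        // later performs an acquire load that reads this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: spin until the flag is published.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // ordered after the acquire, so it sees 42
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

Calling `run_message_passing()` returns the value the consumer saw. Hardware that only has to honor the two marked points has much less ordering and coherence work to do than hardware that must keep every store ordered.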
> manage security by erasing newly allocated pages to processes,
This isn't a HW function and can be done just as easily on a GPU as on a CPU (simply launch a 'bzero kernel' after every allocation...)
> manage all the crappy hardware around (active waits because some version of that
> chip do not allow two consecutive writes within N microseconds...), manage different
> version of libraries (page loaded on demand, position independent code, dynamic
> linking of files which can be in 10 different places in the filesystem)?
Do you really think these have a significant cost for the kind of compute-oriented workloads that would be targeted at a GPU?