By: Eric (eric.kjellen.delete@this.gmail.com), August 7, 2012 6:28 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 4, 2012 11:22 am wrote:
> Or perhaps the real issue is that
> we need 3 different types of 'memory', one optimized for latency, one for
> bandwidth and one for capacity.
Can't that problem be solved at the system-architecture and software levels with a combination of two things? The first is the traditional cache hierarchy from L1 down to main memory, which offers excellent bandwidth and latency for small or cached data sets, and high capacity in external DIMMs for large data sets. The second is stacked, on-package memory, which offers very high bandwidth and a fixed, moderate latency, but limited capacity, for streaming SIMD data. Both can be built from standard SRAM, eDRAM, DRAM DIMMs, GDDR, etc., i.e. without any additional work at the level of the physical memory chips. HMC (Hybrid Memory Cube) would of course help a lot, but I consider it an evolutionary development of standard DRAM, not a different optimization.
As far as I can tell, there are really only two types of workloads:
1. Latency-sensitive applications with irregular data access patterns.
2. Bandwidth-sensitive applications with highly regular data access patterns.
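To make the two workload types concrete, here is a toy Python sketch (all function names are my own, hypothetical, not from any library). Type 1 is modeled as pointer chasing through a random permutation, where each load address depends on the previous load's result. Type 2 is modeled as a sequential sweep, where every address is known in advance. In CPython the interpreter overhead swamps any memory effects, so this only illustrates the dependency structure of the two patterns, not real timings.

```python
import random


def make_chain(n, seed=0):
    """Build a random single-cycle permutation: next_index[i] is the successor of i."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    next_index = [0] * n
    # Link each element to the next one in shuffled order, wrapping around,
    # so the whole array forms one cycle visiting every index exactly once.
    for a, b in zip(order, order[1:] + order[:1]):
        next_index[a] = b
    return next_index


def pointer_chase(next_index, steps, start=0):
    # Type 1 (latency-bound): the next address is only known after the
    # current load completes, so accesses are irregular and serialized.
    i = start
    for _ in range(steps):
        i = next_index[i]
    return i


def stream_sum(data):
    # Type 2 (bandwidth-bound): addresses are sequential and predictable,
    # so hardware could prefetch and keep many loads in flight at once.
    total = 0
    for x in data:
        total += x
    return total
```

In a compiled language the same two loops, run over an array much larger than the caches, are the classic way to expose the latency vs. bandwidth difference.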
Any program that includes both types of code could use both latency-optimized addresses (backed by the CPU caches and external DIMMs) and bandwidth-optimized addresses (backed by stacked memory), with the virtual address space partitioned into ranges that map to the respective physical memory types/hierarchies. Am I missing something big, or is it really this simple?
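A minimal Python model of that partitioning, under assumptions of my own: the region boundaries (a 64 GiB DIMM range followed by a 2 GiB stacked-memory range), the tier labels, and the `"irregular"`/`"streaming"` allocation hints are all hypothetical. `AddressMap` resolves an address to the physical tier behind it, and `TieredAllocator` is a simple bump allocator that places each allocation in the range matching its access-pattern hint. CPU caches are not modeled, since they sit transparently in front of the DIMM range rather than owning addresses of their own.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int  # first address in the range (inclusive)
    end: int    # one past the last address (exclusive)
    tier: str   # hypothetical label for the physical memory behind it


class AddressMap:
    """Maps an address to the memory tier whose range contains it."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.start)
        for a, b in zip(self.regions, self.regions[1:]):
            if a.end > b.start:
                raise ValueError("overlapping regions")
        self.starts = [r.start for r in self.regions]

    def tier_of(self, addr):
        # Find the last region starting at or below addr, then bounds-check it.
        i = bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.regions[i].end:
            return self.regions[i].tier
        raise KeyError(hex(addr))


class TieredAllocator:
    """Bump allocator that serves each request from the region matching its hint."""

    def __init__(self, regions_by_pattern):
        self.regions = regions_by_pattern
        self.next_free = {k: r.start for k, r in regions_by_pattern.items()}

    def alloc(self, size, pattern):
        region = self.regions[pattern]
        addr = self.next_free[pattern]
        if addr + size > region.end:
            raise MemoryError(f"{region.tier} range exhausted")
        self.next_free[pattern] = addr + size
        return addr


# Hypothetical layout: 64 GiB of DIMM-backed addresses, then 2 GiB of stacked memory.
DIMM = Region(0x0, 0x10_0000_0000, "DIMM")
STACKED = Region(0x10_0000_0000, 0x10_8000_0000, "stacked")
```

For example, `TieredAllocator({"irregular": DIMM, "streaming": STACKED}).alloc(1 << 20, "streaming")` returns an address that `AddressMap([DIMM, STACKED]).tier_of` reports as `"stacked"`. A real system would do this at page granularity in the OS (NUMA-style placement policies) rather than in a user-level bump allocator.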