By: none (none.delete@this.none.com), August 10, 2012 6:06 am
Room: Moderated Discussions
Eric (eric.kjellen.delete@this.gmail.com) on August 9, 2012 5:32 pm wrote:
[...]
> But doesn't
> this just mean that reducing memory latency has diminishing returns if memory
> bandwidth isn't also increased, which tends to be the case with advancing memory
> technology (and CPU caches, internal buses etc.) anyway? That is to say, is it
> really a problem that forces us to re-think memory technology or just a matter
> of naturally balancing latency and bandwidth improvements to existing memory
> subsystem models?
IMHO getting higher bandwidth is often easier than reducing latency: just widen the bus. OK, it's not quite that easy, as there are tradeoffs: going too wide might hurt latency, you'll need more buffers, place and route will be harder, etc.
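A rough back-of-the-envelope model of why widening is the "easy" knob: theoretical peak DRAM bandwidth is just channels × bytes per transfer × transfer rate. The sketch below assumes the i7-920 runs three 64-bit DDR3-1066 channels (its officially supported speed). The function name is made up for illustration, and the model deliberately ignores the latency, buffering and routing costs mentioned above.

```python
def peak_bandwidth(channels: int, bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak DRAM bandwidth in bytes/s (hypothetical helper).

    peak = channels * bytes per transfer * transfers per second
    """
    return channels * (bus_width_bits // 8) * transfers_per_sec

# Assumption: three 64-bit DDR3-1066 channels (~1066e6 transfers/s each).
i7_920 = peak_bandwidth(3, 64, 1066e6)
print(f"{i7_920 / 1e9:.1f} GB/s")  # 25.6 GB/s

# Doubling the bus width doubles the theoretical peak; this simple model
# says nothing about the latency or place-and-route costs of doing so.
wider = peak_bandwidth(3, 128, 1066e6)
print(f"{wider / 1e9:.1f} GB/s")  # 51.2 GB/s
```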
IIRC, I measured a single core of my i7-920 fetching data from memory at about 16 GB/s (using more cores increased that number, so it looks like a single core can't saturate the memory controller and its 3 memory channels). Assuming the core runs at about 3 GHz, that's roughly 5 bytes/cycle, which I think is enough for many tasks. That's why, even though synthetic benchmarks show big gains from faster memory (in both latency and bandwidth), the impact on application performance is fairly modest. (See this for example: http://www.bit-tech.net/hardware/memory/2011/01/11/the-best-memory-for-sandy-bridge/8.)
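The bytes/cycle figure is simple division, and Little's law (data in flight = bandwidth × latency) gives one plausible way to think about why a single core tops out: to sustain a bandwidth B at latency L, the core must keep B × L bytes of misses outstanding, and a core can only track so many at once. The ~60 ns latency below is an illustrative assumption, not a measured value, and the helper names are hypothetical.

```python
def bytes_per_cycle(bandwidth_bytes_per_s: float, core_hz: float) -> float:
    """Average bytes delivered per core clock cycle."""
    return bandwidth_bytes_per_s / core_hz

def lines_in_flight(bandwidth_bytes_per_s: float, latency_s: float,
                    line_bytes: int = 64) -> float:
    """Little's law: bytes outstanding = bandwidth * latency, in cache lines."""
    return bandwidth_bytes_per_s * latency_s / line_bytes

measured = 16e9  # ~16 GB/s single-core figure quoted above
print(f"{bytes_per_cycle(measured, 3e9):.2f} bytes/cycle")  # 5.33 bytes/cycle

# Assumption: ~60 ns memory latency (illustrative only).
print(f"{lines_in_flight(measured, 60e-9):.1f} lines in flight")  # 15.0 lines in flight
```

Under these assumptions a single core would need about 15 cache-line misses outstanding at all times just to hold 16 GB/s; more cores add more outstanding misses, which is consistent with the aggregate number going up.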
Note that this perhaps only demonstrates how efficient Intel's memory subsystem (from L1 down to the memory controller) is :)
Of course, some workloads need huge bandwidth. Also, running threads with higher-than-usual cache miss rates on multiple cores might hit a bandwidth wall.
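A toy model of that bandwidth wall: each thread's demand is roughly its miss rate × line size, and once the summed demand exceeds the available peak, every thread only gets a fraction of what it asks for. The per-thread miss rate (125 M misses/s) and the 25.6 GB/s peak are hypothetical inputs chosen for illustration, and real memory systems degrade well before reaching theoretical peak.

```python
def demand_from_misses(misses_per_sec: float, line_bytes: int = 64) -> float:
    """Bandwidth a thread requests, in bytes/s, from its cache miss rate."""
    return misses_per_sec * line_bytes

def sustained_fraction(per_thread_demand: list[float], peak: float) -> float:
    """Fraction of requested bandwidth the memory system can deliver (1.0 = no wall)."""
    demand = sum(per_thread_demand)
    return min(1.0, peak / demand) if demand > 0 else 1.0

peak = 25.6e9                            # hypothetical peak, bytes/s
per_thread = demand_from_misses(125e6)   # hypothetical 125 M misses/s -> 8 GB/s

print(f"{sustained_fraction([per_thread] * 2, peak):.2f}")  # 1.00 (16 GB/s fits)
print(f"{sustained_fraction([per_thread] * 4, peak):.2f}")  # 0.80 (32 GB/s wanted)
```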
I guess this doesn't directly answer your question; I just wanted to point out that memory bandwidth already looks very good on recent Intel CPUs, so a completely new memory technology wouldn't necessarily make our typical workloads much faster.
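One way to see why faster memory doesn't automatically mean faster applications is Amdahl's law applied to memory stalls: only the fraction of run time spent waiting on memory gets faster. The stall fractions below (10% and 60%) are hypothetical examples, not measurements.

```python
def app_speedup(memory_stall_fraction: float, memory_speedup: float) -> float:
    """Amdahl's law: only the memory-stalled fraction of run time is accelerated."""
    f = memory_stall_fraction
    return 1.0 / ((1.0 - f) + f / memory_speedup)

# Hypothetical: memory becomes 2x faster.
print(f"{app_speedup(0.1, 2.0):.2f}x")  # 1.05x for a mostly cache-resident workload
print(f"{app_speedup(0.6, 2.0):.2f}x")  # 1.43x for a memory-bound workload
```

A synthetic memory benchmark is essentially the f close to 1 case, which is why it shows large gains while typical applications barely move.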
Here is an article dealing with latency and bandwidth impact on some scientific applications: www.sandia.gov/~rcmurph/doc/latency.pdf