Article: AMD's Mobile Strategy
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), January 6, 2012 7:46 pm
Room: Moderated Discussions
Ricardo B (ricardo.b@xxxxx.xx) on 1/6/12 wrote:
---------------------------
>Paul A. Clayton (paaronclayton@gmail.com) on 1/6/12 wrote:
>---------------------------
>>Why would tag comparison have significantly greater
>>latency than a random access to any point on the chip?
[snip]
>I've never designed a cache so I don't really know,
Well, I've never designed a circuit either, so I'm just
guessing from a few things I've read.
>but I'd say that it's because the caches are N-way
>associative, which means the tag lookups are a tad more
>complicated than a simple comparison. E.g., Sandy Bridge's
>L3 is 12-way associative.
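To sketch what that extra work looks like, here's a toy model of an N-way set-associative lookup (illustrative parameters only, not any real design -- the set/way counts and field widths are made up):

```python
# Toy sketch of an N-way set-associative tag lookup. All parameters
# are hypothetical; the point is that N tags must be compared per
# access, and the matching way then selects the data.
NUM_SETS = 4096      # hypothetical set count
NUM_WAYS = 12        # e.g., Sandy Bridge's L3 is 12-way
LINE_BITS = 6        # 64-byte lines
INDEX_BITS = NUM_SETS.bit_length() - 1  # 12 index bits

def lookup(cache, addr):
    """cache[set_index] is a list of NUM_WAYS (tag, data) entries."""
    set_index = (addr >> LINE_BITS) % NUM_SETS
    tag = addr >> (LINE_BITS + INDEX_BITS)
    # In hardware these N comparisons run in parallel, followed by an
    # N-way mux steering the hit way's data out -- that compare-plus-mux
    # chain is part of the tag-path latency under discussion.
    for way, (stored_tag, data) in enumerate(cache[set_index]):
        if stored_tag == tag:
            return ("hit", way, data)
    return ("miss", None, None)
```

The software loop is sequential, of course; the hardware does all twelve compares at once, but still pays for the wide compare and the way-select mux.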
>
>Buffering and muxing all the signals around should also
>weigh in heavily.
>
>In the ASIC I'm working on, I got delays as high as ~9 ns
>as the tool tried to fan out a signal into a bunch of
>places spread across ~7 mm, just from all the buffering.
Interesting. Thanks for sharing.
>>I vaguely recall reading that just going off chip can be
>>absurdly expensive in terms of latency (and power).
>
>It tends to be. But it also depends on your context and
>basis of comparison.
>
>An I/O interface with 0.5 ns (2 GHz clock period) scale
>latencies is... very very hard to do, if not impossible.
>
>But an I/O interface with 10 ns (100 MHz clock period)
>scale latencies is doable.
>
>As I mentioned, you can buy 36 Mbit SSRAM chips with < 5
>ns latency (2.5 clocks at 550 MHz). That is, the driving
>chip (e.g., an FPGA) sets the read request on its
>address/control pins and, 2.5 clocks later (< 5 ns), it
>can latch the data on its input pins. Of course, these
>chips use very simple parallel interfaces which require
>very large numbers of pins and traces -- even more than
>DRAM interfaces.
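For what it's worth, the arithmetic on that figure checks out (numbers are the ones quoted above):

```python
# Back-of-the-envelope check of the SSRAM latency: 2.5 clocks at
# 550 MHz, as quoted above.
clock_hz = 550e6
period_ns = 1e9 / clock_hz       # ~1.82 ns per clock
latency_ns = 2.5 * period_ns     # ~4.55 ns, i.e. under the 5 ns figure
```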
Well, DRAM may drop its shared (bidirectional) I/O
eventually, maybe.
>As you move to narrower and narrower interfaces, latencies
>tend to suffer.
Hmm. Interesting.
Thanks again for sharing.