By: David Kanter (dkanter.delete@this.realworldtech.com), September 23, 2007 9:57 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 9/23/07 wrote:
>Nah. Algorithmic delay of 8b/10b decoding is equal to 10T regardless of the size
>of packet. At CSI data rates 10T=1.5ns=lost in noise.
No, it's not. At 6.4GT/s that means you have the effective latency of a 0.64GT/s interface, which is just lousy. Moreover, that latency is additive over every hop. It's not an issue for a large message like a cache line (64-128B), but it's rather problematic for much smaller messages. For example, a NACK is probably a single flit and one of the more common messages. Why would you want to inflict extra latency on the receive and send side, after all the coherency optimizations to avoid it?
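To put numbers on the disagreement, here is a rough back-of-the-envelope sketch in Python. It assumes the decoder has to wait for a full 10-bit symbol (the 10T figure quoted above) and that this wait is paid once per hop; the function names and the hop counts are made up for illustration, not taken from any CSI documentation.

```python
def decode_delay_ns(rate_gt_s, symbol_bits=10):
    """Time to receive one full symbol on a lane, in ns.

    rate_gt_s is the per-lane transfer rate in GT/s, so one
    unit interval (T) lasts 1 / rate_gt_s nanoseconds.
    """
    return symbol_bits / rate_gt_s


def path_delay_ns(rate_gt_s, hops, symbol_bits=10):
    """Accumulated symbol-wait delay over a path, assuming
    (hypothetically) one full wait per hop."""
    return hops * decode_delay_ns(rate_gt_s, symbol_bits)


if __name__ == "__main__":
    rate = 6.4  # GT/s, the rate discussed in this thread
    for hops in (1, 2, 3):  # illustrative hop counts
        print(f"{hops} hop(s): {path_delay_ns(rate, hops):.4f} ns")
```

At 6.4GT/s a single 10T wait works out to about 1.56ns, and it scales linearly with the number of hops under these assumptions. If the encode side adds its own wait, the per-hop figure grows accordingly.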
>For example, algorithmic
>delay of interlane deskewing falls in the same range but nobody sees it as a problem.
>Of course, there is implementation delay apart from algorithmic delay but the former
>tends to improve with design generations.
The deskewing has a relatively low probability of delaying a given flit. 8b/10b has a high probability (p=1) of delaying every single flit.
>According to my understanding the real reason for not going with PCIe-on-steroids phy is power rather than delay.
>Current 2.5 GT/s PCIe implementations consume 12-15 mW*s/Gbit. If you try to use
>the same technology in 6 GT/s range it would cost over 20 mW*s/Gbit. Narrow source-synchronous
>parallel link with deskewing consumes significantly less power per bit esp. if it doesn't use de-emphasis.
>So Intel decided to play it safe.
That's probably another reason.
>Was it a wise decision? Up until a few months ago I'd say yes. But recently Rambus
>announced a breakthrough in serdes power efficiency - order of 1 mW*s/Gbit at data
>rates approaching 5 GT/s. So unless Rambus developers are missing something important
>Intel's decision to go parallel looks not so wise in the end.
Intel also had an interesting announcement about low power that came out of Intel's circuits group:
A Scalable 5-15Gbps, 14-75mW Low Power I/O Transceiver in 65nm CMOS
It dissipates ~2-5 mW/Gb/s, but obviously goes much faster.
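For comparison with the mW*s/Gbit figures quoted above, a quick sketch of the arithmetic. It assumes the endpoints in the paper title pair up (14mW at 5Gbps, 75mW at 15Gbps), which is my reading of the title rather than something I have checked against the paper itself.

```python
def mw_per_gbps(power_mw, rate_gbps):
    """Energy efficiency in mW per Gb/s (same unit as mW*s/Gbit)."""
    return power_mw / rate_gbps


if __name__ == "__main__":
    # Assumed pairing of the ranges in the title: (power in mW, rate in Gbps)
    for power_mw, rate_gbps in ((14, 5), (75, 15)):
        eff = mw_per_gbps(power_mw, rate_gbps)
        print(f"{power_mw} mW at {rate_gbps} Gbps: {eff:.1f} mW/Gb/s")
```

Under that pairing the low end comes out nearer 3 than 2 mW/Gb/s, still well under the 12-15 quoted for current PCIe.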
DK