By: Michael S (already5chosen.delete@this.yahoo.com), September 23, 2007 11:20 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 9/23/07 wrote:
---------------------------
>Michael S (already5chosen@yahoo.com) on 9/23/07 wrote:
>
>>Nah. Algorithmic delay of 8b/10b decoding is equal to 10T regardless of the size
>>of the packet. At CSI data rates 10T=1.5ns=lost in noise.
>
>No, it's not. At 6.4GT/s that means you have the effective latency of a 0.64GT/s
>interface, which is just lousy.
Why do you say that it is lousy?
The algorithmic part of the delay imposed by 8b/10b is indeed that short. The implementation part of the delay shouldn't be very long either - close to zero on the xmt side and, assuming implementation in CPU-class silicon, about 1-2 ns on the rcv side. Compared to the algorithmic+implementation delay of a CRC-protected packet it is lost in the noise even for the short (80-bit?) NACK packets.
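For concreteness, a back-of-envelope calculation (a Python sketch; the 6.4 GT/s rate and the hop count are figures floated in this thread, not confirmed CSI parameters):

    # 8b/10b algorithmic delay at the line rate discussed above.
    rate_gt = 6.4                    # GT/s per lane (assumed, per the thread)
    ui_ns = 1.0 / rate_gt            # one unit interval, ~0.156 ns
    delay_8b10b_ns = 10 * ui_ns      # 10 UI per code group, ~1.56 ns
    hops = 3                         # hypothetical multi-hop path
    print(delay_8b10b_ns)            # ~1.56 ns per encode/decode point
    print(hops * delay_8b10b_ns)     # ~4.7 ns accumulated over 3 hops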
Besides, non-systematic 8b/10b encoding is needed only for AC coupling, not for clock-data recovery itself. When AC coupling is not desirable, CDR could happily live with a systematic 8b/10b or even 8b/9b encoding scheme that has zero delay; a toy sketch of such a scheme follows.
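To illustrate what "systematic" buys here, a minimal Python sketch (this 8b/9b scheme is purely hypothetical, not any standardized line code; it only shows that the data bits pass through undelayed while the appended bit still guarantees a transition for the CDR):

    # Toy systematic 8b/9b encoder: the 8 data bits are transmitted as-is and a
    # 9th bit is appended as the complement of the last data bit, guaranteeing
    # at least one transition per 9-bit group for clock-data recovery.
    def encode_8b9b(byte):
        bits = [(byte >> i) & 1 for i in range(8)]   # data bits, unmodified
        bits.append(1 - bits[-1])                    # forced transition bit
        return bits

    def decode_8b9b(bits):
        # The decoder simply drops the appended bit - zero algorithmic delay
        # for the data bits themselves.
        return sum(b << i for i, b in enumerate(bits[:8]))

    assert decode_8b9b(encode_8b9b(0xA5)) == 0xA5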
> Moreover, that latency is additive over every hop.
>It's not an issue for a large message like a cache line (64-128B), but it's rather
>problematic for much smaller messages. For example, a NACK is probably a single
>flit and one of the more common messages. Why would you want to inflict extra latency
>on the receive and send side, after all the coherency optimizations to avoid it?
>
>>For example, algorithmic
>>delay of interlane deskewing falls in the same range but nobody sees it as a problem.
>>Of course, there is implementation delay apart from algorithmic delay but the former
>>tends to improve with design generations.
>
>The deskewing has a relatively low probability of delaying a given flit. 8B/10B
>has a high probability (p=1) of delaying every single flit.
>
>>According to my understanding the real reason for not going with a PCIe-on-steroids phy is power rather than delay.
>>Current 2.5 GT/s PCIe implementations consume 12-15 mW*s/Gbit. If you try to use
>>the same technology in the 6 GT/s range it would cost over 20 mW*s/Gbit. A narrow source-synchronous
>>parallel link with deskewing consumes significantly less power per bit, esp. if it doesn't use de-emphasis.
>>So Intel decided to play it safe.
>
>That's probably another reason.
>
>>Was it a wise decision? Up until a few months ago I'd say yes. But recently Rambus
>>announced a breakthrough in serdes power efficiency - on the order of 1 mW*s/Gbit at data
>>rates approaching 5 GT/s. So unless Rambus developers are missing something important,
>>Intel's decision to go parallel looks not so wise in the end.
>
>Intel also had an interesting announcement about low power that came out of Intel's circuits group:
>
>A Scalable 5-15Gbps, 14-75mW Low Power I/O Transceiver in 65nm CMOS
>
>It dissipates ~2-5 mW/Gb/s, but obviously goes much faster.
>
>DK
Are you sure that Intel listed maximum power numbers? AFAIR, they were looking for ways to reduce average power consumption that had little effect on maximum power.
Or maybe I am mixing up different Intel announcements.
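Just to line the numbers up: the efficiency figures quoted in this thread, normalized to the same unit (1 mW per Gbit/s is 1 pJ/bit); the arithmetic below only restates figures already mentioned above:

    # Normalizing the power-efficiency numbers quoted in this thread to pJ/bit.
    intel_low  = 14.0 / 5.0    # 14 mW at 5 Gb/s  -> ~2.8 pJ/bit
    intel_high = 75.0 / 15.0   # 75 mW at 15 Gb/s ->  5.0 pJ/bit
    pcie_25gt  = (12, 15)      # 2.5 GT/s PCIe: 12-15 pJ/bit (figure from this thread)
    rambus     = 1             # Rambus claim: on the order of 1 pJ/bit
    print(intel_low, intel_high, pcie_25gt, rambus)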