By: Jonathan Kang (johnbk.delete@this.gmail.com), September 25, 2007 7:05 am
Room: Moderated Discussions
8B/10B Latency (dwhess@banishedsouls.org) on 9/22/07 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 9/22/07 wrote:
>---------------------------
>>Jigal (jigal2@gmail.com) on 9/22/07 wrote:
>>---------------------------
>>>Hi there,
>>>
>>>Being a newbie I throw myself at the mercy of the forum.
>>
>>Welcome - I think you'll find we're a pretty merciful bunch.
>
>Burn him! jk :)
>
>>>Small question - how come they didn't leverage the PCI Express and needed a new
>>>bus (excuse me, p2p interconnect) altogether?
>>
>>PCI Express isn't coherent, it's also fairly high latency since it uses 8B/10B clock encoding.
>
>How much latency does 8B/10B encoding contribute? If I am reading this correctly,
>Lattice has a programmable logic implementation optimized for throughput with only
>2 clocks of latency on the encoder and 3 clocks on the decoder when working exclusively with serial bit streams:
>
>http://www.latticesemi.com/dynamic/view_document.cfm?document_id=5653
>
>CSI would require a factor of 60 speed up of course. I admittedly have never had
>to deal with the logic design for high speed 8B/10B. Could the shift from a PLD
>design to a full custom one yield that large a difference?
>
>I suspect there is not a good reason to use 8B/10B encoding and sacrifice 20% of
>your throughput where a clock can be made available unless you want to support AC
>coupling like HyperTransport 3 where they do list lower throughput and higher latency
>as a disadvantage for this type of operation.
>
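To put numbers on the 20% figure: 8B/10B sends every 8-bit byte as a 10-bit symbol, so only 8 of every 10 line bits carry payload. Below is a minimal Python sketch of that arithmetic, plus a check of the two properties the extra bits buy for clock recovery (bounded disparity and short runs of identical bits), using the K28.5 comma symbol as a sample. The function names are illustrative only, not taken from any specification or library.

```python
from itertools import groupby


def payload_rate_gbps(line_rate_gbaud: float) -> float:
    """Payload bit rate of an 8B/10B link: 8 data bits per 10 line bits."""
    return line_rate_gbaud * 8 / 10


def disparity(symbol: str) -> int:
    """Ones minus zeros in a 10-bit symbol; valid 8B/10B symbols give 0 or +/-2."""
    return symbol.count("1") - symbol.count("0")


def max_run_length(bits: str) -> int:
    """Longest run of identical bits; 8B/10B keeps this at 5 or fewer."""
    return max(len(list(group)) for _, group in groupby(bits))


# PCIe 1.x signals at 2.5 GT/s per lane, leaving 2.0 Gb/s (250 MB/s) of payload.
rate = payload_rate_gbps(2.5)
print(rate, rate * 1000 / 8)  # Gb/s and MB/s

# K28.5 as sent with negative running disparity (the usual comma character).
k28_5 = "0011111010"
print(disparity(k28_5), max_run_length(k28_5))
```

This prints `2.0 250.0` and then `2 5`: the symbol is slightly DC-unbalanced (+2), which flips the running disparity, and its longest run is the maximum of five that the code allows, so the receiver's clock recovery always sees a transition within a few bit times.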
It's not the encode/decode latency; that can easily be done in one clock (assuming no pipelining) with fast enough gates. It's the clock recovery. A PLL that recovers a serial clock from the data stream takes hundreds of cycle times to lock. This effectively gives you two options:
1. Keep the link active at all times so that the receive PLL never loses sync. This burns power, but it meets the low-latency requirement.
2. Turn the link off, and when it turns back on, spend hundreds of cycle times recovering the clock from the data stream.
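The two options trade idle power for first-packet latency. Here is a rough Python model of the latency side only, assuming a hypothetical PLL lock time expressed in unit intervals (UI, one bit time). The 5 Gb/s line rate, 80-bit packet and 500 UI lock time are illustrative placeholders for "hundreds of cycle times", not figures for PCI Express or CSI.

```python
def ui_to_ns(ui: float, line_rate_gbps: float) -> float:
    """Convert unit intervals (bit times) to nanoseconds at a given line rate."""
    return ui / line_rate_gbps  # at 1 Gb/s, one bit time is 1 ns


def first_packet_latency_ns(packet_bits: int, line_rate_gbps: float,
                            link_awake: bool, lock_ui: int) -> float:
    """Serialization time for one packet, plus PLL re-lock time if the link was off."""
    latency_ui = packet_bits
    if not link_awake:
        # Option 2: the receiver must re-acquire the clock from the data stream first.
        latency_ui += lock_ui
    return ui_to_ns(latency_ui, line_rate_gbps)


# Hypothetical numbers: 5 Gb/s lane, 80-bit packet (8 bytes after 8B/10B), 500 UI lock.
always_on = first_packet_latency_ns(80, 5.0, link_awake=True, lock_ui=500)
wake_up = first_packet_latency_ns(80, 5.0, link_awake=False, lock_ui=500)
print(f"option 1: {always_on:.0f} ns, option 2: {wake_up:.0f} ns")
```

With these made-up numbers option 1 delivers the packet in 16 ns and option 2 in 116 ns; the lock time dominates. The model deliberately leaves out option 1's cost, which is the power burned keeping the link toggling while idle.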
Then there's the issue of synchronizing to the receiver's local clock, which generally takes a few clock cycles, but I suppose you'd have to do that with a scheme like CSI as well.
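The "few clock cycles" for synchronization commonly come from passing the incoming data through a short chain of flip-flops clocked by the receiver's local clock. Below is a behavioral Python sketch of a generic N-stage flop synchronizer. It assumes the input is already a clean level when it arrives, ignores metastability itself, and only shows the latency each stage adds; the class name and default depth are hypothetical.

```python
class FlopSynchronizer:
    """Behavioral model of an N-stage flip-flop synchronizer in the receiver's clock domain."""

    def __init__(self, stages: int = 2):
        self.regs = [0] * stages  # one entry per flip-flop, first stage at index 0

    def clock(self, async_in: int) -> int:
        """One rising edge of the local clock: shift the input down the flop chain."""
        self.regs = [async_in] + self.regs[:-1]
        return self.regs[-1]  # output of the last stage


sync = FlopSynchronizer(stages=2)
# The asynchronous input goes high before the first edge and stays high.
outputs = [sync.clock(1) for _ in range(4)]
print(outputs)
```

This prints `[0, 1, 1, 1]`: with two stages the new value only reaches the output on the second local clock edge, and each extra stage (added for a lower chance of metastability escaping) costs one more cycle.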