By: Jonathan Kang (johnbk.delete@this.gmail.com), September 25, 2007 8:15 am
Room: Moderated Discussions
David W. Hess (dwhess@banishedsouls.org) on 9/22/07 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 9/22/07 wrote:
>---------------------------
>>>>>Small question - how come they didn't leverage the PCI Express and needed a new
>>>>>bus (excuse me, p2p interconnect) altogether?
>>>>
>>>>PCI Express isn't coherent, it's also fairly high latency since it uses 8B/10B clock encoding.
>>>
>>>How much latency does 8B/10B encoding contribute? If I am reading this correctly,
>>>Lattice has a programmable logic implementation optimized for throughput with only
>>>2 clocks of latency on the encoder and 3 clocks on the decoder when working exclusively with serial bit streams:
>>>
>>>http://www.latticesemi.com/dynamic/view_document.cfm?document_id=5653
>>
>>So I think the issue is that you'd need at least 10 data transfers to occur before
>>you can get usable data extracted from the symbols.
>
>That is what I thought at first before reading about the Lattice design which is
>pipelined and operates at the serial level. As near as I can tell, the added latency
>including the encoding and decoding is 5 clocks and the symbols are decoded one
>bit at a time which is what caught my attention. Just a cursory study of the
>5B/6B and 3B/4B code tables convinces me that I must have misunderstood something.
>Now I am going to be up all night thinking about it.
>
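8b/10b works by encoding the lower 5 bits and the upper 3 bits separately. That's why there are two blocks (5B/6B and 3B/4B). The 5 bits give the first number in the code name (the 28 in K28.5, for example) and the 3 bits give the second. The 3B/4B block is encoded after the 5B/6B block, because each block has up to two complementary encodings and the one sent depends on the running disparity left by what came before. A finished 10-bit word is either neutral or carries a disparity of +2 or -2, which keeps the running disparity at +1 or -1 and the line DC balanced.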
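To make the split concrete, here is a minimal Python sketch. The function names are mine, not from any library, and the two K28.5 bit patterns are the ones from the standard 8b/10b code table (written in transmission order). It only shows how a byte is divided and how the disparity of a word is counted; it is not an encoder.

```python
def split_fields(byte):
    """Split an 8-bit value into the (x, y) of its Dx.y / Kx.y name.

    x is the lower 5 bits (fed to the 5B/6B block),
    y is the upper 3 bits (fed to the 3B/4B block).
    """
    x = byte & 0x1F
    y = (byte >> 5) & 0x07
    return x, y


def disparity(bits):
    """Number of ones minus number of zeros in a string of '0'/'1'."""
    return bits.count("1") - bits.count("0")


# K28.5 is the byte 0xBC sent as a control character.
K28_5_RD_MINUS = "0011111010"  # sent when running disparity is -1
K28_5_RD_PLUS = "1100000101"   # sent when running disparity is +1

if __name__ == "__main__":
    print(split_fields(0xBC))          # (28, 5)
    print(disparity(K28_5_RD_MINUS))   # 2
    print(disparity(K28_5_RD_PLUS))    # -2
```

The two encodings of K28.5 are bitwise complements, so whichever way the running disparity currently leans, the encoder can send the version that pulls it back.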
What you miss by looking only at the link layer of a serdes connection (where 8b/10b is done) is the physical-layer latency involved in any serdes, especially a high-speed one. When you recover a data word to achieve word alignment, you're assuming you have already recovered proper bit alignment (the PLL has locked and you have a recovered clock). That is true in most cases, since in a serdes link like PCI Express or Fibre Channel the link is always active: idle characters (K28.5, I believe) are sent across the link when it isn't transmitting data. On power-up, though, a PLL for a gigabit serial stream usually takes hundreds of cycle times to achieve synchronization.
That relock time obviously makes it unsuitable for a link where one wants to shut down power when the link isn't being used, since every wake-up pays the synchronization cost again.
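For a rough sense of scale, here is a back-of-the-envelope sketch in Python. The 2.5 Gb/s figure is the first-generation PCI Express lane rate; the 500 bit times for PLL lock is purely an assumed stand-in for "hundreds of cycle times", not a measured number, and the function names are hypothetical.

```python
def symbol_time_ns(line_rate_gbps, bits_per_symbol=10):
    """Time to receive one 8b/10b symbol, in nanoseconds."""
    return bits_per_symbol / line_rate_gbps


def lock_time_ns(bit_times, line_rate_gbps):
    """Time spent waiting for the PLL to lock, in nanoseconds."""
    return bit_times / line_rate_gbps


if __name__ == "__main__":
    rate = 2.5               # Gb/s, first-generation PCI Express lane
    assumed_lock_bits = 500  # ASSUMPTION: illustrative only

    per_symbol = symbol_time_ns(rate)
    relock = lock_time_ns(assumed_lock_bits, rate)
    print(per_symbol)           # 4.0
    print(relock)               # 200.0
    print(relock / per_symbol)  # 50.0
```

Under those assumptions a relock costs as much as receiving 50 symbols, which dwarfs the handful of clocks spent in the encoder and decoder themselves.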