By: Felid (Felid.delete@this.mailinator.com), November 16, 2012 1:05 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on November 16, 2012 12:23 pm wrote:
[…]
> Moreover, anything
> that is contained in the IQ will be contained in the uop cache. So I'm curious whether
> the IQ is solely for decoupling now, or whether it still works as a loop cache.
Placing the LSD logic twice (in both the IQ and the IDQ) would be a useless waste of area and power.
> Honestly, the relationship between those three structures (instruction queue,
> uop cache, decoded uop buffer) is relatively unclear. I'm guessing:
>
> 1. Small loops operate from the decoded uop buffer, without probing the uop cache or L1I
> 2. Medium instruction footprint code works from the uop cache
> 3. Larger footprint code works from the L1I
> 4. The IQ is largely for decoupling and removing bubbles from L1I fetches
> 5. The decoded uop buffer acts as a decoupling buffer that removes bubbles
> for decoded uops (whether from the uop cache or traditional decoding)
I agree, except for #5. Adding the IDQ may have bought some 0.1% of performance in Conroe and Nehalem, but it is almost useless once there is a uop (mop) cache with an 80% hit rate, as Intel claims. So the only reason to keep it there (and even to double it in Ivy Bridge for a single thread) is power saving: in loop-lock mode the core can switch off even the uop cache.
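Kanter's guesses 1-3, together with the loop-lock power argument, can be sketched as a toy model of where the front end gets its uops. This is only an illustration under loudly stated assumptions: the capacities (28 uops per thread for the IDQ, 56 for a single thread on Ivy Bridge, about 1.5K uops for the uop cache) are approximate Sandy Bridge/Ivy Bridge-era figures, and the real loop stream detector and uop cache have extra constraints (taken-branch limits, 32-byte window mapping) that are ignored here. The names `Source` and `delivery_source` are hypothetical.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical labels for the three uop delivery paths discussed above."""
    IDQ_LOOP = "decoded uop buffer (loop-lock, uop cache can be powered down)"
    UOP_CACHE = "uop cache"
    LEGACY_DECODE = "L1I fetch + IQ + legacy decoders"


# Approximate capacities in uops; assumptions for illustration only.
IDQ_PER_THREAD = 28        # Sandy Bridge-style IDQ share per thread
IDQ_SINGLE_THREAD_IVB = 56  # Ivy Bridge: one thread can use the whole IDQ
UOP_CACHE_CAPACITY = 1536  # roughly 1.5K uops


def delivery_source(footprint_uops: int, is_loop: bool,
                    single_thread_ivb: bool = False) -> Source:
    """Pick a delivery path from a code region's uop footprint (toy model)."""
    idq_capacity = IDQ_SINGLE_THREAD_IVB if single_thread_ivb else IDQ_PER_THREAD
    if is_loop and footprint_uops <= idq_capacity:
        # Guess 1: a small loop replays from the IDQ without probing
        # the uop cache or the L1I.
        return Source.IDQ_LOOP
    if footprint_uops <= UOP_CACHE_CAPACITY:
        # Guess 2: medium footprints are served from the uop cache.
        return Source.UOP_CACHE
    # Guess 3: anything larger falls back to L1I fetch and decode.
    return Source.LEGACY_DECODE


if __name__ == "__main__":
    for size, loop, ivb in [(20, True, False), (40, True, False),
                            (40, True, True), (5000, True, False)]:
        print(size, loop, ivb, delivery_source(size, loop, ivb).name)
```

In this model, doubling the single-thread IDQ on Ivy Bridge only moves loops of 29 to 56 uops from the uop cache path to the loop-lock path, which fits the argument that the benefit is mostly power rather than throughput.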
> That seems like the most logical arrangement, but I've never had
> a detailed discussion with Intel on this particular topic yet.
Hmm, and if I want to discuss this, whom should I contact? I'm curious because I've just written my own detailed article about Ivy Bridge (cores, SMEP, DRNG, 22 nm, everything), and a lot of questions remain unanswered. Could you share some of your contacts? :)