By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), May 30, 2013 8:05 am
Room: Moderated Discussions
Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 29, 2013 10:16 pm wrote:
> - Good to know that I have Haswell's scheduling system basically understood. However, I have one question;
> Haswell has four ports, yet somewhere in Haswell's article, it said that the scheduler can dispatch 8
> uops per clk in ideal conditions, whilst Sandy Bridge, with it's 6 ports, was only allowed 6 uops per
> clk in ideal conditions. So I understand most of what is being said, but what I don't understand is that
> 4 ports = 8 uops and 3 ports = 6 uops. It seems that 2 uops can be sent through one port per clk? Though
> I don't believe this is mentioned anywhere in the article. This confuses me; if you don't mind, could
> you please tell me why it seems that the ports have double the width they are made out to have?
Maybe it's not clear from David's diagrams, but it's explicitly written in the text: Haswell has 8 ports.
>
> - Your Silvermont link gives me a good idea more or less of what's going on. If I have a decent understanding
> of things, though some of the things you say confuse me a bit. What I understand from this diagram is that:
> The data prefetcher requests data, and it looks for this data in the L2 cache/L3 cache/RAM using the DTLB,
> where this data flows over to the store buffers and into the data cache for the execution units to use?
>
> A few things confuse me; and sorry for the mass of questions, I appreciate the help.
> - You say that only the path from L1 cache to the registers is shown, yet I
> see a path from L2 to registers. Am I wrong somewhere? Also, I do not see the
> RSV stations in the core diagram; perhaps they are omitted to save space?
> - Is it me, or does the labelled store buffer actually serve
> the function of a load buffer and a store buffer?
> - The AGU generates tags for the location of where data is stored in the caches
> and memory for use in the DTLB by flowing completed instructions to the ROB to notify
> the TLB of changes in data position or state. Is my understanding correct?
Wow. Nope.
First, put aside the prefetcher and the store buffers.
The AGU computes a _logical address_.
Then the DTLB produces a _physical address_, along with other bits of information, by performing a lookup in the page table.
All that is fed to the L1 D$.
If it doesn't have the cache line, the L1 D$ will fetch it from the L2 $.
If L1 D$ does have the line, it will make the requested data available.
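To make the flow concrete, here's a toy sketch of that load path (all names, sizes, and the two-level page table are my own illustrative choices, not any real microarchitecture's):

```python
# Toy model of the load path: AGU forms a logical (virtual) address,
# the DTLB translates it to a physical one, then the L1 D$ either hits
# or fetches the line from the L2 $. Purely illustrative.

PAGE_SIZE = 4096
LINE_SIZE = 64

# Toy page table: virtual page number -> physical page number
page_table = {0x10: 0x7A, 0x11: 0x7B}

def agu(base, index, scale, displacement):
    """AGU: compute the logical (virtual) address from the operands."""
    return base + index * scale + displacement

def dtlb_translate(vaddr):
    """DTLB: translate to a physical address (page walk omitted;
    a real TLB caches these translations)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

l1 = {}                                    # physical line number -> data
l2 = {0x7A000 // LINE_SIZE: b"line from L2"}

def load(vaddr):
    paddr = dtlb_translate(vaddr)
    line = paddr // LINE_SIZE
    if line not in l1:                     # L1 miss: fetch the line from L2
        l1[line] = l2[line]
    return l1[line]                        # L1 hit: data is available

addr = agu(base=0x10000, index=2, scale=8, displacement=0)
data = load(addr)
```

The point is just the ordering: address generation and translation happen before the cache is touched, and the L1 only goes to the L2 on a miss.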
(I really see no path between the registers and the L2 $, by the way)
The store buffer is used to buffer stores before they're actually committed to a cache line. The memory logic also checks the contents of the store buffer when performing a load.
Whether it comes from the L1 data cache or the store buffer, the information is then routed to the registers via some data path (the thick line going to the Integer Rename Buffers in Extremetech's diagram).
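That store-buffer check is usually called store-to-load forwarding. A minimal sketch of the idea (data structures and names are mine, not any vendor's):

```python
# Store-to-load forwarding, sketched: a load first searches the store
# buffer for a matching address; only on a miss does it read the cache.

store_buffer = []            # (address, value) pairs, youngest last
l1_dcache = {0x100: 11}      # committed state

def store(addr, value):
    store_buffer.append((addr, value))    # buffered, not yet committed

def load(addr):
    # Youngest matching buffered store wins (forwarding)
    for a, v in reversed(store_buffer):
        if a == addr:
            return v
    return l1_dcache[addr]                # otherwise read the cache

def commit_oldest_store():
    addr, value = store_buffer.pop(0)     # commit to the cache line
    l1_dcache[addr] = value

store(0x100, 42)
forwarded = load(0x100)      # sees 42 even though the cache still holds 11
commit_oldest_store()
```

This is why a load can observe a store that hasn't reached the cache yet.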
The prefetcher is an auxiliary unit which tracks memory access patterns and prefetches cache lines that haven't been requested yet but (it hopes) will be needed soon.
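One common pattern-tracking scheme is stride detection; a rough sketch, assuming the simplest possible policy (real prefetchers use confidence counters, multiple streams, etc.):

```python
# Toy stride prefetcher: watch the demand-access addresses and, once the
# same stride is seen twice in a row, request the next expected line early.

class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.stride = None
        self.prefetched = []       # addresses requested ahead of demand

    def observe(self, addr):
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride and stride != 0:
                # Pattern confirmed: prefetch the next expected address
                self.prefetched.append(addr + stride)
            self.stride = stride
        self.last_addr = addr

pf = StridePrefetcher()
for a in (0x1000, 0x1040, 0x1080):   # constant 64-byte stride
    pf.observe(a)
```

After the third access the stride is confirmed, so the line at 0x10C0 gets requested before the program asks for it.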