By: , May 28, 2013 4:27 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 28, 2013 3:58 pm wrote:
> Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 28, 2013 12:45 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on May 28, 2013 12:14 pm wrote:
> > > Sebastian Soeiro (sebastian_2896.delete@this.hotmail.com) on May 28, 2013 9:00 am wrote:
> > > >
> > > > If Silvermont uses RSV banks for every execution unit function, and the execution units
> > > > simply "suck" in the next instruction to execute as the previous one is finished, what
> > > > is there to prevent dependency errors? Sorry, I'm quite a bit new to CPU architecture
> > > > and want to learn lots, so I apologize if this is a very newbish question.
> > >
> > > That's a good question. Reservation stations perform dependency checking:
> > >
> > > "...each distributed scheduler will dispatch the oldest, ready to execute µop to the appropriate port."
> > >
> > > http://www.realworldtech.com/silvermont/5/
> > >
> > > So if the oldest instruction is still waiting on a register, then the next oldest
> > > will be chosen. If all 8 entries are waiting, then nothing is sent.
> > >
> > > The memory RSV is a little different though.
> > >
> > > David
> >
> > Ah, so I guess the scheduler only dispatches instructions that it knows it can complete to the RSV. Good to
> > know, this seems like it'd be very good for parallelism to
> > be able to keep as many ALU functions busy as possible
>
> No, instructions are sent to the reservation station in program order,
> as soon as there is a free slot in the reservation station.
>
> It's the reservation station that checks for dependencies and, when it notices that one of
> the instructions it's holding has had its dependencies resolved, sends it for execution.
>
> > instead of relying on one unified scheduler to release
> > instructions through a limited number of ports... Why
> > does this design seem better than the design in Haswell? Maybe it's me not thinking straight.
>
> Each Silvermont reservation station can only hold certain types
> of instructions and only 8 of them (6 for load/store).
> Unless the instruction type and dependency mix is just right (which
> mostly, it won't be), Silvermont will easily reach situations where:
> a) some reservation stations will go mostly unused (e.g., FP stations in a mostly integer program)
>
> b) one of the stations fills up and generates back-pressure into the common path (rename buffer, re-order
> buffer), stopping instructions from flowing into the other, not yet full, reservation stations
>
> c) instructions without dependencies which could be executed exist,
> but are just outside the small 8 instruction look ahead window.
>
> Haswell's unified scheduler provides a 60 instruction look ahead window and does
> so for almost any instruction mix (short of pathological/artificial cases).
> It's far more robust, performance wise.
>
> Of course, its physical implementation also has to be far
> more complex. Thus, Intel avoided it for Silvermont.
>
>
> >
> > If you don't mind, I have another, more general question about
> > CPU architecture. I hope you don't mind all my questions.
> >
> > I know that a CPU will be fed data as it gets copied from RAM to L3, to L2, to L1, and finally
> > into the registers as it performs the instruction on the data, but my question is: how does the
> > data make its way from RAM to register? Surely it isn't pumped through the scheduler, right?
>
> Through the load unit, of course.
>
>
Oh, thank you, that helps explain it. So it seems that the reservation station has a "brain" of its own and doesn't just act as a mindless queue. Interesting, and innovative.
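A rough Python sketch of that "oldest, ready to execute" selection, as I understand it from the quotes above. The class, field, and register names are made up for illustration, and this only models the selection rule, not how the hardware is actually built:

```python
from dataclasses import dataclass


@dataclass
class Uop:
    age: int            # position in program order; smaller means older
    dest: str           # register written by this uop (hypothetical name)
    srcs: tuple = ()    # registers this uop waits on


class ReservationStation:
    """Toy model of one 8-entry reservation station."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = []

    def has_free_slot(self):
        return len(self.entries) < self.capacity

    def insert(self, uop):
        # Uops arrive in program order, as soon as a slot is free.
        if not self.has_free_slot():
            raise RuntimeError("reservation station full")
        self.entries.append(uop)

    def select(self, ready_regs):
        """Remove and return the oldest uop whose sources are all ready.

        Returns None when every entry is still waiting on a register.
        """
        ready = [u for u in self.entries
                 if all(s in ready_regs for s in u.srcs)]
        if not ready:
            return None
        oldest = min(ready, key=lambda u: u.age)
        self.entries.remove(oldest)
        return oldest


rs = ReservationStation()
rs.insert(Uop(age=0, dest="r3", srcs=("r1", "r2")))  # r2 is not ready yet
rs.insert(Uop(age=1, dest="r5", srcs=("r4",)))

ready = {"r1", "r4"}
first = rs.select(ready)    # the younger uop goes first, the older one waits
ready.add("r2")             # pretend the producer of r2 has finished
second = rs.select(ready)   # now the older uop can go
third = rs.select(ready)    # station is empty, nothing is sent
```

So the station is filled in order but drained out of order, which is the "brain" part.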
I understand what you are saying about reservation stations going unused, filling up, or otherwise not being used optimally. Though, to my understanding, doesn't Haswell go the traditional route of using the unified scheduler to send instructions through ports directly to the ALU functions? I'm sure it's robust enough to have at least one instruction ready to issue most of the time, but can it only dispatch one instruction per clock? Or can it dispatch through any combination of open ports as soon as instructions become ready?
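To check that I follow the back-pressure point, here is a small Python sketch. It assumes the worst case where nothing issues during the window (say, everything is waiting on one slow load), and the instruction mix and function names are invented for illustration. Only the sizes (8, 6, and 60 entries) come from the posts quoted above:

```python
def allocated_before_stall(stream, route, capacity):
    """Count how many uops are accepted before in-order allocation stalls.

    stream:   uop kinds in program order, e.g. ["int", "fp", ...]
    route:    function mapping a uop kind to a queue name
    capacity: dict mapping queue name -> number of entries
    Assumes no uop issues while we allocate (worst case).
    """
    occupancy = {queue: 0 for queue in capacity}
    accepted = 0
    for kind in stream:
        queue = route(kind)
        if occupancy[queue] == capacity[queue]:
            break  # a full queue blocks every uop behind it
        occupancy[queue] += 1
        accepted += 1
    return accepted


# Hypothetical integer-heavy mix: 12 integer uops followed by 4 FP uops.
stream = ["int"] * 12 + ["fp"] * 4

# Distributed: each kind has its own small station.
distributed = allocated_before_stall(
    stream, lambda kind: kind, {"int": 8, "fp": 8, "mem": 6})

# Unified: every kind shares one large pool.
unified = allocated_before_stall(
    stream, lambda kind: "all", {"all": 60})

print(distributed, unified)
```

With this mix the distributed setup stops after 8 uops, because the ninth integer uop has nowhere to go and the FP uops behind it are stuck even though the FP station is empty. The unified pool takes all 16.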
The load unit? I've looked over the Silvermont article and found little mention of it. Maybe I'm skipping over it, or I'm not recognizing it under a different name. Could you please point me in the right direction as to where I could learn more about the load unit?
Thank you very much for your explanations; they're very clear and I'm learning a lot!