By: Stubabe (email@example.com), November 15, 2012 4:14 am
Room: Moderated Discussions
> But on paper that loop should only take 2 clk/loop on sandy due to co-issue of the fused branch,
> it takes 3 on Sandy (and 2 on Ivy) due to the loop buffer penalty of 1 clk/iteration
Sorry, I was thinking of MOVDQA; it should be 3 on Sandy. But the loop buffer can issue at most 4 uops/clk to the renamer, plus 1 penalty clock, so the minimum loop time is 2 clocks irrespective of what the backend does with it.
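To put rough numbers on that, here is a back-of-the-envelope sketch of the front-end bound. It assumes the simplest reading of the above: a loop of N fused uops needs ceil(N / 4) issue clocks plus 1 penalty clock per iteration. The function name and the formula are my own illustration, not a measured or documented model of the loop buffer.

```python
def min_loop_clocks(uops, issue_width=4, penalty=1):
    """Hypothetical front-end lower bound on clocks per loop iteration.

    Assumes the loop buffer delivers at most `issue_width` uops per clock
    to the renamer and pays `penalty` extra clocks per iteration. This is
    an illustrative model only; the backend can only make things slower.
    """
    if uops < 1:
        raise ValueError("a loop needs at least one uop")
    issue_clocks = -(-uops // issue_width)  # ceiling division
    return issue_clocks + penalty


if __name__ == "__main__":
    for n in (1, 4, 5, 8):
        print(n, "uops ->", min_loop_clocks(n), "clk/iteration minimum")
```

Under these assumptions any loop of 1 to 4 uops bottoms out at 2 clocks, which is the floor mentioned above, and setting penalty=0 shows what the same loop would cost without the per-iteration penalty.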