By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), May 6, 2013 9:00 pm
Room: Moderated Discussions
Page 2:
For "pseudo-least recently used (LRU)", I prefer "(pLRU)" (though I have seen "(PLRU)"). (This would also mean changing the statement "full ECC protection and pseudo-LRU replacement" [on page 7], which is the only use of the LRU abbreviation and perhaps the reason for not using the more complete pLRU abbreviation [which is less commonly used and perhaps more difficult to remember].)
(I am guessing this is binary-tree-based.)
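The article does not say how its pseudo-LRU is built. If it is binary-tree-based, it would work roughly like the following minimal Python sketch of a generic textbook tree pLRU for one cache set. This is not Intel's actual logic, and the class and method names are hypothetical.

```python
class TreePLRU:
    """Generic binary-tree pseudo-LRU state for one set (hypothetical sketch)."""

    def __init__(self, ways: int) -> None:
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        # ways - 1 internal nodes stored heap-style: node i has children 2i+1, 2i+2.
        # bit == 0 means "the victim is in the left subtree", 1 means the right.
        self.bits = [0] * (ways - 1)

    def touch(self, way: int) -> None:
        """On an access, make every node on the path point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:            # accessed way is in the left half,
                self.bits[node] = 1  # so future victims come from the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root down to the pseudo-LRU way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

With four ways, accessing ways 0, 1, 2, 3 and then 0 again leaves way 2 as the victim, whereas true LRU would evict way 1. Only ways - 1 bits per set are needed, which is the appeal over exact LRU.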
Page 3:
"When the overriding predictor correctly predicts a branch, there is no pipeline flush." might be better as something like: "When the overriding predictor agrees with a correct BTB prediction, there is no pipeline flush."
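To make the distinction concrete, here is a simplified Python model of which penalties a fast BTB prediction plus a slower overriding predictor can incur. It is an assumption-laden sketch: it models taken/not-taken direction only, ignores target mismatches and penalty sizes, and its names are hypothetical rather than taken from the article.

```python
from enum import Enum


class Penalty(Enum):
    NONE = "no flush"
    # Override disagrees with the BTB: instructions fetched down the BTB's path are discarded.
    FRONT_END_FLUSH = "front-end flush (override redirect)"
    # Final prediction is wrong: full pipeline flush when the branch resolves.
    FULL_FLUSH = "full pipeline flush (misprediction)"


def branch_penalties(btb_taken: bool, override_taken: bool, actual_taken: bool) -> list[Penalty]:
    """Return the penalties incurred for one branch under this simplified model."""
    penalties = []
    if override_taken != btb_taken:
        penalties.append(Penalty.FRONT_END_FLUSH)
    if override_taken != actual_taken:
        penalties.append(Penalty.FULL_FLUSH)
    return penalties or [Penalty.NONE]
```

The case the original sentence glosses over is a correct overriding prediction that disagrees with the BTB: there is no execution-time flush, but the instructions already fetched along the BTB's path must still be discarded.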
As usual, this was a nice, detailed microarchitecture article!
With respect to Silvermont's microarchitecture, I am somewhat disappointed by the dropping of SMT. This decision probably makes sense for tablets, phones, and some embedded uses, but it may be a poorer fit for server workloads (though SMT is less useful in a narrow [and somewhat shallow] OoO microarchitecture). (Since, as far as I know, Intel's SMT implementation does not let software set thread priority, embedded uses whose real-time software could benefit from multithreading presumably could not fully exploit it anyway.) I admit, I am irrationally fond of multithreading.
It is interesting that copying-based renaming is used (similar to the Pentium Pro). Presumably the benefits for branch misprediction recovery (and perhaps elsewhere?) outweigh the cost of copying, but it still seems an interesting choice, even for a more moderate OoO microarchitecture.
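For readers unfamiliar with the Pentium Pro (P6) style, a rough Python sketch of copy-based renaming follows. Results live in the reorder buffer (ROB) until retirement, when they are copied into a retirement register file (RRF), so recovery only has to discard the ROB and point every register alias table (RAT) entry back at the RRF. This is a simplification under stated assumptions: it recovers only once the mispredicted branch is the oldest instruction (roughly how P6 is usually described), it says nothing about Silvermont's actual structures, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class CopyBasedRenamer:
    """Hypothetical P6-style rename model: ROB holds results, RRF holds committed state."""

    def __init__(self, arch_regs):
        self.rrf = {r: 0 for r in arch_regs}     # committed architectural values
        self.rat = {r: None for r in arch_regs}  # None => read RRF, else ROB tag
        self.rob = OrderedDict()                 # tag -> [dest, value], oldest first
        self.next_tag = 0

    def rename_dest(self, dest):
        """Allocate a ROB entry for an instruction writing `dest`."""
        tag = self.next_tag
        self.next_tag += 1
        self.rob[tag] = [dest, None]
        self.rat[dest] = tag
        return tag

    def source(self, reg):
        """Where a consumer reads `reg`: ('ROB', tag) or ('RRF', reg)."""
        tag = self.rat[reg]
        return ("RRF", reg) if tag is None else ("ROB", tag)

    def complete(self, tag, value):
        self.rob[tag][1] = value  # execution writes its result into the ROB entry

    def retire_oldest(self):
        tag, (dest, value) = self.rob.popitem(last=False)
        assert value is not None, "oldest instruction has not completed"
        self.rrf[dest] = value    # the copy: ROB -> RRF
        if self.rat[dest] == tag: # no younger writer, so readers now use the RRF
            self.rat[dest] = None

    def flush_all(self):
        """Recovery once the mispredicted branch has retired: all ROB entries are wrong-path."""
        self.rob.clear()
        self.rat = {r: None for r in self.rat}
```

The copy in retire_oldest() is the cost mentioned above; the payoff is that flush_all() is trivial, with no RAT checkpoints to save or restore.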
The somewhat large fully associative L1 TLBs also seem interesting. I would have guessed that such TLBs would be more power-hungry than alternatives of comparable performance, but presumably Intel ran the appropriate simulations for various alternatives (even a sectored TLB); or perhaps layout or other constraints made "better" alternatives impractical.