By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), November 14, 2014 1:42 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on November 14, 2014 10:47 am wrote:
>
> The P6 lineage ended with Nehalem. Sandy Bridge is, like P4, an entirely different sort of beast. I would go
> so far as to say that SB is more similar to P4 (another PRF machine with a uop cache) than it is to Nehalem.
Hmm. Only very superficially.
The "uop cache" on Sandybridge is a pre-decoded instruction cache, while the P4 was a trace cache. By calling them both "uop caches" you make them sound much more similar than they actually are.
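To make that distinction concrete, here is a toy sketch, not a model of either real design. It assumes a made-up six-instruction program and invented names (`PROGRAM`, `WINDOW`, `predecoded_line`, `build_trace`), and it ignores real line sizes, tags and associativity. The only point is that a pre-decoded cache holds uops for a static address window, while a trace cache holds uops along one dynamic path through taken branches.

```python
# Toy model only. Everything here is hypothetical and for illustration.
# Program: address -> (uops, taken_target or None for non-branches)
PROGRAM = {
    0: (["load"], None),
    1: (["cmp", "jcc"], 8),   # conditional branch to address 8
    2: (["add"], None),
    3: (["store"], None),
    8: (["sub"], None),
    9: (["mul"], None),
}

WINDOW = 4  # addresses covered by one line in this toy model


def predecoded_line(window_start):
    """Pre-decoded instruction cache: uops for a static address window,
    in address order, regardless of which way any branch goes."""
    return [(a, PROGRAM[a][0])
            for a in range(window_start, window_start + WINDOW)
            if a in PROGRAM]


def build_trace(start, outcomes, max_instrs=4):
    """Trace cache: uops along one dynamic path, following taken branches.
    `outcomes` lists taken/not-taken for each branch met, in order."""
    trace, addr, pending = [], start, list(outcomes)
    while addr in PROGRAM and len(trace) < max_instrs:
        uops, target = PROGRAM[addr]
        trace.append((addr, uops))
        if target is not None and pending and pending.pop(0):
            addr = target      # branch taken: the trace continues at the target
        else:
            addr += 1          # fall through to the next address
    return trace


taken_path = [a for a, _ in build_trace(0, [True])]
not_taken_path = [a for a, _ in build_trace(0, [False])]
static_line = [a for a, _ in predecoded_line(0)]
```

In this sketch the same start address gives two different traces depending on the branch outcome, but only ever one pre-decoded line. The pre-decoded line just mirrors what is in memory at those addresses, which is why it sits naturally next to a conventional decoder.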
Also, the instruction decode flow in Nehalem vs SB is actually very similar, despite the added uop cache. I'd say that SB clearly added a uop cache to what looks very much like the Nehalem front-end, which in turn looks fairly clearly like an "improved P6 instruction decoder". Things were completely different in P4, with the single decoder.
Also, the memory pipeline - imnsho one of the really fundamental parts of the core - is totally different in P4 vs SB, while SB/NH are clearly related. Sure, SB is a big improvement with the whole dual read ports, so it's not nearly the same, but I think you can clearly see how it's an evolution of the other.
I think the PRF is a relatively minor difference in that picture. Yes, it is obviously very central to the core, and sure, you can "bin" P4 and SB together by saying that they both use a physical register file, but I really think that's a pretty small detail in the big picture.
Linus