By: Maynard Handley (name99.delete@this.name99.org), January 20, 2014 3:25 am
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on January 19, 2014 11:01 pm wrote:
> Maynard Handley (name99.delete@this.name99.org) on January 19, 2014 10:26 pm wrote:
> > - another ramp up in branch prediction, from the current about 98% accurate to about 99% accurate. This
> > is probably doable with the newest branch predictors, like TAGE --- but again, Intel turnaround time.
>
> Do you have a reference for 99% prediction accuracy? I'm looking at what I think is the correct paper
> (http://www.jilp.org/vol9/v9paper6.pdf) and they're citing a rate of 3.314 mispredictions per 1000
> instructions over several benchmark traces. The individual numbers vary tremendously per benchmark,
> as you'd expect. It doesn't say how many instructions per 1K are branches so we can't deduce prediction
> accuracy from this but even with a very generous 20% branch mix it'd still only be 98.343% accurate.
> I expect the actual average branch mix to be closer to 15%, which would be under 98%.
It seems like you're looking at the same sort of papers I am. The basic TAGE scheme can be augmented in various ways, like
http://hal.archives-ouvertes.fr/docs/00/63/91/93/PDF/MICRO44_Andre_Seznec.pdf
along with various finicky details about exactly what gets updated when.
I believe I saw a figure of 99% in one of the most recent Seznec papers, but of course this depends on your benchmark set. My point is not so much the exact figure of 99% as that there still looks to be practical scope (i.e. with realistic amounts of area and power), if you are prepared to put in the effort, to roughly halve the mispredict rate compared to what we do today.
One thing I have not yet had time to look into closely is the state of the art in indirect jump prediction. I could well believe this is something Apple cares about (and has invested in) rather more than Intel does. Sure, Intel cares about C++ and Java, but they also care about a lot of other stuff, whereas Apple cares pretty much exclusively about Objective-C, which means that when you hit message dispatch you want that thing to run like a bat out of hell.
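To make the arithmetic behind these figures explicit, here is a minimal Python sketch that converts mispredictions per 1000 instructions into prediction accuracy and back. The function names are my own, and the 20% and 15% branch mixes are the guesses from the quoted post, not measured values.

```python
def accuracy_from_mpki(mpki, branch_fraction):
    """Prediction accuracy implied by mispredictions per 1000 instructions."""
    branches_per_kilo_instr = 1000.0 * branch_fraction
    return 1.0 - mpki / branches_per_kilo_instr


def mpki_for_accuracy(accuracy, branch_fraction):
    """Mispredictions per 1000 instructions at a given prediction accuracy."""
    return (1.0 - accuracy) * 1000.0 * branch_fraction


if __name__ == "__main__":
    for frac in (0.20, 0.15):  # assumed branch mixes, not measured values
        acc = accuracy_from_mpki(3.314, frac)
        needed = mpki_for_accuracy(0.99, frac)
        print(f"branch mix {frac:.0%}: accuracy {acc:.3%}, "
              f"MPKI needed for 99%: {needed:.2f}")
```

Whatever the branch mix, going from 98% to 99% accuracy halves the number of mispredicts, which is the sense in which I mean "halve the mispredict rate".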
The obvious thing I was thinking of (as I say, I need some time to do reading on this) is something somewhat like gshare.
We operate on the assumption that the PC of a particular indirect jump is not, by itself, very useful in predicting the target address, because pretty much every Objective-C dispatch goes through the same routine (which does a table lookup based on the class+method, and jumps to what's in the table). In this respect, ObjC is less helpful than C++, where for many if not most VTable dispatches, the target last time at this PC is the same as the target this time. (ObjC does, however, behave much like most interpreted languages here.)
But if you had a history of, say, at least the five earlier function-call PCs that led to this address, you would have something useful for predicting where the branch will go. This suggests that, alongside the history vector used by directional branch prediction (which gets updated on every directional branch), we keep a similar "vector" consisting of something like an xor of the last N (five? eleven? who knows what the optimal number is) branch PCs (again, is it best to track all branch PCs, or just nested function calls?). Anyway, this "hash" of the last N branch/function-call points (perhaps again xored with the directional branch vector) gives us a key into a branch target table.
Something like this, it seems to me, has the potential to do a good job on ObjC dispatch as well as C++, and probably most of your interpreted languages including Java.
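A rough Python sketch of the idea, purely illustrative: every name, address, and table size below is made up, and the per-position shift in the hash is my own addition so that the order of the call PCs matters (a plain xor would not distinguish A-then-B from B-then-A). It compares a table keyed by the path hash against a "last target at this PC" predictor, for one shared dispatch routine reached alternately from two call sites.

```python
from collections import deque


class PathHistoryTargetPredictor:
    """Indirect-target table indexed by a hash of recent call-site PCs."""

    def __init__(self, history_len=5, table_bits=10):
        self.history = deque(maxlen=history_len)  # last N call-site PCs
        self.table_bits = table_bits
        self.mask = (1 << table_bits) - 1
        self.table = {}  # index -> last target seen for that path

    def note_call(self, call_pc):
        self.history.append(call_pc)

    def _index(self, jump_pc):
        h = jump_pc
        for pos, call_pc in enumerate(self.history):
            # Shift by position so the order of the calls affects the hash.
            h ^= (call_pc >> 2) << pos
        # Fold the upper bits down and keep table_bits bits.
        return (h ^ (h >> self.table_bits)) & self.mask

    def predict(self, jump_pc):
        return self.table.get(self._index(jump_pc))

    def update(self, jump_pc, target):
        self.table[self._index(jump_pc)] = target


# Hypothetical addresses: one shared dispatch routine, two call sites.
DISPATCH_PC = 0x8000
SITE_A, TARGET_A = 0x1000, 0xA000
SITE_B, TARGET_B = 0x2040, 0xB000


def run_demo(rounds=20):
    path = PathHistoryTargetPredictor()
    last_target = {}  # baseline: predict the previous target at this PC
    path_hits = pc_hits = 0
    for i in range(rounds):
        site, target = (SITE_A, TARGET_A) if i % 2 == 0 else (SITE_B, TARGET_B)
        path.note_call(site)
        if path.predict(DISPATCH_PC) == target:
            path_hits += 1
        if last_target.get(DISPATCH_PC) == target:
            pc_hits += 1
        path.update(DISPATCH_PC, target)
        last_target[DISPATCH_PC] = target
    return path_hits, pc_hits


if __name__ == "__main__":
    path_hits, pc_hits = run_demo()
    print(f"path-history hits: {path_hits}/20, last-target hits: {pc_hits}/20")
```

In this toy alternating pattern the last-target predictor never gets the dispatch right, while the path-indexed table predicts correctly once the history has filled. Real hardware would of course need tags, a bounded table, and a replacement policy, none of which this sketch models.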