By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), February 25, 2013 5:54 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on February 19, 2013 6:16 pm wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on February 19, 2013 2:10 pm wrote:
> >
> > Why don't you post your methodology and detailed results then? At least I backed up what I said. Your turn.
>
> You're an ass-hat. I was the one who gave real numbers first, you followed up with your "methodology".
>
> You can re-create the thing yourself easily enough with just a trivial
>
> objdump --disassemble vmlinux |
> grep ' j[a-z]* ' |
> sed 's/.* \(j[a-z]*\) .*/\1/' |
> sort |
> uniq -c |
> sort -n
>
> (that grep/sed string has a tab before, and a space after the pattern, you'll need to edit after
> cut-and-pasting it). You'll have to ignore the unconditional "jmp" instructions, of course.
>
> It ends up giving a few false positives if there's anything in the .text segment that isn't
> really text, but that is just a tiny amount of noise, it doesn't change the signal.
>
> And btw, before you start saying that kernels are special and different - you don't need to just look at the
> kernel. You can use the above for just about any binary. For cc1 (the real meat of gcc) I get about 44k instances
> of non-equality instructions (ja, jg, jae, jbe..) and about 210k equality comparisons (jne/je).
>
> Which is pretty much *exactly* the same as for the kernel (ie just over 20% non-equality).
> So it really isn't just a fluke. Same for cc1plus, and just to test something completely
> different, the google chrome binary has 23% non-equality tests.
>
> So shut up already. I gave you the numbers, you were wrong about your 90%+ number. Just admit it.
Right, so you measured the wrong thing altogether and still argue you are right... No surprise there.
If you want to show how useful 4 condition bits are compared to having 1, what you need to measure is the distribution of conditional branches after ALU instructions. Obviously that excludes compares, which are unaffected by how many condition bits are used. That's what I did, and that gives you 96% equality.
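For anyone who wants to try this kind of count on x86 `objdump` output, here is a rough sketch. It is not the script behind the 96% figure; it only illustrates the idea of tallying conditional branches whose immediately preceding instruction is a flag-setting ALU op rather than a compare. The mnemonic sets, the "immediately preceding" rule, and the names `tally_branches_after_alu` and `SAMPLE` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed classification, x86 AT&T syntax as printed by objdump.
EQUALITY = {"je", "jne", "jz", "jnz"}
ALU = {"add", "sub", "and", "or", "xor", "inc", "dec", "neg",
       "adc", "sbb", "shl", "shr", "sar"}
NO_FLAGS = {"jcxz", "jecxz", "jrcxz"}  # test a register, not the flags


def mnemonic(line):
    """Return the mnemonic of an objdump instruction line, else None."""
    parts = line.split("\t")  # address, raw bytes, instruction text
    if len(parts) < 3:
        return None  # label, blank line, or byte continuation line
    words = parts[2].split()
    return words[0] if words else None


def normalize(op):
    """Strip an AT&T size suffix (addl -> add) when that yields an ALU op."""
    if op not in ALU and op[-1] in "bwlq" and op[:-1] in ALU:
        return op[:-1]
    return op


def is_cond_jump(op):
    return (re.fullmatch(r"j[a-z]+", op) is not None
            and not op.startswith("jmp") and op not in NO_FLAGS)


def tally_branches_after_alu(disassembly):
    """Count conditional jumps that directly follow an ALU instruction.

    Jumps that follow cmp/test (or anything else) are skipped.
    """
    counts = Counter()
    prev = None
    for line in disassembly.splitlines():
        op = mnemonic(line)
        if op is None:
            continue
        if is_cond_jump(op) and prev in ALU:
            counts["equality" if op in EQUALITY else "other"] += 1
        prev = normalize(op)
    return counts


# Hand-written sample in objdump's tab-separated layout (illustrative only).
SAMPLE = (
    "0000000000401000 <f>:\n"
    "  401000:\t48 83 e8 01          \tsub    $0x1,%rax\n"
    "  401004:\t75 fa                \tjne    401000 <f>\n"
    "  401006:\t48 39 d8             \tcmp    %rbx,%rax\n"
    "  401009:\t7f f5                \tjg     401000 <f>\n"
    "  40100b:\t83 07 01             \taddl   $0x1,(%rdi)\n"
    "  40100e:\t78 f0                \tjs     401000 <f>\n"
    "  401010:\tc3                   \tret\n"
)

if __name__ == "__main__":
    print(tally_branches_after_alu(SAMPLE))
```

To run it on a real binary you would pass in the text produced by `objdump --disassemble`. Note that the sketch ignores instruction prefixes such as `lock`, and it misses flags that are set several instructions before the branch, so treat its output as a rough estimate.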
And btw, having multiple branches after a single compare is very rare as well. In GCC I counted just 122 instances, despite the many switch statements. That's just noise.
So that proves that for compiled code having 1 condition bit is just as powerful as having 4 (or 100...).
Wilco