By: none (none.delete@this.none.com), October 3, 2015 4:37 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on October 3, 2015 4:37 am wrote:
> none (none.delete@this.none.com) on October 3, 2015 4:11 am wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on October 3, 2015 4:02 am wrote:
> > > none (none.delete@this.none.com) on October 3, 2015 2:04 am wrote:
> > > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on October 2, 2015 5:06 pm wrote:
> > > > [...]
> > > > > GCC does do a lot of function calls. Not sure whether there are performance counters that can count
> > > > > load vs LDP, but a static count should give a reasonable idea anyway given GCC is not loop heavy.
> > > >
> > > > You don't need performance counters on real hardware for this kind of measurement, you can
> > > > use a fast simulator.
> > > >
> > > > On 403.gcc compiled this way:
> > > > gcc-linaro-4.9-2015.02-3-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
> > > > -DSPEC_CPU_LP64 -DSPEC_CPU -Ofast -mcpu=cortex-a57 -static
> > > >
> > > > The 9 inputs total ~947B instructions. Among them 201B are loads and 74B are stores.
> > > > Among these ld/st, ~36B are LDP and ~38B are STP. Most of them are memset/memcpy and
> > > > function prologues/epilogues.
> > >
> > > Note GCC 4.9 doesn't have general LDP/STP enabled, so GCC 5
> > > or latest trunk will show even more LDP/STP instructions.
> >
> > Do you mean FSF trunk? I might give it a try then, though if a precompiled one exists
> > somewhere, that'd help.
>
> I don't think anyone provides trunk builds as developers build their own (easy
> if you build native). But here is the upcoming 5.1 release from Linaro:
>
> http://snapshots.linaro.org/components/toolchain/binaries/5.1-2015.08-rc2/
Thanks.
So with that compiler, I get:
- ~929B instructions
- ~193B LD + ~68B ST
- ~37B LDP + ~38B STP
- ~22B dc zva
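
To put the two sets of counts side by side, here is a rough sketch of the arithmetic. It assumes the LD and ST totals already include the LDP and STP counts (the earlier post says "among these ld/st", but the second set of numbers does not state it), and all figures are the approximate values quoted above, in billions. The names `counts` and `share` are only illustrative.

```python
# Approximate dynamic instruction counts for 403.gcc, in billions,
# copied from the figures quoted in this thread.
counts = {
    "gcc-4.9": {"total": 947, "ld": 201, "st": 74, "ldp": 36, "stp": 38},
    "gcc-5.1": {"total": 929, "ld": 193, "st": 68, "ldp": 37, "stp": 38},
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def summarize(c):
    """Derive a few ratios from one compiler's counts.

    Assumption: 'ld' and 'st' include the pair forms 'ldp' and 'stp'.
    """
    return {
        "ldp_of_loads": share(c["ldp"], c["ld"]),
        "stp_of_stores": share(c["stp"], c["st"]),
        "mem_of_total": share(c["ld"] + c["st"], c["total"]),
    }


if __name__ == "__main__":
    for name, c in counts.items():
        s = summarize(c)
        print(
            f"{name}: LDP/loads {s['ldp_of_loads']:.1f}%, "
            f"STP/stores {s['stp_of_stores']:.1f}%, "
            f"ld+st/total {s['mem_of_total']:.1f}%"
        )
```

Under that assumption the pair share of loads only moves by about a percentage point between the two compilers, so most of the difference is in the overall load and store totals rather than in the LDP/STP counts themselves.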