By: dmcq (dmcq.delete@this.fano.co.uk), March 9, 2015 1:14 am
Room: Moderated Discussions
none (none.delete@this.none.com) on March 9, 2015 1:10 am wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on March 7, 2015 9:26 am wrote:
> > none (none.delete@this.none.com) on March 6, 2015 5:50 am wrote:
> > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on March 6, 2015 5:04 am wrote:
> > > [...]
> > > > Btw while on the topic of benchmarks, you might want to edit your recent article about CoreMark-Pro
> > > > - the one subtest you claimed is most representative spends almost all of its time (~80%) in one
> > > > tiny string function. So it's trivial to claim 5x gains with library optimizations...
> > >
> > > Funny given that EEMBC AndEBenchPro, which contains CoreMark Pro, seems to be using some
> > > dedicated Intel libraries, in particular for string handling.
> >
> > Which functions are in those libraries?
>
> libcoremark_intel_atom.so
> T __cacheSize
> T __divdi3
> T _intel_fast_memcmp
> T _intel_fast_memset
> T _intel_fast_memset.A
> T _intel_fast_memset.H
> T _intel_fast_memset.J
> T _intel_fast_memset.M
> T _intel_fast_memset.P
> T __intel_memset
> T __intel_new_memset
> T __intel_new_memset_P3
> T __intel_sse2_memset
> T __intel_sse2_rep_memset
> T __intel_sse2_strcat
> T __intel_sse2_strncmp
> T __intel_sse2_strtok
> T __intel_ssse3_strcpy
> T Java_com_eembc_andebench_CoreMark_coremarkJNIPEAK
> T Java_com_eembc_andebench_CoreMark_stoptestJNIPEAK
> T Java_com_eembc_andebench_PeakNativeBenchmarks_storageJNI
> T Java_com_eembc_coremark_CoreMark_coremarkJNI
> T Java_com_eembc_coremark_CoreMark_stoptestJNI
> T Java_com_eembc_coremark_Scenario1_storageJNI
> T __umoddi3
>
> libmithsp_intel_atom.so
> T __cacheSize
> T _intel_fast_memcmp
> T _intel_fast_memcpy
> T _intel_fast_memcpy.A
> T _intel_fast_memcpy.H
> T _intel_fast_memcpy.J
> T _intel_fast_memcpy.M
> T _intel_fast_memcpy.P
> T _intel_fast_memmove
> T _intel_fast_memmove.A
> T _intel_fast_memmove.M
> T _intel_fast_memmove.P
> T _intel_fast_memset
> T _intel_fast_memset.A
> T _intel_fast_memset.H
> T _intel_fast_memset.J
> T _intel_fast_memset.M
> T _intel_fast_memset.P
> T __intel_memcpy
> T __intel_memset
> T __intel_new_memcpy
> T __intel_new_memcpy_P3
> T __intel_new_memset
> T __intel_new_memset_P3
> T __intel_sse2_memset
> T __intel_sse2_rep_memset
> T __intel_sse2_strcat
> T __intel_sse2_strchr
> T __intel_sse2_strcspn
> T __intel_sse2_strdup
> T __intel_sse2_strlen
> T __intel_sse2_strncmp
> T __intel_sse2_strpbrk
> T __intel_sse2_strspn
> T __intel_ssse3_memcpy
> T __intel_ssse3_memmove
> T __intel_ssse3_rep_memcpy
> T __intel_ssse3_rep_memmove
> T __intel_ssse3_strcpy
> T __intel_ssse3_strncpy
> T Java_com_eembc_andebench_PeakNativeBenchmarks_cjpegtemplate
> T Java_com_eembc_andebench_PeakNativeBenchmarks_linearalgmid100x100sp
> T Java_com_eembc_andebench_PeakNativeBenchmarks_loopsallmid10ksp
> T Java_com_eembc_andebench_PeakNativeBenchmarks_mbmemtest
> T Java_com_eembc_andebench_PeakNativeBenchmarks_parser500k
> T Java_com_eembc_andebench_PeakNativeBenchmarks_shatest
> T Java_com_eembc_andebench_PeakNativeBenchmarks_ziptest
> T Java_com_eembc_coremark_Scenario1_cjpegtemplate
> T Java_com_eembc_coremark_Scenario1_linearalgmid100x100sp
> T Java_com_eembc_coremark_Scenario1_loopsallmid10ksp
> T Java_com_eembc_coremark_Scenario1_mbmemtest
> T Java_com_eembc_coremark_Scenario1_parser500k
> T Java_com_eembc_coremark_Scenario1_shatest
> T Java_com_eembc_coremark_Scenario1_ziptest
> T __svml_exp2f4
> B __svml_feature_flag
> T __svml_feature_flag_init
>
> > > Are we going to find out this is similar to the AnTuTu/icc story?
> >
> > If the AndEBenchPro executable is built with icc, then that is quite likely. Even if that
> > isn't the case, and they all used the same GCC version with identical options, it looks like
> > there has been some library tuning for x86 that isn't matched on other architectures.
>
> Some of the function names above definitely look like they come from Intel's own libraries,
> but it doesn't look like the code was compiled with icc.
>
> I couldn't find any evidence of dedicated string functions in ARM code.
Well, I think a reasonable integer benchmark should spend some time in the string functions, and if optimised versions can be used without too much fiddling, then kudos to that architecture's support. The real question is how much of the test is just string functions: does it use a realistic, real-world mix of them, or is the whole test just a check on how fast, say, memcpy has been coded?
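As a rough way to see which libc routines have Intel-specific variants in a symbol dump like the one quoted above, here is a minimal Python sketch. The helper name `group_intel_symbols` and the prefix pattern are my own assumptions, based only on the names in this thread, not on any Intel documentation. It only groups names and says nothing about how much run time each routine takes.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred only from the symbols quoted in this thread:
# one or more underscores, "intel_", an optional variant tag, then mem*/str*.
INTEL_SYMBOL = re.compile(
    r"^_+intel_(?:fast_|new_|sse2_|ssse3_)?(?:rep_)?((?:mem|str)[a-z]+)"
)

def group_intel_symbols(nm_lines):
    """Group nm-style lines such as 'T __intel_sse2_strcat' by base routine."""
    groups = defaultdict(list)
    for line in nm_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines, bare quote markers, library names
        symbol = parts[-1]  # last field is the symbol name
        match = INTEL_SYMBOL.match(symbol)
        if match:
            groups[match.group(1)].append(symbol)
    return dict(groups)

if __name__ == "__main__":
    sample = [
        "T __cacheSize",
        "T _intel_fast_memset",
        "T _intel_fast_memset.A",
        "T __intel_sse2_rep_memset",
        "T __intel_ssse3_strcpy",
        "T Java_com_eembc_coremark_CoreMark_coremarkJNI",
    ]
    for routine, symbols in sorted(group_intel_symbols(sample).items()):
        print(routine, len(symbols), symbols)
```

Answering the actual question, what fraction of the run time those routines account for, would still need a profile of the benchmark itself.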