By: Travis Downs (travis.downs.delete@this.gmail.com), March 24, 2019 5:34 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on February 28, 2019 9:54 am wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on February 27, 2019 7:25 am wrote:
> >
> > You can reverse the behavior, showing the max value, by setting the W_MAX envvar to 1, like so:
> >
> > W_MAX=1 ./offset-test.sh
> >
> > If you have the chance to run the CoffeeLake test again like this, I would be very interested.
> >
>
>
> I have done the test again on the Coffee Lake.
>
> I have repeated it a few times without seeing anything except maybe a shift of
> about half a cycle towards higher values. There was still no value over 10.
>
>
> Then I opened a file manager and started making random clicks on directories and files.
>
> This immediately caused a few 18-cycle values: one precisely when I opened the file manager
> and a few others exactly when making various clicks. Not all clicks had an effect.
>
>
> So I assume that this happens due to some interaction with the cache activity
> of the other threads. I still wonder why, with the old microcode, it happened so
> reproducibly at certain positions even without much activity on the computer.
>
>
>
> As I wrote in the other message, with W_MAX=1 and the old microcode there were some extra neighboring
> positions where the 18-cycle values appeared frequently, but not reproducibly when the test was repeated.
Interesting. I saw a similar effect on Skylake (client) with the old microcodes: it usually ran fast, but in the specific case where I loaded all 4 CPUs with 'stress -c 4', the test would always run in slow mode. Loading only 3 CPUs only rarely triggered slow mode (probably when 4 CPUs were actually active due to some background activity).
The weird thing is that this only had to be true at the start of the test: once it started slow, every iteration would usually be slow, even if I killed the extra load. The opposite was also true: if I started the load in the middle of the test, it would stay fast. The effect also remained when I ran the test process at the highest real-time priority (--rr 99), so it would always monopolize the CPU regardless of any other load.
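In case anyone wants to try to reproduce it, the sequence was roughly the following (just a sketch: it assumes the usual stress and chrt utilities, with offset-test.sh being the same test driver as above):

# load all 4 CPUs *before* starting the test: with the old microcode
# this reliably put the test into slow mode
stress -c 4 &

# run the test at the highest real-time priority (SCHED_RR, priority 99)
sudo chrt --rr 99 ./offset-test.sh

# killing the load afterwards did not switch an already-slow run back to fast
kill %1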
I don't have any explanation for this effect, or whether it is hardware or OS related. I had a theory that if the CPU was not idle right as the test process started, it would somehow behave differently, hence the need to load all 4 CPUs, which guarantees no idle CPUs. But then I would expect a test where I loaded CPU 0 and then started the test with affinity forced to CPU 0 to also show the effect ... but it didn't.
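That CPU 0 variant was something like this (again just a sketch, assuming taskset for the affinity pinning):

# pin one load worker to CPU 0, then force the test onto CPU 0 as well,
# so the test cannot start on an idle CPU; this did NOT trigger slow mode
taskset -c 0 stress -c 1 &
taskset -c 0 ./offset-test.sh
kill %1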
Thanks for your help, Adrian!