By: Travis (travis.downs.delete@this.gmail.com),
Room: Moderated Discussions
Jeff S. (fakity.delete@this.fake.com) on August 4, 2018 1:46 pm wrote:
> Adrian (a.delete@this.acm.org) on August 3, 2018 7:08 am wrote:
> > Travis (travis.downs.delete@this.gmail.com) on August 2, 2018 11:06 am wrote:
> > > BTW, if anyone has any recommendation on how to boot into a really "quiet" system for
> > > benchmarking, I'd be happy to hear it. Certainly it should be non-GUI, but that's probably
> > > not enough on a mainstream distro since there will still be a lot of background services,
> > > etc, running. Maybe just a cut down distro that defaults to everything off?
> > I suppose that rebooting in single user mode is the best you can do without using some special kernel.
> > On the kinds of operating systems that I am using, this is done with "init
> > 1". On systems with systemd, that is more complex. You should study your local
>
> You can go indefinitely down the rabbit hole, but quite a lot can be accomplished with:
> - disabling C-states in BIOS to keep your system from trying to flush and sleep its caches
> - booting with the 'isolcpus' kernel parameter to keep some
> set of cores away from the dynamic scheduler entirely
> - booting with tickless/'nohz_full' kernel parameter to get unbounded time quanta on threads given their
> own cores
> - disabling the irqbalance service and pinning IRQ handling to a "dirty" core
> - booting with 'rcu_nocbs' and 'rcu_nocb_poll' to similarly redirect RCU spam to a dirty core
I have already done most of these things. These are important for "on-core noise" (i.e., they solve things that affect even core-bound code), but I've got that close to zero anyways. In particular, if you do short tests, 99% of the time you won't get interrupted anyways and you can detect the cases where you did and throw them out (although it seems you can't detect SMIs, i.e., system management interrupts, except indirectly).
So I am good in that respect. I'm looking for an approach that reduces noise on the shared part of the system for tests that stress L3 or memory.
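A minimal sketch of the "run short tests and throw out the interrupted ones" idea. The detection here is a plain timing heuristic (drop any sample well above the median), which is an assumption for illustration and not necessarily how the interrupted runs were detected in practice; the names `time_once`, `discard_interrupted`, and `work`, and the 1.5x tolerance, are all hypothetical.

```python
import statistics
import time


def time_once(fn):
    """Return the wall-clock duration of one call to fn, in nanoseconds."""
    start = time.perf_counter_ns()
    fn()
    return time.perf_counter_ns() - start


def discard_interrupted(samples, tolerance=1.5):
    """Keep samples at or below `tolerance` times the median; drop the rest.

    The threshold is an arbitrary illustrative choice, not a recommendation.
    """
    cutoff = statistics.median(samples) * tolerance
    return [s for s in samples if s <= cutoff]


def work():
    # Stand-in for the short, core-bound code under test.
    sum(range(1000))


samples = [time_once(work) for _ in range(1000)]
clean = discard_interrupted(samples)
print(f"kept {len(clean)} of {len(samples)} samples, min {min(clean)} ns")
```

This only helps with on-core noise: a sample slowed by contention in L3 or memory looks the same as one slowed by an interrupt, so the filter cannot tell them apart.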
Two notes on your list:
1) I didn't find isolcpus to really make any difference in practice for benchmarks. If you have enough cores and pin your task, the scheduler will never interrupt it anyways, at least to a first-order approximation. I used it for a while, but never really saw any difference and removed it.
2) I found nohz_full to be *worse* than running without it for my use-case. Either setting is fine for the short benchmarks described above, but I saw more variance with nohz_full for longer benchmarks.
Eventually I traced it down to this[1]: even when using nohz_full, one CPU has to receive the timer tick periodically to keep track of whatever the kernel needs to track periodically no matter what. If you are using the normal approach (without nohz_full), it is the CPU running your benchmark process (the only running process) that will usually get the timer tick, since it is the active CPU (no point in waking up a different sleeping CPU). This adds some variance, but only as much as the timer tick routine, which is fairly efficient (maybe worse now with Meltdown/Spectre; this was before all that).
If you are using nohz_full, however, it is some other CPU that is woken up (assuming you set the CPU you are benchmarking on as "adaptive tick" to get the full benefit of NOHZ). You'd think that would be better than doing it on the benchmark CPU, but it was worse: the second CPU spinning up caused a turbo transition, which takes tens of microseconds, and then forced the benchmark CPU to run 200 MHz slower for the duration of the interrupt (which probably takes much longer than it would on the current CPU, because some cache data may have been lost in sleep, etc).
So with turbo on, it was actually faster and lower-variance just to take the interrupts on the benchmark core than to deal with the turbo ratio gyrating around all the time. With turbo off it might be different, but in this case I wanted to run with turbo on.
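One way to check which CPU is actually taking the timer ticks is to pin the benchmark and compare the per-CPU local timer counts in /proc/interrupts before and after a run. The sketch below assumes Linux on x86, where that file has a "LOC:" row for local timer interrupts; the function names and the sample text are hypothetical, and the sample numbers are made up purely to show the format.

```python
import os


def parse_local_timer_counts(interrupts_text):
    """Return {cpu: count} from the LOC: row of /proc/interrupts text."""
    lines = interrupts_text.splitlines()
    if not lines:
        raise ValueError("empty input")
    # The header row looks like: CPU0 CPU1 ...
    cpus = [int(name[3:]) for name in lines[0].split()]
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = [int(x) for x in fields[1:1 + len(cpus)]]
            return dict(zip(cpus, counts))
    raise ValueError("no LOC: row found")


def read_local_timer_counts(path="/proc/interrupts"):
    with open(path) as f:
        return parse_local_timer_counts(f.read())


def ticks_during(fn, cpu):
    """Pin to `cpu`, run fn, and return the timer ticks each CPU took meanwhile."""
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})  # Linux-only call
    try:
        before = read_local_timer_counts()
        fn()
        after = read_local_timer_counts()
    finally:
        os.sched_setaffinity(0, previous)  # restore the original mask
    return {c: after[c] - before[c] for c in after if c in before}


# Illustrative input only; the counts are invented.
SAMPLE = """\
           CPU0       CPU1
  0:         10          0   IO-APIC    2-edge      timer
LOC:       1200         35   Local timer interrupts
"""
print(parse_local_timer_counts(SAMPLE))
```

On a real machine, something like `ticks_during(workload, cpu)` would show whether the ticks land on the benchmark CPU or on another one. It says nothing about turbo transitions themselves, so it is only a partial check of the effect described above.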
[1] (this is from memory so should be considered approximate)