By: nksingh (nksingh.delete@this.live.com), December 8, 2014 5:53 pm
Room: Moderated Discussions
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on December 8, 2014 4:17 am wrote:
> this isn't always twice, far from it (see timings for 16 threads) from GiGNiC's timings:
>
> thread count    std::mutex time (us)    CRITICAL_SECTION time (us)    CS speedup
>            2                   52002                         23001         ~2.26
>            4                  448025                        106006         ~4.2
>            8                  893051                        197011         ~4.5
>           16               147516437                        440025         ~335
>
>
> > is that it uses a CS internally to lock its state so a single mutex lock/unlock
> > pair requires 4 CS lock/unlocks.
>
> you mean a sequence such as
>
> EnterCriticalSection(cs1);
> EnterCriticalSection(cs2);
> LeaveCriticalSection(cs2);
> LeaveCriticalSection(cs1);
>
I wasn't clear in my statement. Each acquire or release of std::mutex requires one CS lock and one CS unlock, so an acquire-release pair for std::mutex requires four CS operations. Each CS operation costs roughly the same, provided there is no kernel transition.
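To make the counting concrete, here is a minimal sketch, assuming a mutex whose own state is guarded by an internal CRITICAL_SECTION. This is not the actual CRT source and the real implementation differs in its details, but it shows where the four CS operations per acquire-release pair come from:

#include <windows.h>

// Hypothetical illustration only -- not the CRT's real std::mutex.
class sketch_mutex {
    CRITICAL_SECTION state_lock_; // guards locked_ below
    bool locked_ = false;
    HANDLE wake_event_;           // auto-reset event used to block waiters
public:
    sketch_mutex() {
        InitializeCriticalSection(&state_lock_);
        wake_event_ = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    }
    ~sketch_mutex() {
        CloseHandle(wake_event_);
        DeleteCriticalSection(&state_lock_);
    }
    void lock() {
        for (;;) {
            EnterCriticalSection(&state_lock_);    // CS op 1
            bool acquired = !locked_;
            if (acquired) locked_ = true;
            LeaveCriticalSection(&state_lock_);    // CS op 2
            if (acquired) return;
            WaitForSingleObject(wake_event_, INFINITE); // contended: block
        }
    }
    void unlock() {
        EnterCriticalSection(&state_lock_);        // CS op 3
        locked_ = false;
        LeaveCriticalSection(&state_lock_);        // CS op 4
        SetEvent(wake_event_);                     // wake one waiter, if any
    }
};

In the uncontended case only the four CS operations are paid; the event wait only comes into play once the mutex is contended.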
The effect you see as the thread count increases appears to reflect two things. First, the CS implementation does user-mode spinning on MP machines, so up to a point you mostly see the cost of MP contention on the CS state and on the std::mutex state itself. Second, when the thread count becomes high enough, we probably start to see blocking on the CS and the cost of invoking the scheduler to wake up sleeping threads. You might even be seeing context-switch overhead at that point (i.e. lock convoys).
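For what it's worth, the amount of user-mode spinning a CS does before it blocks can be set when it is initialized. A small sketch (the spin count of 4000 here is just an example value, not a recommendation):

#include <windows.h>

CRITICAL_SECTION g_cs;

void init_lock() {
    // On an MP machine, a contended EnterCriticalSection spins in user mode up
    // to roughly this many times before falling back to a kernel wait, which is
    // where the scheduler wake-up cost comes in.
    InitializeCriticalSectionAndSpinCount(&g_cs, 4000);
}

void use_lock() {
    EnterCriticalSection(&g_cs);
    // ... protected work ...
    LeaveCriticalSection(&g_cs);
}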
> I suppose, but it will not explain the reported timings with wild variations of the speedup factor
I've usually been most interested in the performance of locks that are relatively uncontended, since that reflects the best achievable performance given a 'good' application design. Once significant contention occurs, we make an effort to avoid convoys and to keep all scheduling operations outside the locks, but performance definitely becomes more variable since more parts of the system get involved.
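Since GiGNiC's actual benchmark isn't shown, the structure and iteration count below are assumptions, but a comparison of that shape might look roughly like this, with every thread hammering the same lock (i.e. the heavily contended case once the thread count goes up):

#include <windows.h>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Times thread_count threads each doing iters lock/unlock pairs on the same lock.
template <typename LockFn, typename UnlockFn>
long long time_us(int thread_count, int iters, LockFn lock, UnlockFn unlock) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) { lock(); unlock(); }
        });
    for (auto& th : threads) th.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main() {
    const int iters = 100000; // assumed; not the original benchmark's value
    std::mutex m;
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);
    for (int threads : {2, 4, 8, 16}) {
        long long mtx_us = time_us(threads, iters, [&] { m.lock(); }, [&] { m.unlock(); });
        long long cs_us  = time_us(threads, iters, [&] { EnterCriticalSection(&cs); },
                                                   [&] { LeaveCriticalSection(&cs); });
        std::printf("%2d threads: std::mutex %lld us, CRITICAL_SECTION %lld us\n",
                    threads, mtx_us, cs_us);
    }
    DeleteCriticalSection(&cs);
    return 0;
}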
>
> > I guess I should have a conversation with the CRT owners about making
> > this cheaper so that there's no reason to use your own wrappers.
>
> from your comment I suppose that you work in the VC++ team, right?
>
I work on the Windows Kernel, so I don't have a ton of influence on the VC team, but we do talk to them from time to time and exchange information.