By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), July 16, 2015 1:35 pm
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on July 16, 2015 1:55 am wrote:
[snip interesting Xenon reordering]
> The second data-point is the POWER8, from the user manual:
>
> 10.1.22 Store Queue and Store Forwarding
>
> The LSU contains a 40-entry store reorder queue (SRQ) that holds real addresses
> and a 40-entry store data queue (SDQ) that holds a quadword of data.
>
> Stores are removed from the SRQ and SDQ and written to the cache in
> program order after all the previous instructions are committed.
>
> So while the core itself is executing stores in any order internally they're written out to
> the point of coherency in strict program order. Note that this is not dependent on the mode the
> processor or memory page is on. So it seems that IBM has opted for a stronger ordering for POWER8
> (not sure how it compares to x86 as I didn't bother to check how loads are handled).
>
> These are only two data-points but it suggests that the low-hanging fruit
> offered by the weak ordering model is interesting only on simpler designs.
With speculative execution, committing stores in strict program order avoids having to distinguish speculative stores from non-speculative ones (i.e., those with no preceding exception or branch misprediction that would prevent the store from committing).
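A toy model makes the point concrete: stores sit in a queue until retirement marks them non-speculative, and only then drain to the coherence point, oldest first. This is my own illustrative sketch (names and structure are invented, not taken from the POWER8 manual):

```python
from collections import deque

class StoreQueue:
    """Toy store queue that drains to memory strictly in program order."""

    def __init__(self):
        # Entries are kept in program order: [seq, addr, data, retired]
        self.entries = deque()

    def execute_store(self, seq, addr, data):
        # A store occupies its program-ordered slot once executed; it is
        # still speculative (retired == False) at this point.
        self.entries.append([seq, addr, data, False])

    def retire(self, seq):
        # Retirement guarantees no older exception or branch mispredict
        # can squash this store.
        for e in self.entries:
            if e[0] == seq:
                e[3] = True

    def squash_younger_than(self, seq):
        # Mispredict/exception recovery: discard speculative stores
        # younger than seq before they ever reach memory.
        self.entries = deque(e for e in self.entries if e[0] <= seq)

    def drain(self, memory):
        # Commit in strict program order, stopping at the first entry
        # that is not yet retired (i.e., possibly speculative). No
        # speculative/non-speculative bookkeeping is needed past this
        # point: everything drained is irrevocable by construction.
        while self.entries and self.entries[0][3]:
            _, addr, data, _ = self.entries.popleft()
            memory[addr] = data
```

Because `drain` stops at the first unretired entry, the memory system never needs to know which stores were speculative; the queue's head pointer encodes that boundary implicitly.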
While true out-of-order commit (i.e., irreversible changing of state, not something like Adrian Cristal et al.'s "Out-of-Order Commit Processors", which uses checkpointing) is possible, the modest benefits presumably do not justify the increase in complexity.
A larger-scale dataflow system might benefit from such early commitment, but even with early communication of values to other threads/tasklets it might not be much more trouble to mark the data as speculative, effectively widening the scope of speculation. In the extreme, such a system could treat every load as potentially value-predicted. Validating such large-scale speculation would presumably be harder than when speculation and ordering are confined to a single core, but there might be some advantage to distributing the ordering/speculation overhead. Even confining speculation to a core while allowing cross-thread out-of-order communication might have some benefit, though I doubt it would be worthwhile for current software.
Speculative data handling could also be used with transactional memory, speculating that a transaction will commit (or at least that the data values will eventually be valid). However, it seems unlikely that extensive speculation would have much benefit unless inter-core (or thread) communication was very low latency.
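A rough sketch of the idea: a reader may consume a value from an uncommitted transaction's write set, provided the dependence is recorded so the reader can be squashed if the transaction aborts. This is a hypothetical model of my own, not any real HTM interface:

```python
class Transaction:
    """Holds a speculative write set until commit or abort."""
    def __init__(self):
        self.writes = {}        # addr -> speculatively written value
        self.committed = False

class SpecMemory:
    """Toy memory that forwards speculative values across transactions."""

    def __init__(self):
        self.mem = {}
        self.readers = []       # (txn, addr, value): dependences to validate

    def spec_read(self, addr, txn=None):
        # Forward from an uncommitted transaction's write set, recording
        # the dependence; returns (value, is_speculative).
        if txn is not None and addr in txn.writes and not txn.committed:
            self.readers.append((txn, addr, txn.writes[addr]))
            return txn.writes[addr], True
        return self.mem.get(addr), False

    def commit(self, txn):
        # The speculation paid off: the forwarded values become final.
        self.mem.update(txn.writes)
        txn.committed = True

    def abort(self, txn):
        # Any reader that consumed txn's speculative values must be
        # squashed and re-executed; return those dependences.
        squashed = [r for r in self.readers if r[0] is txn]
        self.readers = [r for r in self.readers if r[0] is not txn]
        return squashed
```

The bookkeeping in `readers` is exactly the validation cost the paragraph above worries about: it grows with the scope of speculation, and it only pays off if forwarding saves more latency than re-execution on abort costs.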
Of course, if one drops the requirement for determinism or even correctness, certain efficiency optimizations become possible. How approximate or stochastic computation could be abstracted for "average programmers" is left as an exercise for the reader.