By: anon (anon.delete@this.anon.com), August 24, 2014 6:21 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on August 24, 2014 11:11 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on August 24, 2014 11:06 am wrote:
> > anon (anon.delete@this.anon.com) on August 22, 2014 5:50 pm wrote:
> > > I really don't think following a store with a load to the same location does as
> > > much as you think. I doubt it does *anything* that you can rely on, actually.
> >
> > I think we may be conflating x86 ordering rules and PCI[e] ordering rules in this discussion.
> > For PCIe a load will indeed flush preceding stores as Michael assumes. IIRC in x86 the load can
> > hit the store buffer leading to exactly the behavior you described in the rest of your post.
I will give him more credit than that! I attribute it to poor documentation. Without prior experience, section 8.2 is difficult to understand. If you also miss the clarification under the mfence instruction, which sits in a different part of the document and uses different wording, it's easy to get confused.
>
> One other remark: Keep in mind that a speculative core has to "hold" all stores in local buffers
> until the corresponding uop retires. If loading from the same address did indeed impose visibility/ordering
> constraints on the store then that would have required the OoO backend to be flushed up to at least
> the store. In other words, it would have basically the same cost as a fence.
Very true, although you could also have a non-speculative local store buffer that holds stores after instruction completion. Not sure if anybody actually does that. But you're right that the ordering cost of a RAW (a store followed by a load of the same address) could defeat most of the benefit of store forwarding on a deep OoOE pipeline.
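
To make the store-buffer point concrete, here is a toy model in Python. It is only a sketch: ToyCore, its methods, and demo are names made up for illustration, and nothing here describes a real microarchitecture. It assumes a per-core FIFO store buffer that only an explicit fence drains (a real buffer also drains on its own over time). The load is satisfied by forwarding from the buffer, so it does nothing to make the store visible to the other core.

```python
from collections import deque


class ToyCore:
    """Hypothetical core model: stores wait in a private FIFO store buffer."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The store is only buffered locally; other cores cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store forwarding: the youngest buffered store to this address wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        # No matching buffered store, so read shared memory (default 0).
        return self.memory.get(addr, 0)

    def fence(self):
        # Assumption of this model: only a fence drains the buffer, in order.
        while self.store_buffer:
            a, v = self.store_buffer.popleft()
            self.memory[a] = v


def demo():
    memory = {}
    core0, core1 = ToyCore(memory), ToyCore(memory)
    core0.store("x", 1)
    own_view = core0.load("x")      # forwarded from core0's own store buffer
    other_view = core1.load("x")    # core1 still reads the old value
    core0.fence()
    after_fence = core1.load("x")   # visible only once the buffer has drained
    return own_view, other_view, after_fence
```

In this model demo() returns 1 for core0's own load, 0 for core1's load before the fence, and 1 for core1's load after it. Reading your own store back tells you nothing about what other agents can observe.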