Any high-performance weakly ordered CPU from the past two decades or so has the silicon necessary for TSO: if you put a release memory barrier after every store, then the memory operations are TSO, more or less.
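To make the barrier point concrete, here is a rough C++ sketch of the usual message-passing pattern. The names (`payload`, `ready`, `run_message_passing`) are made up for illustration. The store uses release ordering and the load uses acquire ordering, since the C++ memory model needs both sides for the guarantee to hold. On x86, which is already TSO, compilers emit plain loads and stores for these; on a weakly ordered ISA they typically emit ordered load/store instructions or explicit barriers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready{false};  // flag published after the data

void producer() {
    payload = 42;
    // Release: the write to payload cannot be reordered after this store.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: once the flag is seen, the earlier payload write is visible too.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen == 42 && "release/acquire makes the payload visible");
    return seen;
}
```

With both orderings relaxed to `std::memory_order_relaxed`, the code would have a data race on `payload` under the C++ model, and a weakly ordered machine could in principle let the consumer read a stale value; on a TSO machine the hardware would not reorder the two stores, though the compiler still could.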

An ISA that allows weak ordering does not require it. No ISA has ever specified that memory operations must be reordered. An implementation is free to make its ordering as strong as it likes, and correspondingly weaken or no-op the barrier instructions.

Few take up the opportunity. There's no question weaker ordering allows more flexibility in implementation; what matters is what it costs to make stronger ordering run as fast.

Right. But there is no reason they couldn't have made the processor always operate with an x86-like memory model and optimized that, even before their x86 emulation foray. They chose not to.
