By: Michael S (already5chosen.delete@this.yahoo.com), August 29, 2021 3:13 am
Room: Moderated Discussions
Following up on the AVX512-on-not-so-little-cores discussion in the thread below.
What if a powerful, Sapphire Rapids-like AVX512 unit were attached to the E-cluster as a shared co-processor?
The main problem with such an approach, the one that made the similarly designed AMD Bulldozer a dog, is the high latency of scalar and narrow ops. But maybe it is possible to process all scalar/128b/256b operations locally and ship only the 512b ops to the co-processor?
Of course, it's a big waste of silicon, but nowadays that's not considered a serious disadvantage.
The bigger question is whether it's feasible given the complexity of tracking OoO resources, which arises because architecturally the scalar/128b/256b registers are the same as the lower bits of the 512b registers.
To me, as a layman, the problem looks insurmountable, but I don't know even 1/100th of their tricks.
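
To make the aliasing concrete, here is a toy Python sketch of one architectural vector register. The class and method names are made up for illustration, and it models only the architectural view (xmm/ymm are the low bits of zmm, and a VEX-encoded 256b write zeroes the upper bits), not how any real core renames or tracks registers.

```python
# Toy model of ONE architectural vector register (illustrative assumption,
# not a description of any real microarchitecture).
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1
MASK512 = (1 << 512) - 1


class VectorRegister:
    """xmm and ymm are views of the low 128/256 bits of the same zmm register."""

    def __init__(self):
        self.bits = 0  # full 512-bit architectural contents

    def write_zmm(self, value):
        # 512b result: in the proposed scheme this would come from the co-processor.
        self.bits = value & MASK512

    def write_ymm(self, value):
        # VEX-encoded 256b write: bits 511:256 are zeroed architecturally.
        # In the proposed scheme this would be produced locally in the E-core.
        self.bits = value & MASK256

    def read_zmm(self):
        return self.bits

    def read_ymm(self):
        return self.bits & MASK256

    def read_xmm(self):
        return self.bits & MASK128


reg = VectorRegister()
reg.write_zmm(MASK512)    # 512b op writes the whole register
reg.write_ymm(0x1234)     # later 256b op overwrites the same register
after_narrow_write = reg.read_zmm()

reg.write_zmm((7 << 256) | 5)  # 512b op again
low_half = reg.read_ymm()      # a local 256b op now needs the co-processor's result
```

In both directions the consumer depends on a value produced on the other side of the split: a 512b read after a local 256b write needs the local result, and a local 256b read after a 512b write needs the low half of the co-processor's result. That cross-unit dependency is what the tracking would have to cover.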