By: Matt Lohmann (mlohmann.delete@this.noemail.com), May 13, 2022 5:21 am
Room: Moderated Discussions
> In fact the E cores (at least to judge by the patents, though this may be at the A15 or
> even later level, not A14/M1) are capable of two full 512b vector-vector ops per cycle.
Is the performance per clock cycle of the per-cluster vector engine for the E cores different from the performance per clock cycle of the per-cluster vector engine for the P cores in the Apple M1?
How many 512-bit vector arithmetic operations per clock cycle can be performed by the per-cluster vector engine for the P cores in the Apple M1?
Can the per-cluster vector engines in the Apple M1 overlap 512-bit loads and stores with arithmetic operations?
Are fp64 fused multiply-adds (a * b + c) supported in either the per-cluster vector engines or the CPU cores in the Apple M1?
Are 256-bit vector operations performed in each P core or only in the shared vector engine for a cluster of P cores in the Apple M1?
Thank you in advance to anyone who can answer any of these questions.