By: Adrian (a.delete@this.acm.org), November 7, 2022 4:19 am
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on November 6, 2022 3:48 am wrote:
> Adrian (a.delete@this.acm.org) on November 5, 2022 2:00 am wrote:
>
> [...]
> > This method would be optimal, by requiring the least hardware, for code that only uses 512-bit instructions
> > without interleaving them with 256-bit operations (which would be a very stupid programming style).
>
> SMT might interleave instruction streams that use vectors of different width, even
> when each individual program is using a single fixed vector width exclusively.
Good point.
I am still not convinced that improving the throughput for this unusual workload is worth extra hardware.