By: NoSpammer (no.delete@this.spam.com), August 26, 2015 6:33 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 26, 2015 4:23 am wrote:
> As to mentioning AVX-1024 for current hardware, that's the whole point! They should (IMHO)
> have skipped AVX512 completely and to jump straight from AVX256 to at least AVX1024.
It's understandable that Intel was trying to be pragmatic here: a 512-bit register is exactly 64 bytes, which matches the cache line size of mainstream CPUs.
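To make the cache-line point concrete, here is a minimal sketch that counts how many 64-byte lines a single vector load of a given width touches. The function name and the 64-byte line size are assumptions for illustration; it models addresses only and ignores TLBs, prefetchers and split-load penalties.

```python
CACHE_LINE_BYTES = 64  # assumed line size, typical of mainstream x86 CPUs


def lines_touched(address: int, vector_bits: int,
                  line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Hypothetical helper: number of distinct cache lines covered by a
    vector load of `vector_bits` bits starting at byte `address`."""
    width_bytes = vector_bits // 8
    first_line = address // line_bytes
    last_line = (address + width_bytes - 1) // line_bytes
    return last_line - first_line + 1


# Compare a 64-byte-aligned load (offset 0) with a 32-byte-aligned one (offset 32).
for bits in (256, 512, 1024):
    print(bits, lines_touched(0, bits), lines_touched(32, bits))
```

Under these assumptions an aligned 512-bit load maps onto exactly one line, while even a perfectly aligned 1024-bit load always needs two.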
> Personally, I'd prefer even wider registers (4096 bit sounds about right) with exactly the same ISA
> supported over all market segments with varying width of actual execution units. Something like 64b
> DP/128b SP execution units on phones/tablets, 256b DP/SP on mainstream laptop/desktop/E3s, either
> 128b or 256b on E5/E7, 1024b DP/SP on HPC parts, 1024b DP/2048b SP on imaging/military parts.
> However, if all we are looking for is an HPC competitor for Maxwell-based Teslas, then straight-forward
> 1024-bit registers backed by 1024-bit execution units will do the trick.
If 1024-bit execution is such a perfect sweet spot, then two 512-bit execution units will do the trick, too, won't they?
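The throughput argument is simple arithmetic. A rough sketch of peak double-precision FLOPs per cycle, assuming an idealized model where every unit issues one FMA per cycle (counted as two FLOPs) and ignoring register-file ports, decode width, clocks and memory bandwidth; the function name is hypothetical:

```python
def peak_dp_flops_per_cycle(units: int, unit_bits: int, fma: bool = True) -> int:
    """Idealized peak DP FLOPs per cycle for `units` SIMD units of `unit_bits` width."""
    lanes = unit_bits // 64          # 64-bit double-precision lanes per unit
    ops_per_lane = 2 if fma else 1   # an FMA counts as one multiply plus one add
    return units * lanes * ops_per_lane


one_wide = peak_dp_flops_per_cycle(units=1, unit_bits=1024)
two_narrow = peak_dp_flops_per_cycle(units=2, unit_bits=512)
print(one_wide, two_narrow)
```

In this model both configurations reach 32 DP FLOPs per cycle; the differences lie in what the model leaves out, such as instruction count and register pressure.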