By: Brett (ggtgp.delete@this.yahoo.com), July 27, 2022 11:07 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on July 27, 2022 8:15 am wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on July 26, 2022 2:27 pm wrote:
> > Bill G (bill.delete@this.g.com) on July 25, 2022 10:06 pm wrote:
> > > Adrian (a.delete@this.acm.org) on July 16, 2022 11:19 pm wrote:
> > > > for the computations dominated by the use of floating-point numbers or big integer numbers,
> > > > the AMD and Intel CPUs remain without competition (except from GPUs).
> > >
> > > What is it about AMD and Intel CPUs that make them good for computations on big integers? ARM
> > > has add, subtract with carry flag instructions (ADC, SBC) and multiply, multiply-accumulate producing
> > > the full width product (MULL, MLAL). x86 has similar instructions (ADCX, ADOX, MULX).
> >
> > If I wanted the equivalent of the ADCX ADOX instruction on ARM I'd probably
> > go for an instruction that added three integers and produced two integers
> > - the sum and a high part - and not bother with the flags register at all.
>
>
> That is certainly the best instruction from the POV of a software developer.
>
> In general, all the base operations needed to implement big number
> computations have 3 input operands and 2 output operands.
>
> For addition/subtraction, at input you have 2 word operands and
> carry in, while at output you have 1 word result and carry out.
>
> For multiplication, at input you have 2 words that are multiplied and 1 word that is
> added, while at output you have 2 words, for the low and high parts of the product.
>
> For division, at input you have 2 words that are concatenated into the dividend and 1 word
> as the divisor, while at output you have 2 words, the quotient and the remainder.
>
> For an optimal implementation of the big number algorithms, the best
> is for all 5 operands to be arbitrary general-purpose registers.
>
>
> However, most CPU or ISA designers have been reluctant to provide 5 register fields in the instruction
> encoding and/or 5 ports in the register file, so most ISA's use various tricks like implicit registers
> or shared input/output registers, in order to have less than 5 explicit registers per instruction.
MulHigh gives the extra result register and is combined with the previous MulLow instruction.
> dmcq (dmcq.delete@this.fano.co.uk) on July 26, 2022 2:27 pm wrote:
> > Bill G (bill.delete@this.g.com) on July 25, 2022 10:06 pm wrote:
> > > Adrian (a.delete@this.acm.org) on July 16, 2022 11:19 pm wrote:
> > > > for the computations dominated by the use of floating-point numbers or big integer numbers,
> > > > the AMD and Intel CPUs remain without competition (except from GPUs).
> > >
> > > What is it about AMD and Intel CPUs that make them good for computations on big integers? ARM
> > > has add, subtract with carry flag instructions (ADC, SBC) and multiply, multiply-accumulate producing
> > > the full width product (MULL, MLAL). x86 has similar instructions (ADCX, ADOX, MULX).
> >
> > If I wanted the equivalent of the ADCX ADOX instruction on ARM I'd probably
> > go for an instruction that added three integers and produced two integers
> > - the sum and a high part - and not bother with the flags register at all.
>
>
> That is certainly the best instruction from the POV of a software developer.
>
> In general, all the base operations needed to implement big number
> computations have 3 input operands and 2 output operands.
>
> For addition/subtraction, at input you have 2 word operands and
> carry in, while at output you have 1 word result and carry out.
>
> For multiplication, at input you have 2 words that are multiplied and 1 word that is
> added, while at output you have 2 words, for the low and high parts of the product.
>
> For division, at input you have 2 words that are concatenated into the dividend and 1 word
> as the divisor, while at output you have 2 words, the quotient and the remainder.
>
> For an optimal implementation of the big number algorithms, the best
> is for all 5 operands to be arbitrary general-purpose registers.
>
>
> However, most CPU or ISA designers have been reluctant to provide 5 register fields in the instruction
> encoding and/or 5 ports in the register file, so most ISA's use various tricks like implicit registers
> or shared input/output registers, in order to have less than 5 explicit registers per instruction.
MulHigh gives the extra result register and is combined with the previous MulLow instruction.
Topic | Posted By | Date |
---|---|---|
Yitian 710 | anonymous2 | 2021/10/20 08:57 PM |
Yitian 710 | Adrian | 2021/10/21 12:20 AM |
Yitian 710 | Wilco | 2021/10/21 03:47 AM |
Yitian 710 | Rayla | 2021/10/21 05:52 AM |
Yitian 710 | Wilco | 2021/10/21 11:59 AM |
Yitian 710 | anon2 | 2021/10/21 05:16 PM |
Yitian 710 | Wilco | 2022/07/16 12:21 PM |
Yitian 710 | Anon | 2022/07/16 08:22 PM |
Yitian 710 | Rayla | 2022/07/17 09:10 AM |
Yitian 710 | Anon | 2022/07/17 12:04 PM |
Yitian 710 | Rayla | 2022/07/17 12:08 PM |
Yitian 710 | Wilco | 2022/07/17 01:16 PM |
Yitian 710 | Anon | 2022/07/17 01:32 PM |
Yitian 710 | Wilco | 2022/07/17 02:22 PM |
Yitian 710 | Anon | 2022/07/17 02:47 PM |
Yitian 710 | Wilco | 2022/07/17 03:50 PM |
Yitian 710 | Anon | 2022/07/17 08:46 PM |
Yitian 710 | Wilco | 2022/07/18 03:01 AM |
Yitian 710 | Anon | 2022/07/19 11:21 AM |
Yitian 710 | Wilco | 2022/07/19 06:15 PM |
Yitian 710 | Anon | 2022/07/21 01:25 AM |
Yitian 710 | none | 2022/07/21 01:49 AM |
Yitian 710 | Anon | 2022/07/21 03:03 AM |
Yitian 710 | none | 2022/07/21 04:34 AM |
Yitian 710 | James | 2022/07/21 02:29 AM |
Yitian 710 | Anon | 2022/07/21 03:05 AM |
Yitian 710 | Wilco | 2022/07/21 04:31 AM |
Yitian 710 | Anon | 2022/07/21 05:17 AM |
Yitian 710 | Wilco | 2022/07/21 05:33 AM |
Yitian 710 | Anon | 2022/07/21 05:50 AM |
Yitian 710 | Wilco | 2022/07/21 06:07 AM |
Yitian 710 | Anon | 2022/07/21 06:20 AM |
Yitian 710 | Wilco | 2022/07/21 10:02 AM |
Yitian 710 | Anon | 2022/07/21 10:22 AM |
Yitian 710 | Adrian | 2022/07/17 11:09 PM |
Yitian 710 | Wilco | 2022/07/18 01:15 AM |
Yitian 710 | Adrian | 2022/07/18 02:35 AM |
Yitian 710 | Adrian | 2022/07/16 11:19 PM |
Computations on Big Integers | Bill G | 2022/07/25 10:06 PM |
Computations on Big Integers | none | 2022/07/25 11:35 PM |
x86 MUL 64x64 | Eric Fink | 2022/07/26 01:06 AM |
x86 MUL 64x64 | Adrian | 2022/07/26 02:27 AM |
x86 MUL 64x64 | none | 2022/07/26 02:38 AM |
x86 MUL 64x64 | Jörn Engel | 2022/07/26 10:17 AM |
x86 MUL 64x64 | Linus Torvalds | 2022/07/27 10:13 AM |
x86 MUL 64x64 | ⚛ | 2022/07/28 09:40 AM |
x86 MUL 64x64 | Jörn Engel | 2022/07/28 10:18 AM |
More than 3 registers per instruction | -.- | 2022/07/28 07:01 PM |
More than 3 registers per instruction | Anon | 2022/07/28 10:39 PM |
More than 3 registers per instruction | Jörn Engel | 2022/07/28 10:42 PM |
More than 3 registers per instruction | -.- | 2022/07/29 04:31 AM |
Computations on Big Integers | Bill G | 2022/07/26 01:40 AM |
Computations on Big Integers | none | 2022/07/26 02:17 AM |
Computations on Big Integers | Bill G | 2022/07/26 03:52 AM |
Computations on Big Integers | --- | 2022/07/26 09:57 AM |
Computations on Big Integers | Adrian | 2022/07/26 02:53 AM |
Computations on Big Integers | Bill G | 2022/07/26 03:39 AM |
Computations on Big Integers | Adrian | 2022/07/26 04:21 AM |
Computations on Big Integers in Apple AMX Units | Bill G | 2022/07/26 04:28 AM |
Computations on Big Integers in Apple AMX Units | Adrian | 2022/07/26 05:13 AM |
Typo | Adrian | 2022/07/26 05:20 AM |
IEEE binary64 is 53 bits rather than 52. (NT) | Michael S | 2022/07/26 05:34 AM |
IEEE binary64 is 53 bits rather than 52. | Adrian | 2022/07/26 07:32 AM |
IEEE binary64 is 53 bits rather than 52. | Michael S | 2022/07/26 10:02 AM |
IEEE binary64 is 53 bits rather than 52. | Adrian | 2022/07/27 06:58 AM |
IEEE binary64 is 53 bits rather than 52. | none | 2022/07/27 07:14 AM |
IEEE binary64 is 53 bits rather than 52. | Adrian | 2022/07/27 07:55 AM |
Thanks a lot for the link to the article! (NT) | none | 2022/07/27 08:09 AM |
Typo | zArchJon | 2022/07/26 09:51 AM |
Typo | Michael S | 2022/07/26 10:25 AM |
Typo | zArchJon | 2022/07/26 11:52 AM |
Typo | Michael S | 2022/07/26 01:02 PM |
Computations on Big Integers | Michael S | 2022/07/26 05:55 AM |
Computations on Big Integers | Adrian | 2022/07/26 07:59 AM |
IFMA and Division | Bill G | 2022/07/26 04:25 PM |
IFMA and Division | rwessel | 2022/07/26 08:16 PM |
IFMA and Division | Adrian | 2022/07/27 07:25 AM |
Computations on Big Integers | none | 2022/07/27 01:22 AM |
Big integer multiplication with vector IFMA | Bill G | 2022/07/29 01:06 AM |
Big integer multiplication with vector IFMA | Adrian | 2022/07/29 01:35 AM |
Big integer multiplication with vector IFMA | -.- | 2022/07/29 04:32 AM |
Big integer multiplication with vector IFMA | Adrian | 2022/07/29 09:47 PM |
Big integer multiplication with vector IFMA | Anon | 2022/07/30 08:12 AM |
Big integer multiplication with vector IFMA | Adrian | 2022/07/30 09:27 AM |
AVX-512 unfriendly to heter-performance cores | Paul A. Clayton | 2022/07/31 03:20 PM |
AVX-512 unfriendly to heter-performance cores | Anon | 2022/07/31 03:33 PM |
AVX-512 unfriendly to heter-performance cores | anonymou5 | 2022/07/31 05:03 PM |
AVX-512 unfriendly to heter-performance cores | Brett | 2022/07/31 07:26 PM |
AVX-512 unfriendly to heter-performance cores | Adrian | 2022/08/01 01:45 AM |
Why can't E-cores have narrow/slow AVX-512? (NT) | anonymous2 | 2022/08/01 03:37 PM |
Why can't E-cores have narrow/slow AVX-512? | Ivan | 2022/08/02 12:09 AM |
Why can't E-cores have narrow/slow AVX-512? | anonymou5 | 2022/08/02 10:13 AM |
Why can't E-cores have narrow/slow AVX-512? | Dummond D. Slow | 2022/08/02 03:02 PM |
AVX-512 unfriendly to heter-performance cores | Paul A. Clayton | 2022/08/02 01:19 PM |
AVX-512 unfriendly to heter-performance cores | Anon | 2022/08/02 09:09 PM |
AVX-512 unfriendly to heter-performance cores | Adrian | 2022/08/03 12:50 AM |
AVX-512 unfriendly to heter-performance cores | Anon | 2022/08/03 09:15 AM |
AVX-512 unfriendly to heter-performance cores | -.- | 2022/08/03 08:17 PM |
AVX-512 unfriendly to heter-performance cores | Anon | 2022/08/03 09:02 PM |
IFMA: empty promises from Intel as usual | Kent R | 2022/07/29 07:15 PM |
No hype lasts forever | Anon | 2022/07/30 08:06 AM |
Big integer multiplication with vector IFMA | me | 2022/07/30 09:15 AM |
Computations on Big Integers | --- | 2022/07/26 09:48 AM |
Computations on Big Integers | none | 2022/07/27 01:10 AM |
Computations on Big Integers | --- | 2022/07/28 11:43 AM |
Computations on Big Integers | --- | 2022/07/28 06:44 PM |
Computations on Big Integers | dmcq | 2022/07/26 02:27 PM |
Computations on Big Integers | Adrian | 2022/07/27 08:15 AM |
Computations on Big Integers | Brett | 2022/07/27 11:07 AM |
Yitian 710 | Wes Felter | 2021/10/21 12:51 PM |
Yitian 710 | Adrian | 2021/10/21 01:25 PM |
Yitian 710 | Anon | 2021/10/21 06:08 AM |
Strange definition of the word single. (NT) | anon2 | 2021/10/21 05:00 PM |
AMD Epyc uses chiplets. This is why "strange"? | Mark Roulo | 2021/10/21 05:08 PM |
AMD Epyc uses chiplets. This is why "strange"? | anon2 | 2021/10/21 05:34 PM |
Yeah. Blame spec.org, too, though! | Mark Roulo | 2021/10/21 05:58 PM |
Yeah. Blame spec.org, too, though! | anon2 | 2021/10/21 08:07 PM |
Yeah. Blame spec.org, too, though! | Björn Ragnar Björnsson | 2022/07/17 06:23 AM |
Yeah. Blame spec.org, too, though! | Rayla | 2022/07/17 09:13 AM |
Yeah. Blame spec.org, too, though! | Anon | 2022/07/17 12:01 PM |