Barcelona optimization guide

By: EduardoS (no.delete@this.spam.com), May 16, 2007 6:57 am
>Let's distinguish the questions to the specific instructions:
>
>unsigned integer 64 bits multiply x 64 bits delivering a 128 bits unsigned integer result.
>
>Throughput of 1: how must I interpret that?
>
>Does it mean that while those 5 cycles of the multiply get spent, the CPU CAN
>practically execute 4 x 2 = 8 uops of 64 bits unsigned integer bitfiddling in the other 2 execution units?
>
>So in short the improvement in Barcelona over the currently sold K8 is that it
>no longer blocks other execution units during a part of its execution?
>
>Is it correct that Barcelona improved here over current K8?

Let's say... The full multiplication is issued to the first ALU, and that ALU (and only that ALU) will be "locked" for 2 clocks (a throughput of 0.5 per clock); after that it can receive any instruction which has its operands ready.
So, for example, a multiplication on the first clock, a new one on the third, one more on the fifth, etc.
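
The issue pattern above can be sketched as a small model. This is only an illustration, not a measurement: it assumes the ALU is occupied for 2 clocks per full multiply (as stated above) and takes the 5-clock latency from the quoted question. The function name `mul_schedule` is made up for the example.

```python
def mul_schedule(count, issue_interval=2, latency=5, first_clock=1):
    """Return (issue_clock, result_clock) pairs for independent full multiplies.

    issue_interval: clocks the first ALU stays "locked" per multiply (2, per the post).
    latency: clocks until the result is available (5, taken from the quoted question).
    """
    schedule = []
    for i in range(count):
        issue = first_clock + i * issue_interval
        schedule.append((issue, issue + latency))
    return schedule


# Multiplies are issued on clocks 1, 3, 5, ... as described above.
for issue, ready in mul_schedule(3):
    print(f"issued on clock {issue}, result available on clock {ready}")
```

The model only holds when the multiplies are independent of each other; a chain where each multiply needs the previous result would be limited by the latency instead.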

>secondly there is the different SIMD instructions.
>When executing a SIMD instruction, for example the one that multiplies 2 doubles
>* 2 doubles == 2 doubles (a double being a 64 bits floating point with 52+ bits
>mantissa), which is so crucial for highend calculations, such as research to find
>new medicines (eating an impressive 0.5% of system time at big supercomputers).
>
>That multiplication eats 5 cycles, and 2 cycles out of those 5 cycles the cpu totally
>blocks other instructions getting executed, so here we can execute 3 x 2 = 6 uops
>to integer instructions while that SIMD multiply occurs.
>
>Do I formulate it correctly like this?

During this period (2 clocks) the other units will be able to execute... 4 uops... or 16 uops (2 in each ALU, AGU, FADD and FMISC: 3xALU + 3xAGU + 1xFADD + 1xFMISC) if you ignore decoding and retirement.
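
The arithmetic behind the 16 uops figure can be written out as below. It is a rough upper bound under the same simplification as in the text (one uop per unit per clock, decoding and retirement ignored); the names `OTHER_UNITS` and `peak_uops` are invented for this sketch.

```python
# Execution units other than the multiplier, as listed in the post.
OTHER_UNITS = {"ALU": 3, "AGU": 3, "FADD": 1, "FMISC": 1}


def peak_uops(clocks, units=OTHER_UNITS):
    """Upper bound on uops: one per unit per clock, ignoring decode and retirement."""
    return clocks * sum(units.values())


print(peak_uops(2))  # 2 clocks x 8 units = 16
```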
The packed floating point multiplication generates 2 macro-ops on K8 (one on Barcelona), and the floating point multiplication unit can receive a new macro-op every clock, although the result will only be ready 4 clocks later.
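
To see what the macro-op difference means for a stream of packed multiplies, here is a rough sketch. It assumes the multiplies are independent, that the multiplication unit accepts one macro-op per clock, that results are ready 4 clocks after issue, and that nothing else stalls; `fmul_finish_clock` is a hypothetical helper, not anything from AMD's documentation.

```python
def fmul_finish_clock(packed_muls, macro_ops_per_mul, latency=4):
    """Clock on which the last result is ready for independent packed multiplies.

    Assumes the FMUL unit accepts one macro-op per clock and nothing else stalls.
    """
    issue_clocks = packed_muls * macro_ops_per_mul
    # The last macro-op issues on clock `issue_clocks`; its result is ready
    # `latency` clocks after that.
    return issue_clocks + latency


k8 = fmul_finish_clock(100, macro_ops_per_mul=2)         # 2 macro-ops per packed multiply
barcelona = fmul_finish_clock(100, macro_ops_per_mul=1)  # 1 macro-op per packed multiply
print(k8, barcelona)
```

For a long enough stream the latency term becomes negligible, so under these assumptions the packed multiply throughput is about doubled on Barcelona.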