Coding Challenge I


Assembler Solution from RU

This is an optimized assembly version of the pi.c program, which is available at http://www.tempest-sw.com/benchmark/. This program REQUIRES SSE and does NOT check that it exists. The code was written by [RU] and submitted to the code challenge at http://www.realworldtech.com, which required the program to run on P3/P4/AthlonXP, so SSE is *assumed* to be present. An exception WILL occur if SSE is not present.
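The listing deliberately skips this check. For comparison, here is a minimal sketch of what such a check could look like in C. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`. The names `cpu_has_sse` and `SKETCH_ON_X86` are hypothetical and are not part of the original program. CPUID leaf 1 reports SSE support in bit 25 of EDX.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#define SKETCH_ON_X86 1
#else
#define SKETCH_ON_X86 0
#endif

/* Returns 1 if CPUID leaf 1 reports SSE (EDX bit 25), else 0.
   Only CPU support is checked; whether the operating system has
   enabled SSE state saving is not verified here. */
static int cpu_has_sse(void) {
#if SKETCH_ON_X86
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 (or CPUID itself) is unavailable */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 25) & 1u);
#else
    return 0; /* not an x86 target, so no SSE */
#endif
}
```

A program like this one could call `cpu_has_sse()` right after startup and print an error message instead of faulting on the first SSE instruction.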

Major changes include:

  • Use of floating-point calculations instead of integer arithmetic. (This should be equally accurate until the number of digits gets very large.)
  • a[] from the C code was removed, and q was made into an array instead. This means a smaller array is maintained: q needs one entry per digit, whereas a[] needs about 10/3 entries per digit.
  • A large portion of the code was unrolled to allow vectorization and better scheduling.
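The C sketch below illustrates the listed changes. It assumes the textbook Rabinowitz-Wagon spigot; pi.c itself is not reproduced here. The sketch does three things:

- It interchanges the spigot's two loops, so that q becomes the per-digit array and each position keeps only a scalar remainder.
- It takes each quotient from a single-precision divide plus an epsilon, then computes the remainder exactly in integers.
- It computes alength with the multiply-by-0xAAAAAAAB trick that the listing's prologue uses in place of a divide.

The interchange also explains why vectorization becomes possible. Several positions i can be updated at once if each lane lags one digit behind the previous lane. The listing's generated code appears to do this with twelve positions (three xmm registers of four lanes). All function names, `LANES = 4` and the `1e-6f` epsilon are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LANES 4 /* hypothetical lane count; the listing appears to use 12 */

/* alength = (10*n)/3 without a divide, as in the listing's prologue:
   0xAAAAAAAB == (2^33 + 1) / 3, so (x * 0xAAAAAAAB) >> 33 == x / 3
   for every 32-bit x. */
static long alength_magic(uint32_t n) {
    uint32_t x = 10u * n;
    return (long)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Divisor for position i; i == 1 folds in the classic final "q mod 10"
   step, so every position can share one update rule. */
static long divisor(long i) { return i == 1 ? 10 : 2 * i - 1; }

/* One update of position i for one digit. The quotient comes from a
   single-precision divide plus a small epsilon (1e-6f is an assumed
   value for the listing's "millionth"); the remainder stays exact. */
static void step(long i, long *a, long *qj) {
    long d = divisor(i);
    long x = 10 * *a + *qj * i;
    long quot = (long)((float)x / (float)d + 1e-6f); /* truncates, like cvttss2si */
    *a = x - quot * d;
    *qj = quot;
}

/* Loops interchanged: outer over positions i, inner over digits j.
   Only q[0..n-1] is stored; each position keeps a scalar remainder. */
static void spigot_plain(int n, long *q) {
    for (long i = alength_magic((uint32_t)n); i >= 1; i--) {
        long a = 2;
        for (int j = 0; j < n; j++) step(i, &a, &q[j]);
    }
}

/* Same arithmetic, skewed: lane k works on position i-k and digit t-k,
   so the LANES updates of one time step t touch distinct q[j] entries
   and could be done with packed SIMD instructions. */
static void spigot_wavefront(int n, long *q) {
    long i = alength_magic((uint32_t)n);
    for (; i >= LANES; i -= LANES) {
        long a[LANES];
        for (int k = 0; k < LANES; k++) a[k] = 2;
        for (int t = 0; t < n + LANES - 1; t++) {
            for (int k = 0; k < LANES; k++) {
                int j = t - k;
                if (j >= 0 && j < n) step(i - k, &a[k], &q[j]);
            }
        }
    }
    for (; i >= 1; i--) { /* leftover positions, one at a time */
        long a = 2;
        for (int j = 0; j < n; j++) step(i, &a, &q[j]);
    }
}

/* Releases digits from q[] (values 0..10) with the usual predigit/nines
   carry handling. Returns a malloc'd string of n digits; caller frees. */
char *pi_digits(int n, int wavefront) {
    long *q = calloc((size_t)n + 1, sizeof *q);
    char *out = malloc((size_t)n + 2);
    if (q == NULL || out == NULL) { free(q); free(out); return NULL; }
    if (wavefront) spigot_wavefront(n, q); else spigot_plain(n, q);
    int k = 0, nines = 0;
    long pre = 0;
    for (int j = 0; j < n; j++) {
        if (q[j] == 9) { nines++; continue; }
        if (q[j] == 10) {            /* carry: predigit+1, held 9s become 0s */
            out[k++] = (char)('0' + pre + 1);
            for (; nines > 0; nines--) out[k++] = '0';
            pre = 0;
        } else {
            out[k++] = (char)('0' + pre);
            for (; nines > 0; nines--) out[k++] = '9';
            pre = q[j];
        }
    }
    out[k++] = (char)('0' + pre);
    for (; nines > 0; nines--) out[k++] = '9';
    out[k] = '\0';
    free(q);
    memmove(out, out + 1, (size_t)k); /* drop the leading predigit 0 */
    return out;
}
```

Calling `pi_digits(30, 0)` and `pi_digits(30, 1)` should give the same string, because both orders perform exactly the same updates. Only the final digit of a run is unreliable, since its carry has not been resolved yet.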

The code could be optimized further by:
  • vectorizing/scheduling the remaining dependent divisions {~5%, I'd guess}
  • using the integer division resource when available, since right now the vectorized loop uses only the SSE division resource (between 1/16 and 1/12, I would guess) {~5-10%}
  • prefetching, to speed up large cases whose data spills out of the caches into main memory (the speedup depends on memory, cache, CPU speed…) {???}
  • using SSE2 (double-precision) division to go back to exact integer results, which would lose speed because of the greater precision {~-10% on P4, i.e. about 10% slower}
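Two points in this page hinge on the same question: when does a truncated floating-point quotient equal exact integer division? One is the accuracy caveat on the floating-point change; the other is the SSE2 suggestion. The sketch below assumes that the listing's `millionth` constant holds 1e-6 and that float arithmetic is done in single precision, as SSE does it. The function names are hypothetical.

```c
#include <stdint.h>

/* Quotient the way the listing computes it: single-precision divide, add a
   small epsilon, truncate. 1e-6f is an assumed value for "millionth". */
static uint32_t quot_float(uint32_t x, uint32_t p) {
    float qf = (float)x / (float)p + 1e-6f;
    return (uint32_t)qf;
}

/* SSE2-style alternative: a double-precision divide needs no epsilon,
   because any 32-bit x and p are exact in a double and the rounding
   error cannot carry the result across an integer boundary. */
static uint32_t quot_double(uint32_t x, uint32_t p) {
    return (uint32_t)((double)x / (double)p);
}

/* Exhaustively compare quot_float with exact division for x < xmax and
   odd p < pmax (the listing's divisors p = 2i-1 are odd). */
static long float_mismatches(uint32_t xmax, uint32_t pmax) {
    long bad = 0;
    for (uint32_t p = 1; p < pmax; p += 2)
        for (uint32_t x = 0; x < xmax; x++)
            if (quot_float(x, p) != x / p) bad++;
    return bad;
}

/* Compare quot_double with exact division on reproducible pseudo-random
   32-bit pairs (a plain LCG), alternating large and small divisors. */
static long double_mismatches(int samples) {
    uint32_t s = 12345u;
    long bad = 0;
    for (int k = 0; k < samples; k++) {
        s = s * 1664525u + 1013904223u;
        uint32_t x = s;
        s = s * 1664525u + 1013904223u;
        uint32_t p = (k & 1) ? (s | 1u) : ((s >> 20) | 1u);
        if (quot_double(x, p) != x / p) bad++;
    }
    return bad;
}
```

Under these assumptions, the single-precision version agrees with exact division on the small range tested. However, `quot_float(4000001, 2000001)` returns 2, while `4000001 / 2000001` is 1. Once 1/p falls below the epsilon, the epsilon alone can push the result past the next integer. The operands must also stay below 2^24 to be exact in a float. The double version has neither problem for 32-bit operands, at the speed cost the list estimates.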

format PE console
entry start

macro align value { rb (value-1) - (rva $ + value-1) mod value }

section '.code' code readable executable

start:
        push 0                          ;return value for program
        push 0FFFFFFF5h                 ;STD_OUTPUT_HANDLE
        call [GetStdHandle]
        mov [stdOut], eax
        call [GetCommandLine]
        call processCMDLN               ;eax gets filename,
                                        ;[numdigits] gets number of digits
        push 0                          ;handle to template file
        push 0                          ;file attributes
        push 2                          ;how to create (CREATE_ALWAYS)
        push 0                          ;SD
        push 0                          ;share mode
        push 40000000h                  ;access mode (GENERIC_WRITE)
        push eax                        ;file name
        call [CreateFile]
        mov [fptr], eax
        mov edx, [numdigits]
        add edx, 512
        lea edx, [edx*4+edx]            ;bytes to allocate: (numdigits+512)*5
        push 4                          ;flProtect (read-write)
        push 1000h                      ;flAllocationType (commit)
        push edx                        ;size desired
        push 0                          ;lpAddress
        call [VirtualAlloc]
        or eax, eax
        jz not_enough_memory
        push eax                        ;push address to deallocate later
        mov [pi], eax
        mov ebx, [numdigits]            ;ebx will be used to calc alength
        mov edx, eax                    ;edx will be used to calc q[]
        mov edi, ebx                    ;numdigits for later use
        mov [store_esp], esp            ;save esp: the generated code below reuses esp as a scratch register
        add eax, ebx
        add ebx, ebx
        add edx, 1039                   ;space for 1K + 15B (for 16B boundary)
        mov byte [eax+1], 10            ;new line at pi[numdigits+1]
        mov eax, 0aaaaaaabh             ;used to divide by 3 (step 1)
        lea ebx, [ebx*4+ebx]
        and edx, 0fffffff0h             ;align to 16B boundary
        mov [q], edx                    ;store address of q[]
        mov esi, edx
        mul ebx                         ;used to divide by 3 (step 2)
        lea esi, [esi+edi*4]            ;&q[numdigits]
        shr edx, 1                      ;used to divide by 3 (step 3)
        mov [alength], edx              ;store alength (numdigits*10)/3
        mov [loopterm], esi             ;store loop termination
        lea ebx, [edx*2-1]              ;p=alength*2-1
        mov edi, edx
        ;At this point, registers map to variables as follows:
        ; edi : i (not stored)
        ; ebx : p (not stored)
        ;Between this point and the "doneloop#" label is automatically generated code.
        ;As a result there are no comments, but it is essentially an unrolled,
        ; vectorized floating-point version of the original.
        cmp edi, 40
        jl doneloop12
        movaps xmm7, dqword [millionth]
        sub dword [loopterm], 44
        mov eax, ebx
        mov esp, 2
        mov [p0+12], eax
        mov [i0+12], edi
        sub eax, esp
        dec edi
        mov [p0+8], eax
        mov [i0+8], edi
        sub eax, esp
        dec edi
        mov [p0+4], eax
        mov [i0+4], edi
        sub eax, esp
        dec edi
        mov [p0], eax
        mov [i0], edi
        cvtpi2ps xmm0, qword [p0+8]
        cvtpi2ps xmm1, qword [i0+8]
        movlhps xmm0, xmm0
        movlhps xmm1, xmm1
        cvtpi2ps xmm0, qword [p0]
        cvtpi2ps xmm1, qword [i0]
        movaps dqword [p0], xmm0
        movaps dqword [i0], xmm1
        sub eax, esp
        dec edi
        mov [p4+12], eax
        mov [i4+12], edi
        sub eax, esp
        dec edi
        mov [p4+8], eax
        mov [i4+8], edi
        sub eax, esp
        dec edi
        mov [p4+4], eax
        mov [i4+4], edi
        sub eax, esp
        dec edi
        mov [p4], eax
        mov [i4], edi
        cvtpi2ps xmm0, qword [p4+8]
        cvtpi2ps xmm1, qword [i4+8]
        movlhps xmm0, xmm0
        movlhps xmm1, xmm1
        cvtpi2ps xmm0, qword [p4]
        cvtpi2ps xmm1, qword [i4]
        movaps dqword [p4], xmm0
        movaps dqword [i4], xmm1
        sub eax, esp
        dec edi
        mov [p8+12], eax
        mov [i8+12], edi
        sub eax, esp
        dec edi
        mov [p8+8], eax
        mov [i8+8], edi
        sub eax, esp
        dec edi
        mov [p8+4], eax
        mov [i8+4], edi
        sub eax, esp
        dec edi
        mov [p8], eax
        mov [i8], edi
        cvtpi2ps xmm0, qword [p8+8]
        cvtpi2ps xmm1, qword [i8+8]
        movlhps xmm0, xmm0
        movlhps xmm1, xmm1
        cvtpi2ps xmm0, qword [p8]
        cvtpi2ps xmm1, qword [i8]
        movaps dqword [p8], xmm0
        movaps dqword [i8], xmm1
forI12:
        movaps xmm0, dqword [twenty]
        movaps dqword [a0], xmm0
        movaps dqword [a4], xmm0
        movaps dqword [a8], xmm0
        mov esi, dword [q]
        mov edi, 2
        mov ebp, ebx
        cvtsi2ss xmm1, dword [a0+12]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i0+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        cvtsi2ss xmm1, dword [a0+12]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i0+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        cvtsi2ss xmm1, dword [a0+8]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i0+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        mov ebp, ebx
        cvtsi2ss xmm1, dword [a0+12]
        cvtsi2ss xmm0, dword [esi+8]
        mulss xmm0, dword [i0+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+8], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        cvtsi2ss xmm1, dword [a0+8]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i0+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        cvtsi2ss xmm1, dword [a0+4]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i0+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+8]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+0]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+4]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+12]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+4]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+4], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+4]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a4+12]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i4+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+16]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+8]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a4+12]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i4+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        cvtsi2ss xmm1, dword [a4+8]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i4+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+20]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+12]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+24]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a4+12]
        cvtsi2ss xmm0, dword [esi+8]
        mulss xmm0, dword [i4+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+8], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        cvtsi2ss xmm1, dword [a4+8]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i4+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        cvtsi2ss xmm1, dword [a4+4]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i4+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+24]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+16]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+28]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+24]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+8]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+0]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        mov edi, 2
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+4]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+28]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+20]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+32]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+28]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+24]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+12]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+4]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+4], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        mov edi, 2
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+4]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a8+12]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i8+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+32]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+24]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+32], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+36]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+32]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+28]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+24]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+16]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+8]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        mov edi, 2
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+8]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a8+12]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i8+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        cvtsi2ss xmm1, dword [a8+8]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i8+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        mov ebp, ebx
        cvtpi2ps xmm1, qword [a0+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a0]
        cvtpi2ps xmm0, qword [esi+36]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+28]
        mulps xmm0, dqword [i0]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a0], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a0+8], mm2
        divps xmm0, dqword [p0]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+36], mm0
        mov edi, 2
        mov eax, ebp
        mul dword [esi+40]
        mov ecx, dword [a0+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+36]
        mov ecx, dword [a0+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+32]
        mov ecx, dword [a0+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+28]
        mov ecx, dword [a0]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+20]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+12]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        mov edi, 2
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+24]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+20]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+16]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebp, edi
        mov eax, ebp
        mul dword [esi+12]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        sub ebp, edi
        cvtsi2ss xmm1, dword [a8+12]
        cvtsi2ss xmm0, dword [esi+8]
        mulss xmm0, dword [i8+12]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+12]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+8], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        cvtsi2ss xmm1, dword [a8+8]
        cvtsi2ss xmm0, dword [esi+4]
        mulss xmm0, dword [i8+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi+4], eax
        mul ebp
        sub ebp, edi
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        cvtsi2ss xmm1, dword [a8+4]
        cvtsi2ss xmm0, dword [esi]
        mulss xmm0, dword [i8+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov [esi], eax
        mul ebp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        mov ebp, ebx
forJ12:
        cvtpi2ps xmm0, qword [esi+8]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi]
        cvtpi2ps xmm1, qword [esi+24]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [esi+16]
        cvtpi2ps xmm2, qword [esi+40]
        movlhps xmm2, xmm2
        cvtpi2ps xmm2, qword [esi+32]
        mulps xmm0, dqword [i8]
        mulps xmm1, dqword [i4]
        mulps xmm2, dqword [i0]
        cvtpi2ps xmm3, qword [a0+8]
        movlhps xmm3, xmm3
        cvtpi2ps xmm3, qword [a0]
        cvtpi2ps xmm4, qword [a4+8]
        movlhps xmm4, xmm4
        cvtpi2ps xmm4, qword [a4]
        cvtpi2ps xmm5, qword [a8+8]
        movlhps xmm5, xmm5
        cvtpi2ps xmm5, qword [a8]
        addps xmm2, xmm3
        addps xmm1, xmm4
        addps xmm0, xmm5
        movaps xmm3, xmm7
        movaps xmm4, xmm7
        movaps xmm5, xmm7
        addps xmm3, xmm2
        addps xmm4, xmm1
        addps xmm5, xmm0
        cvttps2pi mm0, xmm3
        movq qword [a0], mm0
        movhlps xmm3, xmm3
        cvttps2pi mm1, xmm3
        movq qword [a0+8], mm1
        cvttps2pi mm2, xmm4
        movq qword [a4], mm2
        movhlps xmm4, xmm4
        cvttps2pi mm3, xmm4
        movq qword [a4+8], mm3
        cvttps2pi mm4, xmm5
        movq qword [a8], mm4
        movhlps xmm5, xmm5
        cvttps2pi mm5, xmm5
        movq qword [a8+8], mm5
        divps xmm2, dqword [p0]
        divps xmm1, dqword [p4]
        divps xmm0, dqword [p8]
        addps xmm2, xmm7
        addps xmm1, xmm7
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm1, xmm0
        movq qword [esi+8], mm1
        cvttps2pi mm2, xmm1
        movq qword [esi+16], mm2
        movhlps xmm1, xmm1
        cvttps2pi mm3, xmm1
        movq qword [esi+24], mm3
        cvttps2pi mm4, xmm2
        movq qword [esi+32], mm4
        movhlps xmm2, xmm2
        cvttps2pi mm5, xmm2
        movq qword [esi+40], mm5
        mov esp, 2
        mov eax, ebx
        mul dword [esi+44]
        mov ecx, dword [a0+12]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+12], ecx
        mov eax, ebx
        mul dword [esi+40]
        mov ecx, dword [a0+8]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        mov eax, ebx
        mul dword [esi+36]
        mov ecx, dword [a0+4]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a0]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a4+12]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a4+8]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a4+4]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a4]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        mov eax, ebx
        mul dword [esi+12]
        mov ecx, dword [a8+12]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        mov eax, ebx
        mul dword [esi+8]
        mov ecx, dword [a8+8]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        mov eax, ebx
        mul dword [esi+4]
        mov ecx, dword [a8+4]
        sub ebx, esp
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        mov eax, ebx
        mul dword [esi]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        add esi, 4
        cmp esi, dword [loopterm]
        jl forJ12
        sub ebx, 2
        cvtsi2ss xmm1, dword [a0+8]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i0+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+8], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a0+4]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i0+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a0]
        cvtsi2ss xmm0, dword [esi+32]
        mulss xmm0, dword [i0]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+32], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+24]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+16]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+8]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+0]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+0], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+12]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+8]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+4]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 4
        cvtsi2ss xmm1, dword [a0+4]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i0+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a0]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i0]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+28]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+20]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+12]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+4]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+4], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+12]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+8]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+4]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 6
        cvtsi2ss xmm1, dword [a0]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i0]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p0]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a0], ecx
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+32]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+24]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+32], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+36]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+16]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+8]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+8], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+12]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+8]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 8
        cvtpi2ps xmm1, qword [a4+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a4]
        cvtpi2ps xmm0, qword [esi+36]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+28]
        mulps xmm0, dqword [i4]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a4], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a4+8], mm2
        divps xmm0, dqword [p4]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+36], mm0
        mov edi, 2
        mov eax, ebx
        mul dword [esi+40]
        mov ecx, dword [a4+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+36]
        mov ecx, dword [a4+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a4+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+20]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+12]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+12], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+12]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 10
        cvtsi2ss xmm1, dword [a4+8]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i4+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+8], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a4+4]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i4+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a4]
        cvtsi2ss xmm0, dword [esi+32]
        mulss xmm0, dword [i4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+32], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+24]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+16]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+16], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+16]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 12
        cvtsi2ss xmm1, dword [a4+4]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i4+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a4]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+28]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+20]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+20], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+20]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 14
        cvtsi2ss xmm1, dword [a4]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a4], ecx
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+32]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+24]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+24], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+32], mm0
        mov edi, 2
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+36]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+24]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 16
        cvtpi2ps xmm1, qword [a8+8]
        movlhps xmm1, xmm1
        cvtpi2ps xmm1, qword [a8]
        cvtpi2ps xmm0, qword [esi+36]
        movlhps xmm0, xmm0
        cvtpi2ps xmm0, qword [esi+28]
        mulps xmm0, dqword [i8]
        movaps xmm2, xmm7
        addps xmm0, xmm1
        addps xmm2, xmm0
        cvttps2pi mm2, xmm2
        movq qword [a8], mm2
        movhlps xmm2, xmm2
        cvttps2pi mm2, xmm2
        movq qword [a8+8], mm2
        divps xmm0, dqword [p8]
        addps xmm0, xmm7
        cvttps2pi mm0, xmm0
        movq qword [esi+28], mm0
        movhlps xmm0, xmm0
        cvttps2pi mm0, xmm0
        movq qword [esi+36], mm0
        mov edi, 2
        mov eax, ebx
        mul dword [esi+40]
        mov ecx, dword [a8+12]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+12], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+36]
        mov ecx, dword [a8+8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+32]
        mov ecx, dword [a8+4]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, edi
        mov eax, ebx
        mul dword [esi+28]
        mov ecx, dword [a8]
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 18
        cvtsi2ss xmm1, dword [a8+8]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i8+8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+8], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a8+4]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i8+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a8]
        cvtsi2ss xmm0, dword [esi+32]
        mulss xmm0, dword [i8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+32], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 20
        cvtsi2ss xmm1, dword [a8+4]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i8+4]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8+4]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8+4], ecx
        sub ebx, 2
        cvtsi2ss xmm1, dword [a8]
        cvtsi2ss xmm0, dword [esi+36]
        mulss xmm0, dword [i8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+36], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        mov ebx, ebp
        sub ebx, 22
        cvtsi2ss xmm1, dword [a8]
        cvtsi2ss xmm0, dword [esi+40]
        mulss xmm0, dword [i8]
        movaps xmm2, xmm7
        addss xmm0, xmm1
        addss xmm2, xmm0
        cvttss2si ecx, xmm2
        divss xmm0, dword [p8]
        addss xmm0, xmm7
        cvttss2si eax, xmm0
        mov dword [esi+40], eax
        mul ebx
        sub ecx, eax
        add ecx, ecx
        lea ecx, [ecx*4+ecx]
        mov dword [a8], ecx
        movaps xmm4, dqword [twentyfour]
        movaps xmm5, dqword [twelve]
        movaps xmm0, dqword [p0]
        movaps xmm1, dqword [p4]
        movaps xmm2, dqword [p8]
        subps xmm0, xmm4
        subps xmm1, xmm4
        subps xmm2, xmm4
        movaps dqword [p0], xmm0
        movaps dqword [p4], xmm1
        movaps dqword [p8], xmm2
        movaps xmm0, dqword [i0]
        movaps xmm1, dqword [i4]
        movaps xmm2, dqword [i8]
        subps xmm0, xmm5
        subps xmm1, xmm5
        subps xmm2, xmm5
        movaps dqword [i0], xmm0
        movaps dqword [i4], xmm1
        movaps dqword [i8], xmm2
        sub ebx, 2
        cmp ebx, 24
        jg forI12
        add dword [loopterm], 44
        mov edi, ebx
        inc edi
        shr edi, 1
doneloop12: ;This is the end of the automatically generated code.
;this loop wraps things up (without parallelism, so normal int div is used)
; worst case for wrapping is pretty small at around 10ms
;The code below is perfectly capable of running WITH or WITHOUT the
; automatically generated code.
        cmp edi, 1
        jle doneloopI
loopI:
        mov esp, [q] ;q[0]
        mov ecx, 20 ;a=20 (a=2, kept pre-multiplied by 10)
loopJ:
        mov eax, edi
        mul dword [esp] ;q[j]*i
        xor edx, edx
        add eax, ecx ;x=(a*10)+(q[j]*i)
        div ebx ;eax=x/p and edx=x%p
        mov ecx, edx ;a=x%p
        lea ecx, [edx*4+edx] ;a*=5
        add ecx, ecx ;a*=2
        mov [esp], eax ;q[j]=x/p
        add esp, 4 ;q[j++]
        cmp esp, [loopterm] ;while (&q[j]<&q[numdigits])
        jl loopJ
        sub ebx, 2 ;p-=2
        dec edi ;i--
        cmp edi, 1
        jg loopI
doneloopI:
;at this point the actual digits must be calculated
;the loop uses 1 div per pass with numdigits passes
        mov esp, [pi] ;easy access to &pi
        mov ecx, [q] ;q[j=0]
        mov esi, 1 ;nines=1
        mov ebp, 2 ;a=2
        mov edi, 10 ;constant 10
loopJJ:
        add ebp, ebp ;a*=2
        mov eax, [ecx] ;q[j]
        lea ebp, [ebp*4+ebp] ;a*=5
        add eax, ebp ;x=q[j]+(a*10)
        xor edx, edx
        div edi ;eax=x/10 and edx=x%10
        mov ebp, edx ;a=x%10
        cmp eax, 9
        je ninesCode ;if (q != 9)
        cmp eax, 10
        je tenCode ;if (q != 10)
        mov bl, [predigit]
        mov [predigit], al ;predigit = q
        mov eax, [pilength]
        add bl, '0'
        add eax, esp
        mov edx, esi
        mov byte [eax], bl ;pi[piLength] = predigit + '0'
        dec edx
        jnz copy9s
        jmp done9s
copy9s:
        mov byte [eax+edx], '9'
        dec edx
        jnz copy9s
done9s:
        add [pilength], esi ;piLength += nines;
        mov esi, 1 ;nines = 1
        jmp ninesDone
tenCode: ;if (q == 10)
        mov bl, [predigit]
        mov eax, [pilength]
        mov [predigit], 0 ;predigit = 0
        add bl, '1'
        add eax, esp
        mov edx, esi
        mov byte [eax], bl ;pi[piLength] = predigit + '1'
        dec edx
        jnz copy0s
        jmp done0s
copy0s:
        mov byte [eax+edx], '0'
        dec edx
        jnz copy0s
done0s:
        add [pilength], esi ;piLength += nines;
        mov esi, 1 ;nines = 1
        jmp ninesDone
ninesCode: ;if (q == 9)
        inc esi
ninesDone:
        add ecx, 4 ;q[j++]
        cmp ecx, [loopterm]
        jl loopJJ ;while (&q[j]<&q[numdigits])
        mov esp, [store_esp]
        movzx edx, [predigit]
        mov eax, [pi]
        mov ebx, [pilength]
        or edx, '0'
        mov [ebx+eax], edx
        add ebx, 2
        push 0 ;overlapped I/O
        push bytes_count ;receives the number of bytes written
        push ebx ;number of bytes to write
        push [pi] ;pointer to data
        push [fptr] ;handle to file
        call [WriteFile]
        push [fptr] ;file handle
        call [CloseHandle]
        ;current top of stack holds address of allocated memory
        ;stdcall takes lpAddress as the LAST value pushed, so the address
        ; must be re-pushed after dwSize and dwFreeType
        pop eax ;eax = address returned by VirtualAlloc
        push 8000h ;dwFreeType (MEM_RELEASE also decommits)
        push 0 ;dwSize (must be 0 with MEM_RELEASE)
        push eax ;lpAddress
        call [VirtualFree]
end_prog:
        call [ExitProcess]
processCMDLN:
        movzx esi, byte [eax]
        inc eax
        cmp esi, 0
        je no_params
        cmp esi, ' '
        jne processCMDLN
        ;now eax points to char after first space
        xor edx, edx
        jmp startnumdigits
getnumdigits:
        add edx, edx
        lea edx, [edx*4+edx]
        and esi, 0fh
        add edx, esi
startnumdigits:
        movzx esi, byte [eax]
        inc eax
        cmp esi, ' '
        jne getnumdigits
        ;now eax points to file name
        mov [numdigits], edx
        ret
no_params:
        mov edx, noParams
        call printlnNStr
        add esp, 4 ;drop processCMDLN's return address so ExitProcess gets the 0 pushed at start
        jmp end_prog
not_enough_memory:
        mov edx, notEnoughMemErr
        call printNStr
        push 0 ;return value
        call [ExitProcess]
printlnNStr: ;function: prints null-terminated string and newline
        call printNStr
        mov edx, newLn
        call printNStr
        ret
printNStr: ;function: prints null-terminated string
        ;gotten elsewhere (forgot where)
        push ecx
        mov ebp, [stdOut]
        mov esi, edx
        mov edi, esi
        or ecx, -1
        xor al, al
        repne scasb
        neg ecx
        sub ecx,2
        push 0 ;overlapped I/O
        push bytes_count ;receives the number of bytes written
        push ecx ;number of bytes to write
        push esi ;pointer to data
        push ebp ;handle to file
        call [WriteFile]
        pop ecx
        ret

section '.data' data readable writeable
fptr dd ?
numdigits dd ?
pi dd ?
alength dd ?
loopterm dd ?
q dd ?
r0 dd ?
r1 dd ?
r2 dd ?
r3 dd ?
r4 dd ?
r5 dd ?
r6 dd ?
r7 dd ?
nines dd 1
pilength dd 0
store_esp dd ? ;store stack to free esp
stdOut dd 0 ;stores stdout
bytes_count dd ? ;used by print
predigit db 0
align 16
i0: times 4 dd 1
i4: times 4 dd 1
i8: times 4 dd 1
p0: times 4 dd 1
p4: times 4 dd 1
p8: times 4 dd 1
a0: times 4 dd 20
a4: times 4 dd 20
a8: times 4 dd 20
twenty: times 4 dd 20
one: times 4 dd 1.0
two: times 4 dd 2.0
four: times 4 dd 4.0
eight: times 4 dd 8.0
twelve: times 4 dd 12.0
sixteen: times 4 dd 16.0
twentyfour: times 4 dd 24.0
millionth: times 4 dd 0.00000095367431640625 ;2^-20 = 1/1048576 (about one millionth)
d64b db '0123456789ABCDEF',0 ;used by dispEAXEDX
newLn db 0Dh, 0Ah, 0 ;used by println
notEnoughMemErr db 0Dh,0Ah,'Not Enough Memory, need 32MB',0Dh,0Ah,0Dh,0Ah,0
noParams db 0Dh,0Ah,'Please specify number of digits and output file',0Dh,0Ah,0Dh,0Ah,0

section '.idata' import data readable writeable
dd 0,0,0,rva kernel_name,rva kernel_table
dd 0,0,0,0,0
kernel_table:
ExitProcess dd rva _ExitProcess
GetStdHandle dd rva _GetStdHandle
WriteFile dd rva _WriteFile
VirtualAlloc dd rva _VirtualAlloc
VirtualFree dd rva _VirtualFree
GetCommandLine dd rva _GetCommandLineA
CreateFile dd rva _CreateFileA
CloseHandle dd rva _CloseHandle
dd 0
kernel_name db 'KERNEL32.DLL',0
_ExitProcess dw 0
db 'ExitProcess',0
_GetStdHandle dw 0
db 'GetStdHandle',0
_WriteFile dw 0
db 'WriteFile',0
_VirtualAlloc dw 0
db 'VirtualAlloc', 0
_VirtualFree dw 0
db 'VirtualFree', 0
_GetCommandLineA dw 0
db 'GetCommandLineA', 0
_CreateFileA dw 0
db 'CreateFileA',0
_CloseHandle dw 0
db 'CloseHandle',0

section '.reloc' fixups data readable discardable
;used by VirtualAlloc
;MEM_COMMIT = 1000h
;MEM_RESERVE = 2000h
;MEM_DECOMMIT = 4000h
;MEM_RELEASE = 8000h
;MEM_FREE = 10000h
;MEM_PRIVATE = 20000h
;MEM_MAPPED = 40000h
;MEM_RESET = 80000h
;MEM_TOP_DOWN = 100000h
;PAGE_NOACCESS = 1
;PAGE_READONLY = 2
;PAGE_READWRITE = 4
;PAGE_WRITECOPY = 8
;PAGE_EXECUTE = 10h
;PAGE_EXECUTE_READ = 20h
;PAGE_EXECUTE_READWRITE = 40h
;PAGE_EXECUTE_WRITECOPY = 80h
;PAGE_GUARD = 100h
;PAGE_NOCACHE = 200h
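The wrap-up loop (loopI/loopJ) and the digit loop (loopJJ) follow the structure their comments describe: a transposed form of the Rabinowitz-Wagon spigot, where the outer loop walks the terms i (with p = 2i-1), the inner loop walks the digit slots q[j], and a is carried pre-multiplied by 10. The C below is a minimal scalar sketch of that structure under stated assumptions: `pi_digits` is a hypothetical name, every term down to i = 2 goes through plain integer division (the listing hands most of them to the vectorized SSE code first), and the returned buffer keeps the initial predigit 0 in front of the digits, which is what the predigit scheme naturally produces.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar sketch of the transposed spigot in the listing.
   Returns a malloc'd, NUL-terminated buffer: the initial predigit '0'
   followed by the digits of pi. Pending 9s at the very end are dropped. */
char *pi_digits(int numdigits)
{
    int alength = numdigits * 10 / 3;                 /* (numdigits*10)/3 */
    unsigned *q = calloc((size_t)numdigits, sizeof *q);
    char *out = malloc((size_t)numdigits + 2);        /* predigit + digits + NUL */
    if (q == NULL || out == NULL) {
        free(q);
        free(out);
        return NULL;
    }

    /* loopI/loopJ: terms i = alength..2 outside, digit slots j inside */
    for (int i = alength; i > 1; i--) {
        unsigned p = 2u * (unsigned)i - 1u;           /* p = 2i-1 */
        unsigned a = 20;                              /* a = 2, kept as a*10 */
        for (int j = 0; j < numdigits; j++) {
            unsigned x = a + q[j] * (unsigned)i;      /* x = (a*10) + q[j]*i */
            q[j] = x / p;
            a = (x % p) * 10;
        }
    }

    /* loopJJ: the i = 1 term (p = 1) fused with digit extraction */
    size_t len = 0;
    size_t nines = 1;                                 /* 1 + pending 9s, as in the listing */
    unsigned predigit = 0, a = 2;
    for (int j = 0; j < numdigits; j++) {
        unsigned x = q[j] + a * 10;
        unsigned d = x / 10;
        a = x % 10;
        if (d == 9) {                                 /* ninesCode: hold the 9 back */
            nines++;
            continue;
        }
        unsigned carry = (d == 10);                   /* tenCode: carry into predigit */
        out[len] = (char)('0' + predigit + carry);
        memset(out + len + 1, carry ? '0' : '9', nines - 1);
        len += nines;
        nines = 1;
        predigit = carry ? 0 : d;
    }
    out[len++] = (char)('0' + predigit);
    out[len] = '\0';
    free(q);
    return out;
}
```

Swapping the loops this way is what lets the listing keep a single q[] array of numdigits entries instead of the alength-sized a[] array of the C version, as the header notes.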
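Each scalar step of the generated code (the cvtsi2ss/mulss/divss/cvttss2si groups) forms x = a + q[j]*i in single precision, takes the quotient from a truncated float division nudged upward by xmm7, and then rebuilds the remainder exactly with an integer mul and sub. What xmm7 holds is set outside this excerpt, so the sketch below assumes a small positive bias such as the 2^-20 stored at `millionth`; the function name `fp_step` is hypothetical. The trick stays exact only while x is well inside the 24-bit float mantissa, which fits the header's caveat that the float version is accurate "until cases get very large".

```c
/* Hypothetical C rendering of one scalar step of the generated loop.
   a is already multiplied by 10 (as in a0/a4/a8); bias stands in for
   xmm7 and is ASSUMED to be a small positive constant. */
unsigned fp_step(unsigned *qj, unsigned a, unsigned i, unsigned p, float bias)
{
    float x = (float)*qj * (float)i + (float)a;       /* cvtsi2ss, mulss, addss */
    unsigned xi = (unsigned)(x + bias);               /* addss xmm2 / cvttss2si ecx */
    unsigned quot = (unsigned)(x / (float)p + bias);  /* divss, addss, cvttss2si eax */
    *qj = quot;                                       /* q[j] = x/p */
    return (xi - quot * p) * 10;                      /* mul ebx, sub, *10 -> new a */
}
```

The float division is what allows divps to handle four terms per instruction, while the integer multiply keeps the carried remainder exact.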
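processCMDLN skips the program name up to the first space, then accumulates the digit count as edx = edx*10 + (c & 0Fh) (the add/lea pair multiplies by 10) until the next space, leaving eax on the file name. A C sketch of the same scan follows; `parse_cmdline` is a hypothetical name, and unlike the listing it also stops at the terminating NUL inside the digit loop, because the listing checks only for a space there and would read past the string if no file name were given.

```c
#include <stddef.h>

/* Hypothetical C rendering of processCMDLN. Returns a pointer to the file
   name and stores the digit count, or returns NULL when no space follows
   the program name (the listing's no_params path). */
const char *parse_cmdline(const char *cmd, unsigned *numdigits)
{
    /* skip the program name up to and including the first space */
    while (*cmd != ' ') {
        if (*cmd == '\0')
            return NULL;
        cmd++;
    }
    cmd++;

    /* edx = edx*10 + (c & 0x0f) until the next space */
    unsigned n = 0;
    while (*cmd != ' ' && *cmd != '\0') {      /* NUL check added in this sketch */
        n = n * 10 + ((unsigned char)*cmd & 0x0fu);
        cmd++;
    }
    if (*cmd == ' ')
        cmd++;                                 /* now at the file name */
    *numdigits = n;
    return cmd;
}
```

Like the listing, this treats any byte as a digit via its low four bits, so a non-numeric argument is not rejected.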
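printNStr measures the string with the repne scasb idiom. ecx starts at -1, and the scan decrements it once for every byte examined, including the terminating NUL, so after n characters ecx = -1-(n+1) = -(n+2); neg ecx followed by sub ecx, 2 then yields n. The C model below reproduces only that counter arithmetic, using uint32_t for the 32-bit wraparound; `scasb_strlen` is a hypothetical helper for illustration, not a replacement for the instruction.

```c
#include <stdint.h>

/* Models: or ecx,-1 / xor al,al / repne scasb / neg ecx / sub ecx,2 */
uint32_t scasb_strlen(const char *s)
{
    uint32_t ecx = UINT32_MAX;          /* or ecx, -1 */
    const char *edi = s;
    /* repne scasb: each step decrements ecx, compares [edi] with al = 0,
       and advances edi; it stops after the byte that matched (the NUL) */
    for (;;) {
        ecx--;
        char c = *edi++;
        if (c == '\0')
            break;
    }
    ecx = (uint32_t)0 - ecx;            /* neg ecx */
    return ecx - 2;                     /* sub ecx, 2 */
}
```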
