By: Michael S (already5chosen.delete@this.yahoo.com), October 30, 2008 6:32 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 10/30/08 wrote:
---------------------------
>James (alan@devonex.webhop.org) on 10/29/08 wrote:
>---------------------------
>>First quibble:
>>
>>> The next step was to collect the actual event-based
>>> sampling data with a 1MHz resolution (sampling every 1ms).
>>
>>Wouldn't 1 ms lead to a 1 kHz resolution? For 1 MHz, you'd want a sample every 1 microsecond (µs).
>
>Yup, that was a mistake that made it past the proof-reading : (
>
>That's the problem with having smart readers - they notice all your mistakes!
>
>Of course, that's the best way to learn as well.
>
>David
>
O.k. Then more nitpicks:
Chart 2 – System Settings/System Bus: you use clock frequency for AMD vs. transfer rate for Intel. That's inconsistent.
It would be better to give either clock rate vs. clock rate, i.e. 1000 MHz vs. 266 MHz, or transfer rate vs. transfer rate, i.e. 2000 MT/s vs. 1066 MT/s.
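To make the conversion explicit, here is a minimal sketch. The per-clock multipliers are my assumption about what the chart meant (HyperTransport transfers on both clock edges, the Intel FSB is quad-pumped), and the function name is just for illustration.

```python
def transfer_rate_mts(clock_mhz: float, transfers_per_clock: int) -> float:
    """Convert a bus clock in MHz to a transfer rate in MT/s."""
    return clock_mhz * transfers_per_clock

# AMD side: HyperTransport link, 1000 MHz clock, 2 transfers per clock (assumed DDR).
ht_mts = transfer_rate_mts(1000, 2)

# Intel side: FSB, nominal "266 MHz" clock is really 800/3 MHz,
# 4 transfers per clock (assumed quad-pumped).
fsb_mts = transfer_rate_mts(800 / 3, 4)

print(f"HT:  {ht_mts:.0f} MT/s")
print(f"FSB: {fsb_mts:.2f} MT/s (conventionally quoted as 1066)")
```

Either column of numbers is fine for the chart, as long as both platforms use the same one.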
Also in the same chart you mention nForce 590 SLI as a Northbridge. I'd rather say that on the AMD platform the Northbridge is integrated. The nForce chip better fits the Southbridge moniker.
Same page, Figure 1:
Merom:
6MB L2 is Penryn, not the Merom that you tested.
Execution port 0 can't do SSE Shuffles.
Execution port 1 can't do SSE MUL; it can do FP/SSE MOVE and Logic + 64-bit fixpoint shuffle (or 128-bit fixpoint shuffle at reduced performance).
Execution port 2 does the complete Integer/FP load, not just Load Address. Memory data arrives at the inner core through writeback port 2.
K8:
Unlike Intel, where all external memory/IO accesses travel through the L2 cache, on K8 the SRQ is actually attached directly to all three caches.
IMHO, you should draw the L1 TLBs on the right (system) side of the respective caches. This way you make clear that AMD L1 caches, while physically tagged, are virtually indexed. AMD itself certainly draws the L1 TLBs on the system side.
The arrows between the L1D and LSU_1 create an impression that the LSUs are fully symmetric and can sustain any combination of loads and stores. That's incorrect. The K8 L1D cache can sustain at most one store per clock.
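On the virtually-indexed point above, a rough sketch of why TLB placement matters. The cache geometries below are my assumptions from memory, not taken from the article (K8 L1D: 64 KB, 2-way, 64-byte lines; Core 2 L1D: 32 KB, 8-way, 64-byte lines; 4 KB pages), and the function name is hypothetical. It counts how many set-index bits lie above the page offset, i.e. bits that have to come from the virtual address if the lookup starts before translation finishes.

```python
def index_bits_above_page_offset(cache_bytes: int, ways: int,
                                 line_bytes: int, page_bytes: int = 4096) -> int:
    """Count set-index bits that fall outside the page offset.

    All sizes are assumed to be powers of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    line_offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1               # log2(number of sets)
    page_offset_bits = page_bytes.bit_length() - 1   # log2(page size)
    # The index occupies address bits [line_offset_bits, line_offset_bits + index_bits).
    top_index_bit = line_offset_bits + index_bits
    return max(0, top_index_bit - page_offset_bits)

# Assumed geometries, see the note above.
k8_l1d = index_bits_above_page_offset(64 * 1024, ways=2, line_bytes=64)
core2_l1d = index_bits_above_page_offset(32 * 1024, ways=8, line_bytes=64)

print("K8 L1D index bits above page offset:", k8_l1d)
print("Core 2 L1D index bits above page offset:", core2_l1d)
```

Under those assumptions the K8 index reaches past the 4 KB page offset while the Core 2 index does not, which is why the TLB sits logically after the index step on K8.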