PCMark2002 – A First Look

Memory Tests

In the memory tests, different operations are performed using several block sizes in order to determine the speed of the L1 and L2 caches, as well as system memory. These operations are read, write, read-modify-write, and random access. According to the help file, “These tests are implemented in the same manner as memory accesses in normal applications, and are not optimized to achieve maximum throughput. However, since no other tasks are run while performing the memory transfers, quite high throughput numbers can be expected.”

Several Video Memory tests are also included, but I did not record their results: after several runs on different platforms with the same video card, I found the numbers to be very consistent. The overall memory score does, however, include the video test results, as follows:

{ Read(3072*32+1536*32+384*16+48*1+6*1) + Write(3072*32+1536*32+384*16+48*1+6*1) + Modify(3072*64+1536*64+384*32+48*2+6*2) + Container(1536*64+768*64+384*32+96*4+48*2+6*2) + VideoMem(1*4+4*8+16*16+32*32) } / 160

Since all platforms used the same video card, this should not be a problem with regard to comparing the other components. According to the help file, a high-end PC should get around 5000 points as its total Memory score. Because these tests were all performed on SDRAM-based systems, we can’t expect to be near the scores for a ‘high end PC’, but in this particular scenario all we really care about is the relative scores between different components and setups. Let’s see how this looks:

| P4 Willamette | 1.4GHz | 1.6GHz | 1.8GHz | 2.0GHz |
|---|---|---|---|---|
| Block Read – 3072KB | 964.5 | 966.6 | 968 | 969.7 |
| Block Read – 1536KB | 964.5 | 966.6 | 968.1 | 969.1 |
| Block Read – 384KB | 964.5 | 965.7 | 984.6 | 977.9 |
| Block Read – 48KB | 5652.2 | 6475.5 | 7287.8 | 8094.1 |
| Block Read – 6KB | 10449.2 | 11944.7 | 13200.1 | 14919.2 |
| Block Write – 3072KB | 416.6 | 402.2 | 401 | 399.2 |
| Block Write – 1536KB | 417.7 | 400.7 | 400.4 | 400.3 |
| Block Write – 384KB | 420.3 | 403.9 | 405.4 | 402.3 |
| Block Write – 48KB | 4712.6 | 5390.3 | 6059.7 | 6737.7 |
| Block Write – 6KB | 4651.1 | 5316.4 | 5973.4 | 6644.9 |
| Block Modify – 3072KB | 417.6 | 404.2 | 403.6 | 403.6 |
| Block Modify – 1536KB | 416.7 | 404.6 | 402.6 | 404.6 |
| Block Modify – 384KB | 425.9 | 403.6 | 402.3 | 403 |
| Block Modify – 48KB | 3391.6 | 3684.6 | 4653.1 | 4616.3 |
| Block Modify – 6KB | 4458.2 | 5097.5 | 5903.2 | 6373.3 |
| Random Access – 1536KB | 921.9 | 925.1 | 932.2 | 936 |
| Random Access – 768KB | 920.9 | 922.6 | 930.4 | 934 |
| Random Access – 384KB | 922.8 | 921.2 | 928 | 932.9 |
| Random Access – 96KB | 3769.2 | 4323.6 | 4849.2 | 5404.5 |
| Random Access – 48KB | 3759.3 | 4331.7 | 4842.2 | 5414.1 |
| Random Access – 6KB | 5350 | 6088.3 | 6769.2 | 7326.7 |
| Memory Overall | 2523 | 2553 | 2613 | 2660 |

| P4 Northwood | 2.0A GHz | 2.2 GHz | 2.4 GHz |
|---|---|---|---|
| Block Read – 3072KB | 970.1 | 971.5 | 973.4 |
| Block Read – 1536KB | 969.9 | 971.3 | 973.3 |
| Block Read – 384KB | 7833.8 | 8600.6 | 9390.4 |
| Block Read – 48KB | 8097 | 8877.5 | 9715.4 |
| Block Read – 6KB | 14911.3 | 16407.2 | 17904 |
| Block Write – 3072KB | 398.7 | 401.5 | 418.1 |
| Block Write – 1536KB | 399 | 402.6 | 419 |
| Block Write – 384KB | 6572.1 | 7223.7 | 7912.7 |
| Block Write – 48KB | 6738 | 7412.9 | 8085.8 |
| Block Write – 6KB | 6645.2 | 7309.2 | 7973.1 |
| Block Modify – 3072KB | 403.7 | 405.8 | 418.9 |
| Block Modify – 1536KB | 403.4 | 406.1 | 420.3 |
| Block Modify – 384KB | 4546.6 | 4952 | 5735.8 |
| Block Modify – 48KB | 4610.8 | 5056 | 5633.4 |
| Block Modify – 6KB | 6367.2 | 7002.9 | 7638.7 |
| Random Access – 1536KB | 935.7 | 940.1 | 942.9 |
| Random Access – 768KB | 933.7 | 938.2 | 941.7 |
| Random Access – 384KB | 5232.3 | 5776.7 | 6264 |
| Random Access – 96KB | 5420.1 | 5976.2 | 6471.2 |
| Random Access – 48KB | 5432.9 | 6012.9 | 6428.9 |
| Random Access – 6KB | 7437.1 | 8096.7 | 9100.9 |
| Memory Overall | 3410 | 3551 | 3724 |
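As a quick check on how closely the small-block results in the two tables above track clock speed, the 6KB block-read figures can be divided by the clock rate. This is only a sketch: the numbers are copied from the tables, while the helper name `per_ghz` and the idea of using "result per GHz" as a flatness measure are my own assumptions, not something PCMark2002 reports.

```python
# 6KB block-read results copied from the Willamette and Northwood tables,
# keyed by clock speed in GHz. The 2.0 entry is the Willamette part; the
# Northwood 2.0A scored 14911.3, which is practically the same.
block_read_6kb = {
    1.4: 10449.2,
    1.6: 11944.7,
    1.8: 13200.1,
    2.0: 14919.2,
    2.2: 16407.2,
    2.4: 17904.0,
}


def per_ghz(results):
    """Return {clock: result / clock}; linear scaling shows up as a flat line."""
    return {ghz: value / ghz for ghz, value in results.items()}


ratios = per_ghz(block_read_6kb)
for ghz, ratio in ratios.items():
    print(f"{ghz:.1f} GHz: {ratio:7.1f} per GHz")

# Relative gap between the lowest and highest per-GHz figure.
spread = (max(ratios.values()) - min(ratios.values())) / min(ratios.values())
print(f"spread between lowest and highest: {spread:.1%}")
```

The per-GHz figures stay within 2% of each other across all six clock speeds, with the 1.8GHz run as the only visible dip, which is what scaling in step with the core clock should look like.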

These results are very interesting, at least from my point of view. First, we can see that the performance increases very smoothly from 1.4GHz all the way through 2.4GHz when the block size is 6KB, 48KB or 96KB (random access). This is because the L1 and L2 cache speeds increase with the clock rate. Since the L1 data cache of the P4 is 8KB, you can see the read throughput nearly double between 48KB and 6KB. Note also that all block sizes too large for the L2 cache result in the same throughput regardless of the processor speed, since we are now limited by the FSB and system memory rather than the core. The third thing to notice is better shown in the following table:

|  | 2.0 GHz (Willamette) | 2.0A GHz (Northwood) |
|---|---|---|
| Block Read – 3072KB | 969.7 | 970.1 |
| Block Read – 1536KB | 969.1 | 969.9 |
| Block Read – 384KB | 977.9 | 7833.8 |
| Block Read – 48KB | 8094.1 | 8097 |
| Block Read – 6KB | 14919.2 | 14911.3 |
| Block Write – 3072KB | 399.2 | 398.7 |
| Block Write – 1536KB | 400.3 | 399 |
| Block Write – 384KB | 402.3 | 6572.1 |
| Block Write – 48KB | 6737.7 | 6738 |
| Block Write – 6KB | 6644.9 | 6645.2 |
| Block Modify – 3072KB | 403.6 | 403.7 |
| Block Modify – 1536KB | 404.6 | 403.4 |
| Block Modify – 384KB | 403 | 4546.6 |
| Block Modify – 48KB | 4616.3 | 4610.8 |
| Block Modify – 6KB | 6373.3 | 6367.2 |
| Random Access – 1536KB | 936 | 935.7 |
| Random Access – 768KB | 934 | 933.7 |
| Random Access – 384KB | 932.9 | 5232.3 |
| Random Access – 96KB | 5404.5 | 5420.1 |
| Random Access – 48KB | 5414.1 | 5432.9 |
| Random Access – 6KB | 7326.7 | 7437.1 |
| Memory Overall | 2660 | 3410 |
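To make the differences in the 2.0GHz comparison easier to spot, the sketch below divides each Northwood result by the matching Willamette result and lists only the tests where the gap exceeds a cut-off. The numbers are copied from the table; the 5% cut-off and the names `speedups` and `THRESHOLD` are arbitrary choices of mine for illustration.

```python
# (Willamette 2.0 GHz, Northwood 2.0A GHz) results copied from the table above.
results = {
    "Block Read 3072KB": (969.7, 970.1),
    "Block Read 1536KB": (969.1, 969.9),
    "Block Read 384KB": (977.9, 7833.8),
    "Block Read 48KB": (8094.1, 8097.0),
    "Block Read 6KB": (14919.2, 14911.3),
    "Block Write 3072KB": (399.2, 398.7),
    "Block Write 1536KB": (400.3, 399.0),
    "Block Write 384KB": (402.3, 6572.1),
    "Block Write 48KB": (6737.7, 6738.0),
    "Block Write 6KB": (6644.9, 6645.2),
    "Block Modify 3072KB": (403.6, 403.7),
    "Block Modify 1536KB": (404.6, 403.4),
    "Block Modify 384KB": (403.0, 4546.6),
    "Block Modify 48KB": (4616.3, 4610.8),
    "Block Modify 6KB": (6373.3, 6367.2),
    "Random Access 1536KB": (936.0, 935.7),
    "Random Access 768KB": (934.0, 933.7),
    "Random Access 384KB": (932.9, 5232.3),
    "Random Access 96KB": (5404.5, 5420.1),
    "Random Access 48KB": (5414.1, 5432.9),
    "Random Access 6KB": (7326.7, 7437.1),
}

THRESHOLD = 1.05  # assumed cut-off for "noticeably faster"; not from the article


def speedups(table):
    """Return {test: Northwood result / Willamette result}."""
    return {test: nw / wil for test, (wil, nw) in table.items()}


ratios = speedups(results)
faster = [test for test, ratio in ratios.items() if ratio > THRESHOLD]
for test in faster:
    print(f"{test}: {ratios[test]:.1f}x")
```

Only the four 384KB tests clear the cut-off, at roughly 5.6x to 16x, while every other row lands within 2% of parity. A 384KB block is the only tested size that fits in the Northwood's L2 cache but not in the Willamette's (512KB versus 256KB, going by Intel's published specifications rather than anything measured here).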

This comparison, combined with the CPU scores shown earlier, shows that the larger L2 cache of the Northwood processor is responsible for its performance improvement over the Willamette, and little (if anything) else, at least if we can trust the CPU results as a good measure of all CPU features. OK, so let’s look at the 300MHz P6 processors now:

|  | PIII 300 | Celeron 300A | Celeron 300 |
|---|---|---|---|
| Block Read – 3072KB | 337.2 | 420.1 | 449.1 |
| Block Read – 1536KB | 337.2 | 420.8 | 448.9 |
| Block Read – 384KB | 909.7 | 420.1 | 448.8 |
| Block Read – 48KB | 911.9 | 1038.5 | 450.6 |
| Block Read – 6KB | 2216.9 | 2216.6 | 2215.4 |
| Block Write – 3072KB | 130.6 | 124.8 | 159.9 |
| Block Write – 1536KB | 132.2 | 121 | 159.9 |
| Block Write – 384KB | 252.5 | 124.4 | 159.2 |
| Block Write – 48KB | 254.3 | 572 | 151.6 |
| Block Write – 6KB | 1977.4 | 1977.5 | 1975.7 |
| Block Modify – 3072KB | 125.9 | 119.6 | 154.2 |
| Block Modify – 1536KB | 127.2 | 121 | 154.2 |
| Block Modify – 384KB | 253.2 | 131.9 | 153.2 |
| Block Modify – 48KB | 254.2 | 504.9 | 146.9 |
| Block Modify – 6KB | 1289.2 | 1288.8 | 1288.2 |
| Random Access – 1536KB | 175.7 | 203.2 | 198.5 |
| Random Access – 768KB | 175.6 | 203.1 | 198.6 |
| Random Access – 384KB | 448.1 | 203.1 | 198.2 |
| Random Access – 96KB | 458.4 | 545.6 | 198.7 |
| Random Access – 48KB | 459.6 | 570.2 | 198.2 |
| Random Access – 6KB | 801.7 | 801.6 | 800.9 |
| Memory Overall | 833 | 863 | 916 |

Surprised? I was, until I thought about it a little bit. Since the Celeron 300 has no L2 cache that must be maintained, the L2 cache lookup overhead is eliminated, so accesses to system memory are actually faster. From this, we can also see why the streaming instructions in SSE can provide a performance boost when used under the right circumstances. We can also clearly see where the full speed L2 cache provides a benefit, as well as where the larger half speed cache is faster. This latter point is obviously not anything surprising, but it is nice to have a ‘visual’ of this effect.
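The pattern described above can be pulled out of the block-read rows with a few lines of code. This is a rough sketch under my own assumptions: the numbers are copied from the 300MHz table, the helper `leaders` is hypothetical, and the 1% tie margin is an arbitrary tolerance so that run-to-run noise does not decide a winner.

```python
# Column order matches the 300MHz table.
CPUS = ("PIII 300", "Celeron 300A", "Celeron 300")

# Block-read results by block size in KB, copied from the table.
block_read = {
    3072: (337.2, 420.1, 449.1),
    1536: (337.2, 420.8, 448.9),
    384: (909.7, 420.1, 448.8),
    48: (911.9, 1038.5, 450.6),
    6: (2216.9, 2216.6, 2215.4),
}

TIE_MARGIN = 0.01  # assumed tolerance: results within 1% of the best count as a tie


def leaders(scores):
    """Return every CPU whose score is within TIE_MARGIN of the best score."""
    best = max(scores)
    return [cpu for cpu, score in zip(CPUS, scores) if score >= best * (1 - TIE_MARGIN)]


for size_kb, scores in block_read.items():
    print(f"{size_kb:>5} KB: {', '.join(leaders(scores))}")
```

On these figures the Celeron 300 leads at 3072KB and 1536KB, the PIII 300 leads at 384KB, the Celeron 300A leads at 48KB, and all three tie at 6KB, where only the L1 cache is involved.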

