Westmere Performance


Fire Spread Probabilities

FSPro is used by the Missoula Fire Sciences Laboratory to model forest fires. After extensive discussions with one of the benchmark’s authors, we have a much better understanding of its profile. Here’s a description from Stu Brittain:

The first part of the simulation is the FlamMap portion. FlamMap calculates fire behavior for different Weather scenarios (Fuel Moistures, Wind Speed, Wind Direction) and FSPro stores the necessary FlamMap output in memory (watch the memory usage grow while running FlamMap). But as you’ve noticed, processor usage isn’t 100% during FlamMap runs. The FlamMap portion runs a single multithreaded FlamMap at a time, i.e. FlamMap runs on all 16 cores for each run. This particular dataset isn’t huge, so the work isn’t really enough for all the processors, plus FSPro must assemble the FlamMap output with a single thread. So I usually see about 70% processor usage (peak) during FlamMap runs, with dips down to 1 core being pegged during assembly of outputs after each FlamMap run.

So FlamMap is multithreaded, but the size of the dataset plus the necessary nature of outputs assembly reduces overall cpu usage.

Then when all the FlamMap runs are done, FSPro starts burning fires. Here we run multiple burn threads, one fire per thread. So for that Nehalem system we’re running 16 fires at a time until all 128 are done. Processor usage should be 100% during this portion of the run, at least until near the end when the last fires complete. For this benchmark each core will get 8 fires to burn (8 fires x 16 threads = 128 total fires). All of the fires for this benchmark are exactly the same (same weather and winds for each day) so theoretically all the threads should finish at the same time but this never usually happens. The OS or some services inevitably take some clock cycles, and there is some minor output manipulation after every fire that requires a critical section to access output arrays.

So the processor power profile will be about 60-70% during the FlamMap portion, then 100% during most of the Fires portion.
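The burn phase described above (16 threads working through 128 identical fires, with a brief critical section after each fire) can be sketched roughly as follows. This is a minimal illustration in Python, not FSPro's actual code: the function names, the placeholder workload, and the use of a shared work queue are all assumptions, and the description does not say whether FSPro assigns fires to threads statically or dynamically. Python threads also do not run CPU-bound work in parallel, so the sketch shows only the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

NUM_THREADS = 16   # burn threads on the Nehalem system in the description
NUM_FIRES = 128    # total fires in the benchmark (8 per thread on average)


def run_burn_phase(num_fires=NUM_FIRES, num_threads=NUM_THREADS):
    """Burn every fire, one fire per thread at a time (hypothetical sketch)."""
    output_lock = threading.Lock()   # stands in for the critical section
    fires_per_thread = {}            # hypothetical shared output array

    def burn_fire(fire_id):
        # Placeholder for the real fire simulation (made-up workload).
        sum(i * i for i in range(1000))
        # Minor output manipulation after every fire, done under a lock
        # so that only one thread touches the shared output at a time.
        with output_lock:
            name = threading.current_thread().name
            fires_per_thread[name] = fires_per_thread.get(name, 0) + 1
        return fire_id

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Threads pull the next fire as soon as they finish the previous one.
        finished = list(pool.map(burn_fire, range(num_fires)))
    return finished, fires_per_thread


if __name__ == "__main__":
    finished, fires_per_thread = run_burn_phase()
    print(len(finished), "fires burned on", len(fires_per_thread), "threads")
```

With a shared queue like this, threads that lose a few cycles to the OS simply burn fewer fires, which is one way the uneven finish times mentioned in the description could be absorbed.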

Figure 6 – FSPro Performance

Again we report performance in terms of execution time, lower being better. Given the description of the benchmark, it’s not surprising that scaling is less than perfect. Westmere is roughly 29% faster than Nehalem, a much better gain than in VRAD, but not quite as good as in POV-Ray. In our previous power measurements, we found that the power draw varied with the different modes of execution in the code. This time around we are only reporting the average power, which was very similar: 289.9W for Westmere and 288W for Nehalem.

Figure 7 – FSPro Energy Efficiency

Again we come to the energy efficiency, and just like with VRAD, we will look at the energy consumed per simulation, rather than a throughput-oriented metric. With nearly identical power consumption and 29% better performance, the energy efficiency is 28% better for Westmere.
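The 28% figure follows from the numbers already given, since energy per simulation is average power multiplied by execution time. The short calculation below is a sketch of that arithmetic; it assumes "29% faster" means a speedup of 1.29 (Westmere's execution time is Nehalem's divided by 1.29), an interpretation that reproduces the reported result. The function name is illustrative only.

```python
def relative_energy_efficiency(speedup, power_new_w, power_old_w):
    """Simulations per joule on the new system relative to the old one.

    Energy per simulation = average power * execution time, so the
    efficiency ratio is the speedup divided by the power ratio.
    """
    power_ratio = power_new_w / power_old_w
    return speedup / power_ratio


# Average power from the article: 289.9W (Westmere) vs 288W (Nehalem).
# Speedup of 1.29 is the assumed reading of "29% faster".
gain = relative_energy_efficiency(1.29, 289.9, 288.0)
print(f"Westmere energy efficiency gain: {(gain - 1) * 100:.0f}%")
```

The slightly higher average power on Westmere (about 0.7% more) is what trims the 29% performance gain down to a 28% efficiency gain.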
