Nehalem Performance Preview


Fire Spread Probabilities

This benchmark is a close cousin of the one we previously used. FSPro is used by the Missoula Fire Sciences Laboratory to model forest fires. After extensive discussions with one of the authors of the benchmark, we have a much better understanding of its profile. Here’s a description from Stu Brittain:

“The first part of the simulation is the FlamMap portion. FlamMap calculates fire behavior for different Weather scenarios (Fuel Moistures, Wind Speed, Wind Direction) and FSPro stores the necessary FlamMap output in memory (watch the memory usage grow while running FlamMap). But as you’ve noticed processor usage isn’t 100% during FlamMap runs. The FlamMap portion runs a single multithreaded FlamMap at a time, i.e. FlamMap runs on all 16 cores for each run. This particular dataset isn’t huge, so the work isn’t really enough for all the processors, plus FSPro must assemble the FlamMap output with a single thread. So I usually see about 70% processor usage (peak) during FlamMap runs, with dips down to 1 core being pegged during assembly of outputs after each FlamMap run. So FlamMap is multithreaded, but the size of the dataset plus the necessary nature of outputs assembly reduces overall CPU usage.

“Then when all the FlamMap runs are done, FSPro starts burning fires. Here we run multiple burn threads, one fire per thread. So for that Nehalem system we’re running 16 fires at a time until all 128 are done. Processor usage should be 100% during this portion of the run, at least until near the end when the last fires complete. For this benchmark each core will get 8 fires to burn (8 fires x 16 threads = 128 total fires). All of the fires for this benchmark are exactly the same (same weather and winds for each day) so theoretically all the threads should finish at the same time but this never usually happens. The OS or some services inevitably take some clock cycles, and there is some minor output manipulation after every fire that requires a critical section to access output arrays. So the processor power profile will be about 60-70% during the FlamMap portion, then 100% during most of the Fires portion.”
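The burn phase described above (a fixed pool of threads, one fire per thread, and a critical section around the shared output arrays) can be sketched roughly as follows. This is only an illustration of the pattern, not FSPro’s actual code: `burn_fire`, `run_burn_phase`, `GRID_CELLS` and the shape of the output array are all hypothetical names and values chosen for the example. Only the 128 fires and 16 threads come from the description.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_FIRES = 128    # total fires, per the description above
NUM_THREADS = 16   # burn threads on the Nehalem system, per the description
GRID_CELLS = 64    # hypothetical size of the shared output array

output_lock = threading.Lock()   # plays the role of the critical section


def burn_fire(fire_id):
    """Hypothetical stand-in for one fire simulation; returns burned cell indices."""
    # Every fire in the benchmark is identical, so each one burns the same cells.
    return [cell for cell in range(GRID_CELLS) if cell % 2 == 0]


def run_burn_phase(num_fires=NUM_FIRES, num_threads=NUM_THREADS):
    burn_counts = [0] * GRID_CELLS   # shared output: how often each cell burned

    def worker(fire_id):
        burned = burn_fire(fire_id)   # independent work, touches no shared state
        with output_lock:             # serialized update of the shared output
            for cell in burned:
                burn_counts[cell] += 1

    # The pool keeps up to num_threads fires in flight until all fires are done.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, range(num_fires)))
    return burn_counts


if __name__ == "__main__":
    counts = run_burn_phase()
    print(counts[0], counts[1])
```

Here the pool hands each thread the next fire as soon as it finishes one, which with identical fires works out to roughly 8 per thread. Note that CPython threads will not actually run the simulated burns in parallel; the sketch shows the structure of the work and where the lock serializes threads, not the performance.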
Figure 7 – FSPro Performance
Again we report performance in terms of execution time, where lower is better; Nehalem runs the simulations about 1.5X faster than Harpertown. Turning to the power profile, we can see the behavior that Stu described: the initial FlamMap portion draws only about 280W on Nehalem and 370W on Harpertown, while the actual burn simulations run at about 325W and 390W respectively. Overall, Nehalem’s average power is roughly 20% lower than Harpertown’s.
Figure 8 – FSPro Power Consumption
Finally we come to energy efficiency. Just as with VRAD, we look at the energy consumed per simulation rather than a throughput-oriented metric. Nehalem burns half the energy per simulation of its predecessor.
Figure 9 – FSPro Energy Efficiency
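As a rough sanity check, the energy result is consistent with the rounded figures quoted earlier. Energy per simulation is average power multiplied by execution time, so the ratio between the two systems is the power ratio divided by the speedup. The sketch below uses only the approximate values from the text (about 1.5X faster, roughly 20% lower average power), not measured data, and `relative_energy` is a helper name made up for this example.

```python
def relative_energy(speedup, power_ratio):
    """Energy per run relative to the baseline system.

    Energy = average power * execution time, so
    E_new / E_old = (P_new / P_old) / (t_old / t_new) = power_ratio / speedup.
    """
    return power_ratio / speedup


SPEEDUP = 1.5       # Nehalem vs. Harpertown execution time (approximate, from the text)
POWER_RATIO = 0.8   # Nehalem average power roughly 20% lower (approximate, from the text)

if __name__ == "__main__":
    ratio = relative_energy(SPEEDUP, POWER_RATIO)
    print(f"Nehalem energy per simulation vs. Harpertown: {ratio:.2f}")
```

With these rounded inputs the ratio comes out a little above one half, in line with the energy efficiency result.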
