The SPECpower Framework
The underlying framework for SPECpower is clearly the more important of the two contributions to the industry, as some of its observations, goals and guidelines are likely to influence several generations of benchmarks and other industry-wide organizations. The methodology begins by stating that one of the goals is to use the same set of benchmarks for both pure performance and power-efficiency testing, to reduce overall development costs. This is an eminently logical starting point, since there is little value in reinventing the wheel and finding yet another workload with all the right properties (easy to set up, meaningful to end users, inexpensive and open licensing, etc.).
SPECpower’s framework deals with extending the two common types of existing benchmarks: those that run to completion and are either 100% active or idle, and those where the throughput can be throttled or otherwise controlled. Generally, run-to-completion benchmarks are more common for single-user or HPC workloads; examples include SPEC CPU, gaming benchmarks and media encoding. Server workloads tend to be throughput oriented, responding to requests from clients or front-end servers; typical examples include TPC-C and SPECjbb2005.
In general, throughput-oriented benchmarks are strongly preferred because the workload can be throttled to yield graduated utilization levels (e.g. idle/0%, 10%, 20%…), which more accurately model real-world situations: typical data centers are around 5-20% utilized, although this is slowly changing with virtualization. The flexibility of a graduated workload generally allows any user to find the usage level that most closely tracks their own environment. The throttling mechanism used to simulate new server requests must closely match real-world activity; an exponential distribution is typically used to model bursty and irregular requests.
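To make the throttling idea concrete, here is a minimal sketch of how request arrivals at a graduated load level might be generated with exponentially distributed gaps. It is an illustration only, not SPEC's implementation: the function name, the `max_rate_per_s` parameter and the fixed seed are assumptions introduced for this example.

```python
import random

def interarrival_times(max_rate_per_s, load_fraction, n, seed=0):
    """Return n exponentially distributed gaps (in seconds) between requests.

    max_rate_per_s: hypothetical request rate the system sustains at 100% load.
    load_fraction:  target load level, e.g. 0.3 for the 30% step.
    """
    if not 0.0 < load_fraction <= 1.0:
        raise ValueError("load_fraction must be in (0, 1]")
    if max_rate_per_s <= 0:
        raise ValueError("max_rate_per_s must be positive")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    target_rate = max_rate_per_s * load_fraction
    # expovariate(lambd) draws from an exponential distribution with mean 1/lambd
    return [rng.expovariate(target_rate) for _ in range(n)]

# Example: the 30% step of a system assumed to peak at 100 requests/second
gaps = interarrival_times(100.0, 0.3, 20000)
mean_gap = sum(gaps) / len(gaps)
```

The average gap approaches 1 / (max rate x load fraction), while individual gaps vary widely, which is what produces the bursty, irregular arrival pattern described above.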
For benchmarks which operate at 100% utilization with execution time as the primary performance metric, SPEC’s methodology augments the active profile information with the power consumption when in an idle state.
In either case, the testing provides a set of performance data (at the various load levels, where idle performance = 0) and a set of power data (average power draw at the different load levels). The performance data and average power data are separately summed, and the performance sum is divided by the average power sum – yielding an average performance/watt figure of merit.
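The calculation described above can be sketched in a few lines. The function name and the sample numbers are invented for illustration and are not measured results.

```python
def performance_per_watt(perf_by_level, avg_watts_by_level):
    """Sum of performance over all load levels divided by sum of average power.

    Both lists are ordered by load level and include the idle level,
    where performance is 0 but power draw is not.
    """
    if len(perf_by_level) != len(avg_watts_by_level):
        raise ValueError("need one power reading per load level")
    total_power = sum(avg_watts_by_level)
    if total_power <= 0:
        raise ValueError("total power must be positive")
    return sum(perf_by_level) / total_power

# Illustrative (made-up) data for idle, 33%, 66% and 100% load
perf = [0, 1000, 2000, 3000]   # operations per second
watts = [100, 150, 200, 250]   # average watts at each level
score = performance_per_watt(perf, watts)  # 6000 / 700
```

Because the idle level contributes power but no performance, a system with high idle power is penalized even if its efficiency at full load is good.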
The SPECpower methodology also goes to great lengths to discuss some of the more complicated factors in power-efficiency benchmarking, such as environmental conditions and test equipment. The two key pieces of test equipment are a power meter and a thermal sensor. The power meter must be fairly high performance to be suitable. It must log raw data faster than once a second, with internal averaging at 1-2X the logging rate. Because of run-to-run performance variation, the meter only needs to be accurate to around 2%, which is reasonable for most meters. The trickiest problem is finding a meter with a large enough dynamic range in measuring amperage to accommodate instantaneous spikes. Through the course of designing SPECpower, engineers (and end users such as myself) found that extraordinarily large current spikes were problematic even for meters like those from Extech, and would result in values that were far too large, negative or otherwise nonsensical. SPEC determined that meters must be able to handle a current spike of 3X the maximum current of a system during the benchmark in order to produce accurate results. For a standard two-socket server, the current draw is probably 3-5 amps depending on the system configuration, so the meter must accommodate relatively large spikes in current.
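As a rough worked example of the 3X rule, the sketch below computes the current range a meter would need and checks a candidate meter against it. The function names and the candidate meter ranges are hypothetical; only the factor of 3 and the 3-5 amp estimate come from the text.

```python
SPIKE_FACTOR = 3.0  # the meter must handle 3X the system's maximum current

def required_meter_range(max_current_amps, spike_factor=SPIKE_FACTOR):
    """Smallest current range (amps) the meter must be able to measure."""
    if max_current_amps <= 0:
        raise ValueError("max_current_amps must be positive")
    return max_current_amps * spike_factor

def meter_range_ok(max_current_amps, meter_range_amps):
    """True if the meter's range covers the expected current spikes."""
    return meter_range_amps >= required_meter_range(max_current_amps)

# A two-socket server at the upper 5 A estimate needs a 15 A range
needed = required_meter_range(5.0)
fits_20a = meter_range_ok(5.0, 20.0)  # hypothetical 20 A meter
fits_10a = meter_range_ok(5.0, 10.0)  # hypothetical 10 A meter
```

Under these assumptions a 10 A meter would be too small for a server drawing 5 A, even though it comfortably covers the steady-state current.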
The thermal sensor is much more straightforward; it must log four times per minute, and the accuracy must be within half a degree Celsius.
The environmental conditions that SPEC discussed were quite wide-ranging, including humidity, external air flow, temperature, air pressure and cooling method. The first two have little impact (unless the air flow is extremely abnormal). Temperature has a substantial impact since it affects fan behavior and other factors, and the framework requires that the ambient temperature be a minimum of 68 degrees Fahrenheit or 20 degrees Celsius, as measured by the thermal sensor. Increasing air pressure slightly improves the ability of fans and other equipment to move heat out of the system, but should have relatively little impact if kept within normal ranges. SPECpower is designed for use with air-cooled systems and should not be used for water-cooled systems. Water-cooled systems behave quite differently and have a different set of costs and trade-offs associated with them that are not captured by SPECpower.
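The two thermal requirements (a reading at least four times per minute and a minimum ambient temperature of 20 degrees Celsius) lend themselves to a simple log check. The sketch below is a hypothetical helper, not part of any SPEC tooling; it assumes that "four times per minute" means gaps of at most 15 seconds between readings.

```python
MIN_AMBIENT_C = 20.0      # minimum ambient temperature (68 degrees Fahrenheit)
MAX_SAMPLE_GAP_S = 15.0   # assumption: 4 evenly spaced readings per minute

def check_thermal_log(samples):
    """Return a list of problems found in a temperature log.

    samples: list of (timestamp_seconds, temp_celsius) tuples, sorted by time.
    """
    problems = []
    for i, (t, temp) in enumerate(samples):
        if temp < MIN_AMBIENT_C:
            problems.append(f"{t}s: {temp} C is below the {MIN_AMBIENT_C} C minimum")
        if i > 0 and t - samples[i - 1][0] > MAX_SAMPLE_GAP_S:
            problems.append(f"{t}s: gap since previous reading exceeds {MAX_SAMPLE_GAP_S}s")
    return problems

# A clean log, and one with a cold reading followed by a 40-second gap
clean = check_thermal_log([(0, 21.0), (15, 21.5), (30, 22.0)])
flawed = check_thermal_log([(0, 19.5), (40, 21.0)])
```

An empty list means the log satisfies both conditions; any entries describe which reading failed and why.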