### But What is the Idea Behind Statistical Analysis?

For the purposes of this article, statistical theory is based on a concept called the normal distribution/curve. I’m sure you have heard of it. The premise is that in any *random* sample, the results will cluster about some mean, or average, value. The closer you get to the mean, the more results you see; the further away, the fewer. In the ideal case, 95% of values fall within 2 standard deviations on either side of the mean. What this means is that the area underneath this “bell curve” gets smaller and smaller the further you move away from the mean (average). The further you move away from the mean, the less chance you have of seeing a result – not impossible, mind, but very unlikely. This is the problem with using the highest score as a means of representing performance: you might have a high score that is close to the mean, or you may have an outlier, a 1-in-1000 score. Without knowing how all the results panned out, though, you won’t know, and in nearly all of the reviews that I have read this **small but important** bit of data isn’t disclosed.
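The “95% within 2 standard deviations” figure is easy to check numerically. A minimal sketch using Python’s standard library (the numbers here are standard facts about the normal distribution, not taken from the review’s data):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
nd = NormalDist(mu=0, sigma=1)

# Probability mass within 2 standard deviations of the mean.
within_2sd = nd.cdf(2) - nd.cdf(-2)
print(f"P(-2 < Z < 2) = {within_2sd:.4f}")   # ~0.9545, the familiar "95%"

# The tails thin out quickly: a 3-sigma result is far rarer.
within_3sd = nd.cdf(3) - nd.cdf(-3)
print(f"P(-3 < Z < 3) = {within_3sd:.4f}")   # ~0.9973
```

This is why an undisclosed outlier is so misleading: a score three standard deviations out sits in the rarest 0.3% of results.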

In the interest of keeping this article to a manageable size and on topic, I will skip the statistics backgrounder; there are many texts that deal with statistics at various levels and can answer your questions better than I can.

What we are interested in is the part that applies to analysing the results we are dealing with – comparing two samples. The tricky part is determining whether the samples are dependent or independent. An independent sample is one in which objects from the two “populations” are unrelated. If, however, the two populations (what we are trying to measure) are related, such that when an object (in our case a Winstone score) is chosen from one population (group of scores), another is chosen from the other population, we have a *dependent* sample.

In this particular case, where we have one system and are changing a single variable (the video card), we have a dependent sample. Also note that we only have 25 samples, so we use a t distribution rather than the normal (z) distribution. A “distribution” here is simply the shape of the bell curve generated from a sample of results. The curve we are using (the t distribution) has a slightly different shape to compensate for small sample sizes (< 30). Why 30? Simple: the shape of the t distribution is very close to the normal distribution once your sample has 30 or more values, so you would use the normal curve for sample sizes of 30 or more.
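To illustrate why 30 is the usual cut-off: the one-tailed 5% critical values of the t distribution shrink toward the normal-curve value of 1.645 as the degrees of freedom grow. A short sketch, with the values hard-coded from standard statistical tables (Python’s standard library has no t distribution):

```python
# One-tailed 5% critical values of the t distribution,
# keyed by degrees of freedom (values from standard t tables).
T_CRIT_05 = {5: 2.015, 10: 1.812, 24: 1.711, 30: 1.697, 60: 1.671}
Z_CRIT_05 = 1.645  # the normal-curve (z) equivalent

for df, t in sorted(T_CRIT_05.items()):
    print(f"df={df:>2}: t={t:.3f} (gap to normal: {t - Z_CRIT_05:+.3f})")
```

By df = 30 the gap to the normal value is already only about 0.05, which is why the normal curve is considered good enough from there on.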

Resolving a question via statistical means involves generating hypotheses – a claim and an alternate. In our case, the claim (called the null hypothesis, *H_{0}*) is that the performance of the Kyro 2 = GeForce2 GTS. The alternate hypothesis (*H_{a}*) is that the performance of the Kyro 2 < GeForce2 GTS. I should note that hypotheses are generally constructed in such a way that the alternate (*H_{a}*) hypothesis is the one the researcher believes to be the correct view.

We now introduce the rejection region. The rejection region is the area under the curve such that a result falling within it would lead us to believe that the alternate hypothesis was in fact the correct one. If our experimental results fall *mainly* within this region, then we accept the alternate hypothesis, as the chance of observing a large number of rare results is very small.

Trouble looms, however. It is always possible to make an incorrect decision, and we can never eliminate the possibility of making such an error. What we can do is minimise the likelihood of making one. The table below indicates the things that can go wrong.

| Decision | *H_{0}* is correct | *H_{a}* is correct |
|---|---|---|
| Reject *H_{0}* (accept *H_{a}*) | Type I error | Correct |
| Accept *H_{0}* | Correct | Type II error |

As I have indicated before, experimenters usually structure the hypotheses in
such a way that they believe the alternate hypothesis to be correct. What
we must therefore do is minimise the chances of *rejecting* *H_{0}* when it is
*actually* correct. By doing this we avoid touting our hypothesis as correct when it is not. But remember, there is always still a small chance that we will get it wrong.
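The Type I error rate is, by construction, exactly the significance level we choose. A quick simulation (a sketch of the general idea, not part of the article’s own analysis) makes this concrete: if *H_{0}* really is true and we reject whenever a standard-normal test statistic exceeds the one-tailed 5% critical value, we will wrongly reject about 5% of the time.

```python
import random
from statistics import NormalDist

random.seed(42)  # fixed seed so the run is repeatable

trials = 100_000
z_crit = NormalDist().inv_cdf(0.95)   # one-tailed 5% cutoff, ~1.645

# Simulate test statistics under a TRUE null hypothesis
# (standard normal) and count how often we wrongly reject.
false_rejections = sum(
    1 for _ in range(trials) if random.gauss(0, 1) > z_crit
)
print(f"Type I error rate: {false_rejections / trials:.3f}")  # close to 0.05
```

Minimising Type I errors further (a stricter cutoff) necessarily makes Type II errors more likely, which is the trade-off discussed below.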

What we want to decide is whether the mean score obtained using the Kyro 2 card is different from the mean score obtained using the GeForce2 GTS card. In other words, we want to determine the difference between two means (*U_{1} – U_{2}*). We do this by considering a population of differences that has a mean (*U_{D}*).

*H_{0}: U_{D} = D_{0}*

*H_{a}: U_{D} > D_{0}*

Now for some formulae. The test statistic is

*t* = (*X_{D}* – *D_{0}*) / (*S_{D}* / √*n_{D}*)

where *S_{D}* and *X_{D}* are calculated from the sample. An explanation of the terms:

- *n_{D}* is the number of sample differences (25)
- *X_{D}* is the mean of the sample differences
- *S_{D}* is the sample standard deviation of the differences
- *D_{0}* is the hypothesised mean difference (usually zero)
- *D* is the difference between the two results for each test run

Plugging in the numbers (table of results):

*X_{D}* = 80.2/25 = 3.208

*S_{D}* = 3.134

For the sake of this article I will define the rejection region
using a 5% significance level – in other words, I will accept a 5% chance of
wrongly rejecting the null hypothesis. The critical value is taken from
pre-calculated tables: for *n* – 1 = 24 degrees of freedom (one-tailed), the value is 1.711. Therefore reject *H_{0}* if

*t* > 1.711

*t* = (3.208 – 0) / (3.134/√25)

*t* = 3.208/0.6268

*t* = 5.118
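The arithmetic above can be reproduced in a few lines. Note that this sketch starts from the rounded summary values quoted in this article (mean difference 3.208, standard deviation 3.134, n = 25), not from the 25 raw score differences, which aren’t reproduced here:

```python
import math

def paired_t(mean_diff, sd_diff, n, d0=0.0):
    """Paired-sample t statistic: (X_D - D_0) / (S_D / sqrt(n))."""
    return (mean_diff - d0) / (sd_diff / math.sqrt(n))

# Content Creation Winstone 2001: rounded summary values from the article.
t = paired_t(mean_diff=3.208, sd_diff=3.134, n=25)
print(f"t = {t:.3f}")   # 5.118 with these rounded inputs
print("reject H0" if t > 1.711 else "do not reject H0")
```

The same function, fed the Business Winstone 2001 differences, produces the *t* = 1.361 result discussed further down.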

As *t* is *greater* than 1.711, we **reject** the null
hypothesis and accept the alternate: *the performance of the Kyro 2 is less
than the GeForce2 GTS in Content Creation Winstone 2001*.

Big deal you say. I could have told you that by looking at the graph or even the original bar graph.

OK, clever person, take a look at this graph:

**Figure 6**

Normally this would appear in a “review” with the indication being that the Kyro 2 is beaten by the GeForce2 Pro (again). The line graph doesn’t help much.

**Figure 7**

It appears that again the Kyro is outpaced. But is it? Using the
same hypotheses as above and plugging in the numbers we get *t* = 1.361 – *which
is not larger than 1.711*, so **we do not** reject *H_{0}* this time. The upshot? For Business Winstone 2001, there is not enough evidence to support a conclusion that the GeForce2 GTS is faster.

What’s that you say? Don’t you mean that the cards are equal in
performance? No I don’t. Remember the discussion about Type I and
Type II errors? Just because we have **not** produced evidence to
support our theory, **it does not automatically mean that we embrace the
corollary**. *To do so would run the risk of introducing a Type II
error!* We have to fall back upon the (rather unsatisfactory) statement
of not rejecting *H_{0}*, which is not the same as accepting
it. What we have is a situation where we have failed to prove our
hypothesis (remember, we framed the null and alternate hypotheses in such a way
that we believed *H_{a}* to actually be the correct hypothesis). We then constructed our analysis in such a way as to minimise the chances of a Type I error (rejecting *H_{0}* when it is in fact correct). But this leads to a greater risk of Type II errors, so in order to minimise the chances of making this error, we are conservative in our conclusions. To form a definitive conclusion, we would have to redo our tests and analysis.

_{0}This is where statistical analysis methods come into their own.
Statistical analysis can tell us if a difference in performance is significant
or not within a series of *marginal* results. There is no guesswork,
and no "voodoo" numbers. We have been as methodical as we can
and based on the results *as run*, the results are a dead heat. And
this is in stark contrast to what some reviews would have you believe.
