Testing Hypotheses Involving the Variances of Two Populations
Tests of hypotheses involving the variances of two populations are not as common as tests of hypotheses involving the means or proportions of two populations. Some types of situations in which variances of two populations need to be compared are:

when you need to compare the degrees of uniformity of two populations: the difference of their means or proportions is not a big issue, but the degree of variation is. If two drugs, two food additives, or two processes lead to similar mean effects, but one produces a more uniform effect than the other, the more uniformly acting alternative may be preferred. For example, in the food industry, you may have two varieties of, say, potato plant, both of which produce essentially equivalent mean yields in tons per hectare. However, if most of the potatoes produced by one variety are nearly the same size, weight, and shape, whereas the other variety produces a mixture of many small and large potatoes, then a processor might prefer the more uniformly sized variety.

Recall that when we wished to compare the means of two populations with one or both of the samples having sizes less than 30, valid application of the t-distribution required that the two populations have the same variance: σ_{1}^{2} = σ_{2}^{2}. One way to verify that the data is consistent with this requirement is to carry out a hypothesis test based on the null hypothesis H_{0}: σ_{1}^{2} = σ_{2}^{2}.
So, the null hypothesis we will be testing here is
H_{0}: σ_{1}^{2} = σ_{2}^{2}    (TPVARHT - 1)
where
σ_{1}^{2} is the variance of population #1
and σ_{2}^{2} is the variance of population #2.
The test exploits the fact that if s_{1}^{2} is the variance of a random sample of size n_{1} drawn from population 1, which is normally distributed, and s_{2}^{2} is the variance of a random sample of size n_{2} drawn from population 2, which is also normally distributed, then the random variable

F = (s_{1}^{2} / σ_{1}^{2}) / (s_{2}^{2} / σ_{2}^{2})    (TPVARHT - 2)

has the so-called F-distribution with
numerator degrees of freedom = ν_{1} = n_{1} - 1
and
denominator degrees of freedom = ν_{2} = n_{2} - 1
Note that if H_{0} is true, then σ_{1}^{2} = σ_{2}^{2}, and so (TPVARHT - 2) reduces to

F = s_{1}^{2} / s_{2}^{2}    (TPVARHT - 3)

Further, to the extent that the data contradicts H_{0}, we expect this ratio (TPVARHT - 3) to differ from 1 (since if σ_{1}^{2} = σ_{2}^{2}, perfectly representative samples would give s_{1}^{2} = s_{2}^{2} as well, and so F = 1 in that case). Thus, the hypothesis testing rules become:
Hypotheses:

H_{0}: σ_{1}^{2} = σ_{2}^{2}
H_{A}: σ_{1}^{2} > σ_{2}^{2}

    Reject H_{0} at a level of significance α if:
        F > F_{α, ν_{1}, ν_{2}}   (single-tailed rejection region)
    p-value:
        Pr(F > test statistic value)

H_{0}: σ_{1}^{2} = σ_{2}^{2}
H_{A}: σ_{1}^{2} < σ_{2}^{2}

    Reject H_{0} at a level of significance α if:
        F < F_{1-α, ν_{1}, ν_{2}}   (single-tailed rejection region)
    p-value:
        Pr(F < test statistic value)

H_{0}: σ_{1}^{2} = σ_{2}^{2}
H_{A}: σ_{1}^{2} ≠ σ_{2}^{2}

    Reject H_{0} at a level of significance α if:
        F > F_{α/2, ν_{1}, ν_{2}} or F < F_{1-α/2, ν_{1}, ν_{2}}   (two-tailed rejection region)
    p-value:
        Smaller of 2 Pr(F > test statistic) or 2 Pr(F < test statistic)
We will call the rules summarized above "the F-test."
Note that the test statistic, F, must always be a positive number because s_{1}^{2} and s_{2}^{2} are both always positive. Details of the properties and use of the F-distribution, and tables of critical values of the standard F random variable for right-tail areas of 0.05 and 0.01, are given in the short document following this one. Because of the two degrees-of-freedom numbers associated with the F random variable, each page of a printed table can give critical values for just one tail area and a selection of values of each of the degrees-of-freedom numbers. For calculations involving tail areas and degrees-of-freedom numbers other than those represented by the abbreviated (though quite conventional) tables in the next document, you can use the FDIST() and FINV() functions supplied with Excel/97 or equivalent functionality in other computer applications.
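If you are working in Python rather than Excel, the same critical values and tail areas are available from scipy.stats.f. The following is a minimal sketch, assuming scipy is installed; the degrees-of-freedom values shown are arbitrary examples, not values from any exercise in these notes.

```python
# Sketch: F critical values and tail probabilities in Python, as an
# alternative to Excel's FINV() and FDIST().  Assumes scipy is installed.
from scipy.stats import f

nu1, nu2 = 10, 12   # numerator and denominator degrees of freedom (example values)
alpha = 0.05

# Critical value F_{alpha, nu1, nu2}, i.e. the point with right-tail area alpha
# (the analog of Excel's FINV(alpha, nu1, nu2)).
f_crit = f.ppf(1 - alpha, nu1, nu2)

# Right-tail area beyond an observed test statistic
# (the analog of Excel's FDIST(F, nu1, nu2)).
F_obs = 2.75
p_right = f.sf(F_obs, nu1, nu2)

print(f"F_0.05,{nu1},{nu2} = {f_crit:.3f}; right-tail area beyond {F_obs} = {p_right:.4f}")
```

This works for any valid tail area and degrees-of-freedom numbers, so no interpolation in printed tables is needed.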
The rules in the table above reflect the fact that the F-distribution is not symmetric. As a result, some of the rejection criteria stated above (for the left-tailed and two-tailed tests) involve right-tail areas which would normally be quite large numbers, such as 0.95, whereas the standard tables typically give critical values of F only for right-tail areas of 0.05 and 0.01. In such a situation, you can do one of three things:
(i) Note the property of the F-distribution that

F_{1-α, ν_{1}, ν_{2}} = 1 / F_{α, ν_{2}, ν_{1}}    (TPVARHT - 4)

(note that the two degrees-of-freedom numbers are swapped on the right). You can use this formula to obtain critical values of the F random variable for right-tail areas of 0.95 from tables of critical values for right-tail areas of 0.05, for example.
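As a quick numerical sanity check of the reciprocal property (TPVARHT - 4), here is a small Python sketch (assuming scipy is available; the degrees-of-freedom values are arbitrary examples):

```python
# Numerical check of F_{1-a, v1, v2} = 1 / F_{a, v2, v1}.
from scipy.stats import f

alpha = 0.05
nu1, nu2 = 10, 20   # deliberately unequal, to make the swap visible

# f.ppf(alpha, ...) returns the point with LEFT-tail area alpha,
# i.e. right-tail area 1 - alpha: the document's F_{1-alpha, nu1, nu2}.
left = f.ppf(alpha, nu1, nu2)
right = 1.0 / f.ppf(1 - alpha, nu2, nu1)   # 1 / F_{alpha, nu2, nu1}
print(left, right)   # the two values should agree
```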
(ii) You can always rearrange the hypotheses for one-tailed tests so that they are right-tailed tests:
H_{0}: σ_{1}^{2} = σ_{2}^{2}
H_{A}: σ_{1}^{2} < σ_{2}^{2}
is equivalent to
H_{0}: σ_{2}^{2} = σ_{1}^{2}
H_{A}: σ_{2}^{2} > σ_{1}^{2}
(iii) Use a computer application such as Excel/97 to generate the precise critical values that you need. Such computer-based functions will work for any valid right-tail areas and degrees-of-freedom numbers.
Strategies (i) and (ii) really amount to the same thing here. Flipping the hypotheses as indicated in (ii) results in a standardized test statistic, F, given by (TPVARHT - 3), which is the reciprocal of the former value and has the two degrees-of-freedom numbers swapped.
Before illustrating these formulas with some examples, we mention one caution voiced by many authors. The accuracy of the F-test described above is quite sensitive to deviations of the populations from normality. This means that checking whether the data is consistent with normally distributed populations should have a fairly high priority here.
In fact, as Devore and others caution, special care must be exercised when this F-test is being used to assess whether the condition of equal variances is met prior to performing a t-test on the difference of two population means. There are several problems here. First, consistency with a normally distributed population is more difficult to assess reliably when only small samples are available. Secondly, the t-test is known to be quite insensitive to moderate departures from normality in the populations. As a result, it may happen that an inappropriately applied F-test will erroneously indicate that σ_{1}^{2} ≠ σ_{2}^{2} when the t-test would have worked fine.
Example 1: The suspicion is voiced that apples left longer on the tree become less uniform in size. Is this suspicion supported by the Jonagold apple data for the first two harvest dates? (Refer to the standard data sets distributed earlier in the course.)
Solution
The variance of the apple weights is a measure of the uniformity (or lack of uniformity) of the population of apple weights. If the weights of the apples harvested on the first date have a variance of σ_{1}^{2}, and the weights of the apples harvested on the second date have a variance of σ_{2}^{2}, then rejecting H_{0} in
H_{0}: σ_{2}^{2} = σ_{1}^{2}
H_{A}: σ_{2}^{2} > σ_{1}^{2}
amounts to supporting the conclusion that σ_{2}^{2} > σ_{1}^{2}, or that population 2 has greater variability than population 1.
The raw data is given in the standard data sets document. From it, we get for the first harvest date that
n_{1} = 60    x̄_{1} = 219.73 g    s_{1} = 42.88 g
and for the second harvest date
n_{2} = 55    x̄_{2} = 257.27 g    s_{2} = 52.35 g
(Actually, we don't really need the values of the sample means here.) Since the F-test requires that the two populations be normally distributed, we prepare normal probability plots for the two sets of data:

[normal probability plots for the first and second harvest dates]
While the points in these plots are not on the straightest possible lines, neither plot shows clear signs of the data deviating from normality in a serious way, so we are reasonably safe in using the Ftest here.
The standardized test statistic is

F = s_{2}^{2} / s_{1}^{2} = (52.35)^{2} / (42.88)^{2} = 1.490

We have ν_{n} = 55 - 1 = 54 and ν_{D} = 60 - 1 = 59, and using α = 0.05, the best we can do from the printed tables of critical values of the F random variable is
F_{0.05, 54, 59} ≈ F_{0.05, 40, 50} = 1.63
(Since the tables do not contain entries for ν_{n} = 54 and ν_{D} = 59, we took the entry for the closest more rigorous values, ν_{n} = 40 and ν_{D} = 50. We are using the symbols ν_{n} and ν_{D}, respectively, for the numerator and denominator degrees of freedom to avoid the confusion that would result from numerical subscripts, given that we've reversed the subscripts 1 and 2 in the hypotheses in this example relative to the template rules in the table earlier.)
So, we can reject H_{0} here at a level of significance of 0.05 if the calculated test statistic is greater than 1.63. But 1.490 is not greater than 1.63, and so we cannot reject H_{0}. (In fact, using the FDIST() function in Excel/97, we find that the p-value for this hypothesis test is 0.0676, which, while not inordinately large, is still a bit larger than the 0.05 that most practitioners would consider the largest allowable p-value for rejecting a null hypothesis.) Thus, the best we can say is that the data presented is not strong evidence that apples harvested at a later date are less uniform than apples harvested at the earlier date.
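For readers working outside Excel, the Example 1 calculation can be reproduced in Python. This is a sketch assuming scipy is installed; it starts from the summary statistics quoted above rather than the raw apple-weight data in the standard data sets.

```python
# Sketch of the Example 1 right-tailed F-test (assumes scipy is installed).
from scipy.stats import f

n1, s1 = 60, 42.88   # first harvest date: sample size, sample std. dev. (g)
n2, s2 = 55, 52.35   # second harvest date

# H0: sigma2^2 = sigma1^2 vs HA: sigma2^2 > sigma1^2, so the later
# harvest's sample variance goes in the numerator.
F = s2**2 / s1**2
nu_n, nu_D = n2 - 1, n1 - 1   # 54 and 59

p_value = f.sf(F, nu_n, nu_D)   # right-tail area beyond the test statistic
print(f"F = {F:.3f}, p-value = {p_value:.4f}")
# The text reports F = 1.490 and (via Excel's FDIST) a p-value of 0.0676.
```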
Example 2: Refer to the standard data sets giving percentages of various amino acids in specimens of natural and artificial shark fin. A technologist wishes to test hypotheses to determine whether the mean percentage of the amino acid alanine differs between the two shark fin preparations. However, since the sample sizes are just 15 in both cases, and therefore small, this will require the assumption that both populations have equal variances. What does the F-test say about the validity of that assumption here?
Solution
We won't repeat the raw data here; it's available in the document containing the standard data sets (we are referring specifically to the data sets labeled SharkfinNatAla and SharkfinArtAla). We are really being asked to test the hypotheses:
H_{0}: σ_{nat}^{2} = σ_{art}^{2}
vs.
H_{A}: σ_{nat}^{2} ≠ σ_{art}^{2}
From the data, we have
n_{nat} = 15    s_{nat} = 2.423
and
n_{art} = 15    s_{art} = 1.565
(The s's are in units of percent.) Further, the normal probability plots for each set of data are:

[normal probability plots for the natural and artificial shark fin data]
These appear to be consistent with the populations being approximately normally distributed, so application of the Ftest to the above hypotheses should be valid.
The standardized test statistic is calculated to be

F = s_{nat}^{2} / s_{art}^{2} = (2.423)^{2} / (1.565)^{2} = 2.397
Since this is a two-tailed test, and the standard tables of critical values of the F-distribution have values for tail areas of 0.05 and 0.01 only, we will be able to use those tables to test these hypotheses at a level of significance of either 0.10 or 0.02 only (that is, twice the single-tail areas represented in the tables). We choose to use α = 0.10 here.
So, H_{0} can be rejected at α = 0.10 if either

F > F_{0.05, 14, 14} ≈ F_{0.05, 12, 14} = 2.53

or

F < F_{0.95, 14, 14} = 1 / F_{0.05, 14, 14} ≈ 1 / 2.53 = 0.395
(Again, we've had to go to the closest more rigorous entry in the table because it doesn't cover the precise degrees of freedom required in this problem.) Since 2.397 is not greater than 2.53, nor is 2.397 less than 0.395, we cannot reject H_{0} at a level of significance of 0.10. Thus, we can conclude that the data is not inconsistent with σ_{nat}^{2} = σ_{art}^{2}, and so this condition of validity for the t-test appears to be met. You shouldn't be too concerned that 2.397 is quite close to 2.53, because, after all, we are working with quite a large value of α here. In fact, using the FDIST() function in Excel/97, we get the p-value for this test as

p-value = 2 Pr(F > 2.397 | ν_{n} = 14, ν_{D} = 14)
        = 2(0.05677) = 0.1135

This is too large a value to seriously consider rejecting H_{0} here.
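The two-tailed p-value in Example 2 can likewise be checked in Python (a sketch assuming scipy is installed, starting from the sample standard deviations quoted above):

```python
# Sketch of the two-tailed F-test in Example 2 (assumes scipy is installed).
from scipy.stats import f

n_nat, s_nat = 15, 2.423   # natural shark fin: n, sample std. dev. (percent)
n_art, s_art = 15, 1.565   # artificial shark fin

F = s_nat**2 / s_art**2
nu_n, nu_D = n_nat - 1, n_art - 1   # 14 and 14

# Two-tailed p-value: double the smaller of the two tail areas.
right_tail = f.sf(F, nu_n, nu_D)
left_tail = f.cdf(F, nu_n, nu_D)
p_value = 2 * min(right_tail, left_tail)
print(f"F = {F:.3f}, p-value = {p_value:.4f}")
# The text reports F = 2.397 and a p-value of 0.1135.
```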
Example 3: A food technologist frequently carries out experiments in which participants are asked to rate various qualities of potential foods on numerical scales; for example, from 0 (for awful) to 10 (for excellent). She suspects that male participants tend to give more uniform responses than female participants. Before spending much time speculating on what this might mean about the relationship between sensory perception and gender, she decides that she must first devise an experiment to see if she can find evidence that the effect is real.
This is what she does (in our little fairy story…). She prepares a set of identical food specimens and asks 61 randomly selected men and 61 randomly selected women to rate the food specimens on a scale of 0 to 10. The resulting data is:
men:
6 2 7 3 3 4 6 3 5 4 3 6 2 5 3 6 2 5 5 2 5
4 4 8 6 4 2 7 4 2 3 3 7 4 4 6 3 5 2 4 5 2
4 3 2 3 3 1 3 2 4 2 5 3 6 2 4 3 4 4 5
women:
3 6 6 2 9 7 7 5 10 2 4 2 6 0 6 9 2 6 4 6 4
2 4 10 5 9 3 1 7 2 3 2 7 6 6 3 6 2 6 0 8 4
6 6 1 8 3 2 6 7 5 8 4 2 4 6 4 5 0 6 4
The standard deviation of the men's ratings is 1.584 and the standard deviation of the women's ratings is 2.523. (The mean ratings differ somewhat between the two samples as well, though that fact is not directly relevant here. You may notice that none of the men gave a rating of 0 or 10, the most extreme values possible, whereas several women gave those ratings. In a world in which these numbers were real data from an actual experiment, this might be suggestive of an effect. However, we wouldn't want to base a conclusion only on the occurrence of the most extreme responses in the samples, hence the need to do a more systematic hypothesis test here.)
A test of hypotheses involving the variance of these responses should indicate whether the responses of male participants are more uniform than the responses of female participants. Carry out the appropriate hypothesis test and comment on the result.
Solution
It appears we need to test the hypotheses:
H_{0}: σ_{women}^{2} = σ_{men}^{2}
vs.
H_{A}: σ_{women}^{2} > σ_{men}^{2}
We can use the F-test if we have some assurance that the responses are consistent with normally distributed populations. Since each response is a single value chosen from a set of only eleven distinct whole-number values, it's probably worthwhile to examine this assumption a bit.
It's quite easy to construct frequency histograms for the responses:

[frequency histograms of the men's and women's ratings]

The responses by the men form a tighter distribution, but seem to show a bit of a right skew. The responses by the women don't appear to be particularly skewed, but form a rather jagged approximation to the desired bell shape. The corresponding normal probability plots are:

[normal probability plots of the men's and women's ratings]
You can see some evidence of the skewing to the right in the normal probability plot of the men's responses: there is a perceptible curvature to the path through the rough centers of each line of points. However, this is not a dominating feature, and so we'll proceed as if approximate normality has been demonstrated for the men's responses. The normal probability plot of the women's responses raises no concerns.
So, now apply the F-test. The value of the standardized test statistic is

F = s_{women}^{2} / s_{men}^{2} = (2.523)^{2} / (1.584)^{2} = 2.537
We may reject H_{0} at a level of significance of 0.05 if
F > F_{0.05, 60, 60} = 1.53
(The decision to include 61 participants in each sample was made to get ν_{1} = 60 and ν_{2} = 60, for which our F-tables have entries.)
But, 2.537 is greater than 1.53, and so we can reject H_{0}. The data supports the claim that the variance of the men's responses is less than the variance of the women's responses. This implies that the men's responses are more uniform than those of the women.
(NOTE: As with all examples in these course notes, this data is simulated to achieve certain pedagogical goals and should not be used to draw actual conclusions about the real world. For that, you would need to perform this experiment yourself using real world participants and materials!)
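Since the raw ratings are listed above, the whole Example 3 calculation can be reproduced directly. This is a Python sketch, assuming scipy is installed; the exact F value differs slightly from the 2.537 quoted in the text, which was computed from the rounded standard deviations.

```python
# Sketch of the Example 3 F-test from the raw ratings (assumes scipy).
from statistics import variance  # sample variance, with an n - 1 divisor
from scipy.stats import f

men = [6,2,7,3,3,4,6,3,5,4,3,6,2,5,3,6,2,5,5,2,5,
       4,4,8,6,4,2,7,4,2,3,3,7,4,4,6,3,5,2,4,5,2,
       4,3,2,3,3,1,3,2,4,2,5,3,6,2,4,3,4,4,5]
women = [3,6,6,2,9,7,7,5,10,2,4,2,6,0,6,9,2,6,4,6,4,
         2,4,10,5,9,3,1,7,2,3,2,7,6,6,3,6,2,6,0,8,4,
         6,6,1,8,3,2,6,7,5,8,4,2,4,6,4,5,0,6,4]

F = variance(women) / variance(men)       # women's variance in the numerator
nu1, nu2 = len(women) - 1, len(men) - 1   # 60 and 60
f_crit = f.ppf(0.95, nu1, nu2)            # F_{0.05, 60, 60}

print(f"F = {F:.3f}, critical value = {f_crit:.3f}, reject H0: {F > f_crit}")
```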
Confidence Interval Estimates Involving the Variances of Two Populations
Since, by definition, there is a probability of 100(1 - α)% that

F_{1-α/2, ν_{1}, ν_{2}} < F < F_{α/2, ν_{1}, ν_{2}}

or, using (TPVARHT - 2), that

F_{1-α/2, ν_{1}, ν_{2}} < (s_{1}^{2} / s_{2}^{2}) (σ_{2}^{2} / σ_{1}^{2}) < F_{α/2, ν_{1}, ν_{2}}    (TPVARHT - 5)

we have a way to come up with 100(1 - α)% confidence interval estimates involving the two population variances. Taking the left inequality in the preceding relation,

F_{1-α/2, ν_{1}, ν_{2}} < (s_{1}^{2} / s_{2}^{2}) (σ_{2}^{2} / σ_{1}^{2})

and rearranging the right-hand side somewhat,

you can see that we can write

σ_{1}^{2} / σ_{2}^{2} < (s_{1}^{2} / s_{2}^{2}) (1 / F_{1-α/2, ν_{1}, ν_{2}})

Similarly, starting with the right-hand inequality of (TPVARHT - 5), we eventually get

σ_{1}^{2} / σ_{2}^{2} > (s_{1}^{2} / s_{2}^{2}) (1 / F_{α/2, ν_{1}, ν_{2}})

Putting these last two inequalities together into a single interval-like expression then gives the final result

(s_{1}^{2} / s_{2}^{2}) (1 / F_{α/2, ν_{1}, ν_{2}}) < σ_{1}^{2} / σ_{2}^{2} < (s_{1}^{2} / s_{2}^{2}) (1 / F_{1-α/2, ν_{1}, ν_{2}})    @ 100(1 - α)%    (TPVARHT - 6)
This is a formula for a confidence interval estimate of the ratio of the two population variances.
Example 4
Just a very brief example of how formula (TPVARHT - 6) can be used. Consider the data in Example 3, just above. Using the existing F-tables, we can construct confidence interval estimates with either a 90% confidence level (single-tail area of 0.05) or a 98% confidence level (single-tail area of 0.01). Here we will choose the 90% confidence interval estimate.
For this, we need F_{0.05, 60, 60} = 1.53, and

F_{0.95, 60, 60} = 1 / F_{0.05, 60, 60} = 1 / 1.53 = 0.654

Thus, with s_{women} = 2.523 and s_{men} = 1.584, we get from (TPVARHT - 6) the result

(2.523^{2} / 1.584^{2}) (1 / 1.53) < σ_{women}^{2} / σ_{men}^{2} < (2.523^{2} / 1.584^{2}) (1 / 0.654)

or

1.659 < σ_{women}^{2} / σ_{men}^{2} < 3.882    @ 90%
Thus, at a level of confidence of 90%, we conclude that the variance of the population of women's responses is between 1.659 and 3.882 times as large as the variance of the men's responses.
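The same interval can be computed in Python with exact critical values in place of the rounded table entry 1.53 (a sketch assuming scipy is installed; the endpoints therefore differ slightly from those above):

```python
# Sketch of the Example 4 confidence interval (assumes scipy is installed).
from scipy.stats import f

s_women, s_men = 2.523, 1.584
nu1 = nu2 = 60
alpha = 0.10   # 90% confidence level

ratio = s_women**2 / s_men**2
f_upper = f.ppf(1 - alpha / 2, nu1, nu2)   # F_{0.05, 60, 60}
f_lower = f.ppf(alpha / 2, nu1, nu2)       # F_{0.95, 60, 60} = 1 / F_{0.05, 60, 60}

lo = ratio / f_upper   # left endpoint of (TPVARHT - 6)
hi = ratio / f_lower   # right endpoint
print(f"{lo:.3f} < sigma_women^2 / sigma_men^2 < {hi:.3f}  @ 90%")
# The text, using the table value 1.53, reports the interval (1.659, 3.882).
```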
© David W. Sabo (1999)