Hypothesis testing is a statistical procedure used to provide evidence in favor of some statement (called a hypothesis). For instance, hypothesis testing might be used to assess whether a population parameter, such as a population mean, differs from a specified standard or previous value. In this chapter we discuss testing hypotheses about population means, proportions, and variances.
In order to illustrate how hypothesis testing works, we revisit several cases introduced in previous chapters and also introduce some new cases:
The Payment Time Case: The consulting firm uses hypothesis testing to provide strong evidence that the new electronic billing system has reduced the mean payment time by more than 50 percent.
The Cheese Spread Case: The cheese spread producer uses hypothesis testing to supply extremely strong evidence that fewer than 10 percent of all current purchasers would stop buying the cheese spread if the new spout were used.
The Electronic Article Surveillance Case: A company that sells and installs EAS systems claims that at most 5 percent of all consumers would never shop in a store again if the store subjected them to a false EAS alarm. A store considering the purchase of such a system uses hypothesis testing to provide extremely strong evidence that this claim is not true.
The Trash Bag Case: A marketer of trash bags uses hypothesis testing to support its claim that the mean breaking strength of its new trash bag is greater than 50 pounds. As a result, a television network approves use of this claim in a commercial.
The Valentine’s Day Chocolate Case: A candy company projects that this year’s sales of its special valentine box of assorted chocolates will be 10 percent higher than last year. The candy company uses hypothesis testing to assess whether it is reasonable to plan for a 10 percent increase in sales of the valentine box.
9.1: The Null and Alternative Hypotheses and Errors in Hypothesis Testing
One of the authors’ former students is employed by a major television network in the standards and practices division. One of the division’s responsibilities is to reduce the chances that advertisers will make false claims in commercials run on the network. Our former student reports that the network uses a statistical methodology called hypothesis testing to do this.
To see how this might be done, suppose that a company wishes to advertise a claim, and suppose that the network has reason to doubt that this claim is true. The network assumes for the sake of argument that the claim is not valid. This assumption is called the null hypothesis. The statement that the claim is valid is called the alternative, or research, hypothesis. The network will run the commercial only if the company making the claim provides sufficient sample evidence to reject the null hypothesis that the claim is not valid in favor of the alternative hypothesis that the claim is valid. Explaining the exact meaning of sufficient sample evidence is quite involved and will be discussed in the next section.
The Null Hypothesis and the Alternative Hypothesis
In hypothesis testing:
1 The null hypothesis, denoted H0, is the statement being tested. Usually this statement represents the status quo and is not rejected unless there is convincing sample evidence that it is false.
2 The alternative, or research, hypothesis, denoted Ha, is a statement that will be accepted only if there is convincing sample evidence that it is true.
Setting up the null and alternative hypotheses in a practical situation can be tricky. In some situations there is a condition for which we need to attempt to find supportive evidence. We then formulate (1) the alternative hypothesis to be the statement that this condition exists and (2) the null hypothesis to be the statement that this condition does not exist. To illustrate this, we consider the following case studies.
EXAMPLE 9.1: The Trash Bag Case
A leading manufacturer of trash bags produces the strongest trash bags on the market. The company has developed a new 30-gallon bag using a specially formulated plastic that is stronger and more biodegradable than other plastics. This plastic’s increased strength allows the bag’s thickness to be reduced, and the resulting cost savings will enable the company to lower its bag price by 25 percent. The company also believes the new bag is stronger than its current 30-gallon bag.
The manufacturer wants to advertise the new bag on a major television network. In addition to promoting its price reduction, the company also wants to claim the new bag is better for the environment and stronger than its current bag. The network is convinced of the bag’s environmental advantages on scientific grounds. However, the network questions the company’s claim of increased strength and requires statistical evidence to justify this claim. Although there are various measures of bag strength, the manufacturer and the network agree to employ “breaking strength.” A bag’s breaking strength is the amount of a representative trash mix (in pounds) that, when loaded into a bag suspended in the air, will cause the bag to rip or tear. Tests show that the current bag has a mean breaking strength that is very close to (but does not exceed) 50 pounds. The new bag’s mean breaking strength μ is unknown and in question. The alternative hypothesis Ha is the statement for which we wish to find supportive evidence. Because we hope the new bags are stronger than the current bags, Ha says that μ is greater than 50. The null hypothesis states that Ha is false. Therefore, H0 says that μ is less than or equal to 50. We summarize these hypotheses by stating that we are testing
H0: μ ≤ 50 versus Ha: μ > 50
The network will run the manufacturer’s commercial if a random sample of n new bags provides sufficient evidence to reject H0: μ ≤ 50 in favor of Ha: μ > 50.
EXAMPLE 9.2: The Payment Time Case
Recall that a management consulting firm has installed a new computer-based, electronic billing system for a Hamilton, Ohio, trucking company. Because of the system’s advantages, and because the trucking company’s clients are receptive to using this system, the management consulting firm believes that the new system will reduce the mean bill payment time by more than 50 percent. The mean payment time using the old billing system was approximately equal to, but no less than, 39 days. Therefore, if μ denotes the mean payment time using the new system, the consulting firm believes that μ will be less than 19.5 days. Because it is hoped that the new billing system reduces mean payment time, we formulate the alternative hypothesis as Ha: μ < 19.5 and the null hypothesis as H0: μ ≥ 19.5. The consulting firm will randomly select a sample of n invoices and determine if their payment times provide sufficient evidence to reject H0: μ ≥ 19.5 in favor of Ha: μ < 19.5. If such evidence exists, the consulting firm will conclude that the new electronic billing system has reduced the Hamilton trucking company’s mean bill payment time by more than 50 percent. This conclusion will be used to help demonstrate the benefits of the new billing system both to the Hamilton company and to other trucking companies that are considering using such a system.
EXAMPLE 9.3: The Valentine’s Day Chocolate Case
A candy company annually markets a special 18-ounce box of assorted chocolates to large retail stores for Valentine’s Day. This year the candy company has designed an extremely attractive new valentine box and will fill the box with an especially appealing assortment of chocolates. For this reason, the candy company subjectively projects, based on past experience and knowledge of the candy market, that sales of its valentine box will be 10 percent higher than last year. However, because the candy company must decide how many valentine boxes to produce, the company needs to assess whether it is reasonable to plan for a 10 percent increase in sales.
Before the beginning of each Valentine’s Day sales season, the candy company sends large retail stores information about its newest valentine box of assorted chocolates. This information includes a description of the box of chocolates, as well as a preview of advertising displays that the candy company will provide to help retail stores sell the chocolates. Each retail store then places a single (nonreturnable) order of valentine boxes to satisfy its anticipated customer demand for the Valentine’s Day sales season. Last year the mean order quantity of large retail stores was 300 boxes per store. If the projected 10 percent sales increase will occur, the mean order quantity, μ, of large retail stores this year will be 330 boxes per store. Therefore, the candy company wishes to test the null hypothesis H0: μ = 330 versus the alternative hypothesis Ha: μ ≠ 330.
To perform the hypothesis test, the candy company will randomly select a sample of n large retail stores and will make an early mailing to these stores promoting this year’s valentine box. The candy company will then ask each retail store to report how many valentine boxes it anticipates ordering. If the sample data do not provide sufficient evidence to reject H0: μ = 330 in favor of Ha: μ ≠ 330, the candy company will base its production on the projected 10 percent sales increase. On the other hand, if there is sufficient evidence to reject H0: μ = 330, the candy company will change its production plans.
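The hypothesized values 19.5 and 330 in Examples 9.2 and 9.3 come from simple percentage arithmetic on the earlier mean. The short Python sketch below reproduces that arithmetic. The function name hypothesized_mean is our own and is introduced only for illustration.

```python
def hypothesized_mean(baseline, percent_change):
    """Return the baseline mean adjusted by a percentage change.

    A positive percent_change is an increase; a negative one is a decrease.
    """
    return baseline * (100 + percent_change) / 100


# Example 9.3: last year's mean order of 300 boxes, projected up 10 percent.
chocolate_mu0 = hypothesized_mean(300, 10)

# Example 9.2: the old mean payment time of 39 days, cut by 50 percent.
payment_mu0 = hypothesized_mean(39, -50)

print(chocolate_mu0, payment_mu0)
```

The two results are the values that appear in H0: μ = 330 and H0: μ ≥ 19.5.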
We next summarize the sets of null and alternative hypotheses that we have thus far considered:
H0: μ ≤ 50 versus Ha: μ > 50
H0: μ ≥ 19.5 versus Ha: μ < 19.5
H0: μ = 330 versus Ha: μ ≠ 330
The alternative hypothesis Ha: μ > 50 is called a one-sided, greater than alternative hypothesis, whereas Ha: μ < 19.5 is called a one-sided, less than alternative hypothesis, and Ha: μ ≠ 330 is called a two-sided, not equal to alternative hypothesis. Many of the alternative hypotheses we consider in this book are one of these three types. Also, note that each null hypothesis we have considered involves an equality. For example, the null hypothesis H0: μ ≤ 50 says that μ is either less than or equal to 50. We will see that, in general, the approach we use to test a null hypothesis versus an alternative hypothesis requires that the null hypothesis involve an equality.
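The pattern behind the three types can be written down compactly. The sketch below is only an illustration: the dictionary RELATIONS and the function state_hypotheses are hypothetical names of our own. It pairs each type of alternative hypothesis with the relation used in the corresponding null hypothesis and shows that the null relation always includes an equality.

```python
# For each type of alternative hypothesis: (relation in H0, relation in Ha).
# Every H0 relation (≤, ≥, =) includes equality.
RELATIONS = {
    "greater than": ("≤", ">"),
    "less than": ("≥", "<"),
    "not equal to": ("=", "≠"),
}


def state_hypotheses(mu0, alternative):
    """Return the H0 and Ha statements for a hypothesized mean mu0."""
    null_relation, alt_relation = RELATIONS[alternative]
    return f"H0: μ {null_relation} {mu0}", f"Ha: μ {alt_relation} {mu0}"


# The three cases considered so far.
cases = [(50, "greater than"), (19.5, "less than"), (330, "not equal to")]
for mu0, alternative in cases:
    h0, ha = state_hypotheses(mu0, alternative)
    print(f"{h0} versus {ha}")
```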
The idea of a test statistic
Suppose that in the trash bag case the manufacturer randomly selects a sample of n = 40 new trash bags. Each of these bags is tested for breaking strength, and the sample mean x̄ of the 40 breaking strengths is calculated. In order to test H0: μ ≤ 50 versus Ha: μ > 50, we utilize the test statistic
z = (x̄ − 50) / (σ / √n)
where σ is the standard deviation of the population of breaking strengths. The test statistic z measures the distance between x̄ and 50. The division by σ/√n says that this distance is measured in units of the standard deviation of all possible sample means. For example, a value of z equal to, say, 2.4 would tell us that x̄ is 2.4 such standard deviations above 50. In general, a value of the test statistic that is less than or equal to zero results when x̄ is less than or equal to 50. This provides no evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ is probably less than or equal to 50. However, a value of the test statistic that is greater than zero results when x̄ is greater than 50. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be greater than 50. Furthermore, the farther the value of the test statistic is above 0 (the farther x̄ is above 50), the stronger is the evidence to support rejecting H0 in favor of Ha.
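To make the calculation concrete, the following Python sketch computes the test statistic in its usual form, z = (x̄ − 50)/(σ/√n). Only n = 40 and the hypothesized value 50 come from the trash bag case as described here. The sample mean of 50.575 pounds and the standard deviation of 1.65 pounds are hypothetical values chosen for illustration, and the function name z_statistic is our own.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """Distance between the sample mean and mu0, measured in units of
    the standard deviation of all possible sample means (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# n = 40 and mu0 = 50 are from the trash bag case.
# sample_mean and sigma are hypothetical illustration values.
z = z_statistic(sample_mean=50.575, mu0=50, sigma=1.65, n=40)
print(f"z = {z:.2f}")
```

With these assumed inputs z is about 2.2, meaning the sample mean lies roughly 2.2 standard deviations of the sample mean above 50. A sample mean at or below 50 would give a value of z that is zero or negative.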
Hypothesis testing and the legal system
If the value of the test statistic z is far enough above 0, we reject H0 in favor of Ha. To see how large z must be in order to reject H0, we must understand that a hypothesis test rejects a null hypothesis H0 only if there is strong statistical evidence against H0. This is similar to our legal system, which rejects the innocence of the accused only if evidence of guilt is beyond a reasonable doubt. For instance, the network will reject H0: μ ≤ 50 and run the trash bag commercial only if the test statistic z is far enough above 0 to show beyond a reasonable doubt that H0: μ ≤ 50 is false and Ha: μ > 50 is true. A test statistic that is only slightly greater than 0 might not be convincing enough. However, because such a test statistic would result from a sample mean that is slightly greater than 50, it would provide some evidence to support rejecting H0: μ ≤ 50, and it certainly would not provide strong evidence supporting H0: μ ≤ 50. Therefore, if the value of the test statistic is not large enough to convince us to reject H0, we do not say that we accept H0. Rather we say that we do not reject H0 because the evidence against H0 is not strong enough. Again, this is similar to our legal system, where the lack of evidence of guilt beyond a reasonable doubt results in a verdict of not guilty, but does not prove that the accused is innocent.
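The wording of the two possible conclusions can be sketched in a few lines of Python. This section does not yet say how far above 0 the test statistic must be, so the cutoff below is a placeholder of our own and not a value from the text. The function name conclusion is likewise hypothetical. The point of the sketch is that the only outcomes are "reject H0" and "do not reject H0", and that "accept H0" is never one of them.

```python
def conclusion(z, cutoff):
    """Conclusion for a greater-than alternative such as Ha: mu > 50.

    cutoff is the value that z must exceed before H0 is rejected.
    """
    if z > cutoff:
        return "reject H0 in favor of Ha"
    # Weak evidence against H0 is not evidence for H0,
    # so the conclusion is never "accept H0".
    return "do not reject H0"


PLACEHOLDER_CUTOFF = 2.0  # assumed for illustration only

print(conclusion(0.3, PLACEHOLDER_CUTOFF))  # z only slightly above 0
print(conclusion(2.4, PLACEHOLDER_CUTOFF))  # z well above 0
```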