Hypothesis Testing

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution. First, a tentative assumption is made about the parameter or distribution. This assumption is called the null hypothesis and is denoted by H0. An alternative hypothesis (denoted Ha), which is the opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing procedure involves using sample data to determine whether or not H0 can be rejected. If H0 is rejected, the statistical conclusion is that the alternative hypothesis Ha is true.

For example, assume that a radio station selects the music it plays based on the assumption that the average age of its listening audience is 30 years. To determine whether this assumption is valid, a hypothesis test could be conducted with the null hypothesis $H_0\colon \mu = 30$ and the alternative hypothesis $H_a\colon \mu \neq 30$, where $\mu$ is the mean age of the population of listeners. Based on a sample of individuals from the listening audience, the sample mean age, $\bar{x}$, can be computed and used to determine whether there is sufficient statistical evidence to reject H0. Conceptually, a value of the sample mean that is "close" to 30 is consistent with the null hypothesis, while a value of the sample mean that is "not close" to 30 provides support for the alternative hypothesis. What is considered "close" and "not close" is determined by using the sampling distribution of $\bar{x}$.

Ideally, the hypothesis-testing procedure does not reject H0 when H0 is true and rejects H0 when H0 is false. Unfortunately, since hypothesis tests are based on sample information, the possibility of errors must be considered. A type I error corresponds to rejecting H0 when H0 is actually true, and a type II error corresponds to failing to reject (sometimes called accepting) H0 when H0 is false. The probability of making a type I error is denoted by $\alpha$, and the probability of making a type II error is denoted by $\beta$.
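The meaning of $\alpha$ can be checked by simulation. The sketch below is a minimal illustration, not a prescribed method: it assumes normally distributed scores with a known standard deviation and a two-tailed z test at $\alpha = 0.05$. The population values (mean 30, standard deviation 5), the sample size, and the number of trials are arbitrary choices. Every sample is drawn from a population in which H0 is true, so each rejection is a type I error, and the rejection rate should land near $\alpha$.

```python
import random
from statistics import NormalDist, fmean

def two_tailed_z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Return True if a two-tailed z test (known sigma) rejects H0: mu = mu0."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha

def type_i_error_rate(trials=20_000, n=16, mu0=30.0, sigma=5.0, alpha=0.05, seed=1):
    """Estimate P(reject H0 | H0 true); all default values are illustrative assumptions."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw from a population whose mean really is mu0, so H0 is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if two_tailed_z_test_rejects(sample, mu0, sigma, alpha):
            rejections += 1  # every rejection here is a type I error
    return rejections / trials

print(type_i_error_rate())  # should be close to 0.05
```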

In using the hypothesis-testing procedure to determine whether the null hypothesis should be rejected, the person conducting the hypothesis test specifies the maximum allowable probability of making a type I error, called the level of significance for the test. Common choices for the level of significance are $\alpha = 0.05$ and $\alpha = 0.01$. Although most applications of hypothesis testing control the probability of making a type I error, they do not always control the probability of making a type II error. A graph known as an operating-characteristic curve, which plots the probability of not rejecting H0 against the true value of the parameter, can be constructed to show how changes in the sample size affect the probability of making a type II error.
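Points on an operating-characteristic curve can be computed directly in a simple case. This sketch assumes a two-tailed z test of $H_0\colon \mu = \mu_0$ with known $\sigma$. The values $\mu_0 = 30$ and $\sigma = 5$ and the sample sizes are hypothetical. The function returns the probability of not rejecting H0 when the true mean is `mu_true`. That probability equals $\beta$ whenever `mu_true` differs from $\mu_0$, and it equals $1 - \alpha$ when they are the same.

```python
from statistics import NormalDist

def prob_not_rejecting(mu_true, mu0, sigma, n, alpha=0.05):
    """P(fail to reject H0: mu = mu0) for a two-tailed z test when the true mean is mu_true.

    This equals beta (the type II error probability) whenever mu_true != mu0.
    """
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha / 2)          # critical value (about 1.96 for alpha = .05)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)    # mean of the z statistic when mu = mu_true
    # H0 is retained when -c < z < c, and here z ~ Normal(shift, 1)
    return std_normal.cdf(c - shift) - std_normal.cdf(-c - shift)

# Hypothetical setting: mu0 = 30, sigma = 5; one row of OC-curve points per sample size
for n in (16, 64):
    row = [round(prob_not_rejecting(mu, 30.0, 5.0, n), 3) for mu in (30, 31, 32, 33)]
    print(f"n = {n}: {row}")
```

Increasing n from 16 to 64 lowers the probability of a type II error at every true mean other than 30. An operating-characteristic curve displays exactly this effect of sample size.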

A concept known as the p-value provides a convenient basis for drawing conclusions in hypothesis-testing applications. The p-value is the probability, computed assuming the null hypothesis is true, of obtaining sample results at least as extreme as those actually observed; the smaller the p-value, the less likely such results are under H0. If the p-value is less than $\alpha$, the null hypothesis can be rejected; otherwise, the null hypothesis cannot be rejected. The p-value is often called the observed level of significance for the test.
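For a z statistic, the p-value of a two-tailed test is the area under the standard normal curve beyond $\pm z$. This minimal sketch uses Python's standard-library `statistics.NormalDist`. The z values in the loop are illustrative and do not come from the text. The decision rule follows this paragraph's "less than $\alpha$" wording.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z: the p-value of a two-tailed z test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_h0(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value is less than the significance level."""
    return p_value < alpha

for z in (0.5, 1.5, 2.5):  # illustrative test statistics, not taken from the text
    p = two_tailed_p_value(z)
    print(f"z = {z}: p = {p:.3f}, reject H0 at alpha = .05? {reject_h0(p)}")
```

More extreme values of z give smaller p-values, so the sample results count as less likely under H0.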

A hypothesis test can be performed on parameters of one or more populations as well as in a variety of other situations. In each instance, the process begins with the formulation of null and alternative hypotheses about the population. In addition to the population mean, hypothesis-testing procedures are available for population parameters such as proportions, variances, standard deviations, and medians.

Hypothesis tests are also conducted in regression and correlation analysis to determine if the regression relationship and the correlation coefficient are statistically significant. A goodness-of-fit test refers to a hypothesis test in which the null hypothesis is that the population has a specific probability distribution, such as a normal probability distribution. Nonparametric statistical methods also involve a variety of hypothesis-testing procedures.

Now that we have a foundation for what a hypothesis test is, let's look at the steps for carrying one out.

1. The first step in hypothesis testing is to specify the null hypothesis (H0) and the alternative hypothesis (H1, the same as Ha above). If the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another, the null hypothesis would most likely be that there is no difference between methods ($H_0\colon \mu_1 - \mu_2 = 0$). The alternative hypothesis would be $H_1\colon \mu_1 \neq \mu_2$. If the research concerned the correlation between grades and SAT scores, the null hypothesis would most likely be that there is no correlation ($H_0\colon \rho = 0$). The alternative hypothesis would be $H_1\colon \rho \neq 0$.

2. The next step is to select a significance level. Typically the .05 or the .01 level is used.

3. The third step is to calculate a statistic analogous to the parameter specified by the null hypothesis. If the null hypothesis were defined by the parameter $\mu_1 - \mu_2$, then the statistic $M_1 - M_2$ (the difference between the two sample means) would be computed.

4. The fourth step is to calculate the probability value (often called the p value). This is the probability of obtaining a statistic that differs from the parameter specified in the null hypothesis by as much as, or more than, the statistic computed from the data. The calculations are made assuming that the null hypothesis is true.

5. The probability value computed in Step 4 is compared with the significance level chosen in Step 2. If the probability is less than or equal to the significance level, then the null hypothesis is rejected; if the probability is greater than the significance level, then the null hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be "statistically significant"; when the null hypothesis is not rejected, the outcome is said to be "not statistically significant."

6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis. If the rejected null hypothesis were that $\mu_1 - \mu_2 = 0$, then the alternative hypothesis would be that $\mu_1 \neq \mu_2$. If $M_1$ were greater than $M_2$, then the researcher would naturally conclude that $\mu_1 > \mu_2$.

7. The final step is to describe the result and the statistical conclusion in an understandable way. Be sure to present the descriptive statistics as well as whether the effect was significant or not. For example, a significant difference between a group that received a drug and a control group might be described as follows:

Subjects in the drug group scored significantly higher (M = 23) than did subjects in the control group (M = 17), t(18) = 2.4, p = 0.027.
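The reported p value can be reproduced from the t statistic and its degrees of freedom. To stay dependency-free, this sketch integrates the Student's t density numerically with Simpson's rule instead of calling a statistics library. That is one of several ways to get a t-distribution tail area, and the number of integration steps is an arbitrary choice. The sketch assumes a two-tailed test, which matches the $\mu_1 \neq \mu_2$ alternative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_t_p_value(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights 4, 2, 4, ...
    central_area = total * h / 3
    return 1 - 2 * central_area

print(round(two_tailed_t_p_value(2.4, 18), 3))  # the significant example, t(18) = 2.4
print(round(two_tailed_t_p_value(1.4, 18), 3))  # the nonsignificant example, t(18) = 1.4
```

Both results should come out close to the reported p = .027 and to the p = .179 of the nonsignificant example later in this section.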

The statement "t(18) = 2.4" identifies the test statistic from which the probability value (p) was calculated: a t statistic of 2.4 with 18 degrees of freedom. A small minority of researchers might object to two aspects of this wording. First, some believe that the significance level rather than the probability level should be reported. Second, since the alternative hypothesis was stated as $\mu_1 \neq \mu_2$, some might argue that it can only be concluded that the population means differ, and not that the population mean for the drug group is higher than the population mean for the control group.

This second argument is misguided. Intuitively, there are strong reasons for inferring that the direction of the difference in the population is the same as the direction of the difference in the sample. There is also a more formal argument for this inference.

A nonsignificant effect might be described as follows:

Although subjects in the drug group scored higher (M = 23) than did subjects in the control group (M = 20), the difference between means was not significant, t(18) = 1.4, p = .179.

It would not have been correct to say that there was no difference between the performance of the two groups. There was a difference. It is just that the difference was not large enough to rule out chance as an explanation of the difference. It would also have been incorrect to imply that there is no difference in the population. Be sure not to accept the null hypothesis.

The following example further illustrates the application of hypothesis testing.

(1) The first step in hypothesis testing is to specify the null hypothesis and the alternative hypothesis. In testing hypotheses about $\mu$, the null hypothesis is a hypothesized value of $\mu$. Suppose the mean score of all 10-year-old children on an anxiety scale were 7.0. If a researcher were interested in whether 10-year-old children with alcoholic parents had a different mean score on the anxiety scale, then the null and alternative hypotheses would be:

$H_0\colon \mu_{\text{alcoholic}} = 7.0$
$H_1\colon \mu_{\text{alcoholic}} \neq 7.0$

(2) The second step is to choose a significance level. Assume the .05 level is chosen.

(3) The third step is to compute the sample mean. Assume M = 8.1.

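(4) The fourth step is to compute p, the probability (or probability value) of obtaining a difference between M and the hypothesized value of $\mu$ (7.0) as large as or larger than the difference obtained in the experiment. Applying the general formula to this problem,

$$z = \frac{M - \mu}{\sigma_M}$$

The sample size (N) and the population standard deviation ($\sigma$) are needed to calculate $\sigma_M$, the standard error of the mean:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

Assume that N = 16 and $\sigma = 2.0$. Then

$$\sigma_M = \frac{2.0}{\sqrt{16}} = 0.5 \quad\text{and}\quad z = \frac{8.1 - 7.0}{0.5} = 2.2$$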

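A z table can be used to compute the probability value, p = .028. A graph of the sampling distribution of the mean is shown below. The area 8.1 - 7.0 = 1.1 or more units from the mean, in either direction, is shaded. The shaded area is .028 of the total area.

[Figure: sampling distribution of the mean, with the area 1.1 or more units from 7.0 shaded in both tails]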

(5) The probability computed in Step 4 is compared to the significance level stated in Step 2. Since the probability value (.028) is less than the significance level (.05), the effect is statistically significant.

(6) Since the effect is significant, the null hypothesis is rejected. It is concluded that the mean anxiety score of 10-year-old children with alcoholic parents is higher than the population mean.

(7) The results might be described in a report as follows:

The mean score of children of alcoholic parents (M = 8.1) was significantly higher than the population mean ($\mu = 7.0$), z = 2.2, p = .028.
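The arithmetic in steps (4) and (5) can be checked in a few lines. This sketch substitutes Python's standard-library `statistics.NormalDist` for the printed z table. It assumes a two-tailed test, because the alternative hypothesis is $\mu_{\text{alcoholic}} \neq 7.0$.

```python
from statistics import NormalDist

# Values from the worked example: hypothesized mean, sample mean, population SD, sample size
mu0, M, sigma, N = 7.0, 8.1, 2.0, 16

sigma_M = sigma / N ** 0.5                  # standard error of the mean: 2.0 / 4 = 0.5
z = (M - mu0) / sigma_M                     # (8.1 - 7.0) / 0.5 = 2.2
p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed area beyond |z|, in place of a z table

print(f"sigma_M = {sigma_M}, z = {z:.1f}, p = {p:.3f}")  # sigma_M = 0.5, z = 2.2, p = 0.028
```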


SUMMARY OF COMPUTATIONAL STEPS:
1. Specify the null hypothesis and an alternative hypothesis.
2. Compute $M = \sum X / N$.
3. Compute $\sigma_M = \sigma / \sqrt{N}$.
4. Compute $z = (M - \mu) / \sigma_M$, where M is the sample mean and $\mu$ is the hypothesized value of the population mean.
5. Use a z table to determine p from z.
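The summary steps can be collected into a single function. This is a minimal sketch under the assumptions listed for this test: a normal population, independent scores, and a known $\sigma$. The function name `one_sample_z_test` and the scores in the usage line are hypothetical. `NormalDist` stands in for the z table in step 5, and the test is taken to be two-tailed.

```python
from statistics import NormalDist

def one_sample_z_test(scores, mu0, sigma):
    """Hypothetical helper following the summary steps: M, sigma_M, z, then a two-tailed p.

    Assumes independent scores from a normal population with known standard deviation sigma.
    """
    n = len(scores)
    M = sum(scores) / n                       # step 2: M = sum(X) / N
    sigma_M = sigma / n ** 0.5                # step 3: standard error of the mean
    z = (M - mu0) / sigma_M                   # step 4: z = (M - mu) / sigma_M
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # step 5: two-tailed p instead of a z table
    return M, z, p

# Hypothetical scores, tested against a hypothesized mean of 7.0 with sigma = 2.0
M, z, p = one_sample_z_test([6, 9, 8, 10, 7, 9], mu0=7.0, sigma=2.0)
print(f"M = {M:.2f}, z = {z:.2f}, p = {p:.3f}")
```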

ASSUMPTIONS ARE:
1. Normal distribution
2. Scores are independent
3. $\sigma$ is known.

I will attempt to explain, in terms that should make clear sense, exactly what hypothesis tests are used for. Hypothesis tests are procedures for making rational decisions about the reality of effects.

Rational Decisions

Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information.

A rational decision is characterized by the use of a procedure that explicitly incorporates the likelihood, or probability, of success into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision.

One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else?). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information. This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision. Instead, he would have selected the alternative that realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man.

Effects

When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, and the hypothesis-testing procedure is selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with changes in activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs by sex (male or female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects. The effects discussed in the remainder of this text appear as various statistics, including differences between means, contingency tables, and correlation coefficients.

GENERAL PRINCIPLES

All hypothesis tests conform to similar principles and proceed with the same sequence of events.

Hypothesis testing is analogous to the geometrical concept of hypothesis negation (proof by contradiction). That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing, the hypothesis can never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, is accepted as true.

An analogous situation exists with respect to hypothesis testing in statistics. In hypothesis testing, one wishes to show real effects of an experiment. By showing that the experimental results were unlikely, given that there were no effects, one may decide that the effects are, in fact, real. The hypothesis that there were no effects is called the NULL HYPOTHESIS. The symbol H0 is used to abbreviate the Null Hypothesis in statistics. Note that, unlike in geometry, we cannot prove that the effects are real; rather, we may decide that they are real.

For example, suppose a probability model (distribution) described the state of the world if there were no effects, that is, if the null hypothesis were true.

Event A might be considered fairly likely, given that this model was correct. As a result, the model would be retained, along with the NULL HYPOTHESIS. Event B, on the other hand, is unlikely given the model. Here the model would be rejected, along with the NULL HYPOTHESIS.