**Hypothesis Testing**

Hypothesis
testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter
or a population probability distribution. First, a tentative assumption is made
about the parameter or distribution. This assumption is called the null
hypothesis and is denoted by *H*_{0}. An alternative hypothesis
(denoted *H*_{a}), which is the opposite of what is stated in the
null hypothesis, is then defined. The hypothesis-testing procedure involves
using sample data to determine whether or not *H*_{0} can be rejected.

For example,
assume that a radio station selects the music it plays based on the assumption
that the average age of its listening audience is 30 years. To determine
whether this assumption is valid, a hypothesis test could be conducted with the
null hypothesis given as *H*_{0}: µ = 30 and the alternative hypothesis given as *H*_{a}: µ ≠ 30.
Based on a sample of individuals from the listening audience, the sample mean
age, x̄, can be computed and used to determine whether there is sufficient statistical
evidence to reject *H*_{0}.
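As a rough illustration of the radio station example, the sketch below computes the sample mean and a one-sample t statistic. The ages are invented for illustration (the source gives no data), and both the choice of a t statistic and the tabled cutoff of 2.262 (two-tailed, 0.05 level, 9 degrees of freedom) are assumptions rather than part of the original example.

```python
import math
import statistics

# Hypothetical listener ages; the source does not provide any sample data.
ages = [24, 31, 27, 35, 29, 22, 33, 28, 26, 30]

MU_0 = 30.0          # mean age stated by the null hypothesis
T_CRITICAL = 2.262   # tabled two-tailed 0.05 cutoff for t with 9 degrees of freedom

n = len(ages)
x_bar = statistics.mean(ages)        # sample mean age
s = statistics.stdev(ages)           # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)    # estimated standard error of the mean
t_stat = (x_bar - MU_0) / standard_error

# Reject H0 only if the statistic falls beyond the cutoff in either tail.
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"x_bar = {x_bar:.2f}, t({n - 1}) = {t_stat:.2f}, reject H0: {reject_h0}")
```

With these invented ages the sample mean is below 30, but not by enough to reject the null hypothesis.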

Ideally, the
hypothesis-testing procedure leads to the acceptance of *H*_{0}
when *H*_{0} is true and the rejection of *H*_{0}
when *H*_{0} is false. Unfortunately, since hypothesis tests are
based on sample information, the possibility of errors must be considered. A
type I error corresponds to rejecting *H*_{0} when *H*_{0}
is actually true, and a type II error corresponds to accepting *H*_{0}
when *H*_{0} is false. The probability of making a type I error is
denoted by α,
and the probability of making a type II error is denoted by β.

In using the
hypothesis-testing procedure to determine if the null hypothesis should be
rejected, the person conducting the hypothesis test specifies the maximum
allowable probability of making a type I error, called the level of
significance for the test. Common choices for the level of significance are α =
0.05 and α =
0.01. Although most applications of hypothesis testing control the probability
of making a type I error, they do not always control the probability of making
a type II error. A graph known as an operating-characteristic curve can be
constructed to show how changes in the sample size affect the probability of
making a type II error.
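The dependence of the type II error probability on sample size can be sketched numerically. The function below is a minimal illustration for a two-tailed z test with a known population standard deviation. The name `type_ii_error`, the true mean of 32, and the standard deviation of 5 are assumptions made for this sketch, not values from the text.

```python
import math
from statistics import NormalDist

def type_ii_error(mu_0, mu_true, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu_0 in a two-tailed z test
    when the population mean is actually mu_true (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)          # two-tailed cutoff
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))   # true effect in standard-error units
    # H0 is retained when the z statistic lands between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: H0 says the mean is 30, the true mean is 32, sigma = 5.
for n in (10, 25, 50, 100):
    beta = type_ii_error(mu_0=30, mu_true=32, sigma=5, n=n)
    print(f"n = {n:3d}  beta = {beta:.3f}")
```

The probability of a type I error stays at the chosen level of significance for every sample size, while the printed type II error probabilities shrink as the sample grows. Plotting them against the sample size would give one form of the curve described above.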

A concept
known as the *p*-value provides a convenient basis for drawing conclusions
in hypothesis-testing applications. The *p*-value is a measure of how
likely the sample results are, assuming the null hypothesis is true; the
smaller the *p*-value, the less likely the sample results. If the *p*-value
is less than α,
the null hypothesis can be rejected; otherwise, the null hypothesis cannot be
rejected. The *p*-value is often called the observed level of significance
for the test.

A hypothesis
test can be performed on parameters of one or more populations as well as in a
variety of other situations. In each instance, the process begins with the
formulation of null and alternative hypotheses about the population. In
addition to the population mean, hypothesis-testing procedures are available
for population parameters such as proportions, variances, standard deviations,
and medians.

Hypothesis
tests are also conducted in regression and correlation analysis to determine if
the regression relationship and the correlation coefficient are statistically
significant. A goodness-of-fit test refers to a hypothesis test in which the
null hypothesis is that the population has a specific probability distribution,
such as a normal probability distribution. Nonparametric statistical methods
also involve a variety of hypothesis-testing procedures.

Now that we have the foundation for what a hypothesis test is, let's look at the method by which it is carried out.

1. The first step in hypothesis testing is to specify the null hypothesis (H_{0}) and the alternative hypothesis (H_{1}). If the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another, the null hypothesis would most likely be that there is no difference between methods (H_{0}: µ_{1} - µ_{2} = 0). The alternative hypothesis would be H_{1}: µ_{1} ≠ µ_{2}. If the research concerned the correlation between grades and SAT scores, the null hypothesis would most likely be that there is no correlation (H_{0}: ρ = 0). The alternative hypothesis would be H_{1}: ρ ≠ 0.

2. The next step is to select a significance level. Typically the .05 or the .01 level is used.

3. The third step is to calculate a statistic analogous
to the parameter specified by the null hypothesis. If the null hypothesis were
defined by the parameter µ_{1}- µ_{2}, then the statistic M_{1}
- M_{2} would be computed.

4. The fourth step is to calculate the probability value (often called the p value) which is the probability of obtaining a statistic as different or more different from the parameter specified in the null hypothesis as the statistic computed from the data. The calculations are made assuming that the null hypothesis is true.

5. The probability value computed in Step 4 is compared with the significance level chosen in Step 2. If the probability is less than or equal to the significance level, then the null hypothesis is rejected; if the probability is greater than the significance level, then the null hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be "statistically significant"; when the null hypothesis is not rejected, the outcome is said to be "not statistically significant."

6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis. If the rejected null hypothesis were that µ_{1} - µ_{2} = 0, then the alternative hypothesis would be that µ_{1} ≠ µ_{2}. If M_{1} were greater than M_{2}, then the researcher would naturally conclude that µ_{1} > µ_{2}.

7. The final step is to describe the result and the statistical conclusion in an understandable way. Be sure to present the descriptive statistics as well as whether the effect was significant or not. For example, a significant difference between a group that received a drug and a control group might be described as follows:

Subjects in the drug group scored significantly higher (M = 23) than did subjects in the control group (M = 17), t(18) = 2.4, p = 0.027.

The statement that "t(18) = 2.4" has to do with how the probability value (p) was calculated: it gives the value of the t statistic and, in parentheses, its degrees of freedom. A small minority of researchers might object to two aspects of this wording. First, some believe that the significance level rather than the probability level should be reported. Second, since the alternative hypothesis was stated as µ_{1} ≠ µ_{2}, some might argue that it can only be concluded that the population means differ and not that the population mean for the drug group is higher than the population mean for the control group.

This argument is misguided. Intuitively, there are strong reasons for inferring that the direction of the difference in the population is the same as the difference in the sample. There is also a more formal argument.

A nonsignificant effect might be described as follows:

Although subjects in the drug group scored higher (M = 23) than did subjects in the control group (M = 20), the difference between means was not significant, t(18) = 1.4, p = .179.

It would not have been correct to say that there was no difference between the performance of the two groups. There was a difference. It is just that the difference was not large enough to rule out chance as an explanation of the difference. It would also have been incorrect to imply that there is no difference in the population. Be sure not to accept the null hypothesis.
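To make the "t(18)" in the reports above concrete, here is a minimal sketch of an independent-groups t test with pooled variance. This is an assumption about the kind of test behind the reports, since the source does not say how t was computed. The scores are invented so that each group has 10 subjects and the group means come out to 23 and 17; they are not the data behind the reports, so the resulting t differs from 2.4. The cutoff 2.101 is the tabled two-tailed .05 value of t for 18 degrees of freedom.

```python
import math
import statistics

def pooled_t(group_1, group_2):
    """Return (t, df) for an independent-groups t test with pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2                      # degrees of freedom reported as t(df)
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n2 - 1) * statistics.variance(group_2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / std_error
    return t, df

# Hypothetical scores, 10 subjects per group.
drug = [25, 19, 28, 22, 30, 18, 24, 21, 27, 16]
control = [17, 12, 20, 15, 23, 11, 18, 14, 22, 18]

t, df = pooled_t(drug, control)
T_CRITICAL = 2.101  # tabled two-tailed .05 cutoff for t with 18 degrees of freedom

if abs(t) > T_CRITICAL:
    verdict = "significant"
else:
    # Failing to reject is not evidence that the population means are equal.
    verdict = "not significant"

print(f"M1 = {statistics.mean(drug)}, M2 = {statistics.mean(control)}, "
      f"t({df}) = {t:.2f}, difference {verdict} at the .05 level")
```

The degrees of freedom are the two group sizes added together minus two, which is where the 18 comes from when each group has 10 subjects.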

The following example further illustrates the application of hypothesis testing.

(1) The first step in hypothesis testing is to specify the null hypothesis and the alternative hypothesis. In testing hypotheses about µ, the null hypothesis is a hypothesized value of µ. Suppose the mean score of all 10-year-old children on an anxiety scale were 7.0. If a researcher were interested in whether 10-year-old children with alcoholic parents had a different mean score on the anxiety scale, then the null and alternative hypotheses would be:

H_{0}: µ_{alcoholic} = 7.0

H_{1}: µ_{alcoholic} ≠ 7.0

(2) The second step is to choose a significance level. Assume the .05 level is chosen.

(3) The third step is to compute the mean. Assume M = 8.1.

(4) The fourth step is to compute p, the probability (or probability value) of obtaining a difference between M and the hypothesized value of µ (7.0) as large or larger than the difference obtained in the experiment. Applying the general formula to this problem,

z = (M - µ) / σ_{M}

The sample size (N) and the population standard deviation (σ) are needed to calculate σ_{M}.

Assume that N = 16 and σ = 2.0. Then σ_{M} = σ/√N = 2.0/√16 = 0.5, and z = (8.1 - 7.0)/0.5 = 2.2.

A z table can be used to compute the probability value, p = .028. A graph of
the sampling distribution of the mean is shown below. The area 8.1 - 7.0 = 1.1
or more units from the mean is shaded. The shaded area is .028 of the total
area.
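As a check on the tabled value, the probability can be written out for the z = 2.2 reported in this example. The calculation assumes the two-tailed reading implied by a research question about a "different" mean, so both tails are counted. The symbol Φ, the standard normal cumulative distribution function, is notation introduced here and is not used in the original.

```latex
p = P(|Z| \ge 2.2) = 2\,[1 - \Phi(2.2)] \approx 2\,(1 - 0.9861) = 0.0278 \approx .028
```

Each tail contributes about .0139, which is why the two shaded regions together make up about .028 of the total area.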

(5) The probability computed in Step 4 is compared to the significance level stated in Step 2. Since the probability value (.028) is less than the significance level (.05) the effect is statistically significant.

(6) Since the effect is significant, the null hypothesis is rejected. It is concluded that the mean anxiety score of 10-year-old children with alcoholic parents is higher than the population mean.

(7) The results might be described in a report as follows:

The mean score of children of alcoholic parents (M = 8.1) was significantly higher than the population mean (µ = 7.0), z = 2.2, p = .028.

**SUMMARY OF COMPUTATIONAL STEPS:**

1. Specify the null hypothesis and an alternative hypothesis.

2. Compute M = ΣX/N.

3. Compute σ_{M} = σ/√N.

4. Compute z = (M - µ)/σ_{M}, where M is the sample mean and µ is the hypothesized value of the population mean.

5. Use a z table to determine p from z.
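The computational steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_test` is hypothetical, the normal distribution from the standard library stands in for the z table, and the sixteen scores are invented so that N = 16 and M = 8.1 as in the anxiety-scale example.

```python
import math
from statistics import NormalDist

def z_test(scores, mu_0, sigma):
    """Two-tailed z test of H0: mu = mu_0, with the population sigma known."""
    n = len(scores)
    m = sum(scores) / n                       # step 2: M = sum of X / N
    sigma_m = sigma / math.sqrt(n)            # step 3: standard error of the mean
    z = (m - mu_0) / sigma_m                  # step 4: z statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # step 5: two-tailed p, replacing the z table
    return m, sigma_m, z, p

# Hypothetical scores, invented so that N = 16 and M = 8.1 as in the example.
scores = [6.0, 7.5, 9.0, 8.5, 10.0, 7.0, 8.0, 9.5,
          6.5, 8.1, 8.9, 7.6, 9.4, 7.2, 8.3, 8.1]

# Step 1 is carried by the arguments: H0 says mu = 7.0, and the test is two-tailed.
m, sigma_m, z, p = z_test(scores, mu_0=7.0, sigma=2.0)
print(f"M = {m:.1f}, sigma_M = {sigma_m}, z = {z:.1f}, p = {p:.3f}")
```

With these inputs the printed values agree with the worked example (z = 2.2, p = .028).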

**ASSUMPTIONS ARE:**

1. Normal distribution

2. Scores are independent

3. σ is known.

I will attempt to explain, in plain terms, exactly what hypothesis tests are used for. Hypothesis tests are procedures for making rational decisions about the reality of effects.

Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information.

A rational decision is characterized by the use of a procedure that ensures that the likelihood or probability of success is incorporated into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision.

One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information. This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision. Instead, he would have selected the alternative which realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man.

When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, with the hypothesis-testing procedure selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs by sex (Male or Female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects. The effects discussed in the remainder of this text appear as various statistics, including differences between means, contingency tables, and correlation coefficients.

All hypothesis tests conform to similar principles and proceed with the same sequence of events.

- A model of the world is created in which there are no effects. The experiment is then imagined to be repeated an infinite number of times under this model, which gives the distribution of results to expect when there are no effects.
- The results of the experiment are compared with the model from the first step. If, given the model, the results are unlikely, then the model is rejected and the effects are accepted as real. If the results could be explained by the model, the model must be retained. In the latter case no decision can be made about the reality of effects.
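The two events above can be imitated by simulation, with a large but finite number of repetitions standing in for the infinite number. The sketch below is only an illustration under stated assumptions: the no-effect model is taken to be a normal population, the function name `simulated_p_value` and the 10,000 repetitions are choices made here, and the numbers (M = 8.1, µ = 7.0, a standard deviation of 2.0, N = 16) are borrowed from the anxiety-scale example.

```python
import random

def simulated_p_value(observed_mean, mu_0, sigma, n, repetitions=10_000, seed=0):
    """Estimate a two-tailed p value by repeating the experiment under a
    no-effect model: scores drawn from a normal(mu_0, sigma) population."""
    rng = random.Random(seed)
    observed_diff = abs(observed_mean - mu_0)
    as_extreme = 0
    for _ in range(repetitions):
        # One simulated experiment under the model of no effects.
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if abs(sample_mean - mu_0) >= observed_diff:
            as_extreme += 1
    # Share of simulated experiments at least as far from mu_0 as the real one.
    return as_extreme / repetitions

p = simulated_p_value(observed_mean=8.1, mu_0=7.0, sigma=2.0, n=16)
model_rejected = p <= 0.05   # unlikely under the model, so the model is rejected
print(f"simulated p = {p:.3f}, model rejected: {model_rejected}")
```

The simulated proportion should land close to the value obtained from a z table, and it comes closer as the number of repetitions increases.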

Hypothesis testing is equivalent to the geometrical concept of hypothesis negation. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, must be true.

An analogous situation exists with
respect to hypothesis testing in statistics. In hypothesis testing one wishes
to show real effects of an experiment. By showing that the experimental results
were unlikely, given that there were no effects, one may decide that the
effects are, in fact, real. The hypothesis that there were no effects is called
the **NULL HYPOTHESIS**. The symbol H_{0} is used to abbreviate the
Null Hypothesis in statistics. Note that, unlike in geometry, we cannot prove that the
effects are real; rather, we may decide that the effects are real.

For example, suppose the following probability model (distribution) described the state of the world. In this case the decision would be that there were no effects; the null hypothesis is true.

Event A might be considered fairly likely, given the above model was correct. As a result the model would be retained, along with the NULL HYPOTHESIS. Event B on the other hand is unlikely, given the model. Here the model would be rejected, along with the NULL HYPOTHESIS.