The Central Limit Theorem is a statement about the characteristics of the sampling distribution of means of random samples from a given population. That is, it describes the characteristics of the distribution of values we would obtain if we were able to draw an infinite number of random samples of a given size from a given population and we calculated the mean of each sample.

The Central Limit Theorem consists of three statements:

[1] The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.

[2] The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn divided by the size of the samples.

[3] If the original population is normally distributed (i.e., it is bell shaped), the sampling distribution of means will also be normal. If the original population is not normally distributed, the sampling distribution of means will increasingly approximate a normal distribution as the sample size increases (that is, as increasingly large samples are drawn).
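The three statements can be checked with a short simulation. This is a minimal sketch, assuming a hypothetical exponential population with mean 2 and variance 4 (deliberately not normal), samples of size n = 25, and 10,000 repetitions standing in for the "infinite number of random samples"; none of these values come from the text, and the helper name `sample_means` is illustrative.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Draw `reps` random samples of size n and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical, clearly non-normal population: exponential with mean 2, variance 4.
POP_MEAN, POP_VAR = 2.0, 4.0
n = 25
means = sample_means(lambda rng: rng.expovariate(1 / POP_MEAN), n, reps=10_000)

# Statement [1]: the mean of the sample means is close to the population mean.
print("mean of sample means:", round(statistics.fmean(means), 3), "vs", POP_MEAN)
# Statement [2]: their variance is close to the population variance divided by n.
print("variance of sample means:", round(statistics.pvariance(means), 4), "vs", POP_VAR / n)
# Statement [3]: about 68% of a normal distribution lies within 1 SD of its mean.
m, sd = statistics.fmean(means), statistics.pstdev(means)
print("fraction within 1 SD:", round(sum(abs(x - m) <= sd for x in means) / len(means), 3))
```

Increasing `n` in this sketch should push the last figure closer to the normal-curve value of about .68.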

     The accompanying figure illustrates the statements of the central limit theorem both algebraically and graphically.

[Figure: Central Limit Theorem]

 

   The central limit theorem is one of the most remarkable results of the theory of probability. In its simplest form, the theorem states that the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution. Moreover, the approximation steadily improves as the number of observations increases. The theorem is considered the heart of probability theory, although a better name would be the normal convergence theorem.
     For example, suppose an ordinary coin is tossed 100 times and the number of heads is counted. This is equivalent to scoring 1 for a head and 0 for a tail and computing the total score. Thus, the total number of heads is the sum of 100 independent, identically distributed random variables. By the central limit theorem, the distribution of the total number of heads will be, to a very high degree of approximation, normal. This can be illustrated graphically by repeating the experiment many times and displaying the results in a diagram, with the total score (the number of heads) along the horizontal axis and the percentage of experiments giving each score along the vertical axis. After a large number of repetitions, a curve appears that looks like the normal curve.
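The coin experiment can be simulated directly. This sketch makes assumptions that are not in the text: 10,000 repetitions, a fixed random seed, and a text histogram in place of the diagram. For comparison it prints the percentage predicted by a normal curve with mean 100 × 0.5 = 50 and standard deviation √(100 × 0.5 × 0.5) = 5.

```python
import random
import statistics
from collections import Counter

def count_heads(rng, tosses=100):
    # Score 1 for a head and 0 for a tail, then total the scores.
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(42)
experiments = 10_000
results = [count_heads(rng) for _ in range(experiments)]
freq = Counter(results)

normal = statistics.NormalDist(mu=50, sigma=5)
print("heads  observed%  normal%")
for heads in range(38, 63):
    observed = 100 * freq[heads] / experiments
    expected = 100 * normal.pdf(heads)  # density at an integer approximates its probability
    print(f"{heads:5d}  {observed:9.2f}  {expected:7.2f}  {'#' * round(observed * 4)}")
```

The bars of `#` trace out the bell shape, and the observed and normal percentages agree closely near the center.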
     It has been empirically observed that various natural phenomena, such as the height of individuals, follow approximately a normal distribution. A suggested explanation is that these phenomena are sums of a large number of independent random effects and hence are approximately normally distributed by the central limit theorem.

 

 

 

[Figure: Approximation of the binomial distribution by the normal distribution]

[Figure: Histogram illustrating the Central Limit Theorem]

 

N.B.: The statements of the Central Limit Theorem above assume that each sample is drawn at random from the population.

     The following examples apply the Central Limit Theorem.

Examples.

1. If weights are normally distributed with mean μ = 145 and standard deviation σ = 30, what is the probability that the mean x̄ of a sample of twelve weights is between 150 and 175?

Solution.

Because the weights are normally distributed, x̄ is normally distributed with mean 145 and standard deviation σ/√n = 30/√12 ≈ 8.66. Therefore we form z = (150 − 145)/(30/√12) = .58 and z = (175 − 145)/(30/√12) = 3.46. From the normal table, the area to the left of 3.46 is .9997 and the area to the left of .58 is .7190; hence the area between those two z-scores is .9997 − .7190 = .2807. This is the probability that x̄ is between 150 and 175.
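The same calculation can be done with `statistics.NormalDist` from the Python standard library (3.8 or later) instead of a printed table. This sketch computes the probability twice: once from the two-decimal z values used in the solution, and once directly from the sampling distribution of x̄ without rounding z.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 145, 30, 12
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of the mean

# Table lookup with z rounded to two decimals, as in the worked solution
table_answer = NormalDist().cdf(3.46) - NormalDist().cdf(0.58)
# Same probability computed without rounding z
exact_answer = xbar_dist.cdf(175) - xbar_dist.cdf(150)

print(f"rounded z:   {table_answer:.4f}")
print(f"unrounded z: {exact_answer:.4f}")
```

The first line reproduces .2807; the unrounded calculation gives about .2816, a small difference caused by rounding z to two decimals.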

2. The National Institute for Occupational Safety and Health (NIOSH) recently completed a study to evaluate the level of exposure of workers to the chemical dioxin, 2,3,7,8-TCDD. The distribution of TCDD levels in parts per trillion (ppt) of production workers at a Newark, New Jersey, chemical plant had a mean of 293 ppt and a standard deviation of 847 ppt (Chemosphere, Vol. 20, 1990). In a random sample of n = 50 workers selected at the New Jersey plant, let ȳ represent the sample mean TCDD level. Find the mean and standard deviation of the sampling distribution of ȳ.

Solution.

E(ȳ) = μ = 293 ppt

SD(ȳ) = σ/√n = 847/√50 ≈ 119.78 ppt

3. An article in Industrial Engineering (Aug. 1990) discussed the importance of modeling machine downtime correctly in simulation studies. As an illustration, the researcher considered a single-machine-tool system with repair times (in minutes) that can be modeled by a gamma distribution with parameters α = 1 and β = 60. Of interest is the mean repair time, ȳ, of a sample of 100 machine breakdowns. Find E(ȳ) and Var(ȳ). Also, what probability distribution provides the best model of the sampling distribution of ȳ?

Solution.

E(ȳ) = μ = αβ = (1)(60) = 60 minutes

Var(ȳ) = σ²/n = αβ²/n = (1)(60)²/100 = 36

By the Central Limit Theorem, the sampling distribution of ȳ is approximately normal, because the sample size n = 100 is large.
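A simulation can be used to check these answers. The sketch assumes Python's `random.gammavariate`, whose second argument is a scale parameter, so `gammavariate(1, 60)` has mean αβ = 60 and variance αβ² = 3600, matching the parameterization used in the solution; the seed and the 10,000 repetitions are arbitrary choices.

```python
import random
import statistics

ALPHA, BETA, N = 1, 60, 100  # gamma shape, gamma scale, breakdowns per sample

def mean_repair_time(rng):
    # gammavariate(alpha, beta) treats beta as the scale, so its mean is alpha * beta
    return statistics.fmean(rng.gammavariate(ALPHA, BETA) for _ in range(N))

rng = random.Random(1)
ybars = [mean_repair_time(rng) for _ in range(10_000)]

print("E(ybar) approx:", round(statistics.fmean(ybars), 2), "theory:", ALPHA * BETA)
print("Var(ybar) approx:", round(statistics.pvariance(ybars), 2), "theory:", ALPHA * BETA**2 / N)
```

Although a gamma distribution with α = 1 is strongly skewed, a histogram of `ybars` would look close to a normal curve centered at 60.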

 

Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. Examples of normal distributions are shown below. Notice that they differ in how spread out they are. The area under each curve is the same. The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).

[Figure: Normal Distribution Curves]

     The height (ordinate) of a normal curve is defined as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

where μ is the mean, σ is the standard deviation, π is the constant 3.14159, and e is the base of natural logarithms, equal to 2.718282. The variable x can take on any value from −∞ to +∞.

f(x) is very close to 0 if x is more than three standard deviations from the mean (for the standard normal distribution, if x is less than −3 or greater than +3).
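The density formula translates directly into code. This sketch writes it out with Python's `math` module as f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)); the name `normal_pdf` is illustrative (the standard library's `statistics.NormalDist(mu, sigma).pdf(x)` computes the same value). Evaluating the standard normal curve at a few points shows how quickly the height falls off beyond three standard deviations.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Standard normal curve (mu = 0, sigma = 1) at 0, 1, 2, 3 and 4 standard deviations
for x in [0, 1, 2, 3, 4]:
    print(x, round(normal_pdf(x, 0, 1), 5))
```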

      The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normal distributions can be transformed to standard normal distributions by the formula:

$$z = \frac{X - \mu}{\sigma}$$

where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution. The standard normal distribution is sometimes called the z distribution. A z score always reflects how many standard deviations a particular score lies above or below the mean. For instance, if a person scored 70 on a test with a mean of 50 and a standard deviation of 10, then they scored 2 standard deviations above the mean. Converting the test scores to z scores, an X of 70 would be:

$$z = \frac{70 - 50}{10} = 2$$

So, a z score of 2 means the original score was 2 standard deviations above the mean. Note that the z distribution will only be a normal distribution if the original distribution (X) is normal.

Applying the formula will always produce a transformed variable with a mean of zero and a standard deviation of one. However, the shape of the distribution will not be affected by the transformation. If X is not normal then the transformed distribution will not be normal either. One important use of the standard normal distribution is for converting between scores from a normal distribution and percentile ranks.
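A quick numerical check of this paragraph: standardizing a set of scores gives a mean of 0 and a standard deviation of 1 but leaves the shape, measured here by skewness, unchanged. The ten scores are hypothetical and deliberately right-skewed, the helper names are illustrative, and the sketch uses the mean and population standard deviation of the scores themselves as μ and σ.

```python
import statistics

def standardize(scores):
    """Convert raw scores to z scores using the scores' own mean and population SD."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]

def skewness(values):
    """Population skewness: the average cubed z score (0 for a symmetric distribution)."""
    return statistics.fmean(z ** 3 for z in standardize(values))

# Hypothetical right-skewed scores (not from the text)
raw = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]
z = standardize(raw)

print(statistics.fmean(z), statistics.pstdev(z))  # mean about 0, SD about 1
print(round(skewness(raw), 3), round(skewness(z), 3))  # same positive skewness
```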

Areas under portions of the standard normal distribution are shown below. About .68 (.34 + .34) of the distribution is between −1 and 1, while about .95 of the distribution is between −2 and 2.
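These areas can be computed rather than read from a figure. A minimal sketch with `statistics.NormalDist` (Python 3.8 or later), extended to ±3 for comparison:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and k is the cumulative area up to k minus the area up to -k
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"area between -{k} and {k}: {area:.4f}")
```

It prints 0.6827, 0.9545 and 0.9973, so about 95% of the standard normal distribution lies between −2 and 2.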


One reason the normal distribution is important is that many psychological and educational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close. A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normally distributed. Some tests work well even with very wide deviations from normality. Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles.

     If the mean and standard deviation of a normal distribution are known, it is relatively easy to figure out the percentile rank of a person obtaining a specific score. To be more concrete, assume a test in Introductory Psychology is normally distributed with a mean of 80 and a standard deviation of 5. What is the percentile rank of a person who received a score of 70 on the test? Mathematical statisticians have developed ways of determining the proportion of a distribution that is below a given number of standard deviations from the mean. They have shown that only 2.3% of the population will be less than or equal to a score two standard deviations below the mean. In terms of the Introductory Psychology test example, this means that a person scoring 70 would be in the 2.3rd percentile.

This graph shows the distribution of scores on the test. The shaded area is 2.3% of the total area. The proportion of the area below 70 is equal to the proportion of the scores below 70.

     Similarly, the proportion of the area below 75 is the same as the proportion of scores below 75.

Mathematical statisticians have determined that 15.9% of the scores in a normal distribution are lower than a score 1 standard deviation below the mean. Since 75 is 1 standard deviation below the mean, the proportion of the scores below 75 is .159. Therefore, a person scoring 75 would have a percentile rank score of 15.9.
The table shown below gives the proportion of the scores below various values of z, where z = (X − μ)/σ is the number of standard deviations the score X lies above the mean μ, and σ is the standard deviation. When z is negative, X is below the mean. Thus, a z of −2 means that X is −2 standard deviations above the mean, which is the same thing as being 2 standard deviations below the mean. To take another example, what is the percentile rank of a person receiving a score of 90 on the test? The graph shows that most people scored below 90. Since 90 is 2 standard deviations above the mean [z = (90 − 80)/5 = 2], the table shows that a z score of 2 is equivalent to the 97.7th percentile: the proportion of people scoring below 90 is thus .977.
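The percentile ranks in this section can be reproduced with `statistics.NormalDist` from the Python standard library (3.8 or later) in place of a printed z table. The sketch assumes the test distribution from the text (mean 80, standard deviation 5) and also prints a few rows of a table of proportions below z, using an arbitrary choice of z values.

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=5)  # Introductory Psychology test from the text

for x in (70, 75, 90):
    z = (x - scores.mean) / scores.stdev
    print(f"score {x}: z = {z:+.1f}, percentile rank = {100 * scores.cdf(x):.1f}")

# A few rows of a table of proportions below z
std_normal = NormalDist()
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}: {std_normal.cdf(z):.3f}")
```

The scores 70, 75 and 90 come out at percentile ranks 2.3, 15.9 and 97.7, matching the values derived above.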


What score on the Introductory Psychology test would it have taken to be in the 75th percentile?
The answer is computed by reversing the steps in the previous problems. First, determine how many standard deviations above the mean one would have to be to be in the 75th percentile. This can be found by using a z table and finding the z associated with .75. The value of z is .674. Thus, one must be .674 standard deviations above the mean to be in the 75th percentile. Since the standard deviation is 5, one must be (5)(.674) = 3.37 points above the mean. Since the mean is 80, a score of 80 + 3.37 = 83.37 is necessary. Rounding off, a score of 83 is needed to be in the 75th percentile. Since z = (X − μ)/σ, a little algebra demonstrates that X = μ + zσ. For the present example, X = 80 + (.674)(5) = 83.37, as just shown.
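Reversing the lookup uses the inverse of the cumulative distribution. In this sketch, `NormalDist.inv_cdf` plays the role of finding .75 in the body of a z table; the distribution parameters are the ones from the text.

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=5)
z_75 = NormalDist().inv_cdf(0.75)          # z with 75% of the area below it
x_75 = scores.mean + z_75 * scores.stdev   # X = mu + z * sigma
print(round(z_75, 3), round(x_75, 2))
print(round(scores.inv_cdf(0.75), 2))      # same answer in one step
```

Both lines give 83.37 for the score at the 75th percentile.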


Examples.

  1. If a test is normally distributed with a mean of 60 and a standard deviation of 10, what proportion of the scores are above 85?

Solution.

z = (x – μ)/σ = (85-60)/10 = 2.5

Using a z table, we find that .9938 of the scores are less than or equal to a score 2.5 standard deviations above the mean. It follows that only 1 − .9938 = .0062 of the scores are above a score 2.5 standard deviations above the mean. Therefore, only .0062 of the scores are above 85.

 

  2. Suppose y is a normally distributed random variable with mean 10 and standard deviation 2.1. Find P(y ≥ 11).

Solution.

z = (y − μ)/σ = (11 − 10)/2.1 = .48

From the z table, the area between the mean and z = .48 is .1844; therefore P(y ≥ 11) = .5 − .1844 = .3156.
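Both examples can be checked without a table using `statistics.NormalDist`. This is a sketch; because it does not round z to two decimals, the second answer comes out near .317 rather than the table-based .3156.

```python
from statistics import NormalDist

# Example 1: test scores with mean 60 and standard deviation 10; proportion above 85
test = NormalDist(mu=60, sigma=10)
above_85 = 1 - test.cdf(85)

# Example 2: y normal with mean 10 and standard deviation 2.1; P(y >= 11)
y = NormalDist(mu=10, sigma=2.1)
p_y_ge_11 = 1 - y.cdf(11)  # P(y >= 11) equals P(y > 11) for a continuous variable

print(f"{above_85:.4f}")
print(f"{p_y_ge_11:.4f}")
```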


References:

Mendenhall, William, and Terry Sincich. Statistics for Engineering and the Sciences. 4th ed. New Jersey: Prentice-Hall, 1995.

Berrie's Statistics Page. “The Central Limit Theorem.” <http://huizen.dds.nl/~berrie/clt.html>. 20 Jan. 2001.

Hoffman, Howard S. “The Internet Glossary of Statistical Terms.” <http://www.animatedsoftware.com/statglos/sgcltheo.htm>. 20 Jan. 2001.

Campbell, John. “Lecture Summaries for 800 – 072: The Central Limit Theorem.” <http://spider.cns.uni.edu/~campbell/stat/clt.html>. 20 Jan. 2001.