The Central Limit Theorem states that if a random sample of n observations is drawn from a population with finite mean μ and variance σ², then, when n is sufficiently large, the sampling distribution of the sample mean ȳ can be approximated by a normal density function. For example, let y1, y2, …, yn be a random sample from any unknown population (i.e., the distribution representing it is unknown) with population mean μ and variance σ². Then for large n (n > 30), the probability distribution of ȳ (equivalently, the sampling distribution or relative frequency distribution of ȳ) is approximately normal, with mean equal to the population mean, E(ȳ) = μ; variance equal to the population variance divided by n, σ²/n; and standard deviation equal to the population standard deviation divided by the square root of n, σ/√n.
The importance of the central limit theorem is that we can use the normal distribution to approximate the sampling distribution of the sample mean ȳ as long as the population possesses a finite mean and variance and the number n of measurements in the sample is sufficiently large. How large the sample size must be depends on the nature of the sampled population. A related result: the sampling distribution of any linear function of jointly normally distributed random variables, even ones that are correlated and have different means and variances, is a normal distribution.
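A short simulation can make the theorem concrete. The sketch below is an illustration under assumptions, not part of the original text: it takes a deliberately skewed exponential population with μ = σ = 2, draws 5,000 samples of size n = 40 with Python's standard `random` module, and checks that the sample means center on μ, have a standard deviation close to σ/√n, and put roughly 68% of their values within one standard error of μ, as the normal approximation predicts.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from n draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

def draw_exponential(rng):
    """Skewed parent population (an assumption): exponential with mu = sigma = 2."""
    return rng.expovariate(0.5)

n = 40
means = sample_means(draw_exponential, n, trials=5000)
se = 2 / n ** 0.5                      # sigma / sqrt(n), about 0.316
within = sum(abs(m - 2) <= se for m in means) / len(means)

print(f"mean of y-bar: {statistics.fmean(means):.3f} (mu = 2)")
print(f"sd of y-bar:   {statistics.stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within one standard error: {within:.2f} (normal: about 0.68)")
```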
Weak Law of Large Numbers:
If x1, x2, ... are independent, identically distributed random quantities with expected value E[xi] = μ, then the mean of the first N quantities converges in probability to μ as N grows without bound.
Strong Law of Large Numbers:
Under the same hypothesis, the mean of the first N quantities converges almost surely to μ as N grows without bound.
The Law of Large Numbers is a fundamental law in probability theory and statistics stating that if an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the number of repetitions converges towards p as the number of repetitions becomes large. The law also provides the basis for evaluating the power of statistical tests, including the significance of quantities of information.
The Law of Large Numbers says that in repeated, independent trials with the same probability p of success in each trial, the chance that the percentage of successes differs from the probability p by more than a fixed positive amount, ε > 0, converges to zero as the number of trials n goes to infinity, for every positive ε. Note two things:
The difference between the number of successes and the number of trials times the chance of success in each trial (the expected number of successes) tends to grow as the number of trials increases. (In fact, this difference tends to grow like the square root of the number of trials.)
Although the chance of a large difference between the percentage of successes and the chance of success gets smaller and smaller as n grows, nothing prevents the difference from being large in some sequences of trials. The assumption that this difference always tends to zero, as opposed to this difference having a large probability of being arbitrarily close to zero, is the difference between the Law of Large Numbers, which is a mathematical theorem, and the Empirical Law of Averages, which is an assumption about how the world works that lies at the base of the Frequency Theory of probability. The distribution of the number of successes in n independent trials with probability p of success in each trial is Binomial, with parameters n and p.
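The two notes above can be seen side by side in a small simulation. This is a sketch assuming fair-coin trials (p = 0.5) and an arbitrary random seed: the proportion of successes settles near p, while the raw gap between the number of successes and n·p typically grows on the order of √(n·p·(1 − p)) rather than shrinking.

```python
import random

def proportion_and_gap(p, n_trials, checkpoints, seed=1):
    """Run n_trials independent Bernoulli(p) trials; at each checkpoint n record
    (n, proportion of successes, successes - n*p)."""
    rng = random.Random(seed)
    successes = 0
    report = []
    for n in range(1, n_trials + 1):
        successes += rng.random() < p      # True counts as 1
        if n in checkpoints:
            report.append((n, successes / n, successes - n * p))
    return report

for n, prop, gap in proportion_and_gap(0.5, 200_000, {100, 10_000, 200_000}):
    # proportion -> p, while |gap| is typically of order sqrt(n * p * (1 - p))
    print(f"n={n:>7}  proportion={prop:.4f}  successes - n*p = {gap:+.1f}")
```

The gap at any single checkpoint is random, so an individual run may show it shrinking; it is its typical size that grows with n.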
The Exponential Probability Distribution is a gamma density function with α = 1:
$$f(y) = \frac{1}{\beta}\, e^{-y/\beta}, \qquad 0 < y < \infty$$
with mean and variance
$$\mu = \beta, \qquad \sigma^2 = \beta^2$$
Example:
From past experience, a manufacturer knows that the relative frequency distribution of the length of time (in months) between major customer product complaints can be modeled by a gamma density function with α = 2 and β = 4. Fifteen months after the manufacturer tightened its quality control requirements, the first complaint arrived. Does this suggest that the mean time between major customer complaints may have increased?
Solution:
We want to determine whether the observed value of y = 15 months, or some larger value of y, would be improbable if, in fact, α = 2 and β = 4. We do not give a table of areas under the gamma density function in this text, but we can obtain some idea of the magnitude of P(y > 15) by calculating the mean and standard deviation for the gamma density function when α = 2 and β = 4. Thus,
$$\mu = \alpha\beta = (2)(4) = 8$$
$$\sigma^2 = \alpha\beta^2 = (2)(4)^2 = 32$$
$$\sigma = \sqrt{32} \approx 5.7$$
Since y = 15 months lies only slightly more than 1 standard deviation beyond the mean (μ + σ = 8 + 5.7 = 13.7 months), we would not regard 15 months as an unusually large value of y. Consequently, we would conclude that there is insufficient evidence to indicate that the company’s new quality control program has been effective in increasing the mean time between complaints.
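The solution above sidesteps a table of gamma areas. When α is a positive integer, as here, the tail area has a closed form, P(Y > y) = e^(−y/β) Σ_{j=0}^{α−1} (y/β)^j / j!, so the probability can be computed directly. A minimal Python sketch (the function name is my own):

```python
import math

def gamma_tail_integer_shape(y, alpha, beta):
    """P(Y > y) for a gamma(alpha, beta) variable with integer shape alpha.

    Uses P(Y > y) = exp(-y/beta) * sum_{j < alpha} (y/beta)**j / j!,
    which holds only when alpha is a positive integer.
    """
    if alpha < 1 or int(alpha) != alpha:
        raise ValueError("alpha must be a positive integer for this formula")
    x = y / beta
    return math.exp(-x) * sum(x**j / math.factorial(j) for j in range(int(alpha)))

p = gamma_tail_integer_shape(15, alpha=2, beta=4)
print(f"P(Y > 15) = {p:.3f}")
```

For α = 2 and β = 4 this gives P(Y > 15) = e^(−3.75)(1 + 3.75) ≈ 0.11, roughly a one-in-nine chance under the old model, which agrees with the conclusion that 15 months is not unusually large.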
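The most general normal distribution is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
where μ is the mean and σ is the standard deviation.
The normal distribution:
Is center-symmetric
Is normalized (area under curve equals one)
Has a most probable value of x_mp = μ
Has a width indicated by σ
The normal distribution is often given in terms of the generalized parameter z:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
where:
$$z = \frac{x-\mu}{\sigma}$$
Combined, these two formulas are equivalent to the general form (the factor 1/σ converts a density in z into a density in x).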
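Estimates Using the Normal Distribution
When we calculate the average and standard deviation from our data set, we are estimating the parameters of the parent population:
$$\mu \approx \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma \approx s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
In this case
$$z = \frac{x - \bar{x}}{s}$$
Areas under the normal distribution tell us the probability of occurrence. Areas are tabulated as a function of z (commonly found in tables; some tables list the area from 0 to z instead). The area is that of the integral
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$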
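In code, the tabulated areas can be replaced by the error function, since Φ(z) = ½[1 + erf(z/√2)]. The sketch below uses only Python's `math.erf`; the helper names are illustrative.

```python
import math

def standard_normal_cdf(z):
    """Phi(z): area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(x1, x2, mean, sd):
    """Probability that a normal(mean, sd) value falls between x1 and x2."""
    z1, z2 = (x1 - mean) / sd, (x2 - mean) / sd
    return standard_normal_cdf(z2) - standard_normal_cdf(z1)

for k in (1, 2, 3):
    print(k, round(area_between(-k, k, 0.0, 1.0), 4))
```

Evaluating `area_between(-k, k, 0.0, 1.0)` for k = 1, 2, 3 gives about 0.683, 0.954 and 0.997, the familiar 68-95-99.7 pattern.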
Normal Distribution
1. A normal distribution is bell-shaped.
2. It is a symmetric distribution where the mean, median, and mode all coincide.
3. In the population, many variables such as height and weight have distributions that are approximately normal.
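4. Although normal distributions can have different means and variances, the distribution of cases about the mean, measured in standard deviations, is always the same; for example, about 68% of cases lie within one standard deviation of the mean.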
5. The standard normal distribution allows us to locate an observation within a distribution. This distribution has a mean of 0 and a standard deviation of 1.
Log Normal Distribution
The log normal distribution has density
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\!\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \qquad x > 0$$
where μ and σ are the mean and standard deviation of the logarithm.
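Two quick checks of the log normal density, sketched in Python with illustrative parameters μ = 1 and σ = 0.5 (my choice, not from the text): a midpoint-rule integral confirms the density has total area close to 1, and draws from the standard library's `random.lognormvariate(mu, sigma)`, whose arguments are the mean and standard deviation of the underlying normal, have logarithms with mean near μ and standard deviation near σ.

```python
import math
import random
import statistics

def lognormal_pdf(x, mu, sigma):
    """Log normal density; mu and sigma are the mean and sd of ln(x)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * x)

mu, sigma = 1.0, 0.5                       # illustrative parameters (assumed)

# Midpoint-rule check that the density integrates to about 1 over (0, 50).
h = 0.001
total = h * sum(lognormal_pdf((i + 0.5) * h, mu, sigma) for i in range(50_000))

# Sampling check: the logarithms of log normal draws have mean mu and sd sigma.
rng = random.Random(2)
logs = [math.log(rng.lognormvariate(mu, sigma)) for _ in range(20_000)]
print(round(total, 4), round(statistics.fmean(logs), 3), round(statistics.stdev(logs), 3))
```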
The Weibull Distribution
The two-parameter Weibull distribution is often used to characterize wind regimes because it has been found to provide a good fit with measured wind data. The probability density function is given by the following equation:
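$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1} \exp\!\left[-\left(\frac{v}{c}\right)^{k}\right]$$
where v is the wind speed, k is a unitless shape factor, and c is a scale parameter with the same units as v. The cumulative distribution function is given by the following equation:
$$F(v) = 1 - \exp\!\left[-\left(\frac{v}{c}\right)^{k}\right]$$
The two parameters c and k are related to the average wind speed by the following relation:
$$\bar{v} = c\,\Gamma\!\left(1 + \frac{1}{k}\right)$$
where Γ is the gamma function.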
Any Weibull distribution can therefore be described by the average wind speed and the Weibull k value. As shown in Figure 1, lower k values correspond to broader distributions.
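The relation between the average wind speed and c gives a direct way to build a Weibull curve from an average speed and a k value. The sketch below assumes the standard two-parameter forms written in its docstrings; the k values in the loop are assumed for illustration, since the text does not state which three were plotted in Figure 1. It inverts the mean relation with `math.gamma` and checks the result by numerically integrating v·f(v).

```python
import math

def weibull_pdf(v, k, c):
    """f(v) = (k/c) (v/c)**(k-1) exp(-(v/c)**k), evaluated for v > 0."""
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def weibull_cdf(v, k, c):
    """F(v) = 1 - exp(-(v/c)**k) for v >= 0."""
    return 1.0 - math.exp(-((v / c) ** k)) if v > 0 else 0.0

def scale_from_mean(mean_speed, k):
    """Solve mean = c * Gamma(1 + 1/k) for the scale parameter c."""
    return mean_speed / math.gamma(1.0 + 1.0 / k)

def numeric_mean(k, c, upper=60.0, h=0.001):
    """Midpoint-rule estimate of the integral of v * f(v) over (0, upper)."""
    return h * sum((i + 0.5) * h * weibull_pdf((i + 0.5) * h, k, c)
                   for i in range(round(upper / h)))

# Average wind speed 6 m/s as in Figure 1; these k values are assumed.
for k in (1.5, 2.0, 3.0):
    c = scale_from_mean(6.0, k)
    print(f"k={k}: c={c:.2f} m/s, mean check={numeric_mean(k, c):.3f}, "
          f"P(v < 6 m/s)={weibull_cdf(6.0, k, c):.2f}")

# Boston fit from Figure 2: average 5.4 m/s and k = 2.4.
print(round(scale_from_mean(5.4, 2.4), 2))
```

For the Boston fit in Figure 2 (average 5.4 m/s, k = 2.4) the same inversion gives c ≈ 6.1 m/s.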

Figure 1: The probability density function of the Weibull distribution for three values of the shape factor k. For each case, the average wind speed is 6 m/s.
Figure 2 shows a typical distribution of wind speeds measured at Boston, Massachusetts, as well as the Weibull distribution that best fits the measured data. The Weibull k value has been calculated for several locations in the U.S.

Figure 2: The distribution of wind speeds measured at Boston, Massachusetts and the best-fit Weibull distribution, with an average of 5.4 m/s and a k value of 2.4.
The Gamma Distribution Function
$$f(x) = \frac{(k/\mu)^{k}\, x^{k-1}\, e^{-kx/\mu}}{\Gamma(k)}$$
where μ is the mean (the variance is μ²/k) and
k is the order of the gamma distribution:
k = 1 -> f(x) is the exponential distribution
k -> infinity -> f(x) approaches a distribution with no variance
k = integer -> "f(x) is the distribution that results from creating a spike train of rate k/μ with a Poisson process and then deleting all but every kth spike."
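The integer-k statement can be checked by doing exactly what it describes. This sketch (function name and parameter values are illustrative) generates a Poisson spike train of rate k/μ, keeps every kth spike, and reports the mean and variance of the intervals between surviving spikes. If the description holds, the mean stays near μ for every k and the variance falls like μ²/k, the narrowing toward a no-variance distribution as k grows.

```python
import random
import statistics

def every_kth_intervals(k, mu, n_intervals, seed=3):
    """Poisson spike train of rate k/mu; delete all but every kth spike and
    return the intervals between the surviving spikes."""
    rng = random.Random(seed)
    rate = k / mu
    # Time 0 is treated as a kept spike; memorylessness makes this harmless.
    t, count, kept = 0.0, 0, [0.0]
    while len(kept) <= n_intervals:
        t += rng.expovariate(rate)           # next spike of the Poisson train
        count += 1
        if count % k == 0:                   # keep only every kth spike
            kept.append(t)
    return [b - a for a, b in zip(kept, kept[1:])]

mu = 10.0
for k in (1, 4, 16):
    gaps = every_kth_intervals(k, mu, 20_000)
    # mean stays near mu while the variance shrinks like mu**2 / k
    print(k, round(statistics.fmean(gaps), 2), round(statistics.variance(gaps), 2))
```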
Poisson Distribution
The Poisson Distribution is a discrete distribution that represents the number of times an event occurs in a fixed interval of time or space (the number of trials, i.e. opportunities for the event, is not known). For example:
Number of orders your company receives in a week.
Number of defects a product has.
The Poisson Distribution has only one parameter, μ, the mean. The standard deviation is the square root of the mean.
The shape of the distribution depends on the size of the mean. If the mean is large, the distribution is approximately normal. For small means, the distribution is skewed to the right, with most of the probability concentrated at the lower values.

For the Poisson distribution we are usually trying to find the probability of something happening. The formula that we will use is:
$$P(X = a) = \frac{e^{-\mu}\mu^{a}}{a!}$$
where μ is the mean and a is the number of occurrences.
Example 1 On average each car has three defects. Given that the number of defects follows a Poisson distribution, what is the probability that the car will be defect free?
μ = 3 and a = 0
$$\Pr(X=0) = e^{-3}\,\frac{3^0}{0!}$$
The fraction equals 1, so Pr(X=0) = e^(-3) ≈ .05
Example 2 Using the information above, find the probability that there will be no more than one defect.
We can have either 0 or 1 defect.
$$\Pr(X=0) + \Pr(X=1) = .05 + e^{-3}\,\frac{3^1}{1!} = .05 + (.05)(3) = .05 + .15 = .2$$
Example 3 While working at Foley Library, you find that an average of 30 people per hour check out books during exam week. What is the probability that no one arrives in the next 10 minutes?
We have a unit problem. Our mean is given in hours, but the question is in minutes, so we need to convert the mean from hours to minutes. If 30 people arrive per hour, then on average 5 people will arrive every ten minutes. So, our mean is 5. We want
$$\Pr(X=0) = e^{-5}\,\frac{5^0}{0!} \approx .007$$
or 0.7%.
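The three examples can be reproduced with a few lines of Python that implement P(X = a) = e^(−μ) μ^a / a! directly (helper names are my own). Carrying full precision gives 0.0498, 0.1991 and 0.0067; the text's .05, .2 and .007 are these values rounded.

```python
import math

def poisson_pmf(a, mean):
    """P(X = a) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**a / math.factorial(a)

def poisson_cdf(a, mean):
    """P(X <= a), summing the pmf from 0 to a."""
    return sum(poisson_pmf(j, mean) for j in range(a + 1))

print(round(poisson_pmf(0, 3), 4))              # Example 1: defect-free car
print(round(poisson_cdf(1, 3), 4))              # Example 2: at most one defect
print(round(poisson_pmf(0, 30 * 10 / 60), 4))   # Example 3: 30/hour -> 5 per 10 min
```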
A Uniform Distribution is one for which the probability of occurrence is the same for all values of X. It is sometimes called a rectangular distribution. For example, if a fair die is thrown, the probability of obtaining any one of the six possible outcomes is 1/6. Since all outcomes are equally probable, the distribution is uniform. If a uniform distribution is divided into equally spaced intervals, there will be an equal number of members of the population in each interval.
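If we want a random variate X uniformly distributed on the interval [a, b], a reasonable guess for generating X is given by
$$X = a + (b - a)R$$
where R is uniformly distributed on (0, 1).
If we follow the steps outlined in the previous section, we get the same result.
pdf for X:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
Steps:
Step 1. Compute the cdf:
$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases}$$
Step 2. Set F(X) = (X - a)/(b - a) = R.
Step 3. Solving for X in terms of R yields
$$X = a + (b - a)R$$
which is the same as the earlier guess.
Step 4. Generate R as needed and calculate X using the function obtained.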
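Steps 3 and 4 translate directly into code. A minimal sketch using Python's `random.Random.random()`, which returns values in [0, 1) rather than the open interval (0, 1) used in the text, a difference that does not matter here; the interval [2, 5] is an arbitrary example.

```python
import random

def uniform_variate(a, b, rng):
    """Inverse-transform method: draw R ~ U(0,1), return X = a + (b - a) * R."""
    r = rng.random()          # Step 4: generate R as needed
    return a + (b - a) * r    # solution of F(X) = (X - a)/(b - a) = R

rng = random.Random(4)
xs = [uniform_variate(2.0, 5.0, rng) for _ in range(10_000)]
print(min(xs), max(xs), sum(xs) / len(xs))  # mean should be near (a + b)/2 = 3.5
```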
The Erlang distribution, like the Exponential, Normal, Uniform, Weibull and other distributions, represents a continuous random variable. The distribution is named after Agner Krarup Erlang, a Danish mathematician who worked for the Copenhagen Telephone Company on problems including loss and waiting times in telephone calls. A.K. Erlang's first paper, “The theory of probability and telephone conversations” (1909), describes the problems in detail. First, some of the characteristics and properties of the Erlang distribution are explained. The Erlang distribution is characterized by two parameters, k and λ, and hence it is generally described as the Erlang(k, λ) distribution.
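Let X be a random variable that is distributed according to an Erlang(k, λ) distribution, where k is a positive integer and λ > 0 is a rate. Then the cumulative distribution function (CDF) of X, F(x), is given by
$$F(x) = 1 - \sum_{n=0}^{k-1} e^{-\lambda x}\,\frac{(\lambda x)^n}{n!}, \qquad x \ge 0$$
The probability density function (PDF), f(x), is given by
$$f(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}, \qquad x \ge 0$$
Note that for the special case of k = 1, the Erlang distribution reduces to the exponential distribution. A graph of f(x) is depicted in Figure 1 for λ = 30 and k = 12.
Figure 1: The PDF of an Erlang(k, λ) random variable
The Laplace-Stieltjes transform of X has a closed form and is given by
$$E\!\left[e^{-sX}\right] = \left(\frac{\lambda}{\lambda + s}\right)^{k}$$
The mean of the random variable X is
$$E[X] = \frac{k}{\lambda}$$
and the variance of X is
$$\operatorname{Var}(X) = \frac{k}{\lambda^2}$$
The hazard rate function h(x) is
$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda(\lambda x)^{k-1}}{(k-1)!\,\sum_{n=0}^{k-1}\frac{(\lambda x)^n}{n!}}$$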
Note that for k > 1, h(x) is an increasing function of x (for k = 1 it is constant, h(x) = λ). Therefore X is said to be a random variable with an increasing hazard rate. If X represents processing times, this means that as processing continues, there is an increasing probability of the process completing in the next instant. Similarly, if X represents lifetimes, the probability of dying in the next instant increases with age. Many other distributions do not possess the increasing hazard rate property, which makes the Erlang distribution a natural candidate for modeling processing times and lifetimes. Another useful result concerns sums of Erlang random variables. Consider n independent random variables X1, X2, ..., Xn such that Xi (for i = 1, 2, ..., n) is distributed according to an Erlang(Ki, λ) distribution. Define the random variable Z as Z = X1 + X2 + ... + Xn. Then Z is distributed according to an Erlang(K, λ) distribution, where K = K1 + K2 + ... + Kn.
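The Erlang properties can be exercised numerically. The sketch below assumes λ is a rate parameter, so that f(x) = λ^k x^(k−1) e^(−λx)/(k − 1)!, E[X] = k/λ and Var(X) = k/λ²; the function names are my own. It checks that the hazard rate increases for the Figure 1 parameters (k = 12, λ = 30), and it simulates Erlang(3, 2) + Erlang(5, 2) to see the sum behave like Erlang(8, 2).

```python
import math
import random
import statistics

# Assumed parameterization: lam is a rate, so E[X] = k/lam and Var[X] = k/lam**2.
def erlang_pdf(x, k, lam):
    """f(x) = lam**k x**(k-1) exp(-lam x) / (k-1)! for x >= 0."""
    if x < 0:
        return 0.0
    return lam**k * x ** (k - 1) * math.exp(-lam * x) / math.factorial(k - 1)

def erlang_sf(x, k, lam):
    """Survival 1 - F(x) = sum_{n<k} exp(-lam x) (lam x)**n / n!, summed directly
    to avoid the cancellation in 1 - F(x) when F(x) is close to 1."""
    if x < 0:
        return 1.0
    return sum(math.exp(-lam * x) * (lam * x) ** n / math.factorial(n) for n in range(k))

def erlang_hazard(x, k, lam):
    """h(x) = f(x) / (1 - F(x))."""
    return erlang_pdf(x, k, lam) / erlang_sf(x, k, lam)

def erlang_sample(k, lam, rng):
    """One Erlang(k, lam) draw as a sum of k independent exponential(lam) draws."""
    return sum(rng.expovariate(lam) for _ in range(k))

# Hazard rate for the Figure 1 parameters (k = 12, lam = 30) rises with x.
hs = [erlang_hazard(0.01 * i, 12, 30.0) for i in range(1, 101)]
print(all(a < b for a, b in zip(hs, hs[1:])))

# Sum property: Erlang(3, 2) + Erlang(5, 2) should behave like Erlang(8, 2).
rng = random.Random(5)
z = [erlang_sample(3, 2.0, rng) + erlang_sample(5, 2.0, rng) for _ in range(20_000)]
print(statistics.fmean(z), statistics.variance(z))   # near 8/2 = 4 and 8/4 = 2
```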
The mean is the average value, computed by dividing the sum of a set of values by the number of values in the set. Though the mean is the mathematical average of a set of numbers, the mean, median and mode may all be used in various contexts to indicate an average, typical or likely condition; however, they differ. Example: in the number set 1, 2, 2, 4, 5, 8, 13, the mean is 5 (35 ÷ 7); the median is 4 (the middle value); and the mode is 2 (the most frequent value).
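The median is a statistical term for the middle value in a set of numbers arranged in order: half of the values lie at or below it and half at or above it. When the set has an even number of values, the median is the average of the two middle values.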
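Python's standard `statistics` module reproduces the example numbers and shows the even-length case for the median:

```python
import statistics

data = [1, 2, 2, 4, 5, 8, 13]
print(statistics.mean(data), statistics.median(data), statistics.mode(data))  # 5 4 2

# With an even number of values, the median averages the two middle values.
print(statistics.median([1, 2, 3, 4]))  # 2.5
```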
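Standard Deviation
Standard Deviation: A measure of the dispersion among the elements in a set of data.
Standard deviation can be defined as follows:
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
where x̄ is the mean of the N data values. (Dividing by N instead of N - 1 gives the population standard deviation, σ.)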
Example:
Problem: The grades on test #1 were as follows:
Grades = [80 85 79 90 95 98 92]
Find the average (population mean) and the standard deviation.
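Solution:
Mean = (80 + 85 + 79 + 90 + 95 + 98 + 92)/7 = 619/7 ≈ 88.4
Sum of squared deviations from the mean ≈ 321.7, so
Standard Deviation = √(321.7/6) ≈ 7.32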
Further Applications with Standard Deviation
Problem: Using the above values for mean and standard
deviation, calculate the percentage of the data points that are within one standard
deviation of the mean.
Mean = 88
Standard Deviation = 7.32
Solution: With the information previously obtained, an interval of one standard deviation around the mean can be established by simply adding the standard deviation to, and subtracting it from, the mean.
Upper Bound = 88 + 7.32 = 95.3
Lower Bound = 88 - 7.32 = 80.7
Four of the seven data items (85, 90, 92 and 95) fall within this range of plus or minus one standard deviation. Therefore,
(4 Items Inside / 7 Total Items) * 100% ≈ 57%
It can therefore be said that approximately 57% of the data lies within ±7.32 of the average.
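The whole example can be checked with Python's `statistics` module. Note that `statistics.stdev` uses the n − 1 divisor, which is what reproduces 7.32; `statistics.pstdev` (divisor n) would give about 6.78 instead. The sketch keeps the unrounded mean, 88.43, and still finds the same four grades inside the band.

```python
import statistics

grades = [80, 85, 79, 90, 95, 98, 92]
mean = statistics.fmean(grades)               # 619 / 7, about 88.43
sd = statistics.stdev(grades)                 # sample SD (n - 1 divisor), about 7.32
inside = [g for g in grades if mean - sd <= g <= mean + sd]
share = 100 * len(inside) / len(grades)
print(f"mean={mean:.2f} sd={sd:.2f} inside={sorted(inside)} share={share:.0f}%")
```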
One measure of variability that makes use of the squared deviations is the variance. The variance is approximately the average of the squared deviations of observed scores from their mean (the sum of squared deviations is divided by n - 1 rather than n). The variance of a sample is represented by s². It is calculated as follows:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
Covariance
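The covariance is defined
$$\operatorname{Cov}(X, Y) = E\!\left[(X - E[X])(Y - E[Y])^{T}\right]$$
The bilinearity of the covariance on random variables underlies the bilinearity of the covariance on random vectors.
Whenever A and B are fixed conformable linear transformations, we have the sandwich rule:
$$\operatorname{Cov}(AX, BY) = A\,\operatorname{Cov}(X, Y)\,B^{T}$$
To prove this, use the properties of expectation:
$$\operatorname{Cov}(AX, BY) = E\!\left[(AX - AE[X])(BY - BE[Y])^{T}\right] = A\,E\!\left[(X - E[X])(Y - E[Y])^{T}\right]B^{T} = A\,\operatorname{Cov}(X, Y)\,B^{T}$$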
The covariance also satisfies
![]()
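The sandwich rule can be checked numerically, because the sample cross-covariance is bilinear in exactly the same way as the population version. The sketch below uses only the standard library, with made-up correlated data and arbitrary matrices A and B (both assumptions for illustration); it compares Cov(AX, BY) with A Cov(X, Y) Bᵀ and prints the largest absolute difference, which should be at floating-point round-off level.

```python
import random

def mean_vector(vectors):
    """Componentwise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cross_cov(xs, ys):
    """Sample cross-covariance matrix Cov(X, Y) from paired vectors (n - 1 divisor)."""
    mx, my = mean_vector(xs), mean_vector(ys)
    n = len(xs)
    return [[sum((x[i] - mx[i]) * (y[j] - my[j]) for x, y in zip(xs, ys)) / (n - 1)
             for j in range(len(my))]
            for i in range(len(mx))]

def matmul(A, B):
    """Matrix product of lists-of-rows A and B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def transform(A, vectors):
    """Apply the linear map A to every vector."""
    return [[sum(a * v for a, v in zip(row, vec)) for row in A] for vec in vectors]

rng = random.Random(6)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
Y = [[x[0] + rng.gauss(0, 1), x[1] - x[0], rng.gauss(0, 1)] for x in X]  # correlated with X
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 3x2 map applied to X (arbitrary choice)
B = [[1.0, 0.0, 1.0], [2.0, -1.0, 0.5]]     # 2x3 map applied to Y (arbitrary choice)

lhs = cross_cov(transform(A, X), transform(B, Y))
rhs = matmul(matmul(A, cross_cov(X, Y)), transpose(B))
print(max(abs(l - r) for lrow, rrow in zip(lhs, rhs) for l, r in zip(lrow, rrow)))
```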
Cumulative Distribution Function of a discrete Random Variable is:
$$F(x) = P(X \le x) = \sum_{x_i \le x} p(x_i)$$
Cumulative Distribution Function of a continuous Random Variable is:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt$$
Probability Density
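For a continuous random variable with density f(x), f(x) dx is the probability that X lies in an interval between x and x + dx.
Probability density is defined for continuous random variables; wherever the cumulative distribution function F is differentiable, f(x) = dF(x)/dx.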
The expected value of a variable is the long-run average value of that variable. The expected value of a statistic is therefore the mean of the sampling distribution of the statistic. If the expected value of a statistic is the parameter the statistic is estimating, the statistic is an unbiased estimate of the parameter. Expected values of variables are indicated by an "E" with the variable enclosed in brackets. Thus, E[X] is read as the expected value of X.
Some basic rules of expected values are shown below:
1) E[X] = μ, where μ is the mean of X.
2) E[(X - μ)²] = σ², where σ² is the variance of X.
3)![]()
4) E[X + Y] = E[X] + E[Y]
5) E[XY] = E[X]E[Y] if X and Y are independent
6) In general, E[X/Y] does not equal E[X]/E[Y]
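Rules 2, 4, 5 and 6 can be verified exactly on a small example. This sketch assumes X and Y are the faces of two independent fair dice and uses `fractions.Fraction` so that no rounding hides the comparisons.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair dice: 36 equally likely pairs.
PAIRS = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def expect(g):
    """Exact E[g(X, Y)] over the joint distribution."""
    return sum(P * g(x, y) for x, y in PAIRS)

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)                 # rule 2: sigma**2
print(ex, var_x)                                            # 7/2 35/12
print(expect(lambda x, y: x + y) == ex + ey)                # rule 4: True
print(expect(lambda x, y: x * y) == ex * ey)                # rule 5: True
print(expect(lambda x, y: Fraction(x, y)), ex / ey)         # rule 6: 343/240 vs 1
```

The last line shows E[X/Y] = 343/240 ≈ 1.43 while E[X]/E[Y] = 1.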
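The procedure of obtaining the best fit to a given data set is called regression.
Consider linear curve fitting first, with the equation:
$$y = a_0 + a_1 x$$
to be fit through m data points, that is (x1, y1), (x2, y2), ..., (xm, ym). The deviation of a point (xi, yi) from the value calculated from the linear equation is:
$$d_i = y_i - (a_0 + a_1 x_i)$$
The least-squares problem is then to find the values of a0 and a1 so as to
$$\text{minimize} \quad S = \sum_{i=1}^{m} d_i^2 = \sum_{i=1}^{m}\left(y_i - a_0 - a_1 x_i\right)^2$$
Setting ∂S/∂a0 = 0 and ∂S/∂a1 = 0 yields two equations:
$$m\,a_0 + a_1\sum_{i=1}^{m} x_i = \sum_{i=1}^{m} y_i$$
$$a_0\sum_{i=1}^{m} x_i + a_1\sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i y_i$$
These equations can be written in matrix form as:
$$\begin{bmatrix} m & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$
This equation form can be directly extended to n dimensions.
The above derivation is for a linear regression problem. Equations such as y = b x^r can be linearized by taking logarithms:
$$\ln y = \ln b + r \ln x$$
and setting
$$Y = \ln y, \quad X = \ln x, \quad a_0 = \ln b, \quad a_1 = r$$
we have a linearized equation:
$$Y = a_0 + a_1 X$$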
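A minimal sketch of the two-parameter case: it solves the two normal equations (m·a0 + a1·Σx = Σy and a0·Σx + a1·Σx² = Σxy) by Cramer's rule, and reuses the same routine on (ln x, ln y) for the power law y = b x^r. Function names and test data are illustrative; for larger systems a linear-algebra library would normally be used instead of hand-coded formulas.

```python
import math

def linear_fit(xs, ys):
    """Least-squares a0, a1 for y = a0 + a1*x by solving the 2x2 normal equations."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx            # determinant of [[m, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("need at least two distinct x values")
    a0 = (sy * sxx - sx * sxy) / det   # Cramer's rule
    a1 = (m * sxy - sx * sy) / det
    return a0, a1

def power_fit(xs, ys):
    """Fit y = b * x**r via the log transform; requires x > 0 and y > 0."""
    a0, a1 = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(a0), a1            # b = e**a0, r = a1

print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))   # (1.0, 2.0)
xs = [1, 2, 4, 8]
print(power_fit(xs, [2.5 * x**1.5 for x in xs]))  # close to (2.5, 1.5)
```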
Analysis of variance is a method for testing hypotheses about means. It is the most widely used method of statistical inference for the analysis of experimental data. The basic idea is to compare alternative estimates of the population variance for the purpose of deciding whether the groups have different means. One estimate comes from the variability within each group; the other comes from the variability of the group means around the overall mean. Both are valid estimates of the population variance only under the working supposition that all groups share the same population mean. Under that supposition, the two estimates should differ only by sampling error; a between-groups estimate much larger than the within-groups estimate is evidence that the means differ.
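The comparison of the two variance estimates is usually summarized as an F ratio. A minimal one-way sketch (the function name and toy data are illustrative): the between-groups estimate is the spread of the group means scaled by group size, and the within-groups estimate pools the spread inside each group.

```python
import statistics

def one_way_anova_f(groups):
    """F = (between-groups variance estimate) / (within-groups variance estimate)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))   # (13.5, 1, 4)
```

A large F (relative to F-distribution quantiles with those degrees of freedom) points to different group means.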
The four most common graphical methods are:
Histograms: are essentially bar graphs in which the categories are classes.

Frequency Histograms: the heights of the bars are determined by the class frequency

Bar charts: give the frequency (or relative frequency) corresponding to each category, with the height or length of the bar proportional to the category frequency (or relative frequency).

Circle charts (pie charts): divide a complete circle (a pie) into slices, one corresponding to each category, with the central angle of the slice proportional to the category relative frequency.

The chi-square test is a statistical measure of goodness of fit, independence, or homogeneity. The Chi-square test can be used to determine whether a sample of data was drawn from a normally distributed population by comparing the sample's frequency distribution with the normal distribution. It can also be used to determine whether two variables are independent by comparing their observed joint occurrence with their expected joint occurrence, assuming independence. Finally, it can be used to determine whether or not categories of a single variable are represented in the same proportions in two or more populations.
If N measurements y_i are compared to some model or theory predicting values g_i, and if the measurements are assumed normally distributed around g_i, uncorrelated and with variances σ_i², then the sum
$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - g_i)^2}{\sigma_i^2}$$
follows the χ² (chi-square) distribution with N degrees of freedom. The χ² test is said to have the significance level α if the sum above is equal to the quantile χ²_{1−α} of the χ² distribution:
$$\int_{0}^{\chi^2_{1-\alpha}} f(\chi^2)\, d\chi^2 = 1 - \alpha$$
Integral curves for the χ² distribution exist in computer libraries or are tabulated in the literature. Note that the χ² test may express little about the inherent assumptions; wrong hypotheses or measurements can, but need not, cause large χ² values. The correct statement to make about a measured value s is "α is the probability of finding a χ² as large as s or larger".
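A stdlib-only sketch of the χ² sum and of α, the probability of finding a χ² as large as s or larger. The tail probability uses the series for the regularized incomplete gamma function, P(χ² ≥ s) = 1 − P(N/2, s/2); this is my choice of method, adequate for moderate s but not for very small tail probabilities, where a library routine such as SciPy's `scipy.stats.chi2.sf` is the safer option. The measurements in the example are hypothetical.

```python
import math

def chi_square_sum(ys, gs, sigmas):
    """chi^2 = sum ((y_i - g_i) / sigma_i)**2 for measurements y and predictions g."""
    return sum(((y - g) / s) ** 2 for y, g, s in zip(ys, gs, sigmas))

def chi_square_sf(s, dof, terms=500):
    """P(chi^2 >= s) for `dof` degrees of freedom, via the series for the
    regularized lower incomplete gamma function P(dof/2, s/2)."""
    if s <= 0:
        return 1.0
    a, x = dof / 2.0, s / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

ys = [10.2, 9.7, 10.5]        # hypothetical measurements
gs = [10.0, 10.0, 10.0]       # hypothetical model predictions
sigmas = [0.3, 0.3, 0.3]      # hypothetical measurement standard deviations
s = chi_square_sum(ys, gs, sigmas)
print(round(s, 2), round(chi_square_sf(s, len(ys)), 3))
```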
A t test is designed to determine the significance of the difference in means of some measure between two groups. The t test is not appropriate for mean gains but only for raw mean scores. There are two kinds of t tests: one for unpaired data and the other for paired data. The t test works only when there are no more than two classifications or groups of the nominal variable (male-female, Democrat-Republican, in program-not in program).
The formula for the t test is below:

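The formula the text refers to did not survive extraction, so the sketch below should not be read as the author's version. It assumes Welch's form for unpaired data, t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2) with Welch-Satterthwaite degrees of freedom, and the usual one-sample t on the differences for paired data. Turning t into a p-value needs the t distribution, which the Python standard library does not provide.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

def paired_t(before, after):
    """Paired t statistic: one-sample t on the differences, df = n - 1."""
    d = [y - x for x, y in zip(before, after)]
    return statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d))), len(d) - 1

print(welch_t([1, 2, 3], [4, 5, 6]))
print(paired_t([1, 2, 3, 4], [2, 4, 5, 7]))
```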
The F-test is used to test for differences among sample variances.
The formula for F is:
$$F = \frac{s_1^2}{s_2^2}$$
In comparing two independent samples of size N1 and N2, the F Test provides a measure for the probability that they have the same variance. The estimators of the variance are s1² and s2². We define as test statistic their ratio T = s1²/s2², which follows an F Distribution with f1 = N1 - 1 and f2 = N2 - 1 degrees of freedom when both samples come from normal populations with equal variances. One can formulate the F test for three different hypotheses, defined by:
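σ1² = σ2² (two-sided: the variances may differ in either direction)
σ1² ≤ σ2² (one-sided)
σ1² ≥ σ2² (one-sided)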
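Computing the statistic T and its degrees of freedom is straightforward; converting T into a probability requires F-distribution quantiles from a table or a library (for example SciPy's `scipy.stats.f`). A minimal sketch with made-up samples:

```python
import statistics

def f_ratio(sample1, sample2):
    """Return T = s1**2 / s2**2 and its degrees of freedom (N1 - 1, N2 - 1)."""
    t = statistics.variance(sample1) / statistics.variance(sample2)
    return t, len(sample1) - 1, len(sample2) - 1

print(f_ratio([1, 2, 3], [2, 4, 6]))   # (0.25, 2, 2)
```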