The Central Limit Theorem states that if a random sample of n observations is drawn from a population with finite mean u and variance sigma squared, then, when n is sufficiently large, the sampling distribution of the sample mean y bar can be approximated by a normal density function. For example: let y1, y2, ..., yn be a random sample from any unknown population (i.e. the distribution representing it is unknown) with population mean u and variance sigma squared. Then for large n (a common rule of thumb is n > 30), the sampling distribution of y bar (equivalently, its probability distribution or relative frequency distribution) is approximately normal, with mean equal to the population mean (i.e. E(y bar) = u), variance equal to the population variance divided by n (i.e. sigma squared / n), and standard deviation equal to the population standard deviation divided by the square root of n (i.e. sigma / sqrt(n)).

The importance of the central limit theorem is that we can use the normal distribution to approximate the sampling distribution of the sample mean y bar as long as the population possesses a finite mean and variance, and the number n of measurements in the sample is sufficiently large. How large the sample size must be will depend on the nature of the sampled population. Separately, the sampling distribution of any linear function of jointly normally distributed random variables, even those that are correlated and have different means and variances, is exactly a normal distribution.
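The behaviour described above can be checked with a small simulation. The sketch below is only an illustration: the choice of an exponential population (mean 1, standard deviation 1), the sample size n = 40, the number of repetitions, and the function name sample_means are all assumptions made for the example.

```python
import random
import statistics

def sample_means(n, repeats, rate=1.0, seed=0):
    """Draw `repeats` samples of size n from an exponential population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(repeats)
    ]

n = 40
means = sample_means(n, repeats=5000)

pop_mean = 1.0  # an exponential population with rate 1 has mean 1
pop_sd = 1.0    # and standard deviation 1

# The mean of the sample means should be close to the population mean,
# and their standard deviation close to sigma / sqrt(n).
print("mean of sample means:", statistics.fmean(means), "expected about", pop_mean)
print("sd of sample means:  ", statistics.stdev(means), "expected about", pop_sd / n ** 0.5)
```

Even though the exponential population is strongly skewed, a histogram of the values in means would look roughly bell-shaped at this sample size.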

 

The Law of Large Numbers

 

Weak Law of Large Numbers:
If x1, x2,... are independent, identically distributed random quantities, with expected value E [xi] = u, then the mean of the first N quantities converges in probability to u, as N grows without bound.

 

Strong Law of Large Numbers:
Under the same hypothesis, the mean of the first N quantities converges almost surely to u, as N grows without bound.

A fundamental law in probability theory and statistics stating that if an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the number of repetitions converges towards p as the number of repetitions becomes large. The law also provides the basis for evaluating the power of statistical tests including the significance of quantities of information.

The Law of Large Numbers says that in repeated, independent trials with the same probability p of success in each trial, the chance that the percentage of successes differs from the probability p by more than a fixed positive amount, e > 0, converges to zero as the number of trials n goes to infinity, for every positive e. Note two things:

The difference between the number of successes and the number of trials times the chance of success in each trial (the expected number of successes) tends to grow as the number of trials increases. (In fact, this difference tends to grow like the square root of the number of trials.)

Although the chance of a large difference between the percentage of successes and the chance of success gets smaller and smaller as n grows, nothing prevents the difference from being large in some sequences of trials. The assumption that this difference always tends to zero, as opposed to this difference having a large probability of being arbitrarily close to zero, is the difference between the Law of Large Numbers which is a mathematical theorem, and the Empirical Law of Averages, which is an assumption about how the world works that lies at the base of the Frequency Theory of probability. The distribution of the number of successes in n independent trials with probability p of success in each trial is Binomial, with parameters n and p.
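The two points noted above can be illustrated with a short simulation. This is a sketch under assumed values (p = 0.5, a fixed random seed, and three arbitrary checkpoints); the function name running_results is hypothetical.

```python
import random

def running_results(p, checkpoints, seed=1):
    """Simulate independent trials with success probability p and report,
    at each checkpoint n, the proportion of successes and the absolute
    difference between the number of successes and n * p."""
    rng = random.Random(seed)
    successes = 0
    trial = 0
    results = []
    for n in checkpoints:
        while trial < n:
            successes += rng.random() < p  # True counts as 1
            trial += 1
        results.append((n, successes / n, abs(successes - n * p)))
    return results

results = running_results(p=0.5, checkpoints=[100, 10_000, 100_000])
for n, proportion, count_gap in results:
    print(n, proportion, count_gap)
```

In a typical run the proportion settles near p while the gap in counts does not shrink, consistent with the square-root growth mentioned in the text. A single run can still deviate, which is exactly the distinction the Law of Large Numbers draws.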

 

The Exponential Probability Distribution

 

The Exponential Probability Distribution is a gamma density function with a = 1:

            f(y) = (1/b) e^(-y/b)     (0 < y < infinity)

            with mean and variance

            m = b        s^2  =  b^2

 

Example:

 

            From past experience, a manufacturer knows that the relative frequency distribution of the length of time (in months) between major customer product complaints can be modeled by a gamma density function with a = 2 and b = 4. Fifteen months after the manufacturer tightened its quality control requirements, the first complaint arrived. Does this suggest that the mean time between major customer complaints may have increased?

 

Solution:

 

            We want to determine whether the observed value of y = 15 months, or some larger value of y, would be improbable if, in fact, a = 2 and b = 4. We do not give a table of areas under the gamma density function in this text, but we can obtain some idea of the magnitude of P (y > 15) by calculating the mean and standard deviation for the gamma density function when a = 2 and b = 4. Thus,

 

m = ab = (2)(4) = 8

s^2 = ab^2 = (2)(4)^2 = 32

s = 5.7

 

            Since y = 15 months lies barely more than 1 standard deviation beyond the mean (m + s = 8 + 5.7 = 13.7 months), we would not regard 15 months as an unusually large value of y. Consequently, we would conclude that there is insufficient evidence to indicate that the company’s new quality control program has been effective in increasing the mean time between complaints.
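Although the text relies on the mean and standard deviation, the tail probability P(y > 15) can be computed directly when the shape parameter a is an integer, because the gamma tail then has the closed form e^(-y/b) times the sum of (y/b)^j / j! for j = 0, ..., a - 1. The sketch below uses that identity; the function name is hypothetical.

```python
import math

def gamma_tail_integer_shape(y, a, b):
    """P(Y > y) for a gamma density with integer shape a and scale b."""
    x = y / b
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(a))

tail = gamma_tail_integer_shape(15, a=2, b=4)
print(round(tail, 3))  # 0.112
```

A probability of roughly 11% agrees with the conclusion above: a wait of 15 months or more is not especially unusual when a = 2 and b = 4.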

 

 

 

Normal Distribution

The most general normal distribution is

f(x) = 1/(s sqrt(2 pi)) e^-((x - m)^2 / (2 s^2))

where m is the mean and s is the standard deviation.

The normal distribution:

Is center-symmetric

Is normalized (area under curve equals one)

Has a most probable value of x_mp = m

Has a width indicated by s

The normal distribution is often given in terms of the generalized parameter, z

f(z) = 1/sqrt(2 pi) e^-(z^2 / 2)

where:

z = (x - m)/s

Combined, these two formulas are equivalent to the general form.

Estimates Using the Normal Distribution

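When we calculate the average and standard deviation from our data set, we are estimating the parameters of the parent population.

In this case the calculated average is an estimate of m, and the calculated standard deviation is an estimate of s.

Areas under the normal distribution tell us the probability of occurrence. Areas are tabulated as a function of z (commonly found in tables). The area is given by the integral of f(z) between the limits of interest; for example, the area from -infinity to z is

A(z) = 1/sqrt(2 pi) * integral from -infinity to z of e^-(t^2 / 2) dt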
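In place of a printed table, areas under the standard normal curve can be computed from the error function, using the identity area(-infinity to z) = (1 + erf(z / sqrt(2))) / 2. The sketch below is a minimal illustration; the function names are hypothetical.

```python
import math

def z_score(x, m, s):
    """Convert x to the generalized parameter z = (x - m) / s."""
    return (x - m) / s

def normal_area(z):
    """Area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of falling within one standard deviation of the mean.
within_one_sd = normal_area(1.0) - normal_area(-1.0)
print(round(within_one_sd, 4))  # 0.6827
```

For a value x from a distribution with parameters m and s, normal_area(z_score(x, m, s)) gives the probability of observing a value no larger than x.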


 

 

 

 

Normal Distribution

 

1. A normal distribution is bell-shaped.

2. It is a symmetric distribution where the mean, median, and mode all coincide.

3. In the population, many variables such as height and weight have distributions that are approximately normal.

4. Although normal distributions can have different means and variances, the distribution of cases about the mean is always the same.

5. The standard normal distribution allows us to locate an observation within a distribution. This distribution has a mean of 0 and a standard deviation of 1.


 

Log Normal Distribution

 

The log normal distribution has density

f(x) = 1/(sqrt(2 pi) sigma x) e^-((log x - mu)^2 / (2 sigma^2))

where mu and sigma are the mean and standard deviation of the logarithm.

 

The Weibull Distribution

 

The two-parameter Weibull distribution is often used to characterize wind regimes because it has been found to provide a good fit with measured wind data. The probability density function is given by the following equation:

f(v) = (k/c) (v/c)^(k-1) e^-((v/c)^k)

where v is the wind speed, k is a unitless shape factor, and c is a scale parameter with the same units as v. The cumulative distribution function is given by the following equation:

F(v) = 1 - e^-((v/c)^k)

The two parameters c and k are related to the average wind speed v bar by the following relation:

v bar = c G(1/k + 1)

where G is the gamma function.

Any Weibull distribution can therefore be described by the average wind speed and the Weibull k value. As shown in Figure 1, lower k values correspond to broader distributions.
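As a rough sketch of how a Weibull distribution can be recovered from the average wind speed and k, the code below assumes the standard two-parameter forms f(v) = (k/c)(v/c)^(k-1) e^-((v/c)^k) and F(v) = 1 - e^-((v/c)^k), together with the relation mean = c G(1 + 1/k). The function names and the example k value are illustrative only.

```python
import math

def weibull_scale(mean_speed, k):
    """Scale parameter c implied by the average wind speed and shape k."""
    return mean_speed / math.gamma(1.0 + 1.0 / k)

def weibull_pdf(v, k, c):
    """Probability density at wind speed v > 0."""
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def weibull_cdf(v, k, c):
    """Probability that the wind speed is at most v."""
    return 1.0 - math.exp(-((v / c) ** k))

# Example: average wind speed of 6 m/s with an assumed k of 2.
c = weibull_scale(6.0, k=2.0)
print("scale c:", c)
print("P(speed <= 6 m/s):", weibull_cdf(6.0, 2.0, c))
```

Repeating the calculation with smaller k values at the same average speed spreads the probability over a wider range of speeds.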

 

 

 

 

Figure 1: The probability density function of the Weibull distribution for three values of the shape factor k. For each case, the average wind speed is 6 m/s.

Figure 2 shows a typical distribution of wind speeds measured at Boston, Massachusetts, as well as the Weibull distribution that best fits the measured data. The Weibull k value has been calculated for several locations in the U.S.

Figure 2: The distribution of wind speeds measured at Boston, Massachusetts and the best-fit Weibull distribution, with an average of 5.4 m/s and a k value of 2.4.

 

 

The Gamma Distribution Function

 

f(X) = (k/u)^k X^(k-1) e^(-kX/u) / Gamma(k)

 

 

 

k - the order of the gamma distribution

k = 1 -> f(x) = exponential distribution

k = inf -> f(x) = distribution of no variance

k = integer -> "f (x) is the distribution that results from creating a spike train of rate k/u with a Poisson process and then deleting all but every kth spike."

 

 


 

 

 

 

 

 

 

Poisson Distribution

The Poisson Distribution is a discrete distribution that represents the number of times an event occurs in a fixed interval of time or space. (How many times something happens is unknown)

 

Examples

Number of orders your company receives in a week.

Number of defects a product has.

The Poisson Distribution has only one parameter, m, the mean.  The standard deviation is the square root of the mean.

The shape of the distribution depends on the size of the mean. If the mean is large the distribution is approximately normal.  For small means the distribution is skewed toward the lower values.


For the Poisson distribution we are usually trying to find out the probability of something happening.  The formula that we will use is listed in the box below.

 

P(X=a) = e^-m (m^a / a!), where m is the mean.

 

Example 1 On average each car has three defects.  Given that the number of defects follows a Poisson distribution, what is the probability that the car will be defect free?

m = 3 and a = 0

Pr(X=0) = e^-3 [3^0/0!]   The part in [] = 1, so Pr(X=0) = e^-3 = .05

 

Example 2 Using the information above, find the probability that there will be no more than one defect.

 

We can have either 0 or 1 defect.

Pr(X=0) + Pr(X=1) = .05 + e^-3 [3^1/1!] = .05 + .05*3 = .05 + .15 = .2

 

Example 3 While working at Foley Library, you find that an average of 30 people per hour check out books during exam week.  What is the probability that no one arrives in the next 10 minutes?

 

We have a unit problem.  Our mean is given per hour, but the question asks about a 10-minute period.  We need to convert the mean from a per-hour rate to a per-10-minute rate.  If 30 people arrive per hour, then on average 5 people will arrive every ten minutes.  So, our mean is 5.  We want Pr(X=0) = e^-5 (5^0/0!) = .007 or 0.7%.
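The three examples can be checked numerically with the Poisson formula P(X=a) = e^-m m^a / a!. The sketch below is a minimal illustration; the function name poisson_pmf is hypothetical.

```python
import math

def poisson_pmf(a, m):
    """Probability of exactly a occurrences when the mean is m."""
    return math.exp(-m) * m ** a / math.factorial(a)

# Example 1: no defects when the mean is 3.
print(round(poisson_pmf(0, 3), 3))                      # 0.05

# Example 2: no more than one defect when the mean is 3.
print(round(poisson_pmf(0, 3) + poisson_pmf(1, 3), 3))  # 0.199

# Example 3: 30 arrivals per hour rescaled to a 10-minute mean of 5.
print(round(poisson_pmf(0, 30 * 10 / 60), 3))           # 0.007
```

The second result, 0.199, matches the rounded value of .2 obtained by hand.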

 

 

 

Uniform Distribution

 

A Uniform Distribution is one for which the probability of occurrence is the same for all values of X. It is sometimes called a rectangular distribution. For example, if a fair die is thrown, the probability of obtaining any one of the six possible outcomes is 1/6. Since all outcomes are equally probable, the distribution is uniform. If a uniform distribution is divided into equally spaced intervals, there will be an equal number of members of the population in each interval.

 

If we want a random variate X uniformly distributed on the interval [a,b], a reasonable guess for generating X is given by

X = a + (b - a) R

where R is uniformly distributed on (0,1).

If we follow the steps outlined in the previous section, we get the same result.

pdf for X

f(x) = 1/(b - a) for a <= x <= b, and 0 otherwise

steps:

Step 1.

the cdf

F(x) = 0 for x < a;  (x - a)/(b - a) for a <= x <= b;  1 for x > b

Step 2.

Set F(X) = (X - a)/(b - a) = R

Step 3.

Solving for X in terms of R yields

X = a + (b - a) R

which is the same as the earlier guess.

Step 4.

Generate uniform random numbers R as needed, and calculate the corresponding variates X using the function obtained.
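The steps above can be sketched in a few lines. The code assumes the result of Step 3, X = a + (b - a) R, and uses Python's standard random generator as the source of R; the interval [2, 5], the seed, and the function name are arbitrary choices for the example.

```python
import random

def uniform_variate(a, b, r):
    """Map a uniform (0,1) number r to a variate on [a, b] by inverting
    the cdf F(X) = (X - a) / (b - a)."""
    return a + (b - a) * r

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [uniform_variate(2.0, 5.0, rng.random()) for _ in range(1000)]
print(min(samples), max(samples))
```

All generated values fall between a and b, and a histogram with equally spaced bins would show roughly equal counts in each bin.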

 

 

The Erlang distribution

 

The Erlang distribution, similar to distributions like the Exponential, Normal, Uniform, Weibull, etc, is a distribution representing a continuous random variable. The distribution is named after Agner Krarup Erlang, a Danish mathematician who worked for the Copenhagen Telephone Company on problems including loss and waiting times in telephone calls. A.K. Erlang's first paper “The theory of probability and telephone conversations” in 1909 describes the problems in detail. First some of the characteristics and properties of the Erlang distribution are explained. The Erlang distribution is characterized by two parameters k and l, and hence it is generally described as the Erlang (k,l) distribution.

Let X be a random variable that is distributed according to an Erlang (k,l) distribution. Then the cumulative distribution function (CDF) of X, F (x), is given by

F(x) = 1 - sum_{n=0}^{k-1} e^(-l x) (l x)^n / n!     (x >= 0)

The probability density function (PDF), f(x), is given by

f(x) = l^k x^(k-1) e^(-l x) / (k-1)!     (x >= 0)

 

Note that for the special case of k = 1, the Erlang distribution reduces to the exponential distribution. A graph of f(x) is depicted in Figure 1 for l = 30 and k = 12.

 


 

Figure 1: The PDF of an Erlang (k,l) random variable

The Laplace-Stieltjes Transform of X has a closed form and is given by

E[e^(-sX)] = (l / (l + s))^k.

The mean of the random variable X is

E[X] = k / l

and the variance of X is

Var[X] = k / l^2.

The hazard rate function h(x) is

h(x) = f(x) / (1 - F(x))

 

 

Note that h(x) is an increasing function of x (for k > 1; when k = 1 the distribution is exponential and h(x) is constant). Therefore X is said to be a random variable with an increasing hazard rate. If X represents processing times, this means that as the processing continues, there is an increasing probability of immediate process completion. Similarly, if X represents lifetimes, then the probability of dying immediately increases with age. Many other distributions do not possess the increasing hazard rate property. Therefore the Erlang distribution is generally an appropriate distribution for modeling processing times and lifetimes. Another useful result is the sum of Erlang random variables. Consider n independent random variables, X1, X2, ..., Xn, such that Xi (for i = 1,2,...,n) is distributed according to an Erlang(Ki ,l) distribution. Define the random variable Z as Z = X1 + X2 + . . . + Xn. Then Z is distributed according to an Erlang(K,l) distribution, where K = K1 + K2 + . . . + Kn.
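The increasing hazard rate can be examined numerically. The sketch below assumes the common rate parametrization, in which an Erlang (k,l) variable is the sum of k independent exponential variables with rate l, so that f(x) = l^k x^(k-1) e^(-l x) / (k-1)! and F(x) = 1 - sum over n from 0 to k-1 of e^(-l x) (l x)^n / n!. If the intended parametrization differs, the formulas would need adjusting. The function names and parameter values are illustrative.

```python
import math

def erlang_pdf(x, k, lam):
    """Density of an Erlang(k, lam) variable under the rate parametrization."""
    return lam ** k * x ** (k - 1) * math.exp(-lam * x) / math.factorial(k - 1)

def erlang_cdf(x, k, lam):
    """Cumulative distribution function under the same parametrization."""
    tail = sum(math.exp(-lam * x) * (lam * x) ** n / math.factorial(n)
               for n in range(k))
    return 1.0 - tail

def erlang_hazard(x, k, lam):
    """Hazard rate h(x) = f(x) / (1 - F(x))."""
    return erlang_pdf(x, k, lam) / (1.0 - erlang_cdf(x, k, lam))

for x in (0.5, 1.0, 2.0):
    print(x, erlang_hazard(x, k=3, lam=2.0))
```

With k = 3 the printed hazard values rise with x, while with k = 1 the hazard stays equal to the rate, as expected for the exponential special case.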

 

Mean

 

        The average value, computed by dividing the sum of a set of values by the number of values in the set. Though the mean is the mathematical average of a set of numbers, mean, median and mode may all be used in various contexts to indicate an average, typical or likely condition; however, they differ. Example: In the number set 1, 2, 2, 4, 5, 8, 13, the mean is 5 (35 ÷ 7); the median is 4 (the middle value); and the mode is 2 (the most frequent value).

 

Median

 

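The median is the middle value of a set of numbers arranged in order: half of the values lie at or below it and half lie at or above it. When the set has an even number of values, the median is taken as the average of the two middle values.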

 

Standard Deviation

 

Standard Deviation: A measure of the dispersion among the elements in a set of data. 

 

 

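Standard deviation can be defined as follows:

s = sqrt( sum_{i=1}^{n} (xi - x bar)^2 / (n - 1) )

where x bar is the mean of the n data values. Dividing by n - 1 gives the sample standard deviation.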

 


Example:

Problem: The grades on test #1 were as follows:
Grades = [80 85 79 90 95 98 92]
Find the average (population mean) and the standard deviation.

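Solution:

Mean = (80 + 85 + 79 + 90 + 95 + 98 + 92) / 7 = 619 / 7 = 88.4

Standard Deviation = sqrt(321.71 / 6) = 7.32, where 321.71 is the sum of the squared deviations from the mean and 6 = n - 1.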



Further Applications with Standard Deviation

Problem: Using the above values for mean and standard deviation, calculate the percentage of the data points that are within one standard deviation of the mean.
Mean = 88
Standard Deviation = 7.32

Solution: With the information previously obtained, the upper and lower bounds of the interval within one standard deviation of the mean can be established by simply adding and subtracting the standard deviation from the mean.

Upper Bound = 88 + 7.32 = 95.3

Lower Bound = 88 - 7.32 = 80.7

 

Four of the seven data items will fall within this range of plus or minus one standard deviation. Therefore,

(4 Items Inside / 7 Total Items) * 100 % = 57 %

It can therefore be said that approximately 60% of the data lies within + / - 7.32 from the average.
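The grade example can be reproduced with Python's statistics module. This is a sketch for checking the arithmetic; it uses statistics.stdev, which divides by n - 1 and reproduces the value 7.32 quoted above.

```python
import statistics

grades = [80, 85, 79, 90, 95, 98, 92]

mean = statistics.fmean(grades)
sd = statistics.stdev(grades)  # sample standard deviation (divides by n - 1)

# Grades that lie within one standard deviation of the mean.
inside = [g for g in grades if mean - sd <= g <= mean + sd]
percent_inside = 100 * len(inside) / len(grades)

print(round(mean, 2), round(sd, 2))          # 88.43 7.32
print(len(inside), round(percent_inside))    # 4 57
```

The four grades inside the interval are 85, 90, 92 and 95, which gives 4 / 7, or about 57%.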

 

Variance

 

One measure of variability that makes use of the squared deviations is the variance.  The variance is approximately the average of the squared deviations of observed scores from their mean (the sum of squared deviations is divided by n - 1 rather than n). The variance of a sample is represented by s squared.  It is calculated as follows:

s^2 = sum_{i=1}^{n} (xi - x bar)^2 / (n - 1)

 

           

 

 Covariance

The covariance is defined

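\begin{displaymath}\mathrm{cov}(\mathbf{z},\mathbf{y}) = E[(\mathbf{z}-E\mathbf{z})(\mathbf{y}-E\mathbf{y})^T].\end{displaymath}

The bilinearity of the covariance on random variables underlies the bilinearity of the covariance on random vectors.

Whenever A and B are fixed conformable linear transformations, we have the sandwich rule:

\begin{eqnarray*}\mathrm{cov}(A\mathbf{z},B\mathbf{y})=A\,\mathrm{cov}(\mathbf{z},\mathbf{y})B^T.
\end{eqnarray*}

To prove this, use the properties of expectation:

\begin{eqnarray*}\mathrm{cov}(A\mathbf{z}, B\mathbf{y})
&=& E[(A\mathbf{z}-E(A\mathbf{z}))(B\mathbf{y}-E(B\mathbf{y}))^T] \\
&=& E[A(\mathbf{z}-E(\mathbf{z}))(\mathbf{y}-E(\mathbf{y}))^TB^T] \quad \mbox{(linearity)} \\
&=& A\,E[(\mathbf{z}-E(\mathbf{z}))(\mathbf{y}-E(\mathbf{y}))^T]B^T \quad \mbox{(linearity)} \\
&=& A\,\mathrm{cov}(\mathbf{z},\mathbf{y})B^T
\end{eqnarray*}

The covariance also satisfies

\begin{eqnarray*}\mathrm{cov}(\mathbf{y},\mathbf{z})=\mathrm{cov}(\mathbf{z},\mathbf{y})^T.
\end{eqnarray*}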

 

 

 

Cumulative Distribution Function

 

 

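The Cumulative Distribution Function of a discrete Random Variable is:

F(x) = P(X <= x) = sum of p(xi) over all xi <= x

where p(xi) is the probability that X takes the value xi.

The Cumulative Distribution Function of a continuous Random Variable is:

F(x) = P(X <= x) = integral from -infinity to x of f(t) dt

where f is the probability density of X.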

Probability Density

 

The probability density f(x) is defined so that f(x) dx is the probability that x lies in an interval between x and x + dx

Probability density is a function of a continuous variable

 

Expected value

 

The expected value of a variable is the long-run average value of that variable. The expected value of a statistic is therefore the mean of the sampling distribution of the statistic. If the expected value of a statistic is the parameter the statistic is estimating, the statistic is an unbiased estimate of the parameter. Expected values of variables are indicated by an "E" with the variable enclosed in brackets. Thus, E[X] is read as the expected value of X.

 

Some basic rules of expected values are shown below:

 

1) E[X] = m where m is the mean of X.

2) E[(X - m)^2] = s^2 where s^2 is the variance of X.

3)

4) E[X + Y] = E[X] + E[Y]

5) E[XY] = E[X]E[Y] if X and Y are independent

6) In general, E[X/Y] does not equal E[X]/E[Y]


Least Square Fit

 

The procedure of obtaining the best fit to a given data set is called regression.

Consider the linear curve fitting first, with the equation:

y = a0 + a1 x

to be fit through m data points, that is (x1, y1), (x2, y2), ..., (xm, ym). The deviation of a point (xi, yi) from that calculated from the linear equation is:

di = yi - (a0 + a1 xi)

The least-squares problem is then to find the values of a0 and a1, so as to

minimize S = sum_{i=1}^{m} di^2 = sum_{i=1}^{m} (yi - a0 - a1 xi)^2

Setting the partial derivatives of S with respect to a0 and a1 to zero yields two equations:

m a0 + (sum xi) a1 = sum yi

(sum xi) a0 + (sum xi^2) a1 = sum xi yi

These equations can be written in the matrix form as:

[ m        sum xi   ] [ a0 ]   [ sum yi    ]
[ sum xi   sum xi^2 ] [ a1 ] = [ sum xi yi ]

This equation form can be directly extended to n dimensions.

The above derivation is for a linear regression problem. Equations such as y = b x^r can be linearized by using:

log y = log b + r log x

and setting

Y = log y,  X = log x,  a0 = log b,  a1 = r

we have a linearized equation:

Y = a0 + a1 X
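The fitting procedure can be sketched by solving the two normal equations directly. The code assumes the straight-line model y = a0 + a1 x and, for the power law y = b x^r, the logarithmic linearization described above; the function names and the sample points are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for y = a0 + a1 * x, obtained by
    solving the 2 x 2 normal equations with Cramer's rule."""
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx  # zero only if all x values are identical
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (m * sxy - sx * sy) / det
    return a0, a1

def fit_power_law(xs, ys):
    """Estimates (b, r) for y = b * x**r via a straight-line fit to the
    logarithms; all x and y values must be positive."""
    a0, a1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(a0), a1

print(fit_line([0, 1, 2, 3], [2, 5, 8, 11]))   # points on y = 2 + 3x
print(fit_power_law([1, 2, 4], [3, 12, 48]))   # points on y = 3 x^2
```

Note that fitting in logarithmic space minimizes the squared deviations of log y rather than of y itself, so for noisy data the result can differ from a direct nonlinear fit.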

 

 

Analysis of variance

 

Analysis of variance is a method for testing hypotheses about means. It is the most widely-used method of statistical inference for the analysis of experimental data. The basic idea is to compare alternative estimates of the population variance for the purpose of deciding whether the groups have different means. The alternative estimates are predicated on the notion that the mean for certain groups within the population is the same. Under that working supposition there should be no difference in variance estimates constructed from the different sample groups corresponding to the grouping in the population.

 

 

The four most common graphical methods are:

 

Histograms: are essentially bar graphs in which the categories are classes.

 

 

Frequency Histograms: the heights of the bars are determined by the class frequency

 

 

Bar charts: give the frequency (or relative frequency) corresponding to each category, with the height or length of the bar proportional to the category frequency (or relative frequency).

 

 

Circle charts (pie charts): divide a complete circle (a pie) into slices, one corresponding to each category, with the central angle of the slice proportional to the category relative frequency.

 

 

 

 

Chi-Square Test

 

A statistical measure of goodness of fit, independence, or homogeneity. The Chi-square test can be used to determine whether a sample of data was drawn from a normally distributed population by comparing the sample's frequency distribution with the normal distribution.  It can also be used to determine whether two variables are independent by comparing their observed joint occurrence with their expected joint occurrence, assuming independence. Finally, it can be used to determine whether or not categories of a single variable are represented in the same proportions in two or more populations.

 

If N measurements yi are compared to some model or theory predicting values gi, and if the measurements are assumed normally distributed around gi, uncorrelated and with variances sigma_i^2, then the sum

s = sum_{i=1}^{N} (yi - gi)^2 / sigma_i^2

follows the chi-square distribution with N degrees of freedom. The test is said to have the significance level alpha if the sum above is equal to the quantile of the distribution:

s = chi^2_{1-alpha}(N)

Integral curves for the chi-square distribution exist in computer libraries or are tabulated in the literature. Note that the test may express little about the inherent assumptions; wrong hypotheses or measurements can, but need not, cause large chi-squares. The correct statement to make about a measured s is ``alpha is the probability of finding a chi-square as large as s or larger''.
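As a small illustration of the sum described above, the sketch below computes s = sum of ((yi - gi) / sigma_i)^2 for made-up measurements and, for an even number of degrees of freedom only, the probability of a chi-square as large as s or larger. It uses the closed form e^(-s/2) times the sum of (s/2)^j / j! for j = 0, ..., N/2 - 1, which holds for even N. The data values and function names are hypothetical.

```python
import math

def chi_square_sum(y, g, sigma):
    """Sum of squared, variance-weighted deviations between the
    measurements y and the predictions g."""
    return sum(((yi - gi) / si) ** 2 for yi, gi, si in zip(y, g, sigma))

def chi_square_tail_even_dof(s, dof):
    """P(chi-square >= s) for an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = s / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(dof // 2))

# Hypothetical measurements, predictions, and standard deviations.
y = [1.2, 1.9, 3.1, 3.8]
g = [1.0, 2.0, 3.0, 4.0]
sigma = [0.2, 0.2, 0.2, 0.2]

s = chi_square_sum(y, g, sigma)
print("s =", s)
print("P(chi-square >= s) =", chi_square_tail_even_dof(s, dof=len(y)))
```

For odd degrees of freedom, or for routine work, a statistics library or printed table would normally supply the tail probability.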

 

 

T Test

 

A t test is designed to determine the significance of the difference in means of some measure from two groups. The t test is not appropriate for mean gains but only raw mean scores. There are two kinds of t tests: one for non-paired data and the other for paired data. The t test will only work when there are no more than two classifications or groups of the nominal variable (male-female, Democrat-Republican, in program-not in program).

The formula for the t test is below:

 

 

 

F Test

 

The F-test is used to test for differences among sample variances.

The formula for F is:

 

 

In comparing two independent samples of size N1 and N2, the F Test provides a measure for the probability that they have the same variance. The estimators of the variance are s1^2 and s2^2. We define as test statistic their ratio T = s1^2 / s2^2, which follows an F Distribution with f1 = N1 - 1 and f2 = N2 - 1 degrees of freedom. One can formulate the F test for three different hypotheses, defined by: