Discrete and continuous distributions

(Keller 7, 8, 9)

Probability distributions

(Keller 7, 8)

A probability distribution is a table, formula or graph that describes the values of a random variable and the probability associated with these values. Since a random variable can be either discrete or continuous we have two types of probability distributions: discrete and continuous probability distributions.

An example of a distribution given by a formula is the distribution function of the continuous normal distribution:

\displaystyle{F(x)=\int\limits_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{t-\mu}{\sigma})^2}dt}

An example of a table describing a discrete distribution is the following (households by number of persons):

Number of persons   Number of households (millions)
1                   31.1
2                   38.6
3                   18.8
4                   16.2
5                   7.2
6                   2.7
7 or more           1.4
Total               116.0

Discrete probability distributions

(Keller 7)

The following requirements hold for the probability P(x):

  1. 0\leq{P(x)}\leq{1} for all x
  2. \sum_{all x}^{}P(x)=1

There is a relation between the relative frequency diagram and the discrete probability function.

Distribution of the households

The probability distributions can be estimated from relative frequencies.

X is a discrete variable, the number of persons in a household.

x           # households (millions)   P(x)
1           31.1                      0.268
2           38.6                      0.333
3           18.8                      0.162
4           16.2                      0.140
5           7.2                       0.062
6           2.7                       0.023
7 or more   1.4                       0.012
Total       116.0                     1.000

P(x) is the discrete probability distribution of the number of persons in a household.

We have: P(X=1)=31.1/116=0.268, etc.

Also we can compute P(X\geq{4})=P(X=4)+\cdots+P(X\geq{7})=0.237
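These calculations can be reproduced with a short Python sketch (the household counts are taken from the table above; here 7 stands for "7 or more"):

```python
# Household counts (millions) from the table; 7 stands for "7 or more"
counts = {1: 31.1, 2: 38.6, 3: 18.8, 4: 16.2, 5: 7.2, 6: 2.7, 7: 1.4}
total = sum(counts.values())  # 116.0 million households

# Discrete probability distribution P(x) = relative frequency
P = {x: c / total for x, c in counts.items()}

print(round(P[1], 3))                             # P(X=1) = 0.268
print(round(sum(P[x] for x in (4, 5, 6, 7)), 3))  # P(X>=4) = 0.237
```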

Population mean E(X)

(Keller 7)

The population mean \mu is the weighted average of all values of X. The weights are the probabilities.
E(X) is called the expected value of X and is defined by the following formula.

E(X)=\mu=\sum_{all x}{}xP(x)

Example: a die
What is the mean of the throws of a fair die?
x=1, 2, \cdots,6
P(x)=1/6 for all x
and thus:
E(X)=\mu=1\cdot\frac{1}{6}+2\cdot\frac{1}{6} + \cdots+6\cdot\frac{1}{6}=3.5

Population variance V(X)

(Keller 7)

The population variance \sigma^2 is calculated similarly. It is the weighted average of the squared deviations from the mean \mu. The weights are the probabilities. It is defined by the following formula:

V(X)=\sigma^2=\sum_{all x}{}(x-\mu)^2P(x)

Example: distribution of households
E(X)=\mu=\sum_{all x}{}xP(x)

=1\cdot P(X=1)+2\cdot P(X=2)+\cdots+7\cdot P(X\geq7)=

=1\cdot0.268+2\cdot0.333+\cdots+7\cdot0.012=2.513

V(X)=\sigma^2=\sum_{all x}{}(x-\mu)^2P(x)=

=(1-\mu)^2\cdot P(X=1)+(2-\mu)^2\cdot P(X=2)+\cdots+(7-\mu)^2\cdot P(X\geq7)=

=(1-2.513)^2\cdot0.268+\cdots+(7-2.513)^2\cdot0.012=1.958

The standard deviation is \sigma=\sqrt{\sigma^2}=\sqrt{1.958}=1.399.
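The three results can be checked with a small Python sketch, computing the probabilities from the raw frequencies in the table:

```python
# Frequencies (millions of households); 7 stands for "7 or more"
freq = {1: 31.1, 2: 38.6, 3: 18.8, 4: 16.2, 5: 7.2, 6: 2.7, 7: 1.4}
total = sum(freq.values())
P = {x: f / total for x, f in freq.items()}

mu = sum(x * P[x] for x in P)               # E(X): weighted average
var = sum((x - mu) ** 2 * P[x] for x in P)  # V(X): weighted squared deviations
sigma = var ** 0.5                          # standard deviation

print(round(mu, 3), round(var, 3), round(sigma, 3))  # 2.513 1.958 1.399
```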

Covariance of two discrete variables

The covariance of two discrete variables X and Y is defined as:

COV(X,Y)=\sum_{all x}{}\sum_{all y}{}(x-\mu_x)(y-\mu_y)P(x,y)

P(X,Y) is the joint probability distribution of the random variables X and Y: P(x,y)=P(X=x and Y=y).

Note. We also write COV(XY)=\sigma_{XY}.

Laws of E(X) and V(X)

(Keller 7)

E(c)=c
E(X+c)=E(X)+c
E(cX)=cE(X)

These formulas can easily be derived from the definition of E(X).

V(c)=0
V(X+c)=V(X)
V(cX)=c^2V(X)

These formulas can easily be derived from the definition of V(X).

For example: V(cX)=\sum_{all x}{}(cx-c\mu)^2P(x)= \sum_{all x}{}c^2(x-\mu)^2P(x)=c^2V(X)

Laws about sum of variables

(Keller, p. 234; X_1 and X_2 are two random variables)

E(\alpha X_1+\beta X_2)=\alpha E(X_1)+ \beta E(X_2)

V(\alpha X_1+\beta X_2)=\alpha^2 V(X_1)+ \beta^2 V(X_2)+2\alpha \beta COV(X_1,X_2)

If X_1 and X_2 are independent then COV(X_1,X_2)=0 and thus:

V(\alpha X_1+\beta X_2)=\alpha^2 V(X_1)+ \beta^2 V(X_2)

Example
If X_1 and X_2 are independent, then (taking \alpha=1 and \beta=-1):
V(X_1-X_2)=V(1\cdot X_1+(-1)\cdot X_2)=
=1^2 V(X_1)+ (-1)^2 V(X_2)=V(X_1)+V(X_2)
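This law can be verified exactly by enumerating the joint distribution of two independent fair dice; a short Python sketch:

```python
from itertools import product

# All 36 equally likely outcomes of two independent fair dice
outcomes = list(product(range(1, 7), repeat=2))

def var(values):
    """Population variance of a list of equally likely values."""
    values = list(values)
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

v_die = var(range(1, 7))                    # V(X1) = V(X2) = 35/12
v_diff = var([a - b for a, b in outcomes])  # V(X1 - X2) over the joint distribution

assert abs(v_diff - 2 * v_die) < 1e-12      # V(X1 - X2) = V(X1) + V(X2)
print(round(v_die, 4), round(v_diff, 4))    # 2.9167 5.8333
```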

Coefficient of correlation

The coefficient of correlation is defined as the covariance divided by the standard deviations of the variables.

The population coefficient of correlation is:

\rho=\frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}}

The sample coefficient of correlation is:

r=\frac{s_{xy}}{s_{x}s_{y}}

The coefficient of correlation answers the question: how strong is the association between X and Y?

The advantage of the coefficient of correlation over the covariance is that it has a fixed range from -1 to +1 (this can be proven mathematically). If the two variables are very strongly and positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly and negatively related, the coefficient value is close to -1 (strong negative linear relationship). A coefficient close to 0 indicates no linear relationship.

The following graphs depict the relations of X and Y for various coefficients of correlation, varying from -1 to +1.

Binomial distribution

(Keller 7)

The binomial distribution is the probability distribution that results from doing a binomial experiment. Binomial experiments have the following properties:

  1. There are a fixed number of trials, represented as n
  2. Each trial has two possible outcomes, success or failure
  3. P(success)=p;   P(failure)=1-p for all trials
  4. The trials are independent, meaning that the outcome of one trial does not affect the outcomes of any other trials.

The binomial random variable X counts the number of successes in n trials of the binomial experiment.
(e.g. s s f s f f f s f s s f s shows n=13 trials and X=7 successes).
To calculate the probability associated with each value X we use combinatorics:

P(X=x)=\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x} for x=0, 1, 2, \cdots, n

Example
A quiz consists of 10 independent multiple-choice questions (n=10). Each question has 5 possible answers, only one of which is correct (p=0.2). You guess the answer to each question. X is the number of correct guesses, X=0, 1, \cdots, 10. The probability that you will have a score X=0 is:

P(X=0)=\frac{10!}{0!(10-0)!}0.2^0(0.8)^{10-0}=0.1074

(n! is called n factorial: n!=1\cdot2\cdot3\cdots n; 0!=1; 1!=1)
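The binomial formula is easy to evaluate directly; a Python sketch for the quiz example:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable: n trials, success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Quiz: n = 10 questions, p = 0.2 chance of guessing one correctly
print(round(binom_pmf(0, 10, 0.2), 4))  # P(X=0) = 0.1074
```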

Mean and variance

(Keller 7)

The mean, variance and standard deviation of a binomial random variable are (they can be derived mathematically):

\mu=np

\sigma^2=np(1-p) and thus:

\sigma=\sqrt{np(1-p)}
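For the quiz example (n=10, p=0.2) these formulas give \mu=2 and \sigma^2=1.6, which can be cross-checked against the probability-weighted sums; a Python sketch:

```python
from math import comb

n, p = 10, 0.2
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

mu = sum(x * pmf[x] for x in range(n + 1))               # should equal n*p
var = sum((x - mu) ** 2 * pmf[x] for x in range(n + 1))  # should equal n*p*(1-p)

assert abs(mu - n * p) < 1e-9
assert abs(var - n * p * (1 - p)) < 1e-9
print(round(mu, 3), round(var, 3))  # 2.0 1.6
```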

Continuous random variables

(Keller 8)

Unlike a discrete random variable, a continuous random variable is one that assumes an uncountable number of values. We cannot list the possible values because there is an infinite number of them. Because there is an infinite number of values, the probability of each individual value is 0. So, the probability that a man has a height of exactly 180 cm is:

P(X=180)=\lim_{\epsilon\to0}P(180-\epsilon<X\leq180+\epsilon)=0

Probability density functions

(Keller 8)

A function f(x) is called a probability density function (over the range a\leq{x}\leq{b}) if it meets the following requirements:

f(x)\geq{0} for all x\in[a,b]

The total area between curve and X-axis is:

\int \limits_{a}^{b} f(x)dx=1

For the interval [a, b] we may also take (-\infty, \infty), as is the case in e.g. the normal distribution.

The normal density function

(Keller 8)

The normal distribution is the most important of all probability distributions. The probability density function p(x) of a normal random variable X is given by:

p(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} for (-\infty<x<\infty)

The graph is bell-shaped and symmetrical around the mean \mu. This density function is also denoted by N(\mu,\sigma) or N(\mu, \sigma^2).

The normal distribution function is defined by:

F(x)=\int\limits_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{t-\mu}{\sigma})^2}dt

Therefore, the probability P(X<a) equals F(a).

This integral cannot be evaluated analytically (with pen and paper), so we use a table or let a computer do the job.
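In Python, for instance, the standard library's statistics.NormalDist evaluates the normal distribution function numerically:

```python
from statistics import NormalDist

# F(x) for a standard normal distribution (mu = 0, sigma = 1)
F = NormalDist(mu=0, sigma=1).cdf

print(round(F(1.0), 4))  # P(X < 1.0) = 0.8413
print(round(F(0.0), 4))  # P(X < 0)   = 0.5 (symmetry around the mean)
```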

Standard normal distribution

(Keller 8)

A normal density function with mean \mu=0 and standard deviation \sigma=1 is called the standard normal density.

p(x)=\frac{1}{1\cdot\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-0}{1})^2}=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} for (-\infty<x<\infty)

Any normal distribution can be converted to a standard normal distribution, see below. The standard normal distribution is also denoted by N(0,1).

Any (normal) variable X can be converted to a new (normal) variable Z:

Z=\frac{X-\mu}{\sigma}

with the following properties:

E(Z)=E(\frac{X-\mu}{\sigma})=\frac{1}{\sigma}E(X)-\frac{\mu}{\sigma}=\frac{\mu}{\sigma}-\frac{\mu}{\sigma}=0

V(Z)=V(\frac{X-\mu}{\sigma})=\frac{1}{\sigma^2}V(X-\mu)=\frac{1}{\sigma^2}V(X)=\frac{1}{\sigma^2}\sigma^2=1.

Thus, if

X\sim{N(\mu,\sigma^2)}

then

Z\sim{N(0,1)}.

Example
Suppose the demand X is a normally distributed variable with mean \mu=1000 and standard deviation \sigma=100 and we want to compute P(X<1100). Then:

P(X<1100)=

=P(\frac{X-\mu}{\sigma}<\frac{1100-\mu}{\sigma})=

=P(Z<1.00)=0.8413.

The answer can be found in Table 3 of Appendix B9 of Keller, or by Excel.
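The same probability can be computed either directly from N(1000, 100^2) or via the standardized Z; both give 0.8413 (a Python sketch):

```python
from statistics import NormalDist

mu, sigma = 1000, 100
direct = NormalDist(mu, sigma).cdf(1100)  # P(X < 1100), computed directly
z = (1100 - mu) / sigma                   # standardize: z = 1.00
standardized = NormalDist(0, 1).cdf(z)    # P(Z < 1.00)

assert abs(direct - standardized) < 1e-12
print(round(direct, 4))  # 0.8413
```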

Other continuous distributions

There are three other continuous distributions which will be used later.

  1. t distribution (also called Student's t distribution)
  2. \chi^2 (chi-squared) distribution
  3. F distribution

Sampling distributions

(Keller 9)

A sample of size n is just one of many possible samples of size n. If N is the population size and n the sample size (n\leq N), then the number of possible different samples equals \binom{N}{n}.

These numbers are usually very large, e.g.: \binom{100}{25}\approx2.4\cdot{10^{23}}
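Such binomial coefficients can be computed exactly with Python's math.comb:

```python
from math import comb

# Number of distinct samples of size 25 from a population of 100
n_samples = comb(100, 25)
print(f"{n_samples:.1e}")  # about 2.4e+23
```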

Different samples generally yield different values for sample statistics such as \overline{x} or s.

These sample statistics have a probability distribution, the so-called sampling distribution.

Some mathematics

\overline{X} and s are sample statistics. Let us derive the distribution of \overline{X}. We know that E(X)=\mu_X and V(X)=\sigma_X^2. Then:

\mu_{\overline{X}}=E(\overline{X})=E(\frac{\sum X}{n})=\frac{\sum E(X)}{n}=\frac{nE(X)}{n}=\frac{n\mu_X}{n}=\mu_X

\sigma_{\overline{X}}^2=V(\overline{X})=V(\frac{\sum X}{n})=V(\frac{1}{n}\sum X)=

=\frac{1}{n^2}V(\sum X)=\frac{1}{n^2}\sum V(X)=\frac{1}{n^2}nV(X)=\frac{V(X)}{n}=\frac{\sigma_X^2}{n}

So, for the random variable \overline{X} it holds: \mu_{\overline{X}}=\mu_X and \sigma_{\overline{X}}=\sigma_X/\sqrt{n}

Earlier we defined for any random variable X:

Z=\frac{X-\mu_X}{\sigma_X}

and thus for the random variable \overline{X} we get:

Z=\frac{\overline{X}-\mu_{\overline{X}}}{\sigma_{\overline{X}}}=\frac{\overline{X}-\mu_X}{\sigma_X/\sqrt{n}}

Central Limit Theorem

(Keller 7, 8, 9)

The sampling distribution of the means of random samples drawn from any population is approximately normal for a sufficiently large sample size n. The larger the sample size, the more closely the sampling distribution of \overline{X} will resemble a normal distribution.

If the distribution of the population is normal, then \overline{X} is normally distributed for all sample sizes n. If the population is non-normal, then \overline{X} is approximately normal only for larger values of n. In most practical situations, a sample size of n=30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of \overline{X}.

Verify Central Limit Theorem

(Keller 9)

The following is a program in pseudo code.

  1. Take a first sample of size n=30 from a uniform distribution and compute its sample mean \overline{X}_1;
  2. Repeat this k=5000 times and thus get 5000 sample means \overline{X}_1\cdots\overline{X}_{5000}. Also these means are random variables.
  3. According to the Central Limit Theorem these 5000 random means should be (approximately) normally distributed.
  4. Verify this graphically by drawing a histogram.
  5. Verify this by applying a normality test (e.g. Anderson-Darling).
  6. Repeat 1-5 for n=1, 10, 30, 100 and notice the differences.

The program below is written in R, but any programming language can do the job. The R code is as follows:

# Suppose X has a uniform distribution
# n is the sample size, preferably n = 30
n <- 30
# k is the number of such sample means, sufficiently large, e.g. k = 5000
k <- 5000
# According to the Central Limit Theorem
# the k sample means should approximate a normal distribution
z <- numeric(k) # z is a vector with k elements and will contain all k sample means
for (j in 1:k) {
  z[j] <- mean(runif(n)) # compute the mean of each uniform sample
}
# show the histogram of these means
hist(z)
# and find out whether the distribution of means is normal,
# which is approximately true for n >= 30
library(nortest) # provides ad.test()
ad.test(z) # Anderson-Darling normality test

The result is as follows.

The left graph represents a uniform distribution on [0,1]; the right graph depicts a histogram of the 5000 sample means, which is a rather good approximation of a normal distribution.

Using the standard normal distribution

(Keller 9)

Suppose the population random variable X is normally distributed with \mu=32.2 and \sigma=0.3.

We take a sample of size 4 drawn from the population. The sample mean is denoted by \overline{X}. We want to compute P(\overline{X}>32).

We know:

X is normally distributed, and therefore so is \overline{X}.

\mu_{\overline{X}}=\mu_X=32.2 and \sigma_{\overline{X}}=\frac{\sigma_X}{\sqrt{4}}=\frac{0.3}{2}=0.15

P(\overline{X}>32)=P(Z>\frac{32-\mu_{\overline{X}}}{\sigma_{\overline{X}}})=

=P(Z>\frac{32-32.2}{0.15})=P(Z>-1.333)=0.9082

The answer can be found in Table 3 of Appendix B9 of Keller.
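A Python cross-check of this example (using the exact z=-1.3333 rather than the table value z=-1.33, which is why the last digits differ slightly):

```python
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4
sigma_xbar = sigma / n ** 0.5  # standard deviation of Xbar: 0.3/2 = 0.15

p = 1 - NormalDist(mu, sigma_xbar).cdf(32)  # P(Xbar > 32)
print(round(p, 4))  # about 0.9088 (the table, with z = -1.33, gives 0.9082)
```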

The difference of two means

(Keller 9)

Consider the sampling distribution of the difference {\overline{X}}_1-{\overline{X}}_2 of two sample means.

If the random samples are drawn from each of two independent normally distributed populations, then {\overline{X}}_1-{\overline{X}}_2 will be normally distributed as well with:

\mu_{\overline{X}_1-\overline{X}_2}=\mu_1-\mu_2

\sigma_{\overline{X}_1-\overline{X}_2}=\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}

If two populations are not both normally distributed, and the sample sizes are large enough (n>30), then in most cases the distribution of {\overline{X}}_1-{\overline{X}}_2 is approximately normal (see the Central Limit Theorem).
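A Python sketch of these formulas; all the numbers below (the two population means, standard deviations and sample sizes) are assumed purely for illustration:

```python
from statistics import NormalDist

# Hypothetical example (values assumed for illustration):
# population 1: mu1 = 100, sigma1 = 10, sample size n1 = 25
# population 2: mu2 = 95,  sigma2 = 12, sample size n2 = 36
mu_diff = 100 - 95                                 # mean of Xbar1 - Xbar2
sigma_diff = (10 ** 2 / 25 + 12 ** 2 / 36) ** 0.5  # sqrt(4 + 4) ~ 2.83

# P(Xbar1 - Xbar2 > 0): probability the first sample mean exceeds the second
p = 1 - NormalDist(mu_diff, sigma_diff).cdf(0)
print(round(p, 4))
```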

Normal approximation to Binomial

See the following example: a binomial distribution with n=20 and p=0.5, superimposed with a normal distribution (\mu=np=10 and \sigma=\sqrt{np(1-p)}=2.24).

The graph shows P(0), P(1), \cdots, P(19), P(20) and the graph of a N(10, 2.24) distribution. See the formulas of the probabilities of a binomial distribution.

The normal approximation to the binomial works best when the number of trials n is large and the probability of success p is close to 0.5.

For the approximation to provide acceptable results two conditions should be met:

np\geq5 and n(1-p)\geq5

The following graph shows the approximations with p=0.8 and various values of n.

Example
For a binomial distribution (n=20, p=0.5) we find (using Excel):
P(X\leq13)=0.942341.
For a normal distribution (\mu=10, \sigma=2.24) we find:
P(X\leq13)=P(X\leq13.5)=0.940915 (continuity correction).
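The comparison can be reproduced in Python (exact binomial sum versus the normal approximation with continuity correction; the code uses the unrounded \sigma=\sqrt{5}, so the approximation differs in the fourth decimal from the 0.940915 obtained with \sigma=2.24):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
# Exact binomial probability P(X <= 13)
p_binom = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(14))

# Normal approximation with continuity correction: P(X <= 13.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))
p_norm = NormalDist(mu, sigma).cdf(13.5)

print(round(p_binom, 4), round(p_norm, 4))  # 0.9423 (exact) vs ~0.941
```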

Distribution of a sample proportion

The estimator of a population proportion of successes is the sample proportion. That is, we count the number of successes in a sample of size n and compute:

\hat{p}=\frac{X}{n}

X is the number of successes, n is the sample size.

Note that the random variable X has a binomial distribution.

Using the laws of expected value and variance, we can determine the mean, variance and standard deviation. Sample proportions can be standardized to a standard normal distribution using the formula:

Z=\frac{X-\mu}{\sigma} and thus Z=\frac{\hat{p}-p}{\sqrt{p(1-p)/n}}

Note.
Binomial distribution: \hat{p}=X/n, E(X)=np, V(X)=np(1-p)

E(\hat{p})=E(\frac{X}{n})=\frac{1}{n}E(X)=\frac{1}{n}\cdot{np}=p

V(\hat{p})=V(\frac{X}{n})=\frac{1}{n^2}V(X)=\frac{1}{n^2}np(1-p)=\frac{p(1-p)}{n}

and thus:

\sigma_{\hat{p}}=\sqrt{p(1-p)/n}

Example
In the last election a state representative received 52% of the votes (so p=0.52; this can be considered as a population parameter!)
One year after the election the representative organized a survey that asked a random sample of n=300 people whether they would vote for him in the next election.
If we assume that his popularity has not changed what is the probability that more than half of the sample would vote for him?

The number of respondents who would vote for the representative is a binomial random variable with n=300 and p=0.52, and we want to determine the probability that the sample proportion is greater than 50%. That is, we want to compute P(\hat{p}>0.50).

From the foregoing we know that the sample proportion \hat{p} is approximately normally distributed with mean p=0.52 and standard deviation \sigma_{\hat{p}}=\sqrt{p(1-p)/n}=\sqrt{0.52(1-0.52)/300}=0.0288

Thus we compute:

P(\hat{p}>0.50)=

=P(\frac{\hat{p}-p}{\sqrt{p(1-p)/n}}>\frac{0.50-0.52}{0.0288})=

=P(Z>-0.69)=0.7549

If we assume that the level of support remains at 52%, the probability that more than half the sample of 300 people would vote for the representative is 75.49%.
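The whole computation in one Python sketch (using the exact z rather than the rounded table value z=-0.69, hence a slightly different last digit):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 300
sigma_phat = sqrt(p * (1 - p) / n)  # standard deviation of phat, ~0.0288

# P(phat > 0.50), with phat approximately N(p, sigma_phat)
prob = 1 - NormalDist(p, sigma_phat).cdf(0.50)
print(round(sigma_phat, 4), round(prob, 4))  # 0.0288 and about 0.756
```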
