Discrete and continuous distributions

(Keller 7, 8, 9)

Probability distributions

(Keller 7, 8)

A probability distribution is a table, formula or graph that describes the values of a random variable and the probability associated with these values. Since a random variable can be either discrete or continuous we have two types of probability distributions: discrete and continuous probability distributions.

Example
The following formula represents the continuous normal distribution function.

$\displaystyle{F(x)=\int \limits_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}dt}$

Example
A distribution can also be described by a table, as is the case with the number of persons in a US household.

Number of persons	Number of households (millions)
1	31.1
2	38.6
3	18.8
4	16.2
5	7.2
6	2.7
7 or more	1.4
Total	116.0

Discrete probability distributions

(Keller 7)

The following requirements hold for the probability $P(x)$ :

$0\leq{P(x)}\leq{1}$ for all $x$ ;
$\sum_{all x}^{}P(x)=1$ .

There is a relation between a relative frequency diagram and a discrete probability function.

Example
The probability distribution can be estimated from relative frequencies.

$X$ is a discrete variable, the number of persons in a household.

x	Number of households (millions)	P(x)
1	31.1	.268
2	38.6	.333
3	18.8	.162
4	16.2	.140
5	7.2	.062
6	2.7	.023
7 or more	1.4	.012
Total	116.0	1.00

$P(x)$ is the discrete probability distribution of the number of persons in a household.

We have: $P(X=1)=31.1/116=0.268$ , $P(X=2)=38.6/116=0.333$ , etc.

Also we can compute $P(X\geq{4})=P(X=4)+\cdots+P(X\geq{7})=0.237$
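The estimate of $P(x)$ from relative frequencies can be reproduced with a few lines of Python. This is a minimal sketch: the counts are copied from the table above, and the variable names are chosen here for illustration only.

```python
# Household counts in millions, copied from the table above.
households = {"1": 31.1, "2": 38.6, "3": 18.8, "4": 16.2,
              "5": 7.2, "6": 2.7, "7 or more": 1.4}

total = sum(households.values())

# The relative frequency of each category serves as the estimate of P(x).
prob = {x: count / total for x, count in households.items()}

for x, p in prob.items():
    print(f"P(X = {x}) = {p:.3f}")

# P(X >= 4) adds the probabilities of the categories 4, 5, 6 and "7 or more".
p_at_least_4 = sum(prob[x] for x in ("4", "5", "6", "7 or more"))
print(f"P(X >= 4) = {p_at_least_4:.3f}")
```

The probabilities sum to 1 by construction, which is the second requirement listed above.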

Population mean E(X)

(Keller 7)

The population mean $\mu$ is the weighted average of all values of $X$ . The weights are the probabilities.
$E(X)$ is called the expected value of $X$ and is defined by the following formula.

$E(X)=\mu=\sum_{all x}{}xP(x)$

Example
What is the mean of the throws of a fair die?
$x=1, 2, \cdots,6$
$P(x)=1/6$ for all $x$ , because it is a fair die.
Applying the formula we get:

$E(X)=\mu=1\cdot\frac{1}{6}+2\cdot\frac{1}{6} + \cdots+6\cdot\frac{1}{6}=3.5$
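The same weighted average can be checked in Python. The sketch below uses exact fractions so that no rounding enters the result; the names are illustrative.

```python
from fractions import Fraction

# Outcomes of one throw of a fair die, each with probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

# E(X) = sum over all x of x * P(x).
expected_value = sum(x * p for x in outcomes)
print(expected_value, float(expected_value))  # 7/2 3.5
```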

Population variance V(X)

(Keller 7)

The population variance $\sigma^2$ is calculated similarly. It is the weighted average of the squared deviations from the mean $\mu$ . The weights are the probabilities. It is defined by the following formula:

$V(X)=\sigma^2=\sum_{all x}{}(x-\mu)^2P(x)$

Example
We compute the variance of the households example, treating the category "7 or more" as $x=7$ . First we compute:

$E(X)=\mu=\sum_{all x}{}xP(x)$

$=1\cdot P(X=1)+2\cdot P(X=2)+\cdots+7\cdot P(X\geq7)=$

$=1\cdot0.268+2\cdot0.333+\cdots+7\cdot0.012=2.513$

Now we can compute the variance:

$V(X)=\sigma^2=\sum_{all x}{}(x-\mu)^2P(x)=$

$=(1-\mu)^2\cdot P(X=1)+(2-\mu)^2\cdot P(X=2)+\cdots+(7-\mu)^2\cdot P(X\geq7)=$

$=(1-2.513)^2\cdot0.268+\cdots+(7-2.513)^2\cdot0.012=1.958$

The standard deviation is $\sigma=\sqrt{\sigma^2}=\sqrt{1.958}=1.399$ .
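The household computation can be reproduced as follows. This is a sketch under the same simplification the example makes, namely that the category "7 or more" counts as $x=7$ . It uses unrounded probabilities, so intermediate values may differ slightly from sums of the rounded table entries.

```python
import math

# Household counts in millions; "7 or more" is treated as x = 7, as in the text.
counts = {1: 31.1, 2: 38.6, 3: 18.8, 4: 16.2, 5: 7.2, 6: 2.7, 7: 1.4}
total = sum(counts.values())
prob = {x: c / total for x, c in counts.items()}

# Mean: probability-weighted average of the values.
mu = sum(x * p for x, p in prob.items())

# Variance: probability-weighted average of the squared deviations from mu.
variance = sum((x - mu) ** 2 * p for x, p in prob.items())
sigma = math.sqrt(variance)

print(f"mean = {mu:.3f}, variance = {variance:.3f}, sd = {sigma:.3f}")
```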

Covariance of two discrete variables

The covariance of two discrete variables $X$ and $Y$ is defined as:

$COV(X,Y)=\sum_{all x}{}\sum_{all y}{}(x-\mu_x)(y-\mu_y)P(x,y)$

$P(x,y)$ is the joint probability distribution of the random variables $X$ and $Y$ : $P(x,y)=P(X=x \text{ and } Y=y)$ .

Note. We also write $COV(X,Y)=\sigma_{XY}$ .
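To make the double sum concrete, the sketch below evaluates it for a small joint distribution. The four probabilities are hypothetical values chosen for illustration; they do not come from the text.

```python
# Hypothetical joint distribution P(x, y); the numbers are illustrative only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# The means follow from the joint distribution.
mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# COV(X, Y) = sum over all x and y of (x - mu_x)(y - mu_y) P(x, y).
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

print(f"mu_x = {mu_x:.2f}, mu_y = {mu_y:.2f}, COV(X, Y) = {cov_xy:.2f}")
```

The covariance is positive here because the pairs $(0,0)$ and $(1,1)$ , where both variables lie on the same side of their means, carry most of the probability.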

Laws of E(X) and V(X)

(Keller 7)

The following formulas can easily be derived from the definitions of $E(X)$ and $V(X)$ .

$E(c)=c$
$E(X+c)=E(X)+c$
$E(cX)=cE(X)$

$V(c)=0$
$V(X+c)=V(X)$
$V(cX)=c^2V(X)$

For example, since $E(cX)=c\mu$ : $V(cX)=\sum_{all x}{}(cx-c\mu)^2P(x)= \sum_{all x}{}c^2(x-\mu)^2P(x)=c^2V(X)$
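The laws can also be checked numerically. The sketch below does so for a fair die with an arbitrarily chosen constant $c=3$ . The helper functions `mean` and `variance` are written here for illustration, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X): probability-weighted squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
c = 3  # arbitrary constant, chosen for illustration

shifted = {x + c: p for x, p in die.items()}  # distribution of X + c
scaled = {c * x: p for x, p in die.items()}   # distribution of cX

print(mean(shifted) == mean(die) + c)               # E(X+c) = E(X) + c
print(mean(scaled) == c * mean(die))                # E(cX) = cE(X)
print(variance(shifted) == variance(die))           # V(X+c) = V(X)
print(variance(scaled) == c ** 2 * variance(die))   # V(cX) = c^2 V(X)
```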

Laws about a linear sum

(Keller, p. 234, $X_1$ and $X_2$ are two random variables).

$E(\alpha X_1+\beta X_2)=\alpha E(X_1)+ \beta E(X_2)$

$V(\alpha X_1+\beta X_2)=\alpha^2 V(X_1)+ \beta^2 V(X_2)+2\alpha \beta COV(X_1,X_2)$

If $X_1$ and $X_2$ are independent then $COV(X_1,X_2)=0$ and thus:

$V(\alpha X_1+\beta X_2)=\alpha^2 V(X_1)+ \beta^2 V(X_2)$

Example
If $X_1$ and $X_2$ are independent, then taking $\alpha=1$ and $\beta=-1$ gives:
$V(X_1-X_2)=V(1\cdot X_1+(-1)\cdot X_2)=$
$=1^2 V(X_1)+ (-1)^2 V(X_2)=V(X_1)+V(X_2)$
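That the variances add even for a difference can be verified by enumeration. The sketch below assumes two independent fair dice, a choice made here for illustration, and builds the exact distribution of $X_1-X_2$ .

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """V(X) for a distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Independence means P(x1, x2) = P(x1) * P(x2); collect the mass per difference.
diff = {}
for (x1, p1), (x2, p2) in product(die.items(), die.items()):
    diff[x1 - x2] = diff.get(x1 - x2, 0) + p1 * p2

print(variance(diff), 2 * variance(die))  # 35/6 35/6
```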

Coefficient of correlation

The coefficient of correlation between two variables $X$ and $Y$ is defined as the covariance divided by the product of the standard deviations of the variables.

The population coefficient of correlation is:

$\displaystyle{\rho=\frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}}}$

The sample coefficient of correlation is:

$\displaystyle{r=\frac{s_{xy}}{s_{x}s_{y}}}$

The coefficient of correlation answers the question: how strong is the association between $X$ and $Y$ ?

The advantage of the coefficient of correlation over the covariance is that it has a fixed range from $-1$ to $+1$ (this can be proven mathematically). If the two variables have a very strong positive linear relationship, the coefficient is close to $+1$ . If they have a very strong negative linear relationship, the coefficient is close to $-1$ . A coefficient close to $0$ indicates that there is no linear relationship.
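The sample coefficient $r$ can be computed directly from its definition. The sketch below assumes the usual $n-1$ denominators for $s_{xy}$ , $s_x$ and $s_y$ (they cancel in the ratio), and the three small data sets are made up for illustration.

```python
import math

def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y), using n - 1 denominators throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4, 5]
r_up = sample_correlation(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_down = sample_correlation(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_none = sample_correlation(xs, [1, 2, 3, 2, 1])   # symmetric, no linear trend

print(f"{r_up:.3f} {r_down:.3f} {r_none:.3f}")
```

The third data set shows that a coefficient of $0$ only rules out a linear relationship: the $y$ values still depend on $x$ , just not along a straight line.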

The following graphs depict the relation between $X$ and $Y$ for a number of example coefficients of correlation, varying from $-1$ to $+1$ .

Binomial distribution

(Keller 7)

The binomial distribution is the probability distribution that results from doing a binomial experiment. Binomial experiments have the following properties:

There are a fixed number of trials, represented as $n$ ;
Each trial has two possible outcomes, success or failure;
$P(\text{success})=p$ and $P(\text{failure})=1-p$ for all trials;
The trials are independent, meaning that the outcome of one trial does not affect the outcomes of any other trials.

The binomial random variable $X$ counts the number of successes in $n$ trials of the binomial experiment.
(e.g. s s f s f f f s f s s f s shows $n=13$ trials and $X=7$ successes).
To calculate the probability associated with each value $x$ of $X$ we use combinatorics:

$\displaystyle{P(X=x)=\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}}$ for $x=0, 1, 2, \cdots, n$

Example
A quiz consists of $10$ independent multiple-choice questions ( $n=10$ ). Each question has $5$ possible answers, only one of which is correct ( $p=0.2$ ). You choose to guess the answer to each question. $X$ is the number of correct guesses, $X=0, 1, \cdots, 10$ . The probability that you will have a score $X=0$ is:

$\displaystyle{P(X=0)=\frac{10!}{0!(10-0)!}0.2^0(0.8)^{10-0}=0.1074}$

( $n!$ is called $n$ factorial: $n!=1\cdot2\cdot3\cdots{n}$ ; $0!=1$ ; $1!=1$ .)
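The binomial formula translates almost literally into Python. The sketch below uses `math.comb` (available from Python 3.8) for the factor $\frac{n!}{x!(n-x)!}$ ; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.2  # the quiz example: 10 questions, 5 answers each

print(f"P(X = 0) = {binomial_pmf(0, n, p):.4f}")  # P(X = 0) = 0.1074

# The probabilities over x = 0, ..., n should add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(f"sum of all probabilities = {total:.4f}")
```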

The mean, variance and standard deviation of a binomial random variable are (derived mathematically):

$\mu=np$

$\sigma^2=np(1-p)$ and thus:

$\sigma=\sqrt{np(1-p)}$
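These closed-form results agree with the general definitions of $E(X)$ and $V(X)$ . The sketch below checks this numerically for the quiz example ( $n=10$ , $p=0.2$ ); the helper `binomial_pmf` is an illustrative name.

```python
import math
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.2

# Mean and variance from the definitions (weighted sums over all x).
mu = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mu) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(f"from definition: mean = {mu:.4f}, variance = {var:.4f}")
print(f"from formulas:   mean = {n * p:.4f}, variance = {n * p * (1 - p):.4f}")
print(f"standard deviation = {math.sqrt(n * p * (1 - p)):.4f}")
```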

Continuous random variables

(Keller 8)

Unlike a discrete random variable, a continuous random variable is one that assumes an uncountable number of values. We cannot list the possible values because there is an infinite number of them. Because there is an infinite number of values, the probability of each individual value is $0$ . The probability that a man has a height of exactly 180 cm is:

$\displaystyle{P(X=180)=\lim_{\epsilon\to0}P(180-\epsilon\leq X\leq 180+\epsilon)=0}$

Probability density functions

(Keller 8)

A function $p(x)$ is called a probability density function (over the range $a\leq{x}\leq{b}$ ) if it meets the following requirements:

$\displaystyle{p(x)\geq{0}}$ for all $x\in[a,b]$

and the total area between the curve and the $x$-axis is:

$\displaystyle{\int \limits_{a}^{b} p(x)dx=1}$

For the interval [a, b] we may also take $(-\infty, \infty)$ , as is the case in e.g. the normal distribution.
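Both requirements can be checked numerically for a given function. The sketch below uses a hypothetical density $p(x)=2x$ on $[0,1]$ , which is not taken from the text, and a simple midpoint rule for the integral.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def density(x):
    # Hypothetical density p(x) = 2x on [0, 1]; illustrative only.
    return 2 * x

area = integrate(density, 0.0, 1.0)       # total area, should be 1
p_upper = integrate(density, 0.5, 1.0)    # P(0.5 <= X <= 1), an area under the curve

print(f"total area = {area:.4f}, P(0.5 <= X <= 1) = {p_upper:.4f}")
```

Probabilities of intervals are areas under the density, which is why the probability of a single value is $0$ .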

The normal density function

(Keller 8)

The normal distribution is the most important of all probability distributions. The probability density function $p(x)$ of a normal random variable $X$ is given by:

$\displaystyle{p(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}}$ for $-\infty<x<\infty$

The graph is bell-shaped and symmetrical around the mean $\mu$ . This density function is also denoted by $N(\mu,\sigma)$ or $N(\mu, \sigma^2)$ .

The normal distribution function is defined by:

$\displaystyle{F(x)=\int \limits_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}dt}$

Therefore, the probability $P(X<a)$ equals $F(a)$ .

This integral has no closed-form solution, so it cannot be evaluated analytically (with pen and paper); we need a table or a computer.
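One common way to let a computer do the job is through the error function, using the identity $F(x)=\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$ . The sketch below relies on `math.erf` from the Python standard library; the function name `normal_cdf` and the evaluation points are chosen here for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """F(x) = P(X < x) for a normal variable with mean mu and sd sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative evaluation points for mu = 0 and sigma = 1.
print(f"F(0)    = {normal_cdf(0.0, 0.0, 1.0):.4f}")   # half of the area lies below the mean
print(f"F(1.96) = {normal_cdf(1.96, 0.0, 1.0):.4f}")
```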

Standard normal distribution

(Keller 8)

A normal density function with mean $\mu=0$ and standard deviation $\sigma=1$ is called the standard normal density.

$\displaystyle{p(x)=\frac{1}{1\cdot\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-0}{1}\right)^2}=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}}$ for $-\infty<x<\infty$

The standard normal distribution is also denoted by $N(0,1)$ . Any normal distribution can be converted to a standard normal distribution: a normal variable $X$ with mean $\mu$ and standard deviation $\sigma$ is converted to a new normal variable $Z$ :

$\displaystyle{Z=\frac{X-\mu}{\sigma}}$

with the following properties:

$\displaystyle{E(Z)=E(\frac{X-\mu}{\sigma})=\frac{1}{\sigma}E(X)-\frac{1}{\sigma}E(\mu)=\frac{\mu}{\sigma}-\frac{\mu}{\sigma}=0}$

$\displaystyle{V(Z)=V(\frac{X-\mu}{\sigma})=\frac{1}{\sigma^2}V(X-\mu)=\frac{1}{\sigma^2}V(X)=\frac{1}{\sigma^2}\sigma^2=1}$ .

Thus, if

$X\sim{N(\mu,\sigma^2)}$

then

$Z\sim{N(0,1)}$ .
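In practice this means that a probability for $X$ can be read from the standard normal distribution after converting the bound to a $z$ value: $P(X<a)=P\left(Z<\frac{a-\mu}{\sigma}\right)$ . The sketch below illustrates this with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The values $\mu=50$ , $\sigma=10$ and $a=65$ are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0   # hypothetical parameters, for illustration only
a = 65.0                 # hypothetical bound

# Direct route: evaluate the distribution function of X at a.
p_direct = NormalDist(mu=mu, sigma=sigma).cdf(a)

# Standardized route: convert a to a z value and use N(0, 1).
z = (a - mu) / sigma
p_standard = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(X < a) = {p_direct:.4f}, P(Z < z) = {p_standard:.4f}")
```

Both routes give the same probability, which is why a single table for $N(0,1)$ suffices for every normal distribution.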

Example
Suppose the demand $X$ is a normally distributed variable with mean $\mu=1000$ and standard deviation $\sigma=100$ and we want to compute $P(X<1100)$ . Then: