Book: Beyond Multiple Linear Regression Chapter 3

If the response variable is not normally distributed, we need distributions other than the normal to model it.

Discrete random variables take a countable number of possible values. Their probabilities are calculated using a probability mass function (pmf), which gives P(Y = y) for each possible value y, given the distribution's parameters.

Binary random variable

A Bernoulli process is a sequence of independent trials with binary outcomes. If we flip a coin only once, there is a single parameter, p; this is just the Binomial distribution with n = 1.

Bernoulli: Y = number of heads after 1 flip, with probability of heads = p
Binomial: Y = number of heads after n flips, with probability of heads = p

In R, dbinom(y, n, p) gives P(Y = y).

Effects of parameters on the Binomial distribution: as p increases, the centre of the distribution increases; as n increases, skewness decreases.
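
A minimal sketch of the Binomial pmf in Python (stdlib only; function names are illustrative, mirroring R's dbinom(y, n, p)):

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p): y heads in n independent flips."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Bernoulli is the n = 1 special case: P(Y = 1) = p.
print(binom_pmf(1, 1, 0.3))   # same as p
print(binom_pmf(2, 4, 0.5))   # P(2 heads in 4 fair flips)
```

The probabilities over y = 0, ..., n sum to 1, which is a quick sanity check on any pmf.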

Geometric random variable

If we run Bernoulli trials until the first success, the number of failures follows a geometric distribution: Y = number of failures before the first success, with pmf P(Y = y) = (1 - p)^y * p for y = 0, 1, 2, .... Then E(Y) = (1 - p)/p. The peak of a geometric distribution is at y = 0, and the larger the value of p, the faster the decay (the mean shifts towards 0).
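
A sketch of the geometric pmf and its mean, checked numerically (illustrative names; truncating the infinite sum at a large y):

```python
def geom_pmf(y, p):
    """P(Y = y): y failures before the first success."""
    return (1 - p)**y * p

# Peak is at y = 0, where the pmf equals p itself.
p = 0.3
total = sum(geom_pmf(y, p) for y in range(2000))          # ~ 1
mean = sum(y * geom_pmf(y, p) for y in range(2000))       # ~ (1 - p) / p
print(total, mean)
```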

Negative Binomial random variable

Y = the number of failures before the r-th success. The structure of the pmf looks similar to the Binomial pmf:

P(Y = y) = C(y + r - 1, r - 1) * (1 - p)^y * p^r, for y = 0, 1, 2, ...

If Y ~ Negative Binomial(r, p), then E(Y) = r(1 - p)/p. As r increases, the centre shifts right; as p increases, the centre shifts left. A geometric random variable is a negative binomial random variable with r = 1.

Binomial coefficients require non-negative integers; we can generalise to non-integer r using the gamma function:

P(Y = y) = gamma(y + r) / (gamma(r) * y!) * (1 - p)^y * p^r

Remember r is fixed while y varies (i.e. r is a parameter of the distribution); reaching the r-th success is what stops the count of failures y.
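
Both forms of the pmf can be sketched and compared; they agree for integer r, and the gamma form also accepts non-integer r (names are illustrative):

```python
from math import gamma, factorial, comb

def negbin_pmf(y, r, p):
    """Binomial-coefficient form: requires integer r."""
    return comb(y + r - 1, r - 1) * (1 - p)**y * p**r

def negbin_pmf_gamma(y, r, p):
    """Gamma-function form: valid for any real r > 0."""
    return gamma(y + r) / (gamma(r) * factorial(y)) * (1 - p)**y * p**r

# r = 1 reduces to the geometric pmf (1 - p)^y * p.
print(negbin_pmf(3, 1, 0.4), (1 - 0.4)**3 * 0.4)
```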

Hypergeometric random variable

Suppose the probability of success changes from draw to draw. Select n items without replacement from N objects, m of which are successes. Then Y = the number of successes after n selections follows a hypergeometric distribution. The initial probability of success is p = m/N, but it changes as we draw because each draw depends on what has already been drawn.
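
A sketch of the hypergeometric pmf; a useful check is that E(Y) = n * m / N, matching the "initial probability" p = m/N scaled by n draws (names are illustrative):

```python
from math import comb

def hyper_pmf(y, N, m, n):
    """P(Y = y) successes in n draws without replacement
    from N objects, m of which are successes."""
    return comb(m, y) * comb(N - m, n - y) / comb(N, n)

# N = 20 objects, m = 5 successes, n = 4 draws: E(Y) = 4 * 5 / 20 = 1
mean = sum(y * hyper_pmf(y, 20, 5, 4) for y in range(5))
print(mean)
```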

Poisson random variable

Counts the number of events per unit of time or space, where the expected number of events depends only on the length or size of the interval. Y = the number of events in one interval follows a Poisson distribution with parameter lambda, the mean count in the interval. The distribution becomes more symmetric as lambda increases.

Continuous random variables can take an uncountably infinite number of values and are defined using probability density functions (pdfs), denoted f(y). We get probabilities by integrating under the density curve.

Exponential random variable

If we have a Poisson process with rate lambda, and we model the time Y until the first event, we can use an exponential distribution. As lambda increases (events become more frequent), E(Y) = 1/lambda tends to 0 and the density dies off more quickly, since the event is more likely to happen soon and the wait time goes down.

Gamma random variable

Y follows a Gamma distribution if Y = the wait time until r events occur in a Poisson process with rate lambda. The exponential distribution is the Gamma distribution with r = 1. As r increases, the mean increases. The pdf of the Gamma distribution is defined for all real, positive r, not just integers.
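
A sketch of the Gamma pdf (illustrative names), with a check that r = 1 reduces it to the exponential pdf lam * exp(-lam * y):

```python
from math import gamma, exp

def gamma_pdf(y, r, lam):
    """Density of the wait time until the r-th event, rate lam.
    Valid for any real r > 0, thanks to the gamma function."""
    return lam**r * y**(r - 1) * exp(-lam * y) / gamma(r)

# r = 1 special case matches the exponential density.
print(gamma_pdf(0.7, 1, 2.0), 2.0 * exp(-1.4))
```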

Gaussian/Normal random variable

Parameterised by a mean mu and standard deviation sigma; symmetric and bell-shaped around mu.

Beta random variable

Used to limit possible values to an interval; often used to model distributions of probabilities by bounding values between 0 and 1. If alpha = beta, the distribution is symmetric; if alpha = beta = 1, we get a uniform distribution (f(y) = 1).
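
A sketch of the Beta pdf (illustrative names), checking the two facts above: alpha = beta gives symmetry about 0.5, and alpha = beta = 1 gives the uniform density f(y) = 1:

```python
from math import gamma

def beta_pdf(y, a, b):
    """Density on (0, 1) with shape parameters a (alpha) and b (beta)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * y**(a - 1) * (1 - y)**(b - 1)

print(beta_pdf(0.42, 1, 1))          # uniform: density is 1 everywhere
print(beta_pdf(0.3, 2, 2), beta_pdf(0.7, 2, 2))   # symmetric when a == b
```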

Testing distributions

The distributions above may be useful when modelling. The distributions below are useful when performing hypothesis tests, as commonly used test statistics follow these distributions.

chi-squared distribution

Used in two-way contingency tables and goodness-of-fit tests: compare the observed counts to those expected under the null hypothesis, and reject when the difference is too large.

In general, a chi-squared distribution with k degrees of freedom is right-skewed with mean k. It is a special case of the gamma distribution, with lambda = 1/2 and r = k/2.
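
The gamma special case can be verified numerically: the chi-squared(k) density equals the Gamma density with lam = 1/2 and r = k/2 (illustrative names, a sketch rather than a library implementation):

```python
from math import gamma, exp

def chisq_pdf(y, k):
    """Chi-squared density with k degrees of freedom."""
    return y**(k / 2 - 1) * exp(-y / 2) / (2**(k / 2) * gamma(k / 2))

def gamma_pdf(y, r, lam):
    """Gamma density with shape r and rate lam."""
    return lam**r * y**(r - 1) * exp(-lam * y) / gamma(r)

# Chi-squared(5) should match Gamma(r = 5/2, lam = 1/2) pointwise.
print(chisq_pdf(3.0, 5), gamma_pdf(3.0, 2.5, 0.5))
```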

Student’s t-distribution

Parameterised by its degrees of freedom, k. Has mean 0 and variance k/(k - 2) for k > 2 (so the standard deviation is sqrt(k/(k - 2)), not k/(k - 2)); as k approaches infinity, the t-distribution approaches the standard normal distribution.

F-distribution

Values are non-negative and the distribution is right-skewed. It can be derived as the ratio of two independent chi-squared random variables, each divided by its degrees of freedom.

Summary of distributions: