Book: Beyond Multiple Linear Regression Chapter 2
Want to move beyond independent, identically distributed normal responses. This chapter focuses on likelihoods as a way to fit models, determine estimates, and compare models for a range of response types. The big limitation of OLS is that it assumes responses are normally distributed. Examples of non-normal responses: binary data, count data. Likelihood methods provide flexibility in the types of models we can fit, and give us ways to compare those models
Modelling - objective is still to find an estimate for model parameters using the data. For the child gender model, the param is the probability of a boy, the data is families' gender compositions. Under the likelihood method we consider a possible value for the param, and determine the likelihood of seeing the observed data. The best estimate of the param is the value, out of all possible values, under which we are most likely to see our data - the maximum likelihood estimate (MLE)
From the likelihood function we can find the parameter value at which likelihood is maximised. Could do this graphically, numerically (grid search), or potentially with calculus (set first derivative to zero). Log-likelihoods make the differentiation easier since a likelihood is usually a product-type function. Increasing the sample size reduces variation in the estimate, but doesn't change the parameter value itself
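A minimal sketch of the grid-search and calculus approaches for a binomial likelihood, using made-up counts (30 boys out of 50 children - illustrative, not the book's data):

```python
import numpy as np

# Hypothetical data: 30 boys out of 50 children (illustrative numbers).
n, k = 50, 30

# Log-likelihood of each candidate value of p (probability of a boy):
# log of p^k (1-p)^(n-k), dropping the constant binomial coefficient.
p_grid = np.linspace(0.01, 0.99, 99)
log_lik = k * np.log(p_grid) + (n - k) * np.log(1 - p_grid)

# Grid-search MLE: the p on the grid with the highest log-likelihood.
p_hat_grid = p_grid[np.argmax(log_lik)]

# Calculus MLE: setting the derivative of the log-likelihood to zero gives p = k/n.
p_hat_calc = k / n

print(p_hat_grid, p_hat_calc)  # both 0.6
```

Both routes agree here because the sample proportion k/n lands exactly on a grid point; with a coarser grid the grid search would only get close.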
Likelihood funcs fix the (observed) data and give you the chance of seeing this data if the param takes a certain value. Different to a probability function, which fixes the parameter and varies the observation.
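The distinction in code, using a small binomial example (n = 3 children, made-up numbers): a probability function fixes p and varies the outcome k; a likelihood fixes the observed k and varies p.

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k successes in n trials) for success probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability function: fix the parameter (p = 0.5), vary the observation k.
probs = [binom_pmf(k, 3, 0.5) for k in range(4)]
print(probs, sum(probs))  # sums to 1 over all possible observations

# Likelihood function: fix the observed data (k = 2 of n = 3), vary the parameter p.
liks = [binom_pmf(2, 3, p) for p in (0.3, 0.5, 0.7)]
print(liks)  # need not sum to 1; only compares plausibility of parameter values
```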
What about conditional probability (to indicate dependence)? One approach is to use multiple parameters - break the model down into conditional probabilities. Still need the values of the params that give the MLE, trying different combos of the params. Hard to graph, could use multivariable calculus (partial derivatives)
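A sketch of a two-parameter grid search, assuming a hypothetical conditional model where the chance of a boy depends on the sex of the previous child (the counts are made up for illustration):

```python
import numpy as np

# Hypothetical counts:
#   after a boy:  20 boys, 10 girls
#   after a girl: 15 boys, 15 girls
bb, bg = 20, 10
gb, gg = 15, 15

# Two parameters: p_b = P(boy | previous boy), p_g = P(boy | previous girl).
grid = np.linspace(0.01, 0.99, 99)
p_b, p_g = np.meshgrid(grid, grid, indexing="ij")

# Joint log-likelihood is the sum of two binomial pieces.
log_lik = (bb * np.log(p_b) + bg * np.log(1 - p_b)
           + gb * np.log(p_g) + gg * np.log(1 - p_g))

# Grid-search MLE over both parameters at once.
i, j = np.unravel_index(np.argmax(log_lik), log_lik.shape)
print(grid[i], grid[j])  # near 20/30 and 15/30
```

With more parameters the grid explodes combinatorially, which is why partial derivatives (or a numerical optimiser) take over.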
Always try some EDA to validate / give yourself some direction for the modelling!
Comparing models - is one model statistically significantly better than another? If the parameters of a reduced model are a subset of the params of a larger model, then the reduced model is nested, and the difference in likelihood can be used in a test to judge the benefit of the extra params. If not nested, can still compare likelihoods via AIC or BIC (functions of the log-likelihood)
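AIC and BIC as simple functions of the maximised log-likelihood - the log-likelihood values plugged in below are made up for illustration:

```python
import math

# AIC = 2k - 2 log L_max;  BIC = k log(n) - 2 log L_max  (k = number of params).
def aic(max_log_lik, n_params):
    return 2 * n_params - 2 * max_log_lik

def bic(max_log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * max_log_lik

# Comparing a 1-parameter model with a 2-parameter model: lower AIC/BIC is better.
print(aic(-35.4, 1), aic(-34.2, 2))
print(bic(-35.4, 1, 60), bic(-34.2, 2, 60))
```

Note BIC penalises extra parameters more heavily than AIC once n > 7 or so, so the two criteria can disagree on which model wins.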
For nested models, use the observed difference in max log-likelihoods in a likelihood ratio test: statistic is 2 log(max likelihood of bigger model / max likelihood of smaller model), compared against a chi-squared distribution with df = difference in number of parameters
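A sketch of the likelihood ratio test with hypothetical maximised log-likelihoods (for df = 1 the chi-squared survival function reduces to erfc(sqrt(x/2)), which keeps this stdlib-only):

```python
import math

# Hypothetical maximised log-likelihoods (illustrative numbers).
loglik_small = -35.4   # reduced model, 1 parameter
loglik_big = -34.2     # larger model, 2 parameters

# Test statistic: 2 log(max bigger / max smaller) = 2 * (log L_big - log L_small).
lrt = 2 * (loglik_big - loglik_small)

# Compare to chi-squared with df = difference in parameter counts (here df = 1);
# for df = 1 the survival function is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lrt / 2))
print(lrt, p_value)  # statistic 2.4, p-value around 0.12 - no strong evidence
```

For df > 1 you'd reach for `scipy.stats.chi2.sf(lrt, df)` instead of the erfc shortcut.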
Likelihood methods allow accommodation of non-normal responses and correlated data. Fitting under the usual normality assumptions, MLE gives estimates identical to OLS despite the difference in approach (recall OLS is about minimising the residuals, whereas MLE identifies the param values most likely to give the observed data). There are other advantages to MLEs as well (see book chapter for this).
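A numerical sketch of that equivalence on simulated data: under normal errors the log-likelihood is, up to constants, -SSR/(2 sigma^2), so the OLS coefficients are also the likelihood maximisers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for a simple linear model (illustrative, not from the book).
x = np.linspace(0, 10, 40)
y = 1.5 + 2.0 * x + rng.normal(0, 1, size=x.size)

# OLS: minimise the sum of squared residuals (closed form via least squares).
X = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Normal log-likelihood: constant minus SSR / (2 sigma^2), so minimising SSR
# and maximising the likelihood pick the same beta.
def log_lik(beta, sigma=1.0):
    resid = y - X @ beta
    return (-0.5 * len(y) * np.log(2 * np.pi * sigma**2)
            - np.sum(resid**2) / (2 * sigma**2))

# Any perturbation of the OLS estimates lowers the log-likelihood.
print(log_lik(beta_ols) > log_lik(beta_ols + np.array([0.1, 0.0])))   # True
print(log_lik(beta_ols) > log_lik(beta_ols - np.array([0.0, 0.05])))  # True
```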
Likelihoods can become complex once covariates are involved. Likelihood is also useful when the data has structure (e.g. multilevel) that induces correlation. The flexibility of likelihood methods will be useful for non-normal responses, enabling us to move beyond MLR