Logistic Regression

A regression model where the dependent variable is categorical: https://en.wikipedia.org/wiki/Logistic_regression. The categorical variable is usually binary, so the model assigns it a probability of occurring. More precisely, it is not exactly a probability that gets modelled: the model works with the odds of the variable being true/occurring. The odds of an event are the probability of it occurring divided by the probability of the alternative occurring: https://en.wikipedia.org/wiki/Odds.

What are odds? Odds are a measure derived from probability. If something happens with probability p, the odds of that something are p/(1-p). This is the quantity logistic regression models.
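A minimal sketch of the probability-to-odds conversion above (the function name and example value are just for illustration):

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

# An event with probability 0.75 has odds of 3 (i.e. "3 to 1").
print(odds(0.75))  # -> 3.0
```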

There is a related quantity called the risk ratio, which is p/q with p and q being the probabilities of two events. THE ODDS RATIO IS NOT THE RISK RATIO.

Odds ratio is {p/(1-p)}/{q/(1-q)}
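A quick sketch contrasting the two ratios, using hypothetical probabilities p and q for two events:

```python
def risk_ratio(p, q):
    """Ratio of the two event probabilities: p / q."""
    return p / q

def odds_ratio(p, q):
    """Ratio of the odds of the two events: {p/(1-p)} / {q/(1-q)}."""
    return (p / (1 - p)) / (q / (1 - q))

p, q = 0.8, 0.5  # hypothetical probabilities
print(round(risk_ratio(p, q), 6))  # -> 1.6
print(round(odds_ratio(p, q), 6))  # -> 4.0, clearly not the same number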

How logistic regression works

z = the weighted sum of the independent variables (beta1*x1 + beta2*x2 + ..... + betan*xn)
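Computing z is just a dot product of the coefficients and the variables; a tiny sketch with made-up numbers:

```python
# Hypothetical coefficients and predictor values for the linear predictor z.
betas = [0.4, -0.7, 1.1]
xs = [1.0, 2.0, 0.5]
z = sum(b * x for b, x in zip(betas, xs))
print(round(z, 2))  # -> -0.45
```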

We are trying to see whether the binary dependent variable D occurs, with D=1 if it occurs and D=0 if not.

p(D=1) = (1+e^-z)^-1. The coefficients beta1...betan are unknown and need to be estimated.
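The equation above is the logistic (sigmoid) function; a small sketch (function name is my own):

```python
import math

def p_of_d1(z):
    """p(D=1) = (1 + e^-z)^-1, the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

print(p_of_d1(0.0))  # -> 0.5, i.e. even odds when z = 0
```

Large positive z pushes the probability toward 1, large negative z toward 0, so the output always stays inside (0, 1).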

Rearranging the above equation: p = (1+e^-z)^-1 -> 1/p = 1+e^-z -> e^-z = (1-p)/p -> e^z = p/(1-p) -> z = ln(p/(1-p))

ln(p/(1-p)) = beta1*x1 + beta2*x2 + ..... + betan*xn, which becomes otherwise known as logit(p) = beta1*x1 + beta2*x2 + ..... + betan*xn. logit(p) is the log odds of the event happening for given values of x.
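The derivation says logit is the inverse of the logistic function, which is easy to sanity-check numerically (a sketch; 0.7 is an arbitrary probability):

```python
import math

def logit(p):
    """logit(p) = ln(p / (1 - p)), the log odds."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """The logistic function, inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.7
# Applying logit and then its inverse should recover p.
print(abs(inv_logit(logit(p)) - p) < 1e-12)  # -> True
```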

The odds change when the i'th independent variable changes. beta_i > 0 implies the odds increase as x_i increases; beta_i < 0 implies the odds decrease.
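Since e^z = p/(1-p), a one-unit increase in x_i multiplies the odds by e^(beta_i), which is why beta's sign controls the direction. A sketch with hypothetical coefficients:

```python
import math

betas = [0.5, -1.2]  # hypothetical coefficients

def odds(xs):
    """Odds p/(1-p) = e^z for the linear predictor z."""
    z = sum(b * x for b, x in zip(betas, xs))
    return math.exp(z)

base = odds([1.0, 2.0])
bumped = odds([2.0, 2.0])  # x1 increased by one unit
# The odds are multiplied by exactly e^beta1.
print(abs(bumped / base - math.exp(betas[0])) < 1e-12)  # -> True
```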

But how do you estimate the coefficients? We use maximum likelihood estimation (MLE): choose the betas that make the observed data most likely.
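A toy MLE sketch: generate synthetic data from a known coefficient, then recover it by gradient ascent on the log-likelihood. This uses one predictor, no intercept, and made-up settings throughout; real packages such as SPSS use Newton-type iterations instead.

```python
import math
import random

random.seed(0)
true_beta = 1.5  # hypothetical "true" coefficient used to simulate data
data = []
for _ in range(2000):
    x = random.uniform(-2, 2)
    p = 1 / (1 + math.exp(-true_beta * x))
    data.append((x, 1 if random.random() < p else 0))

beta, lr = 0.0, 0.5
for _ in range(300):
    # Gradient of the average log-likelihood: mean of (d - p(x)) * x.
    grad = sum((d - 1 / (1 + math.exp(-beta * x))) * x for x, d in data) / len(data)
    beta += lr * grad

print(round(beta, 1))  # close to true_beta
```

The log-likelihood of a logistic model is concave, so this simple ascent converges to the unique maximum.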

Choosing variables and judging model adequacy uses the -2 Log Likelihood (-2LL), reported by SPSS. The smaller the -2LL output on SPSS, the better the model is. However, the change in -2LL needs to be statistically significant for the model to be considered improved. Several -2LLs are calculated for models containing different numbers of independent variables.

Use a chi-squared table: the drop in -2LL between two nested models follows a chi-squared distribution, with degrees of freedom equal to the number of added variables.

If the drop is bigger than the value at the chosen critical level, the larger model is a significant improvement.

The smaller the -2LL, the better.
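A sketch of the -2LL comparison, with hypothetical SPSS-style outputs (the numbers and the critical value at the 0.05 level are made up for illustration):

```python
# Hypothetical -2LL outputs for two nested models.
neg2ll_small = 120.4  # model with k predictors
neg2ll_large = 112.1  # model with k + 2 predictors
drop = neg2ll_small - neg2ll_large  # chi-squared distributed, df = 2

critical_value = 5.991  # chi-squared critical value, df = 2, 0.05 level
# True -> the larger model is a significant improvement.
print(drop > critical_value)
```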

Logistic regression is better than multiple regression for binary cases: the assumptions of multiple regression do not hold for a binary outcome, and when the outcome is interpreted as a probability, ordinary regression can give values greater than 1 or less than 0.
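The out-of-range problem is easy to see with a sketch: a straight-line fit can output an invalid "probability", while the logistic curve stays bounded (intercept and slope here are hypothetical fitted values):

```python
import math

a, b = 0.1, 0.3  # hypothetical intercept and slope
x = 5.0
linear_p = a + b * x                           # 1.6 -> not a valid probability
logistic_p = 1 / (1 + math.exp(-(a + b * x)))  # always inside (0, 1)
print(linear_p > 1.0, 0.0 < logistic_p < 1.0)  # -> True True
```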