General Stat Notes/Wrong Answer Notes

Write down the topic to trigger the memory

SPSS vs. Excel
 * Better handling of subset regression
 * Better control of output
 * Suspect residual calculations
 * Can't do logistic regression
 * Limited diagnostics
 * Can't handle missing values

Heteroskedasticity - the error variance of a model is not constant

Logistic regression is better than multiple/linear regression for binary outcomes because the assumptions of multiple regression do not hold for a binary response, and because when the outcome is interpreted as a probability, ordinary regression can give probabilities greater than 1 or less than 0. In logistic regression the right-hand side of the equation can range from -infinity to +infinity, and the logistic function transforms any value onto a (0, 1) scale.
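As a minimal sketch of why the transformation helps: the logistic function below maps any real-valued linear predictor onto (0, 1), so the fitted probability can never escape that range.

```python
import math

def logistic(x):
    """Map any real value onto the (0, 1) probability scale."""
    return 1.0 / (1.0 + math.exp(-x))

# The linear predictor can be anywhere in (-inf, +inf), but the
# resulting probability always stays strictly between 0 and 1.
print(logistic(0))    # exactly 0.5
print(logistic(10))   # close to 1
print(logistic(-10))  # close to 0
```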

The smaller the -2LL, the better.

Types of time series
 * Seasonal - cycles of less than a year
 * Cyclical - cycles of more than a year
 * Stationary - no trend, seasonal or cyclical effect

The default chi-squared test (use this for -2LL comparisons) has 1 df

The bigger the difference in -2LL between two models, the better

When comparing the odds ratios of two things, subtract the coefficients from each other and then exponentiate: e^(b1 - b2).
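A small worked example (the coefficients are made up): the subtraction happens on the coefficients, and the exponential is applied once to the difference, which is the same as dividing the two individual odds ratios.

```python
import math

# Hypothetical logistic-regression coefficients for two predictors/levels.
b1, b2 = 1.2, 0.5

# Ratio of the two odds ratios: subtract coefficients, then exponentiate.
odds_ratio = math.exp(b1 - b2)            # e^(b1 - b2)
same_thing = math.exp(b1) / math.exp(b2)  # equivalently e^b1 / e^b2
print(odds_ratio)
```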

The cut value is the threshold 0.x: a case whose predicted probability is above it is classified as 1, and one below it is classified as 0.

The percentages in the classification table are straightforward. Don't overthink it.
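A minimal sketch of both notes above, with made-up probabilities and outcomes: apply the cut value to get 0/1 predictions, and the classification-table percentage is just the share of cases predicted correctly.

```python
# Hypothetical predicted probabilities and observed outcomes.
probs = [0.12, 0.48, 0.51, 0.93]
actual = [0, 1, 1, 1]

cut = 0.5
preds = [1 if p > cut else 0 for p in probs]   # [0, 0, 1, 1]

# Classification-table percentage: how many predictions match reality.
correct = sum(p == a for p, a in zip(preds, actual))
pct_correct = 100 * correct / len(actual)      # 75.0
print(preds, pct_correct)
```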

A high-leverage point (HLP) is a point that has an extreme value on one or more of the explanatory variables

p value = the probability of observing an estimate of beta.i at least as extreme as the one observed, assuming the true beta.i is 0
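As a sketch of that definition (using the normal approximation rather than SPSS's exact reference distributions), a two-sided p-value for a coefficient's z statistic:

```python
import math

def two_sided_p(z):
    """P(estimate at least as extreme as observed | true beta_i = 0),
    using the standard normal CDF built from math.erf."""
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)

print(two_sided_p(1.96))  # about 0.05
print(two_sided_p(0.0))   # 1.0: every possible estimate is at least this extreme
```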

Goodness of fit: R^2 and adjusted R^2, F test, significance of variables

Histogram = normality of errors

pp-plot = normality of errors

pred vs resid = linearity and constant variance and normality

Differences in residual deviance have to be checked against the chi^2 critical value at 1 df to see whether they are significant: if the difference is bigger than the chi^2 value, it is significant.
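A sketch of that check with made-up deviances; 3.841 is the standard 5% chi-squared critical value at 1 df.

```python
# Hypothetical -2LL (residual deviance) values for two nested models.
neg2ll_small = 210.4   # model without the extra term
neg2ll_big = 204.9     # model with one extra term (1 df difference)

CHI2_CRIT_1DF = 3.841  # 5% critical value of chi-squared with 1 df

diff = neg2ll_small - neg2ll_big
significant = diff > CHI2_CRIT_1DF   # True: the extra term helps
print(diff, significant)
```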

Scatterplot = shows residual behaviour/homoskedasticity

Histogram = normality of errors

P-P plot/Q-Q plot = residuals normality shown

adjusted R^2 = 1 - (SS residual / residual df) / (SS total / total df)
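A worked example of that formula with made-up sums of squares (n = 50 observations, k = 3 predictors); the adjustment penalises the extra predictors, so adjusted R^2 comes out below plain R^2.

```python
ss_res, ss_tot = 120.0, 500.0        # hypothetical residual and total SS
n, k = 50, 3
res_df, tot_df = n - k - 1, n - 1    # 46 and 49

r2 = 1 - ss_res / ss_tot                            # 0.76
adj_r2 = 1 - (ss_res / res_df) / (ss_tot / tot_df)  # slightly lower
print(r2, adj_r2)
```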

when simplifying for a lay audience, plug numbers in and interpret them for ease of explanation

collinearity = a subset of the explanatory variables is (nearly) linearly dependent

leads to inflated standard errors and unstable coefficient estimates, i.e. it makes the individual coefficients meaningless
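A pure-Python sketch with made-up data: when one predictor is almost a linear function of another, the variance inflation factor blows up. (In the two-predictor case the VIF is 1 / (1 - r^2), where r is the correlation between the predictors.)

```python
# Made-up predictors: x2 is roughly 2 * x1, so the two are nearly
# linearly dependent (collinear).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 4.0, 6.0, 8.1, 9.9]

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

r = corr(x1, x2)
vif = 1 / (1 - r ** 2)   # variance inflation factor (two-predictor case)
print(r, vif)            # r is near 1, so the VIF is huge
```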

heteroskedasticity leads to reduced precision of estimates and unreliable prediction

autocorrelation is fixed with autoregression; autocorrelation means the t and F tests are no longer valid
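One common diagnostic is the Durbin-Watson statistic on the residuals (the series below is made up): values near 2 suggest no first-order autocorrelation, while values near 0 or 4 flag a problem.

```python
# Hypothetical residual series that drifts slowly, i.e. is positively
# autocorrelated: each residual resembles the one before it.
resid = [0.5, 0.4, 0.3, -0.2, -0.4, -0.5, 0.1, 0.3]

# Durbin-Watson: sum of squared successive differences over sum of squares.
dw = (sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
      / sum(e ** 2 for e in resid))
print(dw)   # well below 2, flagging positive autocorrelation
```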

plots used to test assumptions should also reveal outliers, if any exist

outliers can be recording errors

Things to improve the model
 * more variables - think qualitatively and use common sense
 * assumption checking
 * checking the significance of variables
 * checking the model against new data
 * checking for other problems like collinearity and autocorrelation

SPSS uses maximum likelihood estimation to obtain parameter estimates
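A toy illustration of the maximum-likelihood idea (not SPSS's actual algorithm, which uses iterative numerical methods): for a Bernoulli sample, a crude grid search over the log-likelihood recovers the sample proportion as the estimate.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 1]   # made-up 0/1 outcomes: 6 of 8 are 1

def log_lik(p):
    """Bernoulli log-likelihood of the sample at parameter p."""
    return sum(math.log(p) if y else math.log(1 - p) for y in data)

# Grid search over (0, 1); the maximiser is the sample proportion 6/8.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_lik)
print(p_hat)   # 0.75
```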