Stepwise Modelling, Missing Values, Model Validation (and more)

Stepwise modelling
 * Forward: start with nothing in the model and add variables one at a time if they are significant; stop when all significant variables have been added
 * Backward: start with everything in the model and remove insignificant variables one at a time; stop when all the remaining variables are significant
 * Good for exploratory and initial stages of research
 * Generally produces a plausible, reasonably good model, though it is not guaranteed to be the best one
 * The order in which the candidate variables are listed doesn't matter: the procedure itself decides which variable to add or remove at each step
 * Forward and backward selection may end with different models, which can be confusing
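A minimal sketch of forward selection and backward elimination, assuming ordinary least squares with an intercept and treating a variable as "significant" when its partial F statistic exceeds 4 (for a single variable the partial F equals t^2, so this is roughly a 5% test for moderate n). The thresholds `F_ENTER` and `F_REMOVE`, the helper names and the simulated data are illustrative assumptions, not a prescribed procedure; statistical packages often use p-values instead.

```python
import numpy as np

# Assumed thresholds: a partial F above 4 counts as "significant".
F_ENTER = 4.0
F_REMOVE = 4.0


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the columns in cols."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f(X, y, small, big):
    """Partial F statistic for the single extra variable in `big` compared with `small`."""
    df_resid = len(y) - (len(big) + 1)  # residual degrees of freedom of the bigger model
    sse_big = sse(X, y, big)
    return (sse(X, y, small) - sse_big) / (sse_big / df_resid)


def forward_select(X, y):
    """Start with nothing in the model; add the most significant variable until none qualifies."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: partial_f(X, y, chosen, chosen + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < F_ENTER:
            break  # no remaining variable is significant: stop adding
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


def backward_eliminate(X, y):
    """Start with everything in the model; drop the least significant variable until all qualify."""
    kept = list(range(X.shape[1]))
    while kept:
        scores = {j: partial_f(X, y, [k for k in kept if k != j], kept) for j in kept}
        worst = min(scores, key=scores.get)
        if scores[worst] >= F_REMOVE:
            break  # every remaining variable is significant: stop removing
        kept.remove(worst)
    return sorted(kept)


# Simulated data (an assumption for illustration): only columns 0 and 2 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=100)

print("forward: ", forward_select(X, y))
print("backward:", backward_eliminate(X, y))
```

Both functions should keep columns 0 and 2, which drive y here; whether a noise column slips in depends on the random draw, which is one way the two directions can end up disagreeing.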
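All possible regressions
 * Among the many combinations of variables in a regression, how should we choose? One criterion is Mallows's Cp: https://en.wikipedia.org/wiki/Mallows's_Cp
 * Cp = SSE_p/s^2 - n + 2p, where SSE_p is the residual sum of squares of a model with p parameters (including the intercept), s^2 is the residual mean square of the model containing all candidate variables, and n is the number of data points
 * Select the model with the fewest variables whose Cp is approximately equal to p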
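 * Mindless but simple
 * Too many calculations: with k candidate variables there are 2^k possible models, so the search quickly becomes tedious, inefficient and overwhelming
 * The best model (by the chosen criterion) is guaranteed to be in the list, because every combination is fitted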
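A sketch of the all-possible-regressions search scored with Mallows's Cp, assuming OLS with an intercept, p counted as the number of coefficients including the intercept, and s^2 taken as the residual mean square of the full model, so that Cp = SSE_p/s^2 - n + 2p. Reading "approximately equal" as |Cp - p| <= tol with `tol = 1.0` is an assumption (texts differ on the cut-off); the function names and data are hypothetical.

```python
from itertools import combinations

import numpy as np


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the columns in cols."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def all_subsets_cp(X, y):
    """Fit every subset of the k candidate variables and return (cols, p, Cp) for each."""
    n, k = X.shape
    s2 = sse(X, y, range(k)) / (n - k - 1)  # full-model residual mean square estimates sigma^2
    results = []
    for size in range(k + 1):
        for cols in combinations(range(k), size):
            p = size + 1  # number of coefficients, counting the intercept
            cp = sse(X, y, cols) / s2 - n + 2 * p
            results.append((cols, p, cp))
    return results


def choose_by_cp(results, tol=1.0):
    """Fewest variables with |Cp - p| <= tol; ties broken by the Cp closest to p."""
    candidates = [r for r in results if abs(r[2] - r[1]) <= tol]
    return min(candidates, key=lambda r: (r[1], abs(r[2] - r[1])))


# Simulated data (an assumption for illustration): only columns 0 and 2 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=100)

results = all_subsets_cp(X, y)
best_cols, best_p, best_cp = choose_by_cp(results)
print(f"{len(results)} models fitted; chosen columns {best_cols}, p = {best_p}, Cp = {best_cp:.2f}")
```

With k = 4 candidates the loop already fits 2^4 = 16 models, which shows how the cost grows. The full model always has Cp = p exactly, so the selection rule never comes back empty.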
Missing values
 * Values can be missing or erroneous for a variety of reasons (see the non-response section in ST205)
 * Do what is reasonable to make up for the missing/erroneous values. No hard and fast rules.
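Since there are no fixed rules, here are two simple, commonly used options, sketched purely as illustrations and not as the course's prescribed method: dropping incomplete records (complete-case analysis) and replacing each missing value with the mean of the observed values in its column (mean imputation). The records and field names are hypothetical. Dropping records loses data and can bias results if values are not missing at random; mean imputation keeps every record but understates the variability of the filled-in column.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52.0},
    {"age": None, "income": 48.5},
    {"age": 29, "income": None},
    {"age": 41, "income": 61.0},
]


def complete_cases(rows):
    """Option 1: keep only records with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows):
    """Option 2: fill each missing value with the mean of the observed values in that column."""
    cols = list(rows[0])
    col_means = {c: mean(r[c] for r in rows if r[c] is not None) for c in cols}
    return [{c: col_means[c] if r[c] is None else r[c] for c in cols} for r in rows]


print(complete_cases(rows))  # only the first and last records survive
print(mean_impute(rows))
```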
Model validation
 * Check whether the variables included seem reasonable, e.g. the sign and size of each coefficient
 * Find out whether the model predicts new data well by assessing the prediction errors: they should have roughly zero mean, no pattern and low variance
 * One way to do this: use part of the data to fit the model, then use the model to predict the remaining data, which were not used in fitting
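A minimal sketch of split-sample validation with numpy, assuming a linear model fitted by least squares, simulated data and a 50/50 random split (all illustrative choices). It fits on one half, prints the coefficients so their signs and sizes can be checked, then summarises the prediction errors on the held-back half: their mean, their variance, and their correlation with the fitted values as a crude check for pattern.

```python
import numpy as np

# Simulated data (an assumption for illustration): y rises with x1 and falls with x2.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)


def with_intercept(X):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


# Use part of the data (a random half) to fit the model ...
idx = rng.permutation(n)
train_idx, test_idx = idx[: n // 2], idx[n // 2 :]
beta, *_ = np.linalg.lstsq(with_intercept(X[train_idx]), y[train_idx], rcond=None)
print("coefficients:", beta)  # check the signs and sizes look sensible

# ... then predict the half that was held back.
fitted = with_intercept(X[test_idx]) @ beta
errors = y[test_idx] - fitted
print("mean prediction error:", errors.mean())  # should be close to 0
print("variance of errors:", errors.var(ddof=1))  # should be small
print("corr(errors, fitted):", np.corrcoef(errors, fitted)[0, 1])  # near 0 suggests no trend
```

A plot of the errors against the fitted values or each predictor is a more thorough check for pattern than a single correlation.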