Multiple Regression

Adding more explanatory variables to the regression model can produce a more accurate model.

k denotes the number of independent (explanatory) variables in the model.
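As a sketch of what fitting such a model involves, the example below fits y = b0 + b1*x1 + b2*x2 by least squares via the normal equations (X'X)b = X'y in pure Python. The data is made up for illustration (constructed to lie exactly on a plane), not from the notes.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Hypothetical data, built so that y = 1 + 2*x1 + 0.5*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y  = [4.0, 5.5, 9.0, 10.5, 13.5]

# Design matrix with an intercept column, then the normal equations
X = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(row[r] * row[c] for row in X) for c in range(3)] for r in range(3)]
xty = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(3)]
beta = solve(xtx, xty)
print([round(v, 4) for v in beta])  # recovers [1.0, 2.0, 0.5]
```

In practice a library routine (e.g. `statsmodels.api.OLS`) would be used instead of hand-rolled elimination; the point is only that each added explanatory variable adds one coefficient to estimate.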

Adjusted R^2
 * R^2 always increases as more variables are added, so it becomes less useful on its own
 * The gap between R^2 and adjusted R^2 widens as non-significant independent variables are added to the regression equation
 * In a good model, R^2 and adjusted R^2 are quite close, within roughly 2-3 percentage points
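The penalty is visible in the adjusted R^2 formula itself; the sample sizes and R^2 values below are hypothetical, chosen to show the gap widening when weak predictors are added.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n is the number of observations and k the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# n = 30 observations, R^2 = 0.80 with k = 3 predictors:
print(round(adjusted_r2(0.80, 30, 3), 4))  # 0.7769 -- close to R^2

# Adding 5 weak predictors barely moves R^2 (0.80 -> 0.81) but k jumps to 8:
print(round(adjusted_r2(0.81, 30, 8), 4))  # 0.7376 -- the gap widens
```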

Regression Assumptions - still the same as for simple regression -> NICE-L

Significance Testing
 * Check the model's estimated standard deviation, reported as the standard error, which is the square root of the residual mean square (MSE)
 * Run an F test to examine the overall model: the null hypothesis is that the model has no predictive power (all slope coefficients are zero); the alternative is that it has some (at least one coefficient is nonzero)
 * If we fail to reject the null, the model as a whole is not useful
 * The F test is a test of overall significance; for individual variables, the t test is used
 * Run a t test on the coefficient of each individual variable
 * The null hypothesis is that the coefficient is zero and the alternative is that it is not; this shows which variables are useful to the model
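The two test statistics above can be computed directly from the ANOVA quantities; the sums of squares, coefficient, and standard error below are hypothetical regression output, and in practice the statistics would be compared to F and t critical values (or p-values) from tables or a library.

```python
def f_statistic(sst, sse, n, k):
    """Overall F test: F = MSR / MSE, with SSR = SST - SSE,
    MSR = SSR / k, and MSE = SSE / (n - k - 1)."""
    ssr = sst - sse
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

def t_statistic(b, se_b):
    """Per-coefficient t test of H0: beta = 0, using t = b / s_b."""
    return b / se_b

# Hypothetical output: n = 25 observations, k = 2 predictors
print(round(f_statistic(sst=500.0, sse=100.0, n=25, k=2), 2))  # 44.0
print(round(t_statistic(b=1.8, se_b=0.45), 2))                 # 4.0
```

Large values of either statistic lead to rejecting the corresponding null hypothesis at the usual significance levels.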

Residual analysis
 * Residual plots
 * Plot the residuals against each independent variable
 * A random scatter is a good sign; any systematic pattern is a problem
 * A curved pattern suggests including a quadratic term or taking the log of the dependent variable
 * Histogram
 * A histogram of the residuals may reveal skewness
 * A good histogram of residuals shows rough normality, or at least approximate symmetry
 * Noticeable skewness suggests the normality assumption is violated
 * Normal probability plot
 * In an ideal normal probability plot, the points fall close to a straight 45-degree line
 * Substantial departures from that line indicate non-normal residuals
 * Homoskedasticity and Heteroskedasticity
 * Detected by plotting the residuals against the fitted (y hat) values
 * A cone shape indicates heteroskedasticity, which is bad
 * A blocky, amorphous band indicates homoskedasticity, which is good
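The cone shape can also be checked numerically: if the spread of the residuals grows with the fitted values, the absolute residuals will correlate positively with y hat. The fitted values and residuals below are made up to illustrate a cone pattern.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical fit: residual spread grows with the fitted value (cone shape)
fitted    = [10, 20, 30, 40, 50, 60, 70, 80]
residuals = [0.5, -0.8, 1.5, -2.0, 3.0, -3.5, 4.5, -5.0]

r = pearson(fitted, [abs(e) for e in residuals])
print(round(r, 2))  # strong positive correlation -> suspect heteroskedasticity
```

A correlation near zero would instead be consistent with the blocky, patternless band expected under homoskedasticity.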