
Each model appears on its own line in the output. Vars is the number of
predictor variables used in the model. The R-sq (R2), Adj. R-sq (Adj. R2),
C-p and s values are displayed next, followed by the list of possible
predictors. If a predictor is included in a particular model, an 'X' is placed
under the column header for that variable. For example, the first line tells us
that the first model in our output has an R-sq of 94.2%, an adjusted R-sq of
93.0%, a C-p of 47.0 and an s of 3.3698. The only predictor included in this
model is temp2.
Now we try to determine which model is the best. We look for the model that
gives us the highest Adj. R2, taking the standard deviation (s) into account.
In our example, the models with an Adj. R2 of 99.5% do not have noticeably
different standard deviations, so these two statistics alone do not single out
one model.
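
As a rough illustration of why Adj. R2 rather than R-sq is used to compare
models of different sizes, the standard adjustment can be sketched in Python.
The function name and the sample size n = 7 are assumptions made for this
example; the size of the tutorial's data set is not stated in this section.

```python
def adjusted_r2(r2, n, num_predictors):
    """Adjusted R-squared for a model that includes an intercept.

    r2 is a proportion (0.942, not 94.2); n is the number of observations.
    """
    return 1 - (1 - r2) * (n - 1) / (n - num_predictors - 1)


# Hypothetical sample size; the R-sq value is the one quoted for the first model.
n = 7
print(f"{100 * adjusted_r2(0.942, n, 1):.1f}%")

# With R-sq held fixed, an extra predictor lowers the adjusted value.
print(adjusted_r2(0.942, n, 2) < adjusted_r2(0.942, n, 1))
```

The penalty grows with the number of predictors, which is why a larger model
must raise R-sq by a meaningful amount before its Adj. R2 improves.
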
We then look at the C-p. C-p measures how far a fitted regression model is from
the true model, including random error. When a regression model with p
predictors differs from the true model only by random error, the average value
of C-p is (p + 1). Here p counts the predictor variables in the model, which
this tutorial also calls its parameters. The goal, then, is to find a model
with a C-p of (p + 1) or below. The 6th model has a C-p of 3.4 with 3
parameters, so its C-p is 0.6 below the target of p + 1 = 4.
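
The comparison of C-p with (p + 1) can be sketched as follows. This is a
minimal sketch: the function names are hypothetical, and mallows_cp uses the
usual textbook definition of Mallows' C-p, which the tutorial does not spell
out, so treat that formula as an assumption about how the software computes
the statistic.

```python
def mallows_cp(sse_subset, mse_full, n, num_predictors):
    """Textbook Mallows' C-p for a subset model that includes an intercept.

    sse_subset: error sum of squares of the subset model
    mse_full:   mean squared error of the model with all candidate predictors
    n:          number of observations
    """
    return sse_subset / mse_full - (n - 2 * (num_predictors + 1))


def cp_gap(cp, num_predictors):
    """C-p minus its target (p + 1); zero or negative meets the goal."""
    return cp - (num_predictors + 1)


# 6th model from the output: C-p of 3.4 with 3 predictors.
print(round(cp_gap(3.4, 3), 1))
```

A negative gap means the C-p is below (p + 1), as it is for the 6th model.
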
On a side note, if two or more models are very close to being the same, choose
“the simplest model”, defined as the one with the smallest number of
parameters. The 6th model listed above has only 3 parameters, so it is not
overly complicated and, based on the other information given above, is a good
model. However, the 3rd model, while having a higher C-p statistic (its C-p
differs from p + 1 by 1.1), has only 2 parameters, which makes it a good model
as well.
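
One way to apply the "simplest model" rule is to shortlist the models whose C-p
is close enough to (p + 1) and then order them by size. The sketch below is
illustrative only: the dictionary layout, the shortlist function and the
tolerance of 1.5 are assumptions, and the C-p of 4.1 for the 3rd model is
inferred from the stated difference of 1.1 rather than read from the output.

```python
def shortlist(models, tolerance):
    """Keep models whose C-p exceeds (p + 1) by at most `tolerance`,
    ordered from fewest to most predictors (ties broken by lower C-p)."""
    close = [m for m in models if m["cp"] - (m["vars"] + 1) <= tolerance]
    return sorted(close, key=lambda m: (m["vars"], m["cp"]))


candidates = [
    {"label": "model 6", "vars": 3, "cp": 3.4},
    {"label": "model 3", "vars": 2, "cp": 4.1},  # inferred value, see above
]

for m in shortlist(candidates, tolerance=1.5):
    print(m["label"], m["vars"], m["cp"])
```

Both models survive the filter, and the smaller one is listed first, leaving
the final choice to the follow-up checks on each candidate.
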
Best Subsets Regression, then, is good for narrowing the candidate models down
to a handful, allowing you to use other methods, such as hypothesis testing,
checking model assumptions and checking predictive ability, to determine which
model is actually the best.