

A

Acceptance Error, Beta Error, Type II Error - An error made by failing to reject (wrongly "accepting") the null hypothesis when the null is really false. 

 

Acceptance Region - Opposite of the Rejection Region. It is better to call this the "Fail to Reject Region."  In the case of a two-tailed hypothesis t-test, it is shaded in light blue on the picture below.  If the test statistic falls between -t(critical) and +t(critical), then we fail to reject the null hypothesis. 

Adjusted R-Squared,  R-Squared Adjusted -  A version of R-Squared that has been adjusted for the number of predictors in the model.  R-Squared tends to overestimate the strength of the association, especially if the model has more than one independent variable.
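
The adjustment can be illustrated with the commonly used formula 1 - (1 - R-Squared)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below is only an illustration: the function name and the sample numbers are hypothetical, and a model with an intercept is assumed.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-Squared for n observations and p predictors.

    Assumes the model includes an intercept, so n - p - 1 error
    degrees of freedom remain.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than model parameters")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical example: the same R-Squared of 0.80 from 25 observations
# is penalized more heavily as the number of predictors grows.
one_predictor = adjusted_r_squared(0.80, n=25, p=1)
five_predictors = adjusted_r_squared(0.80, n=25, p=5)
print(one_predictor, five_predictors)
```

With the raw R-Squared held fixed, the adjusted value drops as predictors are added, which is the correction the definition describes.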

 

Alpha [Α, α],  Chosen Significance Level -  The maximum probability a researcher is willing to accept of rejecting a null hypothesis that is true (Type I Error).

 

Alpha Error,  Type I Error -  An error made by wrongly rejecting the null hypothesis when the null is really true.

 

Alternative Hypothesis, Research Hypothesis -  A hypothesis that does not conform to the one being tested, usually the opposite of the null hypothesis.  Symbolized H1 or Ha.

 

Analysis of Variance (ANOVA) - A test of differences between mean scores of two or more groups with one or more variables.

 

Approximation Curve, Curve Fitting - The general method for using a line or curve to estimate the relationship between two associated numerical variables. 

 

Autocorrelation - This occurs when later values in a time series are correlated with earlier values of the same series.
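
As a rough illustration, the lag-1 sample autocorrelation compares each value in a series with the value just before it. The sketch below uses one common estimator (products of deviations from the overall mean, divided by the total sum of squared deviations); the function name and the data are hypothetical.

```python
def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Pair each deviation with the deviation one time step earlier.
    numerator = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    return numerator / denominator


# Hypothetical steadily rising series: later values track earlier ones.
print(lag1_autocorrelation([1, 2, 3, 4, 5]))
```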


 B

Backward Elimination -  A method of determining the regression equation that starts with a regression equation that includes all independent variables and then removes variables that are not useful, one at a time.

 

Best Subsets Regression -  A method of determining the regression equation used with statistical computer applications that allows the user to run multiple regression models using a specified number of independent variables.  The computer will sort through all of the models and display the "best" subsets of all the models that were run.  "Best" is typically identified by the highest value of R-squared.  Other diagnostic statistics such as R-square adjusted and Cp are also displayed to help the user determine their best choice of a model. 

 

Bell-Shaped Curve - A symmetrical curve. Looks like the cross-section of a bell.

 

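Best Fit, Goodness of Fit - The best fit is the model that most closely matches the given data.  Goodness of fit describes how well a model's predicted values agree with the observed values.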

  

Beta Error, Acceptance Error, Type II Error - An error made by failing to reject (wrongly "accepting") the null hypothesis when the null is really false. 

 

Bivariate Association/ Relationship - The relationship between two variables only.


C

Cp Statistic -  Cp measures the differences of a fitted regression model from a true model, along with the random error.  When a regression model with p independent variables contains only random differences from a true model, the average value of Cp is (p+1), the number of parameters. Thus, in evaluating many alternative regression models, our goal is to find models whose Cp is close to or below (p+1). (Statistics for Managers, page 917.)
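
The entry gives no formula, so the following is only a sketch under an assumption: it uses one common form of the statistic, Cp = SSE(subset) / MSE(full) - (n - 2(p+1)), where SSE(subset) is the error sum of squares of the candidate model with p independent variables, MSE(full) is the mean square error of the model containing every candidate variable, and n is the number of observations. The function name and the numbers are hypothetical.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Cp for a candidate model with p independent variables.

    sse_subset: error sum of squares of the candidate model
    mse_full:   mean square error of the model with all candidate variables
    n:          number of observations
    """
    return sse_subset / mse_full - (n - 2 * (p + 1))


# Hypothetical check: when the candidate model IS the full model,
# SSE / MSE equals n - (p + 1), so Cp comes out to exactly p + 1.
n, p = 20, 3
sse_full = 32.0
mse_full = sse_full / (n - (p + 1))
print(mallows_cp(sse_full, mse_full, n, p))
```

A candidate model whose Cp lands near or below p + 1 is the kind the entry recommends looking for.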

 

Centering - Subtracting the mean of a variable from each of its observations, so that the centered variable has a mean of zero.
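
A minimal sketch of centering, with a hypothetical function name and made-up data:

```python
def center(values):
    """Return each observation minus the mean of all observations."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


# Hypothetical predictor values; the centered values sum to zero.
print(center([2, 4, 6]))  # prints [-2.0, 0.0, 2.0]
```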

 

Cook’s Distance:   Cook’s distance combines leverages and studentized residuals into one overall measure of how unusual the predictor values and response are for each observation.  Large values signify unusual observations.  Geometrically, Cook’s distance is a measure of the distance between coefficients calculated with and without the ith observation.  Cook and Weisberg suggest checking observations with Cook’s distance > F (.50, p, n-p), where F is a value from an F-distribution.  (Minitab, page 2-9.)

 

Coefficient of Determination -  In general the coefficient of determination measures the amount of variation of the response variable that is explained by the predictor variable(s).  The coefficient of simple determination is denoted by r-squared and the coefficient of multiple determination is denoted by R-squared.  

 

Coefficient of Variation -  The coefficient of variation, in regression, is the standard deviation of the predictor variable divided by the mean of the predictor variable.  If this value is small, the predictor values (x-values) are nearly constant.  This implies that the data are ill-conditioned.  (Source:  Minitab Help Menu)

 

Confidence Bands (Upper & Lower) - This is the range of the responses that can be expected for all of the appropriate inputs of X's.  The upper confidence band is the highest value that Ŷh is predicted to be.  The lower confidence band is the lowest value that Ŷh is predicted to be.

 

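Confidence Level - The amount of confidence associated with an estimate, usually given as a percent.  It equals 1 - α, where α is the amount of error allowed; for example, α = 0.05 corresponds to a 95% confidence level.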

 

Confidence Intervals -  A range of values to estimate a value of a population parameter.  Associated with the range of values is also the amount of confidence the researcher has in the estimate.  For example, we might estimate the cost of a new space vehicle to be 35 million dollars.  Assume that the confidence level is 95% and the margin of error is 5 million dollars.  We say that we are 95% confident that the cost is between 30 and 40 million dollars.  

 

Confidence Interval Bounds, Upper and Lower -  The lower endpoint on a confidence interval is called the lower bound or lower limit.  The lower bound is the point estimate minus the margin of error.  The upper endpoint is called the upper bound or upper limit.  The upper bound is the point estimate plus the margin of error.
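
The arithmetic can be sketched with the space vehicle figures from the Confidence Intervals entry (a point estimate of 35 million dollars and a margin of error of 5 million dollars). The function name is hypothetical.

```python
def confidence_bounds(point_estimate, margin_of_error):
    """Return (lower bound, upper bound) of a confidence interval."""
    lower = point_estimate - margin_of_error
    upper = point_estimate + margin_of_error
    return lower, upper


# Cost estimate in millions of dollars, from the glossary example.
lower, upper = confidence_bounds(35, 5)
print(lower, upper)  # prints 30 40
```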

         

Correlation -  The amount of association between two or more items.  In these tutorials, correlation will refer to the amount of association between two or more numerical variables.  (See correlation coefficient.)

 

Correlation Coefficients, Pearson’s Sample Correlation Coefficient, r -  Measures the strength of linear association between two numerical variables.  
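
As an illustration, r can be computed as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. This is a minimal sketch; the function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises with x, but not perfectly.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Values near +1 or -1 indicate a strong linear association; values near 0 indicate a weak one.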

 

Correlation Matrix -  A table that shows the correlation coefficients for all pairs of variables in a set.  

 

Correlation Ratio -  A kind of correlation used when the relation between two variables is assumed to be curvilinear (i.e., not linear).  

 

Criterion Variable -  Another term for the dependent variable.

 

Curve Fitting, Approximation Curve - The general method for using a line or curve to estimate the relationship between two associated numerical variables. 


 D 

Degrees of Freedom,  df -  The number of values that can vary independently of one another.  For example, if you have a sample of size n that is used to estimate one parameter, then there are n-1 degrees of freedom.

 

Dependent Variable,  Response Variable,  Output Variable -  The variable in correlation or regression that cannot be controlled or manipulated.  The variable that "depends" on the values of one or more variables.  In math, y frequently represents the dependent variable. 

 

DFITS, DFFITS:  Combines leverage and studentized residual (deleted t residuals) into one overall measure of how unusual an observation is.  DFITS is the difference between the fitted values calculated with and without the ith observation, scaled by stdev (Ŷi).  Belsley, Kuh, and Welsch suggest that observations with DFITS > 2√(p/n) should be considered as unusual.  (Minitab, page 2-9.)
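
The cutoff can be sketched as follows. Two assumptions are made that the entry does not state: p is read as the number of parameters in the model and n as the number of observations, and the comparison uses the absolute value of DFITS so that large negative values are also flagged. The function names and the DFITS values are hypothetical; in practice the DFITS values would come from statistical software.

```python
import math


def dfits_cutoff(p, n):
    """Suggested cutoff 2 * sqrt(p / n) for flagging unusual observations."""
    return 2.0 * math.sqrt(p / n)


def flag_unusual(dfits_values, p):
    """Return the indexes of observations whose |DFITS| exceeds the cutoff."""
    cutoff = dfits_cutoff(p, len(dfits_values))
    return [i for i, d in enumerate(dfits_values) if abs(d) > cutoff]


# Hypothetical DFITS values for 4 observations and a 1-parameter model:
# the cutoff is 2 * sqrt(1 / 4) = 1.0, so only the second value is flagged.
print(flag_unusual([0.1, -1.5, 0.3, 0.05], p=1))  # prints [1]
```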

 

Direct Correlation,  Positive Correlation,  Direct Relationship,  Positive Relationship -  A relationship between two variables (x, y) such that as x increases, y increases, or if x decreases, y decreases.  As one variable increases, so does the other.  See the graph below.

            

Dummy Variable,  Indicator Variable -  A variable used to code the categories of a measurement. Usually, 1 indicates the presence of an attribute and 0 indicates the absence of an attribute.   Example:  If the measurement variable is cost of space flight vehicle then the vehicle might be manned or unmanned.  Let the dummy variable be 1 if the vehicle is manned and 0 if it is unmanned.  Note:  Dummy variable coding can be used for more than 2 categories.  (See Vogt, page 90)
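
A minimal sketch of this coding, using 1 for the presence of an attribute and 0 for its absence. For more than two categories, the sketch assumes the common convention of one 0/1 column per category except a chosen baseline. The function names, the category labels, and the data are hypothetical.

```python
def dummy_code(categories, present):
    """Code one attribute: 1 where it is present, 0 where it is absent."""
    return [1 if category == present else 0 for category in categories]


def dummy_columns(categories, baseline):
    """One 0/1 column per category, leaving out the baseline category."""
    levels = sorted(set(categories) - {baseline})
    return {level: dummy_code(categories, level) for level in levels}


# Two categories: manned versus unmanned vehicles.
vehicles = ["manned", "unmanned", "unmanned", "manned"]
print(dummy_code(vehicles, "manned"))  # prints [1, 0, 0, 1]

# Three hypothetical categories need two dummy variables.
print(dummy_columns(["probe", "rover", "orbiter", "rover"], baseline="probe"))
```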


 E

Efficiency, Efficient Estimator - It is a measure of the variance of an estimate's sampling distribution; the smaller the variance, the better the estimator.

 

Error -  In general, the error is the difference between the observed and estimated value of a parameter.

 

Error, Measurement (Measurement Error) - Inaccurate results due to flaw(s) in the measuring instrument.

 

Errors, Residuals -  In regression analysis, the error is the difference between the observed Y values and the predicted Y values obtained from the regression model.  See the graph below.

Error, Specification (Specification error) - A mistake made when specifying which model to use in the regression analysis. A common specification error involves including an irrelevant variable or leaving out an important variable.


F

F (F test statistic) - The test statistic used when conducting an analysis of variance.

 

Fits, Fitted Values, Predicted Values -  The Fits are the predicted values found by substituting the original values for the independent variable(s) into the regression equation.   The name "fit" refers to how well the observed data matches the relationship specified in the model.
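
Fits and residuals can be illustrated together for a simple regression with one independent variable. This is a sketch under assumptions: the line is fitted by ordinary least squares, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fits_and_residuals(xs, ys):
    """Fitted values and residuals (observed Y minus fitted Y)."""
    intercept, slope = fit_line(xs, ys)
    # Substitute the original x values into the regression equation.
    fits = [intercept + slope * x for x in xs]
    residuals = [y - fit for y, fit in zip(ys, fits)]
    return fits, residuals


# Hypothetical data that do not fall exactly on a line.
fits, residuals = fits_and_residuals([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(fits)
print(residuals)
```

With an intercept in the model, the least-squares residuals sum to zero apart from rounding.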

 

Forward Selection -  A frequently available option of statistical software applications.  A method of determining the regression equation by adding variables to the regression equation until the addition of new variables does not appear to be worthwhile.

 

F-test:  An F-test is usually a ratio of two numbers, where each number estimates a variance. An F-test is used in the test of equality of two population variances. An F-test is also used in analysis of variance, where it tests the hypothesis of equality of means for two or more groups. For instance, in an ANOVA test, the F statistic is usually a ratio of the Mean Square for the effect of interest and Mean Square Error. The F-statistic is very large when MS for the factor is much larger than the MS for error. In such cases, reject the null hypothesis that group means are equal. The p-value helps to determine statistical significance of the F-statistic.  (Vogt, page 117)
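
The ratio described for ANOVA can be sketched for the one-way case: the Mean Square for the factor (between groups) divided by the Mean Square Error (within groups). The function name and the data are hypothetical, and the sketch stops at the F statistic without computing a p-value.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA: MS(factor) / MS(error)."""
    k = len(groups)                       # number of groups
    n = sum(len(group) for group in groups)  # total number of observations
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]
    ss_factor = sum(len(group) * (mean - grand_mean) ** 2
                    for group, mean in zip(groups, means))
    ss_error = sum((x - mean) ** 2
                   for group, mean in zip(groups, means)
                   for x in group)
    ms_factor = ss_factor / (k - 1)
    ms_error = ss_error / (n - k)
    return ms_factor / ms_error


# Hypothetical scores for three groups; the third group's mean is far
# from the others, so MS for the factor is large relative to MS error.
print(one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```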


G

General Linear Model (GLM) -  A full range of methods used to study linear relations between one continuous dependent variable and one or more independent variables, whether continuous or categorical. “General” means the kind of variable is not specified. Examples include Regression and ANOVA.



STATS @ MTSU