Regression analysis can be conducted with categorical variables. This lesson focuses on regression using categorical variables as "indicator" variables. An indicator variable shows whether a specific case has the characteristic described by the category or not. Indicator variables are also called "dummy" variables.
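As a minimal sketch of this idea, the Python snippet below turns a categorical variable into an indicator variable. The 1/0 coding is the usual convention rather than something this lesson specifies, and the observations and the helper name to_indicator are hypothetical, made up for illustration.

```python
# Hypothetical observations (not the lesson's data): where each person was.
locations = ["sun", "shade", "sun", "shade", "shade"]


def to_indicator(values, category):
    """Return 1 for each value that equals `category`, and 0 otherwise."""
    return [1 if value == category else 0 for value in values]


# 1 means "in the sun", 0 means "NOT in the sun".
in_sun = to_indicator(locations, "sun")
print(in_sun)  # [1, 0, 1, 0, 0]
```

The resulting column of 1s and 0s can then be entered into a regression like any other numeric predictor.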
Example for Multiple Regression with one indicator variable.
We have been using the example of thirst increasing with temperature during time spent outside in the summer. Another variable that can affect thirst is whether the person is in the sun or in the shade.
Visualization of the Indicator Variable Regression Model
In the graph below, the x-axis represents the temperature and the y-axis represents the water consumed. The two parallel lines show how the predicted water consumption differs depending on whether the person is in the sun or not. The blue line represents the water consumed if the person is in the sun, and the pink line represents the water consumed if the person is NOT in the sun. As expected, more water is consumed when someone is in the sun, as shown by the blue line lying above the pink line.
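One way to see why the two lines are parallel is to write the model out. The symbols here are assumptions chosen for illustration and do not come from the lesson: b_0, b_1 and b_2 are the estimated coefficients, "temp" is the temperature, and "sun" is the indicator, coded 1 in the sun and 0 otherwise.

```latex
\[
\widehat{\text{water}} = b_0 + b_1\,\text{temp} + b_2\,\text{sun},
\qquad \text{sun} \in \{0, 1\}
\]
\[
\text{NOT in the sun } (\text{sun}=0):\quad \widehat{\text{water}} = b_0 + b_1\,\text{temp}
\]
\[
\text{in the sun } (\text{sun}=1):\quad \widehat{\text{water}} = (b_0 + b_2) + b_1\,\text{temp}
\]
```

Both lines share the slope b_1, so they are parallel, and their intercepts differ by b_2, which is the vertical distance between the lines at every temperature.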

The graph below zooms in for a closer view of the range of summertime temperatures represented by our data. It shows that a person in the sun appears to consume between 5 and 6 more ounces of water than a person who is NOT in the sun.
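The constant gap between the lines can be checked numerically. The sketch below fits a regression with one indicator variable by ordinary least squares in plain Python. The data are synthetic and noise-free, generated from made-up coefficients (a gap of 5.5 ounces, chosen only to fall between 5 and 6), and are not the data behind the graphs. The helper names solve_linear_system, fit_least_squares and predict are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: each row is [1 (intercept), temperature, sun indicator].
temps = [75, 80, 85, 90, 95]
rows, water = [], []
for sun in (0, 1):
    for t in temps:
        rows.append([1.0, float(t), float(sun)])
        water.append(2.0 + 0.25 * t + 5.5 * sun)  # made-up coefficients

b0, b1, b2 = fit_least_squares(rows, water)


def predict(temp, sun):
    """Predicted water consumed (ounces) at a temperature, in sun (1) or not (0)."""
    return b0 + b1 * temp + b2 * sun


# The sun-versus-shade gap is the same at every temperature and equals b2.
gap_at_80 = predict(80, 1) - predict(80, 0)
gap_at_95 = predict(95, 1) - predict(95, 0)
print(round(b2, 3), round(gap_at_80, 3), round(gap_at_95, 3))
```

Because the indicator enters the model additively, the coefficient on the indicator is exactly the vertical distance between the two lines, whatever the temperature.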

In this tutorial we will examine how to use indicator variables with two levels (yes and no).