Use the characteristics of a data set to decide whether a linear, quadratic, or exponential model is most appropriate.
Create a regression model for a scenario using technology.
Use a residual plot to validate whether a given model was appropriate.
Quick Lesson Plan
We culminate this unit with a statistical perspective on modeling. In this lesson, students will combine many of the ideas covered in the first four units of this course to find an appropriate function model for a bivariate data set. To prepare for this lesson, make sure that in addition to the lesson handout, you have shared the spreadsheet of data with your students. Having paper copies is helpful but students will also need a digital copy so they can copy and paste the columns into Stapplet. We suggest sharing this link to the digital version of the spreadsheet. A PDF of this data is also available under “Additional Materials”.
The data in the spreadsheet gives the ratio of restaurant to grocery store prices for the same items. Using the table, students first notice general trends and ask questions about things they notice.
Next, students will input their data into Stapplet.com and make a scatterplot. This statistical tool is very user-friendly, and students should be able to navigate it with ease. They will need to be careful when inputting data so that the inputs for the explanatory and response variables line up. Copying and pasting the columns of data from the spreadsheet will do this automatically. Make sure students don’t have any extra spaces or words in the entry box. The scatterplot is shown below.
Students will start by analyzing a linear model given by the least squares regression line. They will then look at the residual plot to see how far the predicted value (the estimate given by the model) is from the actual value (the value from the data set). Students with no previous exposure to statistics may need help interpreting the “y hat” notation, which simply means “predicted output”. For questions 6 and 7, you may need to remind students that a residual is calculated as actual - predicted. This means that a negative residual indicates the model overestimated the actual value, while a positive residual indicates the model underestimated it.
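If it helps to see the arithmetic behind the applet, here is a minimal Python sketch of a least squares line and its residuals. The data values and variable names are made up for illustration and are not the values from the lesson spreadsheet. This is only one way to carry out the calculation and is not necessarily how Stapplet does it internally.

```python
# Hypothetical illustration data; NOT the values from the lesson spreadsheet.
years_since_start = [0, 1, 2, 3, 4]      # explanatory variable (x)
price_ratio = [1.0, 1.5, 1.9, 2.6, 3.0]  # response variable (y)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(years_since_start, price_ratio)


def predict(x):
    """y hat: the output the linear model predicts for input x."""
    return intercept + slope * x


# residual = actual - predicted
residuals = [y - predict(x) for x, y in zip(years_since_start, price_ratio)]

for x, y, r in zip(years_since_start, price_ratio, residuals):
    # Negative residual: the model predicted too high (overestimate).
    # Positive residual: the model predicted too low (underestimate).
    verdict = "underestimate" if r > 0 else "overestimate" if r < 0 else "exact"
    print(f"x={x} actual={y:.2f} predicted={predict(x):.2f} "
          f"residual={r:+.2f} ({verdict})")
```

Students can check a row or two by hand to confirm that the sign of each residual matches the overestimate or underestimate label.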
The residual plot for the linear regression model is shown below.
In question 9 students explore two additional regression models: a quadratic model, and an exponential model. To tie together all the learning of the course so far, we have students articulate what that model indicates about how the two variables (restaurant to grocery price ratio and time) are changing with respect to each other. At this point in the course, students should be comfortable talking about the difference between linear, quadratic, and exponential growth, and do so with clear and precise language.
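One way to make the three growth patterns concrete is to compare how the outputs change over equal steps of the input. The sketch below uses three made-up functions (not the lesson data) and hypothetical helper names. It shows constant first differences for linear growth, constant second differences for quadratic growth, and constant ratios for exponential growth.

```python
def first_differences(values):
    """Change between each pair of consecutive outputs."""
    return [b - a for a, b in zip(values, values[1:])]


def successive_ratios(values):
    """Ratio of each output to the one before it."""
    return [b / a for a, b in zip(values, values[1:])]


# Made-up example functions evaluated at x = 0, 1, 2, 3, 4, 5.
inputs = range(6)
linear = [3 + 2 * x for x in inputs]        # adds 2 each step
quadratic = [x ** 2 + 1 for x in inputs]    # the differences themselves grow
exponential = [3 * 2 ** x for x in inputs]  # doubles each step

print("linear first differences:     ", first_differences(linear))
print("quadratic first differences:  ", first_differences(quadratic))
print("quadratic second differences: ",
      first_differences(first_differences(quadratic)))
print("exponential successive ratios:", successive_ratios(exponential))
```

Real data will rarely show perfectly constant differences or ratios, which is why students still need the scatterplot and residual plot to judge the fit.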
After comparing the various models, students will decide which model they think is most appropriate based on the residual plots and use the model to predict a future value. Feel free to swap in the current year when you are teaching this lesson.
What does an output of 2.314 mean?
How can you tell from the residual plot whether the model was good or not? Should we see all residuals of 0? Why or why not?
If the residual is negative, is the predicted value an overestimate or an underestimate?
Is all curved growth exponential?
What is the same about quadratic growth and exponential growth? What is different?
One very important aspect of regression models is that technology will do whatever the user tells it to do and does not consider the appropriateness of the model. This means that the applet will always be able to produce a least squares regression line or a quadratic or exponential regression, regardless of what the data looks like. Students must decide, based on the information provided in the scatterplot and in the residual plot, whether the model is appropriate. This is where their critical thinking comes in!
A key part of the lesson is interpreting a residual plot. If the model is appropriate, the residual plot should appear random and without apparent patterns. We should NOT be able to say “The model consistently overestimates for ___ values and consistently underestimates for ____ values.” Be sure students understand why the residual plot should be random, and not just close to 0.
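To illustrate what a patterned residual plot looks like, the sketch below forces a least squares line onto data that is actually quadratic. The data is made up (y = x squared) and the helper names are hypothetical. The calculation still returns a line, but the signs of the residuals run positive, then negative, then positive, which is exactly the kind of consistent over- and underestimating that signals an inappropriate model.

```python
# Made-up curved data (y = x^2); NOT the lesson's spreadsheet values.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sign(residual, tolerance=1e-9):
    """'+' for an underestimate, '-' for an overestimate, '0' for a match."""
    if abs(residual) < tolerance:
        return "0"
    return "+" if residual > 0 else "-"


# The line is always produced, even though the data is not linear.
slope, intercept = least_squares_line(xs, ys)

# residual = actual - predicted
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sign_pattern = "".join(sign(r) for r in residuals)

print("residuals:", [round(r, 2) for r in residuals])
print("sign pattern:", sign_pattern)
```

A model that suits the data would give signs that flip back and forth with no obvious run, rather than the U-shaped pattern seen here.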
Note that we do not talk about r values in this course. If you have a couple of extra days, this would be a great place to add on a few more statistical concepts related to regression, if you are so inclined (or your school’s stats teacher wants to do some extra teaching!).
For more information on the source of this data and how the data was collected, check out the Food Expenditure Series from the Department of Agriculture.