Assignment 10
This assignment provides you with an opportunity to build a causal forecasting model using multiple regression.
Problem (100 Points)
1. (10 Pts) Load the data set on franchise sales and build a full correlation matrix, i.e., a matrix that shows the correlations between all variables. The variables in the data set are: NetSales = net sales in $1000s for a franchise; StoreSize = size of store in 1000s of square feet; InvValue = inventory value in $1000s; AdvBudget = advertising budget in $1000s; DistrictSize = number of households in sales district in 1000s; NumComp = number of competing stores in sales district. Do you detect any multicollinearity that would affect the construction of a multiple regression model?
2. (20 Pts) Build a full multiple regression model that predicts NetSales. Include all variables. Comment on the model: R-squared, standard error, F-statistic, and p-values of the coefficients.
3. (20 Pts) Predict net sales of a store with the following values for the predictor variables, in the order listed in (1): (4.2, 601, 7.8, 14.2, 6)
4. (10 Pts) Determine the 95% prediction interval for the prediction in (3) -- use the standard error from the linear model output.
5. (20 Pts) Build a multiple regression model in which all coefficients are significant -- use stepwise backward elimination, removing one at a time the variables whose coefficients have a p-value > 0.05.
6. (10 Pts) Using the model from (5), predict net sales of a store with the following values for the variables in order: (4.2, 601, 7.8, 14.2, 6) -- do not use the values of variables that were eliminated from the model. Compare that prediction to the one obtained in (3). Comment on the difference.
7. (10 Pts) Calculate the mean squared error (MSE) for the model. On its own, this value is not informative, but it can later be used to compare the model to other models, e.g., a kNN regression model or a decision tree model.
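The tasks above can be sketched in base R as follows. This is only an illustrative outline, not a reference solution: the data frame `sales` is filled with synthetic, made-up values because the data file is not named here, and all object names (`sales`, `new_store`, `full_model`, `reduced_model`, and so on) are hypothetical. The interval uses the approximation "prediction ± 1.96 × residual standard error", which is one reading of "use the standard error from the linear model output".

```r
# Synthetic stand-in for the franchise data (ASSUMPTION: values are made up;
# replace this block with code that loads the real data set).
set.seed(42)
n <- 40
sales <- data.frame(
  StoreSize    = runif(n, 1, 8),      # 1000s of square feet
  InvValue     = runif(n, 100, 900),  # $1000s
  AdvBudget    = runif(n, 2, 15),     # $1000s
  DistrictSize = runif(n, 2, 20),     # 1000s of households
  NumComp      = sample(2:15, n, replace = TRUE)
)
sales$NetSales <- 20 * sales$StoreSize + 0.3 * sales$InvValue +
  10 * sales$AdvBudget - 15 * sales$NumComp + rnorm(n, sd = 20)

# Full correlation matrix across all variables.
cor_matrix <- cor(sales)

# Full multiple regression model with all predictors.
full_model <- lm(NetSales ~ ., data = sales)
print(summary(full_model))  # R-squared, residual std. error, F-statistic, p-values

# Prediction for a new store (values in the order the variables are listed).
new_store <- data.frame(StoreSize = 4.2, InvValue = 601, AdvBudget = 7.8,
                        DistrictSize = 14.2, NumComp = 6)
pred_full <- unname(predict(full_model, newdata = new_store))

# Approximate 95% prediction interval from the residual standard error.
se <- summary(full_model)$sigma
pi_approx <- c(lower = pred_full - 1.96 * se, upper = pred_full + 1.96 * se)

# Backward elimination: repeatedly drop the predictor with the largest
# p-value until every remaining coefficient has p <= 0.05.
reduced_model <- full_model
repeat {
  p_values <- summary(reduced_model)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
  if (nrow(p_values) == 0 || max(p_values) <= 0.05) break
  worst <- rownames(p_values)[which.max(p_values)]
  reduced_model <- update(reduced_model, as.formula(paste(". ~ . -", worst)))
}

# Prediction from the reduced model; columns not in the model are ignored.
pred_reduced <- unname(predict(reduced_model, newdata = new_store))

# Mean squared error of the reduced model on the data it was fitted to.
mse <- mean(residuals(reduced_model)^2)
```

For an interval that also accounts for uncertainty in the estimated coefficients, `predict(full_model, newdata = new_store, interval = "prediction", level = 0.95)` is an alternative to the approximation used here.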
Submission Details
- Your submission must contain two files: the .Rmd notebook and a knitted PDF or HTML file (from the notebook). Name your files with the pattern DA5020.A10.LastName.{Rmd,[pdf,html]}, where LastName is *your* last name.
- The .Rmd file must contain fully commented and properly "chunked" R code along with detailed explanations. Make sure that it is easy to recognize which question you are answering and that your code runs from beginning to end (because that is how we will test it). Code that doesn't execute, stops, or throws errors will -- naturally -- receive no points. If the graders have to "debug" your code or spend any effort getting it to run, substantial points will be deducted.
- Not submitting a knitted PDF or HTML file will result in a reduction of 30 points.
- Not submitting the .Rmd file (or submitting neither file) will result in a score of 0.
Useful Resources
- TBD