Assignment 9
This assignment provides you with an opportunity to create several forecasting models for time-series data and evaluate them using Mean Absolute Deviation (MAD) and Mean Squared Error (MSE). Additionally, you'll make forecasts and place them into context using prediction intervals.
Problem 1 (20 Points)
- (10 Pts) The built-in dataset USArrests contains statistics about violent crime rates in the US states. Determine which states are outliers in terms of assaults. For this question, an outlier is a value that is more than 1.5 standard deviations from the mean. Only identify the outliers; do not remove them.
- (10 Pts) For the same dataset as in (1), is there a correlation between urban population and murder, i.e., as one goes up, does the other go up as well? Comment on the strength of the correlation. Calculate the Pearson correlation coefficient using a built-in R function; you do not have to do the calculation by hand.
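
The two steps in Problem 1 can be sketched in base R as follows. This is a minimal illustration, not a model answer: it assumes the sample standard deviation (`sd()`, which divides by n - 1) is the intended measure, and the variable names are made up for the example.

```r
# USArrests ships with base R; the columns used here are Assault, Murder, and UrbanPop.
data(USArrests)

# Distance of each state's assault rate from the mean, in standard deviations.
# Assumption: the sample standard deviation from sd() is what the question intends.
assault_z <- (USArrests$Assault - mean(USArrests$Assault)) / sd(USArrests$Assault)

# Identify (but do not remove) states more than 1.5 standard deviations from the mean.
outlier_states <- rownames(USArrests)[abs(assault_z) > 1.5]
print(outlier_states)

# Pearson correlation between urban population and murder rate.
r_urban_murder <- cor(USArrests$UrbanPop, USArrests$Murder, method = "pearson")
print(r_urban_murder)
```

A coefficient near 0 suggests little linear relationship, while values near -1 or 1 suggest a strong one; the sign gives the direction.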
Problem 2 (80 Points)
- (5 Pts) Download the data on the growth of mobile phone use in Brazil (you'll need to copy the data and create a CSV that you can load into R) and load it into an R data frame.
- (5 Pts) Forecast phone subscriptions for the next time period using a simple moving average of the prior four time periods.
- (10 Pts) Forecast phone subscriptions for the next time period using a 3-year weighted moving average (with weights of 6 for the most recent year, 4 for the year before it, and 2 for the year before that).
- (15 Pts) Forecast phone subscriptions for the next time period using exponential smoothing (alpha of 0.2).
- (15 Pts) Forecast phone subscriptions for the next time period using a linear regression trend line.
- (20 Pts) Calculate the mean squared error (MSE) for the models from (3), (4), and (5): use each model to calculate a forecast for every given time period, calculate the squared error for each observation, and then average the squared errors. Which model has the smallest MSE?
- (10 Pts) Calculate a weighted average forecast by combining the three forecasts calculated in (3) through (5) with the following weights: 5 for trend line, 2 for exponential smoothing, 1 for weighted moving average. Remember to divide by the sum of the weights in a weighted average.
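
The forecasting steps in Problem 2 might look roughly like the sketch below. The `subscribers` vector is hypothetical and stands in for the column you load from your CSV. Two conventions are assumptions rather than requirements: the exponential smoothing forecast is seeded with the first observation, and periods without a forecast (the seed period, and the first three periods for the weighted moving average) are left out of the MSE.

```r
# Hypothetical series for illustration only; replace with the data loaded from your CSV.
subscribers <- c(10, 12, 15, 19, 24, 30, 37, 45)
n <- length(subscribers)

# Simple moving average of the prior four periods.
sma_forecast <- mean(tail(subscribers, 4))

# Weighted moving average; weights are ordered oldest to most recent (2, 4, 6).
wma_weights  <- c(2, 4, 6)
wma_forecast <- sum(tail(subscribers, 3) * wma_weights) / sum(wma_weights)

# Exponential smoothing with alpha = 0.2.
# Assumption: the forecast for period 1 is seeded with the first observation.
alpha <- 0.2
es_fitted <- numeric(n + 1)
es_fitted[1] <- subscribers[1]
for (t in 2:(n + 1)) {
  es_fitted[t] <- es_fitted[t - 1] + alpha * (subscribers[t - 1] - es_fitted[t - 1])
}
es_forecast <- es_fitted[n + 1]

# Linear regression trend line on the period index.
period <- seq_len(n)
trend_model    <- lm(subscribers ~ period)
trend_forecast <- unname(predict(trend_model, newdata = data.frame(period = n + 1)))

# Mean squared error; periods without a forecast (NA) are skipped.
mse <- function(actual, predicted) mean((actual - predicted)^2, na.rm = TRUE)

# One-step-ahead weighted moving average forecasts for periods 4..n.
wma_fitted <- rep(NA_real_, n)
for (t in 4:n) {
  wma_fitted[t] <- sum(subscribers[(t - 3):(t - 1)] * wma_weights) / sum(wma_weights)
}

mse_wma   <- mse(subscribers, wma_fitted)
mse_es    <- mse(subscribers[-1], es_fitted[2:n])  # seed period excluded
mse_trend <- mse(subscribers, fitted(trend_model))

# Weighted average of the three forecasts: 5 trend, 2 exponential smoothing, 1 WMA.
combo_weights     <- c(trend = 5, es = 2, wma = 1)
combined_forecast <- sum(combo_weights * c(trend_forecast, es_forecast, wma_forecast)) /
  sum(combo_weights)
```

Keeping the in-sample forecasts in vectors aligned with the observations makes it straightforward to compare the three MSE values side by side.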
Problem 3 (+50 Bonus Points)
- Make a forecast of shampoo sales for the next two time periods based on this data set. There is a seasonal component, so a good forecast takes that into account. Explain your model and state its fit. Provide a prediction interval.
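
One possible way to account for seasonality is a regression with a trend term and seasonal indicator variables, sketched below. The data are hypothetical, and the quarterly season length is an assumption; check what the real data set suggests. Other approaches, such as `HoltWinters()` from base R's stats package, are equally reasonable.

```r
# Hypothetical quarterly sales for illustration only; replace with the real data.
sales <- c(120, 135, 160, 142, 131, 149, 175, 155, 140, 162, 190, 166)
n <- length(sales)
season_length <- 4  # assumption: four seasons per cycle

shampoo <- data.frame(
  sales  = sales,
  period = seq_len(n),
  season = factor(rep(seq_len(season_length), length.out = n))
)

# Linear trend plus one indicator per season.
seasonal_model <- lm(sales ~ period + season, data = shampoo)

# Fit of the model: share of variance explained.
r_squared <- summary(seasonal_model)$r.squared

# Forecast the next two periods with a 95% prediction interval.
future_periods <- n + 1:2
future <- data.frame(
  period = future_periods,
  season = factor(((future_periods - 1) %% season_length) + 1,
                  levels = seq_len(season_length))
)
forecast_pi <- predict(seasonal_model, newdata = future,
                       interval = "prediction", level = 0.95)
print(forecast_pi)  # columns: fit, lwr, upr
```

The `lwr` and `upr` columns bound where a new observation is expected to fall, which is wider than a confidence interval for the mean.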
Submission Details
- Your submission must contain two files: the .Rmd notebook and a knitted PDF or HTML (from the notebook). Name the files using the pattern DA5020.A9.LastName.{Rmd,[pdf,html]}.
- The .Rmd file must contain fully commented and properly "chunked" R code along with detailed explanations. Make sure that it is easy to recognize which question you are answering and that your code runs from beginning to end (because that is how we will test it). Code that doesn't execute, stops, or throws errors will, naturally, receive no points. If the graders have to "debug" your code or spend any effort getting it to run, substantial points will be deducted.
- Not submitting a knitted PDF or HTML will result in a reduction of 30 points.
- Not submitting the .Rmd file (or both) will result in a score of 0.
- The assignment is graded out of 100, so the maximum score including the bonus is 150/100, which can add bonus points to your assignment average (and thus your overall grade).
Useful Resources
- TBD