QUESTION

Assignment #8: Forecasting Demand for a Distribution Center

Regression with ARIMA Errors Laboratory (Total 40 pts.)

Due: Nov. 13 (before 11:59pm)

In this laboratory we explore the use of Dynamic Regression (Regression with ARIMA Errors) to develop market-level demand forecasting models (forecasting for distribution). As an example, we use the demand for "Skippy" brand peanut butter (a Unilever Corp. product) as observed in a supermarket chain (that shall remain anonymous) operating in the Chicago metropolitan area.

The sales data is provided in CSV format in the file "Peanut Butter Chicago.csv". As this is an individual skill-building assignment (as opposed to an open-ended team assignment), and I would like to achieve some degree of convergence in your answers, I have also provided a preprocessing script for the assignment. The data set corresponds to the total weekly sales of peanut butter for the supermarket chain, not the individual stores. As you can observe from the file, the data corresponds to a combination of multiple brands as well as the supermarket private label (generic) in sizes ranging from 0.75 to 1.5 lbs.

The data includes the following information for each individual stock keeping unit (SKU) as identified by its UPC code on each week in the data file:

  • VEND: Number identifying the product vendor (48001 corresponds to Unilever)
  • UPC: The product's Universal Product Code (bar code)
  • UNITS: Sales volume
  • DOLLARS: Dollar sales revenue
  • VOL_EQ: Weight in pounds of a unit sold
  • PPU: Price per unit ($/container)
  • F: Factor specifying advertising in the store weekly flyer
      • F = "A+": Large size ad
      • F = "A": Medium size ad
      • F = "B": Small size ad
  • D: Factor specifying In-Store Display
      • D = 0: No In-Store Display
      • D = 1: Minor In-Store Display
      • D = 2: Major In-Store Display

Examine the variables (columns) in "PBS" and the data type of each column. Below we eliminate unnecessary columns and correct the data types.

A first step in the modeling process is to design the features you may possibly include in your model. To this effect, notice that the above promotional information may vary across stores and SKUs. Thus, in the preprocessing script I lumped all products into just three aggregate products (sub-categories): "SK" includes all Skippy brand products, "OB" includes all other branded products, and "PL" includes all private label (or generic) products. For each of the three aggregate products I obtained total sales (volume), average sale prices, and volume-weighted averages of the advertising and display variables (F and D).
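The aggregation described above can be illustrated with a minimal R sketch. It is not the instructor's preprocessing script: the data frame `sku`, the columns `group` and `pounds`, the invented values, the numeric coding of D, and the definition of average price as dollars divided by pounds are all assumptions made for illustration.

```r
# Hypothetical SKU-level rows for a single week (all values are invented).
sku <- data.frame(
  group   = c("SK", "SK", "OB", "PL"),  # assumed sub-category label per SKU
  pounds  = c(100, 75, 80, 40),         # assumed total pounds sold per SKU
  DOLLARS = c(250, 150, 160, 60),       # dollar revenue per SKU
  D       = c(0, 2, 1, 0)               # display level treated as numeric
)

# Collapse SKUs into one row per sub-category.
agg <- do.call(rbind, lapply(split(sku, sku$group), function(g) {
  data.frame(
    group        = g$group[1],
    volume_lb    = sum(g$pounds),                     # total sales volume
    price_per_lb = sum(g$DOLLARS) / sum(g$pounds),    # one possible average price
    display      = weighted.mean(g$D, w = g$pounds)   # volume-weighted display
  )
}))
print(agg)
```

Weighting by volume means that a display on a high-selling SKU moves the sub-category index more than the same display on a slow seller.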

Our goal is to embed a log-log demand model in an ARIMA model that accounts for the auto-correlations in the sales data. As a first attempt we would like to include a model of the following form:

$$y = e^{\beta x} \, p_S^{\alpha} \, p_B^{\gamma} \, p_P^{\gamma_o}$$

Where the model variables and parameters are defined as follows:

  • $y$: Demand (sales volume)
  • $p_S$: Average price per pound of "Skippy" products
  • $p_B$: Average price per pound of "Other Branded" products
  • $p_P$: Average price per pound of "Private Label" products
  • $x$: Weighted averages of advertising and display variables for each product sub-category
  • $\beta$: Vector of coefficients for advertising and display variables
  • $\alpha, \gamma, \gamma_o$: Coefficients of average prices
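Taking logarithms of the multiplicative model makes it linear in the parameters, which is why the logs of prices and sales volume are the natural regression variables. The block below is a sketch of that step and of how ARIMA errors could be attached. The time index $t$, the error term $\eta_t$, the polynomials $\phi(B)$ and $\theta(B)$, the differencing order $d$, and the white-noise term $\varepsilon_t$ are standard notation introduced here as assumptions; they do not appear in the assignment text.

```latex
\begin{align}
\log y_t &= \beta x_t + \alpha \log p_{S,t} + \gamma \log p_{B,t}
            + \gamma_o \log p_{P,t} + \eta_t, \\
\phi(B)\,(1 - B)^{d}\,\eta_t &= \theta(B)\,\varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise}.
\end{align}
```

In this form the price coefficients $\alpha$, $\gamma$, and $\gamma_o$ read as elasticities of demand with respect to each average price, and $B$ is the backshift operator, $B\,\eta_t = \eta_{t-1}$.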

We have a total of 104 weeks of data. In this assignment we will use weeks 1 through 94 as a training set and weeks 95 through 104 as a testing set.
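A minimal sketch of this split in base R follows. The series `log_sales` is a random placeholder, not the peanut butter data, and the object names are hypothetical.

```r
n_weeks <- 104
set.seed(1)
# Placeholder series standing in for log sales volume (not the real data).
log_sales <- ts(rnorm(n_weeks, mean = 8, sd = 0.3))

train <- window(log_sales, end = 94)    # weeks 1 through 94
test  <- window(log_sales, start = 95)  # weeks 95 through 104

print(c(train = length(train), test = length(test)))
```

Using `window()` on a `ts` object keeps the week index attached to each piece, so the test set still starts at week 95 when it is plotted against a forecast.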

  1. (5 pts) After pre-processing the data, notice that you have 18 predictive variables plus the sales vector. Notice that the pre-processing step already computes the log of the average prices and sales volumes. Use The Lasso on the training set to obtain (a) a shrunk model and (b) the reduced set of predictive variables minimizing the cross-validated MSE over the training set. (Use set.seed(1) before cross-validation). Report the coefficients of the shrunk model.
  2. (5 pts) Use the training set to fit an unrestricted regression model (i.e., lm(...) ) on the reduced set of explanatory variables identified by The Lasso. Report the coefficients of the full model, comment on the fit of the model, and examine the auto-correlations of its residuals.
  3. (5 pts) Fit an ARIMA model to explain the training set log-of-sales-volume data. Report the diagnostic of your model's residuals and comment on the model's validity.
  4. (5 pts) Use the model in Question 3 to compose a 10-period-ahead forecast and compare it (overlay it) with the testing set log-of-sales data. Comment on the usefulness of this model in terms of precision and confidence interval.
  5. (5 pts) Use the auto.arima(...) function to fit a dynamic regression model to explain sales data (log) using only the predictive variables identified by The Lasso in Question 1. Examine the model's residuals and comment on its validity.
  6. (5 pts) Obtain a dynamic regression model that improves on the auto-arima model in Question 5 in terms of its information criteria and residual diagnostics. Compare the coefficients of the explanatory variables in (a) The Lasso model, (b) the unrestricted model obtained in Question 2, and (c) the ones obtained in this question. Then use the B notation (polynomial) to describe the model you obtained.
  7. (5 pts) Use the model in Question 5 to create a 10-period-ahead forecast and compare it (overlay it) with the testing set log-of-sales data. Comment on the usefulness of this model in terms of precision and confidence interval relative to the model without explanatory variables in Question 3.
  8. (5 pts) After you finish a project, it is often useful to reflect on what you would do differently if you were to perform this project again. This is no exception. Comment on the training and testing fit statistics and discuss how you think you could improve on the performance of the model in terms of (a) additional data, (b) different pre-processing of the existing data, and (c) different modeling choices. Discuss your assessment of the potential for improvement (ex-ante priorities) for the different improvement options you suggest.
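The general shape of a regression with ARIMA errors, as used in the questions above, can be sketched on synthetic data. This is an illustration under stated assumptions, not a solution: the data and coefficients are invented, the variable names are hypothetical, and the error order is fixed at ARIMA(1,0,0) rather than selected. It uses base R's `stats::arima` with `xreg` to stay dependency-free, whereas the assignment itself names `auto.arima(...)`.

```r
set.seed(1)
n <- 104
# Synthetic stand-ins: log own price and a display index (invented values).
log_price <- log(runif(n, 1.8, 2.6))
display   <- runif(n, 0, 1)
# AR(1) errors around a log-log demand curve with assumed coefficients.
eta <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.1))
log_sales <- 9 - 2.0 * log_price + 0.4 * display + eta

X  <- cbind(log_price = log_price, display = display)
tr <- 1:94     # training weeks
te <- 95:104   # testing weeks

# Regression with ARIMA(1,0,0) errors (order assumed, not selected).
fit <- arima(log_sales[tr], order = c(1, 0, 0), xreg = X[tr, ])
print(coef(fit))

# Ljung-Box test on the residuals; fitdf = 1 accounts for the AR term.
print(Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1))

# A 10-step-ahead forecast requires the future regressor values.
fc    <- predict(fit, n.ahead = 10, newxreg = X[te, ])
lower <- fc$pred - 1.96 * fc$se   # approximate 95% interval
upper <- fc$pred + 1.96 * fc$se
rmse  <- sqrt(mean((log_sales[te] - fc$pred)^2))
print(rmse)
```

The forecast step shows a practical point about dynamic regression: predicting sales for weeks 95 through 104 requires prices and promotion values for those same weeks, which in this exercise come from the testing set.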