Locally Weighted Regression
In this exercise, we will investigate locally weighted regression (LWR) models that predict prize winnings ($1000s) from performance and success statistics for LPGA golfers in 2009. The attached table (see the Excel file) contains these data. The matrix X contains 11 predictor variables:
1. Average drive (yards)
2. Percent of fairways hit
3. Percent of greens reached in regulation
4. Average putts per round
5. Percent of sand saves (2 shots to hole)
6. Tournaments played in
7. Green in regulation putts per hole
8. Completed tournaments
9. Average percentile in tournaments (high is good)
10. Rounds completed
11. Average strokes per round
The column vector y contains the output variable, prize winnings ($1000s). For each variable in X and y.
a. Briefly describe the locally weighted regression algorithm.
b. Discuss the pros and cons of locally weighted regression over ordinary least squares regression.
c. Discuss how the weights are determined (Gaussian kernels) and how the kernel bandwidth is optimized.
d. Discuss the importance of standardization for the weight calculation.
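The weight calculation in (c) and (d) can be sketched in Python with NumPy. This is a minimal sketch, assuming the common Gaussian form w_i = exp(-||x_i - x_q||^2 / (2h^2)), where x_q is the query point and h is the bandwidth; some texts write the exponent as -||x_i - x_q||^2 / h^2, which changes the numerical meaning of h. The helper names `zscore_fit` and `gaussian_weights` and the four-golfer toy data (average drive in yards, average putts per round) are hypothetical. Because the weight depends on a Euclidean distance, a predictor measured in large units (yards) dominates one measured in small units (putts) unless every column is first scaled to zero mean and unit standard deviation using the training-set statistics.

```python
import numpy as np

def zscore_fit(X_train):
    """Column means and sample standard deviations from the training set only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)  # avoid dividing by zero for a constant column
    return mu, sd

def gaussian_weights(X_train, x_query, h):
    """Gaussian kernel weight exp(-||x_i - x_q||^2 / (2 h^2)) for each training row."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)  # squared Euclidean distances
    return np.exp(-d2 / (2.0 * h ** 2))

# Hypothetical toy data: columns are average drive (yards) and average putts per round.
X = np.array([[250.0, 29.0],
              [260.0, 29.0],   # 10 yards longer than row 0
              [250.0, 31.0],   # 2 putts per round worse than row 0
              [245.0, 30.0]])

w_raw = gaussian_weights(X, X[0], h=1.0)   # raw units: the yards column dominates
mu, sd = zscore_fit(X)
Z = (X - mu) / sd                          # reuse the same mu, sd for test/validation rows
w_std = gaussian_weights(Z, Z[0], h=1.0)   # standardized: distances in SD units
```

In raw units, the golfer 2 putts per round away from the query receives far more weight than the one 10 yards away. After standardization the ordering flips, because 10 yards is about 1.6 standard deviations of the drive column while 2 putts is about 2.1 standard deviations of the putts column.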
Divide the data into training, test, and validation sets. You must use the same training, test, and validation sets that you used in the previous assignments (i.e., TQ3, TQ4, TQ5, and TQ6).
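Because every model must be compared on the same split, the split has to be reproducible. A minimal sketch, assuming a random 60/20/20 partition (the proportions, the seed, and the function name `split_indices` are hypothetical): with a fixed seed, NumPy's `default_rng` produces the same permutation on every run. This reproduces an earlier split only if that split was made with the same generator and seed; if the index sets from TQ3 to TQ6 were saved, loading them is the safer choice.

```python
import numpy as np

def split_indices(n, frac_train=0.6, frac_test=0.2, seed=0):
    """Return disjoint (train, test, validation) index arrays covering range(n)."""
    rng = np.random.default_rng(seed)       # fixed seed -> identical permutation each run
    perm = rng.permutation(n)
    n_train = int(round(frac_train * n))
    n_test = int(round(frac_test * n))
    # Whatever remains after the training and test blocks becomes the validation set.
    return perm[:n_train], perm[n_train:n_train + n_test], perm[n_train + n_test:]

# Usage, with X and y loaded from the data file:
# idx_tr, idx_te, idx_va = split_indices(len(y))
# X_tr, y_tr = X[idx_tr], y[idx_tr]
```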
Build three locally weighted polynomial regression models: zero order (kernel regression), first order (local linear regression), and second order (local quadratic regression). For each model, find the kernel bandwidth from h = [0.15, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0] that minimizes the test RMSE.
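One way to implement the three models and the bandwidth search is sketched below, assuming the inputs have already been standardized with training-set statistics and that the Gaussian weight uses the exp(-d^2 / (2h^2)) convention. At each query point x_q, the sketch solves a weighted least-squares problem on columns centered at x_q (an intercept, plus linear terms for order 1, plus squared terms for order 2), so the local prediction is the fitted intercept; order 0 reduces to the kernel-weighted average of y. The function names are hypothetical, and using only squared terms for order 2 is a simplifying assumption: with 11 predictors, a full quadratic including cross products has 1 + 11 + 66 = 78 coefficients per local fit, which can be poorly determined when the weights concentrate on a few golfers at small h.

```python
import numpy as np

H_GRID = (0.15, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0)

def lwr_predict(X_train, y_train, X_query, h, order):
    """Locally weighted polynomial regression of order 0, 1, or 2.

    Assumes standardized inputs. Order 2 uses squared terms only (no cross
    products), a simplifying assumption.
    """
    n = X_train.shape[0]
    preds = np.empty(X_query.shape[0])
    for k, xq in enumerate(X_query):
        diff = X_train - xq                       # offsets from the query point
        d2 = np.sum(diff ** 2, axis=1)            # squared Euclidean distances
        # Gaussian weights. Subtracting d2.min() rescales every weight by the same
        # constant, which leaves the weighted LS solution unchanged but avoids underflow.
        w = np.exp(-(d2 - d2.min()) / (2.0 * h ** 2))
        terms = [np.ones((n, 1))]                 # order 0: intercept only
        if order >= 1:
            terms.append(diff)                    # order 1: add linear terms
        if order >= 2:
            terms.append(diff ** 2)               # order 2: add squared terms
        A = np.hstack(terms)
        sw = np.sqrt(w)                           # row scaling turns WLS into ordinary LS
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
        preds[k] = beta[0]                        # columns are centered at xq
    return preds

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def select_bandwidth(X_tr, y_tr, X_te, y_te, order, grid=H_GRID):
    """Return (best_h, {h: test RMSE}) for one polynomial order."""
    scores = {h: rmse(y_te, lwr_predict(X_tr, y_tr, X_te, h, order)) for h in grid}
    best_h = min(scores, key=scores.get)
    return best_h, scores

# Usage with standardized arrays X_tr, y_tr, X_te, y_te, X_va, y_va:
# for order in (0, 1, 2):
#     h_best, scores = select_bandwidth(X_tr, y_tr, X_te, y_te, order)
#     val_rmse = rmse(y_va, lwr_predict(X_tr, y_tr, X_va, h_best, order))
```

The bandwidth is chosen on the test set only; the validation set is used once per model, to report the RMSE that is compared with the earlier models.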
Compare the validation performance of your three LWR models with that of all the models we have examined previously, and comment on the results.