Wednesday, October 21, 2009

DIAMOND PRICE


This data set is a sample of 617 round-shaped diamonds collected from a retail website in December 2007. The data includes the following variables for each diamond: Price, Carats, Clarity, Color, Cut, ClarityCode, ColorCode, CutCode.
I used this data to fit a regression model for estimating the value of a round-shaped diamond based on its weight, clarity, color, and cut. The model I came up with was:

log(price) = 7.784 + 2.032*log(carats) + 0.113*clarityCode + 0.102*colorCode + 0.030*cutCode

This model has an adjusted R-squared of 0.96. The remaining variance may be explained by factors not included in the dataset, such as symmetry, polish, and fluorescence. I tried using dummy variables for each of the codes instead of treating them as numeric, but was surprised how little this increased the model's explanatory power. Because it is a log model, the errors are larger in absolute dollars when estimating more valuable diamonds, but the percentage size of the errors is fairly consistent.
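To make the formula concrete, here is a minimal Python sketch that turns the fitted equation into a price estimate. The coefficients are the ones quoted above; everything else is an assumption for illustration. In particular, the function name `predict_price` is made up, `log` is assumed to be the natural logarithm, and the numeric coding of clarity, color, and cut is not described in the post, so the codes are simply passed in as numbers.

```python
import math

# Coefficients as reported in the post.
INTERCEPT = 7.784
B_LOG_CARATS = 2.032
B_CLARITY = 0.113
B_COLOR = 0.102
B_CUT = 0.030


def predict_price(carats, clarity_code, color_code, cut_code):
    """Estimate price from the fitted log-log model (hypothetical helper).

    Assumes log() in the post is the natural log and that the three codes
    use the same numeric scale as the original dataset, which is not
    documented in the post.
    """
    if carats <= 0:
        raise ValueError("carats must be positive")
    log_price = (
        INTERCEPT
        + B_LOG_CARATS * math.log(carats)
        + B_CLARITY * clarity_code
        + B_COLOR * color_code
        + B_CUT * cut_code
    )
    # Back-transform from log(price) to price.
    return math.exp(log_price)


if __name__ == "__main__":
    # Illustrative inputs only; the code values here are not from the dataset.
    base = predict_price(1.0, 3, 3, 3)
    doubled = predict_price(2.0, 3, 3, 3)
    print(f"1.0 ct estimate: {base:,.0f}")
    print(f"2.0 ct estimate: {doubled:,.0f}")
    print(f"ratio: {doubled / base:.2f}")
```

Under these assumptions the coefficients read as multipliers: doubling the carat weight multiplies the estimate by 2^2.032, a little over 4, and each one-step increase in the clarity code multiplies it by exp(0.113), roughly a 12% increase. That is also why the errors stay roughly constant in percentage terms rather than in dollars.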
