[1] "Year"      "Lag1"      "Lag2"      "Lag3"      "Lag4"      "Lag5"
[7] "Volume"    "Today"     "Direction"
      Year           Lag1                Lag2                Lag3
 Min.   :2001   Min.   :-4.922000   Min.   :-4.922000   Min.   :-4.922000
 1st Qu.:2002   1st Qu.:-0.639500   1st Qu.:-0.639500   1st Qu.:-0.640000
 Median :2003   Median : 0.039000   Median : 0.039000   Median : 0.038500
 Mean   :2003   Mean   : 0.003834   Mean   : 0.003919   Mean   : 0.001716
 3rd Qu.:2004   3rd Qu.: 0.596750   3rd Qu.: 0.596750   3rd Qu.: 0.596750
 Max.   :2005   Max.   : 5.733000   Max.   : 5.733000   Max.   : 5.733000
      Lag4                Lag5              Volume           Today
 Min.   :-4.922000   Min.   :-4.92200   Min.   :0.3561   Min.   :-4.922000
 1st Qu.:-0.640000   1st Qu.:-0.64000   1st Qu.:1.2574   1st Qu.:-0.639500
 Median : 0.038500   Median : 0.03850   Median :1.4229   Median : 0.038500
 Mean   : 0.001636   Mean   : 0.00561   Mean   :1.4783   Mean   : 0.003138
 3rd Qu.: 0.596750   3rd Qu.: 0.59700   3rd Qu.:1.6417   3rd Qu.: 0.596750
 Max.   : 5.733000   Max.   : 5.73300   Max.   :3.1525   Max.   : 5.733000
 Direction
 Down:602
 Up  :648
Call:
glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 +
    Volume, family = binomial, data = Smarket)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.446  -1.203   1.065   1.145   1.326
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.126000   0.240736  -0.523    0.601
Lag1        -0.073074   0.050167  -1.457    0.145
Lag2        -0.042301   0.050086  -0.845    0.398
Lag3         0.011085   0.049939   0.222    0.824
Lag4         0.009359   0.049974   0.187    0.851
Lag5         0.010313   0.049511   0.208    0.835
Volume       0.135441   0.158360   0.855    0.392
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1731.2 on 1249 degrees of freedom
Residual deviance: 1727.6 on 1243 degrees of freedom
AIC: 1741.6
Number of Fisher Scoring iterations: 3
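
As a sanity check on the coefficient table above, the z value and p-value for a coefficient can be recomputed from its estimate and standard error, and the AIC from the residual deviance. The following is a minimal base R sketch that copies numbers from the printed output; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the Lag1 row of the coefficient table
estimate  <- -0.073074
std_error <-  0.050167

# Wald z statistic and two-sided p-value against a standard normal
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

# For a binary (0/1) response the residual deviance equals -2 * log-likelihood,
# so AIC = residual deviance + 2 * (number of estimated coefficients)
residual_deviance <- 1727.6
n_coefficients    <- 7   # intercept + Lag1..Lag5 + Volume
aic <- residual_deviance + 2 * n_coefficients

print(round(z_value, 3))
print(round(p_value, 3))
print(aic)
```

The rounded values agree with the printed -1.457, 0.145, and 1741.6, which confirms how the columns of the table relate to each other.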
        1         2         3         4         5
0.5070841 0.4814679 0.4811388 0.5152224 0.5107812
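
The five fitted probabilities above become class labels once a cutoff is applied. The sketch below assumes the probabilities are P(Up) and that the cutoff is 0.5; neither is stated in the output itself, so treat both as assumptions. The name `glm.probs` is also assumed; only `glm.pred` appears in the tables that follow.

```r
# First five fitted probabilities, copied from the output above
glm.probs <- c(0.5070841, 0.4814679, 0.4811388, 0.5152224, 0.5107812)

# Assumed rule: predict "Up" when the probability exceeds 0.5, else "Down"
threshold <- 0.5
glm.pred <- ifelse(glm.probs > threshold, "Up", "Down")

print(glm.pred)
```

Under these assumptions the first five days are labelled Up, Down, Down, Up, Up. Moving the cutoff trades one kind of error for the other, which is what an ROC curve summarizes.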
        Direction
glm.pred Down  Up
    Down  145 141
    Up    457 507
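
The confusion matrix above can be turned into an accuracy figure by dividing the diagonal (correct predictions) by the total count. This is a minimal base R sketch that rebuilds the table from the printed counts; the object name `conf` is illustrative.

```r
# Confusion matrix copied from the table above
# (rows = predicted class, columns = actual Direction)
conf <- matrix(c(145, 457, 141, 507), nrow = 2,
               dimnames = list(glm.pred  = c("Down", "Up"),
                               Direction = c("Down", "Up")))

# Correct predictions sit on the diagonal
accuracy   <- sum(diag(conf)) / sum(conf)
error_rate <- 1 - accuracy

print(accuracy)    # (145 + 507) / 1250 = 0.5216
print(error_rate)  # 0.4784
```

Because this table covers all 1250 observations, it appears to be computed on the same data used to fit the model, so the 52.16% figure is likely an optimistic estimate of out-of-sample accuracy.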
        Direction.2005
glm.pred Down Up
    Down   77 97
    Up     34 44
        Direction.2005
glm.pred Down  Up
    Down   35  35
    Up     76 106
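
The two `Direction.2005` tables can be compared on accuracy, sensitivity, specificity, and precision. The sketch below rebuilds both tables from the printed counts and treats "Up" as the positive class, which is an assumption made here for illustration. The helper `metrics()` and the names `conf_a` and `conf_b` are hypothetical.

```r
# Hypothetical helper: metrics from a 2x2 table with
# rows = predicted, columns = actual, and "Up" as the positive class
metrics <- function(conf) {
  tp <- conf["Up", "Up"];     fn <- conf["Down", "Up"]
  tn <- conf["Down", "Down"]; fp <- conf["Up", "Down"]
  c(accuracy    = (tp + tn) / sum(conf),
    sensitivity = tp / (tp + fn),   # share of actual Up days caught
    specificity = tn / (tn + fp),   # share of actual Down days caught
    precision   = tp / (tp + fp))   # share of Up predictions that were right
}

labels <- list(glm.pred = c("Down", "Up"), Direction.2005 = c("Down", "Up"))

# Counts copied from the two tables above (filled column by column)
conf_a <- matrix(c(77, 34, 97, 44),  nrow = 2, dimnames = labels)
conf_b <- matrix(c(35, 76, 35, 106), nrow = 2, dimnames = labels)

print(round(rbind(first = metrics(conf_a), second = metrics(conf_b)), 4))
```

The first table gives an accuracy of 121/252 (about 48.0%), below a coin flip, while the second gives 141/252 (about 56.0%). The second table's gain comes mostly from predicting "Up" far more often: its sensitivity rises to 106/141 while its specificity falls to 35/111.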
a. LDA Requirements: Assumes that, within each class, the predictors follow a multivariate normal distribution with a class-specific mean vector and a covariance matrix that is common to all classes.
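b. Differences between LDA and Logistic Regression: Both methods produce linear decision boundaries; they differ in how the coefficients are estimated. LDA assumes normally distributed predictors with a common covariance matrix and is more efficient when those assumptions hold. Logistic regression makes no assumption about the distribution of the predictors and estimates the coefficients by maximum likelihood, so it is more robust when normality fails. Neither captures non-linear class boundaries unless non-linear terms are added to the model.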
c. ROC (Receiver Operating Characteristic): A plot that shows the performance of a classifier across all thresholds by plotting the true positive rate against the false positive rate. The area under the curve (AUC) summarizes the model's ability to separate the classes: 0.5 corresponds to random guessing and 1 to perfect separation.
d. Sensitivity and Specificity:
Sensitivity: The proportion of actual positives that are correctly identified, TP / (TP + FN).
Specificity: The proportion of actual negatives that are correctly identified, TN / (TN + FP).
The importance of each depends on the application; for example, sensitivity is crucial in medical diagnostics, where missing a case of disease is costly.
e. Classification Metrics: Which metric matters most for prediction depends on the relative costs of false positives and false negatives, which vary by context (for example, healthcare versus finance).
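
To make item c concrete, an ROC curve can be traced by sorting observations by score and lowering the threshold one observation at a time. The sketch below uses a small made-up set of labels and scores (not taken from the Smarket analysis) and assumes no tied scores. It computes the AUC by the trapezoidal rule in base R.

```r
# Toy data, invented for illustration: 1 = positive, 0 = negative
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Sort by decreasing score; each step lowers the threshold past one observation
ord <- order(scores, decreasing = TRUE)
lab <- labels[ord]

# True and false positive rates at each threshold, starting from (0, 0)
tpr <- c(0, cumsum(lab) / sum(lab))
fpr <- c(0, cumsum(1 - lab) / sum(1 - lab))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

print(cbind(fpr, tpr))
print(auc)  # 8/9, about 0.889
```

The AUC of 8/9 matches the fraction of (positive, negative) pairs in which the positive has the higher score: 8 of the 9 possible pairs. Each row of the printed `fpr`/`tpr` table is one threshold, and sensitivity is the `tpr` column while specificity is `1 - fpr`.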
This summary provides a clear overview of the topics covered in Chapter 4, highlighting the theoretical differences and practical implications of different classification methods.