1 Support vector machine implementation using Convex Op-
1. (a) (15 pts) Using sklearn, implement a linear SVM with slack variables in dual form by selecting the appropriate method from the descriptions at https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation. Identify the prediction function that can “predict” an input point, that is, compute the functional margin of the point. Implement a boundary plotting function (see the notes on Canvas).
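As a starting point for part (a), here is a minimal sketch, not the official template. It assumes that `SVC(kernel="linear")` is an acceptable choice of method (it is based on libsvm, which solves the dual problem) and that `decision_function` is the prediction function being asked for. The helper names `fit_linear_svm`, `decision_values`, `boundary_grid`, and `plot_boundary` are hypothetical, and the demo data is synthetic. The plotting helpers only make sense for inputs with two features.

```python
import numpy as np
from sklearn.svm import SVC


def fit_linear_svm(X, y, C=1.0):
    """Fit a soft-margin linear SVM with penalty C on the slack variables."""
    return SVC(kernel="linear", C=C).fit(X, y)


def decision_values(clf, X):
    """Signed score w.x + b for each row of X (positive means clf.classes_[1])."""
    return clf.decision_function(X)


def boundary_grid(clf, X, steps=200, pad=1.0):
    """Evaluate the decision function on a grid covering 2-D data X."""
    xs = np.linspace(X[:, 0].min() - pad, X[:, 0].max() + pad, steps)
    ys = np.linspace(X[:, 1].min() - pad, X[:, 1].max() + pad, steps)
    xx, yy = np.meshgrid(xs, ys)
    zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


def plot_boundary(clf, X, y):
    """Draw the boundary (level 0) and the margin lines (levels -1 and +1)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    xx, yy, zz = boundary_grid(clf, X)
    plt.contour(xx, yy, zz, levels=[-1, 0, 1],
                linestyles=["--", "-", "--"], colors="k")
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
    plt.show()


# Synthetic, well-separated 2-D data (an assumption, not the assignment's dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y_demo = np.array([-1] * 20 + [1] * 20)
clf_demo = fit_linear_svm(X_demo, y_demo)
scores_demo = decision_values(clf_demo, X_demo)
```

With labels in {-1, +1}, multiplying `scores_demo` by `y_demo` gives the functional margin of each training point. Calling `plot_boundary(clf_demo, X_demo, y_demo)` would draw the separating line together with the two margin lines.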
(b) (15 pts) Run your SVM implementation on the MNIST-13 dataset. The dataset contains two classes labeled as 1 and 3 in the first column of the CSV file we provided. All other columns are data values. Compute test performance using 10-fold cross-validation on random 80-20 splits. Report your results and explain what you find for the following manipulations.
i. Try C = 0.01, 0.1, 1, 10, 100 and show your results, both graphically and by reporting the number of mistakes on the training and validation data sets.
ii. What is the impact on test performance as C increases?
iii. What happens to the geometric margin $1/\lVert w \rVert$ as C increases?
iv. What happens to the number of support vectors as C increases?
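The experiment in part (b) could be organized roughly as follows. This is a sketch under stated assumptions: it reads "10-fold cross-validation on random 80-20 splits" as ten independent random 80-20 splits (`ShuffleSplit`), it uses `SVC(kernel="linear")` as the SVM, and it runs on synthetic data with labels 1 and 3 because the provided CSV is not reproduced here. The function name `sweep_C` and the result keys are hypothetical.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVC


def sweep_C(X, y, C_values=(0.01, 0.1, 1, 10, 100), n_splits=10, seed=0):
    """For each C, average mistakes, margin, and support-vector count over splits."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    results = []
    for C in C_values:
        train_err, val_err, margins, n_sv = [], [], [], []
        for train_idx, val_idx in splitter.split(X):
            clf = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
            # Number of misclassified points on each side of the split.
            train_err.append(np.sum(clf.predict(X[train_idx]) != y[train_idx]))
            val_err.append(np.sum(clf.predict(X[val_idx]) != y[val_idx]))
            # Geometric margin 1 / ||w||, available because the kernel is linear.
            margins.append(1.0 / np.linalg.norm(clf.coef_))
            n_sv.append(len(clf.support_))
        results.append({
            "C": C,
            "train_mistakes": float(np.mean(train_err)),
            "val_mistakes": float(np.mean(val_err)),
            "margin": float(np.mean(margins)),
            "n_support": float(np.mean(n_sv)),
        })
    return results


# Synthetic stand-in for the dataset (labels 1 and 3, five features).
# With the real file, the first CSV column would be y and the rest X.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y_demo = np.array([1] * 100 + [3] * 100)
results_demo = sweep_C(X_demo, y_demo)
for row in results_demo:
    print(row)
```

Each row of `results_demo` holds the averages needed for items i through iv, so the same list can feed both a table and a plot against C on a logarithmic axis.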
(c) (10 pts) Answer the following questions about the dual SVM approach:
i. The value of C will typically change the resulting classifier and therefore also affects the accuracy on test examples.
ii. The quadratic programming problem involves both w, b and the slack variables $\xi_i$. We can rewrite the optimization problem in terms of w, b alone.
This is done by explicitly solving for the optimal values of the slack variables $\xi_i = \xi_i(w, b)$ as functions of w, b. Then the resulting minimization problem over w, b can be formally written as:
$$\min_{w, b} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i(w, b)$$
where the first (regularization) term biases our solution towards zero in the absence of any data and the remaining terms give rise to the loss functions, one loss function per training point, encouraging correct classification. The values of these slack variables, as functions of w, b, are “loss functions”.
a) Derive the functions $\xi_i(w, b)$ that determine the optimal $\xi_i$. The equations should take on a familiar form.
b) Are all the margin constraints satisfied with these expressions for the slack variables?
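A derived expression for the slack variables can be sanity-checked numerically before it is written up. The sketch below assumes labels in {-1, +1} (the dataset's labels 1 and 3 would need to be mapped first) and the usual soft-margin constraints: each slack is nonnegative, and each point's functional margin plus its slack is at least 1. The function name `check_slack` is hypothetical. It does not compute the slack values itself; it only tests a candidate that you supply.

```python
import numpy as np


def check_slack(xi, X, y, w, b, tol=1e-9):
    """Return (feasible, tight) for candidate slack values xi at a fixed (w, b).

    feasible: xi_i >= 0 and y_i (w.x_i + b) + xi_i >= 1 hold for every point.
    tight:    feasible, and each point has at least one constraint active, so
              no xi_i could be lowered without violating a constraint.
    """
    xi = np.asarray(xi, dtype=float)
    margins = y * (X @ w + b)  # functional margin of each point
    feasible = bool(np.all(xi >= -tol) and np.all(margins + xi >= 1 - tol))
    active = (np.abs(xi) <= tol) | (np.abs(margins + xi - 1) <= tol)
    tight = feasible and bool(np.all(active))
    return feasible, tight


# Tiny hand-made example (an assumption for illustration only).
X_demo = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.2, 0.0]])
y_demo = np.array([1, 1, -1, -1])
w_demo, b_demo = np.array([1.0, 0.0]), 0.0

# All-zero slack is not feasible here, since some points sit inside the margin.
print(check_slack(np.zeros(4), X_demo, y_demo, w_demo, b_demo))
```

Since C is positive, the objective grows with every slack value, so the optimal slacks for a fixed w, b are exactly the ones for which `check_slack` reports both feasible and tight.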
Use the code templates provided on Moodle, and follow any additional instructions posted by the TA.