Please answer the following questions. You can have multiple attempts and only your latest attempt will be marked.
The following code shows an incorrect implementation of the Adaboost training algorithm.
Please point out all the mistakes you can find, and state what the input variables and output variables of weak_classifier_train should be.
def Adaboost_train(train_data, train_label, T):
    # train_data: N x d matrix
    # train_label: N x 1 vector
    # T: the number of weak classifiers in the ensemble
    ensemble_models = 
    for t in range(0, T):
        # weak_classifier_train returns the model parameters of the learned weak classifier
        model_param_t = weak_classifier_train(train_data, train_label)
    # definition of model
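Since the question asks what the inputs and outputs of weak_classifier_train should be, the following is a minimal sketch of a correct AdaBoost loop for comparison. This is an illustration, not the quiz's reference answer: it assumes the weak learner additionally receives per-sample weights and, for concreteness, uses decision stumps; the helper names (stump_predict, Adaboost_predict) are my own.

```python
import numpy as np

def weak_classifier_train(X, y, w):
    """Illustrative weak learner: the best decision stump under sample weights.
    Inputs: X (N x d), labels y in {-1, +1}, weights w (N,) summing to 1.
    Outputs: model parameters (feature index, threshold, polarity) and weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= thr, pol, -pol)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best, best_err

def stump_predict(param, X):
    j, thr, pol = param
    return np.where(X[:, j] <= thr, pol, -pol)

def Adaboost_train(train_data, train_label, T):
    N = train_data.shape[0]
    w = np.full(N, 1.0 / N)           # uniform initial sample weights
    ensemble_models = []              # list of (alpha_t, model_param_t)
    for t in range(T):
        model_param_t, err = weak_classifier_train(train_data, train_label, w)
        err = max(err, 1e-12)         # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)       # classifier weight
        pred = stump_predict(model_param_t, train_data)
        w = w * np.exp(-alpha * train_label * pred) # up-weight misclassified samples
        w = w / w.sum()               # renormalize the weight distribution
        ensemble_models.append((alpha, model_param_t))
    return ensemble_models

def Adaboost_predict(ensemble_models, X):
    score = sum(a * stump_predict(p, X) for a, p in ensemble_models)
    return np.sign(score)
```

The key points the buggy code above is missing, under this reading, are the sample-weight distribution, the per-round classifier weight alpha, the weight update, and collecting the learned models into the ensemble.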
Suppose we have two kernel functions $k_1$ and $k_2$ such that there are two implicit high-dimensional feature maps $\phi_1, \phi_2$ that satisfy $k_1(\mathbf{x}, \mathbf{z}) = \langle \phi_1(\mathbf{x}), \phi_1(\mathbf{z}) \rangle$ and $k_2(\mathbf{x}, \mathbf{z}) = \langle \phi_2(\mathbf{x}), \phi_2(\mathbf{z}) \rangle$, where $\langle \cdot, \cdot \rangle$ is the dot product (a.k.a. inner product) in the D-dimensional space.
$\phi_j(\mathbf{x})_i$ denotes the i-th dimension of the j-th mapped feature.
Is the product of two kernel functions, that is, $k(\mathbf{x}, \mathbf{z}) = k_1(\mathbf{x}, \mathbf{z})\, k_2(\mathbf{x}, \mathbf{z})$, still a valid kernel function? If yes, prove it. If no, please explain why. (You can attach images for your derivation.)
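A numerical sanity check is no substitute for a proof, but it can guide your intuition before you write one. The snippet below (my own illustration; the choice of linear and RBF kernels is arbitrary) builds the Gram matrices of two known-valid kernels on random data and checks whether their elementwise product is still positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random 3-D points

# Gram matrices of two known-valid kernels: linear and RBF
K1 = X @ X.T
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)   # pairwise squared distances
K2 = np.exp(-0.5 * D2)

# the product kernel's Gram matrix is the elementwise (Hadamard) product
K = K1 * K2

# a kernel is valid iff every Gram matrix it produces is PSD
eigs = np.linalg.eigvalsh((K + K.T) / 2)
print(eigs.min() >= -1e-9)            # smallest eigenvalue is non-negative here
```

This outcome is consistent with the Schur product theorem (the Hadamard product of two PSD matrices is PSD), which is one route the requested proof can take.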
Assume that the weak learners are a finite set of linear classifiers; Adaboost cannot achieve zero training error if the training data is not linearly separable.
Random forest uses a different subset of the training data to build each decision tree in the ensemble.
Adaboost is an ensemble method; it can be used to boost the performance of any classifier.
Assume that the weak learners are a finite set of decision stumps; subtracting a constant vector, say [1, 0.5, 3, …], from all features will not impact the predictive accuracy on the test set.
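One way to probe this statement is to fix a finite set of stumps in advance and compare accuracy on the same labels before and after subtracting a constant vector. The dataset, the single candidate stump, and the choice of shift below are my own illustration, not part of the question:

```python
import numpy as np

def stump_acc(X, y, j, thr, pol):
    """Accuracy of a fixed stump: predict -pol when X[:, j] <= thr, else pol."""
    return np.mean(np.where(X[:, j] <= thr, -pol, pol) == y)

rng = np.random.default_rng(2)
X = rng.uniform(0.5, 1.5, size=(100, 3))
y = np.where(X[:, 0] > 1.0, 1, -1)      # true boundary at x0 = 1
shift = np.array([1.0, 0.5, 3.0])       # the constant vector from the statement

# a *fixed, finite* stump set: feature 0, threshold 0, either polarity
accs_before = [stump_acc(X, y, 0, 0.0, pol) for pol in (1, -1)]
accs_after = [stump_acc(X - shift, y, 0, 0.0, pol) for pol in (1, -1)]
print(max(accs_before), max(accs_after))
```

Because the candidate thresholds are fixed while the data moves, the best accuracy attainable from the same stump set changes after the shift, which is the scenario the statement asks you to reason about.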
$$\min_{\mathbf{w}, b} \; \sum_{i=1}^{N} \left( y_i - \mathbf{w}^\top \mathbf{x}_i - b \right)^2 + \lambda \|\mathbf{w}\|_2^2$$
The above equation shows a variant of Ridge regression with a bias term $b$. Please show how to calculate $\mathbf{w}$ and $b$, where $\mathbf{x}_i$ is the i-th data sample and $y_i$ is the i-th target value; $\mathbf{w}$ is the vector of model parameters.
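Assuming the objective is the usual squared loss with an unpenalized bias, setting the gradients with respect to $\mathbf{w}$ and $b$ to zero leads to a closed form via centering. The sketch below is my own illustration of that derivation, not a required implementation:

```python
import numpy as np

def ridge_with_bias(X, y, lam):
    """Closed form for min_{w,b} sum_i (y_i - w^T x_i - b)^2 + lam * ||w||^2.
    The bias is not penalized, so d/db = 0 gives b = mean(y) - w^T mean(x),
    and substituting back reduces the problem to ridge on centered data:
        w = (Xc^T Xc + lam * I)^{-1} Xc^T yc."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# sanity check: with noiseless data and lam -> 0, the true coefficients are recovered
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5
w, b = ridge_with_bias(X, y, lam=1e-8)
print(np.allclose(w, [2.0, -1.0], atol=1e-4), np.isclose(b, 0.5, atol=1e-4))
```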
We use the following convolutional neural network to classify a set of 32$\times$32 color images, that is, the input size is 32$\times$32$\times$3:
1) Layer 1: convolutional layer with the ReLU nonlinear activation function, 100 5$\times$5 filters with stride 2.
2) Layer 2: 2$\times$2 max-pooling layer
3) Layer 3: convolutional layer with the ReLU nonlinear activation function, 50 3$\times$3 filters with stride 1.
4) Layer 4: 2$\times$2 max-pooling layer
5) Layer 5: fully-connected layer
6) Layer 6: classification layer
How many parameters are in the first layer (1 point), the second layer (1 point), and the third layer (assume a bias term is used) (1 point)?
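As a way to check your arithmetic, a generic counting helper can be written once and applied layer by layer. The sketch below is my own; it assumes no padding, one bias per filter in each convolutional layer, and that max-pooling layers have no learnable parameters:

```python
def conv_params(num_filters, kh, kw, in_channels, bias=True):
    # each filter has kh * kw * in_channels weights, plus one bias if used
    return num_filters * (kh * kw * in_channels + (1 if bias else 0))

def conv_out(size, k, stride):
    # spatial output size of a convolution with no padding
    return (size - k) // stride + 1

c = 3                                   # input: 32 x 32 x 3
p1 = conv_params(100, 5, 5, c)          # layer 1: 100 filters of 5x5 over 3 channels
side = conv_out(32, 5, 2)               # spatial size after layer 1
side //= 2                              # layer 2: 2x2 max-pooling (no parameters)
p2 = 0
c = 100                                 # layer 3 sees 100 input channels
p3 = conv_params(50, 3, 3, c)           # layer 3: 50 filters of 3x3 over 100 channels
print(p1, p2, p3)
```

Note that the parameter count of a convolutional layer depends only on the filter shape and the number of input channels, not on the spatial size of the input; the spatial size is tracked here only because it is needed for the later fully-connected layer.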