# An assignment from Japan on the fundamentals of artificial intelligence

Instruction

• Submit an A4-sized or letter-sized report (PDF file) by 23:59 on 30th November 2022 via T2SCHOLA.
• You can use either Japanese or English.
• You may submit a handwritten manuscript or use a word processor such as MS Word or TeX.
• You can earn 100 pts by solving all the problems in this document, or 90 pts by solving all the problems except those marked (Option). The score earned in this assignment is used as part of the final score of this lecture. The final score is based on the sum of this report and the scores earned in Prof. Okazaki's part.

Problem 1

Solve the following problems on linear algebra and probability theory. Here, we assume a vector is a column vector instead of a row vector. Let $\top$ denote the transpose operation; i.e., let $x^\top \in \mathbb{R}^{1 \times d}$ be the transpose of $x \in \mathbb{R}^{d} = \mathbb{R}^{d \times 1}$.

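In problems P1-A and P1-B, let $f(A, b, c, d, x) = \frac{1}{2}\|Ax - b\|_2^2 + c^\top x + d$ be a function of $x \in \mathbb{R}^{d}$, $A \in \mathbb{R}^{m \times d}$, $b \in \mathbb{R}^{m}$, $c \in \mathbb{R}^{d}$, and $d \in \mathbb{R}$.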
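One way to sanity-check a hand-derived derivative of $f$ is to evaluate $f$ numerically and compare against central finite differences. The sketch below is only an illustration in plain Python: the helper names `matvec`, `f`, and `numerical_grad_x` and the small test instance are assumptions, not part of the assignment, and the check is done with respect to $x$ (the same idea applies to $b$ or to the entries of $A$).

```python
def matvec(A, x):
    """Multiply an m x d matrix (list of rows) by a length-d vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def f(A, b, c, d, x):
    """Evaluate f = 0.5 * ||A x - b||_2^2 + c^T x + d."""
    residual = [r - b_i for r, b_i in zip(matvec(A, x), b)]
    quad = 0.5 * sum(r * r for r in residual)
    linear = sum(c_j * x_j for c_j, x_j in zip(c, x))
    return quad + linear + d


def numerical_grad_x(A, b, c, d, x, eps=1e-6):
    """Approximate df/dx by central differences, one coordinate at a time."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((f(A, b, c, d, x_plus) - f(A, b, c, d, x_minus)) / (2 * eps))
    return grad


# Hypothetical small instance with m = 2 rows and dimension 2.
A = [[1.0, 2.0], [3.0, 4.0]]
b = [1.0, 1.0]
c = [0.5, -0.5]
d = 2.0
x = [1.0, 1.0]

value = f(A, b, c, d, x)
grad = numerical_grad_x(A, b, c, d, x)
print(value)
print(grad)
```

Comparing `grad` with the value of an analytical expression at the same point is a quick way to catch sign or transpose mistakes in a derivation.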

P1-A Derive the following derivatives: $\frac{\partial f}{\partial A}$ and $\frac{\partial f}{\partial b}$.

P1-B Here $f$ is rewritten as a function of $x$ alone, denoted $\tilde{f}$, and we denote the value of $x$ that minimizes $\tilde{f}$ by $\hat{x} = \operatorname{argmin}_{x \in \mathbb{R}^{d}} \tilde{f}(x)$. Derive the analytical solution for $\hat{x}$. Here we assume $A^\top A$ is a positive definite matrix. (You may use the fact that a positive definite matrix is also symmetric.)

P1-C Let $A \in \mathbb{R}^{m \times n}$, and let $B \in \mathbb{R}^{n \times n}$ be a square matrix. Derive the following derivative: $\frac{\partial}{\partial A} \operatorname{Tr}(A B A^\top)$, where $\operatorname{Tr}$ denotes the trace operation.

P1-D Let $x$ and $y$ be real-valued random variables. Show that the variance of their sum is $V[x + y] = V[x] + V[y] + 2\,\mathrm{COV}[x, y]$, where $\mathrm{COV}[x, y]$ is the covariance between $x$ and $y$.
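The identity in P1-D can be checked numerically before proving it. The following sketch uses empirical (population-normalized) moments on synthetic samples; the helper names and the way `ys` is generated are illustrative assumptions. A numerical check like this is not a proof, but it should agree up to floating-point rounding.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared deviation from the mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def covariance(xs, ys):
    """Population covariance between two equally long samples."""
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# ys is deliberately correlated with xs so the covariance term is non-zero.
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

lhs = variance([x + y for x, y in zip(xs, ys)])
rhs = variance(xs) + variance(ys) + 2.0 * covariance(xs, ys)
print(lhs, rhs)
```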

Problem 2

Solve the following problems on linear regression and the effect of regularization. As shown in the lecture, linear regression, also known as the least squares problem, is defined as the following optimization problem:

$$\hat{w}_{\mathrm{LS}} = \operatorname{argmin}_{w} \frac{1}{2}\|y - Xw\|_2^2,$$

where the design matrix, the response vector, and the parameter are represented by $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^{n}$, and $w \in \mathbb{R}^{d}$, respectively (review the lecture slides and check the video if necessary).

Though regression through this optimization may work properly under some conditions, this model is also known to be prone to overfitting. As a simple approach to tackling overfitting, ridge regularization is frequently employed in the machine learning community. The resulting optimization problem is defined as follows:

$$\hat{w}_{\mathrm{ridge}} = \operatorname{argmin}_{w} \frac{1}{2}\|y - Xw\|_2^2 + \frac{\lambda}{2}\|w\|_2^2.$$
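To make the two objectives concrete, the sketch below evaluates them for a given parameter vector, assuming the usual reading of the formulas above (half the squared residual norm, plus $\frac{\lambda}{2}$ times the squared norm of $w$ for ridge). The function names and the toy data are hypothetical.

```python
def predict(X, w):
    """Compute X w for a design matrix given as a list of rows."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


def ls_objective(X, y, w):
    """Least-squares objective: 0.5 * ||y - X w||_2^2."""
    return 0.5 * sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, predict(X, w)))


def ridge_objective(X, y, w, lam):
    """Least-squares objective plus the penalty (lam / 2) * ||w||_2^2."""
    return ls_objective(X, y, w) + 0.5 * lam * sum(w_j * w_j for w_j in w)


# Hypothetical toy data: n = 3 samples, 2 features (first column is a constant).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0, 2.0]
w = [1.0, 0.5]

ls_value = ls_objective(X, y, w)
ridge_value = ridge_objective(X, y, w, lam=2.0)
print(ls_value, ridge_value)
```

With `lam=0.0` the ridge objective reduces to the least-squares one, so the penalty term is the only difference between the two problems.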

P2-A Obtain the analytical solutions for $\hat{w}_{\mathrm{LS}}$ and $\hat{w}_{\mathrm{ridge}}$. Here we assume $X^\top X$ is regular, i.e., $(X^\top X)^{-1}$ exists.

P2-B Explain the procedure of cross validation and the reasons why we need it in machine learning.
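As an illustration of the data-splitting step only, here is a minimal sketch of k-fold cross validation, one common variant; the lecture may define the procedure differently. The function names, the fold count, and the seed are assumptions, and fitting and scoring a model on each split is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, validation


splits = list(cross_validation_splits(n_samples=10, k=5))
for train, validation in splits:
    print(sorted(train), sorted(validation))
```

In a typical use, a model would be trained on `train` and evaluated on `validation` for each split, and the scores averaged, for example to compare candidate values of a hyperparameter such as $\lambda$.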

(Option) P2-C Prove that $X^\top X + \lambda I$ is regular even if $X^\top X$ is not regular. This means that the optimal parameter $\hat{w}_{\mathrm{ridge}}$ is available whether $X^\top X$ is regular or not. Here $I \in \mathbb{R}^{d \times d}$ represents the identity matrix and $\lambda > 0$ denotes the hyperparameter of the regularization.

(Option) P2-D Assuming that $X^\top X$ is regular, prove that $\|X\hat{w}_{\mathrm{LS}}\|_2^2 \geq \|X\hat{w}_{\mathrm{ridge}}\|_2^2$. This result is also known as shrinkage in machine learning. Explain situation(s) where shrinkage works effectively in machine learning.

Problem 3

Solve the following problems on linear classification and numerical optimization. We consider linear binary classification where an input, an output, and the parameter of the model are represented by $x \in \mathbb{R}^{d}$, $y \in \mathbb{R}$, and $w \in \mathbb{R}^{d}$, respectively. Here we seek the optimal parameter $w$ for the given training dataset $\{x_i, y_i\}_{i=1}^{n}$ by employing logistic regression. Specifically, the optimal parameter is obtained by solving the following optimization problem:

$$\hat{w} = \operatorname{argmin}_{w} J(w),$$

$$J(w) := \sum_{i=1}^{n} \ln\left(1 + \exp\left(-y_i w^\top x_i\right)\right) + \frac{\lambda}{2}\|w\|_2^2,$$

where $\lambda$ denotes the hyperparameter of ridge regularization.
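The objective $J$ can be evaluated directly, which is useful for monitoring an optimizer. The sketch below assumes the standard logistic-loss form with $-y_i w^\top x_i$ inside the exponential and labels in $\{-1, +1\}$; both are assumptions about the intended setup, and the helper names and toy data are hypothetical. The helper `log1p_exp` avoids overflow when the exponent is large.

```python
import math


def log1p_exp(z):
    """Numerically stable ln(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def logistic_objective(w, xs, ys, lam):
    """J(w) = sum_i ln(1 + exp(-y_i w^T x_i)) + (lam / 2) * ||w||_2^2."""
    loss = 0.0
    for x_i, y_i in zip(xs, ys):
        margin = y_i * sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        loss += log1p_exp(-margin)
    return loss + 0.5 * lam * sum(w_j * w_j for w_j in w)


# Hypothetical toy data with labels assumed to be in {-1, +1}.
xs = [[1.0, 2.0], [-1.0, -1.5], [2.0, 0.5]]
ys = [1.0, -1.0, 1.0]

j_at_zero = logistic_objective([0.0, 0.0], xs, ys, lam=0.1)
print(j_at_zero)
```

At $w = 0$ every sample contributes $\ln 2$ and the penalty vanishes, which gives a simple reference value for testing an implementation.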

P3-A Describe the mechanism of gradient descent methods frequently used in machine learning, and explain the reason(s) why we need such a numerical method to obtain $\hat{w}$ in logistic regression.
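For orientation, the basic update rule of plain gradient descent, repeatedly stepping against the gradient with a fixed step size, can be sketched as follows. This is a generic illustration, not the lecture's specific variant: the function names, step size, and iteration count are assumptions, and a simple quadratic with a known minimizer stands in for the logistic objective.

```python
def gradient_descent(grad, w0, step_size=0.1, n_steps=100):
    """Repeat w <- w - step_size * grad(w) for a fixed number of steps."""
    w = list(w0)
    for _ in range(n_steps):
        g = grad(w)
        w = [w_j - step_size * g_j for w_j, g_j in zip(w, g)]
    return w


# Stand-in objective (not the logistic loss):
# h(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2, minimized at (3, -1).
def grad_h(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_hat = gradient_descent(grad_h, w0=[0.0, 0.0], step_size=0.1, n_steps=200)
print(w_hat)
```

In practice the step size matters: too large a value can make the iterates diverge, while too small a value slows convergence.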

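P3-B Derive $\frac{\partial J(w)}{\partial w} \in \mathbb{R}^{d}$ and $\frac{\partial}{\partial w}\left(\frac{\partial J(w)}{\partial w}\right) \in \mathbb{R}^{d \times d}$. They are known as the gradient and the Hessian of $J$ with respect to $w$, respectively.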

(Option) P3-C As explained in the lecture, the necessary condition for optimality is $\frac{\partial J(w)}{\partial w} = 0$.

It should be noted that $\frac{\partial J(w)}{\partial w} = 0$ is also a sufficient condition for optimality in logistic regression with regularization strength $\lambda > 0$. This indicates that the parameter $\tilde{w}$ such that $\left.\frac{\partial J(w)}{\partial w}\right|_{w = \tilde{w}} = 0$ globally minimizes the objective function $J$. Explain the reasons why $\frac{\partial J(w)}{\partial w} = 0$ is the necessary and sufficient condition for optimality in logistic regression.