Chapter 2: Optimization
Gradients, convexity, and ALS
DMM, summer 2017
Pauli Miettinen
Contents
• Background
• Gradient descent
• Stochastic gradient descent
• Newton's method
• Alternating least squares
• KKT conditions
Motivation
• We can solve basic least-squares linear systems using SVD
• But what if we have
  • missing values in the data
  • extra constraints for feasible solutions
  • more complex optimization problems (e.g. regularizers)
  • etc.
Gradients, Hessians, and convexity

Derivatives and local optima
• The derivative of a function f: ℝ → ℝ, denoted f', explains its rate of change
  • If it exists
• The second derivative f'' is the change of the rate of change

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

Derivatives and local optima
• A stationary point of differentiable f is x s.t. f'(x) = 0
• f achieves its extremes at stationary points, at points where the derivative doesn't exist, or at infinities (Fermat's theorem)
• Whether a stationary point is a (local) maximum or minimum can be seen from the second derivative (if it exists)
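As a quick sanity check of the second-derivative test (not part of the slides), the sketch below approximates f' and f'' with finite differences for the hypothetical function f(x) = (x − 1)² and confirms that its stationary point is a minimum.

```python
# Minimal sketch (not from the slides): finite-difference approximations of f'
# and f'' for the hypothetical example f(x) = (x - 1)^2.

def f(x):
    return (x - 1.0) ** 2

def derivative(f, x, h=1e-6):
    # forward difference, mirroring the limit definition of f'
    return (f(x + h) - f(x)) / h

def second_derivative(f, x, h=1e-4):
    # central-difference approximation of f''
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

x0 = 1.0
print(derivative(f, x0))         # ~0: x0 is a stationary point
print(second_derivative(f, x0))  # ~2 > 0: the stationary point is a (local) minimum
```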
Partial derivative
• If f is multivariate (e.g. f: ℝ³ → ℝ), we can consider it as a family of functions
• E.g. f(x, y) = x² + y has functions f_x(y) = x² + y and f_y(x) = x² + y
• Partial derivative w.r.t. one variable keeps the other variables constant

\frac{\partial f}{\partial x}(x, y) = f_y'(x) = 2x

Gradient
• The gradient is the derivative for multivariate functions f: ℝⁿ → ℝ
• Here (and later), we assume that the derivatives exist
• The gradient is a function ∇f: ℝⁿ → ℝⁿ
• ∇f(x) points "up" in the function at point x

\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)

Gradient
[Figure: illustration of the gradient of a function]
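The gradient can likewise be approximated numerically. Below is a minimal NumPy sketch (not from the slides) that perturbs one coordinate at a time, exactly as the definition of a partial derivative suggests, and compares the result with the analytic gradient (2x, 1) of the slide's example f(x, y) = x² + y.

```python
import numpy as np

# Minimal sketch (not from the slides): finite-difference gradient of the
# example f(x, y) = x^2 + y, compared with the analytic gradient (2x, 1).

def f(v):
    x, y = v
    return x ** 2 + y

def numerical_gradient(f, v, h=1e-6):
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h                          # perturb coordinate i, keep the others fixed
        grad[i] = (f(v + e) - f(v)) / h   # partial derivative w.r.t. coordinate i
    return grad

v0 = np.array([3.0, -2.0])
print(numerical_gradient(f, v0))          # ~[6, 1]
print(np.array([2 * v0[0], 1.0]))         # analytic gradient at v0
```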
Hessian
• The Hessian is the square matrix of all second-order partial derivatives of a function f: ℝⁿ → ℝ
• As usual, we assume the derivatives exist

H(f) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}

Jacobian matrix
• If f: ℝᵐ → ℝⁿ, then its Jacobian (matrix) is the n×m matrix of partial derivatives of the form below
• The Jacobian is the best linear approximation of f
• H(f(x)) = J(∇f(x))ᵀ

J = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_m} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_m}
\end{pmatrix}

Examples
• Function: f(x, y) = x² + 2xy + y
• Partial derivatives: ∂f/∂x(x, y) = 2x + 2y, ∂f/∂y(x, y) = 2x + 1
• Gradient: ∇f = (2x + 2y, 2x + 1)
• Hessian: H(f) = \begin{pmatrix} 2 & 2 \\ 2 & 0 \end{pmatrix}
• Function: f(x, y) = (x²y, 5x + sin y)
• Jacobian: J(f) = \begin{pmatrix} 2xy & x^2 \\ 5 & \cos y \end{pmatrix}
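The worked examples above can be verified numerically. The sketch below (not from the slides; the vector-valued example is renamed g to keep the two functions apart) builds a finite-difference Jacobian, recovers the Hessian via the identity H(f(x)) = J(∇f(x))ᵀ from the Jacobian slide, and compares with the analytic matrices.

```python
import numpy as np

# Minimal sketch (not from the slides): numerical check of the example
# Hessian and Jacobian, using H(f(x)) = J(∇f(x))^T.

def grad_f(v):                 # analytic gradient of f(x, y) = x^2 + 2xy + y
    x, y = v
    return np.array([2 * x + 2 * y, 2 * x + 1])

def g(v):                      # the vector-valued example, g(x, y) = (x^2 y, 5x + sin y)
    x, y = v
    return np.array([x ** 2 * y, 5 * x + np.sin(y)])

def jacobian(F, v, h=1e-6):
    """Finite-difference Jacobian of F; column j holds the partials w.r.t. x_j."""
    v = np.asarray(v, dtype=float)
    F0 = np.asarray(F(v), dtype=float)
    J = np.zeros((F0.size, v.size))
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        J[:, j] = (np.asarray(F(v + e)) - F0) / h
    return J

v0 = np.array([1.0, 2.0])
print(jacobian(grad_f, v0).T)                   # ~[[2, 2], [2, 0]] = H(f)
print(jacobian(g, v0))                          # ~[[2xy, x^2], [5, cos y]] at v0
print(np.array([[2 * v0[0] * v0[1], v0[0] ** 2],
                [5.0, np.cos(v0[1])]]))         # analytic Jacobian for comparison
```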
Gradient's properties (IMPORTANT!)
• Linearity: ∇(αf + βg)(x) = α∇f(x) + β∇g(x)
• Product rule: ∇(fg)(x) = f(x)∇g(x) + g(x)∇f(x)
• Chain rule:
  • If f: ℝⁿ → ℝ and g: ℝᵐ → ℝⁿ, then ∇(f∘g)(x) = J(g(x))ᵀ∇f(y), where y = g(x)
  • If f is as above and h: ℝ → ℝ, then ∇(h∘f)(x) = h'(f(x))∇f(x)
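Here is a numerical check of the first chain-rule formula (not from the slides), using hypothetical example functions g: ℝ² → ℝ² and f: ℝ² → ℝ and a finite-difference gradient of the composition.

```python
import numpy as np

# Minimal sketch (not from the slides): verify ∇(f∘g)(x) = J(g(x))^T ∇f(g(x))
# for hypothetical f(u, v) = u^2 + v and g(x, y) = (xy, x + y).

def f(u):
    return u[0] ** 2 + u[1]

def grad_f(u):                           # ∇f(u, v) = (2u, 1)
    return np.array([2 * u[0], 1.0])

def g(v):
    x, y = v
    return np.array([x * y, x + y])

def jacobian_g(v):                       # analytic Jacobian of g
    x, y = v
    return np.array([[y, x],
                     [1.0, 1.0]])

def numerical_gradient(F, v, h=1e-6):
    v = np.asarray(v, dtype=float)
    return np.array([(F(v + h * e) - F(v)) / h for e in np.eye(v.size)])

v0 = np.array([2.0, 3.0])
compose = lambda v: f(g(v))
print(numerical_gradient(compose, v0))   # finite-difference gradient of f∘g
print(jacobian_g(v0).T @ grad_f(g(v0)))  # chain-rule formula; matches (~[37, 25])
```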