
AM 221: Advanced Optimization, Spring 2016
Prof. Yaron Singer

Lecture 8 - February 22nd

1 Overview

In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed

the simplex method for finding optimal solutions. In this lecture we return to convex optimization and give a characterization of optimality. The main theorem we will prove today shows that for a convex function f : S → ℝ defined over a convex set S, a stationary point (a point x̄ for which ∇f(x̄) = 0) is a global minimum. We will first review basic definitions and properties from multivariate calculus, prove some important properties of convex functions, and obtain various characterizations of optimality.

2 A Characterization of Convex Functions

Recall the definitions we introduced in the second lecture of convex sets and convex functions:

Definition. A set S is called a convex set if any two points in S contain the line segment between them, i.e. for any x₁, x₂ ∈ S we have that λx₁ + (1 − λ)x₂ ∈ S for any λ ∈ [0, 1].

Definition. For a convex set S ⊆ ℝⁿ, we say that a function f : S → ℝ is convex on S if for any two points x₁, x₂ ∈ S and any λ ∈ [0, 1] we have that:

f(λx₁ + (1 − λ)x₂) ≤ λ f(x₁) + (1 − λ) f(x₂).

We will first give an important characterization of convex functions. To do so, we need to characterize multivariate functions via their Taylor expansion.
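Before moving on, the definition above is easy to check numerically at sampled points. The sketch below is my own illustration, not part of the lecture notes; the convex function f(x) = x² and the sample points are assumed choices.

```python
# Check the convexity inequality f(θx1 + (1-θ)x2) <= θ f(x1) + (1-θ) f(x2)
# at several values of θ, for the assumed convex example f(x) = x**2.

def f(x):
    return x ** 2

x1, x2 = -1.0, 3.0
for theta in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = f(theta * x1 + (1 - theta) * x2)
    rhs = theta * f(x1) + (1 - theta) * f(x2)
    assert lhs <= rhs + 1e-12, (theta, lhs, rhs)
print("convexity inequality holds at all sampled theta")
```

Sampling like this can of course only refute convexity, never prove it; the characterizations below give actual proofs.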

2.1 First-order Taylor approximation

Definition. The gradient of a function f : ℝⁿ → ℝ at point x ∈ ℝⁿ, denoted ∇f(x), is:

∇f(x) := ( ∂f(x)/∂x₁, …, ∂f(x)/∂xₙ )ᵀ

First-order Taylor expansion. For a function f : ℝ → ℝ of a single variable, differentiable at x̄ ∈ ℝ, we write:

∀x ∈ ℝ, f(x) = f(x̄) + f′(x̄)(x − x̄) + o(|x − x̄|)

where by definition:

lim_{x→x̄} o(|x − x̄|) / |x − x̄| = 0

[Figure 1: Depiction of the first-order Taylor expansion.]

Similarly, when f : ℝⁿ → ℝ is a multivariate function, differentiable at x̄ ∈ ℝⁿ, we write:

∀x ∈ ℝⁿ, f(x) = f(x̄) + ∇f(x̄)ᵀ(x − x̄) + o(‖x − x̄‖)

If f is continuous over [x̄, x] and differentiable over (x̄, x), then:

f(x) = f(x̄) + ∇f(z)ᵀ(x − x̄), for some z ∈ [x̄, x].

This characterization implies that the function f can be approximated by the affine function ℓ : x ↦ f(x̄) + f′(x̄)(x − x̄) when x is "close to" x̄.
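The quality of this affine approximation can be observed numerically: the remainder divided by |x − x̄| should vanish as x → x̄. The sketch below is my own illustration; the function f(x) = eˣ and the expansion point x̄ = 1 are assumed choices, not from the notes.

```python
import math

# Affine (first-order Taylor) approximation of f at xbar:
#   l(x) = f(xbar) + f'(xbar) * (x - xbar).
# Assumed example: f(x) = exp(x), so f'(x) = exp(x) as well.

def f(x):
    return math.exp(x)

def fprime(x):
    return math.exp(x)

xbar = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    x = xbar + h
    l = f(xbar) + fprime(xbar) * (x - xbar)
    err = f(x) - l
    # The remainder is o(|x - xbar|): err / h shrinks with h.
    print(h, err / h)
```

Each step shrinks err/h by roughly a factor of 10, consistent with the remainder being of order |x − x̄|².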

2.2 Directional derivatives

Definition. Let f : ℝⁿ → ℝ be a function differentiable at x ∈ ℝⁿ and let us consider d ∈ ℝⁿ with ‖d‖ = 1. We define the derivative of f at x in direction d as:

f′(x, d) = lim_{λ→0} ( f(x + λd) − f(x) ) / λ

Claim 1. f′(x, d) = ∇f(x)ᵀd.

Proof. Using the first-order expansion of f at x:

f(x + λd) = f(x) + ∇f(x)ᵀ(λd) + o(‖λd‖)

hence, dividing by λ (and noticing that ‖λd‖ = |λ|‖d‖):

( f(x + λd) − f(x) ) / λ = ∇f(x)ᵀd + o(‖λd‖)/λ

Letting λ go to 0 concludes the proof. ∎

2.3 Lower bounding convex functions with affine functions

In order to prove the characterization of convex functions in the next section we will need the following lemma. This lemma says that any convex function can essentially be underestimated by an affine function.

Theorem 2. Let f : ℝⁿ → ℝ be differentiable and let S be a convex subset of ℝⁿ. Then f is convex on S if and only if for any x, y ∈ S we have that f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).

Proof. [⟹] Assume f is convex, and let z = λy + (1 − λ)x for some x, y ∈ S and λ ∈ [0, 1]. From convexity of f we have that:

f(z) = f(λy + (1 − λ)x) ≤ λf(y) + (1 − λ)f(x)

and therefore by subtracting f(x) from both sides we get:

f(x + λ(y − x)) − f(x) = f(λy + (1 − λ)x) − f(x) = f(z) − f(x) ≤ λf(y) + (1 − λ)f(x) − f(x) = λ( f(y) − f(x) ).

Thus we get that (for λ > 0):

( f(x + λ(y − x)) − f(x) ) / λ ≤ f(y) − f(x)

Applying Claim 1 with d = y − x we have that:

∇f(x)ᵀ(y − x) = lim_{λ→0⁺} ( f(x + λ(y − x)) − f(x) ) / λ ≤ f(y) − f(x).

[⟸] Assume that f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for any x, y ∈ S; we show that f is convex. Let x, y ∈ S and z = λy + (1 − λ)x. By our assumption we have that:

f(y) ≥ f(z) + ∇f(z)ᵀ(y − z)   (1)
f(x) ≥ f(z) + ∇f(z)ᵀ(x − z)   (2)

Multiplying (1) by λ and adding (1 − λ) times (2) gives us:

λf(y) + (1 − λ)f(x) ≥ f(z) + λ∇f(z)ᵀ(y − z) + (1 − λ)∇f(z)ᵀ(x − z)
  = f(z) + ∇f(z)ᵀ( λy + (1 − λ)x − z )
  = f(λy + (1 − λ)x)

since λy + (1 − λ)x − z = 0. This is exactly the definition of convexity. ∎
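Theorem 2 says that the tangent plane of a convex function sits below its graph everywhere. A small numerical sketch of the inequality (my own illustration; the quadratic f, its closed-form gradient, and the random sample points are assumed choices):

```python
import random

# First-order condition (Theorem 2): for a differentiable convex f,
#   f(y) >= f(x) + grad_f(x)^T (y - x)   for all x, y.
# Assumed convex example: f(x) = x1^2 + 2*x2^2, with grad = (2*x1, 4*x2).

def f(x):
    return x[0] ** 2 + 2 * x[1] ** 2

def grad(x):
    return (2 * x[0], 4 * x[1])

random.seed(0)
for _ in range(100):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    g = grad(x)
    lower = f(x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])
    assert f(y) >= lower - 1e-9
print("affine lower bound verified at 100 random point pairs")
```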

3 Conditions for optimality

Definition. A point x ∈ ℝⁿ at which ∇f(x) = 0 is called a stationary point.

3.1 Necessary conditions

We now want to find necessary and sufficient conditions for local optimality.

Claim 3. If x̄ is a local extremum of a differentiable function f : ℝⁿ → ℝ, then ∇f(x̄) = 0.

Proof. Let us assume that x̄ is a local minimum of f. Then for all d ∈ ℝⁿ, f(x̄) ≤ f(x̄ + λd) for λ small enough. Hence:

0 ≤ f(x̄ + λd) − f(x̄) = λ∇f(x̄)ᵀd + o(‖λd‖)

Dividing by λ > 0 and letting λ → 0⁺, we obtain 0 ≤ ∇f(x̄)ᵀd. Similarly, dividing by λ < 0 and letting λ → 0⁻, we obtain 0 ≥ ∇f(x̄)ᵀd. As a consequence, ∇f(x̄)ᵀd = 0 for all d ∈ ℝⁿ. This implies that ∇f(x̄) = 0.

The case where x̄ is a local maximum can be dealt with similarly. ∎

The above claim states that a stationary point can be a minimum, a maximum, or a saddle point of the function, and we will see in the following section that the Hessian of the function can be used to determine which one. For convex functions, however, it turns out that a stationary point is necessarily a global minimum of the function. Note that together with the above claim, this says that for a convex function a point is optimal if and only if it is stationary.

Proposition 4. Let S ⊆ ℝⁿ be a convex set and f : S → ℝ be a convex function. If x̄ is a stationary point then x̄ is a global minimum.

Proof. From Theorem 2 we know that for any x, y ∈ S: f(y) ≥ f(x) + ∇f(x)ᵀ(y − x). Since ∇f(x̄) = 0 this implies that f(y) ≥ f(x̄). As this holds for any y ∈ S, x̄ is a global minimum. ∎

4 Characterizations of Convexity and Optimality via the Hessian

We will now use the Hessian of a function to obtain characterizations of convexity and optimality. We will begin by defining the Hessian of a function and then see that it plays a role in approximating a function via a second-order Taylor expansion. We will then use semi-definiteness of the Hessian matrix to characterize both conditions of optimality as well as the convexity of a function.

Definition. Given a function f : ℝⁿ → ℝ, its Hessian matrix at point x ∈ ℝⁿ, denoted H_f(x) (also sometimes denoted ∇²f(x)), is:

H_f(x) := [ ∂²f(x)/∂x₁²      ∂²f(x)/∂x₁∂x₂   …   ∂²f(x)/∂x₁∂xₙ
            ∂²f(x)/∂x₂∂x₁    ∂²f(x)/∂x₂²     …   ∂²f(x)/∂x₂∂xₙ
            ⋮                ⋮                ⋱   ⋮
            ∂²f(x)/∂xₙ∂x₁    ∂²f(x)/∂xₙ∂x₂   …   ∂²f(x)/∂xₙ²   ]

Second-order Taylor expansion. When f is twice differentiable it is possible to approximate f by quadratic functions. Assume that f : ℝⁿ → ℝ is twice differentiable at x̄ ∈ ℝⁿ; then:

∀x ∈ ℝⁿ, f(x) = f(x̄) + ∇f(x̄)ᵀ(x − x̄) + ½ (x − x̄)ᵀ H_f(x̄) (x − x̄) + o(‖x − x̄‖²)

where by definition:

lim_{x→x̄} o(‖x − x̄‖²) / ‖x − x̄‖² = 0

Similarly to the first-order alternative expansion, when f is of class C¹ over [x̄, x] and twice differentiable over this interval, we have:

∀x ∈ ℝⁿ, f(x) = f(x̄) + ∇f(x̄)ᵀ(x − x̄) + ½ (x − x̄)ᵀ H_f(z) (x − x̄), for some z ∈ [x̄, x].
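As with the first-order case, the quadratic model can be checked numerically: the error divided by ‖x − x̄‖² should vanish as x → x̄. A sketch (my own illustration; the function f(x₁, x₂) = x₁²x₂ + x₂³ and its hand-computed gradient and Hessian are assumed choices):

```python
import numpy as np

# Second-order Taylor model at xbar:
#   q(x) = f(xbar) + grad^T (x - xbar) + 0.5 (x - xbar)^T H (x - xbar).
# Assumed example: f(x1, x2) = x1^2 * x2 + x2^3.

def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2])

def hess(x):
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], 6 * x[1]]])

xbar = np.array([1.0, 2.0])
for h in [1e-1, 1e-2]:
    d = h * np.array([0.6, -0.8])   # step of norm h
    q = f(xbar) + grad(xbar) @ d + 0.5 * d @ hess(xbar) @ d
    # Remainder is o(||d||^2): error / h^2 shrinks with h.
    print(h, abs(f(xbar + d) - q) / h ** 2)
```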

Semi-definiteness of a matrix. The following classification of symmetric matrices will be useful.

Definition. Let A be a symmetric matrix in ℝⁿˣⁿ. We say that A is:

1. positive definite iff xᵀAx > 0 for all x ∈ ℝⁿ \ {0};
2. negative definite iff xᵀAx < 0 for all x ∈ ℝⁿ \ {0};
3. positive semidefinite iff xᵀAx ≥ 0 for all x ∈ ℝⁿ;
4. negative semidefinite iff xᵀAx ≤ 0 for all x ∈ ℝⁿ;
5. otherwise we say that A is indefinite.

Example: indefinite matrix. Consider the following matrix A:

A := [ 4   1
       1  −2 ]

Then we have xᵀAx = 4x₁² + 2x₁x₂ − 2x₂². For x = (1, 0)ᵀ, we have xᵀAx = 4 > 0. For x = (0, 1)ᵀ we have xᵀAx = −2 < 0. A is therefore indefinite.

The following theorem gives a useful characterization of (semi)definite matrices.
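Anticipating that eigenvalue characterization, the example matrix can be checked directly; a sketch using numpy (only the matrix A and the two test vectors come from the text above):

```python
import numpy as np

# The indefinite example from the notes: A = [[4, 1], [1, -2]].
A = np.array([[4.0, 1.0],
              [1.0, -2.0]])

# Quadratic form x^T A x at the two test vectors used in the text:
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
print(e1 @ A @ e1)   # 4.0  (> 0)
print(e2 @ A @ e2)   # -2.0 (< 0)

# Eigenvalues of a symmetric matrix are real (spectral theorem);
# eigvalsh returns them in ascending order.
eigs = np.linalg.eigvalsh(A)
print(eigs)          # one negative, one positive -> A is indefinite
assert eigs[0] < 0 < eigs[1]
```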

Theorem 5. Let A be a symmetric matrix in ℝⁿˣⁿ.

1. A is positive (negative) definite iff all its eigenvalues are positive (negative);
2. A is positive (negative) semidefinite iff all its eigenvalues are non-negative (non-positive);
3. Otherwise, A is indefinite.

Proof. The spectral theorem tells us that any real symmetric matrix is diagonalizable by an orthogonal matrix: we can write A = PᵀDP where D is a diagonal matrix and P is orthogonal. Then one can write xᵀAx = xᵀPᵀDPx = (Px)ᵀD(Px). Let us define y := Px; then we have:

xᵀAx = Σᵢ₌₁ⁿ λᵢ yᵢ²

where λ₁, …, λₙ are the diagonal elements of D, i.e. the eigenvalues of A. Also note that since P is invertible, y = 0 iff x = 0. If λᵢ > 0 for all i, then we see that xᵀAx > 0 for all x ≠ 0. For the other direction, choosing y such that yᵢ = 1 and yⱼ = 0 for j ≠ i yields λᵢ > 0. The other cases can be dealt with similarly. ∎

4.1 Necessary and sufficient conditions for local extrema

We can now give the second-order necessary conditions for local extrema via the Hessian.

Theorem 6. Let f : ℝⁿ → ℝ be a function twice differentiable at x̄ ∈ ℝⁿ.

1. If x̄ is a local minimum, then H_f(x̄) is positive semidefinite.
2. If x̄ is a local maximum, then H_f(x̄) is negative semidefinite.

Proof. Let us assume that x̄ is a local minimum. We know from Claim 3 that ∇f(x̄) = 0, hence the second-order expansion at x̄ takes the form:

f(x̄ + d) = f(x̄) + ½ dᵀH_f(x̄)d + o(‖d‖²)

Because x̄ is a local minimum, when ‖d‖ is small enough:

0 ≤ f(x̄ + d) − f(x̄) = ½ dᵀH_f(x̄)d + o(‖d‖²)

so dᵀH_f(x̄)d must be non-negative when ‖d‖ is small enough. Since this holds for any direction d, H_f(x̄) is positive semidefinite. The case of a local maximum is similar. ∎

Combining the conditions of Claim 3 (stationary point) and Theorem 6 is almost enough to obtain a sufficient condition for local extrema. We only need to slightly strengthen the conditions of Theorem 6.

Theorem 7. Let f be a function twice differentiable at a stationary point x̄.

1. If H_f(x̄) is positive definite then x̄ is a local minimum.
2. If H_f(x̄) is negative definite then x̄ is a local maximum.

Proof. Let us assume that H_f(x̄) is positive definite. We know that x̄ is a stationary point, so we can write the second-order expansion at x̄:

f(x̄ + d) = f(x̄) + ½ dᵀH_f(x̄)d + o(‖d‖²)

When ‖d‖ is small enough, f(x̄ + d) − f(x̄) has the same sign as ½ dᵀH_f(x̄)d. But because H_f(x̄) is positive definite, for any d ≠ 0 small enough, f(x̄ + d) − f(x̄) > 0. This proves that x̄ is a local minimum. We can make a similar proof when H_f(x̄) is negative definite. ∎

Remark 8. When H_f(x̄) is indefinite at a stationary point x̄, we have what is known as a saddle point: x̄ will be a minimum along the eigenvectors of H_f(x̄) for which the eigenvalues are positive and a maximum along the eigenvectors of H_f(x̄) for which the eigenvalues are negative.
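Remark 8 can be made concrete with the standard example f(x₁, x₂) = x₁² − x₂² (my own illustration, not from the notes): the origin is stationary, and the Hessian's eigenvectors give the increasing and decreasing directions.

```python
import numpy as np

# Saddle point illustration: f(x1, x2) = x1^2 - x2^2 (assumed example).
# At the stationary point (0, 0), H_f = [[2, 0], [0, -2]] is indefinite.

def f(p):
    return p[0] ** 2 - p[1] ** 2

H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
eigvals, eigvecs = np.linalg.eigh(H)   # ascending eigenvalues, columns = eigenvectors

t = 1e-3
for lam, v in zip(eigvals, eigvecs.T):
    # Along an eigenvector, f(0 + t*v) - f(0) is approximately 0.5 * lam * t^2,
    # so the sign of the change matches the sign of the eigenvalue.
    change = f(t * v) - f(np.zeros(2))
    print(lam, change)
    assert (change > 0) == (lam > 0)
```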

4.2 Characterization of convexity

Theorem 9. Let S ⊆ ℝⁿ be convex, and f : S → ℝ twice continuously differentiable on S.

1. If H_f(x) is positive semi-definite for any x ∈ S then f is convex on S.
2. If H_f(x) is positive definite for any x ∈ S then f is strictly convex on S.
3. If S is open and f is convex, then H_f(x) is positive semi-definite for all x ∈ S.

Proof.

1. From the Taylor expansion of f we know that for some z ∈ [x, y]:

f(y) = f(x) + ∇f(x)ᵀ(y − x) + ½ (y − x)ᵀH_f(z)(y − x)

If H_f(z) is positive semi-definite, this necessarily implies that:

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

and from Theorem 2 we get that f is convex.

2. If H_f(z) is positive definite, we have for y ≠ x that:

f(y) > f(x) + ∇f(x)ᵀ(y − x).

Applying the same idea as in Theorem 2 we can show that in this case f is strictly convex.

3. Let f be a convex function and assume S is open. For x ∈ S and λ > 0 small enough, for any d ∈ ℝⁿ we have that x + λd ∈ S. From the Taylor expansion of f we get:

f(x + λd) = f(x) + λ∇f(x)ᵀd + (λ²/2) dᵀH_f(x)d + o(‖λd‖²)

From Theorem 2 we get that if f is convex then:

f(x + λd) ≥ f(x) + λ∇f(x)ᵀd.

Therefore, we have that for any d ∈ ℝⁿ:

(λ²/2) dᵀH_f(x)d + o(‖λd‖²) ≥ 0

Dividing by λ² and taking λ → 0⁺ gives us that for any d ∈ ℝⁿ: dᵀH_f(x)d ≥ 0. ∎

Remark 10. It is important to note that if S is open and f is strictly convex, H_f(x) is still only guaranteed to be positive semi-definite for all x ∈ S. Consider f(x) = x⁴, which is strictly convex: the Hessian is H_f(x) = 12x², which equals 0 at x = 0.
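A quick check of the remark (a pure-Python sketch of my own): f(x) = x⁴ satisfies the strict tangent-line inequality of Theorem 2 at x = 0, yet its second derivative vanishes there.

```python
# Remark 10: f(x) = x**4 is strictly convex, yet f''(x) = 12 * x**2
# vanishes at x = 0, so the Hessian at the origin is only positive
# SEMI-definite, not positive definite.

def f(x):
    return x ** 4

def f2(x):
    # second derivative of f
    return 12 * x ** 2

print(f2(0.0))   # 0.0 -- not positive definite at the origin

# Strict convexity still holds: the tangent at 0 (the x-axis) lies
# strictly below the graph for every x != 0:
for x in [-2.0, -0.5, 0.1, 3.0]:
    assert f(x) > 0.0   # f(x) > f(0) + f'(0) * (x - 0) = 0
```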