Analysis of Optimization Algorithms via Sum-of-Squares

Sandra S. Y. Tan, Antonios Varvitsiotis, Vincent Y. F. Tan

June 23, 2021

Abstract

We introduce a new framework for unifying and systematizing the performance analysis of first-order black-box optimization algorithms for unconstrained convex minimization. The low-cost iteration complexity enjoyed by first-order algorithms renders them particularly relevant for applications in machine learning and large-scale data analysis. Relying on sum-of-squares (SOS) optimization, we introduce a hierarchy of semidefinite programs that give increasingly better convergence bounds for higher levels of the hierarchy. Alluding to the power of the SOS hierarchy, we show that the (dual of the) first level corresponds to the Performance Estimation Problem (PEP) introduced by Drori and Teboulle [Math. Program., 145(1):451-482, 2014], a powerful framework for determining convergence rates of first-order optimization algorithms. Consequently, many results obtained within the PEP framework can be reinterpreted as degree-1 SOS proofs, and thus, the SOS framework provides a promising new approach for certifying improved rates of convergence by means of higher-order SOS certificates. To determine analytical rate bounds, in this work we use the first level of the SOS hierarchy and derive new results for noisy gradient descent with inexact line search methods (Armijo, Wolfe, and Goldstein).

1 Introduction

The pervasiveness of machine learning and big-data analytics throughout most academic fields and industrial domains has triggered renewed interest in convex optimization, the subfield of mathematical optimization that is concerned with minimizing a convex objective function over a convex set of decision variables. Of particular relevance for solving large-scale convex optimization problems with low accuracy requirements are first-order algorithms, defined as iterative algorithms that only use (sub)gradient information.

* Sandra S. Y. Tan was with the Department of Electrical and Computer Engineering, National University of Singapore (sandra_tsy@u.nus.edu). Antonios Varvitsiotis was with the Department of Electrical and Computer Engineering and the Department of Industrial Systems Engineering and Management, National University of Singapore (avarvits@gmail.com). Vincent Tan is with the Department of Electrical and Computer Engineering and the Department of Mathematics, National University of Singapore (vtan@nus.edu.sg).

There exists extensive literature on the convergence analysis of first-order optimization algorithms with respect to various performance metrics; see, e.g., [2, 4-6] and the references therein. However, existing convergence results typically rely on case-by-case analyses and cannot be understood through a common guiding principle. In this work we introduce a unified framework for deriving worst-case upper bounds on the convergence rates of first-order optimization algorithms through the use of sum-of-squares (SOS) optimization.

SOS optimization is an active research area with important practical applications; see, e.g., [3, 21, 22, 29, 30]. The key idea underlying SOS optimization is to use semidefinite programming (SDP) relaxations for certifying the nonnegativity of a polynomial over a set defined by polynomial (in)equalities. This makes it possible to construct hierarchies of SDPs that approximate the optimal value of arbitrary polynomial optimization problems.

To illustrate the main ingredients of our approach, consider the problem of minimizing a convex function f : R^n → R over R^n, i.e., min_{x ∈ R^n} f(x), and let x^* be a global minimizer. Any solution strategy entails choosing a black-box algorithm A that generates a sequence of iterates {x_k}_{k ≥ 1}. Our goal is then to estimate the worst-case convergence rate of A with respect to a fixed family of functions F and an appropriate measure of performance (e.g., distance to optimality ‖x_k - x^*‖ or objective function accuracy f(x_k) - f(x^*)). For concreteness, using as performance metric the objective function accuracy and a first-order algorithm A that does not increase the objective function value at each step, we seek to solve the following optimization problem:

τ^* =  minimize    τ
       subject to  f_{k+1} - f^* ≤ τ (f_k - f^*),
                   x_{k+1} = A(x_0, ..., x_k, f_0, ..., f_k, g_0, ..., g_k),
                   for all f ∈ F,                                            (1)

where we set f^* = f(x^*), f_k = f(x_k) and g_k = ∇f(x_k) for all k ≥ 1. As the optimization problem (1) is hard in general, we relax it into a tractable convex program (in fact, an SDP) in two steps. In the first step, we derive necessary conditions that are expressed as polynomial inequalities h_1(z) ≥ 0, ..., h_m(z) ≥ 0 and equalities v_1(z) = 0, ..., v_{m'}(z) = 0 in terms of the variables in z = (f^*, f_k, f_{k+1}, x^*, x_k, x_{k+1}, g^*, g_k, g_{k+1}), which are dictated by the choice of the algorithm and the corresponding class of functions. Having identified these necessary polynomial constraints, the first relaxation of the optimization problem (1) is to find the minimum τ ∈ (0, 1) such that the polynomial τ (f_k - f^*) - (f_{k+1} - f^*) is nonnegative over the semi-algebraic set

K = { z : h_i(z) ≥ 0, i ∈ [m], v_j(z) = 0, j ∈ [m'] },

where here and throughout we use the notation [m] = {1, ..., m}. Nevertheless, as this second problem is also hard in general, in the second step we further relax this constraint by demanding that the nonnegativity of the polynomial τ (f_k - f^*) - (f_{k+1} - f^*) over K is certified by an SOS decomposition:

τ (f_k - f^*) - (f_{k+1} - f^*) = σ_0(z) + Σ_{i=1}^{m} σ_i(z) h_i(z) + Σ_{j=1}^{m'} λ_j(z) v_j(z),        (2)

where the σ_i(z)'s are SOS polynomials and the λ_j(z)'s are arbitrary polynomials. Clearly, expression (2) certifies that τ (f_k - f^*) - (f_{k+1} - f^*) is nonnegative over the semi-algebraic set K. Furthermore, once the degree of the σ_i's and the λ_j's has been fixed, the problem of finding the least τ ∈ (0, 1) such that (2) holds is an instance of an SDP, and thus, it can be solved efficiently.
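To make this last step concrete, the following minimal sketch (assuming the Python packages numpy and cvxpy, and using toy placeholder data rather than a performance-estimation instance) shows how a fixed-degree certificate search becomes a single SDP in the simplest case: all polynomials are quadratic, the multipliers σ_i are nonnegative constants, and each polynomial is represented by its Gram matrix in the monomial basis (1, z). The placeholder data below encode the toy certificate that the maximum of x^2 over [-1, 1] is at most τ.

import numpy as np
import cvxpy as cp

# Quadratic forms in the monomial basis (1, x): p(x) = (1, x) P (1, x)^T.
A = np.array([[1.0, 0.0], [0.0, 0.0]])      # the constant polynomial 1
B = np.array([[0.0, 0.0], [0.0, 1.0]])      # the polynomial x^2
H = [np.array([[1.0, 0.0], [0.0, -1.0]])]   # the constraint h_1(x) = 1 - x^2 >= 0

tau = cp.Variable(nonneg=True)              # candidate bound
lam = cp.Variable(len(H), nonneg=True)      # degree-0 SOS multipliers

# tau*1 - x^2 - sum_i lam_i h_i(x) must be a sum of squares,
# i.e. its Gram matrix must be positive semidefinite.
gram = tau * A - B - sum(lam[i] * H[i] for i in range(len(H)))
problem = cp.Problem(cp.Minimize(tau), [gram >> 0])
problem.solve()
print(problem.status, float(tau.value))     # expect tau close to 1

Higher levels of the hierarchy follow the same pattern with polynomial (rather than constant) multipliers, at the price of larger Gram matrices.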

1.1 Related Work

Performance Estimation Problem. Our work was motivated by the recent framework introduced by Drori and Teboulle [10] that casts the search for worst-case rate bounds as an infinite-dimensional optimization problem:

maximize_{f, x_0, ..., x_N, x^*}   f(x_N) - f(x^*)
subject to                         f ∈ F,
                                   x^* is a minimizer of f on R^n,
                                   ‖x_0 - x^*‖ ≤ R,
                                   x_0, ..., x_N, x^* ∈ R^n,                 (PEP)

called the Performance Estimation Problem (PEP). A series of recent works has highlighted the PEP as an extremely useful tool in various settings, including the study of worst-case guarantees for first-order optimization algorithms [7, 8, 10, 36-39], the design of optimal methods [9-11, 19, 20], and the study of worst-case guarantees for solving monotone inclusion problems [16, 18, 24, 32]. The PEP captures the worst-case objective function accuracy over all functions within F, after N iterations of the algorithm A from any starting point x_0 which is within distance R from some minimizer x^*. Although the PEP is infinite-dimensional (as its search space includes all functions in the class F), it can be transformed into an equivalent finite-dimensional problem using the (smooth) convex interpolation approach introduced in [38]. Following [38], the functional constraint f ∈ F is discretized by introducing 2(N + 2) additional variables capturing the value and the gradient of the function at the points x_0, ..., x_N, x^*. Specifically, setting I = {0, 1, ..., N, *}, the finite-dimensional problem

maximize_{ {x_i, g_i, f_i}_{i ∈ I} }   f_N - f^*
subject to   ∃ f ∈ F such that f_i = f(x_i), g_i = ∇f(x_i) for all i ∈ I,
             x_{k+1} = A(x_0, ..., x_k, f_0, ..., f_k, g_0, ..., g_k),   k = 0, ..., N - 1,
             g^* = 0,   ‖x_0 - x^*‖ ≤ R,                                     (f-PEP)

with decision variables {x_i, g_i, f_i}_{i ∈ I}, is equivalent to the PEP in the sense that their optimal values coincide and an optimal solution to the PEP can be transformed into an optimal solution to the f-PEP (and conversely).

The seemingly simple step of reformulating the PEP into the f-PEP by discretizing and introducing interpolability constraints leads naturally to a powerful approach for evaluating (or upper bounding) the value of the f-PEP. Specifically, if interpolability with respect to F and the iterates generated by A satisfy conditions that are linear in f = (f_0, ..., f_N, f^*) and in the entries of the Gram matrix G = X^T X, where X = (x_0 ... x_N x^* g_0 ... g_N g^*) ∈ R^{n × 2(N+2)}, then the value of the f-PEP is upper bounded by the SDP defined by all necessary functional and algorithmic constraints, as well as the appropriate reformulations in terms of f and G of the optimality condition g^* = 0 and the initialization condition ‖x_0 - x^*‖ ≤ R. Moreover, in the case where interpolability with respect to F and the first-order method under consideration are both linearly Gram-representable, i.e., exactly characterized by a finite number of constraints that are linear in f and in the entries of G, the corresponding SDP relaxation of the f-PEP is tight for large enough values of n.

Interpolability conditions have been formulated exactly for various function classes, including the class of L-smooth and μ-strongly convex functions [38, Theorem 5], indicator and support functions [37, Section 3.3], and smooth and nonconvex functions [37, Section 3.4]. In terms of the tightness of the SDP relaxation of the f-PEP, in the case where F is one of the aforementioned function classes and the corresponding algorithm is a fixed-step linear first-order method as defined in [37, Definition 2.11], the SDP relaxation is tight as long as 2(N + 1) ≤ n [37, Proposition 2.6].
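As a concrete illustration of Gram-representability (a standard observation in the PEP literature, spelled out here for convenience), let e_u denote the standard basis vector selecting the column of X associated with the point u. Then the initialization and optimality constraints above become linear in G:

‖x_0 - x^*‖^2 = (e_{x_0} - e_{x^*})^T G (e_{x_0} - e_{x^*}) = ⟨(e_{x_0} - e_{x^*})(e_{x_0} - e_{x^*})^T, G⟩ ≤ R^2,
g^* = 0  ⟺  ‖g^*‖^2 = ⟨e_{g^*} e_{g^*}^T, G⟩ = 0,

and neither constraint involves the vector f of function values.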

Integral Quadratic Constraints. A competing approach that uses SDPs to analyze iterative optimization algorithms was introduced in [23]. In this setting, the minimizers of the function of interest are mapped to the fixed points of a discrete-time linear dynamical system with a nonlinear feedback law, whose convergence is then analyzed using integral quadratic constraints (IQCs). The IQC approach allows one to derive analytical and numerical upper bounds on the convergence rates for various algorithms by solving small SDPs. For instance, the algorithms considered in [23] include the gradient method, the heavy-ball method, and Nesterov's accelerated method (and related variants), applied to smooth and strongly convex functions. The line of research initiated in [23] has been generalized further in various directions. Some notable examples include the convergence analysis of the ADMM method [27], the case of non-strongly convex objective functions [13], the generalization to stochastic algorithms [17], and the design of first-order optimization algorithms [12]. In addition, an approach drawing upon ideas from both the PEP and IQC frameworks, together with a comparison between the two, was proposed in [35].
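To fix ideas (a standard way of phrasing the reduction, sketched here for illustration rather than quoted from [23]), gradient descent with a constant step size α fits this mold by viewing the iterate as the state of a linear system that is closed through the gradient map:

ξ_{k+1} = ξ_k - α u_k,     u_k = ∇f(y_k),     y_k = ξ_k,

so that the fixed points of the interconnection are exactly the stationary points of f. The IQC machinery then certifies a contraction rate for this feedback loop by checking the feasibility of a small linear matrix inequality whose size does not grow with the problem dimension.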

1.2 Summary of Results

In most instances where the PEP framework has been applied in the literature, close inspection of the proofs of the analytic worst-case bounds reveals that they can be reinterpreted as simple, i.e., low-degree, SOS certificates; see, e.g., [39, Appendix A], [38, Section 3.6], and [7, Section 4.1]. This observation is the point of departure for our work, whose aim is to unify the aforementioned results and, additionally, to make the search for the underlying SOS certificates explicit.

As it turns out, the connection between the PEP and the SOS framework is an instance of SDP duality. Specifically, we have mentioned that relaxing the f-PEP into an SDP requires the functional and algorithmic constraints to imply linear constraints of the form

⟨c_i, f⟩ + ⟨C_i, G⟩ ≤ a_i    or    ⟨d_j, f⟩ + ⟨D_j, G⟩ = b_j,        (3)

for appropriate vectors c_i, d_j, matrices C_i, D_j and scalars a_i, b_j. Nevertheless, notice that an equivalent way of expressing the constraints in (3) is as polynomial constraints in the variables f^*, f_k, f_{k+1}, x^*, x_k, x_{k+1}, g^*, g_k, g_{k+1}. Specifically, setting z_0 = (f^*, f_k, f_{k+1}) and z_ℓ = (x^*(ℓ), x_k(ℓ), x_{k+1}(ℓ), g^*(ℓ), g_k(ℓ), g_{k+1}(ℓ)), where we use x(ℓ) to denote the ℓ-th coordinate of x for ℓ ∈ [n], the constraints in (3) may be equivalently expressed as

⟨c_i, z_0⟩ + Σ_{ℓ=1}^{n} z_ℓ^T C_i z_ℓ ≤ a_i    or    ⟨d_j, z_0⟩ + Σ_ℓ z_ℓ^T D_j z_ℓ = b_j,        (4)

i.e., as polynomials in the variables z_0, z_1, ..., z_n, to which we apply the SOS framework. Formalizing this connection, in Theorem 3 we show that the dual of the first level of the SOS hierarchy is equivalent to the PEP when the functional and algorithmic constraints are linearly Gram-representable. This allows us to reinterpret existing rate bounds derived within the PEP framework as degree-1 SOS certificates.

Nevertheless, despite its many successful applications, the PEP framework does not offer a systematic way by which the SDP relaxation can be strengthened when the function class under consideration or the employed algorithm is not linearly Gram-representable. Indeed, recall that to go from the PEP to an SDP we take two relaxation steps. In the first step we extract necessary (quadratic) conditions that are dictated by the interpolability with respect to F and the algorithm A. In the second step, we use the identified conditions to formulate an SDP, which gives the desired rate bounds. Now, it is clear that if the first relaxation step is loose, then the value of the SDP is not necessarily equal to the value of the PEP. In such a setting, there is no systematic way to strengthen the PEP-SDP, whereas the sum-of-squares approach clearly provides a solution: just consider a higher level of the hierarchy. This is exactly why we believe that the sum-of-squares approach is an interesting and complementary approach to the Gram matrix approach of Taylor et al. [38].

On the other hand, the SOS approach provides a systematic framework for finding better (i.e., smaller) bounds on the worst-case contraction factor of descent algorithms by using higher levels of the SOS hierarchy. It is worth noting, though, that this flexibility comes at a computational cost, in the sense that the SDPs obtained via the SOS hierarchy are dimension-dependent, i.e., any performance certificate generated by the model only applies to functions over a domain with a fixed dimension n. To overcome this issue, we show in Theorem 2 that in the specific setting studied in this work (cf. Section 2.3), a degree-1 certificate for the univariate case (i.e., n = 1) can be lifted to a degree-1 certificate for the general case (n > 1). Nevertheless, we have been unable to extend this lifting procedure to higher-order SOS certificates. As our goal is to identify analytic rates, the inability to work with general n has forced us to only consider degree-1 certificates. We leave the consideration of higher-degree certificates to future work.

In terms of using the SOS hierarchy to derive new convergence results, we focus on gradient descent applied to L-smooth, μ-strongly convex functions, where the step size is chosen using inexact line search methods. Specifically, in Theorem 4, Theorem 5 and Theorem 6 we respectively study the Armijo, Wolfe, and Goldstein conditions with step size selection in both the noisy and noiseless settings. Denoting by ε ∈ [0, 1) the noise level in the gradient estimation (see (22)), our main results are the following rate bounds:

Gradient descent with Armijo-terminated line search:

f_{k+1} - f^* ≤ ( 1 - 4δ(1-δ) (1-ε)^2/(1+ε)^2 · μ/L ) (f_k - f^*),

which is valid for any noise level ε ∈ [0, 1), algorithm parameters δ ∈ (0, (1-ε)/(1+ε)^2) and γ > 1.

Gradient descent with Goldstein-terminated line search:

f_{k+1} - f^* ≤ ( 1 - 4δ(1-δ)^2 (1-ε)^2/(1+ε)^2 · μ/L ) (f_k - f^*).
