CS281B/Stat241B. Statistical Learning Theory.

Lecture 14.

Wouter M. Koolen

Menu

Today we solve new online learning problems by reducing them to problems/algorithms/analyses we already cracked before.

• Convex losses
• Exp-concave losses
• Mixable losses
• The gradient trick
• Specialists

Prediction with Expert Advice

Prediction with expert advice:

Protocol:
• For $t = 1, 2, \dots$
  - Experts announce actions $a_t^1, \dots, a_t^K \in \mathcal{A}$.
  - Learner chooses an action $a_t \in \mathcal{A}$.
  - Adversary reveals outcome $x_t \in \mathcal{X}$.
  - Learner incurs loss $L(a_t, x_t)$.

Goal: small regret w.r.t. the best expert.

Convex: Reduction to dot loss

Say $L(a,x)$ is $[0,1]$-bounded and convex in $a$ for each $x$:
$$\sum_{k=1}^K w^k L(a^k, x) \;\ge\; L\Big(\sum_{k=1}^K w^k a^k,\, x\Big).$$
Then we can feed Hedge $\ell_t^k = L(a_t^k, x_t)$.

Hedge outputs $w_t$. Play the mean action $a_t = \sum_{k=1}^K w_t^k a_t^k$.
$$\text{dot loss } w_t^\top \ell_t \;=\; \sum_{k=1}^K w_t^k L(a_t^k, x_t) \;\ge\; L(a_t, x_t) \;=\; \text{actual loss}.$$
The dot-loss bound translates to any convex bounded loss $L$: $R_T \le \sqrt{(T/2)\ln K}$.
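As an illustration (not from the lecture), here is a minimal Python sketch of this reduction: Hedge is run on the dot losses $\ell_t^k = L(a_t^k, x_t)$ and the learner plays the mean action. The function name, the learning rate, and the rescaled squared loss in the example are assumptions made for the demo.

```python
import numpy as np

def hedge_convex_reduction(expert_actions, outcomes, loss, eta):
    """Reduce a [0,1]-bounded convex loss to the dot loss via Hedge.

    expert_actions: array of shape (T, K), the experts' actions a_t^k.
    outcomes:       array of shape (T,), the adversary's outcomes x_t.
    loss:           function L(a, x), convex in a, with values in [0, 1].
    eta:            Hedge learning rate.
    """
    T, K = expert_actions.shape
    cum_loss = np.zeros(K)          # cumulative dot losses of the experts
    total = 0.0                     # learner's cumulative actual loss
    for t in range(T):
        w = np.exp(-eta * cum_loss)
        w /= w.sum()                # Hedge weights w_t
        a = w @ expert_actions[t]   # play the mean action a_t
        total += loss(a, outcomes[t])
        # feed Hedge the dot-loss vector ell_t^k = L(a_t^k, x_t)
        cum_loss += np.array([loss(ak, outcomes[t]) for ak in expert_actions[t]])
    return total, cum_loss.min()    # learner's loss and best expert's loss

# Example: squared loss rescaled to [0, 1] on A = X = [-1, +1].
rng = np.random.default_rng(0)
T, K = 1000, 5
acts = rng.uniform(-1, 1, size=(T, K))
xs = rng.uniform(-1, 1, size=T)
sq = lambda a, x: (a - x) ** 2 / 4.0
learner, best = hedge_convex_reduction(acts, xs, sq, eta=np.sqrt(8 * np.log(K) / T))
print(f"regret = {learner - best:.2f}  vs  bound sqrt(T/2 ln K) = {np.sqrt(T / 2 * np.log(K)):.2f}")
```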

Exp-concave: Reduction to mix loss

Definition: We say $L(a,x)$ is η-exp-concave in $a$ for each $x$ if $a \mapsto e^{-\eta L(a,x)}$ is concave, i.e.
$$\sum_{k=1}^K w^k e^{-\eta L(a^k, x)} \;\le\; e^{-\eta L\left(\sum_{k=1}^K w^k a^k,\, x\right)}.$$
Then we can feed the AA the losses $\ell_t^k = L(a_t^k, x_t)$ at learning rate η.

The AA outputs $w_t$. Play the mean action $a_t = \sum_{k=1}^K w_t^k a_t^k$.
$$\text{mix loss of } w_t \text{ on } \eta\ell_t \;=\; -\ln\Big(\sum_{k=1}^K w_t^k e^{-\eta L(a_t^k, x_t)}\Big) \;\ge\; \eta\, L\Big(\sum_{k=1}^K w_t^k a_t^k,\, x_t\Big) \;=\; \eta \times \text{actual loss}.$$
The mix-loss bound translates to any η-exp-concave loss $L$: $R_T \le \frac{\ln K}{\eta}$.
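Spelling out that translation (a step the slide leaves implicit): the AA's mix-loss bound says the cumulative mix loss is at most the best expert's cumulative η-scaled loss plus $\ln K$, while exp-concavity lower-bounds each round's mix loss by η times the learner's actual loss, so
$$\eta \sum_{t=1}^T L(a_t, x_t) \;\le\; \sum_{t=1}^T -\ln\Big(\sum_{k=1}^K w_t^k e^{-\eta L(a_t^k, x_t)}\Big) \;\le\; \min_k\, \eta \sum_{t=1}^T L(a_t^k, x_t) + \ln K,$$
and dividing by η gives $R_T \le \frac{\ln K}{\eta}$.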

Example: square loss is exp-concave

Let's consider
$$L(a,x) = (a-x)^2$$
where $\mathcal{A} = \mathcal{X} = [-1,+1]$. Find η such that $L$ is η-exp-concave by testing for a non-positive second derivative:
$$\frac{\partial^2}{\partial a^2} e^{-\eta(a-x)^2} \;=\; \frac{\partial}{\partial a}\Big({-2}\, e^{-\eta(a-x)^2}\,\eta(a-x)\Big) \;=\; e^{-\eta(a-x)^2}\,\eta\big(4\eta(a-x)^2 - 2\big).$$
This is $\le 0$ whenever $4\eta(a-x)^2 \le 2$; since $(a-x)^2 \le 4$ on $[-1,+1]$, any $\eta \le 1/8$ works. For $\mathcal{X} = \mathcal{A} = [-Y,+Y]$ we find $\eta = \frac{1}{8Y^2}$.
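A quick numerical sanity check of this calculation (an illustration, not part of the lecture): verify midpoint concavity of $a \mapsto e^{-\eta(a-x)^2}$ on a grid of actions and outcomes in $[-Y,+Y]$.

```python
import numpy as np

def is_exp_concave(eta, Y, grid=101):
    """Numerically check midpoint concavity of a -> exp(-eta*(a - x)^2) on [-Y, Y]."""
    pts = np.linspace(-Y, Y, grid)
    a, b = np.meshgrid(pts, pts)                  # all pairs of actions (a, b)
    f = lambda u, x: np.exp(-eta * (u - x) ** 2)
    for x in pts:                                 # all outcomes x
        if np.any(f((a + b) / 2, x) < (f(a, x) + f(b, x)) / 2 - 1e-12):
            return False                          # midpoint concavity violated
    return True

Y = 1.0
print(is_exp_concave(eta=1 / (8 * Y**2), Y=Y))    # True: eta = 1/(8 Y^2) works
print(is_exp_concave(eta=1.0, Y=Y))               # False: eta too large
```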

Mixable loss: Reduction to mix loss

Crux: exp-concavity is convenient but too strong.

Definition: We say $L(a,x)$ is η-mixable if for all $w, a^1, \dots, a^K$ there is an action $a$ with
$$L(a, x) \;\le\; -\frac{1}{\eta}\ln\Big(\sum_{k=1}^K w^k e^{-\eta L(a^k, x)}\Big) \quad \text{for all } x.$$
The mapping from $w, a^1, \dots, a^K$ to the witness $a$ is called the substitution function. Mixable losses behave just enough like the mix loss to carry the AA regret bound through: $R_T \le \frac{\ln K}{\eta}$.
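As an illustrative sketch (not from the lecture) of how this reduction looks in code: run exponential weights at rate η on the incurred losses and, each round, play whatever action the substitution function returns. The function names and signature are assumptions for the demo; the square-loss substitution function on the next slide fits this interface.

```python
import numpy as np

def aa_with_substitution(expert_actions, outcomes, loss, eta, substitute):
    """AA for an eta-mixable loss; substitute(w, actions) is its substitution function."""
    T, K = expert_actions.shape
    cum_loss = np.zeros(K)
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * cum_loss)
        w /= w.sum()                              # AA weights w_t
        a = substitute(w, expert_actions[t])      # witness action guaranteed by mixability
        total += loss(a, outcomes[t])
        cum_loss += np.array([loss(ak, outcomes[t]) for ak in expert_actions[t]])
    return total, cum_loss.min()                  # regret is at most ln(K) / eta
```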

Square loss is mixable

Square loss is mixable with $\eta = \frac{1}{2}$. The substitution function is
$$w, a^1, \dots, a^K \;\mapsto\; \frac{m_{1/2}(-1) - m_{1/2}(+1)}{4}, \qquad \text{where } m_\eta(x) = -\frac{1}{\eta}\ln\sum_{k=1}^K w^k e^{-\eta(a^k - x)^2}.$$

See Vovk (1990); Haussler, Kivinen and Warmuth (1998).
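A direct Python transcription of this substitution function (assuming $\mathcal{A} = \mathcal{X} = [-1,+1]$ as in the exp-concavity example); it has the `(w, actions)` signature assumed by the generic sketch above. The check at the end is only a numerical illustration of the mixability inequality at the two extreme outcomes.

```python
import numpy as np

def m(eta, w, actions, x):
    """Mix-loss potential m_eta(x) = -(1/eta) ln sum_k w_k exp(-eta (a_k - x)^2)."""
    return -np.log(np.sum(w * np.exp(-eta * (actions - x) ** 2))) / eta

def substitute_square(w, actions, eta=0.5):
    """Vovk's substitution function for square loss on [-1, +1] with eta = 1/2."""
    return (m(eta, w, actions, -1.0) - m(eta, w, actions, +1.0)) / 4.0

# Example: the substituted prediction is at most the mix loss at both extreme outcomes.
w = np.array([0.7, 0.3])
acts = np.array([-0.5, 0.9])
a = substitute_square(w, acts)
for x in (-1.0, +1.0):
    print((a - x) ** 2 <= m(0.5, w, acts, x) + 1e-12)   # True, True
```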

Mixable loss list

Popular mixable losses:
• mix loss, log loss, entropic loss
• square loss, Brier loss
• Hellinger loss, with $\mathcal{A} = \mathcal{X} = [0,1]$:
$$L(a,x) := \frac{1}{2}\Big(\big(\sqrt{1-x} - \sqrt{1-a}\big)^2 + \big(\sqrt{x} - \sqrt{a}\big)^2\Big)$$

Characterisation of mixability: (Van Erven, Reid, Williamson 2012).
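For concreteness, the Hellinger loss as a one-liner (an illustration only; the function name is not from the lecture):

```python
import numpy as np

def hellinger_loss(a, x):
    """Hellinger loss on A = X = [0, 1]."""
    return 0.5 * ((np.sqrt(1 - x) - np.sqrt(1 - a)) ** 2 + (np.sqrt(x) - np.sqrt(a)) ** 2)

print(hellinger_loss(0.25, 1.0))   # 0.5 * ((0 - sqrt(0.75))**2 + (1 - 0.5)**2) ≈ 0.5
```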


Gradient trick

Abusing an algorithm that can compete with the best expert to in fact compete with the best convex combination (cf. portfolios).

Assume a convex loss $L(w,x)$:
$$L(w,x) \;\ge\; L(w_t,x) + (w - w_t)^\top \nabla_w L(w_t, x).$$
First-order expansion around the algorithm's action $w_t$.

Idea: feed Hedge $\ell_t = \nabla_w L(w_t, x_t)$ (may need restriction + translation + scaling to make this $[0,1]$-bounded). Get $w_t$. Play $w_t$.

The imaginary regret upper bounds the actual regret:
$$\sum_{t=1}^T w_t^\top \ell_t - \min_k \sum_{t=1}^T \ell_t^k \;=\; \max_k \sum_{t=1}^T \big(w_t^\top \ell_t - \ell_t^k\big) \;=\; \max_w \sum_{t=1}^T (w_t - w)^\top \nabla_w L(w_t, x_t) \;\ge\; \max_w \sum_{t=1}^T \big(L(w_t, x_t) - L(w, x_t)\big) \;=\; \sum_{t=1}^T L(w_t, x_t) - \min_w \sum_{t=1}^T L(w, x_t).$$
(The second equality holds because the sum is linear in $w$, so its maximum over the simplex is attained at a corner; the inequality is convexity of $L$ in $w$.)

Caveat: even if the original loss was nice (mixable/curved/...), the imagined loss $w \mapsto w^\top \nabla_w L(w_t, x_t)$ is linear. Regret of order $\sqrt{T}$.
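A sketch of the gradient trick in Python (an illustration under simplifying assumptions, not the lecture's code): expert predictions and outcomes in $[0,1]$, squared loss of a mixture, and Hedge fed the round's gradient. With these ranges the gradient entries lie in $[-2,2]$, so they are translated and scaled into $[0,1]$ before being handed to Hedge; all names below are hypothetical.

```python
import numpy as np

def gradient_trick_hedge(predictions, outcomes, eta):
    """Gradient trick: Hedge on (rescaled) gradients competes with the best mixture.

    predictions: (T, K) expert predictions p_t in [0, 1]^K.
    outcomes:    (T,)   outcomes y_t in [0, 1].
    Round-t loss of a weight vector w: L(w, t) = (w . p_t - y_t)^2.
    """
    T, K = predictions.shape
    cum = np.zeros(K)                  # cumulative rescaled gradient losses fed to Hedge
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * cum)
        w /= w.sum()                   # Hedge weights w_t; this is also the action we play
        total += (w @ predictions[t] - outcomes[t]) ** 2
        # gradient of the squared loss at w_t; entries lie in [-2, 2] here
        g = 2 * (w @ predictions[t] - outcomes[t]) * predictions[t]
        cum += (g + 2) / 4             # restriction + translation + scaling into [0, 1]
    return total

# Two experts; the outcome is close to a fixed mixture of them, so the best mixture
# beats either single expert.
rng = np.random.default_rng(2)
T = 2000
p = rng.uniform(0, 1, size=(T, 2))
y = np.clip(0.3 * p[:, 0] + 0.7 * p[:, 1] + rng.normal(0, 0.05, size=T), 0, 1)
learner = gradient_trick_hedge(p, y, eta=np.sqrt(8 * np.log(2) / T))
alphas = np.linspace(0, 1, 101)
mix_losses = [np.sum((a * p[:, 0] + (1 - a) * p[:, 1] - y) ** 2) for a in alphas]
print(f"learner {learner:.1f}  best mixture {min(mix_losses):.1f}")
```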

Specialists


Motivation

Not all expert predictions/actions are available every round:
• Missing data
• Noise
• Too expensive ($/time/memory)

How to model missingness? Adversarial.

How to redefine the objective? New variant of regret.

How to still do something optimal? Upgrade of AA.


Mix loss game with specialists

Protocol:•Fort= 1,2,...-

Adversary picks the subsetAt?[K]ofawakespecialists. Learner chooses a distributionwton awake specialistsAt. Adversary reveals loss vector?t?(-∞,∞]At.

Learner's loss is themix loss-ln??

k?Atwt,ke-?t,k? 16

Objective

There is no loss when a specialist is asleep.

Regret w.r.t. specialist $j$: only measured during the rounds where $j$ is awake,
$$R_T^j \;=\; \sum_{t \in [T]:\, j \in A_t} -\ln\Big(\sum_{k \in A_t} w_t^k e^{-\ell_t^k}\Big) \;-\; \sum_{t \in [T]:\, j \in A_t} \ell_t^j.$$

Specialist AA

Definition: The Specialist Aggregating Algorithm (SAA) maintains a distribution $u_t$. It starts uniform, $u_1^k = 1/K$.

In round $t$ with awake specialists $A_t$, SAA predicts with
$$w_t^k \;=\; u_t(k \mid A_t) \;=\; \frac{u_t^k\, \mathbf{1}\{k \in A_t\}}{\sum_{j \in A_t} u_t^j}.$$

Update:
$$u_{t+1}^k \;=\; \begin{cases} \dfrac{u_t^k\, e^{-\ell_t^k}}{\sum_{j \in A_t} u_t^j\, e^{-\ell_t^j}} \displaystyle\sum_{j \in A_t} u_t^j & \text{if } k \in A_t, \\[2ex] u_t^k & \text{if } k \notin A_t. \end{cases}$$
This is the AA update relative to the awake set $A_t$.
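A minimal Python sketch of the SAA, following the definition above (an illustration; the class and method names are not from the lecture):

```python
import numpy as np

class SAA:
    """Specialist Aggregating Algorithm for the mix-loss game with specialists."""

    def __init__(self, K):
        self.u = np.full(K, 1.0 / K)               # u_1 is uniform

    def predict(self, awake):
        """Return w_t = u_t(. | A_t), supported on the awake specialists."""
        w = np.zeros_like(self.u)
        w[awake] = self.u[awake] / self.u[awake].sum()
        return w

    def update(self, awake, losses):
        """losses[k] is ell_t^k for k in awake; asleep specialists keep their weight."""
        mass = self.u[awake].sum()                 # total weight of the awake set
        post = self.u[awake] * np.exp(-losses[awake])
        self.u[awake] = post / post.sum() * mass   # AA update relative to A_t

# One round with K = 3 specialists, only specialists 0 and 2 awake.
saa = SAA(3)
awake = np.array([0, 2])
ell = np.array([0.5, np.nan, 2.0])                 # loss of asleep specialist 1 is irrelevant
w = saa.predict(awake)
mix_loss = -np.log(np.sum(w[awake] * np.exp(-ell[awake])))
saa.update(awake, ell)
print(w, mix_loss, saa.u)
```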

What makes this tick

Consider the sequence $\tilde\ell_1, \tilde\ell_2, \dots$ obtained by completing $\ell_1, \ell_2, \dots$: in each round, assign the SAA mix loss to all the asleep specialists.

Theorem: SAA on $\ell$ and AA on $\tilde\ell$ produce identical weights $u_t = \tilde w_t$ and suffer identical mix loss.

Proof: (homework)
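The statement is easy to check numerically (a sanity check only, not the homework proof): complete each round's loss vector with the SAA mix loss for the asleep specialists, run plain exponential weights on the completed losses, and compare weights and mix losses round by round. Everything below is an illustration with made-up random data.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T = 4, 50
u = np.full(K, 1.0 / K)                  # SAA weights u_t
cum = np.zeros(K)                        # cumulative completed losses for plain AA
for _ in range(T):
    awake = np.flatnonzero(rng.random(K) < 0.6)
    if awake.size == 0:
        continue                         # skip rounds with no awake specialists
    ell = rng.exponential(1.0, size=K)   # losses; only the awake entries are used
    # SAA: condition on the awake set and compute the specialist mix loss
    w = u[awake] / u[awake].sum()
    mix = -np.log(np.sum(w * np.exp(-ell[awake])))
    # plain AA weights from the completed losses so far
    v = np.exp(-cum)
    v /= v.sum()
    assert np.allclose(v, u)             # identical weights (the theorem)
    # complete this round's losses: asleep specialists are charged the SAA mix loss
    completed = np.full(K, mix)
    completed[awake] = ell[awake]
    assert np.isclose(-np.log(np.sum(v * np.exp(-completed))), mix)  # identical mix loss
    # updates
    post = u[awake] * np.exp(-ell[awake])
    u[awake] = post / post.sum() * u[awake].sum()
    cum += completed
print("SAA on ell and AA on the completed losses agree on every round")
```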

Specialist regret bound for SAA

The AA (run on the completed losses $\tilde\ell$) has small regret w.r.t. expert $j$:
$$\ln K \;\ge\; \sum_{t=1}^T -\ln\Big(\sum_{k=1}^K \tilde w_t^k e^{-\tilde\ell_t^k}\Big) \;-\; \sum_{t=1}^T \tilde\ell_t^j.$$
By the theorem, the first sum equals the SAA mix loss in every round; and in the rounds where $j$ is asleep, $\tilde\ell_t^j$ is by definition that same mix loss, so those rounds cancel:
$$\ln K \;\ge\; \sum_{t \in [T]:\, j \in A_t} -\ln\Big(\sum_{k \in A_t} w_t^k e^{-\ell_t^k}\Big) \;-\; \sum_{t \in [T]:\, j \in A_t} \ell_t^j \;=\; R_T^j.$$
The adversary has more power (sleeping), but the regret is still $\ln K$: SAA is minimax for the specialist mix-loss regret game.

Discussion

• Convex bounded losses are easier than the dot loss.
• Mixable losses are easier than the mix loss.
• The gradient trick allows us to compete with mixtures (at a cost).
• The specialists extension deals with missing data (at no cost).