CS281B/Stat241B. Statistical Learning Theory.

Lecture 14.

Wouter M. Koolen

Menu

Today we solve new online learning problems by reducing them to problems/algorithms/analyses we already cracked before.

• Convex losses
• Exp-concave losses
• Mixable losses
• The gradient trick
• Specialists

Prediction with Expert Advice

Prediction with expert advice:

Protocol:
• For $t = 1, 2, \dots$
  - Experts announce actions $a_t^1, \dots, a_t^K \in \mathcal{A}$.
  - Learner chooses an action $a_t \in \mathcal{A}$.
  - Adversary reveals outcome $x_t \in \mathcal{X}$.
  - Learner incurs loss $L(a_t, x_t)$.

Goal: small regret w.r.t. the best expert.

Convex: Reduction to dot loss

Say $L(a,x)$ is $[0,1]$-bounded and convex in $a$ for each $x$:
$$\sum_{k=1}^K w^k L(a^k, x) \;\ge\; L\Big(\sum_{k=1}^K w^k a^k,\, x\Big).$$
Then we can feed Hedge $\ell_t^k = L(a_t^k, x_t)$.

Hedge outputs $w_t$. Play the mean action $a_t = \sum_{k=1}^K w_t^k a_t^k$.
$$\text{dot loss } w_t^\top \ell_t \;=\; \sum_{k=1}^K w_t^k L(a_t^k, x_t) \;\ge\; L(a_t, x_t) \;=\; \text{actual loss}.$$
The dot-loss bound translates to any convex bounded loss $L$: $R_T \le \sqrt{(T/2)\ln K}$.
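As an illustration (not from the lecture), here is a minimal Python sketch of this reduction: Hedge is run on the dot losses $\ell_t^k = L(a_t^k, x_t)$ and the learner plays the mean action. The function name, the learning rate, and the rescaled squared loss in the example are assumptions made for the demo.

```python
import numpy as np

def hedge_convex_reduction(expert_actions, outcomes, loss, eta):
    """Reduce a [0,1]-bounded convex loss to the dot loss via Hedge.

    expert_actions: array of shape (T, K), the experts' actions a_t^k.
    outcomes:       array of shape (T,), the adversary's outcomes x_t.
    loss:           function L(a, x), convex in a, with values in [0, 1].
    eta:            Hedge learning rate.
    """
    T, K = expert_actions.shape
    cum_loss = np.zeros(K)          # cumulative dot losses of the experts
    total = 0.0                     # learner's cumulative actual loss
    for t in range(T):
        w = np.exp(-eta * cum_loss)
        w /= w.sum()                # Hedge weights w_t
        a = w @ expert_actions[t]   # play the mean action a_t
        total += loss(a, outcomes[t])
        # feed Hedge the dot-loss vector ell_t^k = L(a_t^k, x_t)
        cum_loss += np.array([loss(ak, outcomes[t]) for ak in expert_actions[t]])
    return total, cum_loss.min()    # learner's loss and best expert's loss

# Example: squared loss rescaled to [0, 1] on A = X = [-1, +1].
rng = np.random.default_rng(0)
T, K = 1000, 5
acts = rng.uniform(-1, 1, size=(T, K))
xs = rng.uniform(-1, 1, size=T)
sq = lambda a, x: (a - x) ** 2 / 4.0
learner, best = hedge_convex_reduction(acts, xs, sq, eta=np.sqrt(8 * np.log(K) / T))
print(f"regret = {learner - best:.2f}  vs  bound sqrt(T/2 ln K) = {np.sqrt(T / 2 * np.log(K)):.2f}")
```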

Exp-concave: Reduction to mix loss

Definition: We say $L(a,x)$ is η-exp-concave in $a$ for each $x$ if $a \mapsto e^{-\eta L(a,x)}$ is concave, i.e.
$$\sum_{k=1}^K w^k e^{-\eta L(a^k, x)} \;\le\; e^{-\eta L\left(\sum_{k=1}^K w^k a^k,\, x\right)}.$$
Then we can feed the AA the losses $\ell_t^k = L(a_t^k, x_t)$ at learning rate η.

The AA outputs $w_t$. Play the mean action $a_t = \sum_{k=1}^K w_t^k a_t^k$.
$$\text{mix loss of } w_t \text{ on } \eta\ell_t \;=\; -\ln\Big(\sum_{k=1}^K w_t^k e^{-\eta L(a_t^k, x_t)}\Big) \;\ge\; \eta\, L\Big(\sum_{k=1}^K w_t^k a_t^k,\, x_t\Big) \;=\; \eta \times \text{actual loss}.$$
The mix-loss bound translates to any η-exp-concave loss $L$: $R_T \le \frac{\ln K}{\eta}$.
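Spelling out that translation (a step the slide leaves implicit): the AA's mix-loss bound says the cumulative mix loss is at most the best expert's cumulative η-scaled loss plus $\ln K$, while exp-concavity lower-bounds each round's mix loss by η times the learner's actual loss, so
$$\eta \sum_{t=1}^T L(a_t, x_t) \;\le\; \sum_{t=1}^T -\ln\Big(\sum_{k=1}^K w_t^k e^{-\eta L(a_t^k, x_t)}\Big) \;\le\; \min_k\, \eta \sum_{t=1}^T L(a_t^k, x_t) + \ln K,$$
and dividing by η gives $R_T \le \frac{\ln K}{\eta}$.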

Example: square loss is exp-concave

Let's consider
$$L(a,x) = (a-x)^2$$
where $\mathcal{A} = \mathcal{X} = [-1,+1]$. Find η such that $L$ is η-exp-concave by testing for a non-positive second derivative:
$$\frac{\partial^2}{\partial a^2} e^{-\eta(a-x)^2} \;=\; \frac{\partial}{\partial a}\Big({-2}\, e^{-\eta(a-x)^2}\,\eta(a-x)\Big) \;=\; e^{-\eta(a-x)^2}\,\eta\big(4\eta(a-x)^2 - 2\big).$$
This is $\le 0$ whenever $4\eta(a-x)^2 \le 2$; since $(a-x)^2 \le 4$ on $[-1,+1]$, any $\eta \le 1/8$ works. For $\mathcal{X} = \mathcal{A} = [-Y,+Y]$ we find $\eta = \frac{1}{8Y^2}$.
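A quick numerical sanity check of this calculation (an illustration, not part of the lecture): verify midpoint concavity of $a \mapsto e^{-\eta(a-x)^2}$ on a grid of actions and outcomes in $[-Y,+Y]$.

```python
import numpy as np

def is_exp_concave(eta, Y, grid=101):
    """Numerically check midpoint concavity of a -> exp(-eta*(a - x)^2) on [-Y, Y]."""
    pts = np.linspace(-Y, Y, grid)
    a, b = np.meshgrid(pts, pts)                  # all pairs of actions (a, b)
    f = lambda u, x: np.exp(-eta * (u - x) ** 2)
    for x in pts:                                 # all outcomes x
        if np.any(f((a + b) / 2, x) < (f(a, x) + f(b, x)) / 2 - 1e-12):
            return False                          # midpoint concavity violated
    return True

Y = 1.0
print(is_exp_concave(eta=1 / (8 * Y**2), Y=Y))    # True: eta = 1/(8 Y^2) works
print(is_exp_concave(eta=1.0, Y=Y))               # False: eta too large
```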

Mixable loss: Reduction to mix loss

Crux: exp-concavity is convenient but too strong.

Definition: We say $L(a,x)$ is η-mixable if for all $w, a^1, \dots, a^K$ there is an action $a$ with
$$L(a, x) \;\le\; -\frac{1}{\eta}\ln\Big(\sum_{k=1}^K w^k e^{-\eta L(a^k, x)}\Big) \quad \text{for all } x.$$
The mapping from $w, a^1, \dots, a^K$ to the witness $a$ is called the substitution function. Mixable losses behave just enough like the mix loss to carry the AA regret bound through: $R_T \le \frac{\ln K}{\eta}$.
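As an illustrative sketch (not from the lecture) of how this reduction looks in code: run exponential weights at rate η on the incurred losses and, each round, play whatever action the substitution function returns. The function names and signature are assumptions for the demo; the square-loss substitution function on the next slide fits this interface.

```python
import numpy as np

def aa_with_substitution(expert_actions, outcomes, loss, eta, substitute):
    """AA for an eta-mixable loss; substitute(w, actions) is its substitution function."""
    T, K = expert_actions.shape
    cum_loss = np.zeros(K)
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * cum_loss)
        w /= w.sum()                              # AA weights w_t
        a = substitute(w, expert_actions[t])      # witness action guaranteed by mixability
        total += loss(a, outcomes[t])
        cum_loss += np.array([loss(ak, outcomes[t]) for ak in expert_actions[t]])
    return total, cum_loss.min()                  # regret is at most ln(K) / eta
```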

Square loss is mixable

Square loss is mixable with $\eta = \frac{1}{2}$. The substitution function is
$$w, a^1, \dots, a^K \;\mapsto\; \frac{m_{1/2}(-1) - m_{1/2}(+1)}{4}, \qquad \text{where } m_\eta(x) = -\frac{1}{\eta}\ln\sum_{k=1}^K w^k e^{-\eta(a^k - x)^2}.$$

See Vovk (1990); Haussler, Kivinen and Warmuth (1998).
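A direct Python transcription of this substitution function (assuming $\mathcal{A} = \mathcal{X} = [-1,+1]$ as in the exp-concavity example); it has the `(w, actions)` signature assumed by the generic sketch above. The check at the end is only a numerical illustration of the mixability inequality at the two extreme outcomes.

```python
import numpy as np

def m(eta, w, actions, x):
    """Mix-loss potential m_eta(x) = -(1/eta) ln sum_k w_k exp(-eta (a_k - x)^2)."""
    return -np.log(np.sum(w * np.exp(-eta * (actions - x) ** 2))) / eta

def substitute_square(w, actions, eta=0.5):
    """Vovk's substitution function for square loss on [-1, +1] with eta = 1/2."""
    return (m(eta, w, actions, -1.0) - m(eta, w, actions, +1.0)) / 4.0

# Example: the substituted prediction is at most the mix loss at both extreme outcomes.
w = np.array([0.7, 0.3])
acts = np.array([-0.5, 0.9])
a = substitute_square(w, acts)
for x in (-1.0, +1.0):
    print((a - x) ** 2 <= m(0.5, w, acts, x) + 1e-12)   # True, True
```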

Mixable loss list

Popular mixable losses:
• mix loss, log loss, entropic loss
• square loss, Brier loss
• Hellinger loss, with $\mathcal{A} = \mathcal{X} = [0,1]$:
$$L(a,x) := \frac{1}{2}\Big(\big(\sqrt{1-x} - \sqrt{1-a}\big)^2 + \big(\sqrt{x} - \sqrt{a}\big)^2\Big)$$

Characterisation of mixability: (Van Erven, Reid, Williamson 2012).
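For concreteness, the Hellinger loss as a one-liner (an illustration only; the function name is not from the lecture):

```python
import numpy as np

def hellinger_loss(a, x):
    """Hellinger loss on A = X = [0, 1]."""
    return 0.5 * ((np.sqrt(1 - x) - np.sqrt(1 - a)) ** 2 + (np.sqrt(x) - np.sqrt(a)) ** 2)

print(hellinger_loss(0.25, 1.0))   # 0.5 * ((0 - sqrt(0.75))**2 + (1 - 0.5)**2) ≈ 0.5
```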


Gradient trick

Abusing an algorithm that can compete with the best expert to in fact compete with the best convex combination (cf. portfolios).

Assume a convex loss $L(w,x)$:
$$L(w,x) \;\ge\; L(w_t,x) + (w - w_t)^\top \nabla_w L(w_t, x).$$
First-order expansion around the algorithm's action $w_t$.

Idea: feed Hedge $\ell_t = \nabla_w L(w_t, x_t)$ (may need restriction + translation + scaling to make this $[0,1]$-bounded). Get $w_t$. Play $w_t$.

The imaginary regret upper bounds the actual regret:
$$\sum_{t=1}^T w_t^\top \ell_t - \min_k \sum_{t=1}^T \ell_t^k \;=\; \max_k \sum_{t=1}^T \big(w_t^\top \ell_t - \ell_t^k\big) \;=\; \max_w \sum_{t=1}^T (w_t - w)^\top \nabla_w L(w_t, x_t) \;\ge\; \max_w \sum_{t=1}^T \big(L(w_t, x_t) - L(w, x_t)\big) \;=\; \sum_{t=1}^T L(w_t, x_t) - \min_w \sum_{t=1}^T L(w, x_t).$$
(The second equality holds because the sum is linear in $w$, so its maximum over the simplex is attained at a corner; the inequality is convexity of $L$ in $w$.)

Caveat: even if the original loss was nice (mixable/curved/...), the imagined loss $w \mapsto w^\top \nabla_w L(w_t, x_t)$ is linear. Regret of order $\sqrt{T}$.
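A sketch of the gradient trick in Python (an illustration under simplifying assumptions, not the lecture's code): expert predictions and outcomes in $[0,1]$, squared loss of a mixture, and Hedge fed the round's gradient. With these ranges the gradient entries lie in $[-2,2]$, so they are translated and scaled into $[0,1]$ before being handed to Hedge; all names below are hypothetical.

```python
import numpy as np

def gradient_trick_hedge(predictions, outcomes, eta):
    """Gradient trick: Hedge on (rescaled) gradients competes with the best mixture.

    predictions: (T, K) expert predictions p_t in [0, 1]^K.
    outcomes:    (T,)   outcomes y_t in [0, 1].
    Round-t loss of a weight vector w: L(w, t) = (w . p_t - y_t)^2.
    """
    T, K = predictions.shape
    cum = np.zeros(K)                  # cumulative rescaled gradient losses fed to Hedge
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * cum)
        w /= w.sum()                   # Hedge weights w_t; this is also the action we play
        total += (w @ predictions[t] - outcomes[t]) ** 2
        # gradient of the squared loss at w_t; entries lie in [-2, 2] here
        g = 2 * (w @ predictions[t] - outcomes[t]) * predictions[t]
        cum += (g + 2) / 4             # restriction + translation + scaling into [0, 1]
    return total

# Two experts; the outcome is close to a fixed mixture of them, so the best mixture
# beats either single expert.
rng = np.random.default_rng(2)
T = 2000
p = rng.uniform(0, 1, size=(T, 2))
y = np.clip(0.3 * p[:, 0] + 0.7 * p[:, 1] + rng.normal(0, 0.05, size=T), 0, 1)
learner = gradient_trick_hedge(p, y, eta=np.sqrt(8 * np.log(2) / T))
alphas = np.linspace(0, 1, 101)
mix_losses = [np.sum((a * p[:, 0] + (1 - a) * p[:, 1] - y) ** 2) for a in alphas]
print(f"learner {learner:.1f}  best mixture {min(mix_losses):.1f}")
```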

Specialists


Motivation

Not all expert predictions/actions are available every round:
• Missing data
• Noise
• Too expensive ($/time/memory)

How to model missingness? Adversarial.

How to redefine the objective? New variant of regret.

How to still do something optimal? Upgrade of AA.


Mix loss game with specialists

Protocol:•Fort= 1,2,...-

Adversary picks the subsetAt?[K]ofawakespecialists. Learner chooses a distributionwton awake specialistsAt. Adversary reveals loss vector?t?(-∞,∞]At.

Learner's loss is themix loss-ln??

k?Atwt,ke-?t,k? 16

Objective

There is no loss when a specialist is asleep.

Regret w.r.t. specialist $j$: only measured during the rounds where $j$ is awake,
$$R_T^j \;=\; \sum_{t \in [T]:\, j \in A_t} -\ln\Big(\sum_{k \in A_t} w_t^k e^{-\ell_t^k}\Big) \;-\; \sum_{t \in [T]:\, j \in A_t} \ell_t^j.$$

Specialist AA

Definition: The Specialist Aggregating Algorithm (SAA) maintains a distribution $u_t$. It starts uniform, $u_1^k = 1/K$.

In round $t$ with awake specialists $A_t$, SAA predicts with
$$w_t^k \;=\; u_t(k \mid A_t) \;=\; \frac{u_t^k\, \mathbf{1}\{k \in A_t\}}{\sum_{j \in A_t} u_t^j}.$$

Update:
$$u_{t+1}^k \;=\; \begin{cases} \dfrac{u_t^k\, e^{-\ell_t^k}}{\sum_{j \in A_t} u_t^j\, e^{-\ell_t^j}} \displaystyle\sum_{j \in A_t} u_t^j & \text{if } k \in A_t, \\[2ex] u_t^k & \text{if } k \notin A_t. \end{cases}$$
This is the AA update relative to the awake set $A_t$.
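A minimal Python sketch of the SAA, following the definition above (an illustration; the class and method names are not from the lecture):

```python
import numpy as np

class SAA:
    """Specialist Aggregating Algorithm for the mix-loss game with specialists."""

    def __init__(self, K):
        self.u = np.full(K, 1.0 / K)               # u_1 is uniform

    def predict(self, awake):
        """Return w_t = u_t(. | A_t), supported on the awake specialists."""
        w = np.zeros_like(self.u)
        w[awake] = self.u[awake] / self.u[awake].sum()
        return w

    def update(self, awake, losses):
        """losses[k] is ell_t^k for k in awake; asleep specialists keep their weight."""
        mass = self.u[awake].sum()                 # total weight of the awake set
        post = self.u[awake] * np.exp(-losses[awake])
        self.u[awake] = post / post.sum() * mass   # AA update relative to A_t

# One round with K = 3 specialists, only specialists 0 and 2 awake.
saa = SAA(3)
awake = np.array([0, 2])
ell = np.array([0.5, np.nan, 2.0])                 # loss of asleep specialist 1 is irrelevant
w = saa.predict(awake)
mix_loss = -np.log(np.sum(w[awake] * np.exp(-ell[awake])))
saa.update(awake, ell)
print(w, mix_loss, saa.u)
```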

What makes this tick

Consider the sequence $\tilde\ell_1, \tilde\ell_2, \dots$ obtained by completing $\ell_1, \ell_2, \dots$: in each round, assign the SAA mix loss to all the asleep specialists.

Theorem: SAA on $\ell$ and AA on $\tilde\ell$ produce identical weights $u_t = \tilde w_t$ and suffer identical mix loss.

Proof: (homework)
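The statement is easy to check numerically (a sanity check only, not the homework proof): complete each round's loss vector with the SAA mix loss for the asleep specialists, run plain exponential weights on the completed losses, and compare weights and mix losses round by round. Everything below is an illustration with made-up random data.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T = 4, 50
u = np.full(K, 1.0 / K)                  # SAA weights u_t
cum = np.zeros(K)                        # cumulative completed losses for plain AA
for _ in range(T):
    awake = np.flatnonzero(rng.random(K) < 0.6)
    if awake.size == 0:
        continue                         # skip rounds with no awake specialists
    ell = rng.exponential(1.0, size=K)   # losses; only the awake entries are used
    # SAA: condition on the awake set and compute the specialist mix loss
    w = u[awake] / u[awake].sum()
    mix = -np.log(np.sum(w * np.exp(-ell[awake])))
    # plain AA weights from the completed losses so far
    v = np.exp(-cum)
    v /= v.sum()
    assert np.allclose(v, u)             # identical weights (the theorem)
    # complete this round's losses: asleep specialists are charged the SAA mix loss
    completed = np.full(K, mix)
    completed[awake] = ell[awake]
    assert np.isclose(-np.log(np.sum(v * np.exp(-completed))), mix)  # identical mix loss
    # updates
    post = u[awake] * np.exp(-ell[awake])
    u[awake] = post / post.sum() * u[awake].sum()
    cum += completed
print("SAA on ell and AA on the completed losses agree on every round")
```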

Specialist regret bound for SAA

The AA (run on the completed losses $\tilde\ell$) has small regret w.r.t. expert $j$:
$$\ln K \;\ge\; \sum_{t=1}^T -\ln\Big(\sum_{k=1}^K \tilde w_t^k e^{-\tilde\ell_t^k}\Big) \;-\; \sum_{t=1}^T \tilde\ell_t^j.$$
By the theorem, the first sum equals the SAA mix loss in every round; and in the rounds where $j$ is asleep, $\tilde\ell_t^j$ is by definition that same mix loss, so those rounds cancel:
$$\ln K \;\ge\; \sum_{t \in [T]:\, j \in A_t} -\ln\Big(\sum_{k \in A_t} w_t^k e^{-\ell_t^k}\Big) \;-\; \sum_{t \in [T]:\, j \in A_t} \ell_t^j \;=\; R_T^j.$$
The adversary has more power (sleeping), but the regret is still $\ln K$: SAA is minimax for the specialist mix-loss regret game.

Discussion

• Convex bounded losses are easier than the dot loss.
• Mixable losses are easier than the mix loss.
• The gradient trick allows us to compete with mixtures (at a cost).
• The specialists extension deals with missing data (at no cost).