Bandit convex optimization: towards tight bounds

Can a distributed bandit online algorithm solve the optimization problem?

ONE-POINT BANDIT FEEDBACK

In this section, we propose a distributed bandit online algorithm with a one-point sampling gradient estimator to solve the considered optimization problem.

We then derive expected regret and constraint violation bounds for the proposed algorithm.

The proposed algorithm is given in pseudo-code as Algorithm 1.
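Algorithm 1 itself is not reproduced in this text. As a rough illustration of what a one-point sampling gradient estimator does, here is a minimal single-agent sketch. It assumes the commonly used sphere-sampling form, in which the loss is evaluated once at a randomly perturbed point and that single value is scaled along the perturbation direction. The function names, the exploration radius `delta`, and the averaging helper are illustrative assumptions, not taken from the source, and the distributed and constraint-handling parts of the proposed algorithm are omitted.

```python
import numpy as np

def one_point_gradient_estimate(f, x, delta, rng):
    """Gradient estimate at x built from a single evaluation of f.

    Assumed form: (d / delta) * f(x + delta * u) * u, with u drawn
    uniformly from the unit sphere. This is an illustrative sketch,
    not the estimator of the source's Algorithm 1.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalized Gaussian = uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u

def averaged_estimate(f, x, delta, n_samples, seed=0):
    """Average many one-point estimates (hypothetical helper for checking)."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        total += one_point_gradient_estimate(f, x, delta, rng)
    return total / n_samples
```

A single estimate is very noisy, since only one function value is observed per round. Averaging many draws for a linear loss recovers its gradient approximately, which is a quick sanity check that the estimator is unbiased for a smoothed version of the loss.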

Can the online routing problem be modeled as a multi-armed bandit problem?

One could model the online routing problem as a multi-armed bandit problem.

Each of the N "arms" of the bandit is a path through the network; the loss function measures the time it takes a packet to travel along that path.
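To make the routing analogy concrete, the sketch below treats each path as an arm and runs EXP3, a standard adversarial bandit algorithm. The source does not say which bandit algorithm is intended, so EXP3 is an assumption here, as are the function name `exp3_routing`, the learning rate `eta`, and the convention that delays are scaled to lie in [0, 1].

```python
import math
import random

def exp3_routing(path_delays, n_rounds, eta, seed=0):
    """Choose one path per round with EXP3 and return how often each was used.

    path_delays: list of callables; path_delays[i](rng) returns the delay
    of path i, assumed to be scaled into [0, 1]. Only the delay of the
    chosen path is observed (bandit feedback).
    """
    rng = random.Random(seed)
    n_paths = len(path_delays)
    cum_loss_est = [0.0] * n_paths  # importance-weighted loss estimates
    counts = [0] * n_paths
    for _ in range(n_rounds):
        # Exponential weights; subtracting the minimum keeps exp() stable.
        m = min(cum_loss_est)
        weights = [math.exp(-eta * (loss - m)) for loss in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        path = rng.choices(range(n_paths), weights=probs)[0]
        delay = path_delays[path](rng)
        # Unbiased estimate: divide the observed loss by its selection probability.
        cum_loss_est[path] += delay / probs[path]
        counts[path] += 1
    return counts
```

For example, with three hypothetical paths whose delays are fixed at 0.9, 0.2 and 0.8, the algorithm should route most packets along the second path after a short exploration phase.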

What is bandit convex optimization (BCO)?

Bandit Convex Optimization (BCO) is a fundamental framework for decision-making under uncertainty, which generalizes many problems from the realm of online and statistical learning.

While the special case of linear cost functions is well understood, a gap on the attainable regret for BCO with nonlinear losses remains an important open question.
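For reference, the regret mentioned here is usually defined as follows. The notation is an assumption, since the source does not fix symbols: $\mathcal{K}$ is the convex decision set, $f_t$ the loss at round $t$, and $x_t$ the point played by the learner, who observes only the scalar $f_t(x_t)$.

```latex
% Regret after T rounds against the best fixed point in hindsight
% (standard definition; symbols are illustrative, not from the source).
\[
  \mathrm{Regret}_T
  \;=\;
  \sum_{t=1}^{T} f_t(x_t)
  \;-\;
  \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\]
```

An algorithm is said to learn when this quantity grows sublinearly in $T$; the open question concerns how slowly it can be made to grow when the $f_t$ are nonlinear.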
