Advanced tour of the Bayesian Optimization package

[1]:

from bayes_opt import BayesianOptimization

1. Suggest-Evaluate-Register Paradigm

Internally, the maximize method is simply a wrapper around the suggest, probe, and register methods. If you need more control over your optimization loop, the Suggest-Evaluate-Register paradigm gives you that extra flexibility.

For an example of running the BayesianOptimization in a distributed fashion (where the function being optimized is evaluated concurrently on different cores/machines/servers), check out the async_optimization.py script in the examples folder.

[2]:

# Let's start by defining our function, bounds, and instantiating an optimization object.
def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1

Notice that the evaluation of the blackbox function will NOT be carried out by the optimizer object. We are simulating a situation where this function could be running on a different machine, could be written in another language, or could even be the result of a chemistry experiment. Whatever the case may be, you can take charge of it: as long as you don’t invoke the probe or maximize methods directly, the optimizer object will ignore the blackbox function.

[3]:

optimizer = BayesianOptimization(
    f=None,
    pbounds={'x': (-2, 2), 'y': (-3, 3)},
    verbose=2,
    random_state=1,
)

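One extra ingredient we will need is a UtilityFunction instance. The utility (acquisition) function scores candidate points based on the GP's predictions, and the optimizer suggests the point with the highest score. For the details, take a look at the literature on Bayesian optimization.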

[4]:

from bayes_opt import UtilityFunction

utility = UtilityFunction(kind="ucb", kappa=2.5, xi=0.0)
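To make the role of the utility function more concrete, the sketch below shows the upper-confidence-bound ("ucb") rule in isolation: a candidate is scored by its predicted mean plus kappa times its predicted standard deviation, so a larger kappa favors exploring uncertain regions. The name `ucb_score` and the sample numbers are illustrative assumptions, not part of the package.

```python
def ucb_score(mean, std, kappa=2.5):
    """Upper confidence bound: predicted mean plus kappa times predicted std."""
    return mean + kappa * std


# Two hypothetical candidates: (predicted mean, predicted standard deviation).
candidates = {
    "well_explored": (0.8, 0.1),  # good prediction, little uncertainty
    "unexplored": (0.2, 0.5),     # worse prediction, large uncertainty
}

scores = {name: ucb_score(m, s) for name, (m, s) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With kappa=2.5 the uncertain candidate wins; with a small kappa such as 0.5 the well-explored candidate would score higher instead.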

The suggest method of our optimizer can be called at any time. What you get back is a suggestion for the next parameter combination the optimizer wants to probe.

Notice that while the optimizer hasn’t observed any points, the suggestions will be random. However, they will stop being random and improve in quality the more points are observed.

[5]:

next_point_to_probe = optimizer.suggest(utility)
print("Next point to probe is:", next_point_to_probe)

Next point to probe is: {'x': -0.331911981189704, 'y': 1.3219469606529488}

You are now free to evaluate your function at the suggested point however/whenever you like.

[6]:

target = black_box_function(**next_point_to_probe)
print("Found the target value to be:", target)

Found the target value to be: 0.7861845912690542

The last thing left to do is to tell the optimizer what target value was observed.

[7]:

optimizer.register(
    params=next_point_to_probe,
    target=target,
)

1.1 The maximize loop

And that’s it. By repeating the steps above you recreate the internals of the maximize method. This should give you all the flexibility you need to log progress, halt execution, perform concurrent evaluations, etc.

[8]:

for _ in range(5):
    next_point = optimizer.suggest(utility)
    target = black_box_function(**next_point)
    optimizer.register(params=next_point, target=target)

    print(target, next_point)
print(optimizer.max)

-18.49187152919165 {'x': 1.8861546000771092, 'y': -2.9917780942581977}
0.7911494590443674 {'x': -0.31764604716962586, 'y': 1.3285597809731806}
-6.999999999999999 {'x': -1.9999999999999998, 'y': 3.0}
-7.0 {'x': 2.0, 'y': 3.0}
-7.503866814436659 {'x': -2.0, 'y': -1.1222315647536345}
{'target': 0.7911494590443674, 'params': {'x': -0.31764604716962586, 'y': 1.3285597809731806}}
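As one possible way to use that flexibility, the sketch below wraps the same three calls in a helper that keeps a history of every step and halts as soon as a target reaches a threshold. The helper `run_loop`, its `good_enough` parameter, and the `RandomSuggester` class are hypothetical names introduced here. `RandomSuggester` is only a stand-in that proposes random points so the example runs without the package; in practice you would pass the optimizer and utility objects created earlier.

```python
import random


def run_loop(optimizer, utility, evaluate, n_iter, good_enough=None):
    """Suggest-evaluate-register loop that logs every step and can stop early."""
    history = []
    for step in range(n_iter):
        point = optimizer.suggest(utility)
        target = evaluate(**point)
        optimizer.register(params=point, target=target)
        history.append((step, target, point))
        if good_enough is not None and target >= good_enough:
            break  # halt as soon as an acceptable target is observed
    return history


class RandomSuggester:
    """Hypothetical stand-in with the same suggest/register calls used above.

    It proposes uniformly random points and ignores the utility argument.
    """

    def __init__(self, pbounds, seed=0):
        self._pbounds = pbounds
        self._rng = random.Random(seed)
        self.observations = []

    def suggest(self, utility):
        return {k: self._rng.uniform(lo, hi) for k, (lo, hi) in self._pbounds.items()}

    def register(self, params, target):
        self.observations.append((params, target))


def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1


stand_in = RandomSuggester({'x': (-2, 2), 'y': (-3, 3)}, seed=1)
history = run_loop(stand_in, utility=None, evaluate=black_box_function,
                   n_iter=20, good_enough=0.5)
for step, target, point in history:
    print(step, round(target, 3), point)
```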

2. Dealing with discrete parameters

There is no principled way of dealing with discrete parameters using this package.

OK, now that we got that out of the way, how do you do it? You’re bound to be in a situation where some of your function’s parameters may only take on discrete values. Unfortunately, the nature of Bayesian optimization with Gaussian processes doesn’t allow for an easy/intuitive way of dealing with discrete parameters, but that doesn’t mean it is impossible. The example below showcases a simple, yet reasonably adequate, way of dealing with discrete parameters.

[9]:

def func_with_discrete_params(x, y, d):
    # Simulate necessity of having d being discrete.
    assert isinstance(d, int)

    return ((x + y + d) // (1 + d)) / (1 + (x + y) ** 2)

[10]:

def function_to_be_optimized(x, y, w):
    d = int(w)
    return func_with_discrete_params(x, y, d)

[11]:

optimizer = BayesianOptimization(
    f=function_to_be_optimized,
    pbounds={'x': (-10, 10), 'y': (-10, 10), 'w': (0, 5)},
    verbose=2,
    random_state=1,
)

[12]:

optimizer.set_gp_params(alpha=1e-3)
optimizer.maximize()

|   iter    |  target   |     w     |     x     |     y     |
-------------------------------------------------------------
| 1         | -0.06199  | 2.085     | 4.406     | -9.998    |
| 2         | -0.0344   | 1.512     | -7.065    | -8.153    |
| 3         | -0.2177   | 0.9313    | -3.089    | -2.065    |
| 4         | 0.1865    | 2.694     | -1.616    | 3.704     |
| 5         | -0.2187   | 1.022     | 7.562     | -9.452    |
| 6         | 0.1868    | 2.533     | -1.728    | 3.815     |
| 7         | 0.05119   | 3.957     | -0.6151   | 6.785     |
| 8         | 0.1761    | 0.5799    | 1.181     | 4.054     |
| 9         | 0.04045   | 4.004     | 4.304     | 2.656     |
| 10        | 0.07509   | 0.0       | 4.843     | 7.759     |
| 11        | 0.3512    | 0.0       | -5.713    | 7.072     |
| 12        | -0.8068   | 0.0       | -9.09     | 8.6       |
| 13        | 0.3774    | 0.3974    | -4.19     | 6.264     |
| 14        | 0.157     | 0.0       | -3.587    | 8.534     |
| 15        | -0.7891   | 0.4794    | -5.536    | 4.298     |
| 16        | 0.1176    | 1.038     | -4.671    | 7.41      |
| 17        | 0.1815    | 0.4815    | -2.66     | 6.6       |
| 18        | 0.08677   | 1.933     | -0.1438   | 4.839     |
| 19        | 0.1687    | 1.139     | -0.4707   | 2.69      |
| 20        | 0.1133    | 2.363     | 1.344     | 2.736     |
| 21        | 0.2401    | 0.0       | 1.441     | 1.949     |
| 22        | 0.1568    | 0.1832    | 3.2       | 2.904     |
| 23        | 0.2722    | 0.9731    | 2.625     | 0.5406    |
| 24        | 0.0       | 1.149     | 0.7191    | 0.2267    |
| 25        | 0.1686    | 0.0       | 4.181     | 0.5867    |
| 26        | 0.0644    | 2.276     | 3.975     | -0.1631   |
| 27        | 0.4397    | 0.08737   | 2.66      | -1.531    |
| 28        | 0.2904    | 0.0       | 3.913     | -2.35     |
| 29        | -0.9874   | 0.0       | 1.992     | -3.005    |
| 30        | 0.3001    | 0.2116    | 3.375     | -0.9955   |
=============================================================
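Note that int(w) truncates toward zero, so with bounds of (0, 5) the value d = 5 is only reached when w is exactly 5.0. The same truncation idea can be adapted to pick from an explicit list of options. The sketch below is one possible variant: the helper `to_choice` and the `batch_sizes` list are hypothetical, and the continuous parameter is assumed to be bounded by (0, len(choices)) so that every option gets an interval of equal width.

```python
def to_choice(w, choices):
    """Map a continuous value w in [0, len(choices)] onto one element of choices."""
    index = min(int(w), len(choices) - 1)  # clamp so w == len(choices) stays valid
    return choices[index]


batch_sizes = [16, 32, 64, 128]  # hypothetical discrete options

for w in (0.0, 0.99, 1.0, 2.5, 3.999, 4.0):
    print(w, to_choice(w, batch_sizes))
```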

3. Tuning the underlying Gaussian Process

The Bayesian optimization algorithm works by performing a Gaussian process regression on the observed combinations of parameters and their associated target values. The predicted parameter\(\rightarrow\)target hyper-surface (and its uncertainty) is then used to choose the next point to probe.

3.1 Passing parameters to the GP

Depending on the problem it could be beneficial to change the default parameters of the underlying GP. You can use the optimizer.set_gp_params method to do this:

[13]:

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={'x': (-2, 2), 'y': (-3, 3)},
    verbose=2,
    random_state=1,
)
optimizer.set_gp_params(alpha=1e-3, n_restarts_optimizer=5)
optimizer.maximize(
    init_points=1,
    n_iter=5
)

|   iter    |  target   |     x     |     y     |
-------------------------------------------------
| 1         | 0.7862    | -0.3319   | 1.322     |
| 2         | -18.49    | 1.886     | -2.992    |
| 3         | 0.7911    | -0.3176   | 1.329     |
| 4         | -6.11     | -1.763    | 3.0       |
| 5         | -2.895    | 1.533     | 2.243     |
| 6         | -4.806    | -2.0      | -0.3439   |
=================================================

3.2 Tuning the `alpha` parameter

When dealing with functions with discrete parameters, or a particularly erratic target space, it might be beneficial to increase the value of the alpha parameter. This parameter controls how much noise the GP can handle, so increase it whenever you think that extra flexibility is needed.
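As a rough guide to what alpha does, the standard zero-mean GP regression equations are written out below. They assume that alpha is added to the diagonal of the kernel matrix, which is how scikit-learn's GaussianProcessRegressor documents the parameter. The symbols are introduced here for illustration: \(K\) is the kernel matrix of the observed points, \(k_*\) is the vector of kernel values between the observed points and a new point \(x_*\), and \(y\) is the vector of observed targets.

\[
\mu(x_*) = k_*^\top (K + \alpha I)^{-1} y, \qquad
\sigma^2(x_*) = k(x_*, x_*) - k_*^\top (K + \alpha I)^{-1} k_*
\]

With a larger alpha the predicted mean is no longer forced to pass exactly through every observed target, which is what makes the fit more tolerant of noisy or step-like observations.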

3.3 Changing kernels

By default this package uses the Matern 2.5 kernel. Depending on your use case you may find that tuning the GP kernel could be beneficial. You’re on your own here since these are very specific solutions to very specific problems.
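For reference, "Matern 2.5" usually denotes the Matérn covariance with smoothness nu = 2.5. The sketch below evaluates that kernel for two points a distance r apart. It is written from the textbook formula rather than taken from the package's code, and the function name `matern52` and the `length_scale` argument are assumptions made for illustration.

```python
import math


def matern52(r, length_scale=1.0):
    """Matern kernel with nu = 2.5 for two points a distance r apart."""
    s = math.sqrt(5.0) * r / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


# Correlation decays smoothly as the distance between the points grows.
for r in (0.0, 0.5, 1.0, 2.0):
    print(r, round(matern52(r), 4))
```

A larger length scale makes distant points more strongly correlated, which corresponds to a smoother predicted surface.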

Observers Continued

Observers are objects that subscribe and listen to particular events fired by the BayesianOptimization object.

When an event gets fired a callback function is called with the event and the BayesianOptimization instance passed as parameters. The callback can be specified at the time of subscription. If none is given it will look for an update method from the observer.

[14]:

from bayes_opt.event import DEFAULT_EVENTS, Events

[15]:

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={'x': (-2, 2), 'y': (-3, 3)},
    verbose=2,
    random_state=1,
)

[16]:

class BasicObserver:
    def update(self, event, instance):
        """Does whatever you want with the event and `BayesianOptimization` instance."""
        print("Event `{}` was observed".format(event))

[17]:

my_observer = BasicObserver()

optimizer.subscribe(
    event=Events.OPTIMIZATION_STEP,
    subscriber=my_observer,
    callback=None, # Will use the `update` method as callback
)

Alternatively you have the option to pass a completely different callback.

[18]:

def my_callback(event, instance):
    print("Go nuts here!")

optimizer.subscribe(
    event=Events.OPTIMIZATION_START,
    subscriber="Any hashable object",
    callback=my_callback,
)

[19]:

optimizer.maximize(init_points=1, n_iter=2)

Go nuts here!
Event `optimization:step` was observed
Event `optimization:step` was observed
Event `optimization:step` was observed
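An observer does not have to print. As a sketch of a slightly more useful one, the class below records the best target seen so far each time it is notified, reading it from the instance's max attribute shown earlier in this tour. `BestTargetLogger` and `FakeOptimizer` are hypothetical names. `FakeOptimizer` only mimics the max attribute so that the example runs on its own; with the real optimizer you would subscribe the logger to Events.OPTIMIZATION_STEP as in the cells above.

```python
class BestTargetLogger:
    """Observer that records the best target seen after each observed event."""

    def __init__(self):
        self.records = []

    def update(self, event, instance):
        best = instance.max  # dict with 'target' and 'params' keys
        self.records.append((event, best["target"]))


class FakeOptimizer:
    """Hypothetical stand-in exposing only the `max` attribute used above."""

    def __init__(self):
        self.max = None


logger = BestTargetLogger()
fake = FakeOptimizer()
for target in (0.2, 0.7, 0.5):
    # Mimic the optimizer updating its best result, then firing a step event.
    if fake.max is None or target > fake.max["target"]:
        fake.max = {"target": target, "params": {}}
    logger.update("optimization:step", fake)

print(logger.records)
```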

For a list of all default events, you can check out DEFAULT_EVENTS.

[20]:

DEFAULT_EVENTS

[20]:

['optimization:start', 'optimization:step', 'optimization:end']