# Statistical methods of interest: Bayesian Optimization

Optimization is a classical mathematical problem that has been studied for centuries and remains a vibrant area of research. The high interest in optimization is caused by many factors including the complexity of the problem and the value of optimization solutions to society. We find problems that need to be optimized in every field, from finance to healthcare, from computer science to cybersecurity, from transportation and logistics to sport, and so on..

Increasing computational power together with the growing complexity of problems are driving the development of new and more powerful algorithms. Often the growing complexity comes from the need to optimize a phenomenon modelled with a function that is difficult to handle. Consider the case where we can only observe the output of the function given our input, but we don’t know what the function looks like (often called the black box function), and we want to optimize it. To add complexity to the problem, the function could be expensive to evaluate in terms of time, long run times during processing, or high monetary costs.

Bayesian optimization (Frazier, 2018) offers a smart way of dealing with the problem. As the name suggests, the method is based on the Bayesian paradigm of prior and posterior probabilities. The two main components of the algorithm are the distribution that we use to model the function of interest, and an acquisition function that is used to decide where it’s more valuable to evaluate the function. Bayesian optimization is an iterative process that starts with modeling the black box function with a prior probability distribution (surrogate model) using an initial set of evaluation of the unknown function, then it uses the acquisition function, that is a function of the surrogate model, to choose the next point where to evaluate the function, it updates the surrogate model with the new evaluation (computing the posterior distribution), and it continues until a stopping criteria is met. It is common to set as stopping criteria a maximum number of iterations.

To summarize, the main steps of Bayesian optimization are:

- Build a surrogate model using n0 initial evaluation of the function f(x).
- While the stopping criteria is not met:
- Use the acquisition function to find the point xn where to evaluate the function.
- Evaluate the function in the new point, f(xn)
- Use the new evaluation to compute the posterior distribution of the surrogate model.

When the algorithm is complete it returns a set of points, and we can then select the one that gives the largest value.

This way, we exchange the problem of dealing with the unknown function with one of working with the acquisition function, the benefit of which is that the acquisition function should have a simpler and easier form. There exists several ways of choosing the prior distribution for the unknown function and for the acquisition function. A popular approach is to use the Gaussian process to model the black box function and the expected improvement as the acquisition function. The expected improvement describes the potential improvement of the function of interest with respect to its maximum value observed. Using the Gaussian process makes it possible to write the expected improvement in a close form and makes it easier to compute the maximum (that will be the next point to evaluate the unknown function).

Optimization is a field in constant development, and its algorithms can be of great value when working with cybersecurity. At Praxis Security Labs, we are always finding innovative solutions for our customers, and Bayesian optimization is one approach we use to help them.

To know more about what we do and how we can help you, schedule a meeting with our CEO, Kai Roer, or with our Director of Research, Thea Mannix.

References: Frazier, P. I. (2018) A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.