Introduction
Optimization is everywhere, though it can mean different things from different perspectives. In basic calculus, optimization can simply mean finding the maximum or minimum of a function such as $f(x) = x^4 + x^2$ in the real domain $x \in \mathbb{R}$. In this case, we can either use a gradient-based method or simply spot the solution: because $x^4$ and $x^2$ are always nonnegative and their minimum values are zero, $f(x)$ has a minimum $f_{\min} = 0$ at $x_* = 0$. This can be confirmed easily by taking the first derivative of $f(x)$ with respect to $x$ and setting it to zero, which gives
$$f'(x) = 4x^3 + 2x = 2x\,(2x^2 + 1) = 0, \tag{I.1}$$
which has only one solution, $x_* = 0$, because $x$ is a real number. The condition $f'(x) = 0$ is sufficient to determine the optimal solution in this case. In fact, this function is convex, with only one minimum in the whole real domain.
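As a quick sanity check, this calculation can be reproduced symbolically. The following minimal sketch uses SymPy (an assumption of this illustration, not part of the text) to confirm that $x = 0$ is the only real stationary point:

```python
# Symbolic check that f(x) = x^4 + x^2 has a single real stationary point.
import sympy as sp

x = sp.Symbol('x', real=True)
f = x**4 + x**2

stationary = sp.solve(sp.diff(f, x), x)  # solves 4x^3 + 2x = 0 over the reals
print(stationary)       # [0]
print(f.subs(x, 0))     # 0, the global minimum value
```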
However, things become more complicated when $f(x)$ is highly nonlinear with multiple optima. For example, if we try to find the maximum value of $f(x) = \mathrm{sinc}(x) = \sin(x)/x$ in the real domain, we can naively use the condition $f'(x) = 0$, that is,
$$f'(x) = \frac{x \cos x - \sin x}{x^2} = 0, \tag{I.2}$$
which has an infinite number of solutions for $x \neq 0$. There is no simple closed-form formula for these solutions; thus, a numerical method has to be used to compute them. In addition, even with all the effort to find these solutions, care has to be taken because the actual global maximum $f_{\max} = 1$ occurs at $x_* = 0$. However, this solution can only be found by taking the limit $x \to 0$, and it is not among the solutions of the condition $f'(x) = 0$. This highlights the potential difficulty of nonlinear problems with multiple optima or multimodality.
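To see this concretely, the stationarity condition can be solved numerically. The following sketch (an illustration; the bracketing intervals and tolerances are assumptions) uses SciPy's `brentq` to locate roots of $x\cos x - \sin x = 0$ between consecutive multiples of $\pi$:

```python
# Numerically solving the stationarity condition x*cos(x) - sin(x) = 0
# for sinc(x); the positive roots lie between consecutive multiples of pi.
import numpy as np
from scipy.optimize import brentq

g = lambda x: x * np.cos(x) - np.sin(x)

roots = [brentq(g, k * np.pi + 1e-9, (k + 1) * np.pi - 1e-9)
         for k in range(1, 4)]
print(roots)  # approx [4.493, 7.725, 10.904] -- none is the global maximum

# The global maximum f(0) = 1 only appears as the limit x -> 0:
x_small = 1e-8
print(np.sin(x_small) / x_small)  # approx 1.0
```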
Furthermore, not all functions are smooth. For example, if we try to use $f'(x) = 0$ to find the minimum of
$$f(x) = |x|\, e^{-\sin(x^2)}, \tag{I.3}$$
we will realize that $f(x)$ is not differentiable at $x_* = 0$, even though the global minimum $f_{\min} = 0$ occurs there. In this case, optimization techniques that require the calculation of derivatives will not work.
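A derivative-free method can still handle such a kink. Below is a minimal sketch using SciPy's Nelder-Mead method on the function in (I.3); the starting point is an arbitrary assumption of this illustration:

```python
# Gradient-free minimization of the non-smooth f(x) = |x| * exp(-sin(x^2)).
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.abs(x[0]) * np.exp(-np.sin(x[0] ** 2))

res = minimize(f, x0=[0.5], method='Nelder-Mead')
print(res.x, res.fun)  # converges near x = 0 with f = 0, despite the kink
```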
Problems become more challenging in higher-dimensional spaces. For example, the nonlinear function
$$f(\mathbf{x}) = \sum_{i=1}^{n} |x_i|, \tag{I.4}$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with $x_i \in \mathbb{R}$ for $i = 1, 2, \ldots, n$, has a minimum at $\mathbf{x}_* = (0, 0, \ldots, 0)$, but this function is not differentiable at $\mathbf{x}_*$.
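Derivative-free methods extend to this setting as well. As a sketch (the dimension, random starting point, and solver options are illustrative assumptions), Nelder-Mead can locate the non-smooth minimum of (I.4):

```python
# Derivative-free search on f(x) = sum_i |x_i|, which is non-differentiable
# at its minimum x* = (0, ..., 0).
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.sum(np.abs(x))

n = 5
rng = np.random.default_rng(0)
res = minimize(f, x0=rng.uniform(-10, 10, n), method='Nelder-Mead',
               options={'maxiter': 20000, 'xatol': 1e-8, 'fatol': 1e-8})
print(res.x)    # close to the zero vector
print(res.fun)  # close to 0
```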
Therefore, optimization techniques have to be diverse so as to use gradient information when appropriate and to avoid it when it is not defined or not easily calculated. Though the above nonlinear optimization problems can be challenging to solve, constraints on the search domain and on certain independent variables can make the search domain much more complicated, which can consequently make the problem even harder to solve. In addition, we sometimes have to optimize several objective functions instead of just one, which in turn makes a problem more challenging to solve.
In general, it is possible to write most optimization problems mathematically in the general form
$$\mathrm{minimize} \; f_i(\mathbf{x}), \quad (i = 1, 2, \ldots, M), \tag{I.5}$$
$$\text{subject to } \phi_j(\mathbf{x}) = 0, \quad (j = 1, 2, \ldots, J), \tag{I.6}$$
$$\psi_k(\mathbf{x}) \le 0, \quad (k = 1, 2, \ldots, K), \tag{I.7}$$
where $f_i(\mathbf{x})$, $\phi_j(\mathbf{x})$, and $\psi_k(\mathbf{x})$ are functions of the design vector
$$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T. \tag{I.8}$$
Here, the components $x_i$ of $\mathbf{x}$ are called design or decision variables, and they can be real continuous, discrete, or a mixture of the two. The functions $f_i(\mathbf{x})$, where $i = 1, 2, \ldots, M$, are called the objective functions; in the case of $M = 1$, there is only a single objective, and the problem becomes single-objective optimization. The objective function is sometimes called the cost function, loss function, or energy function in the literature. The space spanned by the decision variables is called the design space or search space $\mathbb{R}^n$, while the space formed by the objective function values is called the response space or objective landscape. The equalities for $\phi_j$ and inequalities for $\psi_k$ are called constraints. It is worth pointing out that we can also write the inequalities the other way, $\psi_k(\mathbf{x}) \ge 0$, and we can also formulate the objectives as maximization.
If a solution vector $\mathbf{x}$ satisfies all the constraints $\phi_j(\mathbf{x})$ and $\psi_k(\mathbf{x})$, it is called a feasible solution; otherwise, it is infeasible. All the feasible points or vectors form the feasible region in the search space.
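To make the general form concrete, the following sketch encodes a small instance of (I.5)–(I.7) with one equality and one inequality constraint. The particular functions, starting point, and solver (SciPy's SLSQP) are illustrative assumptions, not from the text; note that SciPy uses the convention $g(\mathbf{x}) \ge 0$ for inequalities, i.e., the "other way" of writing $\psi_k(\mathbf{x}) \le 0$.

```python
# Minimize f(x) subject to phi(x) = 0 and psi(x) <= 0, in the shape of
# the general form (I.5)-(I.7); the concrete functions are illustrative.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2   # objective f(x)
phi = lambda x: x[0] + x[1] - 2                    # equality: phi(x) = 0
psi = lambda x: x[0] ** 2 + x[1] ** 2 - 4          # inequality: psi(x) <= 0

constraints = [
    {'type': 'eq',   'fun': phi},
    {'type': 'ineq', 'fun': lambda x: -psi(x)},    # flip sign for SciPy
]
res = minimize(f, x0=[0.0, 0.0], method='SLSQP', constraints=constraints)
print(res.x, res.fun)   # optimum near (0.5, 1.5) with f = 0.5

# A point is feasible if it satisfies all constraints:
is_feasible = np.isclose(phi(res.x), 0, atol=1e-6) and psi(res.x) <= 1e-6
print(is_feasible)
```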
In a rare but extreme case, there is no objective at all, only constraints. Such a problem is called a feasibility problem, and any feasible solution is an optimal solution to such a problem.
If all the problem functions are linear, the problem becomes a linear programming (LP) problem. Here, the term "programming" means planning; it has nothing to do with computer programming in this context. Thus, mathematical optimization is also called mathematical programming. In addition, if the variables in an LP can only take integer values, the LP becomes an integer LP, or simply integer programming (IP). If the variables can only take two values (0 or 1), such an IP is called binary IP.
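As a small worked example (the problem data below are illustrative assumptions, not from the text), an LP can be solved with SciPy's `linprog`: maximize $5x + 4y$ subject to $6x + 4y \le 24$, $x + 2y \le 6$, and $x, y \ge 0$. Since `linprog` minimizes, the objective coefficients are negated.

```python
# A small LP solved with SciPy's linprog (HiGHS backend).
from scipy.optimize import linprog

c = [-5, -4]                     # maximize 5x + 4y  ->  minimize -(5x + 4y)
A_ub = [[6, 4], [1, 2]]
b_ub = [24, 6]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)           # optimum (3.0, 1.5) with objective 21
```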
If there are no constraints (i.e., $J = 0$ and $K = 0$), the problem becomes an unconstrained optimization problem. If $J \ge 1$ but $K = 0$, it becomes an equality-constrained problem. Conversely, if $J = 0$ but $K \ge 1$, it becomes an inequality-constrained problem.
The classification of optimization problems can also be done by looking at the modality of the objective landscape, which can be divided into multimodal problems and unimodal problems, including convex optimization. In addition, classification can also be based on the determinacy of the problem. If there is no randomness in the formulation, the problem is called deterministic, and in fact all the above problems are essentially deterministic. However, if there is uncertainty in the variables or function forms, then optimization involves probability distributions and expectations; such problems are often called stochastic optimization or robust optimization.
Classification of optimization problems and the terminologies used in the optimization literature can be diverse and confusing. We summarize most of these terms in Figure I.1.
Figure I.1 Classification of optimization problems.
Whether an optimization problem is considered easy or hard can depend on many factors and on the actual mathematical formulation. In fact, three factors that make a problem more challenging are: the nonlinearity of the objective function, the high dimensionality of the problem, and the complex shape of the search domain.
- Nonlinearity and multimodality: The high nonlinearity of a function to be optimized usually means multimodality, with multiple local modes and multiple (even infinitely many) optima. In most cases, algorithms for solving such problems are more likely to get trapped in local modes.
- High dimensionality: High dimensionality is associated with the so-called curse of dimensionality, where distances become large while the volume that any algorithm can actually search becomes insignificant compared to the vast feasible space (see the numerical sketch after this list).
- Constraints: Multiple complex constraints can make the feasible region complicated, with irregular geometry or shapes. In some cases, the feasible region can be split into multiple disconnected regions with isolated islands, which makes it harder for algorithms to search all the feasible regions (thus potentially missing the true optimum).
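One way to quantify the vanishing-volume effect is to compare the unit hypercube with its inscribed ball; this minimal sketch (the specific comparison is an illustrative assumption) uses the standard ball-volume formula $V_n(r) = \pi^{n/2} r^n / \Gamma(n/2 + 1)$:

```python
# Fraction of the unit cube [0, 1]^n occupied by its inscribed ball
# (radius 0.5): it collapses rapidly as the dimension n grows.
import math

for n in (2, 5, 10, 20):
    r = 0.5
    v_ball = math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)
    print(n, v_ball)  # ~0.785, ~0.164, ~0.0025, ~2.5e-8
```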
Other factors, such as the evaluation time of an objective, are also important. In many applications such as protein folding, bioinformatics, aerospace engineering, and deep machine learning (ML), the evaluation of a single objective can take a long time (from a few hours to days or even weeks); therefore, the computational costs can be very high. Thus, the choice of efficient algorithms becomes paramount.
Algorithms for solving optimization problems tend to be iterative, and thus multiple evaluations of objectives are needed, typically hundreds or thousands (or even millions) of evaluations. Different algorithms may have different efficiency and different requirements. For example, Newton's method, which is gradient-based, is very efficient for smooth objective functions, but it can get stuck in local modes if the objective is highly multimodal. If the objective is not smooth or has a kink, then the Nelder-Mead simplex method can be used because it is a gradient-free method and can work well even for problems with discontinuities, but it can become slow and also get stuck in a local mode. Algorithms for solving nonlinear optimization problems are diverse, including the trust-region method, the interior-point method, and others, but they are mostly local search methods. In the special case when the objective is convex, they can be very efficient. Quadratic programming (QP) and sequential quadratic programming use such convexity properties to their advantage.
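The local nature of gradient-based search is easy to demonstrate. In the sketch below (the multimodal test function and starting points are illustrative assumptions), a Newton-type method converges to whichever local minimum lies near its start:

```python
# Newton-type minimization of the multimodal f(x) = x^2 + 5*sin(3x):
# different starting points end in different local minima.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + 5 * np.sin(3 * x[0])
grad = lambda x: np.array([2 * x[0] + 15 * np.cos(3 * x[0])])

for x0 in (-2.0, 0.0, 3.0):
    res = minimize(f, [x0], jac=grad, method='Newton-CG')
    print(x0, res.x, res.fun)  # typically three different local minima
```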
The simplex method for solving LP can be efficient in practice, but it requires all the problem functions to be linear. If an LP problem has integer variables, the simplex method will not work directly; it has to be combined with branch and bound to solve IP problems.
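Continuing the illustrative LP above, requiring integer variables makes its fractional optimum $(3, 1.5)$ infeasible. A sketch using SciPy's `milp` (available in SciPy 1.9+, backed by the HiGHS branch-and-bound solver):

```python
# The same illustrative LP with integrality constraints becomes an IP.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

c = np.array([-5, -4])           # maximize 5x + 4y  ->  minimize -(5x + 4y)
constraints = LinearConstraint([[6, 4], [1, 2]], ub=[24, 6])
res = milp(c, constraints=constraints,
           integrality=np.ones(2), bounds=Bounds(0, np.inf))
print(res.x, -res.fun)           # integer optimum (4, 0) with objective 20
```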
As traditional methods are usually local search algorithms, one of the current trends is to use heuristic and metaheuristic algorithms. Here, meta- means "beyond" or "higher level," and metaheuristics generally perform better than simple heuristics. In addition, all metaheuristic algorithms use a certain tradeoff between randomization and local search. It is worth pointing out that...