Iterative Optimizers

Name: Iterative Optimizers | Difficulty Measures and Benchmarks
Brand: Wiley
Price: 139.99 EUR
Availability: OnlineOnly

Difficulty Measures and Benchmarks

Maurice Clerc(Author)

Wiley (Publisher)

1st Edition

Published on 10. April 2019

224 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-61240-7 (ISBN)

€139.99incl. 7% vat

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Description

More details

Other editions

Content

1. Some Definitions.

2. Difficulty of the Difficulty.

3. Landscape Typology.

4. LandGener.

5. Test Cases.

6. Difficulty vs Dimension.

7. Exploitation and Exploration vs Difficulty.

8. The Explo2 Algorithm.

9. Balance and Perceived Difficulty.

1
Some Definitions

An important point is that one is supposed to work with a digital computer and that, consequently, every manipulated entity is discrete finite. In particular, the difference between two values cannot be less than a given threshold e, commonly known as computer machine epsilon, even if, in fact, it also depends on the operating system and the language being used.

Most theoretical analyses, especially for convergences, require continuity or even differentiability properties. As a result, they are not valid in our discrete context. A convergence theorem may very simply not hold therein.

1.1. Continuous case vs discrete case: when a theorem no longer holds

A classical theorem is when, with the set of non-negative real numbers, the sequence xn+1 = axn converges to zero for every a positive smaller than 1, and this holds independently of the initial number x1.

However, on a computer, this is different. For simplicity, suppose a machine epsilon e equal to 1. Then, axn is rounded to the nearest integer. If, for example, x1 = 10, then the sequence stabilizes at 1 and does not converge to zero. The theorem no longer holds. More specifically, it must be modified: the convergence still exists, but to e.

Figure 1.1. On a digital computer a sequence that theoretically converges to zero may in fact converge to a positive value wrongly said to be zero

Naturally, this is a simplistic example; however, it encourages considering with much caution any theorem based on an assumption of continuity or, a fortiori, of differentiability.

Similarly, one also must consider with suspicion any computational result on digital computer whose absolute value is much smaller than e. For example, if e = 10-32, then it cannot be advanced that a statement like the one below is relevant:

With this problem, optimizer A is better than optimizer B, because it yields a minimum equal to 10-40 and B only 10-35.

Therefore, although the concepts of iterative optimization are supposedly known (more details are given in Clerc (2015)), it is necessary to partially redefine them, in order to then allow valid analyses, generally based on enumerations.

1.2. Landscape

An optimization problem is defined by a space of points x, known as definition space or search space, valued by a function f. Formally, it is a scalar field, but we are conventionally referring to a landscape and the role of an optimizer is to find optimal value point(s). Only the search for minima is considered here (see definition below), as a specific case that can always be easily referred to.

1.2.1. Size of a landscape

One might assume that a difficulty measure perceived by an optimizer (see section 2.8) for a given problem must be, among other things, an increasing function of the "size" of the latter. But can we define it?

A simple notion, not to say simplistic, is to say that it is the amount of data needed to describe the landscape of the problem. For example, on a computer which, because of its machine epsilon, can discriminate only N values, a landscape whose point space is of dimension D then has a size

[1.1]

In fact, the definition space includes ND points that can be presumably assumed to be classified, for example, in a lexicographic manner, and the landscape is then completely described by the ordered list of values at each point.

This is not very discriminating, since it implies that for a given D, all landscapes have the same size. In reality, it is a maximal size that assumes that the value at a point is independent of the values at other points. This is true, for example, for a totally random landscape, but does not hold in general. Most fortunately, the only method to find the optimum would otherwise be an exhaustive search. Moreover, the existence of data compression algorithms (of images, for example) effectively shows that it is often possible to do better. See also problems such as that of the traveling salesman with D cities, which can be described using only D (D - 1) /2 arc values.

A second idea would be to consider instead the minimum number of values needed to describe the landscape. However, on the one hand, it is often unknown and, on the other hand, it can be in any way related to the difficulty perceived by an optimizer (think of some complex landscapes completely described using simple formulas).

In short, in the absence of a satisfactory definition, we shall simply retain the following: the size is finite and the number of possible landscapes is also finite (one does not imply the other). More precisely, there are N N D possible landscapes1. A consequence is that the requirements of the No Free Lunch Theorem (NFLT) (Wolpert 1997) are satisfied. Thus, on average, all optimizers - including pure random search - are equivalent, however, only on the condition that every landscape is really considered. In practice, this is never the case in test cases. Hence, there are two remarks:

- in some published articles, we sometimes see a comment such as "our new algorithm outperforms others with most problems of test cases, but not all, because of the NFLT". This is incorrect. Given that the test case is not exhaustive, nothing prevents in theory that there is an unsurpassable algorithm with this set of problems;
- because each one is incomplete, every test case is necessarily biased, in the sense that they promote such and such types of optimizers. At the least, we could try to build them to be sufficiently representative of certain classes of landscapes of real problems so that they make it possible to usefully discriminate from generalist optimizers. This is precisely one of the subjects of this little book.

1.2.2. Adjacency and path

We consider two points x = (x1, ., xD) and y = (y1, ., yD) of the search space. They are said to be adjacent if

[1.2]

They are thus as close as possible, because of machine epsilon. In fact, equation [1.2] implies that the components of the vector y - x have only three possible values: e, -e and 0.

Figure 1.2. Path examples

To simplify, instead of saying "points adjacent to", one could say "the adjacents to".

A path is a sequence of adjacents.

1.2.3. Minimum

A point is a minimum if its value is less than or equal to the values of all its adjacents. If there is no equality, then it is a strict minimum.

A minimum is said to be global if its value is less than or equal to the values of all other minima (and thus, ultimately, of all other points in the definition space). Otherwise, this is a local minimum.

1.2.4. Modality

This is the number of local or global minima. We will see that this is a criterion sometimes used to define the difficulty of a problem, but it is in fact much too basic, as intuitively illustrated by Figure 1.3.

Figure 1.3. Same modality, but for all classic optimizers the landscape at the bottom is more difficult

1.2.5. Plateau

Let P be a set of points of the search space such that:

- every point of P has the same value v;
- every point of P has at least D adjacents also in P (connexity condition).

Then, P is said to be a plateau of value v. If it is necessary to be more precise, it will sometimes be referred to as D-plateau. Note that a plateau can be seen as a set of minima.

Figure 1.4. Examples of landscapes with plateaus. The one at the bottom was built with the program LandGener (see Chapter 4). For a color version of this image, see www.iste.co.uk/clerc/iterative.zip

1.2.6. Basin of attraction

Consider a minimum x* of a landscape f of dimension D. Intuitively, it would be tempting to say that a point x belongs to the basin of attraction of x* if the following two conditions are met:

- f (x) > f (x*);
- there exists a path (x1 = x, ., xk, ., xK = x*) with f (xk+1) < f (x).

Figure 1.5. Landscape formed of two basins of attraction. Without the "maximal descent" condition, a point such as A may belong to two basins. With this condition, this may still be the case for a point such as B, at the shared boundary of the basins, but this will not affect the evaluation of the sizes of these basins. For a color version of this image, see www.iste.co.uk/clerc/iterative.zip

Informally, it can be said that there is a strictly descending path from x to x*. However, this definition is not precise...

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

Iterative Optimizers

Description

More details

Other editions

Additional editions

Content

1
Some Definitions

1.1. Continuous case vs discrete case: when a theorem no longer holds

1.2. Landscape

1.2.1. Size of a landscape

1.2.2. Adjacency and path

1.2.3. Minimum

1.2.4. Modality

1.2.5. Plateau

1.2.6. Basin of attraction

System requirements

Schweitzer Fachinformationen

Iterative Optimizers

Description

More details

Other editions

Additional editions

Content

1 Some Definitions

1.1. Continuous case vs discrete case: when a theorem no longer holds

1.2. Landscape

1.2.1. Size of a landscape

1.2.2. Adjacency and path

1.2.3. Minimum

1.2.4. Modality

1.2.5. Plateau

1.2.6. Basin of attraction

System requirements

1
Some Definitions