Chapter 3: Genetic algorithm
The term "genetic algorithm" (GA) refers to a metaheuristic that was developed in the fields of computer science and operations research. This metaheuristic was inspired by the process of natural selection, and it is a member of the wider class of "evolutionary algorithms" (EA). By relying on biologically inspired operators like mutation, crossover, and selection, genetic algorithms are often employed to develop high-quality solutions to optimization and search problems. This is accomplished via the use of genetic programming. optimization of hyperparameters, and so on.
In a genetic algorithm, a population of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosomes or genotype) that can be mutated and altered. Traditionally, solutions are represented in binary as strings of 0s and 1s, although other encodings are also possible.
The evolution usually starts from a population of randomly generated individuals and proceeds iteratively, with the population at each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved. The fitter individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. The algorithm commonly terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population.
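The generational loop just described can be sketched in a few lines of Python. This is a minimal illustration only: the helpers random_individual, fitness, select_parents, crossover, and mutate are hypothetical placeholders for the components discussed later in this chapter, and crossover is assumed here to return a single child.

```python
def genetic_algorithm(pop_size, n_generations, fitness, random_individual,
                      select_parents, crossover, mutate):
    """Sketch of the generational loop; every helper is supplied by the caller."""
    # Start from a randomly generated population of candidate solutions.
    population = [random_individual() for _ in range(pop_size)]

    for _ in range(n_generations):
        # Evaluate the fitness of every individual in the current generation.
        scored = [(fitness(ind), ind) for ind in population]

        # Breed the next generation from stochastically selected parents.
        next_population = []
        while len(next_population) < pop_size:
            parent_a, parent_b = select_parents(scored)
            next_population.append(mutate(crossover(parent_a, parent_b)))
        population = next_population

    # Return the fittest individual from the final generation.
    return max(population, key=fitness)
```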
A typical genetic algorithm requires:
a genetic representation of the solution domain, and a fitness function to evaluate the solution domain.
Each candidate solution is traditionally represented as an array of bits (also called a bit set or bit string). Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that, because of their fixed size, their parts are easy to align, which facilitates simple crossover operations. Variable-length representations may also be used, but implementing crossover is more complex in that case. Tree-like representations are explored in genetic programming, graph-form representations are explored in evolutionary programming, and a mix of linear chromosomes and tree-like structures is explored in gene expression programming.
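Because fixed-length bit strings align position by position, a simple one-point crossover can be written directly over two parent strings. The sketch below assumes equal-length parents represented as Python strings of "0" and "1"; the names are illustrative.

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    assert len(parent_a) == len(parent_b)
    point = random.randint(1, len(parent_a) - 1)  # cut somewhere inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Example: crossing "110110" with "001001" at point 3 yields "110001" and "001110".
```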
Once the genetic representation and the fitness function have been defined, a GA initializes a population of solutions and then improves it through repeated application of the mutation, crossover, inversion, and selection operators.
The population size depends on the nature of the problem, but it typically contains several hundred to several thousand candidate solutions. In many cases, the initial population is generated randomly, allowing the entire range of possible solutions (the search space) to be covered. Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
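A random initial population over fixed-length bit strings can be generated as shown below; pop_size and chromosome_length are illustrative parameter names, and the seeding comment is only one possible way to inject known promising solutions.

```python
import random

def random_population(pop_size, chromosome_length):
    """Generate pop_size random bit strings, sampling the search space uniformly."""
    return ["".join(random.choice("01") for _ in range(chromosome_length))
            for _ in range(pop_size)]

# "Seeding": replace a few random individuals with known promising solutions, e.g.
# population = random_population(pop_size - len(seeds), chromosome_length) + seeds
```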
During each successive generation, a portion of the existing population is selected to serve as the basis for the next generation. Individual solutions are selected through a fitness-based process, in which fitter solutions (as measured by a fitness function) are typically more likely to be selected. Some selection methods rate the fitness of every solution and preferentially choose the best ones; because that process can be very time-consuming, other methods rate only a random sample of the population.
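Two common selection schemes illustrate the distinction just made: roulette-wheel selection rates every individual, while tournament selection rates only a small random sample. The sketch below assumes the population is given as (fitness, individual) pairs with non-negative fitness values.

```python
import random

def roulette_wheel_select(scored):
    """Fitness-proportionate selection: considers every individual's fitness."""
    total = sum(f for f, _ in scored)
    if total == 0:
        return random.choice(scored)[1]  # degenerate case: all fitnesses are zero
    threshold = random.uniform(0, total)
    running = 0.0
    for f, individual in scored:
        running += f
        if running >= threshold:
            return individual
    return scored[-1][1]  # guard against floating-point rounding

def tournament_select(scored, k=3):
    """Rates only a random sample of k individuals and keeps the fittest of them."""
    sample = random.sample(scored, k)
    return max(sample, key=lambda pair: pair[0])[1]
```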
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem-dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put into a knapsack of some fixed capacity. A solution might be represented as an array of bits, where each bit stands for a different object and the value of the bit (0 or 1) indicates whether or not the object is in the knapsack. Not every such representation is valid, since the combined size of the objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of the values of all objects in the knapsack if the representation is valid, and 0 otherwise.
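The knapsack fitness just described translates directly into code. The item values, weights, and capacity in the example below are illustrative assumptions, not part of any standard benchmark.

```python
def knapsack_fitness(bits, values, weights, capacity):
    """Sum of the packed values if the weight limit is respected, otherwise 0."""
    total_value = sum(v for b, v in zip(bits, values) if b == "1")
    total_weight = sum(w for b, w in zip(bits, weights) if b == "1")
    return total_value if total_weight <= capacity else 0

# Example with three illustrative items:
values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
print(knapsack_fitness("011", values, weights, capacity))  # 220: items 2 and 3 fit
print(knapsack_fitness("111", values, weights, capacity))  # 0: total weight 60 > 50
```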
There are some problems in which it is difficult or even impossible to define the fitness expression; in these situations, a simulation may be used to determine the fitness function value of a phenotype (for instance, computational fluid dynamics may be used to determine the air resistance of a vehicle whose shape is encoded as the phenotype), or even interactive genetic algorithms may be used.
The next stage is to build a second generation population of solutions based on those that were chosen, using a variety of genetic operators, including crossover (which is also known as recombination) and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool chosen in the previous step. By applying the crossover and mutation methods described above, a "child" solution is created which typically shares many of the characteristics of its "parents." New parents are selected for each new child, and the process continues until a new population of solutions of the required size has been generated. Although reproduction methods based on the use of two parents are considered more "biologically inspired," some research suggests that more than two "parents" generate higher quality chromosomes. [citation needed]
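Mutation introduces small random changes into each child. The sketch below assumes bit-string chromosomes and a crossover helper such as the hypothetical one_point_crossover shown earlier, which returns two offspring; only one of them is kept here.

```python
import random

def bit_flip_mutate(bits, mutation_rate=0.01):
    """Flip each bit independently with probability mutation_rate."""
    return "".join(
        ("1" if b == "0" else "0") if random.random() < mutation_rate else b
        for b in bits
    )

def make_child(parent_a, parent_b, crossover, mutation_rate=0.01):
    """Recombine two parents and possibly mutate the resulting child."""
    child, _ = crossover(parent_a, parent_b)  # keep one of the two offspring
    return bit_flip_mutate(child, mutation_rate)
```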
These operations result in a next-generation population of chromosomes that differs from the initial generation. Generally, the average fitness of the population will have increased, since only the fittest organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions. These less fit solutions maintain genetic variation within the genetic pool of the parents and therefore ensure genetic diversity in the subsequent generation of children.
Opinion is divided over which is more important, crossover or mutation. Fogel (2006) contains many references that support the importance of mutation-based search.
Although crossover and mutation are often considered to be the most important genetic operators, it is possible to utilize additional genetic operators in genetic algorithms. Some examples of these operators are regrouping, colonization-extinction, and migration.
It is worth tuning parameters such as the mutation probability, the crossover probability, and the population size to find reasonable settings for the problem class being worked on. A very small mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate that is too high may cause the genetic algorithm to converge on solutions prematurely. In the absence of elitist selection, a mutation rate that is too high may lead to the loss of good solutions. An adequate population size ensures sufficient genetic diversity for the problem at hand, but setting it larger than necessary wastes computational resources.
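These tunable quantities are often gathered into a single configuration object. The values below are common textbook starting points used purely for illustration, not recommendations for any particular problem.

```python
from dataclasses import dataclass

@dataclass
class GAConfig:
    population_size: int = 200   # enough diversity without wasting computation
    crossover_rate: float = 0.8  # probability that two parents are recombined
    mutation_rate: float = 0.01  # per-bit flip probability; too low risks drift
    elitism: int = 1             # copy the best individual(s) unchanged each generation
    max_generations: int = 500
```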
In addition to the primary operators discussed above, other heuristics may be employed to make the computation faster or more robust. The speciation heuristic penalizes crossover between candidate solutions that are too similar; this encourages population diversity and helps prevent premature convergence to a less optimal solution.
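One simple way to realize such a heuristic for bit strings is to discourage crossover between parents whose Hamming distance falls below a threshold. The sketch below is one possible interpretation under that assumption, not a standard formulation of speciation.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def too_similar_to_cross(parent_a, parent_b, min_distance=4):
    """Flag parent pairs that are too alike; callers can reselect or penalize them."""
    return hamming_distance(parent_a, parent_b) < min_distance
```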
This generational process is repeated until a termination condition has been reached. Common terminating conditions include the following (a minimal check combining them is sketched after this list):
A solution is found that satisfies the minimum criteria.
The fixed number of generations has been reached.
The allocated budget (computation time or money) has been exhausted.
The fitness of the highest-ranking solution is reaching or has reached a plateau, such that successive iterations no longer produce better results.
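These conditions are typically checked together at the end of each generation. The following is a minimal sketch, assuming history is a list of the best fitness value recorded in each generation; the parameter names and defaults are illustrative.

```python
def should_terminate(generation, best_fitness, history,
                     max_generations=500, target_fitness=None, plateau_length=50):
    """Stop on a good-enough solution, a generation cap, or a fitness plateau."""
    if target_fitness is not None and best_fitness >= target_fitness:
        return True  # a solution satisfying the minimum criteria has been found
    if generation >= max_generations:
        return True  # the fixed number of generations has been reached
    # A wall-clock or cost budget could be checked here in the same way.
    if len(history) > plateau_length:
        recent_best = max(history[-plateau_length:])
        earlier_best = max(history[:-plateau_length])
        if recent_best <= earlier_best:
            return True  # best fitness has plateaued: no recent improvement
    return False
```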