Parent document is top of "FAQ: part 2/6 (A Guide to Frequently Asked Questions)"
Previous document is "Q1.2: What's Evolutionary Programming (EP)?"
Next document is "Q1.4: What's a Classifier System (CFS)?"

Q1.3: What's an Evolution Strategy (ES)?

     In 1963 two students at the Technical University of Berlin (TUB)  met
     and  were  soon  to  collaborate  on  experiments which used the wind
     tunnel of the Institute of Flow Engineering.  During the  search  for
     the  optimal  shapes  of bodies in a flow, which was then a matter of
     laborious  intuitive  experimentation,  the  idea  was  conceived  of
     proceeding  strategically.  However, attempts with the coordinate and
     simple gradient strategies (cf Q5) were unsuccessful.   Then  one  of
     the   students,   Ingo  Rechenberg,  now  Professor  of  Bionics  and
     Evolutionary Engineering, hit upon the idea of trying random  changes
     in  the  parameters  defining  the  shape,  following  the example of
     natural  MUTATIONs.   The  EVOLUTION  STRATEGY  was  born.   A  third
     student,  Peter  Bienert, joined them and started the construction of
     an automatic experimenter, which would work according to  the  simple
     rules  of  mutation  and  SELECTION.   The  second student, Hans-Paul
     Schwefel, set about testing the efficiency of the  new  methods  with
     the  help of a Zuse Z23 computer; for there were plenty of objections
     to these "random strategies."

     In spite of an occasional lack of financial support, the Evolutionary
     Engineering  Group  which  had been formed held firmly together. Ingo
     Rechenberg received  his  doctorate  in  1970  (Rechenberg  73).   It
     contains  the  theory  of  the  two membered EVOLUTION strategy and a
     first proposal for a multimembered strategy which in the nomenclature
     introduced  here  is  of the (m+1) type.   In the same year financial
     support from  the  Deutsche  Forschungsgemeinschaft  (DFG,  Germany's
     National Science Foundation) enabled the work, that was concluded, at
     least temporarily, in 1974 with the thesis  "Evolutionsstrategie  und
     numerische Optimierung" (Schwefel 77).

     Thus,   EVOLUTION   STRATEGIEs   were  invented  to  solve  technical
     OPTIMIZATION  problems  (TOPs)  like  e.g.  constructing  an  optimal
     flashing  nozzle,  and  until  recently  ES  were only known to civil
     engineering folks, as an alternative to standard solutions.   Usually
     no  closed  form  analytical objective function is available for TOPs
     and  hence,  no  applicable  optimization  method  exists,  but   the
     engineer's intuition.

     The  first  attempts  to imitate principles of organic evolution on a
     computer still resembled the iterative optimization methods known  up
     to  that  time  (cf  Q5):   In a two-membered or (1+1) ES, one PARENT
     generates  one  OFFSPRING  per  GENERATION   by   applying   NORMALLY
     DISTRIBUTED  mutations, i.e. smaller steps occur more likely than big
     ones, until a child performs better than its ancestor and  takes  its
     place.  Because  of  this  simple  structure, theoretical results for
     STEPSIZE control and CONVERGENCE VELOCITY could be derived. The ratio
     between  successful  and  all  mutations  should come to 1/5: the so-
     called 1/5 SUCCESS RULE was discovered. This first  algorithm,  using
     mutation  only,  has  then  been  enhanced  to a (m+1) strategy which
     incorporated RECOMBINATION due  to  several,  i.e.  m  parents  being
     available.  The  mutation  scheme  and the exogenous stepsize control
     were taken across unchanged from  (1+1) ESs.

     Schwefel later generalized these strategies to the  multimembered  ES
     now  denoted  by  (m+l)  and (m,l) which imitates the following basic
     principles  of  organic  evolution:  a  POPULATION,  leading  to  the
     possibility   of  recombination  with  random  mating,  mutation  and
     selection. These  strategies  are  termed  PLUS  STRATEGY  and  COMMA
     STRATEGY,  respectively: in the plus case, the parental generation is
     taken into account during selection, while in the comma case only the
     offspring  undergoes selection, and the parents die off. m (usually a
     lowercase mu, denotes the population size, and l, usually a lowercase
     lambda denotes the number of offspring generated per generation).  Or
     to put  it  in  an  utterly  insignificant  and  hopelessly  outdated

	  (define (Evolution-strategy population)
	    (if (terminate? population)
		  (cond (plus-strategy?
			  (union (mutate
				   (recombine population))
			    (recombine population))))))))

     However,  dealing  with ES is sometimes seen as "strong tobacco," for
     it takes a decent amount of probability theory and applied STATISTICS
     to understand the inner workings of an ES, while it navigates through
     the  hyperspace  of  the  usually  n-dimensional  problem  space,  by
     throwing hyperelipses into the deep...

     Luckily,  this  medium  doesn't allow for much mathematical ballyhoo;
     the author therefore has to come up with  a  simple  but  brilliantly
     intriguing  explanation to save the reader from falling asleep during
     the rest of this section, so here we go:

     Imagine a black box. A large black box. As large as, say for example,
     a Coca-Cola vending machine. Now, [..] (to be continued)

     A  single  INDIVIDUAL of the ES' population consists of the following
     GENOTYPE representing a point in the SEARCH SPACE:

	  Real-valued x_i have to be tuned by recombination  and  mutation
	  such  that  an  objective  function  reaches its global optimum.
	  Referring  to  the  metaphor  mentioned  previously,   the   x_i
	  represent the regulators of the alien Coka-Cola vending machine.

	  Real-valued s_i (usually denoted by a lowercase sigma)  or  mean
	  stepsizes  determine  the  mutability of the x_i. They represent
	  being  added  to  each  x_i  as an undirected mutation.  With an
	  "expectancy value" of  0  the  parents  will  produce  offspring
	  similar  to themselves on  average.  In order to make a doubling
	  and a halving of a stepsize equally  probable,  the  s_i  mutate
	  log-normally,  distributed,  i.e.  exp(GD),  from  generation to
	  generation.   These  stepsizes  hide  the  internal  model   the
	  population  has  made of its ENVIRONMENT, i.e. a SELF-ADAPTATION
	  of the stepsizes has replaced the exogenous control of the (1+1)

	  This  concept  works  because  selection sooner or later prefers
	  those individuals having built a good  model  of  the  objective
	  function, thus producing better offspring. Hence, learning takes
	  place on two levels: (1) at the genotypic, i.e. the  object  and
	  strategy  variable  level  and (2) at the phenotypic level, i.e.
	  the FITNESS level.

	  Depending  on  an  individual's  x_i,  the  resulting  objective
	  function  value  f(x),  where  x denotes the vector of objective
	  variables, serves as the PHENOTYPE (fitness)  in  the  selection
	  step.  In  a  plus strategy, the m best of all (m+l) individuals
	  survive to become the parents of the next generation.  Using the
	  comma variant, selection takes place only among the l offspring.
	  The  second  scheme  is  more  realistic  and   therefore   more
	  successful,  because  no  individual  may survive forever, which
	  could at least  theoretically  occur  using  the  plus  variant.
	  Untypical for conventional optimization algorithms and lavish at
	  first   sight,   a   comma   strategy   allowing    intermediate
	  deterioration  performs  better!  Only  by forgetting highly fit
	  individuals can a permanent adaptation  of  the  stepsizes  take
	  place  and avoid long stagnation phases due to misadapted s_i's.
	  This means that these individuals have built an  internal  model
	  that  is  no  longer  appropriate for further progress, and thus
	  should better be discarded.

	  By  choosing  a  certain  ratio  m/l,  one  can  determine   the
	  convergence  property  of the evolution strategy: If one wants a
	  fast, but local convergence, one  should  choose  a  small  HARD
	  SELECTION,  ratio,  e.g.  (5,100),  but  looking  for the global
	  optimum, one should favour  a softer selection (15,100).

	  Self-adaptation within  ESs  depends  on  the  following  agents
	  (Schwefel 87):

     Randomness: One cannot model mutation
	  as  a  purely  random  process.  This would mean that a child is
	  completely independent of its parents.

     Population size: The population has to be sufficiently large. Not
	  the  current  best  should be allowed to reproduce, but a set of
	  good individuals.  Biologists have coined  the  term  "requisite
	  variety"  to  mean  the  genetic  variety necessary to prevent a
	  SPECIES  from  becoming  poorer  and  poorer   genetically   and
	  eventually dying out.

	  In  order  to  exploit  the effects of a population (m > 1), the
	  individuals should recombine their knowledge with that of others
	  (cooperate)   because   one   cannot  expect  the  knowledge  to
	  accumulate in the best individual only.

     Deterioration: In order to allow better internal models (stepsizes)
	  to provide better progress in  the  future,  one  should  accept
	  deterioration  from  one generation to the next. A limited life-
	  span in nature is not a sign of failure, but an important  means
	  of preventing a species from freezing genetically.

	  ESs  prove  to  be  successful  when compared to other iterative
	  methods on a large number of test problems (Schwefel 77).   They
	  are  adaptable  to nearly all sorts of problems in optimization,
	  because they need very little  information  about  the  problem,
	  especially  no derivatives of the objective function. For a list
	  of some 300 applications of EAs, see  the  SyS-2/92  report  (cf
	  Q14).   ESs are capable of solving high dimensional, multimodal,
	  nonlinear  problems   subject   to   linear   and/or   nonlinear
	  constraints.   The  objective  function  can  also,  e.g. be the
	  result of a SIMULATION, it does not have to be given in a closed
	  form.   This  also holds for the constraints which may represent
	  the outcome of, e.g. a finite elements method (FEM).   ESs  have
	  been  adapted  to VECTOR OPTIMIZATION problems (Kursawe 92), and
	  they can also serve as a heuristic for NP-complete combinatorial
	  problems like the TRAVELLING SALESMAN PROBLEM or problems with a
	  noisy or changing response surface.


	  Kursawe,  F.  (1992)   "   Evolution   strategies   for   vector
	  optimization",  Taipei, National Chiao Tung University, 187-193.

	  Kursawe, F. (1994) "  Evolution  strategies:  Simple  models  of
	  natural  processes?", Revue Internationale de Systemique, France
	  (to appear).

	  Rechenberg,   I.   (1973)   "Evolutionsstrategie:    Optimierung
	  technischer Systeme nach Prinzipien der biologischen Evolution",
	  Stuttgart: Fromman-Holzboog.

	  Schwefel,   H.-P.    (1977)    "Numerische    Optimierung    von
	  Computermodellen   mittels   der   Evolutionsstrategie",  Basel:

	  Schwefel, H.-P. (1987) "Collective  Phaenomena  in  Evolutionary
	  Systems",  31st  Annu.  Meet.  Inter'l  Soc.  for General System
	  Research, Budapest, 1025-1033.

Parent document is top of "FAQ: part 2/6 (A Guide to Frequently Asked Questions)"
Previous document is "Q1.2: What's Evolutionary Programming (EP)?"
Next document is "Q1.4: What's a Classifier System (CFS)?"