houdini_ea_generator  
An automated generator for job definitions using Houdini.
Updated Jul 11, 2011 by patrick....@gmail.com

Houdini EA generator

Houdini

The Houdini evolutionary algorithm generator is designed to be used together with the SideFX Houdini software for optimising three-dimensional design problems.

Houdini is an advanced modelling and animation application that includes a visual dataflow modelling (VDM) approach. The VDM interface allows you to create complex modelling procedures visually by creating networks consisting of nodes and links. No coding is necessary.

You will need to create one developmental network and one or more evaluation networks. Each network is saved in a separate Houdini file (with extension .hipnc).

  • The developmental network starts with a set of genes (or parameters) and generates a phenotype, which is a three-dimensional model of the design.
  • The evaluation network starts with a phenotype and calculates a score by analysing or simulating the design model.

In the xml settings file, you then specify the names of these files (along with various other settings). The generator will then generate Python code that wraps these files, so that when the job is executed, the Houdini files will be used to develop and evaluate individuals in the population.

The job definition folder

To create a job definition, you will need to create a folder containing the xml settings file and the Houdini files. The names of the files are not predefined, so you can give them whatever names you like. For example, the folder might look as follows:

  • my_jobdef
    • dev.hipnc
    • eval1.hipnc
    • eval2.hipnc
    • settings.xml

An example xml settings file

An example xml settings file is shown below:

<!-- Settings file for Dexen generator -->
<dexen_jobdef type="houdini_ea">
    <population initial_size="100" max_births="10000" />
    <devTask label="Development" file="dev.hipnc" input_size="10"/>
    <evalTask label="Eval Area" file="eval_area.hipnc" input_size="10" type="MIN"/>
    <evalTask label="Eval Volume" file="eval_vol.hipnc" input_size="10" type="MAX"/>
    <feedbackTask label="Feedback" input_size="20" fitness="pareto_ranking" 
                  num_deaths="4" num_births="4" sel_deaths="worst" sel_births="best" 
                  mutation_prob="0.1" />
    <genotype>
        <geneSet from_pos="1" to_pos="1" type="float_range" min="0" max="10" />
        <geneSet from_pos="2" to_pos="2" type="int_range" min="2" max="15" />
        <geneSet from_pos="3" to_pos="3" type="int_choice">8,10,12,14,16,18</geneSet>
    </genotype>
</dexen_jobdef>

Below is a description of each xml element:

  • population: There should only be one population element, and it should define the following set of attributes:
    • initial_size: The size of the initial population, which is generated using randomly initialised genotypes. The number of live individuals in the population will typically remain constant, and will be equal to the initial population size.
    • max_births: The total number of individuals that can be born.
  • devTask: There should only be one devTask element, and it should define the following set of attributes:
    • label: The label used to refer to this task.
    • file: The name of the Houdini file to be used for development.
    • input_size: The number of individuals to be processed by the development task each time it is executed.
  • evalTask: There may be one or more evalTask elements, each of which should define the following attributes:
    • label: The label used to refer to this task.
    • file: The name of the Houdini file to be used for evaluation.
    • input_size: The number of individuals to be processed by the evaluation task each time it is executed.
    • type: Specifies whether the performance criteria being evaluated should be minimised or maximised.
  • feedbackTask: There should only be one feedbackTask element, and it should define the following attributes:
    • label: The label to be used to refer to this task.
    • input_size: The number of individuals to be processed by this task each time it is executed.
    • fitness: The fitness calculation method. Currently the only option is 'pareto_ranking'. Later we may add 'average', which would take the average of the performance scores (this would require the scores to be normalised).
    • num_births: The number of new individuals that will be created. Typically this will be less than or equal to the input_size, and equal to num_deaths.
    • num_deaths: The number of individuals that will be killed. Typically this will be less than or equal to the input_size, and equal to num_births.
    • sel_births: The selection method to be used for selecting parents for reproduction. This can be one of 'best', 'worst', 'roulette_best' or 'roulette_worst'. Typically, you would set this to either 'best' or 'roulette_best'.
    • sel_deaths: The selection method to be used for selecting individuals to be killed. This can be one of 'best', 'worst', 'roulette_best' or 'roulette_worst'. Typically, you would set this to either 'worst' or 'roulette_worst'.
    • mutation_prob: The probability that during reproduction, each gene in the genotype will be mutated.
  • genotype: There should only be one genotype element, and it should contain a set of geneSet elements that specify the types of genes that will be used. See below for more details.
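
As an illustration of the element rules listed above, the following is a minimal sketch of how such a settings file could be read and checked using Python's standard xml.etree.ElementTree module. It is not the generator's own code: the function name load_settings and the layout of the returned dictionary are assumptions made for this example, and the embedded xml is a shortened copy of the example settings file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example settings file (one geneSet only).
EXAMPLE_XML = """
<dexen_jobdef type="houdini_ea">
    <population initial_size="100" max_births="10000" />
    <devTask label="Development" file="dev.hipnc" input_size="10"/>
    <evalTask label="Eval Area" file="eval_area.hipnc" input_size="10" type="MIN"/>
    <evalTask label="Eval Volume" file="eval_vol.hipnc" input_size="10" type="MAX"/>
    <feedbackTask label="Feedback" input_size="20" fitness="pareto_ranking"
                  num_deaths="4" num_births="4" sel_deaths="worst" sel_births="best"
                  mutation_prob="0.1" />
    <genotype>
        <geneSet from_pos="1" to_pos="1" type="float_range" min="0" max="10" />
    </genotype>
</dexen_jobdef>
"""


def load_settings(xml_text):
    """Parse a settings document into a plain dict (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    # These elements should each appear exactly once.
    for tag in ("population", "devTask", "feedbackTask", "genotype"):
        count = len(root.findall(tag))
        if count != 1:
            raise ValueError("expected exactly one <%s>, found %d" % (tag, count))
    # There may be one or more evalTask elements.
    eval_tasks = root.findall("evalTask")
    if not eval_tasks:
        raise ValueError("expected at least one <evalTask>")
    return {
        "population": dict(root.find("population").attrib),
        "devTask": dict(root.find("devTask").attrib),
        "evalTasks": [dict(e.attrib) for e in eval_tasks],
        "feedbackTask": dict(root.find("feedbackTask").attrib),
        "geneSets": [dict(g.attrib) for g in root.find("genotype").findall("geneSet")],
    }


settings = load_settings(EXAMPLE_XML)
print(settings["population"]["initial_size"])       # 100
print([e["type"] for e in settings["evalTasks"]])   # ['MIN', 'MAX']
```

All attribute values are returned as strings, exactly as written in the file; converting them to numbers is left to the caller.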

Genotype structure

The geneSet elements can be used to specify the structure of your genotype. This structure will depend on the type of problem that you are trying to optimise. For example, for some problems you might want a genotype that is a binary set of '0's and '1's. For other problems you may prefer to work with a set of floats (real valued numbers) within a certain range (e.g. between 1.0 and 10.0). In addition, you can also specify genotype structures that contain a mixture of different types of genes.

To specify the genotype structure, you add one or more geneSet elements to the xml settings file. To illustrate this more clearly, let's have a look at a more complex example. Consider a genotype structure with 16 genes, where:

  • the first 4 genes are integers in the range 1 to 5,
  • the next 4 genes are floats in the range 1.0 to 10.0,
  • the next 4 genes are integers in the list (1,3,5,8,15), and
  • the last 4 genes are strings in the list ("material1", "material2", "material3").

A typical genotype might look something like this:

[2,3,4,1,3.345,4.789,3.123,9.345,1,3,1,8,"material1","material3","material2","material1"]

In order to specify this genotype structure, the xml would look something like this:

    <genotype>
        <geneSet from_pos="1" to_pos="4" type="int_range" min="1" max="5" />
        <geneSet from_pos="5" to_pos="8" type="float_range" min="1.0" max="10.0" />
        <geneSet from_pos="9" to_pos="12" type="int_choice">1,3,5,8,15</geneSet>
        <geneSet from_pos="13" to_pos="16" type="string_choice">material1,material2,material3</geneSet>
    </genotype>

All geneSet elements have certain attributes in common. The common attributes are as follows:

  • from_pos and to_pos: The range of genes in the genotype. The positions are inclusive, starting at 1.
  • type: The type of gene that will be used. Possible types are int_range, float_range, int_choice, and string_choice.

The geneSet elements with type int_range or float_range also have min and max attributes.

  • min and max: The minimum and maximum values for the numeric range. The values are inclusive.

The geneSet elements with type int_choice or string_choice have no min and max attributes (since there is no range). Instead, the list of permitted values must be specified as the content of the element. The values should be listed on a single line, separated by commas.
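
To show how the four gene types could be interpreted, here is a minimal sketch that builds one randomly initialised genotype from a genotype element. It is an illustration rather than the generator's actual code: the function names random_gene and random_genotype are hypothetical, and the sketch assumes that the geneSet elements are listed in position order without gaps.

```python
import random
import xml.etree.ElementTree as ET

GENOTYPE_XML = """
<genotype>
    <geneSet from_pos="1" to_pos="4" type="int_range" min="1" max="5" />
    <geneSet from_pos="5" to_pos="8" type="float_range" min="1.0" max="10.0" />
    <geneSet from_pos="9" to_pos="12" type="int_choice">1,3,5,8,15</geneSet>
    <geneSet from_pos="13" to_pos="16" type="string_choice">material1,material2,material3</geneSet>
</genotype>
"""


def random_gene(gene_set, rng):
    """Draw one random value for a geneSet element (hypothetical helper)."""
    kind = gene_set.get("type")
    if kind == "int_range":
        # randint includes both end points, matching the inclusive min and max.
        return rng.randint(int(gene_set.get("min")), int(gene_set.get("max")))
    if kind == "float_range":
        return rng.uniform(float(gene_set.get("min")), float(gene_set.get("max")))
    # The choice types list their values as comma-separated element content.
    choices = [c.strip() for c in gene_set.text.split(",")]
    if kind == "int_choice":
        return int(rng.choice(choices))
    if kind == "string_choice":
        return rng.choice(choices)
    raise ValueError("unknown geneSet type: %r" % kind)


def random_genotype(genotype_xml, rng=random):
    """Build a randomly initialised genotype as a Python list."""
    genes = []
    for gene_set in ET.fromstring(genotype_xml).findall("geneSet"):
        # from_pos and to_pos are 1-based and inclusive.
        n = int(gene_set.get("to_pos")) - int(gene_set.get("from_pos")) + 1
        genes.extend(random_gene(gene_set, rng) for _ in range(n))
    return genes


genotype = random_genotype(GENOTYPE_XML, random.Random(0))
print(len(genotype))  # 16
```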

Feedback task

The feedback task takes the individuals that it receives and does the following:

  • Individuals are ranked using a standard pareto ranking method. This will take into account multiple evaluation criteria, and will also take into account whether evaluation criteria need to be maximised or minimised. If individuals are of equal rank, then they are ranked randomly relative to one another. This results in a sorted list of individuals, arranged from best to worst.
  • Individuals are selected and killed. The number of individuals that are selected is specified by num_deaths, and the method of selection is specified by sel_deaths.
  • New individuals are created by selecting parents and performing reproduction (using standard crossover and mutation). The number of parents that are selected is specified by num_births, and the method of selection is specified by sel_births.

Typically, you want to keep the total number of live individuals constant. Therefore you will need to make sure that you set the number of births equal to the number of deaths. It is also better to set these to even values, since for births, pairs of parents will be selected.
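
The reason for preferring even values can be seen in a small sketch of the reproduction step. The exact crossover and mutation operators used by the generator are not described here, so this example assumes single-point crossover, assumes that each pair of parents produces two children, and uses a caller-supplied new_gene function for mutation. All function names are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two equal-length genotypes."""
    point = rng.randint(1, len(parent_a) - 1)  # requires at least 2 genes
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(genotype, mutation_prob, new_gene, rng):
    """Replace each gene independently with probability mutation_prob."""
    return [new_gene(i) if rng.random() < mutation_prob else g
            for i, g in enumerate(genotype)]


def reproduce(parents, mutation_prob, new_gene, rng):
    """Pair up the parents; each pair yields two children (an assumption)."""
    if len(parents) % 2:
        raise ValueError("need an even number of parents to form pairs")
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        for child in crossover(a, b, rng):
            children.append(mutate(child, mutation_prob, new_gene, rng))
    return children


rng = random.Random(42)
parents = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
children = reproduce(parents, 0.1, lambda i: rng.randint(0, 9), rng)
print(len(children))  # 4: as many births as selected parents
```

Under these assumptions, an odd number of parents would leave one parent without a partner, which is why the sketch rejects it.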

Note that, with certain types of problems, it may be that all individuals in the input set will be of rank 1, i.e. on the pareto front. In such a case, the ranking will be completely random.
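
The ranking step can be sketched as follows. Several variants of pareto ranking exist, and the one used by the generator is not specified here; this example assumes the variant that repeatedly peels off the non-dominated front, so that rank 1 is the pareto front. The function names are hypothetical, and the "MIN" and "MAX" strings follow the type attribute of the evalTask elements.

```python
import random


def dominates(a, b, types):
    """True if score vector a is no worse than b everywhere and better somewhere."""
    no_worse = True
    better = False
    for x, y, t in zip(a, b, types):
        if t == "MIN":
            x, y = -x, -y  # treat minimisation as maximisation of the negative
        if x < y:
            no_worse = False
        elif x > y:
            better = True
    return no_worse and better


def pareto_ranks(scores, types):
    """Return one rank per individual; rank 1 is the pareto front."""
    ranks = [0] * len(scores)
    remaining = set(range(len(scores)))
    rank = 1
    while remaining:
        # The current front: individuals not dominated by any other remaining one.
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i], types)
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def sort_best_to_worst(scores, types, rng=random):
    """Indices sorted by rank, with a random order among equal ranks."""
    ranks = pareto_ranks(scores, types)
    order = list(range(len(scores)))
    rng.shuffle(order)
    order.sort(key=lambda i: ranks[i])  # stable sort keeps the shuffle within a rank
    return order


types = ["MIN", "MAX"]  # e.g. minimise area, maximise volume
print(pareto_ranks([(10, 50), (12, 40), (8, 45), (15, 30)], types))  # [1, 2, 1, 3]
print(pareto_ranks([(1, 1), (2, 2), (3, 3)], types))                 # [1, 1, 1]
```

The second call shows the situation described in the note above: every individual trades one criterion against the other, so all of them are of rank 1 and their relative order is decided purely by the shuffle.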

Population settings

The initial population size is an important parameter in an evolutionary algorithm. This population size is also the size of the live population that is maintained throughout the evolutionary process.

  • Population sizes that are too small may result in a lack of diversity and premature convergence, where all individuals in the population may end up being almost exactly the same.
  • Population sizes that are too big may slow down the evolutionary process, rendering the whole process ineffective.

Typical population sizes are in the range from 50 to 200.

The maximum number of births is an upper limit that ensures the evolutionary process does not continue indefinitely, potentially filling up the whole hard disk.

Input size settings

In the xml settings file, each task is assigned an input size. This specifies how many individuals will be processed each time that task is executed.

For the development and evaluation tasks, the input sizes only affect performance. They do not affect the evolutionary process itself. For example, assuming that the initial population size is 100, the input sizes for development and evaluation can be set anywhere between 1 and 100. Setting the input size to 1 may be too low, since it means that only 1 individual is sent to a slave for processing each time, which may be too inefficient. Setting the input size to 100 may be too high, since it means that many slaves may be left idle with nothing to do. A relatively low number such as 10 seems to work well.
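
The trade-off can be made concrete with a little arithmetic. The sketch below simply splits a population into batches of input_size and counts them; the function name batches is hypothetical, and how the batches are actually distributed to slaves is not modelled.

```python
def batches(individuals, input_size):
    """Split a list of individuals into consecutive batches of input_size."""
    return [individuals[i:i + input_size]
            for i in range(0, len(individuals), input_size)]


population = list(range(100))
for size in (1, 10, 100):
    print(size, len(batches(population, size)))
# 1 100   -> 100 tiny batches, one individual per request
# 10 10   -> 10 batches that can be processed in parallel
# 100 1   -> a single batch, so only one slave has work
```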

For feedback, the input size is more critical since it will also affect the way that evolution progresses. Let's assume that the initial population size is 100.

If the input size for feedback is set to 100, then the feedback task will have to wait until all individuals in the population have been fully evaluated. (By fully evaluated, we mean that the individuals have been developed and then evaluated by all the evaluation tasks.) This produces a synchronous type of evolutionary process.

On the other hand, if the feedback size is set to something smaller - for example 20 individuals - then it means that an asynchronous type of evolutionary process can unfold. As soon as the first 20 individuals have been fully evaluated, feedback can already kick in and start killing and reproducing.
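
The difference between the two settings can be sketched by treating fully evaluated individuals as a stream and letting feedback run each time enough of them have accumulated. This is a simplified model for illustration only: the function name feedback_rounds is hypothetical, and the sketch ignores the new individuals that feedback itself adds to the population.

```python
def feedback_rounds(finished_ids, input_size):
    """Group a stream of fully evaluated individuals into feedback batches.

    Returns the batches that triggered feedback and the individuals still waiting.
    """
    pending = []
    rounds = []
    for individual in finished_ids:
        pending.append(individual)
        if len(pending) == input_size:
            rounds.append(pending)  # feedback runs on this batch
            pending = []
    return rounds, pending


# After the first 60 of 100 individuals have been fully evaluated:
rounds, waiting = feedback_rounds(range(60), 20)
print(len(rounds), len(waiting))   # 3 0  -> feedback has already run three times
rounds, waiting = feedback_rounds(range(60), 100)
print(len(rounds), len(waiting))   # 0 60 -> feedback is still waiting
```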

Examples

The examples are located in the dexen\examples\xml folder.

