Monday, January 31, 2005


Assembling a computational cluster for GEC runs

A year and a half ago, we built a 90-node diskless PC cluster, similar to (and inspired by) the Beowulf cluster in John Koza's lab, for running all our GEC experiments. We mostly use the cluster to run multiple GA runs to collect statistics, and also to test the effect of different parameters on the scalability of the algorithms (for example, see Martin's blog). Since then, quite a few people have asked me how to build such a cluster, what components to buy, and so on.

So, I decided to put it all here in the blog. (Note that the component costs are rough estimates; Pricewatch and Newegg are good sources for checking prices.)

Don't forget, the new motherboards and processors draw a lot of power, so you need about 1.5 amps per node. That also means they produce a lot of heat, but nothing that a few well-placed fans and a couple of ACs can't fix.

Total estimated cost of building the cluster: $500 (server) + $320 per node.
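For budgeting, the estimate above is just a linear formula; here is a trivial Python helper (the figures are the rough 2005 estimates quoted above):

```python
# Rough cluster budget: one server plus identical diskless nodes.
def cluster_cost(num_nodes, server_cost=500, node_cost=320):
    return server_cost + node_cost * num_nodes

# Our 90-node cluster comes out to roughly $29,300.
print(cluster_cost(90))
```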



Some helpful scripts

In a recent post, Xavier talked about R, the excellent statistical package. I have used it extensively to compute statistical tests, and I wrote a script to automate most of the testing process.

The script performs pairwise t-tests to compare multiple methods on several datasets. It uses the Bonferroni correction to adjust the significance level of the tests for the multiple comparisons. It assumes that the results are paired, like the results obtained, for instance, from a stratified ten-fold cross-validation test. The script reads a text file with the following format:

<dataset1> <method1> <numerical result 1>
<dataset1> <method1> <numerical result 2>
<dataset1> <method2> <numerical result 1>
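As a reminder of what the Bonferroni correction does: with k methods compared pairwise on d datasets, the significance level is divided by the total number of tests performed. A minimal Python sketch (the method and dataset counts are illustrative, not taken from the script):

```python
# Bonferroni correction: divide alpha by the number of comparisons performed.
def bonferroni_alpha(alpha, num_methods, num_datasets):
    pairs = num_methods * (num_methods - 1) // 2  # pairwise tests per dataset
    return alpha / (pairs * num_datasets)

# e.g., 4 methods on 5 datasets -> 30 paired t-tests
print(bonferroni_alpha(0.05, 4, 5))
```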

for(dataset in levels(data$V1)){
  for(m1 in rows){
    for(m2 in cols){
      # val: p-value of the paired t-test between m1 and m2 (computed above);
      # diff: difference of the means, giving the direction of the winner
      if(!is.na(val) && val<0.05){
        if(diff>0)
          print(sprintf("Sig %s %s -> %s",dataset,m1,m2))
        else
          print(sprintf("Sig %s %s -> %s",dataset,m2,m1))
      }
    }
  }
}

The script prints a message for each significant difference observed in the results, using a 95% confidence level (Bonferroni-adjusted for the number of comparisons).

Another script I use very often is a Perl wrapper for gnuplot, the plotting program from the GNU project. The script generates draft plots just to check how the results look, without the overhead of launching programs that can generate fancier plots, such as Matlab.


#!/usr/bin/perl -w

open(FH,"|gnuplot -persist") or die "Can't exec gnuplot";

print FH "set encoding iso_8859_1\n";
print FH "set data style lines\n";

my $end=0;
my $index=0;
my $line="plot ";

while(not $end) {
    if(defined $ARGV[$index]) {
        $line.="\"$ARGV[$index]\" with errorbars,";
        $index++;
    } else {
        chop $line;                 # drop the trailing comma
        print FH "$line\n";
        $end=1;
    }
}
close(FH);

This script receives as parameters a variable number of data files to overlay in the same plot. The contents of each file should be a stream of lines with the following format:
<X value> <Y value> <Y max> <Y min>

The program plots the points specified by the data files, connected by lines and with error bars (specified by <Y max> and <Y min>). The plot is shown in the current display.
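To try the wrapper out, a few lines of Python can generate a file in that format (the file name sample.dat and the sine-wave values are made up for illustration):

```python
import math

# Write "<X> <Y> <Ymax> <Ymin>" lines, as expected by the gnuplot wrapper.
with open("sample.dat", "w") as f:
    for x in range(10):
        y = math.sin(x / 3.0)
        err = 0.1  # illustrative error-bar half-width
        f.write("%d %.4f %.4f %.4f\n" % (x, y, y + err, y - err))
```

Running the wrapper with sample.dat as an argument should then pop up a draft plot with error bars.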

I hope this helps.


Unnatural selection article gets blog traction

Sam Williams's recent article "Unnatural Selection" in Technology Review is getting some blog traction. Franz Dill at the IFTF blog abstracted the original article and commented on my comment. As an aside, I'm getting to be something of an IFTF blog fan; the report on P&G's cool marketing usage of social networking is a recent goody, for example. Other sites in on the action include Dark Government News Port, Kinja, Link Filter, Robots.net, and Smart Mobs. For those wishing to track what's going on, a search engine dedicated to the blogosphere is a good place to start. Thanks also to Blojj for direct blog comment on IlliGAL Blogging.

Sunday, January 30, 2005


ISGEC to SIGEVO: From Outlaws to Mainstream?

I want to comment on Paul Winward's brief post regarding ISGEC becoming ACM's newest SIG, SIGEVO. I've been heavily involved in the process of this transformation, and so my views are not entirely unbiased (no one's views are entirely unbiased or so any good postmodernist would claim), but I think this move is good for ISGEC, good for ACM, good for the field of genetic and evolutionary computation, and good for computer science. For too long, GAs and EC more generally have lived in the backwaters of academic and industrial life. On the one hand, it is hard to understand why this is. After all, evolution is one of the biggest ideas in all of science, and the impact of modern genetics is hard to overstate, so it seems quite natural that computations based on evolution and genetics should have an important place in science and technology. Why, then, have GAmists and ECers been largely relegated to second tier schools, non-tenured posts, or non-CS departments (even, god forbid, engineering departments)?

To some extent it's a matter of bad timing. First-wave evolutionary computation researchers (and cyberneticists of all stripes) in the late 50s and 60s were just getting their sea legs under them when Minsky and Papert's premature and misleading hatchet job on neural networks appeared and took the wind out of cybernetic sails generally. Moreover, computers of the day weren't entirely up to the tasks put to them, and a number of prominent pioneering GA/EC works were prematurely dismissed as no big deal. Finally, the rise of symbolic AI and the subsequent cybernetics winter made it academically disreputable to persist in the "folly" of genetic algorithms and evolutionary computation, but persist the field did, and we are here today because of the courage of a small group who swam against the intellectual currents of the 60s, 70s, and 80s.

Another reason for the current situation is that human lifespans are no longer a good match for the turnover in intellectual ideas. This brings to mind the old story told of physicist Max Planck. When he was asked how the acceptance of his revolutionary ideas in quantum physics was going, it is reported that he said, "Wonderful, lots of funerals." In Planck's day, perhaps the natural turnover in faculty matched the turnover of ideas, but in our time, rapid change in thinking has not been matched by a concomitant shortening of faculty life expectancy. The result is large numbers of powerful faculty in charge whose ideas are more than a little behind the times. Unfortunately, the most obvious solutions to this problem are serious felonies, and no one here is suggesting that steps be taken to cull the herd.

So, it is in this sense, that the acceptance of SIGEVO by ACM couldn't come at a better time, and should be seen as a very positive thing. With GECCO as an ACM conference proceedings, young faculty can safely put their work there without endlessly defending their choice. With ACM affiliated journals in EC and GP, department heads have the moral equivalent of a Good Housekeeping seal of approval on our field to rely upon instead of the pleas of a lone faculty member up for tenure. More importantly, if SIGEVO is seen as part of ACM and CS, it will become easier for CS department heads to hire faculty with GA/EC credentials.

Of course, department heads, when asked about tenure and promotion, will tell you that each case is special and is scrutinized without regard for brand names. Department heads have told me that the decision to move from ISGEC to SIGEVO will make absolutely no difference to them. But consumers will also tell you that their decisions to buy products are made on a case-by-case basis without regard for brands--as their grocery baskets fill with Crest, Tide, Mach 3, Ragu, Pepsi, and Twinkies. In a busy world, trusted brands allow buyers to get quality products at low risk of error and low search costs. In the new world of ACM and SIGEVO, CS departments and their heads will be able to rely on a trusted brand in one of the most important decisions they make: the hiring, retention, and promotion of faculty.

It will take some time to know the overall effect, and indeed, joining a large bureaucratic organization like ACM will come with its share of constraints and costs. But, in the end, I believe the young people in the PhD pipeline will benefit immensely through higher probability of hiring, better chances to tenure, and improved prospects of funding as the once outlaw field of genetic algorithms and evolutionary computation comes in off the range, hangs up its six-shooter, and becomes a law-abiding denizen of the mainstream of computer science.

Saturday, January 29, 2005


GA application on code compiling optimization

Acovea (analysis of compiler options via evolutionary algorithm) is an open source project which uses GAs to tune the optimization flags for GCC to yield the fastest code specifically for your computer. GCC has many flags for code optimization, and acovea finds the best combination for you by using GAs. Code optimization has long been an important research area in computer software and systems. We can expect more evolutionary techniques to be applied to this area in the near future.
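To make the idea concrete, here is a toy sketch (not acovea's actual code) of a GA searching over a bitmask of flags. The flag names are real GCC flags, but the fitness function is a stand-in of my own; acovea itself compiles and times a benchmark for every flag combination:

```python
import random

FLAGS = ["-funroll-loops", "-fomit-frame-pointer", "-ftracer", "-fgcse"]

def fitness(mask):
    # Stand-in fitness: pretend the first two flags speed up the benchmark.
    # A real evaluation would compile with the enabled flags and time the code.
    weights = [2.0, 1.0, 0.0, 0.0]
    return sum(w for bit, w in zip(mask, weights) if bit)

def tournament(pop, k=2):
    # Binary tournament selection: best of k random individuals.
    return max(random.sample(pop, k), key=fitness)

def mutate(mask, pm):
    # Flip each bit independently with probability pm.
    return [1 - b if random.random() < pm else b for b in mask]

def evolve(pop_size=20, generations=30, pm=0.1):
    # Selection + mutation only, for brevity; a full GA would add crossover.
    random.seed(0)
    pop = [[random.randint(0, 1) for _ in FLAGS] for _ in range(pop_size)]
    for _ in range(generations):
        pop = [mutate(tournament(pop), pm) for _ in range(pop_size)]
    return max(pop, key=fitness)

best = evolve()
print([flag for flag, bit in zip(FLAGS, best) if bit])
```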


"Computer sentience is possible," says Holland

Science Daily posts a University of Michigan press release entitled Falling Prey to Machines? in which John Holland discusses the connection between contemporary science fiction such as Michael Crichton's Prey, genetic algorithms, and the possibility of sentient computation.

"Computer sentience is possible," said John Holland, professor of electrical engineering and computer science and professor of psychology at the University of Michigan. "But for a number of reasons, I don't believe that we are anywhere near that stage right now."
Readers of this blog need no introduction to Holland, but they might be somewhat surprised by his views regarding the possibility of computers rivaling human intelligence.

According to Holland, the problem with developing artificial intelligence through things like genetic algorithms is that researchers don't yet understand how to define what computer programs should be evolving toward. Human beings did not evolve to be intelligent--they evolved to survive. Intelligence was just one of many traits that human beings exploited to increase their odds of survival, and the test for survival was absolute. Defining an equivalent test of fitness for targeting intelligence as an evolutionary goal for machines, however, has been elusive. Thus, it is difficult to draw comparisons between how human intelligence developed and how artificial intelligence could evolve.

"We don't understand enough about how our own human software works to come even close to replicating it on a computer," says Holland.

According to Holland, advances in software have not kept pace with the exponential improvements in hardware processing power, and there are many artificial intelligence problems that cannot be solved by simply performing more calculations. While hardware performance continues to double almost every year and a half, the doubling time for software performance is at least 20 years.

"In the final analysis, hardware is just a way of executing programs," says Holland. "It's the software that counts."
Hat tip to Open Source Protein Structure Prediction for picking up the news release and for republishing Cosma Shalizi's lovely review of Holland's Emergence.


Canadian researchers apply GAs in mining

The Toronto Globe and Mail reports that researchers at Laurentian University are using genetic algorithms to schedule the mining of ore deposits. The article is remarkably bereft of details, even by MSM standards. It is unclear from the article what the GA is actually going to do. No mention is made of who the researchers are or what department, college, or lab they come from. Nonetheless, a Google search on the terms "genetic algorithms," "mining," and "Laurentian University" quickly leads to the following link. The Laurentian University Mining Automation Laboratory (LUMAL), under the direction of Nick Vayenas, is applying simulation, robotics, and genetic algorithms to various mining problems. We can pray for crisper reporting from the Globe and Mail, but in the meanwhile, odds are that this is the GA mining work that was so cryptically cited.

Friday, January 28, 2005


Symmetry, synchronization, and niching

A while ago, Christoph, a visiting student at the product development research lab, was talking to me about problems he was having with solving certain instances of his search problem. He told me that the problematic GA runs would converge to mediocre solutions, but different parts within those solutions were of good quality. I remembered Clarissa's work (done during her visit to IlliGAL) about how strong symmetries in problems such as Ising and other spin-glass models can cause synchronization problems, and how niching methods can be used to avoid them. So, I suggested to him that there might be synchronization issues with the problematic instances and that using a niching method might be the solution. Lo and behold, he came back yesterday and said using niching significantly improved the GA performance!

Clarissa Van Hoyweghen's Ph.D. thesis, Symmetry in the Representation of an Optimization Problem, and related papers, available as IlliGAL technical reports 2001020 and 2001030, are a good starting point for anyone interested in knowing more about symmetry and synchronization problems in optimization in general, and in GAs in particular.


Are competent learning classifier systems emerging?

Estimation of distribution algorithms (EDAs) have changed the way people approach evolutionary algorithms. Recently, such approaches have been slowly infiltrating the learning classifier systems world.

Early efforts by Sierra, Jiménez, Inza, Larrañaga, and Muruzábal showed that such algorithms may also be used in genetics-based machine learning. They built a simple Pittsburgh-style classifier based on EDAs. Recently, however, such approaches have hit the mainstream of LCS: XCS.

The usage of EDAs can help address two major issues in XCS: (1) knowledge extraction and (2) structure identification. Knowledge extraction addresses the issue of mining problem knowledge from the final solution developed by XCS. The goal is to identify the most important features in the problem and the dependencies among those features. The extracted knowledge may not only be used for further data mining, but may actually be fed back into the system, giving it further competence in solving problems in which dependent features, that is, building blocks, need to be processed effectively. A paper proposing to extract a feature dependency tree from the rule-based problem representation developed by XCS may be found here.


Record breaking GECCO

You may be interested in knowing that GECCO-2005 received 557 submissions, an all-time record for GECCO.


Green GAs and the random keys guy

The Oregon Daily Emerald reports that James Bean, noted operations researcher, genetic algorithmist, and inventor of the random keys technique for permutation and other combinatorial problems, is using genetic algorithms to help develop sustainable automobile technology in his new role as Dean of the Lundquist College of Business. The research was begun at Michigan in conjunction with the EPA and General Motors. Dr. Bean was on the faculty of the Department of Industrial and Operations Engineering (IOE) of the University of Michigan for many years, where he also held the position of Associate Dean for Academic Affairs.


J-H Chen PhD takes top prize in Taiwan

Genetic algorithms researcher and IlliGAL alumnus, Jian-Hung Chen, recently received top honors in Taiwan's computer science PhD dissertation competition. His thesis, entitled Theory and Applications of Efficient Multi-Objective Evolutionary Algorithms (CS, Feng Chia University, 2004) was ranked number one among all CS dissertations in a national competition held under the auspices of the Institute of Information & Computing Machinery (IICM). The work is a delightful amalgam of theory and practice. Chen takes the methodology of The Design of Innovation and applies it to understanding the approximate complexity of multiobjective evolutionary algorithms (MOEAs). Thereafter, he successfully applies carefully designed MOEAs to problems in production planning and design of nearest-neighbor classifiers. The dissertation concludes with novel theory and empirical work that carry over the method of fitness inheritance to MOEAs. Please join me in congratulating JH for his outstanding work. Better yet, as he is in the middle of a postdoc/job search, send your job offers to him here.


Corkscrew-shaped space antennas

Jason Lohn of the NASA Ames Research Center outside Mountain View, CA, used a genetic algorithm to design a wide-beam space antenna. The corkscrew-shaped antenna, small enough to fit in a wine glass, proved capable of transmitting at the needed bandwidth. Lohn expects the antenna to take part in the NASA Space Technology 5 mission, which will test three antennas in space. He hopes that five other GA-designed antennas will make it to space this year.

The Technology Review article also highlights the history of evolutionary computation, briefly mentioning the work of Holland, Goldberg, Koza, and others. From jet engine fan blade design to minimizing the power consumption of long-distance pipelines, the article cites several applications of genetic algorithms and genetic programming. It is a concise but good summary of where the field has been and what issues it currently faces.

NASA posted an article which further describes their use of evolutionary algorithms.


C/C++ calling Matlab functions

Since many of us use C/C++ to code our GAs and then use Matlab to plot the results, here are several simple steps that allow you to call Matlab functions directly from C/C++:

1. Convert main to mexFunction
mexFunction is the entry point of a MEX-file (a Matlab-callable executable).

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {

    int i;
    int argc = nrhs;

    char **argv;

    argv = new char * [argc];

    /* copy each Matlab input into a C string, as if it were a command-line argument */
    for (i=0; i<argc; i++) {
        int length = mxGetNumberOfElements(prhs[i])+1;
        argv[i] = new char [length];
        mxGetString(prhs[i], argv[i], length);
    }
    /* ... the body of the original main goes here ... */
}

Basically, prhs holds the right-hand-side elements (the function's inputs) and plhs the left-hand-side elements (its outputs).
e.g., for the Matlab call [c, d, e] = f(a, b):
a and b are prhs, and nrhs = 2
c, d, and e are plhs, and nlhs = 3

2. Convert printf to mexPrintf
This can be done simply by
"#define printf mexPrintf"

3. Error message: mexErrMsgTxt will show text in red (default) in Matlab.
mexErrMsgTxt("blah blah blah");

4. Include "mex.h" everywhere you call mex functions.
mex.h can be found under the matlab directory.

5. To call a matlab function, use mexCallMATLAB
For example, to call a = f(v), where a is a double and v is a 1x2 matrix holding the scalars b and c, the code looks like:

mxArray *plhs, *prhs;
prhs = mxCreateDoubleMatrix(1, 2, mxREAL);
double *ptr = mxGetPr(prhs);
ptr[0] = b;
ptr[1] = c;
mexCallMATLAB(1, &plhs, 1, &prhs, "f");
a = *mxGetPr(plhs);

6. Compile the converted code into a MEX file (in Matlab)
mex -f /usr/local/matlab/bin/ foo.cpp foo1.cpp foo2.cpp .....
The -f option points to the C/C++ options file (found under the matlab directory) and is needed to let Matlab know that you are compiling C/C++. The first file name will be used as the library name (foo). In Windows, the result is a .dll; in Linux, it's a .mexglx.

More mex functions can be found in the MathWorks documentation. Hope that helps.


Observing human mind with an evolutionary tool!?

Interactive evolutionary computation (IEC) (concretely IGA, IES, IEP, or IGP) is a framework for optimizing a target system based on the IEC user's subjective evaluation. The user's psychological evaluation scale is reflected in his or her subjective evaluation characteristics, and these, in turn, are reflected in the systems designed with the IEC. We may therefore be able to observe the psychological scale or the mental health state of the user indirectly by observing the system outputs.

This constitutes a new direction for IEC research, whereas most past IEC research has focused on system optimization.

Experimental results using IEC-based CG lighting design with three schizophrenic patients and five mentally healthy students imply that this approach has the potential to measure human mental health. See details in the paper (PDF).

Thursday, January 27, 2005


Great book for genetic and evolutionary algorithmists

Several years ago, Dave Goldberg lent me a little book by Nobel laureate Herbert A. Simon entitled The Sciences of the Artificial. I found the book really exciting. One aspect of this work directly relates to genetic and evolutionary algorithms: Herb Simon talks about the decomposability of most complex real-world systems, which is a feature that a good evolutionary algorithm should be able to exploit. I'd strongly suggest this book to anyone interested in a somewhat philosophical discussion of what types of problems we should be trying to solve with our optimizers. If nothing else, the book will provoke the reader to think about these things.


Bisection method for determining an adequate population size

My turn :-) I thought I would contribute something simple but very useful for running scalability experiments.

As many know, I've worked on the design of better genetic algorithms (GAs) for quite a long time. I focused, and still focus, on designing GAs that scale up well; that is, their computational complexity should grow as a low-order polynomial of the number of problem variables or problem size (whether we use the number of evaluations to compute the complexity or the actual time complexity including everything). Yes, I know the No Free Lunch theorem, and by no means am I trying to violate it. But, like most of us, I am not interested in solving all problems or all classes of problems closed under rather general operators (like permutation); I want to solve problems that can be expected in the real world. And I believe that real-world problems should mostly be much simpler than arbitrary problems with no inherent structure. Here I want to describe the bisection method for determining a reasonable population size, suggested to me by Kumara years ago, which I find useful for scalability experiments.

We all know that the computational complexity of GAs depends primarily on the population size and the number of generations. Given that we know the optimum, the number of generations can be determined by the GA itself---when the GA finds the optimum or the entire population converges to it (or gets close enough), we've got it. The problem is how to determine an adequate population size for each problem size. A "good" population size should be one that ensures reliable convergence to the optimum (where "reliable" may be defined any way we want, for instance, that the algorithm finds the optimum successfully in 9 out of 10 independent runs).

Right now I use a method based on bisection. The basic idea is to first determine a reasonable initial interval (min,max) of population sizes where min is too small and max is too large. Then, this interval is halved again and again until it is small enough (e.g. until (max-min)/min<0.1). In the end, our "perfect" population size should be somewhere between min and max, so when we choose max we know that we're not too far from the number we were looking for and we know that this population size works. The entire pseudocode of the bisection method follows:

1. Start with some very small N (say, N=10)
2. Double N until GA convergence is sufficiently reliable
3. (min,max)=(N/2,N)
4. Repeat until (max-min)/min<0.1 (use your own threshold here):
      N=(min+max)/2
      if N leads to sufficiently reliable convergence
        then max=N
        else min=N
5. Compute all your statistics for this problem size using pop. size=max
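The steps above translate almost directly into code. In this Python sketch the GA is stubbed out by a made-up reliability model (the success probability p = N/400 is purely an assumption for illustration; a real experiment would run the GA ten times and count successful runs):

```python
import random

def reliable(N, runs=10, needed=9):
    # Stand-in for "run the GA `runs` times with population size N".
    p = min(1.0, N / 400.0)  # assumed success probability, illustration only
    return sum(random.random() < p for _ in range(runs)) >= needed

def bisect_popsize(threshold=0.1):
    random.seed(1)
    N = 10                                    # 1. start small
    while not reliable(N):                    # 2. double until reliable
        N *= 2
    lo, hi = N // 2, N                        # 3. initial (min, max)
    while hi - lo > 1 and (hi - lo) / lo >= threshold:
        mid = (lo + hi) // 2                  # 4. halve the interval
        if reliable(mid):
            hi = mid
        else:
            lo = mid
    return hi                                 # 5. use pop. size = max

print(bisect_popsize())
```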

What I like about using the bisection method is that all parameter tweaking is done automatically (so I don't have to do it myself), because the remaining GA parameters do not influence time complexity as much (mostly by some constant factor, which is not so important when we talk about asymptotic upper bounds). To eliminate noise, the bisection can be run several times and the results of all bisection runs averaged (right now I use 10 rounds of bisection, each with 10 runs).

Of course, the above thinking is correct only if larger populations lead to better solutions (given enough generations, we should all be able to believe this) and if smaller populations do not significantly increase the number of generations (in all cases I studied, any sufficiently large population size led to about the same number of generations, as can be expected for many variants of evolutionary algorithms based primarily on recombination). To determine whether the second condition is at least partially satisfied, one can try one problem size (rather larger than smaller) and test whether the overall complexity decreases for populations larger than the one determined by bisection. Most likely, the complexity increases with the population size, and thus the bisection can run the experiments for us.

Scalability plots obtained in this manner should persuade GA sceptics who often think we don't know what computational complexity is.


Matlab script for 3D bar plot with error bars

Matlab doesn't have a direct function for plotting 3D bars with error bars. Here is my script to do it; I hope it's useful for others as well.

g = [1 6 8 10:13 26 51 101]; % labels for x values

ny = 32; % Number of y values
nID = [1:10]; % Actual labels required on the x axis;
% The two arrays below are the values you need to plot
yAvg = rand(length(g),32); % Average of a quantity
yStdDev = rand(length(g),32)*0.15; % Standard deviation of the quantity

bar3(1:ny,yAvg') ; % Plot the 3D bar

% Put some labels
xlabel('X label', 'FontSize', 14);
ylabel('Y label', 'FontSize', 14);
zlabel('Some proportion', 'FontSize', 14);
title('3D bar plot with error bars', 'FontSize', 14);

% Set custom ticks and tick labels
set(gca, 'Xtick', nID);
set(gca, 'xTickLabel', g(nID));

% Now plot the standard deviations as lines on top of the bars
hold on;
for i = 1:ny,
  for j = 1:length(g),
    z35 = yAvg(j,i):yStdDev(j,i)/20:yAvg(j,i)+yStdDev(j,i);
    x35(1:length(z35)) = i;
    y35(1:length(z35)) = j;
    plot3(y35, x35, z35,'r-')
    clear x35; clear y35; clear z35;
  end
end
hold off;
axis tight;


Data mining tools

Usually, when I work on data mining problems using genetics-based machine learning, I tend to compare the results with ones obtained using non-evolutionary methods. I know Martin and Jaume have also been using some of these tools in their data-mining-related papers.

The first one I started using was WEKA. It has a nice collection of classification, regression, and clustering algorithms. Written in Java, it is easy to use, providing a flexible environment for rapid preliminary filtering and analysis of raw data. Recently, I have noticed more than 20 different projects using this framework.

Lately, I have moved from WEKA to D2K, a data mining framework developed by the Automated Learning Group at the National Center for Supercomputing Applications. It is, again, pure Java. The thing I like most about D2K, and one of the reasons for switching, is the data-flow-oriented paradigm it uses. Using an intuitive graphical editor, complicated analysis and visualization tasks are rapidly deployed by simple drag & drop. I have been heavily using D2K in the DISCUS project, and I have no regrets about leaving WEKA behind. I have only good words about D2K's quality and the effort the ALG people put into making a great package that is easy to extend and customize.

Another tool I want to mention is a rather specialized library. LIBSVM is an integrated package for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). Sources are provided in C++, Java, and C# .NET, along with interfaces to Python, R, Matlab, Perl, and Ruby. I have been using it in some of my recent research, and if you are interested in these areas, I definitely recommend taking a look at it.

And this leads me to one of my favorite tools, R. R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. Contributed packages include all sorts of tools; I just want to point out the project on graphical models. I would recommend R to anyone who wants to speed up the analysis of the results that his or her GAs generate.


Martin Pelikan's book on hierarchical Bayesian optimization algorithm

Martin Pelikan's book titled "Hierarchical Bayesian Optimization Algorithm: Toward a New Generation of Evolutionary Algorithms" is now available. In his book, Martin presents a principled method for designing and analyzing scalable genetic algorithms that can solve boundedly-difficult, hierarchically-decomposable problems in polynomial (usually sub-quadratic) time.

For more details regarding hBOA, and to download the software, visit the hBOA page and Martin's webpage.


ISGEC announces its transition to SIGEVO

GECCO 2005 sponsor ISGEC announces its transition to ACM SIGEVO. This means that the proceedings of this year's conference will be included in the ACM Digital Library. Members of ISGEC automatically become members of SIGEVO.

Those with artistic or creative talent should try their skills at designing the new SIGEVO logo, as the competition ends February 7th. Check out the rules here.


GA-Based System saves lives in Iraq

A Business Week article gives an overview of BBN's Boomerang project, now fielded in Iraq to protect Allied troops by detecting and locating sniper fire. BBN chief technologist Steve Milligan's efforts to sign and initiate the project are discussed at length in the article:

Milligan thought it was possible, and on Nov. 17, 2003, BBN signed the contract with DARPA. The company's researchers knew that the physics was straightforward. Like supersonic airplanes, bullets create shock waves -- mini-sonic booms -- as they speed through the air. So if engineers arrange seven microphones like the spines of a sea urchin, a shock wave from a bullet will hit each microphone at a slightly different time, like a wave lapping against different pebbles on a beach. By measuring those time differences, it's possible to calculate the trajectory.

As it turned out, "it was much harder than we thought it would be," says Milligan. Not only is the math difficult, but the system also had to work in the cacophony of urban warfare, including echoes from shots -- and do it on the move.
This story is fine as far as it goes, but as GA aficionados well know, the BBN effort was supported by a crack team of card-carrying genetic algorithmists. An article written by BBNers Hussain, Montana, Brinn & Cerys entitled Genetic algorithms for UGV navigation, sniper fire localization, and unit of action fuel distribution was presented last June at GECCO 2004 at the Workshop on Military and Security Applications of Evolutionary Computation (MSAEC-2004). The paper details the development of Boomerang and other GA-based military-security applications. These efforts are the result of longstanding GA work at BBN initiated by Dave Davis (now of NuTech Solutions) and carried on by Dave Montana and other hardworking GA types. It is difficult enough working in the obscure vineyards of GAs and EC, but when a group used to toiling in obscurity does something deserving of national recognition, it would be nice if the mainstream business press would get the story right and at least mention the names of the people who really got the job done.
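The acoustics in the quote above are straightforward to demo. Below is a toy 2-D time-difference-of-arrival sketch of my own (a four-microphone square array and a brute-force grid search, not BBN's method; a fielded system must also cope with noise, echoes, and motion, as Milligan notes):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly at room temperature

# A 1 m square microphone array (positions in meters).
MICS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def arrival_times(src):
    # Time for the sound to reach each microphone from the source.
    return [math.dist(src, m) / SPEED_OF_SOUND for m in MICS]

def locate(times, span=100, step=1.0):
    # Brute-force search: pick the grid point whose inter-microphone
    # time differences best match the measured ones.
    deltas = [t - times[0] for t in times]
    best, best_err = None, float("inf")
    n = int(2 * span / step) + 1
    for i in range(n):
        y = -span + i * step
        for j in range(n):
            x = -span + j * step
            cand = arrival_times((x, y))
            err = sum((c - cand[0] - d) ** 2 for c, d in zip(cand, deltas))
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Simulate a shot at (30, -40) and recover it from time differences alone.
print(locate(arrival_times((30.0, -40.0))))
```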

This year's Genetic and Evolutionary Computation Conference (GECCO-2005) is 25-29 June 2005 in Washington DC. This year's Military and Security Applications of Evolutionary Computation Workshop (MSAEC-2005) is held at GECCO-2005 on Saturday, June 25, 2005.

Wednesday, January 26, 2005


Mixing rate, population sizing, and convergence time

While working on papers for GECCO, I revisited the mixing issue; here are some thoughts.
Experiments have shown that a lower mixing rate increases both the required population size and the convergence time, but exactly how much it affects each is not well understood. Goldberg, Thierens, & Deb (1993) calculated the mixing time for allele-wise crossover and a given population size. Part of the work on the relationship between the mixing rate and the required population size can be found in Sastry and Goldberg (2002), and part of the work on the relationship between the mixing rate and the convergence time can be found in my recent technical report (coming soon). The above papers show only preliminary results, however, and I believe further investigation would enrich GA theory and help us develop better GAs.
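The qualitative effect is easy to reproduce at home. The toy experiment below is my own sketch, not code from any of the papers above: a simple elitist GA on OneMax that counts the generations until the optimum first appears. Lowering the crossover probability `pc` (the mixing rate) typically lengthens convergence.

```python
import random

def onemax_ga(pc, length=20, pop_size=60, max_gens=200, seed=1):
    """Return the generation at which an all-ones string first appears (or max_gens)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for gen in range(1, max_gens + 1):
        if any(sum(ind) == length for ind in pop):
            return gen
        def tournament():  # binary tournament selection
            a, b = rng.choice(pop), rng.choice(pop)
            return a if sum(a) >= sum(b) else b
        nxt = [max(pop, key=sum)[:]]  # elitism: carry the current best forward
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < pc:
                # uniform crossover: this is where the mixing happens
                child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(length)]
            else:
                child = p1[:]
            # bitwise mutation at the usual 1/length rate
            child = [bit ^ (rng.random() < 1.0 / length) for bit in child]
            nxt.append(child)
        pop = nxt
    return max_gens
```

Averaging `onemax_ga(pc)` over many seeds for a sweep of `pc` values gives a crude mixing-rate-versus-convergence-time curve.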


Two pseudo-random number generators for use in GEC

Here are two good and scalable pseudo-random number generators that I use in my GA code, which might be useful for other GEC members as well.
The pLab webpage is a good source for information on different RNGs.
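Since the post does not name the two generators, here is a minimal illustration of the genre: Marsaglia's xorshift32, a tiny, fast 32-bit generator. It is fine for quick experiments, though for serious GA statistics you will want something with a much longer period, such as the Mersenne Twister.

```python
def xorshift32(seed):
    """Yield 32-bit pseudo-random integers using Marsaglia's xorshift32."""
    x = seed & 0xFFFFFFFF
    if x == 0:
        raise ValueError("xorshift32 must be seeded with a nonzero value")
    while True:
        x ^= (x << 13) & 0xFFFFFFFF  # mask keeps the state within 32 bits
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        yield x
```

The three shift/xor steps form a full-period permutation of the nonzero 32-bit states, so the sequence never repeats before 2^32 - 1 draws.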


Revisiting Pittsburgh

Yesterday, I was rereading Jaume's paper on incremental learning. A simple idea, such as a round robin over small disjoint training sets, greatly helps Pittsburgh classifier systems in the quest for generality. In the Michigan approach, however, this issue was settled by Wilson's idea of evolving classifiers based on accuracy (now celebrating the 10th anniversary of its publication).
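The round-robin idea is simple to sketch. The fragment below is my own illustration of the windowing scheme (names are hypothetical): partition the training set into disjoint strata and cycle through them, one stratum per generation, so each fitness evaluation sees only a small slice of the data.

```python
def round_robin_windows(training_set, num_strata, num_generations):
    """Yield, for each generation, the disjoint stratum used for fitness evaluation."""
    strata = [training_set[i::num_strata] for i in range(num_strata)]
    for gen in range(num_generations):
        yield strata[gen % num_strata]  # generation g trains on stratum g mod s
```

Because every rule set is eventually scored against every stratum, overfitting any single window is penalized, which is where the pressure toward generality comes from.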

The more I think about it, the more I wonder how Wilson's ideas, which gave birth to XCS, could be carried over to the Pittsburgh realm, where most of the work to date has focused on evolving rule sets based solely on overall performance (pure classification accuracy). If those ideas could make the jump, they would sketch the basis for a renewed genetics-based machine learning paradigm.


Marketing, interactive GAs, and the company that broke my heart

Matt Syrett's article about using interactive GAs for web placement design reminds me how cool I think interactive GAs are. Ever since Dawkins's famous Blind Watchmaker code and Caldwell and Johnston's Faceprints work at ICGA 91, the idea of replacing predetermined objective functions with user-driven subjective functions has opened the door to evolving art, music, poetry, and more. Of course, a lot of water has passed over the iGA dam since the early days, and for those interested in a fairly recent update on the state of interactive evolutionary computation, I recommend Hideyuki Takagi's article.
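The core swap is tiny: replace the objective function with a callback that asks a human. The sketch below is my own minimal illustration; in a real iGA the `rate` callback would present candidates on screen and collect the user's scores.

```python
import random

def interactive_ga(rate, pop_size=8, genome_len=16, generations=10, seed=None):
    """Evolve bit-string 'designs' where `rate` plays the role of the human evaluator."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=rate, reverse=True)
        parents = ranked[: pop_size // 2]   # keep only the designs the user liked best
        pop = parents[:]
        while len(pop) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]     # one-point crossover
            if rng.random() < 0.2:          # light mutation keeps some variety on screen
                child[rng.randrange(genome_len)] ^= 1
            pop.append(child)
    return max(pop, key=rate)
```

The small population and short run are deliberate: a human can only rate a handful of candidates per generation, which is the central design constraint of interactive evolution.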

Over a number of years, I've followed research in iGAs, and some time ago I started to collect information about market research and product development with the idea of founding a company to do just that using interactive GA technology. My dreams of a great startup were dashed one semester when, during my genetic algorithms course, some students researching their class project uncovered a company that was already vigorously pursuing interactive GAs in marketing applications. That company, Affinova, has married iGAs with practical market research and product development notions to create a unique line of products and services. Subsequently I met company executive Rob Frasca in another context and got the chance to tell him how Affinova was the company that broke my entrepreneurial heart. For those interested in a nice case study of marrying GAs and business, read the Affinova white paper here.

Tuesday, January 25, 2005


Growth in market for GAs

IDC, a large IT market-research organization, predicts substantial growth in the predictive analytics software market, which includes genetic algorithms, according to an article posted on Ebiz:

IDC defines predictive analytics software to “include all analytics, both tools and packaged applications, more complex in their mathematics than core analytics.”

“Predictive analytics are used to determine the probable future outcome of an event or the likelihood of a current state where it is unknown,” the study continues. “These analytics include data mining, clustering, decision trees, market basket analysis, regression modeling, neural nets, genetic algorithms, text mining, hypothesis testing, decision analytics, and so on.”

With IDC predicting compounded growth of 8% per year in the predictive analytics market, perhaps now is the time for some of us to become more entrepreneurial like GA/EC-based companies such as Engineous Software, Machine Insight, NuTech Solutions, and Schema.


Niching methods, Mahfoud's population-sizing, and scalability of GAs

While working on papers for GECCO, I was re-reading Sam Mahfoud's paper titled "Population Sizing for Sharing Methods". The population-sizing estimate, though derived for fitness sharing, also holds for other niching methods such as restricted tournament selection.
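For readers who have not seen it, fitness sharing derates each individual's raw fitness by its niche count: the summed similarity to the rest of the population. The sketch below is my own minimal version, using Hamming distance and the standard triangular sharing function; it is not code from Mahfoud's paper.

```python
def shared_fitness(fitnesses, genomes, sigma=3.0, alpha=1.0):
    """Derate each raw fitness by its niche count (classic fitness sharing)."""
    def sh(d):  # triangular sharing function: full credit at d=0, none beyond sigma
        return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    result = []
    for fi, gi in zip(fitnesses, genomes):
        niche = sum(sh(hamming(gi, gj)) for gj in genomes)  # includes self: sh(0) == 1
        result.append(fi / niche)
    return result
```

Crowded niches see their fitness divided down, which is exactly why maintaining all niches reliably requires the larger populations Mahfoud's estimate predicts.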

Since the niching mechanism is one of the key components of hierarchical and multiobjective genetic algorithms, the population-sizing paper deserves renewed attention. As Martin and I recently found out the hard way—to which Prof. Goldberg replied, 'if you had talked to me before doing the experiments, I would have told you so' :-)—the scalability of the niching method plays an important role in the overall scalability of the multiobjective hBOA.

Monday, January 24, 2005



Welcome to the Illinois Genetic Algorithm Laboratory (IlliGAL) blog site. In April 1994, a graduate student named Kate Sherwood (Ducky) put up the first IlliGAL web page. To my discredit, I didn't understand why she was so fascinated by the web, but in my defense, I didn't get in her way. Since then, the IlliGAL web site has grown, and users from countries around the world now generate some 8.4 million hits in nearly 200,000 visits annually. The website has undergone kaizen, or continual improvement, since Ducky's early efforts, and this blog represents the latest in a series of new features added to the site over the years.

Starting today, past and present IlliGAL lab members and affiliates will team up to blog on genetic algorithms, evolutionary computation, and related subjects. In the last American election cycle, blogging grew into a force to rival the mainstream media. In less well-followed topics, blogging is growing at astonishing rates, and we hope that IlliGAL Blogging will become a resource and focal point for intelligent discourse in the growing community of genetic and evolutionary computation (GEC).

So let the blogging begin. As with other blogs, reader commentary strongly shapes the agenda, so join us with your comments, or join us with your own GEC-related blog, but join us.
