Monday, January 31, 2005
Some helpful scripts
The script performs pairwise t-tests to compare multiple methods on several datasets. It uses the Bonferroni correction to adjust the significance level of the tests for the multiple comparisons. It assumes that the results are paired, like the results obtained, for instance, from a stratified ten-fold cross-validation test. The script reads a text file with the following format (a short made-up example is given below the format):
<dataset1> <method1> <numerical result 1>
<dataset1> <method1> <numerical result 2>
.
.
.
<dataset1> <method2> <numerical result 1>
.
.
.
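For instance, with two methods, three paired runs each, and entirely made-up numbers just to illustrate the format, the first few lines of such a file could look like:
--------------------------------------------------------------
dataset1 method1 0.83
dataset1 method1 0.85
dataset1 method1 0.79
dataset1 method2 0.78
dataset1 method2 0.81
dataset1 method2 0.80
--------------------------------------------------------------
The script itself is: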
--------------------------------------------------------------
data <- read.table("results_file.dat")   # V1 = dataset, V2 = method, V3 = result
for (dataset in levels(data$V1)) {
  print(sprintf("%s", dataset))
  data2 <- data[data$V1 == dataset, ]
  # Pairwise paired t-tests among methods, Bonferroni-adjusted p-values
  res <- pairwise.t.test(data2$V3, data2$V2, p.adj = "bonf",
                         pool.sd = FALSE, paired = TRUE)$p.value
  rows <- dimnames(res)[[1]]
  cols <- dimnames(res)[[2]]
  for (m1 in rows) {
    for (m2 in cols) {
      val <- res[m1, m2]
      if (!is.na(val) && val < 0.05) {
        ave1 <- mean(data2[data2$V2 == m1, ]$V3)
        ave2 <- mean(data2[data2$V2 == m2, ]$V3)
        if (ave1 > ave2) {
          print(sprintf("Sig %s %s -> %s", dataset, m1, m2))
        } else {
          print(sprintf("Sig %s %s -> %s", dataset, m2, m1))
        }
      }
    }
  }
}
--------------------------------------------------------------
The script prints a message for each significant difference observed in the results, using a 95% confidence level.
Another script I use very often is a Perl wrapper for gnuplot, the well-known command-line plotting program. The script generates draft plots just to check how the results look, without the overhead of launching other programs, like Matlab, that can generate fancier plots.
--------------------------------------------------------------
#!/usr/bin/perl -w
use strict;

# Open a pipe to gnuplot (the -persist flag keeps the plot window open)
open(FH, "|gnuplot -persist") or die "Can't exec gnuplot";
print FH "set encoding iso_8859_1\n";
print FH "set data style lines\n";

# Build a single "plot" command overlaying every data file given as an argument
my $end = 0;
my $index = 0;
my $line = "plot ";
while (not $end) {
    if (defined $ARGV[$index]) {
        $line .= "\"$ARGV[$index]\" with errorbars,";
        $index++;
    } else {
        chop $line;          # remove the trailing comma
        $line .= "\n";
        $end = 1;
    }
}
print FH $line;
close(FH);
------------------------------------------------------------
This script takes as arguments a variable number of data files to overlay in the same plot. Each file should contain a stream of lines with the following format:
<X value> <Y value> <Y max> <Y min>
The program plots the points specified by the data files, connected by lines and with error bars (given by <Y max> and <Y min>).
The plot is shown in the current display.
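If the results live in R (as in the t-test script above), a compatible data file can be written with a few lines like the following sketch; the data frame results with columns x and y, and the file name curve.dat, are hypothetical:
--------------------------------------------------------------
# "results" is a hypothetical data frame with columns x and y
# (several y measurements per x value, e.g. one per cross-validation fold)
m  <- tapply(results$y, results$x, mean)
hi <- tapply(results$y, results$x, max)
lo <- tapply(results$y, results$x, min)
write.table(data.frame(as.numeric(names(m)), m, hi, lo),
            file = "curve.dat", row.names = FALSE, col.names = FALSE)
--------------------------------------------------------------
Assuming the Perl wrapper is saved as, say, draftplot.pl (the name is mine), running ./draftplot.pl curve.dat other_curve.dat overlays the two curves in one gnuplot window.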
I hope this helps.
Unnatural selection article gets blog traction
Sunday, January 30, 2005
ISGEC to SIGEVO: From Outlaws to Mainstream?
To some extent it's a matter of bad timing. First-wave evolutionary computation researchers (and cyberneticists of all stripes) in the late 50s and 60s were just getting their sea legs when Minsky and Papert's premature and misleading hatchet job on neural networks appeared and took the wind out of cybernetic sails generally. Moreover, computers of the day weren't entirely up to the tasks put to them, and a number of prominent pioneering GA/EC works were prematurely dismissed as no big deal. Finally, the rise of symbolic AI and the subsequent cybernetics winter made it academically disreputable to persist in the "folly" of genetic algorithms and evolutionary computation, but persist the field did, and we are here today because of the courage of a small group who swam against the intellectual currents of the 60s, 70s, and 80s.
Another reason for the current situation is that human lifespans are no longer a good match to the turnover in intellectual ideas. This brings to mind the old story told of physicist Max Planck. When he was asked how the acceptance of his revolutionary ideas in quantum physics was going, it is reported that he said, "Wonderful, lots of funerals." In Planck's day, perhaps the natural turnover in faculty matched the turnover of ideas, but in our time, rapid change in thinking has not been matched by a concomitant shortening of faculty life expectancy. The result is large numbers of powerful faculty in charge with ideas that are more than a little behind the times. Unfortunately, the most obvious solutions to this problem are serious felonies, and no one here is suggesting that steps be taken to cull the herd.
So it is in this sense that the acceptance of SIGEVO by ACM couldn't come at a better time and should be seen as a very positive thing. With GECCO published as an ACM conference proceedings, young faculty can safely put their work there without endlessly defending their choice. With ACM-affiliated journals in EC and GP, department heads have the moral equivalent of a Good Housekeeping seal of approval on our field to rely upon instead of the pleas of a lone faculty member up for tenure. More importantly, if SIGEVO is seen as part of ACM and CS, it will become easier for CS department heads to hire faculty with GA/EC credentials.
Of course, department heads, when asked about tenure and promotion, will tell you that each case is special and is scrutinized without regard for brand names. Department heads have told me that the decision to move from ISGEC to SIGEVO will make absolutely no difference to them. But consumers will tell you that their decisions to buy products are also made on a case-by-case basis without regard for brands--as their grocery baskets fill with Crest, Tide, Mach 3, Ragu, Pepsi, and Twinkies. In a busy world, trusted brands allow buyers to get quality products at low risk of error and low search costs. In the new world of ACM and SIGEVO, CS departments and their heads will be able to rely on a trusted brand in one of the most important decisions they make: the hiring, retention, and promotion of faculty.
It will take some time to know the overall effect, and indeed, joining a large bureaucratic organization like ACM will come with its share of constraints and costs. But, in the end, I believe the young people in the PhD pipeline will benefit immensely through a higher probability of being hired, better chances at tenure, and improved prospects of funding as the once outlaw field of genetic algorithms and evolutionary computation comes in off the range, hangs up its six-shooter, and becomes a law-abiding denizen of the mainstream of computer science.
Saturday, January 29, 2005
"Computer sentience is possible," says Holland
"Computer sentience is possible," said John Holland, professor of electrical engineering and computer science and professor of psychology at the University of Michigan. "But for a number of reasons, I don't believe that we are anywhere near that stage right now."Readers of this blog need no introduction to Holland, but they might be somewhat surprised by his views regarding the possibility of computers rivaling human intelligence.
According to Holland, the problem with developing artificial intelligence through things like genetic algorithms is that researchers don't yet understand how to define what computer programs should be evolving toward. Human beings did not evolve to be intelligent--they evolved to survive. Intelligence was just one of many traits that human beings exploited to increase their odds of survival, and the test for survival was absolute. Defining an equivalent test of fitness for targeting intelligence as an evolutionary goal for machines, however, has been elusive. Thus, it is difficult to draw comparisons between how human intelligence developed and how artificial intelligence could evolve.
Hat tip to Open Source Protein Structure Prediction for picking up the news release and for republishing Cosma Shalizi's lovely review of Holland's Emergence.
"We don't understand enough about how our own human software works to come even close to replicating it on a computer," says Holland.
According to Holland, advances in software have not kept pace with the exponential improvements in hardware processing power, and there are many artificial intelligence problems that cannot be solved by simply performing more calculations. While hardware performance continues to double almost every year and a half, the doubling time for software performance is at least 20 years.
"In the final analysis, hardware is just a way of executing programs," says Holland. "It's the software that counts."
Canadian researchers apply GAs in mining
Friday, January 28, 2005
Are competent learning classifier systems emerging?
Early efforts by Sierra, Jiménez, Inza, Larrañaga, and Muruzábal showed that estimation of distribution algorithms (EDAs) may also be used in genetics-based machine learning approaches. They built a simple Pittsburgh-style classifier system based on EDAs. Recently, however, such approaches have reached the mainstream of LCS research: XCS.
The use of EDAs can help address two major issues in XCS: (1) knowledge extraction and (2) structure identification. Knowledge extraction addresses the issue of mining problem knowledge from the final solution developed by XCS. The goal is to identify the most important features in the problem and the dependencies among those features. The extracted knowledge may not only be used for further data mining, but may actually be fed back into the system, giving it further competence in solving problems in which dependent features, that is, building blocks, need to be processed effectively. A paper proposing to extract a feature dependency tree out of the rule-based problem representation developed by XCS may be found here.
Green GAs and the random keys guy
J-H Chen PhD takes top prize in Taiwan
Thursday, January 27, 2005
Data mining tools
The first one I started using was WEKA. It has a nice collection of classification, regression, and clustering algorithms. Written in Java, it is easy to use, providing a flexible environment for rapid preliminary filtering and analysis of raw data. Recently, I have noticed that more than 20 different projects already build on this framework.
Lately, I have moved from WEKA to D2K, a data mining framework developed by the Automated Learning Group at the National Center for Supercomputing Applications. It is, again, pure Java. The thing I like most about D2K, and one of the reasons for switching, is the data-flow-oriented paradigm it uses. Using an intuitive graphical editor, complicated analysis and visualization tasks can be rapidly deployed by simple drag & drop. I have been heavily using D2K in the DISCUS project, and I have no regrets about not using WEKA anymore. I have only good words for D2K's quality and for the effort the ALG people have put into making a great package that is easy to extend and customize.
Another tool I want to mention is a pretty specialized library. LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). The authors provide sources in C++, Java, and C# .NET, and interfaces to Python, R, Matlab, Perl, and Ruby. I have been using it in some of my recent research, and if you are interested in such areas, I definitely recommend taking a look at it.
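As a quick illustration, here is a minimal C-SVC sketch from within R using the e1071 package, which wraps LIBSVM (the iris data set ships with R; the parameter values are arbitrary):
--------------------------------------------------------------
# Minimal C-SVC sketch via the e1071 package (an R interface to LIBSVM)
library(e1071)
data(iris)
model <- svm(Species ~ ., data = iris, type = "C-classification",
             kernel = "radial", cost = 1)   # arbitrary parameters
pred <- predict(model, iris[, -5])
table(pred, iris$Species)                   # confusion matrix on the training data
--------------------------------------------------------------
For quick experiments outside R, the svm-train and svm-predict command-line tools that ship with LIBSVM do the same job on files in LIBSVM's sparse format.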
And this leads me to one of my favorite tools, R. R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. Contributed packages include all sorts of tools; I just want to point out the project on graphical models. I would recommend R to anyone who wants to speed up the analysis of the results that his or her GAs generate.
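As a tiny example of that kind of speed-up, the per-method spread in the results file used by the t-test script above can be eyeballed with a couple of lines (the file name and the dataset1 label are the placeholders from that post):
--------------------------------------------------------------
# Boxplot of the result column (V3) per method (V2) for one dataset
data <- read.table("results_file.dat")
boxplot(V3 ~ V2, data = data[data$V1 == "dataset1", ],
        xlab = "method", ylab = "result")
--------------------------------------------------------------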
GA-Based System saves lives in Iraq
Milligan thought it was possible, and on Nov. 17, 2003, BBN signed the contract with DARPA. The company's researchers knew that the physics was straightforward. Like supersonic airplanes, bullets create shock waves -- mini-sonic booms -- as they speed through the air. So if engineers arrange seven microphones like the spines of a sea urchin, a shock wave from a bullet will hit each microphone at a slightly different time, like a wave lapping against different pebbles on a beach. By measuring those time differences, it's possible to calculate the trajectory.
This story is fine as far as it goes, but as GA aficionados well know, the BBN effort was supported by a crack team of card-carrying genetic algorithmists. An article written by BBNers Hussain, Montana, Brinn & Cerys entitled Genetic algorithms for UGV navigation, sniper fire localization, and unit of action fuel distribution was presented last June at GECCO 2004 at the Workshop on Military and Security Applications of Evolutionary Computation (MSAEC-2004). The paper details the development of Boomerang and other GA-based military-security applications. These efforts are the result of longstanding GA work at BBN initiated by Dave Davis (now of NuTech Solutions) and carried on by Dave Montana and other hardworking GA types. It is difficult enough working in the obscure vineyards of GAs and EC, but when a group used to toiling in obscurity does something deserving of national recognition, it would be nice if the mainstream business press would get the story right and at least mention the names of the people who really got the job done.
As it turned out, "it was much harder than we thought it would be," says Milligan. Not only is the math difficult, but the system also had to work in the cacophony of urban warfare, including echoes from shots -- and do it on the move.
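For readers who like to see the geometry in code, here is a toy sketch in R, with invented positions and a bare-bones evolutionary search; it is in no way BBN's actual Boomerang algorithm, just the arrival-time-difference idea in miniature:
--------------------------------------------------------------
# Toy sketch: estimate a hypothetical 2-D sound-source position from
# arrival-time differences at seven microphones. All numbers are invented.
set.seed(1)
speed <- 343                                   # approx. speed of sound, m/s
mics  <- matrix(runif(14, -10, 10), ncol = 2)  # 7 microphone positions (m)
true.src <- c(30, -20)                         # hypothetical source position (m)

dists <- function(p) sqrt(rowSums((mics - matrix(p, nrow(mics), 2, byrow = TRUE))^2))
tdoa  <- function(p) { t <- dists(p) / speed; t - t[1] }  # time differences vs. mic 1
observed <- tdoa(true.src)                     # what the array would measure
fitness  <- function(p) sum((tdoa(p) - observed)^2)       # error to be minimized

pop <- matrix(runif(40, -100, 100), ncol = 2)  # 20 random candidate positions
for (gen in 1:300) {
  kids <- pop + matrix(rnorm(40, sd = 2), ncol = 2)       # Gaussian mutation
  pool <- rbind(pop, kids)
  pop  <- pool[order(apply(pool, 1, fitness))[1:20], , drop = FALSE]  # keep the best 20
}
print(pop[1, ])                                # best candidate found
--------------------------------------------------------------
The real problem is of course far harder: three dimensions, echoes, urban noise, and a moving vehicle, which is exactly where the difficulty Milligan describes comes from.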
This year's Genetic and Evolutionary Computation Conference (GECCO-2005) is 25-29 June 2005 in Washington DC. This year's Military and Security Applications of Evolutionary Computation Workshop (MSAEC-2005) is held at GECCO-2005 on Saturday, June 25, 2005.
Wednesday, January 26, 2005
Revisiting Pittsburgh
Yesterday, I was reading Jaume's paper on incremental learning again. A simple idea such as a round robin of small disjoint training sets greatly helps Pittsburgh classifier systems in the quest for generality. In the Michigan approach, however, this issue was settled by Wilson's idea of evolving classifiers based on accuracy (now celebrating the 10th anniversary of its publication).
The more I think about it, the more I wonder how Wilson's ideas, which gave birth to XCS, could be carried over to the Pittsburgh realm, where most of the work done has focused only on evolving rule sets based on overall performance (pure classification accuracy). If such ideas could be carried over, the basis for a renewed genetics-based machine learning paradigm would be sketched.
Marketing, interactive GAs, and the company that broke my heart
Over a number of years, I've followed research in iGAs, and some time ago I started to collect information about market research and product development with the idea of founding a company to do just that using interactive GA technology. My dreams of a great startup were dashed one semester when, during my genetic algorithms course, some students researching their class project uncovered a company that was already vigorously pursuing interactive GAs in marketing applications. That company, Affinova, has married iGAs with practical market research and product development notions to create a unique line of products and services. Subsequently, I met company executive Rob Frasca in another context and got the chance to tell him how Affinova was the company that broke my entrepreneurial heart. For those interested in a nice case study of marrying GAs and business, read the Affinova white paper here.
Tuesday, January 25, 2005
Growth in market for GAs
With IDC predicting compounded growth of 8% per year in the predictive analytics market, perhaps now is the time for some of us to become more entrepreneurial like GA/EC-based companies such as Engineous Software, Machine Insight, NuTech Solutions, and Schema.
IDC defines predictive analytics software to “include all analytics, both tools and packaged applications, more complex in their mathematics than core analytics.”
“Predictive analytics are used to determine the probable future outcome of an event or the likelihood of a current state where it is unknown,” the study continues. “These analytics include data mining, clustering, decision trees, market basket analysis, regression modeling, neural nets, genetic algorithms, text mining, hypothesis testing, decision analytics, and so on.”
Monday, January 24, 2005
Welcome
Starting today, past and present IlliGAL lab members and affiliates will team up to blog on genetic algorithms, evolutionary computation, and related subjects. In the last American election cycle, blogging grew into a force to rival the mainstream media. In less widely followed topics, blogging is growing at astonishing rates, and we hope that IlliGAL Blogging will become a resource and focal point for intelligent discourse in the growing community of genetic and evolutionary computation (GEC).
So let the blogging begin. As with other blogs, reader commentary strongly shapes the agenda, so join us with your comments, or join us with your own GEC-related blog, but join us.