Monday, February 28, 2005
Luckily, as I already mentioned in my previous post, nothing would be happening without the unconditional help and support of Chen-Ju Chao, Abhimanyu Gupta, Mohit Jolly, and Davina Lim. This DISCUS group, besides its hard work, has been a great example of innovation and creativity, in the best DISCUS flavor.
The milestone of the day: Chen-Ju, below, just ten minutes away from nailing down the participation of all 50 students who will be part of the marathon. All in less than three days :D
Pot pourri of GA applications
Sunday, February 27, 2005
About writing presumptions in papers
Catching a Portuguese digital fish
GeneticGraph for aesthetic layout
More GA thesis blogging
Saturday, February 26, 2005
Clustering for good
“Our arguments consist of two hypotheses: One is the difference of function by gender as a medium of knowledge diffusion in Japan's actual product market, while the other is the difference by generations. We found that certain generation and gender groups play important roles at the early stage of knowledge diffusion regarding new types of products with new technologies, which enable manufacturers to get effective feedback in creating and improving their products. We argue that the social network of this rapid and divergent feedback from the consumer side may explain why Japanese electronic manufacturers can realize high-quality products in short time periods.”
Questionnaires were grouped into four different clusters. The experiments that we will be conducting next week require having enough representatives of each of those clusters. Scheduling people in groups, and making sure that they will show up, is a time-consuming job---not to mention a tough scheduling issue.
The job has an added element of difficulty: the unbalanced distribution of participants among the different clusters. The board above shows the unbalanced distribution we face in two of the four clusters. Thus, for the success of the whole experiment, Chen-Ju Chao and Davina Lim---you can see them below---have been working hard to make sure that there are enough participants available from each cluster.
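For readers curious how questionnaire clusters like these are typically produced, here is a minimal k-means sketch. The toy 2-D "scores" and parameter choices are purely illustrative and have nothing to do with the actual DISCUS data or pipeline:

```python
def kmeans(points, k, iters=100):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    centroids = list(points[:k])  # deterministic init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:   # assignments stable: done
            break
        centroids = new
    return centroids, clusters

# Toy 2-D "questionnaire scores" with two obvious groups.
pts = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
cents, groups = kmeans(pts, k=2)
```

Real questionnaire data would of course be higher-dimensional and need standardization first, but the assign-then-update loop is the whole algorithm.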
Panda uses "genetic" heuristic engine
Panda has extended its TruPrevent(TM) Technologies and announces the availability of its most innovative heuristic technology against unknown threats: Genetic Heuristic Engine. This new technology integrates correlation of genetic digital signatures and deep code inspection in a single patent-pending algorithm, which scans the code and DNA traces typical of malware.
The company website is sufficiently obscure as to prevent this reader from knowing whether the term "genetic" is used in the sense of a genetic algorithm or in the sense of a genetic metaphor. Do any IlliGAL Blogging readers know more?
During the testing period, this technology has demonstrated remarkable effectiveness, detecting hundreds of new unknown malicious programs (including spyware) that have emerged over the last few months, without needing updates or signatures. By working alongside the rest of the TruPrevent(TM) Technologies, its detection capacity is maximized while false positives are minimized, with minimal impact on system performance.
GAs, swimming & data mining
Genetic Algorithm blog blogs GAs
The picture above is one I've seen on book jackets before (linked from www.kurzweilai.net).
Making peace with postmodern thought
But the essence of postmodernism (if such a thing can be said to exist) isn't really all that weird. It seems to me that the core of postmodernism is the acceptance of the principle that key facts of life are socially determined (little things like language, money, political and social institutions, that sort of thing) and our agreement (or disagreement) about their constitution is an integral part of their reality.
Of course, if this sort of thing is taken too far, it is a slippery and unacceptable slope toward solipsism, and from the point of view of an engineer or scientist, physics seems pretty darn real irrespective of the observer (Thomas Kuhn and his paradigm shifts notwithstanding). So what's a nice postmodern engineer to do in making peace with the postmodern world?
It seems to me that one sensible thing to do is to go read John Searle's account of all this in The Construction of Social Reality. Searle's brilliant argument preserves physics (brute facts) and delineates them from social facts in a rigorous manner. As with many philosophers, Searle's argumentation is not for the weak of heart, and it is not a light book to be read at the beach. Nonetheless, it seems like just the right antidote for those who might be tempted to take the postmodernism-as-joke thesis a bit too far.
Kevin Kelly lists advances in scientific method
PhD scholarship for Indian nationals at Politecnico di Milano
In July, the Politecnico di Milano will have two PhD studentships in Computer Engineering for Indian candidates. The scholarship includes around 850 euros and accommodation in student dorms. Those interested should contact Pier-Luca for additional information about the application procedure.
Genetic programming and population sizing
This paper derives a population sizing relationship for genetic programming (GP). Following the population-sizing derivation for genetic algorithms in Goldberg, Deb, and Clark (1992), it considers building block decision making as a key facet. The analysis yields a GP-unique relationship because it has to account for bloat and for the fact that GP solutions often use subsolutions multiple times. The population-sizing relationship depends upon tree size, solution complexity, problem difficulty, and building block expression probability.
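To give a feel for this style of model, here is a numeric sketch of GA population sizing in the gambler's-ruin tradition (roughly n = 2^(k-1) ln(1/α) σ_bb √(πm) / d, with k the building-block order, α the failure probability, σ_bb the building-block noise, m the number of building blocks, and d the fitness signal). The formula and constants below illustrate the GA lineage the paper builds on, not the new GP-specific relationship itself:

```python
import math

def population_size(k, alpha, sigma_bb, m, d):
    """Gambler's-ruin style population-sizing estimate:
    n = 2^(k-1) * ln(1/alpha) * sigma_bb * sqrt(pi * m) / d."""
    return 2 ** (k - 1) * math.log(1 / alpha) * sigma_bb * math.sqrt(math.pi * m) / d

# Illustrative numbers: order-3 building blocks, 50 of them,
# 1% failure tolerance, unit noise-to-signal ratio.
n = population_size(k=3, alpha=0.01, sigma_bb=1.0, m=50, d=1.0)
print(round(n))
```

Note how the estimate doubles with each unit increase in building-block order k, which is the exponential term the decision-making analysis contributes.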
Dad, is that a good thing?
Friday, February 25, 2005
Genetic and evolutionary computation bibliographies
The Artificial Evolution Conference has been recently added to the collection of EC bibliographies. Additionally, the Artificial Evolution Bibliography can be searched via the search interface. A searchable list of computer-science related bibliographies can be found here, and EC related bibliographies in BibTeX format can be found here. Citeseer, DBLP, and arXiv.org are other good online resources.
Heard on the GA street
Thursday, February 24, 2005
Business blogging on the rise
Rubik's cube via EC
I could probably use EC to attempt the optimization version of that problem. There are still some definitions to be made; if they can be quantified, then it could be attempted. I wonder whether EC could find the theoretical minimum of T---not only its cardinality, but the actual moves? If it could, how would you prove it, though? The problem is that there are 43 quintillion different configurations of the cube... Ouch.
I saw an EC approach to Rubik's cube back in 1997 when I visited Ingo Rechenberg's Bionik und Evolutionstechnik laboratory at the Technische Universitaet Berlin as part of my first sabbatical. The link here has an evolution strategy approach to solving the cube and 9 other ES demos dating back to PPSN 1994.
The image above is of Ingo Rechenberg linked from his web page.
Wednesday, February 23, 2005
The only channel of communication among participants will be the computer-mediated one provided by DISCUS. Participants will be sitting at the lab workstations using DISCUS---yes, I will post some pictures with a lab running at full steam. No other communication among participants will be available. This way, we will be able to analyze totally self-contained communications using KeyGraph and Influence Diffusion Models.
However, this is not a one-man job. I must thank all the DISCUS people. Without them, we could not even think of doing the experiment we will be conducting next week. They are making this experiment a reality, taking care of even the smallest logistical details.
Stewart Wilson: Mover and shaker of LCSs
The picture I wanted to post had Stewart wearing sunglasses, and they were a nice touch. But knowing Stewart, they were for UV protection, not the pursuit of cool.
Lemmings fans dig GAs
GAUL package updated
Numerous additions and improvements were made. Most notably, island-model genetic algorithms are now available as parallel versions using either MPI or pthreads. Several new demonstration programs were added to the distribution.
More information is available on the GAUL homepage here.
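For readers who have not met the island model before, the idea admits a toy sequential sketch without MPI or pthreads: several subpopulations evolve independently, with occasional migration of good individuals around a ring. The onemax fitness and every parameter below are illustrative and have nothing to do with GAUL's actual API:

```python
import random

def evolve_islands(n_islands=4, pop=20, length=30, gens=60,
                   migrate_every=10, seed=1):
    """Toy island-model GA on onemax (count the 1 bits): each island
    runs its own tournament-selection, mutation-only GA; every few
    generations the best individual migrates to the next island in a
    ring, replacing that island's worst."""
    rng = random.Random(seed)
    fit = sum  # onemax fitness
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop)]
               for _ in range(n_islands)]
    for g in range(1, gens + 1):
        for isl in islands:
            nxt = []
            for _ in range(pop):
                a, b = rng.sample(isl, 2)            # binary tournament
                parent = a if fit(a) >= fit(b) else b
                child = parent[:]
                child[rng.randrange(length)] ^= 1    # one-bit mutation
                nxt.append(child)
            isl[:] = nxt
        if g % migrate_every == 0:                   # ring migration
            bests = [max(isl, key=fit) for isl in islands]
            for i, isl in enumerate(islands):
                worst = min(range(pop), key=lambda j: fit(isl[j]))
                isl[worst] = bests[(i - 1) % n_islands][:]
    return max(fit(ind) for isl in islands for ind in isl)

best = evolve_islands()
```

In a real parallel implementation each island runs in its own process or thread and only the migrants cross the wire, which is what makes the model attractive for MPI and pthreads.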
GA used in lung cancer understanding
The crystallographic structure of the EGFR tyrosine kinase domain, solved in complex with erlotinib, was used as a model for the prediction of kinase-inhibitor binding (Protein Data Bank accession code 1M17 [PDB] ).14 The inhibitor and solvent were stripped from the model. We used the AutoDock program, version 3.0,15 to predict binding, first using a model of erlotinib, made by means of the JME molecular-editing feature of the online resource PRODRG.16 The erlotinib test yielded a model for ligand binding highly similar to that seen in the crystal structure. Using the AutoDockTools interface, we used a grid spacing of 0.375Å and 60x50x40 points centered around the catalytic cleft of the enzyme for docking and adopted the genetic algorithm with local search using default settings. Gefitinib and CL-387,785 were then docked with the use of the same protocol. To illustrate potential inhibitor clashes with the T790M mutant, we prepared figures in which threonine at position 790 (T790) is mutated to methionine. We then chose the lowest free-energy cluster that overlapped in the quinazoline moiety with the crystallographic coordinates found for erlotinib binding.
This work cites earlier work in the following reference:
Additional details are available here.
Morris GM, Goodsell DS, Halliday RS, et al. Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function. J Comput Chem 1998;19:1639-1662.
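The "genetic algorithm with local search" used by AutoDock is a Lamarckian hybrid: local refinement of an individual's conformation is written back into its genotype, not just reflected in its fitness. A minimal sketch of that idea on a toy continuous minimization follows; the sphere function stands in for a binding-energy landscape, and nothing here reflects AutoDock's actual energy model or parameters:

```python
import random

def sphere(x):
    """Toy "binding energy": minimum 0 at the origin."""
    return sum(v * v for v in x)

def local_search(x, rng, step=0.1, iters=20):
    """Greedy hill climbing; the improved point is returned and
    (Lamarckian-style) replaces the genotype in the population."""
    best = list(x)
    for _ in range(iters):
        cand = [v + rng.gauss(0, step) for v in best]
        if sphere(cand) < sphere(best):
            best = cand
    return best

def lamarckian_ga(dim=4, pop=20, gens=30, seed=1):
    rng = random.Random(seed)
    P = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    for _ in range(gens):
        P = [local_search(ind, rng) for ind in P]   # refine & write back
        P.sort(key=sphere)
        elite = P[: pop // 2]                       # truncation selection
        children = []
        for _ in range(pop - len(elite)):
            a, b = rng.sample(elite, 2)
            child = [ai if rng.random() < 0.5 else bi   # uniform crossover
                     for ai, bi in zip(a, b)]
            child[rng.randrange(dim)] += rng.gauss(0, 0.5)  # mutation
            children.append(child)
        P = elite + children
    return min(sphere(ind) for ind in P)

energy = lamarckian_ga()
```

The Lamarckian write-back is the one design choice that distinguishes this from an ordinary memetic "Baldwin" hybrid, where the refined fitness is kept but the genotype is left alone.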
New IlliGAL technical reports
Lanzi, P.-L., Loiacono, D., Wilson, S. W., Goldberg, D. E. (2005). Extending XCSF Beyond Linear Approximation. IlliGAL Report No. 2005006. (Abstract) (Full paper in PS) (Full paper in PDF)
Lanzi, P.-L., Loiacono, D., Wilson, S. W., Goldberg, D. E. (2005). XCS with Computable Prediction for the Learning of Boolean Functions. IlliGAL Report No. 2005007. (Abstract) (Full paper in PS) (Full paper in PDF)
Lanzi, P.-L., Loiacono, D., Wilson, S. W., Goldberg, D. E. (2005). XCS with Computable Prediction in Multistep Environments. IlliGAL Report No. 2005008. (Abstract) (Full paper in PS) (Full paper in PDF)
Llorà, X., Sastry, K., Goldberg, D. E., Gupta, A., Lakshmi, L. (2005). Combating User Fatigue in iGAs: Partial Ordering, Support Vector Machines, and Synthetic Fitness. IlliGAL Report No. 2005009. (Abstract) (Full paper in PS) (Full paper in PDF)
Other IlliGAL technical reports and publications are available here.
Man in the striped shirt
Red stripes are his clear favorite, although the picture (linked from www.genetic-programming.org) shows a dashing gray-blue stripe. Must have been a special engagement or holiday.
$10,000 in prizes for GEC human-competitive results
- The result was patented as an invention in the past, is an improvement over a patented invention, or would qualify today as a patentable new invention.
- The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
- The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognized panel of scientific experts.
- The result is publishable in its own right as a new scientific result---independent of the fact that the result was mechanically created.
- The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
- The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
- The result solves a problem of indisputable difficulty in its field.
- The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).
The inaugural competition in 2004 awarded $5000 in prize money among six medalists.
In the interest of full disclosure, the writer of this post was a contest judge in 2004 and will serve in that capacity again this year.
RioRoboLab demos autonomous chopper
RioRoboLab director Ram Prasad said work on the autonomous helicopter began last semester as a Capstone Design effort under sponsorship of the U.S. Army White Sands Missile Range. A Capstone Design is an academic requirement for all graduating engineering students. Prasad said although helicopters are the most maneuverable aircraft, they are the most difficult to control when trying to provide stable flight characteristics. The objective is to implement technologies that are bio-inspired and can emulate human behavior to fly the helicopter. In response, students in the RioRoboLab have worked toward developing control systems to allow a model helicopter to attain autonomous flight.
Whether or not genetic algorithms or evolutionary computation were part of the demo is unclear from the article; however, the lab is a part of a larger Rio Grande Soft Computing Institute. The mission of the institute is "to develop and facilitate the application of innovative soft computing technologies for modeling, analysis, prototyping, manufacturing, testing and evaluation of dynamic processes and systems that have use in government and in industry."
Tuesday, February 22, 2005
The evolutionary music of E. R. Miranda
Flickr pool features evolutionary art
GAs mentioned in networking book
Although Goff's argument is cast in evolutionary terms, his reasoning is more like that of Adam Smith in Wealth of Nations:
Goff: Well, it dawned on me that there were a lot of autonomous entities that were involved in the creation of software -- i.e., individuals and companies. But it also dawned on me that the good things that would probably happen with respect to software, if that fitscape model -- in other words, autonomous agents operating in their own best self interest -- if that sort of model were unleashed from a software perspective, that good things would happen. And part of what I discuss in the book are platforms that tend to give rise to that sort of organic behavior.
The only thing missing is the invisible hand. Although GAs, GP, and EC generally are useful technology, their influence as metaphor for reasoning about population-oriented systems is perhaps just as important.
Monday, February 21, 2005
Rube, Dave, what's the difference?
Sunday, February 20, 2005
Folksonomy, John Holland, and DB design
Use GAs to identify dialects?
Great workshops at GECCO-2005
The Genetic and Evolutionary Computation Conference (GECCO-2005) to be held in Washington, DC, 25-29 June 2005, (Saturday to Wednesday) has a terrific lineup of workshops:
- 4th Annual Workshop on Biological Applications of Genetic and Evolutionary Computation (BioGEC)
- Coevolution Discussion Forum
- Evolutionary Algorithms for Dynamic Optimization
- Eighth International Workshop on Learning Classifier Systems (IWLCS-2005)
- Medical Applications of Genetic and Evolutionary Computation (MedGEC)
- Second Workshop on Military and Security Applications of Evolutionary Computation
- Optimization by Building and Using Probabilistic Models (OBUPM-2005)
- Parameter setting in Genetic and Evolutionary Algorithms
- Scalable, Evolvable, Emergent Design and Developmental Systems
- Second Workshop On Self-Organization In Representations For Evolutionary Algorithms
- Theory of Representations
- Undergraduate Student Workshop
A complete listing is available here. Workshop attendance is included at no additional charge in GECCO registration fees. A number of IlliGAL Blogging bloggers are workshop organizers. Perhaps some of them will tell us how their workshop plans are coming along.
Peirce, Burks, and GAs
(I believe Burks is the gentleman seated in the picture, but I can't confirm this in any of the web sources I could find) and founded Michigan's Logic of Computers Group. But Art Burks is also a Peirce scholar and edited the final two volumes of Harvard's Collected Papers of C. S. Peirce (see here). Here is a listing of some of Art Burks's other books.
Confessions of a Teaching Company fan
If you have to learn this stuff (and you do)
I think Peirce (sometimes referred to as "the American Aristotle") is definitely worth looking into. He was saying things in the late 1800s that thinkers wouldn't revisit for 100 years. His concept of abduction as a third form of inference (in addition to the classical deduction and induction) is particularly interesting.
As Sowa describes Peircian abduction, it can consist of several iterated procedures, including:
"Reuse. Do an associative search for a predefined theory that can be reused for the current problem.
Revise. Find a theory that approximately matches the problem at hand and use belief revision techniques to tailor it for the current situation.
Combine. Search for scattered fragments of knowledge and perform repeated steps of belief revision to combine them into a complete theory."
I think this has some resonances for GAs and learning classifier systems.
Though Peirce saw abduction as a set of real inference procedures, he also noted (apropos of recent posts on random discovery):
"There is a more familiar name for it than abduction, for it is neither more nor less than guessing."
(Note that I will sometimes talk about Peirce on my blog, as will my friend Russell who writes tired fools. But, be warned, these are not strictly scientific or philosophical blogs, and some content may offend!)
Friday, February 18, 2005
Power-law networks and epistasis
Some serious EH blogging in Portuguese
Thursday, February 17, 2005
The rise of academic blogging
Since IlliGAL Blogging opened up shop on 24 January, I've been a little surprised that the young avant garde of the EC world hasn't been blogging its brains out, but maybe I'm missing something. To which, Amir at thesilog commented:
During my surfing in the blogosphere, I have found few really technical weblogs. This rush of academic weblogs (in AI-related fields) is rather new, or at least new to me! Most previous academic weblogs fall somewhere between the daily life of a graduate student or professor and their research-related news; my weblog is a sample of it.
Yes, this is what I've found, and I think it is rather surprising. Blogging has been around for a while. I would have thought that dozens of young faculty and hotshot grad students who were already blogging for personal reasons would have turned to academic blogging by now.
And more importantly, it would be nice to discuss what a weblog can do for us. Is it a place to report, a place to discuss, or a place for fast publishing? What about its formality? Can I claim that I have idea XYZ if I do not publish a paper in a conference or journal and write a post in my weblog instead?
Others need to answer these questions for themselves, but my lab website has always been a place to disseminate work, inform, and influence. Blogging for IlliGAL Blogging is a way to do those things in a manner that provides more continually updated content. Also, the comment facility allows us to have useful online dialogue in a straightforward manner. Previous efforts at interacting with readers were too static or too annoying.
Moreover, I like the informality of blogging, but I've never been one to be overly formal in academic discourse either. This work is fun. I'm not sure why it is necessary to dress up the joy of discovery in stilted language and thereby turn it into a form of drudgery.
I do share Amir's concern with posting new ideas prior to publication. I don't believe blogging takes the place of scholarly publication, and we are holding new results from this site until they are published. Having said this, it has been a long-standing policy at IlliGAL to publish papers as tech reports on the web immediately following paper completion. The public nature of that posting has made it difficult for others to claim our ideas as their own, and perhaps contemporaneous posting of new ideas to a blog could serve the same function. This might especially become the case if bloggers continue their habit of citing one another generously (as is the academic ideal).
If this were to take place, rapid exchange of new ideas in blogs could represent a new kind of open-source brainstorming, but I have difficulty believing this process will become the norm. It is hard to imagine a tenure and promotion committee ever looking favorably upon a series of blog posts as sufficiently serious. On the other hand, much of what is going on right now on the web was difficult to imagine 15 years ago. What do IlliGAL Blogging readers think?
Wednesday, February 16, 2005
Notes 2 Self notes EC Blogs
Supergenes and competence
The supergene conception can significantly (up to 3 times in the most extreme cases) increase the speed of evolution. It can also increase the accuracy of solutions for small populations. However, these effects depend on the chosen population size and maximal number of iterations. While in some cases the use of supergenes can be definitely sensible, we would recommend trying both supergene and non-supergene versions for your specific task.
Interestingly, IlliGAL work has demonstrated significant speedups (sometimes going from exponential scalability to subquadratic) on hard problems in a similar way, except IlliGAL competent GAs require no prior knowledge of the problem being solved or the variable interrelationships. This approach works because initial results are used to determine which variables are interdependent, and then "supergenes" are automatically constructed to solve the problem quickly.
Similar automatic linkage detection or supergene detection could be built into JGAP and other traditional GAs so they automatically give faster, better solutions to hard problems without as much prior problem knowledge or understanding.
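The flavor of automatic linkage detection can be shown with a second-order perturbation test: flip pairs of bits and check whether their joint fitness effect differs from the sum of their solo effects; nonadditive pairs are candidates for a shared "supergene". The paired toy fitness and thresholds below are illustrative, far simpler than the actual IlliGAL linkage-learning algorithms:

```python
def fitness(x):
    """Toy epistatic function: bits are linked in consecutive pairs;
    a pair scores 2 only when both of its bits are 1."""
    return sum(2 * (x[i] & x[i + 1]) for i in range(0, len(x), 2))

def flip(x, *idx):
    y = list(x)
    for i in idx:
        y[i] ^= 1
    return y

def linked(x, i, j):
    """Nonzero second difference means flipping i and j together has an
    effect different from the sum of their individual effects."""
    d = (fitness(flip(x, i, j)) - fitness(flip(x, i))
         - fitness(flip(x, j)) + fitness(x))
    return d != 0

x = [0] * 8
pairs = [(i, j) for i in range(8) for j in range(i + 1, 8) if linked(x, i, j)]
print(pairs)  # the detected "supergenes": [(0, 1), (2, 3), (4, 5), (6, 7)]
```

A crossover operator that keeps each detected pair together would then mix building blocks without disrupting them, which is the point of the supergene idea.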
Script and program resources listed
Wash mouth out with GA SOAP
Selfish gene in a rice plant
Holland festschrift volume published
Tuesday, February 15, 2005
PostgreSQL uses GAs
Why blog? Hugh Hewitt's answer
The book has useful chapters on the nuts and bolts of blogging, blogging as an organizational tool, and the kinds of blogs Hugh would like to see started himself. His list of do's and don'ts is helpful:
But Luther was living in a new day. Almost immediately after they were posted [Luther's 95 theses, which were originally written in Latin], someone---no one knows exactly who---got hold of a copy of Luther's theses, translated the Latin into German, and published them. Thanks to Gutenberg, Luther---and more important, his ideas---were known all over Germany within two weeks, and all over Europe in a month.
The key rules of blogging success and significance are these:
- Post often.
- Link freely.
- Be generous in praise and attribution.
- Don't be long-winded too often, if at all. Brevity is the soul of blogging when you are getting started.
- Paragraphs are your friend.
- Profanity loses audiences.
- Avoid feuds and flame wars.
- At least at the start, skip the comments sections. You end up with the problem of nuts if you are any good.
- Keep the title short and easy to remember so that it is easy to recall and type into the space at the top of the page.
In the main, this sounds like good advice to me.
TDD, GP & testing in general
Open source search engine for Java GAs
GAs as a labor of love
Monday, February 14, 2005
Fourteen days to D-day
In the first four days we have already collected 20 questionnaires. Our goal is to reach 200 questionnaires by February 23 and invite 40 participants to join the focus groups. If you are a UIUC student or affiliate and want to participate, just send a mail to email@example.com. Surveys can be filled out until February 23, and selected candidates will be asked to participate in focus group activities during the week of February 28 to March 4.
Ying-ping Chen directs NCTU Natural Computing Lab
The abstract of my PhD thesis
I am posting the abstract of my recent PhD thesis, which was greatly improved by my visit to IlliGAL last spring.
Title: Pittsburgh genetics-based machine learning in the data mining era: representations, generalization, and run-time
Pittsburgh genetics-based machine learning (DeJong, Spears, & Gordon, 1993) is, among others (Wilson, 1995; Venturini, 1993), an application of evolutionary computation techniques (Holland, 1975; Goldberg, 1989a) to machine learning tasks. The systems belonging to this approach are characterized by evolving individuals that are complete rule sets, usually of variable length. Therefore, the solution proposed by this kind of system is the best individual of the population.
When using this approach, we have to deal with several problematic issues, such as controlling the size of the individuals in the population, applying the correct degree of generalization pressure across a broad range of datasets, reducing the considerable run-time of the system, and being able to solve datasets with diverse kinds of attributes. All these issues become even more critical in modern-day data mining problems.
In this thesis we have the general objective of adapting the Pittsburgh model to handle these kinds of datasets successfully. This general objective is split into three parts: (1) improving the generalization capacity of the model, (2) reducing the run-time of the system, and (3) proposing representations for real-valued attributes. These three objectives have been achieved by a combination of four types of proposals:
- Explicit and static default rules
- Windowing techniques for generalization and run-time reduction
- Bloat control and explicit generalization pressure techniques
- The Adaptive Discretization Intervals rule representation for real-valued attributes
Some of these proposals focus on a single objective; others partially address more than one objective at the same time. All these proposals are integrated in a system called GAssist (Genetic clASSIfier sySTem).
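In Pittsburgh-style systems, each individual is a complete, ordered, variable-length rule list, and fitness mixes accuracy with a size penalty (one simple form of bloat control). A bare-bones sketch on a toy binary dataset follows; the representation and the penalty constant are illustrative, much simpler than GAssist itself:

```python
# A rule is (condition, cls): condition maps attribute index -> required value.
# An individual is an ordered rule list; unmatched examples get a default class.
def predict(rules, default, example):
    for cond, cls in rules:
        if all(example[i] == v for i, v in cond.items()):
            return cls
    return default

def rule_set_fitness(individual, default, data, size_penalty=0.01):
    """Accuracy on the dataset minus a penalty per rule (bloat control)."""
    acc = sum(predict(individual, default, x) == y for x, y in data) / len(data)
    return acc - size_penalty * len(individual)

# Toy dataset: the class is x0 AND x1; attribute x2 is irrelevant.
data = [((a, b, c), a & b) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# A compact rule list: "if x0=1 and x1=1 then class 1", default class 0.
good = [({0: 1, 1: 1}, 1)]
bloated = good + [({2: 0}, 0), ({2: 1}, 0)]  # extra rules add nothing

print(rule_set_fitness(good, 0, data), rule_set_fitness(bloated, 0, data))
```

Both rule lists classify every example correctly, but the size penalty ranks the compact one higher, which is exactly the pressure that keeps variable-length individuals from bloating.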
An experimentation process covering a wide range of data mining problems, based on many different criteria, has been performed. The experiments reported in the thesis are split into two parts. The first part studies several alternatives for each kind of proposal, integrated in the GAssist framework. The analysis of these results leads us to propose a small number of global configurations of the system, which are compared in the second part of the experimentation to a wide range of learning systems, showing that the system has competitive performance and generates very compact and interpretable solutions.
As one of the topics of my research is the use of default rules, I am very interested in the work of Rob Smith on this topic.
Separating fitness calculations from algorithms
Sunday, February 13, 2005
GAs dissed at remarkably unreactive
Ian Clarke is playing with GAs
Evolutionary art featured at The Big Picture
Rob Smith posting
Chips That Thrive on Uncertainty
Saturday, February 12, 2005
Epistasis Blog is blogging about..
Breeding Formula One cars with genetic algorithms
Formula One teams pride themselves on their mechanical tweaking skills. But the Digital Biology Interest Group at University College London discovered that they can boost performance by using computers to "breed" the cars.
The above story has also been picked up by New Scientist, Science Blog, Scenta, and Innovations Report. A technical paper related to the topic is available here.
But there was no dating, no wooing, not even a messy oil wet spot in this survival-of-the-fastest experiment. The breeding was done solely with computer-generated simulations using genetic algorithms—programs that combine Mother Nature's laws and computer science to mimic the natural process of evolution. ....
For the race-car research project, likely car designs were generated and then tested using a racing simulation designed by Electronic Arts, with virtual replicas of various Formula One racetracks. The researchers configured 68 parameters in the simulation car, which affected suspension, engine performance, tire and brake pressure, fuel consumption and steering control. ...
Bentley said some of the cars that evolved were "clearly on the limits of drivability—only the computer or Michael Schumacher could have driven a car set up in some of the solutions." According to simulations run with the best and most drivable cars, on various tracks, it is possible to shave 88/100th of a second per lap using genetic algorithms to tune the cars. In an industry where 1/100th of a second really matters, that's significant.
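At its core, the setup tuning described above is a real-parameter GA minimizing simulated lap time. A toy sketch follows; the made-up quadratic "lap time" over a handful of setup parameters stands in for the 68 real ones, and nothing here reflects the Electronic Arts simulator or the UCL code:

```python
import random

# Made-up "ideal setup"; the simulated lap time (seconds) grows
# quadratically as the parameters move away from it.
IDEAL = [0.3, -1.2, 2.0, 0.7, -0.5]

def lap_time(setup):
    return 80.0 + sum((s, t) == (s, t) and (s - t) ** 2 for s, t in zip(setup, IDEAL))

def tune(pop=30, gens=80, seed=2):
    """Real-coded GA: truncation selection, blend crossover, Gaussian
    mutation, minimizing the simulated lap time."""
    rng = random.Random(seed)
    P = [[rng.uniform(-3, 3) for _ in IDEAL] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=lap_time)
        nxt = P[: pop // 3]                      # elites survive unchanged
        while len(nxt) < pop:
            a, b = rng.sample(P[: pop // 2], 2)  # parents from the top half
            child = [(ai + bi) / 2 + rng.gauss(0, 0.1)  # blend + mutate
                     for ai, bi in zip(a, b)]
            nxt.append(child)
        P = nxt
    return min(P, key=lap_time)

best = tune()
```

With 68 real parameters and an expensive racing simulation as the fitness function, each evaluation is costly, which is why the researchers leaned on a simulator rather than real track time.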
Bird Brains Show How Trial and Error May Contribute to Learning Science Blog: "Bird Brains"
Co-evolutionary wiki up and running
At GECCO-04, an informal coevolution discussion was held where many of the terms used in coevolution were discussed. As a result of this, and after substantial further discussion via email, we have set up a Coevolution Wiki which offers descriptions of several main terms. Naturally, all of these descriptions are subject to debate, and the wiki therefore features a dedicated discussion page for each term. The Coevolution Wiki can be visited here:
Contributions to this online discussion, or any other comments, are very welcome!
Signed, Anthony Bucci, Edwin de Jong, Anthony Liekens, Paul Wiegand
I took a look, and the coec-wiki has made a nice start. Go peek yourself.
Blogging and wikis are important content management tools that should receive greater use in the community for information dissemination and exchange. I applaud this and other efforts in the same vein. We may be a little behind the wiki curve in my lab, but we are now using one for internal lab information exchange online. Some of the free tools are quite sophisticated and good.
GA efficiency in 4-part harmony
Polynomial time is usually good news and cause for celebration, but if you are solving big ole problems, the square of a large number is darn huge (yes, Virginia, these are technical terms I'm using here). The square of a thousand is a million, which means that you might have to go off and run millions of function evaluations in large problems, which may be impractical if your eval is an expensive simulation, computation, or data-crunching task.
Thus, we see a problem with stopping at seeking GAs that are merely competent. Yes, competence takes us from solutions that are worst-case exponential to polynomial (from intractable to tractable), but to go from tractable to practical requires that we pay attention to enhancing the efficiency of effective (competent) GAs.
At IlliGAL we do this with a four-part decomposition of the efficiency problem:
- Parallelization
- Time continuation
- Evaluation relaxation
- Hybridization
Erick Cantu-Paz's work was our first principled foray into the issue of parallelization. My paper on time continuation set the stage for Ravi Srivastava's thesis and Kumara Sastry's work with me on BB-mutation. Kumara's MS thesis and Laura Albert's thesis set the stage for our current thinking in evaluation relaxation. Work with Siegfried Voessner in 1999 laid down a barebones framework for thinking about global-local hybrids and was followed up by Abhishek Sinha's MS thesis. Many of these items are cited in the references of The Design of Innovation (Google searchable version here).
Genetic and evolutionary art
Friday, February 11, 2005
Announcing my Ph.D. thesis
I am posting an abstract of my dissertation. Even though the thesis is not available online, some related papers can be obtained:
1. A Genetic Algorithm for Shortest Path Routing Problem and the Sizing of Populations
2. Elitism-based Compact Genetic Algorithms
3. Real-coded Bayesian Optimization Algorithm: Bringing the Strength of BOA into the Continuous World
Title: Theory, Design, and Application of Efficient Genetic and Evolutionary Algorithms
The goal of this dissertation is to develop efficient optimization algorithms to solve diverse real-world problems of graded difficulty. Genetic and evolutionary mechanisms have been deployed for reaching the goal.
This thesis has made five significant contributions.
Practical guidelines for developing genetic algorithms (GAs) to solve real-world problems have been proposed. This fills a long-standing gap between the theory and practice of GAs. A practical population-sizing model for computing solutions with the desired quality has also been developed. The model needs no statistical information about the problem, and it has been validated by computer simulation experiments.
The suggested design-guidelines have been followed in developing a GA for solving the shortest path (SP) routing problem. Experimental studies validate the effectiveness of the guidelines. Further, the population-sizing model passes the feasibility test for this application. It appears to be applicable to a wide class of problems.
Elitist compact genetic algorithms (cGAs) have been developed under the framework of simple estimation of distribution algorithms (EDAs). They can deal with memory- and time-constrained problems, and they do not require any prior knowledge about the problem. The design approach enables a typical cGA to overcome selection noise by persisting with the current best solution until, hopefully, a better one is found. A higher quality of solutions and a higher rate of convergence are attained in this way for most of the test problems. The hidden connection between EDAs and evolution strategies (ESs) has been made explicit; an analytical justification of this relationship is followed by its empirical verification. Further, a speedup model that quantifies the convergence improvement has also been developed, and experimental evidence has been supplied to support the claims.
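The persistent-elitism idea described above can be sketched in a few lines. This is a minimal illustrative version, not code from the thesis: a probability vector stands in for an explicit population, the current best individual is retained until a strictly better candidate appears (damping selection noise), and the vector is nudged toward the winner of each comparison. All names and parameter defaults here are my own assumptions, shown on the toy OneMax problem.

```python
import random

def onemax(x):
    """Toy fitness: number of ones in the bit string."""
    return sum(x)

def pe_cga(length=20, virtual_pop=50, max_steps=5000, seed=1):
    """Minimal sketch of a persistent-elitist compact GA (pe-cGA)."""
    rng = random.Random(seed)
    p = [0.5] * length  # probability vector replaces the population
    sample = lambda: [1 if rng.random() < pi else 0 for pi in p]
    elite = sample()
    for _ in range(max_steps):
        cand = sample()
        better = onemax(cand) > onemax(elite)
        winner, loser = (cand, elite) if better else (elite, cand)
        # Shift each probability toward the winner where the two disagree.
        step = 1.0 / virtual_pop
        for i in range(length):
            if winner[i] != loser[i]:
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))
        if better:
            elite = cand  # persistent elitism: keep the best seen so far
        if onemax(elite) == length:
            break  # optimum found
    return elite
```

Because the elite is only ever replaced by something strictly better, solution quality is monotone in the number of steps, which is the noise-damping effect the abstract alludes to.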
The real-coded Bayesian optimization algorithm (rBOA) has been proposed under the general framework of advanced EDAs. Many difficult problems -- especially those that can be decomposed into subproblems of bounded difficulty -- can be solved quickly, accurately, and reliably with rBOA. It can automatically discover unsuspected problem regularities and effectively exploit this knowledge to perform robust and scalable search. This is achieved by constructing the Bayesian factorization graph using finite mixture models. All the relevant substructures are extracted from the graph. Independent fitting of each substructure by mixture distributions is then followed by drawing new solutions by independent subproblem-wise sampling. An analytical model of rBOA scalability in the context of problems of bounded difficulty has also been investigated. The criterion adopted for this purpose is the number of fitness function evaluations until convergence to the optimum. It has been shown that the rBOA finds the optimal solution with sub-quadratic scale-up behavior with regard to the size of the problem, and empirical support for this conclusion has been provided. Further, the rBOA is found to be comparable, or even superior, to other advanced EDAs when faced with nondecomposable problems.
Finally, a competent multiobjective EDA (MEDA) has also been developed by extending the (single-objective) rBOA. The multiobjective rBOA (MrBOA) is able to automatically discover and effectively exploit implicit regularities in multiobjective optimization problems (MOPs). A selection method has been proposed for preserving diversity. This is done by assigning fitness to individuals by domination rank with some penalty imposed on sharing and crowding of individuals. It must be noted that the solution quality is not compromised in the process. It is experimentally demonstrated that MrBOA outperforms other state-of-the-art multiobjective GEAs (MGEAs) for decomposable as well as nondecomposable MOPs.
It is felt that this work will have a major impact on future genetic and evolutionary computation (GEC) research. Our ardent hope is that it will play a decisive role in bringing about a paradigm shift in computational optimization research.
Biota.org, darwin@home & the Alive Prize
It is our hypothesis that compute space is now or soon will be sufficiently rich and complex to support a reasonable "lifelike" simulation of the processes and products of evolution.
The Darwin@Home project is a challenge to multiple, independent teams to construct platforms in software, hardware or a combination, to test this hypothesis. In recent years, several platforms have been built that suggest that this goal is attainable. We believe that by pooling efforts and creating a shared community of interest, we will quicken the journey along the path of innovation.
The project has been picked up by NewScientist.com, and eventually the project would like to offer a prize similar to the X Prize:
A long term goal of Biota.org has been to create an international prize competition called the AlivePrize. Darwin@Home is a first step along that road by encouraging the community of people developing platforms and providing them resources and intellectual contributions. In a couple of years after the Darwin@Home efforts have matured, we will pursue the goal of financing and managing a competitive prize modeled after the Ansari X-Prize and the DARPA Grand Challenge.
I wonder whether this sort of thing is helpful to the field or merely shameless self-promotion and grandstanding. But perhaps some of you are wondering the same thing about this blog.
Is your GA competent?
IlliGAL Blogging rolls expand
Don't code, evolve
The basic idea in GP is that, instead of evolving solutions to optimization problems, the objects being evolved are programs (mainly encoded in tree structures, as in LISP). In our lab, Kumara Sastry applies GP to optimization problems in materials science. Along the way, he also contributes to GP theory. One of his goals is to develop a competent GP that respects the linkage in programs. As you can imagine, one of the main challenges here is to properly define linkage and building blocks in GP.
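To give a flavor of the tree encoding mentioned above, here is a minimal sketch of GP-style program representation: internal nodes are functions, leaves are terminals (a variable or constants), mirroring LISP s-expressions. The function/terminal sets and names are illustrative choices of mine, not from any particular GP system.

```python
import operator
import random

# Function set (internal nodes) and terminal set (leaves).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1, 2]

def random_tree(rng, depth=3):
    """Grow a random expression tree as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a program tree at input x."""
    if tree == "x":
        return x
    if not isinstance(tree, tuple):
        return tree  # a constant leaf
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))

# ("+", ("*", "x", "x"), 1) encodes the program x*x + 1:
prog = ("+", ("*", "x", "x"), 1)
print(evaluate(prog, 3))  # prints 10
```

A GP run would evolve such trees with subtree crossover and mutation; the linkage question in the post is essentially which subtrees form building blocks that crossover should preserve.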
Can you imagine that someday we no longer need to code, and computers will do that for us? Yes, there is still a long way to go, but I believe we are not too far from it.
Thursday, February 10, 2005
Georges Harik, IlliGAL, and Google
Genetic algorithms for automatic map labeling
Python source code for positioning non-overlapping labels using genetic algorithms is available here.
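For readers curious what such a GA actually optimizes, here is a hypothetical sketch (not the linked source) of a typical label-placement fitness: each gene picks one of several candidate positions for its label, and fitness rewards pairs of label boxes that do not overlap.

```python
def overlaps(a, b):
    """Axis-aligned rectangle overlap test; rect = (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def fitness(chromosome, candidates):
    """Count non-overlapping label pairs: higher is better.

    chromosome[i] indexes into candidates[i], the list of candidate
    rectangles for label i (e.g. above, below, left, right of its point).
    """
    rects = [candidates[i][g] for i, g in enumerate(chromosome)]
    score = 0
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if not overlaps(rects[i], rects[j]):
                score += 1
    return score
```

A standard GA over integer chromosomes can then search the combinatorial space of position assignments, which grows exponentially with the number of labels.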
Chance discovery, marketing, and focus groups: DISCUS is coming
We have been designing a big experiment to validate some of our theoretical assumptions and results on innovation and creativity support, and on influence diffusion in web-centric social networks. From February 28 to March 4, several focus groups will be using the state-of-the-art DISCUS platform. The goal: to discuss media environments under the guidance of our Hakuhodo colleagues. Around forty potential customers, none affiliated with DISCUS, will form several focus groups to discuss different media environments and how they use them. Besides the intrinsic marketing interest of such an experiment, this first big stress test will help us see, in real time and close to reality, the performance and feedback of the innovation and creativity support tools of DISCUS.
New IlliGAL technical reports
Matsumura, N., Goldberg, D.E., Llorà, X. (2005). Mining Social Networks in Message Boards. IlliGAL Report No. 2005001. (Abstract) (Full paper in PS) (Full paper in PDF).
Lima, C. F., Sastry, K., Goldberg, D. E., Lobo, F. G. (2005). Combining Competent Crossover and Mutation Operators: A Probabilistic Model Building Approach. IlliGAL Report No. 2005002. (Abstract) (Full paper in PS) (Full paper in PDF).
Sastry, K., Abbass, H. A., Goldberg, D. E., Johnson, D. D. (2005). Sub-structural Niching in Estimation of Distribution Algorithms. IlliGAL Report No. 2005003. (Abstract) (Full paper in PS) (Full paper in PDF).
Sastry, K., Pelikan, M., Goldberg, D. E. (2005). Decomposable Problems, Niching, and Scalability of Multiobjective Estimation of Distribution Algorithms. IlliGAL Report No. 2005004. (Abstract) (Full paper in PS) (Full paper in PDF).
Pelikan, M., Sastry, K., Goldberg, D. E. (2005). Multiobjective hBOA, Clustering, and Scalability. IlliGAL Report No. 2005005. (Abstract) (Full paper in PS) (Full paper in PDF)
Other IlliGAL technical reports and publications are available here.
Wednesday, February 09, 2005
Genetic algorithms walking back to their source
GAs picked up in mainstream blogs
Creative evolutionary design tools
Tuesday, February 08, 2005
Take a chance on chance discovery
I first became familiar with the term "chance discovery" on a trip to Japan in December 2001. I was invited to give a series of lectures on genetic algorithms and engineering leadership at the Graduate School of Systems Management of Tsukuba University by my good and longtime colleague Takao Terano. One of the hosts for the visit was a relatively new faculty member named Yukio Osawa, and when I arrived, he filled my ears with the glory of chance discovery. Unfortunately, during that first meeting, Dr. Osawa's zeal for chance discovery went in one of my ears and out the other, and if the situation had not changed, this story would have had an uninteresting ending. But Dr. Osawa is nothing if not persistent, and he continued to regale me with tales of chance discovery accomplishment, and somehow he got me to come back to Japan and give a tutorial with him merging GAs and CD topics into one program.
At that second meeting, my ears and my mind opened up, and I came to realize the importance of CD as a subject. Simply put, where much work in data mining makes hay from high-probability co-occurrences, CD uses a variety of techniques to elevate and study the unlikely. In so doing, chance discovery focuses on phenomena that may be important in the future, on phenomena that may be an underlying and unrecognized cause, or on important background phenomena that just plain deserve further exploration and explanation. As a result, chance discovery appears central to a better, more mechanistic understanding of creativity, smart mobs (to use Rheingold's term), and, more generally, the unexplained.
Since that second meeting, I have become a fan and sometime practitioner of CD and its extensions. My own work on collaborative systems couples genetic algorithms (regular, interactive, and human-based) with Keygraph chance discovery to help support organizational innovation (see http://www-discus.ge.uiuc.edu/). Chance discovery continues to chug along as a field in Japan, and the current volume's geographical representation shows that CD is becoming (has become?) a scientific topic without borders.
In short, the current volume advances the state of the chance discovery art, in philosophy, in theory, and in practice. For those who are familiar with chance discovery and its uses, it is an indispensable guide to where CD research is and where it is going. To those who are unfamiliar with the topic, I recommend it as an entry point to an important area. Either way, I urge you to pick up this volume, read it, use it, and don't be like me and take another eight months to pay attention and get involved.
I mean it. Go read something about chance discovery.