Thursday, January 27, 2005

 

Data mining tools

Usually, when I work on data mining problems using genetics-based machine learning, I compare the results with the ones obtained using non-evolutionary methods. I know Martin and Jaume have also been using some of these tools in their data mining papers.

The first one I started using was WEKA. It has a nice collection of classification, regression, and clustering algorithms. Written in Java, it is easy to use and provides a flexible environment for rapid preliminary filtering and analysis of raw data. Recently, I have noticed that more than 20 different projects use this framework.

Lately, I have moved from WEKA to D2K, a data mining framework developed by the Automated Learning Group (ALG) at the National Center for Supercomputing Applications. Again, it is pure Java. The thing I like most about D2K, and one of the reasons for switching, is the data-flow-oriented paradigm it uses. Using an intuitive graphical editor, complicated analysis and visualization tasks are rapidly deployed by simple drag & drop. I have been using D2K heavily in the DISCUS project, and I have no regrets about leaving WEKA behind. I have only good words for D2K's quality and for the effort the ALG people have put into making a great package that is easy to extend and customize.

Another tool I want to mention is a pretty specialized library. LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). Its authors provide sources in C++, Java, and C# .NET, as well as interfaces to Python, R, Matlab, Perl, and Ruby. I have been using it in some of my recent research, and if you are interested in these areas, I definitely recommend taking a look at it.

And this leads me to one of my favorite tools, R. R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. Contributed packages include all sorts of tools; I just want to point out the project on graphical models. I would recommend this tool to anyone who wants to speed up the analysis of the results that their GAs generate.
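To give a rough idea of the kind of post-run analysis meant here, the following is a minimal sketch of summarizing the best fitness per generation across several GA runs. It is written in Python rather than R purely for illustration; the function name `summarize_runs` and the numbers are made up, and it assumes each run is logged as a list of best-fitness values, one per generation.

```python
import statistics


def summarize_runs(runs):
    """Return a list of (mean, stdev) of best fitness, one pair per generation.

    `runs` is assumed to be a list of runs, each run being a list of
    best-fitness values indexed by generation (a hypothetical log format).
    """
    # Only compare generations that every run actually reached.
    generations = min(len(run) for run in runs)
    summary = []
    for g in range(generations):
        values = [run[g] for run in runs]
        mean = statistics.mean(values)
        # The sample standard deviation needs at least two runs.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary.append((mean, spread))
    return summary


# Made-up data: three independent runs, three generations each.
runs = [
    [0.50, 0.70, 0.90],
    [0.60, 0.80, 0.90],
    [0.40, 0.60, 0.90],
]

for generation, (mean, spread) in enumerate(summarize_runs(runs)):
    print(f"gen {generation}: mean={mean:.3f} stdev={spread:.3f}")
```

The same summary is a few lines in R with `apply`, `mean`, and `sd`; the point is only that averaging over independent runs, rather than eyeballing single runs, is the first step of most GA result analysis.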

Comments:
Man, there is a lot of comment spam, I have noticed. Is there any way to remove it from the blogs?
 