Monday, January 31, 2005

 

Some helpful scripts

In a recent post, Xavier talked about R, the excellent statistical package. I have used it extensively to compute statistical tests, and I wrote a script to automate most of the testing process.

The script performs pairwise t-tests to compare multiple methods on several datasets. It uses the Bonferroni correction to adjust the significance level of the tests for the multiple comparisons. It assumes that the results are paired, as are, for instance, the results obtained from a stratified ten-fold cross-validation test, and that the results of each method are listed in the same order. The script reads a text file with the following format:

<dataset1> <method1> <numerical result 1>
<dataset1> <method1> <numerical result 2>
.
.
.
<dataset1> <method2> <numerical result 1>
.
.
.
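
To try the script out, a file in this format can be generated from R itself. The sketch below is only an illustration: the dataset names, the method names and the random values are all made up, and it writes to a temporary file rather than to results_file.dat.

```r
# Synthetic example: 2 datasets x 2 methods x 10 folds.
# The values are random numbers, not real experimental results.
set.seed(1)
sample_results <- expand.grid(fold = 1:10,
                              method = c("methodA", "methodB"),
                              dataset = c("dataset1", "dataset2"))
sample_results$value <- round(runif(nrow(sample_results), 0.7, 0.9), 3)

# Write the three columns <dataset> <method> <result>, without header or quotes
out_file <- tempfile(fileext = ".dat")
write.table(sample_results[, c("dataset", "method", "value")], out_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read it back the same way the script does
check <- read.table(out_file)
str(check)
```

Because expand.grid varies its first argument fastest, the folds of each method come out in the same order, which is what a paired test needs.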


--------------------------------------------------------------
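# Read the results: V1 = dataset, V2 = method, V3 = numerical result.
# stringsAsFactors=TRUE is needed for levels() in R 4.0 and later.
data<-read.table("results_file.dat",stringsAsFactors=TRUE)
for(dataset in levels(data$V1)){
  print(sprintf("%s",dataset));
  # Keep only the rows of the current dataset
  data2<-data[data$V1==dataset,];
  # Paired t-tests between every pair of methods, Bonferroni-adjusted
  res<-pairwise.t.test(data2$V3,data2$V2,p.adjust.method="bonferroni",
                       pool.sd=FALSE,paired=TRUE)$p.value;
  rows<-dimnames(res)[[1]];
  cols<-dimnames(res)[[2]];
  for(m1 in rows){
    for(m2 in cols){
      val<-res[m1,m2];
      # NA marks the unused half of the p-value matrix
      if(!is.na(val) && val<0.05){
        ave1<-mean(data2[data2$V2==m1,]$V3);
        ave2<-mean(data2[data2$V2==m2,]$V3);
        # Print the method with the higher mean first
        if(ave1>ave2){
          print(sprintf("Sig %s %s -> %s",dataset,m1,m2))
        }else{
          print(sprintf("Sig %s %s -> %s",dataset,m2,m1))
        }
      }
    }
  }
}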
--------------------------------------------------------------

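The script prints a message for each significant difference observed in the results, using a significance level of 0.05 (95% confidence) on the Bonferroni-adjusted p-values. In each message, the method with the higher mean result is listed first.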
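
To make the Bonferroni step concrete: with k methods there are k(k-1)/2 pairwise comparisons, and each raw p-value is multiplied by that number (capped at 1) before being compared with 0.05. A minimal sketch using R's p.adjust, with made-up p-values for three methods:

```r
# Hypothetical raw p-values for the pairwise comparisons among 3 methods
n_comparisons <- choose(3, 2)        # 3 methods give 3 pairs
raw_p <- c(0.01, 0.04, 0.20)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)

# Only comparisons whose adjusted p-value stays below 0.05 are reported
significant <- adj_p < 0.05
print(significant)
```

In this made-up case only the first comparison would still be reported, even though the second one had a raw p-value below 0.05.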

Another script I use very often is a Perl wrapper for gnuplot, the plotting program from the GNU project. The script generates draft plots just to check how the results look, without the overhead of launching other programs, like Matlab, that can generate fancier plots.

--------------------------------------------------------------

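#!/usr/bin/perl -w

# The leading "|" opens a pipe, so the commands we print go to gnuplot
open(FH,"| gnuplot -persist") or die "Can't exec gnuplot";

print FH "set encoding iso_8859_1\n";
# Current spelling of the older "set data style lines"
print FH "set style data lines\n";

my $end=0;
my $index=0;
my $line="plot ";

# Append one plot clause per data file given on the command line
while(not $end) {
    if(defined $ARGV[$index]) {
        # yerrorlines connects the points and draws the error bars
        $line.="\"$ARGV[$index]\" with yerrorlines,";
        $index++;
    } else {
        chop $line;    # drop the trailing comma
        $line.="\n";
        $end=1;
    }
}
print FH $line;
close(FH);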
------------------------------------------------------------

This script receives as parameters any number of data files to overlay in the same plot. Each file should contain lines with the following format:
<X value> <Y value> <Y max> <Y min>

The program plots the points specified by the data files, connected by lines and with error bars (specified by <Y max> and <Y min>).
The plot is shown in the current display.
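
Such a data file can also be produced from R. The following is a rough sketch, assuming that each X value has several repeated measurements and that the mean, maximum and minimum are what should be plotted; the numbers and the file name curve1.dat are invented for the example.

```r
# Hypothetical measurements: 5 repeated runs for each of 3 X values
runs <- data.frame(x = rep(c(10, 20, 30), each = 5),
                   y = c(1.0, 1.2, 0.9, 1.1, 1.3,
                         2.0, 2.1, 1.8, 2.2, 1.9,
                         3.1, 2.9, 3.0, 3.3, 2.7))

# One row per X value: <X value> <Y value> <Y max> <Y min>
plot_data <- data.frame(x    = sort(unique(runs$x)),
                        y    = as.numeric(tapply(runs$y, runs$x, mean)),
                        ymax = as.numeric(tapply(runs$y, runs$x, max)),
                        ymin = as.numeric(tapply(runs$y, runs$x, min)))

# Write plain whitespace-separated columns, as gnuplot expects
plot_file <- file.path(tempdir(), "curve1.dat")
write.table(plot_data, plot_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)
print(plot_data)
```

The resulting file can then be passed to the wrapper as one of its arguments.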

I hope this helps
