I want to statistically analyse output files from a benchmark that runs on 600 nodes. In particular, I need the max, upper quartile, median, lower quartile, min and mean values. My output files are testrun16-[1-600], which I process with this code:
ListofFiles = system('dir testrun16-*')

set print 'MaxValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_max
}
set print 'upquValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_up_quartile
}
set print 'MedianValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_median
}
set print 'loquValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_lo_quartile
}
set print 'MinValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_min
}
set print 'MeanValues.dat'
do for [file in ListofFiles]{
    stats file using 1 nooutput
    print STATS_mean
}
unset print

set term x11
set title 'CLAIX2016 distribution of OSnoise using FWQ'
set xlabel "Number of Nodes"
set ylabel "Runtime [ns]"
plot 'MaxValues.dat' using 1 title 'maximum value', \
     'upquValues.dat' title 'upper quartile', \
     'MedianValues.dat' using 1 title 'median value', \
     'loquValues.dat' title 'lower quartile', \
     'MinValues.dat' title 'minimum value', \
     'MeanValues.dat' using 1 title 'mean value';
set term png
set output 'noises.png'
replot
I get these values and can plot them. However, the tuples from each run get mixed up. The mean of testrun16-17.dat is plotted at x=317, and its min ends up at yet another position.
How can I save the output but keep the tuples together and plot each node at its actual position?
Answer
Windows (and Linux?) may sort (or not sort) a directory listing in its own way. To eliminate this uncertainty, you can loop over your files by number. However, this assumes that all numbers from 1 to the maximum (= FilesCount, in your case 600) actually exist.
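Looping by number could look roughly like the following minimal sketch. It assumes the files are named testrun16-1.dat to testrun16-600.dat (the .dat extension is taken from the file name mentioned in the question) and that none of them is missing. The output file name is only an example.

```gnuplot
# Sketch: iterate over node numbers instead of a directory listing,
# so every row is tied to its node number regardless of listing order.
# Assumption: testrun16-1.dat ... testrun16-600.dat all exist.
FileRootName = 'testrun16'
FilesCount   = 600

set print FileRootName.'_Statistics.dat'
do for [i=1:FilesCount] {
    FILE = sprintf('%s-%d.dat', FileRootName, i)
    stats FILE using 1 nooutput
    # one row per node: number, max, upper quartile, median, lower quartile, min, mean
    print sprintf("%d %g %g %g %g %g %g", i, STATS_max, STATS_up_quartile, \
                  STATS_median, STATS_lo_quartile, STATS_min, STATS_mean)
}
set print
```

If a number in the range is missing, stats will most likely stop with an error, so this variant only fits a complete, gap-free set of files.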
You tagged Linux, sorry, but I only know Windows, where the command to get a list of only the filenames is 'dir /B testrun16-*'.
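On Linux, a comparable listing can probably be obtained with ls. The line below is only a sketch and assumes GNU coreutils, whose ls has a -v option that sorts numbers inside names naturally (testrun16-2 before testrun16-10). Other ls implementations may not support -v.

```gnuplot
# Assumption: GNU ls is available; -v requests natural (version) sorting.
FileList   = system('ls -v testrun16-*')
FilesCount = words(FileList)
print "Files found: ", FilesCount
```

Even with a sorted listing, writing the node number into the data file remains the more robust choice, because the plot then no longer depends on the order of the list.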
Is there a special reason why you write the statistics into six different files? Why not into one file?
Something like this: (modified after OP comment)
### batch statistics
reset session

FileRootName = 'testrun16'
FileList = system('dir /B '.FileRootName.'-*')
FilesCount = words(FileList)
print "Files found: ", FilesCount

# function for extracting the number from the filename
GetFileNumber(s) = int(s[strstrt(s,"-")+1:strstrt(s,".dat")-1])

set print FileRootName.'_Statistics.dat'
print "File Max UpQ Med LoQ Min Mean"
do for [FILE in FileList] {
    stats FILE u 1 nooutput
    print sprintf("%d %g %g %g %g %g %g", \
        GetFileNumber(FILE), STATS_max, STATS_up_quartile, STATS_median, \
        STATS_lo_quartile, STATS_min, STATS_mean)
}
set print

plot FileRootName.'_Statistics.dat' u 1:2 title 'maximum value', \
     '' u 1:3 title 'upper quartile', \
     '' u 1:4 title 'median value', \
     '' u 1:5 title 'lower quartile', \
     '' u 1:6 title 'minimum value', \
     '' u 1:7 title 'mean value'
### end of code