Sunday 17 January 2010

Further new features in gnuplot 4.4

Today I would like to discuss another new feature in gnuplot 4.4. This will be the notion of "inline" data manipulation. I don't really know what the proper name would be for this feature, so I will just show what it is.

Normally, when one plots a function or file, the command has the following structure
plot 'foo' using 1:2 with line, f(x) with line

That was the old syntax. In the new version of gnuplot, we can insert arithmetic expressions in the plot command as follows

f(x) = a*sin(x)
plot a = 1.0, f(x), a = 2.0, f(x)

Now, this has some implications. First, one has to be a bit careful, because the arithmetic expressions are separated from the actual function by a comma. However, the 'for' loop that we discussed a week ago, reads statements up to the comma, and then returns to the beginning of the statement. In other words,
plot for [i=1:10] a = i, f(x)
will evaluate the expression a = i ten times, and then plots f(x). At that point, the value of 'a' will be 10, therefore, we have only one plot, and that will be 10*sin(x).

The second implication is that the notion of a function has completely changed. What we do in a plot command now is no longer a mapping of the form
x -> f(x)
but rather, the evaluation of a set of instructions, one of which is the above-mentioned mapping. But the crucial point here is that the mapping is not the only allowed statement. The upshot is that "functions" have become a set of operations, and the following statement is completely legal
f(x) = (a = a+1.0, a*sin(x))
a = 0.0
plot for [i=1:10] f(x)

(It is a complete different question, whether this plot makes any sense...)

What we should notice is the fact that now a function can have the form of
f(x) = (statement1, statement2, statement3, return value)
and when the function is called, statement1, statement2, statement3 are evaluated, and 'return value' is returned. We should not underestimate the significance of this! Many things can be done with this. I will show a few of them below.

The first thing I would like to dive into is calculating some statistics of a file. Let us see how this works!

reset
set table 'inline.dat'
plot sin(x)
unset table
num = 0
sum = 0.0
sumsq = 0.0

f(x) = (num = num+1, sum = sum+x, sumsq = sumsq+x*x, x)
plot 'inline.dat' using 1:(f($2))

print num, sum, sumsq
which will print out
100 -1.77635683940025e-15 0.295958848441
We expected this, for the number of samples is 100 by default, and the sum should be 0 in this case.

So, what about finding the minimum and its position in a data file? This is quite easy. All we have to do is to modify our function definition, and insert a statement that determines whether a value is minimal or not.

reset
set table 'inline.dat'
plot sin(x)
unset table

num = 0
min = 1000.0
min_pos = 0
min_pos_x = 0

f(x,y) = ((min > y ? (min = y, min_pos_x = x, min_pos = num) : 1), num = num+1,  y)
plot 'inline.dat' using 1:(f($1, $2))

print num, min, min_pos_x, min_pos
which prints

100 -0.999385 4.74747 73
i.e., the minimum is at the 73rd record (we count from 0), at x = 4.74747, and its value is -0.999385. Note that instead of an 'if' statement, we use the ternary operator to decide whether min, min_pos_x, and min_pos should be updated.

The implementation of calculating the standard deviation, e.g., should be trivial:
sum = 0.0
sumsq = 0.0
f(x) = (num = num+1, sum = sum + x, sumsq = sumsq + x*x, x)
plot 'inline.dat' using 1:(f($1))

print num, sqrt(sumsq/num - (sum/num)*(sum/num))
We have thus seen how the "inline" arithmetic can be used for calculating quantities, e.g., various moments, minima/maxima and their respective positions. These involve the sequential summing or inspection of the data set. But this trick with the function definition can be used for back-referencing, too. This is what we will discuss next.

The trick is to use a construct similar to this

backshift(x) = (prev = pres, pres = x, prev)

which will store the last but one value in the variable 'prev', and return it. That is, the following code shift the whole curve to the right by one
reset
set table 'inline.dat'
plot sin(x)
unset table

pres = 0.0

backshift(x) = (prev = pres, pres = x, ($0 > 0 ? prev : pres))
plot 'inline.dat' using 1:(backshift($2)) with line, '' u 1:2 with line
(In cases like this, we always have to decide what to do with the first/last data record. In this particular case, I opted for duplicating the first record, - this is what happens in the ternary operator - but this is not the only possibility.) If, for some reason, you have to shift the curve by more, you do the same thing, but multiple times. E.g., the following code shifts by 3 places.

backshift(x) = (prev1 = prev2, prev2 = prev3, prev3 = x, prev1)

Once we have this option of back-referencing, we should ask the question what it can be used for. I show two examples for this.
The first example is drawing arrows along a line given by the data set. Drawing arrows one by one is done by using
set arrow from x1,y1 to x2,y2
but we have to use a different method, if we want to plot the arrows from a file. Incidentally, there is a plotting style, 'with vectors', that works as
plot 'foo' using 1:2:3:4 with vectors
where the first two columns specify the coordinates of the beginning, and the second two columns specify the relative coordinates of the vectors. So, it works on four columns. What should we do, if we want to plot vectors from the points in a file. Well, we use the back shift that we defined above. Our script is as follows:
reset
unset key
set sample 30
set table 'arrowplot.dat'
plot [0:3] sin(x)+0.2*rand(0)
unset table

px = NaN
py = NaN
dx(x) = (xd = x-px, px = ($0 > 0 ? x : 1/0), xd)
dy(y) = (yd = y-py, py = ($0 > 0 ? y : 1/0), yd)

plot 'arrowplot.dat' using (px):(py):(dx($1)):(dy($2)) with vector
which results in the following figure:

Note that we used the ternary operator to get rid of the very first data point. This is needed, because the arrows connect two points, that is, there will be one less arrow, than data points.

In the second example, we will turn this around. In my post in last August, plotting the recession, I showed how the background of a plot can be changed, based on whether the the curve is dropping, or increasing. Let us take the following script
reset
set sample 20
set table 'inline.dat'
plot [0:10] exp(-x)+1.0+rand(0)
unset table

unset key

px = 0
py = 1000
dx(x) = (xd = x-px, px = x, xd)
dyn(y) = (yd = y-py, py = y, (yd < 0 ? yd : 1/0))
dyp(y) = (yd = y-py, py = y, (yd >= 0 ? yd : 1/0))

plot 'inline.dat' using (px):(py):(dx($1)):(dyp($2)) with vector nohead lt 1 lw 3, \
px = 0, py = 0, '' using (px):(py):(dx($1)):(dyn($2)) with vector nohead lt 3 lw 3
which creates the following graph
First we produce some data; old trick. Then we take our difference functions, in this case, three of them. The first one is identical to that in the previous script. The second and the third are identical, except that the second returns a sensible value, if and only, if the slope is negative, while the third one returns 1/0, if the slope is negative. Then we just plot our data, making sure that we re-initialise px, and py before the second plot. Simple.

Another utilisation of the back reference can be found on gnuplot's web site, under running averages.

Next time I will try to go a bit further, and demonstrate some other uses of the inline data processing.
Cheers,
Gnuplotter

Sunday 10 January 2010

Plot iterations and pseudo-files

As I promised some time ago, I will discuss some of the new features in gnuplot 4.4. The first one that I would like to show is the concept of iteration in the plot command, and the concept of certain pseudo-files. If you have ever had to script the creation of your plots, you will appreciate these features.

First, let us see what the for loop looks like in the plot command. There are forms of it: once one can loop through integers, while in the other case, one can step through a string of words. So, the iteration either looks like

plot for [var = start : end {:increment}]

or

plot for [var in "some string of words"]

After this introduction, let us see how these can be used in real life. The first example that I will show is that of waterfall plots, i.e., when a couple of curves are plotted on the same graph, and they are shifted vertically. This is common practice, when one has several spectra, and wants to show the effect of some parameter on the spectra.

reset
f(x,a) = exp(-(x-a)*(x-a)/(1+a*0.5))+0.05*rand(0)
title(n) = sprintf("column %d", n)
set table 'iter.dat'
plot [0:20] '+' using (f($1,1)):(f($1,2)):(f($1,3)):(f($1,4)):(f($1,5)):(f($1,6)) w xyerror
unset table

set yrange [0:15]
plot for [i=1:6] 'iter.dat' u 0:(column(i)+2*i) w l lw 1.5 t title(i)

I would like to walk through the script line by line, for there is something unusual in almost each line. So, after re-setting the gnuplot session, we define a function, which will be a Gaussian, whose centre and width is determined by the parameter 'a'. We then plot this function to a file, 'iter.dat', and do it 6 times, and each time with a different parameter, so that the Gaussian is shifted, and becomes broadened. Note, however, that we do this by plotting a special file, '+'. This was introduced in gnuplot 4.4, and the purpose of this special file is that by invoking this, one can use the standard plot modifiers even with functions. I.e., we can specify 'using' for a function. The importance of this is that many plot styles require several columns, and we could not use those plot styles with functions without the '+' pseudo-file. Consider the following example

reset
unset colorbox
unset key
set xrange [0:10]
set cbrange [0:1]
plot '+' using ($1):(sin($1)):(0.5*(1.0+sin($1))) with lines lw 3 lc palette, \
'+' using ($1):(sin($1)+2):($1/10.0) with lines lw 3 lc palette
which produces the following graph

We can thus colour our curve by specifying the colour in the third column of the pseudo-file. Of course, this is only one possibility, and there are many more. If one wants to plot 3D graphs, then the pseudo-file becomes '++', but the concept is the same: the two variables are denoted by $1 and $2, and the function is calculated on the grid determined by the number of samples and the corresponding data range.

Now, back to the iteration loop! We produce 6 columns of data by plotting '+' by invoking a plot style that requires 6 columns. In this case, it is the xyerrorbars. Having created some data, we plot each column, but we call plot only once: the iteration loop does the rest. In each plot, the curve is shifted upwards, and the title is taken from the column number. For specifying the title, we use the function that we defined earlier: it takes an integer, and returns a string. At the end of the day, we have this graph

This was an example, when we plot various columns from the same file. We can also use the iteration to plot different files. When doing so, there are two options available. One is that we simply specify the file names in a string, as below

reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file using 1:2 with lines
which will plot files 'first', 'second', 'third', 'fourth', and 'fifth'. At this point, note that 'file' is a string, i.e., we can manipulate it as a string. E.g., if we wanted to, instead of 'first', 'second', etc., plot 'first.dat', 'second.dat', and so on, we would do this as
reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file."dat" using 1:2 with lines

The second option is, if the data files are numbered, e.g., if we have 'file_1.dat', 'file_2.dat', and so on, we can use the iteration over integers as follows
reset
filename(n) = sprintf("file_%d", n)
plot for [i=1:10] filename(i) using 1:2 with lines
which will plot the second column versus the first column of 'file_1.dat' through 'file_10.dat'.