Tuesday 16 March 2010

Bubble plots

Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or scroll down on this page), the idea was to make gnuplot step through our data file point by point, plotting each point twice in succession. Now, what if we plotted each point more than twice? There is nothing special about the number 2, so there is no reason why we could not do this. And if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into a bubble graph, like this


So, let us see how the machinery works!

reset
plot 'new_bubble1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_bubble2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_bubble3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

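rem(x,n) = x - n*(x/n)                  # remainder of x/n, relies on integer division
size(x,n) = 3*(1-0.8*rem(x,n)/n)        # point size, shrinking with the remainder
c(x,n) = floor(240.0*rem(x,n)/n)        # colour channel, growing with the remainder
red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))
blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)
green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

posx(X,x,n) = X + 0.03*rem(x,n)/n       # shift to the right
posy(Y,x,n) = Y + 0.03*rem(x,n)/n       # shift upwards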

unset key
set border back
level = 40
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
Again, the first three plots are there for determining the sample size, and nothing more: column 0 is the index of the data point, counted from 0, so GPVAL_DATA_X_MAX is the number of points minus one. We then continue with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning integer values from 0 up to, but not including, 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours are successively whiter as the remainder increases. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessarily a tragedy, see below.)
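If you want to convince yourself of what these helper functions return, you can print a few values at the gnuplot prompt. What follows is only a small sketch: the functions are copied from the script above, level = 40 is the value used later in the script, and the indices 0, 20, 39 and 40 are picked purely for illustration.

```gnuplot
# Helper functions copied from the script above
rem(x,n)  = x - n*(x/n)
size(x,n) = 3*(1-0.8*rem(x,n)/n)
c(x,n)    = floor(240.0*rem(x,n)/n)
red(x,n)  = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))

level = 40
# Illustrative indices only
print sprintf("%s %.2f", red(0,level),  size(0,level))    # #FF0000 3.00 (largest, darkest circle)
print sprintf("%s %.2f", red(20,level), size(20,level))   # #FF7878 1.80
print sprintf("%s %.2f", red(39,level), size(39,level))   # #FFEAEA 0.66 (smallest, whitest circle)
print sprintf("%s %.2f", red(40,level), size(40,level))   # #FF0000 3.00 (the cycle restarts)
```

So, with 40 levels the colour channel only reaches 234, and the cycle starts over with the next data point.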

Then we decide to have 40 colour levels (we could have anything up to 255, although it might be a bit time-consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, and the loop variable n determines both the data point to plot, n/level, and the round we are in, rem(n,level). In each round, we move the circles a bit, shrink them, and set the colours to whiter shades. That is all.
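The way the loop variable is split into a point index and a round can be checked with a couple of print statements. This is just a sketch with arbitrarily chosen values of n; the only things taken from the script are rem and level.

```gnuplot
level = 40
rem(x,n) = x - n*(x/n)

# n/level is an integer division, so it stays constant for `level`
# consecutive values of n: this is the point picked by
# `every ::(n/level)::(n/level)`, while rem(n,level) counts the rounds
print 0/level,  rem(0,level)     # 0 0  : first point, round 0
print 39/level, rem(39,level)    # 0 39 : still the first point, last round
print 40/level, rem(40,level)    # 1 0  : second point, round 0
print 85/level, rem(85,level)    # 2 5  : third point, round 5
```

Since the rounds of one point are finished before the next point is started, a bubble is always drawn completely before its neighbour, which is why overlapping bubbles cover each other properly.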

Now, what happens if we take too big a value for the shift? This might actually lead to interesting effects, as shown in this graph, where droplets represent the data points.
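To play with this effect, it is convenient to pull the shift out into a variable. The sketch below does just that; the variable name shift and the value 0.3 are my assumptions for illustration, not necessarily the value behind the droplet figure, and the right number will depend on your axis ranges.

```gnuplot
rem(x,n) = x - n*(x/n)

# `shift` is a hypothetical variable replacing the hard-coded 0.03;
# 0.3 is an arbitrary, deliberately large value
shift = 0.3
posx(X,x,n) = X + shift*rem(x,n)/n
posy(Y,x,n) = Y + shift*rem(x,n)/n
```

With these definitions in place of the original posx and posy, the rest of the script can be used unchanged.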




After having seen the simplest implementation, we should ask whether it is possible to add some decorations, e.g., a thin black edge to the symbols. It is relatively simple, as the following script shows. We only have to re-define some of our functions as follows
size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))
blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))
green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)
posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)
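All these functions do is check whether we are plotting the first round, and if so, set the colour to black and make the circle slightly bigger. There is a small difference in the shifts, for we do not move the circles if they are in the first or the second round. The reason is that the black circle and the first coloured circle must stay concentric, otherwise the edge would not have the same width all around. The result is this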
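Printing the first two rounds shows how the edge comes about. Again, this is only a sketch that re-uses the re-defined functions from above, with level = 40 as in the main script.

```gnuplot
rem(x,n)  = x - n*(x/n)
size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
c(x,n)    = floor(240.0*rem(x,n)/n)
red(x,n)  = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))

level = 40
print sprintf("%s %.2f", red(0,level), size(0,level))   # #000000 3.30 (black disc underneath)
print sprintf("%s %.2f", red(1,level), size(1,level))   # #FF0606 2.94 (first coloured disc on top)
```

The black disc is a bit larger than the first coloured one, and it is this difference in size that we see as the thin circumference.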

OK, so we can plot bubbles, with or without black circumference, but we would also like to add a legend. Well, that is simple, in fact, nothing could be simpler. Just add the following three lines to our code

set label 1 'Red bubbles' at 9,6 left
set label 2 'Blue bubbles' at 9,5 left
set label 3 'Green bubbles' at 9,4 left
and append the following six to the plot command (the last line of the original command then has to end with ', \', too)
for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
and we are done! All we do here is plot our data files in a silly way: n/level is 0 throughout these loops, so only the first point of each file is read, and instead of its coordinates, we plot a single point at (8.5,6), (8.5,5), and (8.5,4). The data file is not really plotted in this sense, we use it for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!
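For the sake of clarity, here is how the pieces might fit together for the red data set alone. Take it as a sketch: it is not a stand-alone script, for it assumes that red_n, level and the helper functions are already defined as in the script above, and that 'new_bubble1.dat' is at hand.

```gnuplot
# Assumes red_n, level, posx, posy, size and red are defined as above
set label 1 'Red bubbles' at 9,6 left
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
     for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level)
```

The first loop draws the data, the second one the sample bubble next to the label.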

Defining new symbols

Some time ago, I showed a method with which we could add a "frame" to a symbol. If you recall, what we did was to plot everything twice, and in order to duplicate our data set, we used a simple gawk script. Now, there is another way of doing this, one which does not rely on the gawk script or, in fact, on any external script. I will discuss this method today. The gist of the trick is discussed in the old post, therefore, you are encouraged to cast at least a cursory glance at that, if you haven't done so yet.

As I have already pointed out, we had to duplicate our data set. To be more accurate, we haven't got to duplicate anything, we have simply got to plot the data twice. Now, the difficulty is that if we do this in a primitive way, issuing the plot command twice, and taking the same data set, the points might overlap, leading to some undesired results. So, the task is to plot the data set twice, but to plot each point twice in succession, and not the data set as a whole. For this, we will use the for loop introduced in gnuplot 4.4, and the 'every' keyword. To cut a long story short, I give my script here, and discuss it afterwards.
reset 
plot 'new_symbol1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_symbol2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_symbol3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)     # 0 for even n, 1 for odd n
size(n) = 2 - parity(n)*0.4                   # 2 for even n, 1.6 for odd n
colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

unset key
set border back
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0) ,\
for [n=0:2*blue_n+1] 'new_symbol2.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 9 ps size(n) lc rgb colour(n,100,100,255) ,\
for [n=0:2*green_n+1] 'new_symbol3.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 5 ps size(n) lc rgb colour(n,0,150,0)
Then, let us see what we have here! The first six lines after 'reset' are only to retrieve the number of data points in our data sets: column 0 is the index of the point, counted from 0, so GPVAL_DATA_X_MAX is the number of points minus one. If you know this from somewhere else, you can skip these, with the caveat that 'red_n', 'blue_n', and 'green_n' should still be defined somewhere.
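If you happen to have a newer gnuplot, the dummy plots can probably be avoided by means of the stats command. The lines below are only a sketch of such an alternative, not the method used in this post, and they assume gnuplot 4.6 or later, where stats became available.

```gnuplot
# Assumes gnuplot 4.6 or later (the stats command does not exist in 4.4)
stats 'new_symbol1.dat' using 2 nooutput
red_n = STATS_records - 1      # index of the last point, counted from 0

stats 'new_symbol2.dat' using 2 nooutput
blue_n = STATS_records - 1

stats 'new_symbol3.dat' using 2 nooutput
green_n = STATS_records - 1
```

STATS_records is the number of valid records that stats has read, hence the subtraction of 1 to match the zero-based index used in the loops.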

Next we define three functions, the first of which determines the parity of an integer, returning 1 if the number is odd, and 0 if it is even. The second function returns a number, depending on the parity of its argument. Surprising as it is, this function will determine the size of the symbol when we plot. Finally, the third function returns a string, which is equal to the colour given by the triplet (r,g,b), if the first argument, 'n', is odd, and black, if the first argument is even. At this point, it should be clear that we could have defined a function that returns a different colour for even numbers.
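A quick check at the prompt shows what the three functions return for even and odd arguments. This is merely a sketch, with the functions copied from the script above and the arguments chosen for illustration.

```gnuplot
parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)
size(n) = 2 - parity(n)*0.4
colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

print sprintf("%s %.1f", colour(0,255,0,0), size(0))       # #000000 2.0 (even n: black and big)
print sprintf("%s %.1f", colour(1,255,0,0), size(1))       # #FF0000 1.6 (odd n: coloured and smaller)
print sprintf("%s %.1f", colour(3,100,100,255), size(3))   # #6464FF 1.6
```

For integer arguments, gnuplot's modulo operator should give the same parity, i.e., parity(n) = n % 2 would do as well.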

We are done with everything but the plotting, so let us do that! As you see, for each data set, we step through the points not once, but twice: first plotting in black, and then plotting with some decent colour. At the same time, we change the symbol size, so that the black symbols are always a bit bigger than the red, blue, or green ones. Once all three plots have been called, the following graph will appear:

We can see that the symbols overlap each other, as they should. Now, what about the keys, should we need them? Well, that requires some handwork, but it is not hard, actually. The following self-explanatory script should do
set label 1 'Red symbols' at 1.3, 8 left
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=0, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=1, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0)
1 8
e
1 8
e

and this produces the following figure