Introduction

The koiplot package produces scatterplots in which points are replaced by decorated rectangles called brushstrokes. Various parameters of the brushstrokes can be adjusted to plot extra variables in the data set, in a similar way to a balloon plot. Although the koiplot can be viewed as an enhanced scatterplot, its main use is not as a way of displaying data but as a way of producing artistic-looking graphics which could be used, for example, on the cover of a report.

The graphics in this document are just for illustrative purposes; the choice of variables plotted is not supposed to be especially meaningful.

First, here are Anderson’s iris data.

# koiplot
koiplot(iris)
## Default choices of columns to be plotted:
## x: Sepal.Length
## y: Sepal.Width
## angle:  Petal.Length 
## colour: Species

# scatterplot
plot(iris$Sepal.Length, iris$Sepal.Width, col=c("red","green","blue")[as.factor(iris$Species)],pch=19)

# balloon plot
plot(iris$Sepal.Length, iris$Sepal.Width, col=c(rgb(1,0,0,0.5),rgb(0,1,0,0.5),rgb(0,0,1,0.5))[as.factor(iris$Species)],pch=19, cex=iris$Petal.Length)

Brushstrokes

The individual brushstrokes in a koiplot are rectangles produced using the polygon function. The rectangle is centered at the coordinates (x, y). Here is a rectangle centered at (0,0) with a width of 2 and a height of 1.

plot(0, 0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=0, y=0, width=2, height=1, xnoise=0, ynoise=0, curvature=0)

The angle argument can be used to adjust the angle (in radians) the rectangle makes with the positive x-axis.

plot(0, 0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0, angle=-10*pi/180)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0, angle=10*pi/180)

The xnoise and ynoise arguments add noise in the x- and y-direction respectively.

plot(x=0, y=0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0.1, ynoise=0, curvature=0)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0.05, curvature=0)

The curvature argument adds positive or negative curvature which helps the brushstroke look more natural. When adding large amounts of curvature, it is better to set xnoise to zero or else the brushstrokes will look ugly.

plot(x=0, y=0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0.1)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=-0.1)

The xstrectch and ystretch arguments can be used to adjust the brushstrokes to avoid distortion caused by the aspect ratio of the plot not being 1:1. Compare the two plots below.

par(mfrow=c(1,2))

# distorted-looking brushstroke
plot(0, 0, xlim=c(-10, 10), ylim=c(-1, 1), type="n", xlab="x", ylab="y")
brushstroke(-5, 0, width=2, height=0.2, angle=0, xnoise=0, ynoise=0, curvature=0)
brushstroke(5, 0, width=2, height=0.2, angle=45*pi/180, xnoise=0, ynoise=0, curvature=0)

# find the aspect ratio of the plot which has just been made
usr <- par("usr")
xstretch <- usr[2] - usr[1]
ystretch <- usr[4] - usr[3]

# distortion corrected using xstretch and ystretch
plot(0, 0, xlim=c(-10, 10), ylim=c(-1, 1), type="n", xlab="x", ylab="y")
brushstroke(-5, 0, width=2, height=0.2, angle=0, xnoise=0, ynoise=0, curvature=0)
brushstroke(5, 0, width=2, height=0.2, angle=45*pi/180, xnoise=0, ynoise=0, xstretch=xstretch, ystretch=ystretch, curvature=0)

The koiplot function

The main function of the package is koiplot. This function is a wrapper for the lower-level function koiplotxy, which does the actual plotting.

Plotting a data frame

If a data frame is passed to koiplot with no other arguments, the function will select the columns to be plotted and print its choices to the console, as in koiplot(iris). The x variable will be the first numeric column encountered from the left and the y variable will be the second numeric column. The angle will be the third numeric column if there is one. Angles are rescaled into the range -45 degrees to 45 degrees. Colours are selected by finding a factor or character column (the one with the fewest distinct values is used). The transparency value for the colours is 0.1 by default.

In the case of a data frame with over 10000 rows, 10000 rows are selected at random for plotting (otherwise the plotting process may be very slow.)

This function attempts to make a very quick and basic plot. Obviously it doesn’t always work well. For example, if the data frame has a numeric index in its first column, then this will be selected as the first variable, as in the following example using the crabs data of Campbell and Mahon from the MASS package.

library(MASS)
koiplot(crabs)
## Default choices of columns to be plotted:
## x: index
## y: FL
## angle:  RW 
## colour: sp

Extra arguments can be passed to the koiplot function. For example, to omit the axes, border and add a title.

koiplot(crabs, xaxt="n", yaxt="n", xlab="", ylab="", bty="n", main="Crab Data")
## Default choices of columns to be plotted:
## x: index
## y: FL
## angle:  RW 
## colour: sp

Plotting a pair of vectors

Instead of a data frame, vectors x and y can be passed to the koiplot function, and this is usually preferable. For example, in the crab data, suppose that we want to plot carapace length and rear width. The variable cols can be used to specify the colour column.

Caution! Do not confuse cols with col. The latter is also a valid argument to koiplot but it will be passed to the plot function and will do nothing!

koiplot(crabs$CL, crabs$RW, cols=crabs$sp)

This plot can be enhanced by adding an angle variable and forming the colours from the (sex, sp) pairs.

koiplot(crabs$CL, crabs$RW, angle=crabs$FL, cols=rainbow(4)[as.factor(paste0(crabs$sex, crabs$sp))], curvature=0.5, xlab="CL", ylab="RW")

The following plot shows the result of performing PCA and including other optional arguments to koiplot including arguments to plot.

out <- prcomp(crabs[,-c(1:3)])
koiplot(out$x[,2], out$x[,3], angle=out$x[,1], cols=rainbow(4)[as.factor(paste0(crabs$sex, crabs$sp))], curvature=0.25, xaxt="n", yaxt="n", xlab="PC2", ylab="PC3", main="Crab data after projection", bty="n")

Transparency

By default, the alpha value for the colours in koiplot is 0.1. The variable alpha can be specified as a vector containing the alpha values for the individual brushstrokes, but usually it will just be given as a single number.

A value of alpha less than 1 is generally advisable, as this can help the viewer to see which regions of the plot are relatively dense.

x <- data.frame(x=c(rnorm(1000), rnorm(1000, 0, 0.1)), y=c(rnorm(1000), rnorm(1000, 0, 0.3)))

par(mfrow=c(2,2))
xlim <- ylim <- c(-3,3)

plot(x, xlim=xlim, ylim=ylim)
smoothScatter(x, xlim=xlim, ylim=ylim)
koiplot(x, xlim=xlim, ylim=ylim)
koiplot(x, alpha=0.01, xlim=xlim, ylim=ylim)

Colours

It is possible to specify a numeric vector as the cols argument. This will be converted to a vector of rainbow colours in such a way that points with nearby values of the cols argument will have similar colours.

x <- rnorm(1000)
y <- rnorm(1000)
koiplot(x, y, cols=x+y, angle=runif(1000)*2*pi)

Dealing with angles

When an angle variable is specified, the angles are rescaled into the range -45 degrees to 45 degrees by default. This can be changed using min_angle and max_angle. For example, compare the ouput of koiplot(iris) to the following.

koiplot(iris, min_angle=45, max_angle=90)
## Default choices of columns to be plotted:
## x: Sepal.Length
## y: Sepal.Width
## angle:  Petal.Length 
## colour: Species

When a variable takes values in the circle, such as day of the week, it may not be desirable to rescale angles like this. In this case, koiplot can be called with rescale_angles=F and the angles can be specified by the user.

For example, in the bike sharing dataset from the UCI Machine Learning Respository, the variable hr is the hour of the day (in the range 0-23.) Loading the data frame hours.csv into R as an object bike, we can define the angle for the brushstrokes to be horizontal when hr = 11.5 and vertical when hr = 0.

angle <- ((bike$hr-11.5)/11.5)*pi/2

Here, we use the month variable to define the colour of the brushstrokes.

month <- substr(bike$dteday, 6, 7) 
month <- as.numeric(month)

koiplot(bike$temp, bike$cnt, angle=angle, cols=-sin(pi*month/12), curvature=10,
rescale_angles=F, las=1, xlab="temp", ylab="cnt")
## Choosing 10000 random rows. Change nmax to plot more rows

The angles are easier to see if the dimensions (length and width) of the brushstrokes are changed from their default values. In the following plot, the axes are omitted and the y-limits are increased so that all the brushstrokes appear.

koiplot(bike$temp, bike$cnt, angle=angle, cols=-sin(pi*month/12), curvature=10, width=1, height=10, rescale_angles=F, las=1, xlab="temp", ylab="cnt", xaxt="n", yaxt="n", bty="n", ylim=c(-300, max(bike$cnt)+10))
## Choosing 10000 random rows. Change nmax to plot more rows

Randomness of the output

Because the brushstrokes are plotted in a random order and xnoise and ynoise are also random, the plots produced by koiplot are subject to some random variation. It is therefore a good idea to set the seed before running koiplot in order to make sure the plots are reproducible.

Installing the package

According to the guide here you should be able to install the package from RStudio using the following commands.

library(devtools)
install_github("rtrvale/koiplot")

A windows binary can also be downloaded from here.