The koiplot package produces scatterplots in which points are replaced by decorated rectangles called brushstrokes. Various parameters of the brushstrokes can be adjusted to plot extra variables in the data set, in a similar way to a balloon plot. Although the koiplot can be viewed as an enhanced scatterplot, its main use is not as a way of displaying data but as a way of producing artistic-looking graphics which could be used, for example, on the cover of a report.
The graphics in this document are just for illustrative purposes; the choice of variables plotted is not supposed to be especially meaningful.
First, here are Anderson’s iris data.
# koiplot
koiplot(iris)
## Default choices of columns to be plotted:
## x: Sepal.Length
## y: Sepal.Width
## angle: Petal.Length
## colour: Species
# scatterplot
plot(iris$Sepal.Length, iris$Sepal.Width, col=c("red","green","blue")[as.factor(iris$Species)],pch=19)
# balloon plot
plot(iris$Sepal.Length, iris$Sepal.Width, col=c(rgb(1,0,0,0.5),rgb(0,1,0,0.5),rgb(0,0,1,0.5))[as.factor(iris$Species)],pch=19, cex=iris$Petal.Length)
The individual brushstrokes in a koiplot are rectangles produced using the polygon
function. The rectangle is centered at the coordinates (x, y)
. Here is a rectangle centered at (0,0)
with a width of 2
and a height of 1
.
plot(0, 0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=0, y=0, width=2, height=1, xnoise=0, ynoise=0, curvature=0)
The angle
argument can be used to adjust the angle (in radians) the rectangle makes with the positive x-axis.
plot(0, 0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0, angle=-10*pi/180)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0, angle=10*pi/180)
The xnoise
and ynoise
arguments add noise in the x- and y-direction respectively.
plot(x=0, y=0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0.1, ynoise=0, curvature=0)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0.05, curvature=0)
The curvature
argument adds positive or negative curvature which helps the brushstroke look more natural. When adding large amounts of curvature, it is better to set xnoise
to zero or else the brushstrokes will look ugly.
plot(x=0, y=0, type="n", xlim=c(-2,2), ylim=c(-1,1), xlab="x", ylab="y")
brushstroke(x=-1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=0.1)
brushstroke(x=1, y=0, width=1, height=0.5, xnoise=0, ynoise=0, curvature=-0.1)
The xstrectch
and ystretch
arguments can be used to adjust the brushstrokes to avoid distortion caused by the aspect ratio of the plot not being 1:1. Compare the two plots below.
par(mfrow=c(1,2))
# distorted-looking brushstroke
plot(0, 0, xlim=c(-10, 10), ylim=c(-1, 1), type="n", xlab="x", ylab="y")
brushstroke(-5, 0, width=2, height=0.2, angle=0, xnoise=0, ynoise=0, curvature=0)
brushstroke(5, 0, width=2, height=0.2, angle=45*pi/180, xnoise=0, ynoise=0, curvature=0)
# find the aspect ratio of the plot which has just been made
usr <- par("usr")
xstretch <- usr[2] - usr[1]
ystretch <- usr[4] - usr[3]
# distortion corrected using xstretch and ystretch
plot(0, 0, xlim=c(-10, 10), ylim=c(-1, 1), type="n", xlab="x", ylab="y")
brushstroke(-5, 0, width=2, height=0.2, angle=0, xnoise=0, ynoise=0, curvature=0)
brushstroke(5, 0, width=2, height=0.2, angle=45*pi/180, xnoise=0, ynoise=0, xstretch=xstretch, ystretch=ystretch, curvature=0)
The main function of the package is koiplot
. This function is a wrapper for the lower-level function koiplotxy
, which does the actual plotting.
If a data frame is passed to koiplot
with no other arguments, the function will select the columns to be plotted and print its choices to the console, as in koiplot(iris)
. The x
variable will be the first numeric column encountered from the left and the y
variable will be the second numeric column. The angle will be the third numeric column if there is one. Angles are rescaled into the range -45 degrees to 45 degrees. Colours are selected by finding a factor or character column (the one with the fewest distinct values is used). The transparency value for the colours is 0.1
by default.
In the case of a data frame with over 10000 rows, 10000 rows are selected at random for plotting (otherwise the plotting process may be very slow.)
This function attempts to make a very quick and basic plot. Obviously it doesn’t always work well. For example, if the data frame has a numeric index in its first column, then this will be selected as the first variable, as in the following example using the crabs data of Campbell and Mahon from the MASS
package.
library(MASS)
koiplot(crabs)
## Default choices of columns to be plotted:
## x: index
## y: FL
## angle: RW
## colour: sp
Extra arguments can be passed to the koiplot
function. For example, to omit the axes, border and add a title.
koiplot(crabs, xaxt="n", yaxt="n", xlab="", ylab="", bty="n", main="Crab Data")
## Default choices of columns to be plotted:
## x: index
## y: FL
## angle: RW
## colour: sp
Instead of a data frame, vectors x
and y
can be passed to the koiplot
function, and this is usually preferable. For example, in the crab data, suppose that we want to plot carapace length and rear width. The variable cols
can be used to specify the colour column.
Caution! Do not confuse cols
with col
. The latter is also a valid argument to koiplot
but it will be passed to the plot
function and will do nothing!
koiplot(crabs$CL, crabs$RW, cols=crabs$sp)
This plot can be enhanced by adding an angle variable and forming the colours from the (sex, sp)
pairs.
koiplot(crabs$CL, crabs$RW, angle=crabs$FL, cols=rainbow(4)[as.factor(paste0(crabs$sex, crabs$sp))], curvature=0.5, xlab="CL", ylab="RW")
The following plot shows the result of performing PCA and including other optional arguments to koiplot
including arguments to plot
.
out <- prcomp(crabs[,-c(1:3)])
koiplot(out$x[,2], out$x[,3], angle=out$x[,1], cols=rainbow(4)[as.factor(paste0(crabs$sex, crabs$sp))], curvature=0.25, xaxt="n", yaxt="n", xlab="PC2", ylab="PC3", main="Crab data after projection", bty="n")
By default, the alpha value for the colours in koiplot
is 0.1. The variable alpha
can be specified as a vector containing the alpha values for the individual brushstrokes, but usually it will just be given as a single number.
A value of alpha
less than 1 is generally advisable, as this can help the viewer to see which regions of the plot are relatively dense.
x <- data.frame(x=c(rnorm(1000), rnorm(1000, 0, 0.1)), y=c(rnorm(1000), rnorm(1000, 0, 0.3)))
par(mfrow=c(2,2))
xlim <- ylim <- c(-3,3)
plot(x, xlim=xlim, ylim=ylim)
smoothScatter(x, xlim=xlim, ylim=ylim)
koiplot(x, xlim=xlim, ylim=ylim)
koiplot(x, alpha=0.01, xlim=xlim, ylim=ylim)
It is possible to specify a numeric vector as the cols
argument. This will be converted to a vector of rainbow colours in such a way that points with nearby values of the cols argument will have similar colours.
x <- rnorm(1000)
y <- rnorm(1000)
koiplot(x, y, cols=x+y, angle=runif(1000)*2*pi)
When an angle variable is specified, the angles are rescaled into the range -45 degrees to 45 degrees by default. This can be changed using min_angle
and max_angle
. For example, compare the ouput of koiplot(iris)
to the following.
koiplot(iris, min_angle=45, max_angle=90)
## Default choices of columns to be plotted:
## x: Sepal.Length
## y: Sepal.Width
## angle: Petal.Length
## colour: Species
When a variable takes values in the circle, such as day of the week, it may not be desirable to rescale angles like this. In this case, koiplot
can be called with rescale_angles=F
and the angles can be specified by the user.
For example, in the bike sharing dataset from the UCI Machine Learning Respository, the variable hr
is the hour of the day (in the range 0-23.) Loading the data frame hours.csv
into R as an object bike
, we can define the angle for the brushstrokes to be horizontal when hr = 11.5
and vertical when hr = 0
.
angle <- ((bike$hr-11.5)/11.5)*pi/2
Here, we use the month
variable to define the colour of the brushstrokes.
month <- substr(bike$dteday, 6, 7)
month <- as.numeric(month)
koiplot(bike$temp, bike$cnt, angle=angle, cols=-sin(pi*month/12), curvature=10,
rescale_angles=F, las=1, xlab="temp", ylab="cnt")
## Choosing 10000 random rows. Change nmax to plot more rows
The angles are easier to see if the dimensions (length
and width
) of the brushstrokes are changed from their default values. In the following plot, the axes are omitted and the y-limits are increased so that all the brushstrokes appear.
koiplot(bike$temp, bike$cnt, angle=angle, cols=-sin(pi*month/12), curvature=10, width=1, height=10, rescale_angles=F, las=1, xlab="temp", ylab="cnt", xaxt="n", yaxt="n", bty="n", ylim=c(-300, max(bike$cnt)+10))
## Choosing 10000 random rows. Change nmax to plot more rows
Because the brushstrokes are plotted in a random order and xnoise
and ynoise
are also random, the plots produced by koiplot
are subject to some random variation. It is therefore a good idea to set the seed before running koiplot
in order to make sure the plots are reproducible.