I academically grew up among graph drawers, that is, computer scientists and mathematicians interested in deriving two-dimensional depictions of graphs. One may despicably call it pixel science, yet a lot of hard theoretical work is put into producing pretty graph layouts. Although I am not at all an expert in this field, I have learned a thing or two about that subject. As such, I have always been surprised why one of the (potentially) best algorithms is not implemented in R. This post is about my humble try to change this.

If you read this and say: Hey! there is already a package for that! please do let me know.

#used libraries
library(tidyverse) # for data wrangling
library(igraph)    # for network data structures and tools
library(ggraph)    # for prettier network visualizations
library(igraphdata)    # some network data 
library(patchwork) # combine ggplot objects 

Graph layouts in igraph

The R package igraph comes with a lot of inbuilt layout algorithms. Just type layout_ in Rstudio and you will get overwhelmed by the possibilities. As a minor side note: If you ever struggle with anything in igraph, consult the excellent tutorial from Katherine Ognyanova.

I usually have mixed feelings about using R to draw my networks and mostly resort to dedicated software such as visone. Mostly, because I feel that the algorithms in igraph tend to not be nice, even with the layout_nicely() function.

Consider a typical benchmark graph for graph drawing, which can be downloaded here.

el <- read_delim("power-1138-bus.mtx",delim=" ",col_names = F)
g <- graph_from_data_frame(el,directed=F)
g <- igraph::simplify(g)

Let’s see what igraph thinks a nice layout looks like.

par(mar=c(0,0,0,0))
plot(g,layout=layout_nicely,vertex.size=0.5,vertex.label=NA)

I know, “beauty lies in the eyes of the beholder”, but I personally do not think that this is particularly nice. Below, you see a collection of layouts, produced by different algorithms.

par(mfrow=c(2,2),mar=c(0,0,0,0))

plot(g,layout=layout_with_drl,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_lgl,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_fr,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_mds,vertex.size=0.5,vertex.label=NA)

Notice the big differences. Personally, I would prefer the layout_with_lgl (top right). Below is a bigger version drawn with ggraph.

ggraph(g,layout="lgl")+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph()

You will notice that this layout looks different than above. This is due to the fact, that the algorithm underlying layout_with_lgl is non-deterministic, meaning that it produces different pictures in consecutive runs. In fact, most of the other layout algorithm have this (annoying?) feature. More than once I have found myself layouting the network over and over again until I was satisfied.

Stress majorization

The first thing I learned from my graph drawing peers was to minimize stress. Not necessarily in the sense of work (which doesn’t work anyway while being a PhD student), but for graph layouting. Stress majorization is actually an optimization strategy used in multidimensional scaling where the goal is to minimize the so-called stress function defined as \[ \sigma(X)=\sum_{i<j} w_{ij}(\delta_{ij}-d_{ij})^2, \] where \(w_{ij} \geq 0\) is a weight between a pair of points \((i,j)\) , \(d_{ij}\) is the geodesic distance between \(i\) and \(j\) and \(\delta _{ij}\) is the euclidean distance of coordinates \(X_i\) and \(X_j\). By minimizing stress, we thus seek to find cartesian coordinates for each node so that the euclidean distance is as close as possible to the geodesic distance. If you are interested in more technical details, please see the original contribution by Gansner et al..

Implementation with Rcpp and the smglr package

I implemented stress majorization with Rcpp. While the code is not that involved, it still is a bit lengthy. I created a very rudimentary R package containing the stress majorization graph layout algorithm, which is available via github.

# devtools::install_github("schochastics/smglr")
library(smglr)

So what does our benchmark network look like using stress majorization?

l <- stress_majorization(g)

ggraph(g,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph()

In my opinion, this looks definitely better than any of the layouts before.

More examples

Here are two more examples to convince you of stress based layouts (always the right one).

# preferential attachment
pa <- sample_pa(1000,1,1,directed = F)

ggraph(pa)+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph() -> p1


l <- stress_majorization(pa)
ggraph(pa,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph()-> p2

p1+p2

# yeast protein interactions from igraphdata (only biggest component)
data(yeast)
comps <- components(yeast)
bcomp <- which.max(comps$csize)
yeast <- induced_subgraph(yeast,comps$membership==bcomp)

ggraph(yeast)+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph() -> p1


l <- stress_majorization(yeast)
ggraph(yeast,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
  geom_edge_link(width=0.2,colour="grey")+
  geom_node_point(col="black",size=0.3)+
  theme_graph()-> p2

p1+p2

Caveats

Stress majorization produces nice layouts, is deterministic and easy to implement. The downside is, that it is rather slow for large networks (I also partially blame my implementation for that). But there is also a way out of that problem. Former colleagues of mine published a sparse stress model which allows stress based layouting for really large graphs. The java code can be found on github. Also, keep an eye out for an R package called visone3 which will, among other things, also allow for stress based layouts.