```
<- function(n) {
fmat <- matrix(NA, n, n)
res for (i in 1:n) {
<- runif(n)
res[i, ]
}
res }
```

The other day I was helping to refactor an R package and came across one of the biggest performance blockers there is: dynamically growing matrices. Of course I repeated the mantra “Always preallocate your variables” but in this case, it is not clear how big (in terms of rows) the final matrix will be. So there is no way around growing the matrix dynamically.

The chosen approach in the package was to use `rbind`

similar to this

```
<- vector()
mat while(<condition>) {
if(<condition>) {
<- <do calculation>
tmp <- rbind(mat, tmp)
mat
} }
```

Disregarding performance, this seems like a sensible approach. Just add new rows to the end of the matrix. But a little bit of profiling showed that this was an extreme bottleneck in the function it appears. So what are viable alternatives? let’s benchmark some solutions. Note that we assume to know `n`

here but in reality, we do not know how big it will be at the end.

As a baseline we implement a function that preallocates memory.

The first contender is the `rbind`

approach.

```
<- function(n) {
frbind <- vector()
res for (i in 1:n) {
<- rbind(res, runif(n))
res
}
res }
```

For the second approach, we try to reduce the number of rbinds by growing the final matrix in chunks. For `csize = 1`

we obtain `frbind()`

and `csize = n`

we have `fmat()`

.

```
<- function(n, csize = 10) {
fchunks <- matrix(NA, csize, n)
chunk <- vector()
res for (i in 1:n) {
if (i %% csize == 0) {
<- runif(n)
chunk[csize, ] <- rbind(res, chunk)
res <- matrix(NA, csize, n)
chunk else {
} %% csize, ] <- runif(n)
chunk[i
}
}!is.na(res[, 1]), ]
res[ }
```

The last approach is to grow list which is converted to a matrix at the end.

```
<- function(n) {
flist <- list()
res for (i in 1:n) {
length(res) + 1]] <- runif(n)
res[[
}do.call(rbind, res)
}
```

```
<- 1000
n <- microbenchmark::microbenchmark(
bench fmat(n),
frbind(n),
fchunks(n, csize = 10),
flist(n),
times = 1, unit = "ms"
)
```

expr | min | lq | mean | median | uq | max | neval |
---|---|---|---|---|---|---|---|

flist(n) | 41.5171 | 41.5171 | 41.5171 | 41.5171 | 41.5171 | 41.5171 | 1 |

fmat(n) | 66.3531 | 66.3531 | 66.3531 | 66.3531 | 66.3531 | 66.3531 | 1 |

fchunks(n, csize = 10) | 692.2926 | 692.2926 | 692.2926 | 692.2926 | 692.2926 | 692.2926 | 1 |

frbind(n) | 6115.4161 | 6115.4161 | 6115.4161 | 6115.4161 | 6115.4161 | 6115.4161 | 1 |

The performance of the `rbind`

approach is really terrifyingly bad. The list approach on the other hand is extremely efficient. It performs equally well as the preallocated matrix approach. I unfortunately lack the understanding of the R internals here, but it seems as if dynamically growing a list does not have any (or at least not much) overhead.

**Update**

Lluís Revilla suggested that `cbind`

might be more efficient than `rbind`

, given that R mostly deals in columns.

```
<- function(n) {
fcbind <- vector()
res for (i in 1:n) {
<- cbind(res, runif(n))
res
}t(res)
}
```

```
<- 1000
n <- microbenchmark::microbenchmark(
bench fmat(n),
frbind(n),
fcbind(n),
fchunks(n, csize = 10),
flist(n),
times = 1, unit = "ms"
)
```

expr | min | lq | mean | median | uq | max | neval |
---|---|---|---|---|---|---|---|

flist(n) | 37.3641 | 37.3641 | 37.3641 | 37.3641 | 37.3641 | 37.3641 | 1 |

fmat(n) | 46.4889 | 46.4889 | 46.4889 | 46.4889 | 46.4889 | 46.4889 | 1 |

fchunks(n, csize = 10) | 565.9169 | 565.9169 | 565.9169 | 565.9169 | 565.9169 | 565.9169 | 1 |

fcbind(n) | 1330.2777 | 1330.2777 | 1330.2777 | 1330.2777 | 1330.2777 | 1330.2777 | 1 |

frbind(n) | 6103.8364 | 6103.8364 | 6103.8364 | 6103.8364 | 6103.8364 | 6103.8364 | 1 |

So `cbind`

is indeed a lot faster than `rbind`

, but still much worse than the `list`

approach.

## Reuse

## Citation

```
@online{schoch2024,
author = {Schoch, David},
title = {Be Kind Don’t Rbind},
date = {2024-02-13},
url = {http://blog.schochastics.net/posts/2024-02-13_be-kind-dont-rbind},
langid = {en}
}
```