This post is the conclusion (for now) of my second miniseries on Rust in R. My first series was about building two R packages without much context (part1, part2), and the second was about how to communicate between Rust and R (part1, part2). In this post, I will introduce a small package, urlparser, which wraps the url crate to parse URLs.
Setting up the package
Thanks to usethis and rextendr, setting up a new package that uses Rust code is incredibly simple. You can get a large chunk of the work done in three lines of R code.
usethis::create_package("urlparser")
rextendr::use_extendr()
rextendr::use_crate("url", version = "2.5.4")
This sets up everything you need, and all that is left to do is write the Rust code in /src/rust/src/lib.rs. In our case, we only need to wrap a single function, parse. The function takes a URL as a string and extracts its different parts. We want to store these parts in a data frame.
If you have followed the last two posts, you should understand what is going on here, although it is a bit more complex. struct ParsedUrl defines what a row in our final data frame should look like, and fn url_parse() uses Url::parse to extract the different parts of the URL. On the R side of the package, I just added a small wrapper around it:
rs_url_parse <- function(url) {
  url_parse(url)
}
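To make the "one struct per data frame row" idea concrete without pulling in extendr or the url crate, here is a minimal sketch that uses only the Rust standard library. It is not the code from urlparser: the field names are guesses, split_url() is a deliberately naive stand-in for Url::parse that only understands scheme://host/path?query#fragment, and returning empty strings for input that cannot be split is an assumption made purely for illustration.

```rust
/// One row of the result: the pieces of a single URL.
/// Field names are illustrative, not necessarily those used in urlparser.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ParsedUrl {
    pub url: String,
    pub scheme: String,
    pub host: String,
    pub path: String,
    pub query: String,
    pub fragment: String,
}

/// Naive stand-in for `url::Url::parse` (hypothetical helper).
/// It only handles `scheme://host/path?query#fragment` and performs
/// no validation, decoding, or normalisation.
pub fn split_url(input: &str) -> Option<ParsedUrl> {
    // The fragment is everything after the first '#'.
    let (rest, fragment) = input.split_once('#').unwrap_or((input, ""));
    // The query is everything after the first '?' in what remains.
    let (rest, query) = rest.split_once('?').unwrap_or((rest, ""));
    // Inputs without "scheme://" are rejected, loosely mimicking a parse error.
    let (scheme, rest) = rest.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    // The host runs up to the first '/'; the path is the remainder.
    let (host, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    Some(ParsedUrl {
        url: input.to_string(),
        scheme: scheme.to_string(),
        host: host.to_string(),
        path: path.to_string(),
        query: query.to_string(),
        fragment: fragment.to_string(),
    })
}

/// Turn many URLs into rows, keeping exactly one row per input.
/// Inputs that cannot be split keep their `url` and leave all other
/// fields empty (an assumption; a real package might use NA instead).
pub fn url_parse(urls: &[&str]) -> Vec<ParsedUrl> {
    urls.iter()
        .map(|u| {
            split_url(u).unwrap_or_else(|| ParsedUrl {
                url: (*u).to_string(),
                ..Default::default()
            })
        })
        .collect()
}
```

Keeping one row per input, even for failures, means the result always lines up with the input vector, which is what you would expect from a vectorised R function. In the real package, the row struct is what gets converted into an R data frame.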
And just like that, we are done. What remains is to check how the package compares to existing solutions.
Benchmark
We compare the performance of the package with adaR, an R package to parse URLs that wraps the C++ library ada-url. I have also blogged about the creation of that package (link).
Let us look at an example of what both packages return.
The naming scheme is a bit different, but they essentially return the exact same result.
I will skip correctness benchmarks here and go straight to the runtime, because that is what interested me the most. We take a list of diverse URLs provided by ada-url for this purpose.
# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 adaR          317ms    324ms      3.09    47.4MB     3.09
2 rust          126ms    128ms      7.78    8.28MB     0
Our simple package outperforms adaR by a factor of roughly 2.5 in median runtime (128ms vs. 324ms) and allocates far less memory (8.28MB vs. 47.4MB). That is wild to me, given how much time we spent on optimizing the interface between R and C++ to create as little overhead as possible. Here, we made no real optimization effort, so the performance boost can probably be attributed to Rust alone.
It might seem flashy, but you should still take these results with a grain of salt. While adaR is also relatively new, I’d still say that it is far more robust than the package we built here. No testing beyond eyeballing has been done so far. Maybe just take the result as a proof of concept of how quickly one can spin up a solution in Rust that could speed up your own workflows, without spending too much time on optimizing the code or the interface.