Creating data frames in Rust for R

R
Rust
Author

David Schoch

Published

January 30, 2025

The content of this post is actually what I wanted to do in my last. But I went down so many rabbit holes when researching the topic that I decided to split it into a more general exploration of working with non-standard types and a post that is specifically dedicated to data frames.

I have to add the same disclaimer to this post as to the last: I am still a beginner in Rust, trying to figure things out. My main goal for now is not to write perfect/best practice Rust code, but to be able to call Rust from R (and send things to R from Rust) that goes beyond simple vectors and does what I want it to do.

Existing frameworks

Data frames in R are essential because they provide a structured, table-like format for handling heterogeneous data, seamlessly integrating numeric, character, and factor variables within a single dataset. Their flexibility, built-in functions, and compatibility with R’s data manipulation packages (like the tidyverse) make them the go-to structure for efficient data analysis and visualization.

For general data science, there is the crate polars. Polars is extremely powerful and we can already benefit from it in R with r-polars. The syntax might be a bit strange for newcomers, but there is tidypolars which promisses to provide a polars backend for the tidyverse. What this essentially means is that you can keep your tidyverse code while using polars in the background. That is a pretty awesome prospect and we might see this far more use cases in the future.

But we are not as ambitious as creating a full fletched data science machinery in this post. All we want is to create a data frame in Rust and bring it to R.

Data frame support in extendr-api

In the last post, I tried to built something that allows to move non-standard (anything that is not vector) types from R to Rust and back. That involved stuff like traits, impl and TryFrom blocks. Luckily, this is not needed for data frames. At least not for how I want to use them here. extendr-api comes with a struct Dataframe<T> (a representation of a typed data.frame). A data.frame can be created from Rust with the trait IntoDataFrameRow. This trait can be derived for a single struct that represents a single row. The type of the row is then captured by the marker T. Now what does that mean?

In order to construct a data.frame row by row, we need to define a struct that tells Rust what a row of the final data.frame should look like.

code <- r"(
use extendr_api::prelude::*;

#[derive(Debug,IntoDataFrameRow)]
struct Person {
  first: String,
  last: String,
  age: i32
}

#[extendr]
fn person2df(firstR: String, lastR: String, ageR: i32) -> Dataframe<Person> {
  let v = vec![Person {first: firstR, last: lastR, age:ageR}];
  let df = v.into_dataframe();
  df.expect("Failed to return a data.frame")
}
)"

rextendr::rust_source(
  code = code
)

df <- person2df("Alice", "Smith", 31L)
class(df)
[1] "data.frame"
df
  first  last age
1 Alice Smith  31

Note that the type of the inport arguments is important! firstR and lastR need to be characters

person2df(1, 2, 3)
Error in person2df(1, 2, 3): Expected Strings got Doubles

Building a larger data frame

The function above is not very exciting. It can only return a one row data.frame and will never return more. What we need to do is to extend the function by allowing vector inputs.

code <- r"(
use extendr_api::prelude::*;

#[derive(Debug, IntoDataFrameRow)]
struct Person {
    first: String,
    last: String,
    age: i32,
}

#[extendr]
fn people2df(firstR: Vec<String>, lastR: Vec<String>, ageR: Vec<i32>) -> Dataframe<Person> {

    let people: Vec<Person> = firstR.into_iter()
        .zip(lastR.into_iter())
        .zip(ageR.into_iter())
        .map(|((first, last), age)| Person { first, last, age })
        .collect();

    people.into_dataframe().expect("Failed to return a data.frame")
}
)"

rextendr::rust_source(
  code = code
)

Maybe some clarifications are needed here. zip() is a method in Rust that combines two iterators into a single iterator of tuples. Each tuple contains elements from both iterators at the same index. In our case, it creates something like nested tuples. First it combines firstR and lastR and then adds ageR. The final tuples then look like this:

(("Alice", "Smith"), 25)
(("Bob", "Johnson"), 30)
(("Charlie", "Brown"), 22)

The map() method converts each typle into a object of type Person by converting ((first, last), age) into Person { first, last, age }. collect() then creates a Vec<Person> out of this. Now lets see if this works.

df <- people2df(
  firstR = c("Alice", "Bob", "Charlie"),
  lastR = c("Smith", "Johnson", "Brown"),
  ageR = c(31L, 12L, 22L)
)

df
    first    last age
1   Alice   Smith  31
2     Bob Johnson  12
3 Charlie   Brown  22

Perfect! Now we have a simple example of how to turn given R input into a data frame in Rust. This approach is just a little limitting, because the our Rust data frame has a fixed setup: Three columns, two must be characters and one numeric. This is good if we want to create a standard output. I used this approach in a small R package called urlparser (I will write a separate post on it.). The input is a vector of urls and the output a fixed data frame with nine columns, each containing a part of the parsed url. For other tasks we need some more flexibility.

Oh god there is data_frame!

Ok, so I was ready to move on to call it a day with this post when I learned about data_frame!. It is almost ridiculous how easy it is to create data frame from vectors with this.

code <- r"(
    use extendr_api::prelude::*;
    #[extendr]
    fn people2df_short(firstR: Vec<String>, lastR: Vec<String>, ageR: Vec<i32>) -> List {
      let res = data_frame!(
        first = firstR,
        last = lastR,
        age = ageR
      );
      res.try_into().unwrap()
    }
)"

rextendr::rust_source(
  code = code
)

This is so much easier and clearer comming from the perspective of an R programmer!

Here is the function in action.

df <- people2df_short(
  firstR = c("Alice", "Bob", "Charlie"),
  lastR = c("Smith", "Johnson", "Brown"),
  ageR = c(31L, 12L, 22L)
)

class(df)
[1] "data.frame"
df
    first    last age
1   Alice   Smith  31
2     Bob Johnson  12
3 Charlie   Brown  22

How does this work? What we are using here is a “Macro”. Macros are a powerful metaprogramming feature that allows code to be generated at compile time. While functions operate on values, macros operate on the syntax of the code itself. This enables more flexible and reusable patterns. If you want to learn more, check out the book.

Let us build a small illustrative example.

code <- r"(
    use extendr_api::prelude::*;

    macro_rules! sum_and_cheer {
        ($a:expr, $b:expr) => {{
            let sum = $a + $b;
            rprint!("Rust rules!");
            sum
        }};
    }

    #[extendr]
    fn cheer_sum(a: f64, b: f64) -> f64 {
        sum_and_cheer!(a, b)
    }
)"

rextendr::rust_source(
  code = code
)

cheer_sum(1.1, 2.2)
Rust rules!
[1] 3.3

The cool thing about Macros is that you can essentially hide away complex code and produce an easier to use API. That is pretty nice and does make working with Rust from R a lot easier!

Conclusion

In this post I did a little exploration on how to create data frames in Rust. I am super happy to have found the macro data_frame! which also helped me to understand what macros actually are. I am pretty sure I am still missing parts that would make working with data frames even simpler, but my Rust journey has not ended yet.

Reuse

Citation

BibTeX citation:
@online{schoch2025,
  author = {Schoch, David},
  title = {Creating Data Frames in {Rust} for {R}},
  date = {2025-01-30},
  url = {http://blog.schochastics.net/posts/2025-01-30_data-frames-in-rust/},
  langid = {en}
}
For attribution, please cite this work as:
Schoch, David. 2025. “Creating Data Frames in Rust for R.” January 30, 2025. http://blog.schochastics.net/posts/2025-01-30_data-frames-in-rust/.