Best Practices for R Continuous Integration

Best Practices for R Continuous Integration

Categories:

  • Author Avatar
    Written by:

    Nathan Rosidi

Implementing Continuous Integration in R: Automate Tests, Detect Errors Early, and Keep Your Analysis Consistent


You wouldn't ship an app without testing. So why skip it for your R scripts?

Continuous integration runs checks every time you update your code. It catches bugs early and keeps your analysis solid. For example, it can test your data preprocessing before retraining a model.

In this post, I’ll show you how to set up CI for R. You’ll see tools, workflows, and best practices that work.

What is R Continuous Integration?

Continuous integration (CI) means testing your code every time you make a change. It happens automatically, often when you push to GitHub.

In R projects, this could mean checking if your scripts still run. Or if your model training code fails after a new update.

CI tools give fast feedback. You know right away when something breaks.

This helps in data science, where many things can silently fail, such as a missing column or a new NA value in your dataset.

CI ensures your work remains reliable. Even when projects become large or teams expand.

Benefits of Continuous Integration in R

Benefits of Continuous Integration in R

Key Challenges in R Continuous Integration

Key Challenges in R Continuous Integration

Choosing the Right Tools for R Continuous Integration

There are several platforms that support CI for R.

GitHub Actions

We will also use GitHub Actions later in this article.

  • It works directly inside GitHub.
  • The r-lib/actions/setup-r action sets up R.
  • It runs tests, checks the package, and lints code.

Why do we use it?

  • Free for public repositories
  • Easy to set up
  • Actively maintained by the R community

Docker

Docker helps define a consistent R environment.

  • We can test the same environment locally and remotely.
  • It's useful if the project has system-level dependencies.

Less common tools

  • Travis CI: Still used, but less common.
  • CircleCI / GitLab CI: Flexible but more complex.
  • Jenkins: Powerful, mostly used in large teams.
  • AppVeyor: For Windows testing.

Summary

For most R projects, GitHub Actions is enough. It's simple, stable, and supported by the R ecosystem.

Example CI Workflow for an R Project with GitHub Actions

Let's say you have a machine learning project in R, where a team of three people collaborates on the same repository. They all edit the same model training script. Each time someone pushes a change, your manager wants to make sure:

  • The model object is created successfully
  • Predictions are numeric and binary (only 0 or 1)
  • The script doesn't silently fail due to bad data or broken code

To manage this, you decide to add tests and CI.

Understanding the R Project Structure for CI

Here’s a simplified structure of your repo:

r-ci/
├── R/
│   └── train_model.R 
├── tests/
│   └── testthat/
│       └── test_model.R 
├── .github/
│   └── workflows/
│       └── r-ci.yaml
└── r-ci.Rproj


Let’s check how each file works in our project:

  • R/train_model.R: Core script that trains the model.
  • tests/testthat/test_model.R: Contains tests for model output.
  • .github/workflows/r-ci.yaml: Defines CI steps for GitHub.
  • r-ci.Rproj : RStudio project file for development.

We’ll update the model code, push it to GitHub, and watch CI validate the changes automatically.

Writing and Running Automated Tests in R

You’ve already written tests to make sure your model predictions are numeric and binary, and the model object is valid. You could run these checks manually every time, but that’s error-prone and easy to forget.

A better way is to automate this with CI.

That’s where we use GitHub Actions. It runs your tests automatically every time you push to GitHub. These tests are written using the testthat package, a popular unit testing framework in R. GitHub Actions installs this package and uses it to run your test scripts inside tests/testthat/test_model.R.

Here’s what those tests might look like:

test_that("All predictions are numeric and binary", {
  expect_true(is.numeric(output$predictions))
  expect_true(all(output$predicted_class %in% c(0, 1)))
})

test_that("Model object exists", {
  expect_s3_class(output$model, "glm")
})


This way:

  • GitHub Actions knows where to find your tests
  • The testthat package can run them properly
  • Your CI workflow (r-ci.yaml) knows what to execute

So, once you've written your tests and pushed the code, CI automatically runs them on every commit or pull request.

Now, every time someone pushes to the repo, CI runs these tests for you automatically.

No one needs to remember the rule. No one needs to manually run checks. If something breaks, CI fails the build.

The CI Workflow File

Now that you’ve got tests in place, you need to tell GitHub Actions how to run them.

To do that, you write a workflow configuration file.

  • Filename: r-ci.yaml
  • Location: .github/workflows/

Here is the code.

name: R-CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up R
        uses: r-lib/actions/setup-r@v2

      - name: Install dependencies
        run: |
          Rscript -e 'install.packages("remotes")'
          Rscript -e 'remotes::install_deps(dependencies = TRUE)'

      - name: Run tests
        run: |
          Rscript -e 'install.packages("testthat")'
          Rscript -e 'testthat::test_dir("tests/testthat")'

      - name: Lint code
        run: |
          Rscript -e 'install.packages("lintr")'
          Rscript -e 'lintr::lint_dir()'


Whenever someone pushes to main branch (or opens a pull request), GitHub Actions will:

  1. Set up an R environment
  2. Install your dependencies
  3. Run your testthat tests (tests/testthat/test_model.R)
  4. Lint the code for style issues

That’s the glue between your repo and continuous integration. From here, every commit gets checked automatically.

Testing and Debugging Your CI Pipeline

To test the workflow, we’ll start with a dataset that looks simple enough.

  • Filename: train_model.R
  • Location: R/train_model.R

# R/train_model.R

student_data <- data.frame(
  age = c(21, 22, 23, 24, NA),
  gender = c("M", "F", "F", "M", "F"),
  score = c(85, 90, 78, 88, 92),
  passed = c(1, 1, 0, 1, 1)
)

student_data <- na.omit(student_data)

predictor_vars <- c("age", "score")
if (!all(sapply(student_data[, predictor_vars], is.numeric))) {
  stop("All predictor variables must be numeric.")
}

At first glance, nothing looks wrong. But notice the dataset:

  • age has an NA
  • gender is categorical

When we run this through our tests in GitHub Actions, the pipeline fails with an error.

Create a Repository

Now, if you don’t have a GitHub repository, you will need one to upload all these files, as we will use GitHub Actions to test. Read this one to learn how to create a GitHub repository.

Pushing to GitHub

Now, let’s send the changes you made to GitHub. To do that, you can use the following commands.

git add . && git commit -m "CI" && git push

GitHub Actions

After sending the changes, go ahead to the “Github Actions” on your repository.

GitHub Actions in R Continuous Integration


You can also click on it to see the progress.

GitHub Actions in R Continuous Integration


You can also click on it to see the progress.

GitHub Actions in R Continuous Integration

Here, it ran tests we’ve defined, but we’ve got an error:

GitHub Actions in R Continuous Integration


If we click on “Run tests”, it directs us to the location where the error occurs. Please check.

GitHub Actions in R Continuous Integration

As you can see, the error happens because “All predictions should be numeric”. Now let’s update our code and send it one more time.

Correction

We removed the missing value and excluded the non-numeric gender variable from the model to ensure all predictors are numeric. Here is the latest version of train_model.R.

# R/train_model.R

student_data <- data.frame(
  age = c(21, 22, 23, 24, 25),
  gender = c("M", "F", "F", "M", "F"),   # kept, but not used in the model
  score = c(85, 90, 78, 88, 92),
  passed = c(1, 1, 0, 1, 1)
)

student_data <- na.omit(student_data)

predictor_vars <- c("age", "score")

if (!all(sapply(student_data[, predictor_vars], is.numeric))) {
  stop("All predictor variables must be numeric.")
}

model <- glm(passed ~ age + score, data = student_data, family = binomial)

predictions <- predict(model, type = "response")
predicted_class <- ifelse(predictions > 0.5, 1, 0)

output <- list(
  model = model,
  predictions = predictions,
  predicted_class = predicted_class,
  student_data = student_data
)

saveRDS(output, file = "model_output.rds")

Pushing to GitHub After Correction

And send all updated files to GitHub once again with the following commands.

git add . && git commit -m "correction" && git push


Next, visit your GitHub repository and click on the “Actions”. Here, you’ll see the new changes.

GitHub Actions in R Continuous Integration


Here, we see the tests are now running.

GitHub Actions in R Continuous Integration


And voila! All tests have passed.

GitHub Actions in R Continuous Integration


Now your code has been successfully uploaded to GitHub because it has passed the tests!

Conclusion

Continuous Integration isn't just for big teams or packages; it fits everyday R workflows too.

By writing a few simple tests and connecting them to GitHub Actions, you’ve added an automated safety net to your project. Every push now triggers a full check of your model, ensuring your code runs, your output makes sense, and bugs don’t silently slip in.

Share

Become a data expert. Subscribe to our newsletter.