Optimizing parallelization with {furrr}

May 14, 2021 R furrr

I was using the {furrr} package to parallelize the execution of an R program, but wasn’t seeing the performance improvement I was expecting. It took a bit of digging through the package documentation to figure out that the defaults used in the package weren’t optimal for my use case.

The default chunking strategy used by {furrr} splits the input up front into one chunk per worker, which works well when each task takes around the same time to run. Building upon one of the examples from the package site:

library(furrr)
library(tictoc)

# Run futures in two background R sessions (the "workers")
plan(multisession, workers = 2)
tic()
nothingness <- future_map(c(2, 2, 2, 2), ~Sys.sleep(.x))
toc()

## 4.205 sec elapsed

In this scenario, the first two elements of the input vector are sent to worker one, and the second two elements are sent to worker two. Since each task takes two seconds to run, this parallelizes easily.
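To make that split concrete, here is a rough sketch of how four elements divide into two contiguous chunks. It uses `parallel::splitIndices()` from base R's {parallel} package purely as an illustration of an even, in-order split; treat it as an approximation of the default behaviour rather than a view into {furrr}'s internals. The names `n_elements`, `n_workers`, and `chunks` are made up for this example.

```r
# Illustration only: an even, in-order split of 4 element indices
# across 2 workers, one chunk per worker.
n_elements <- 4
n_workers <- 2

chunks <- parallel::splitIndices(n_elements, n_workers)

# chunks[[1]] holds the indices for worker one, chunks[[2]] for worker two
print(chunks)
```

Each worker receives its whole chunk at the start, so nothing is reassigned later if one chunk happens to finish early.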

This implementation works less well when the element run times are not as balanced:

tic()
nothingness <- future_map(c(12, 12, 1, 1), ~Sys.sleep(.x))
toc()

## 24.093 sec elapsed

Here worker two finishes both its elements in about two seconds and then sits idle, well before worker one is even done with its first element. The run time of about 24 seconds is barely faster than the roughly 26 seconds the same calls would take without any parallelization.
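The arithmetic behind that timing can be checked without sleeping at all. The sketch below assumes the split described earlier (first half of the input to worker one, second half to worker two) and ignores any scheduling overhead; the variable names are only for illustration.

```r
# Sleep times from the unbalanced example
run_times <- c(12, 12, 1, 1)

# Assumed static split: elements 1-2 to worker one, elements 3-4 to worker two
worker <- rep(1:2, each = 2)

# Total busy time per worker
per_worker <- tapply(run_times, worker, sum)

# The call returns only when the slowest worker is done
wall_time <- max(per_worker)

# What a plain sequential map would cost
sequential <- sum(run_times)

print(per_worker)   # worker one: 24, worker two: 2
print(wall_time)    # 24
print(sequential)   # 26
```

The estimated 24 seconds lines up with the measured 24.093 seconds, with the small difference being overhead.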

If the input vector were better balanced, the parallelization performance would better match my expectations:

tic()
nothingness <- future_map(c(12, 1, 12, 1), ~Sys.sleep(.x))
toc()

## 13.152 sec elapsed

But that is not always easy to determine or set up beforehand.

A solution in my case was to set the scheduling argument within furrr_options() to Inf, which puts each element in its own chunk. Now, to start off, worker one will get the first element and worker two will get the second one. Once a worker completes an element, it is allocated the next one:

# Named `opts` to avoid masking base R's options() function
opts <- furrr_options(scheduling = Inf)
tic()
nothingness <- future_map(c(12, 12, 1, 1), ~Sys.sleep(.x), .options = opts)
toc()

## 13.444 sec elapsed
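To see why this lands at around 13 seconds, the allocation can be simulated in plain R. This is a simplified model, not {furrr}'s actual scheduler: it assumes each element goes to whichever worker frees up first and that there is no overhead. The function `simulate_dynamic()` is a hypothetical helper written for this post.

```r
# Simplified model of element-by-element allocation:
# each element is handed to the worker that becomes free first.
simulate_dynamic <- function(run_times, n_workers) {
  free_at <- numeric(n_workers)  # time at which each worker is next free
  for (t in run_times) {
    w <- which.min(free_at)      # earliest available worker (ties: lowest index)
    free_at[w] <- free_at[w] + t
  }
  max(free_at)                   # finished when the last worker is done
}

dynamic_time <- simulate_dynamic(c(12, 12, 1, 1), n_workers = 2)
print(dynamic_time)  # 13
```

The trade-off is that one chunk per element means more round trips between the main session and the workers, so this setting is most likely to pay off when individual elements are slow and uneven, as they were in my case.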