I was using the {furrr} package to parallelize the execution of an R program, but wasn’t seeing the performance improvement I was expecting. It took a bit of digging through the package documentation to figure out that the package’s defaults weren’t optimal for my use case.
The default chunking strategy used by {furrr} works well when each task takes around the same amount of time to run. Building upon one of the examples from the package site:
library(furrr)
library(tictoc)
plan(multisession, workers = 2)
tic()
nothingness <- future_map(c(2, 2, 2, 2), ~Sys.sleep(.x))
toc()
## 4.205 sec elapsed
In this scenario, the first two elements of the input vector are sent to worker one, and the last two elements are sent to worker two. Since each task takes two seconds to run, the work parallelizes cleanly.
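As I understand the documentation, this default corresponds to scheduling = 1 in furrr_options(): the input is split into one chunk per worker up front. Spelling that out explicitly (opts is a name I’ve introduced here) should give the same behavior as the default call:

```r
library(furrr)
library(tictoc)

plan(multisession, workers = 2)

# scheduling = 1 is the default: the input is divided into
# exactly one chunk per worker before any work starts
# (here, elements 1-2 to worker one, elements 3-4 to worker two)
opts <- furrr_options(scheduling = 1)

tic()
nothingness <- future_map(c(2, 2, 2, 2), ~Sys.sleep(.x), .options = opts)
toc()
```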
This implementation works less well when the element run times are not as balanced:
tic()
nothingness <- future_map(c(12, 12, 1, 1), ~Sys.sleep(.x))
toc()
## 24.093 sec elapsed
Here worker two finishes both its elements and sits idle before worker one is even done with the first one. The run time is barely faster than if I hadn’t employed parallelization at all.
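For reference (a baseline I’m adding here, not one of the original benchmarks), a purely sequential run of the same sleeps takes the sum of the sleep times, roughly 26 seconds, so the parallel run above saved only about two seconds:

```r
library(tictoc)

# Sequential baseline: total time is simply the sum of the
# individual sleeps, 12 + 12 + 1 + 1 = 26 seconds
tic()
nothingness <- lapply(c(12, 12, 1, 1), Sys.sleep)
toc()
```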
If the input vector was better balanced, the parallelization performance would better match my expectations:
tic()
nothingness <- future_map(c(12, 1, 12, 1), ~Sys.sleep(.x))
toc()
## 13.152 sec elapsed
But that is not always easy to determine or set up beforehand.
A solution in my case was to set the scheduling argument within furrr_options() to Inf. Now, to start off, worker one gets the first element and worker two gets the second. Once a worker completes an element, it is allocated the next one:
options = furrr_options(scheduling = Inf)
tic()
nothingness <- future_map(c(12, 12, 1, 1), ~Sys.sleep(.x), .options = options)
toc()
## 13.444 sec elapsed
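If I read the furrr_options() documentation correctly, the chunk_size argument offers an equivalent way to express this: chunk_size = 1 asks for one-element chunks, handed out as workers free up (scheduling and chunk_size are alternatives, so only one should be set). A sketch under that assumption:

```r
library(furrr)
library(tictoc)

plan(multisession, workers = 2)

# Assumption: chunk_size = 1 dispatches one element per chunk,
# mirroring scheduling = Inf in the example above
options <- furrr_options(chunk_size = 1)

tic()
nothingness <- future_map(c(12, 12, 1, 1), ~Sys.sleep(.x), .options = options)
toc()
```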