I found myself loading and processing huge files in R. For each of those files I needed to analyze many DNA motifs, so I naturally started with a for loop that, once my gargantuan file was comfortably sitting in memory, went motif by motif and created a separate output for each one.
This is a wonderful candidate for parallelization, since:
- I can process each motif independently
- I can assign each motif to a single processor
- Workers/processes don’t have to communicate
- Each worker writes its output to a separate file (not a shared one, so they don’t have to fight over access to it); a sketch of such a worker follows this list
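For illustration, here is a minimal sketch of what such a worker function might look like. Both processMotif and dna_sequence are hypothetical stand-ins for my actual analysis; the real function and file format will differ:

# Hypothetical worker: counts occurrences of one motif in a DNA string
# already held in memory and writes the count to the motif's own output file.
processMotif <- function(motif, dna = dna_sequence) {
  hits <- gregexpr(motif, dna, fixed = TRUE)[[1]]
  n_hits <- if (hits[1] == -1) 0 else length(hits)
  # one file per motif, so workers never contend for a shared file handle
  writeLines(sprintf("%s\t%d", motif, n_hits),
             con = paste0("motif_", motif, ".out"))
}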
A quick search turned up the following resource by Max Gordon, which demonstrates that a basic implementation of parallelization (in simple cases like this one) can be very straightforward in R:
library(parallel)

# My loooong list of motifs, shortened for illustration purposes.
# (Don't name it `list`: that would mask base R's list() function.)
motif_list <- c("CGG", "CGGT", "CG", "CGT")

# Calculate the number of cores, leaving one free for the system
no_cores <- detectCores() - 1

# Initiate cluster
cl <- makeCluster(no_cores, type = "FORK")

print("PARALLEL")
ptm <- proc.time()  # start timer
parLapply(cl, motif_list, function(motif) {
  processMotif(motif)  # process each motif individually
})
stopCluster(cl)
proc.time() - ptm  # stop timer

print("NON-PARALLEL")
ptm <- proc.time()  # start timer
for (motif in motif_list) {
  processMotif(motif)
}
proc.time() - ptm  # stop timer
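One caveat worth flagging: type = "FORK" clusters only exist on Unix-like systems (Linux, macOS), where workers inherit the parent process's memory, including the big file already loaded. On Windows you would fall back to the default PSOCK cluster and export whatever the workers need explicitly. A minimal sketch, assuming motif_list from above and the hypothetical processMotif and dna_sequence from the earlier sketch:

# PSOCK workers are fresh R sessions, so they inherit nothing automatically
cl <- makeCluster(no_cores)  # default type is "PSOCK"
clusterExport(cl, c("processMotif", "dna_sequence"))
parLapply(cl, motif_list, processMotif)
stopCluster(cl)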
Let’s check how much time the parallel version above takes to run:
user system elapsed
908.340 300.906 432.906
And let’s compare that with the for-loop solution:
user system elapsed
8544.826 3079.385 6089.453
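That is roughly a 14x reduction in elapsed time (6089 s down to 433 s) for a change of a few lines. If you prefer an even terser fork-based interface, mclapply from the same parallel package handles the cluster setup and teardown for you (again, Unix-only):

# One-line fork-based equivalent, using no_cores worker processes
results <- mclapply(motif_list, processMotif, mc.cores = no_cores)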
Happy paralleLAZYing!
(The code was run on a GNU/Linux machine with 64 processors and 500 GB of RAM.)