Where have my nanopore reads dissapeared?

You might have been so excited to see your fastq reads coming immediately out of the MinION that you might not (initially) realized that the data you started to analyze are NOT all the data there is! In case your computer wasn’t able to do all the base-calling on the fly, there will be plenty of the data in your skip folder — in our recent run, this was as much as 90% of it!

One way to fix this is to catch up with the base-calling using albacore on Linux machine. In my case, I achieved it using following commands:

PYTHON_VERSION=3.5
conda create --prefix /biomonika/conda/nanopore_basecalling python=${PYTHON_VERSION} anaconda
source activate nanopore_basecalling
conda update --all
wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-2.2.7-cp35-cp35m-manylinux1_x86_64.whl
pip install ont_albacore-2.2.*.whl

So how does our loop submitting jobs look like?

for i in `seq 0 48`; do
  echo $i;
  sbatch basecalling_job.sh /biomonika/nanopore_run/data/reads/20180426_2154_/fast5/skip/${i} ${i};
done;

And this is the insides of basecalling_job.sh

#!/bin/bash
#SBATCH --job-name=biomonika
#SBATCH --output=biomonika-%j.out
#SBATCH --error=biomonika-%j.err
#SBATCH --mem-per-cpu=2G
#SBATCH --ntasks=63
#SBATCH --cpus-per-task=1

set -x
set -e

input_dir=$1
output_dir=$2

echo "input dir: " ${input_dir}
echo "output dir: " ${output_dir}

which conda
conda info

source activate /galaxy/home/biomonika/conda/nanopore_basecalling
read_fast5_basecaller.py --flowcell FLO-MIN106 --kit SQK-LSK108 --barcoding --output_format fastq --input ${input_dir} --save_path ${output_dir} --worker_threads 63
echo "Done."

As usual, please comment if you have suggestions or advice. Happy base-calling!