Rcount: dealing with multi-mapping reads in RNAseq data

Rcount: simple and flexible RNA-Seq read counting

Marc W. Schmid* and Ueli Grossniklaus

Institute of Plant Biology and Zürich-Basel Plant Science Center, University of Zurich, 8008 Zürich, Switzerland

Bioinformatics. doi:10.1093/bioinformatics/btu680, PMID: 25322836

Nate showed me this paper today, which is of some interest to us given my obsession with finding a better way to deal with multi-mapping reads in small RNA-seq data (e.g., with the butter program). The paper describes a tool called Rcount, a read counter for ‘normal’ mRNA-seq data. As described in the paper, Rcount takes in a BAM file and deals with multireads. According to Figure 1 of the paper, it does this by using the local density of uniquely mapped reads to make a probability assessment: the more uniquely mapped reads in an area, the more likely it is that the multiread also came from that location. It then places the read, noting the calculated probability in the SAM line with a custom tag. Rcount then performs another task (resolving reads that overlap more than one gene annotation) and counts up reads in annotated genes for the user.
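
To make the idea concrete, here is a rough Python sketch of density-guided multiread placement. It is not Rcount's code (nor butter's). The fixed window size, the pseudocount, and all function names are my own assumptions for illustration, and reads are reduced to simple (chromosome, position) pairs rather than parsed from a BAM file.

```python
from collections import Counter

def unique_density(unique_positions, window=100):
    """Count uniquely mapped reads per fixed-size window, keyed by (chrom, bin).

    The window size is an illustrative assumption, not a value from the paper.
    """
    counts = Counter()
    for chrom, pos in unique_positions:
        counts[(chrom, pos // window)] += 1
    return counts

def placement_probabilities(candidates, density, window=100, pseudocount=1.0):
    """Weight each candidate location of a multiread by the local unique density.

    The pseudocount (an assumption) keeps locations with no unique reads
    from getting a probability of exactly zero and avoids division by zero.
    """
    weights = [density.get((chrom, pos // window), 0) + pseudocount
               for chrom, pos in candidates]
    total = sum(weights)
    return [w / total for w in weights]

def place_multiread(candidates, density, window=100):
    """Pick the most probable location and return it with its probability."""
    probs = placement_probabilities(candidates, density, window)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs[best]

# Toy data: three unique reads cluster on chr1, one sits on chr2.
unique_reads = [("chr1", 120), ("chr1", 150), ("chr1", 180), ("chr2", 510)]
density = unique_density(unique_reads)

# One multiread with three equally good alignments.
multiread_hits = [("chr1", 130), ("chr2", 520), ("chr3", 40)]
location, probability = place_multiread(multiread_hits, density)
print(location, round(probability, 3))
```

In this toy case the chr1 alignment wins because three unique reads fall in the same window. The returned probability is the kind of value one could record alongside the placed read, in the spirit of the custom SAM tag described above.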

Rcount is clearly geared toward counting reads in annotated genes with reference to mRNA-seq data. For that reason, I doubt the program itself will be that useful for small RNA-seq data, where we are not generally interested in counting reads in pre-defined intervals (like gene annotations). But it is striking that Rcount is using pretty much exactly the method that my butter program uses for assigning reads … using the density of the unique mappers to create a probability set used to guide decisions on multi-mappers. I think Nate is going to try and use Rcount for small RNA-seq data.

I don’t think this precludes continued development of butter or its successor, because Rcount is pretty clearly geared toward mRNA-seq data. But it is worth testing, if possible, against butter and other methods for small RNA-seq, to determine for our own lab purposes an optimal method for aligning multi-mapped small RNA-seq reads that is both precise and reproducible.

– Mike Axtell

One response to “Rcount: dealing with multi-mapping reads in RNAseq data”

  1. Nathan R Johnson

    Just a comment:

    As I read through the Rcount manual, it seems fairly clear that even the authors are not confident in the program’s ability to accurately place multi-mapped reads in samples with a low proportion of uniquely mapped reads.

    “It is important that the number of unique alignments is well above the number of multireads. If not, it is better to use only the uniquely aligned reads” – Rcount User Guide, Rcount-multireads section

    This may be a perfectly reasonable stricture with mRNA-seq data, but with small RNAs it seems somewhat unrealistic. For example, the library I have been using to analyze butter’s accuracy has only 24% uniquely mapped reads and 45% multi-mapping reads, well outside their recommended bounds.

    I’m still going to try this program out, to see if it acts similarly to butter.
