Rcount: simple and flexible RNA-Seq read counting
Marc W. Schmid* and Ueli Grossniklaus
Institute of Plant Biology and Zürich-Basel Plant Science Center, University of Zurich, 8008 Zürich, Switzerland
Nate showed me this paper today, which is of some interest to us given my obsession with finding a better way to deal with multi-mapping reads in small RNA-seq data (e.g., with the butter program). The paper describes a tool called Rcount, a read counter for ‘normal’ mRNA-seq data. As described in the paper, Rcount takes in a BAM file and deals with multireads. According to figure 1 (copied below), it does this by using the local density of uniquely mapped reads to make a probability assessment: the more uniquely mapped reads in an area, the more likely it is that the multiread also came from that location. It then places the read, recording the calculated probability in the SAM line with a custom tag. Rcount then performs a second task (handling reads that overlap more than one gene annotation) and counts up the reads in annotated genes for the user.
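To make the idea concrete, here is a minimal sketch of density-guided placement as I understand it from the description above. This is not Rcount's (or butter's) actual code: the function names, the fixed 100-nt window, and the fallback to equal probabilities when no unique reads are nearby are all my own assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def local_unique_density(unique_starts, position, window=100):
    """Count uniquely mapped read starts within +/- window of position.

    unique_starts must be a sorted list of start coordinates on one chromosome.
    The window size is an assumed parameter, not a value taken from the paper.
    """
    lo = bisect_left(unique_starts, position - window)
    hi = bisect_right(unique_starts, position + window)
    return hi - lo


def placement_probabilities(unique_by_chrom, candidates, window=100):
    """Return one probability per candidate location of a multiread.

    unique_by_chrom: dict mapping chromosome name -> sorted unique read starts.
    candidates: list of (chromosome, position) alignments for a single multiread.
    """
    weights = [
        local_unique_density(unique_by_chrom.get(chrom, []), pos, window)
        for chrom, pos in candidates
    ]
    total = sum(weights)
    if total == 0:
        # Assumed fallback: no unique reads near any candidate, so treat
        # all locations as equally likely.
        return [1.0 / len(candidates)] * len(candidates)
    return [w / total for w in weights]


# Toy data: four unique reads near the chr1 candidate, one near the chr2 candidate.
unique_by_chrom = {"chr1": [100, 120, 150, 180], "chr2": [5000]}
candidates = [("chr1", 140), ("chr2", 5010)]
probs = placement_probabilities(unique_by_chrom, candidates)
print(probs)  # [0.8, 0.2]
```

In this toy case the chr1 location gets probability 0.8 and the chr2 location 0.2, so the multiread would most likely be assigned to chr1. The probability is the value a tool could then record alongside the placed read.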
Rcount is clearly geared toward counting reads in annotated genes with reference to mRNA-seq data. For that reason, I doubt the program itself will be that useful for small RNA-seq data, where we are not generally interested in counting reads in pre-defined intervals (like gene annotations). But it is striking that Rcount is using pretty much exactly the method that my butter program uses for assigning reads … using the density of the unique mappers to create a probability set used to guide decisions on multi-mappers. I think Nate is going to try and use Rcount for small RNA-seq data.
I don’t think this precludes continued development of butter or its successor, because Rcount is pretty clearly geared toward mRNA-seq data. But it is worth testing, if possible, against butter and other methods for small RNA-seq, so that we can determine, for our own lab purposes, an optimal method for aligning multi-mapped small RNA-seq reads that is both precise and reproducible.
– Mike Axtell