Patra et al. (2014). plantDARIO: web based quantitative and qualitative analysis of small RNA-seq data in plants. Frontiers in Plant Science.
This manuscript describes a web-based service for the annotation of small RNA-producing genes in Arabidopsis thaliana, Beta vulgaris, and Solanum lycopersicum (the authors also state that they plan to extend the number of plant species to “…include most of the available plant genomes.”). Users provide aligned small RNA data in BAM or BED format, and the authors provide a script for condensing reads aligned to the same position, which reduces the burden of large data transfers. The web server parses the aligned small RNA data with respect to several pre-loaded annotation tracks, including known miRNAs (from miRBase), known tasi-RNAs, tRNAs, and other ncRNAs from Rfam. Global stats are reported for the library. Clusters of reads that don’t overlap any annotated regions are flagged, and some miRNA-finding and snoRNA-finding programs are run on them. Results can be integrated into publicly available genome browsers for the species of interest, hosted on other servers.
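To make the read-condensation idea concrete, here is a minimal sketch of what collapsing reads aligned to the same position could look like for BED input. This is not the authors' script: the function name `condense_bed`, the placeholder name `read`, and the choice to store the read count in the BED score column are all my own assumptions for illustration.

```python
from collections import Counter


def condense_bed(lines):
    """Collapse BED records that share chrom, start, end and strand.

    Hypothetical helper, not the plantDARIO script. Each group of
    identical positions becomes a single record whose score column
    holds the number of reads that were collapsed into it.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Strand is the sixth BED column; fall back to "." if absent.
        strand = fields[5] if len(fields) > 5 else "."
        counts[(chrom, start, end, strand)] += 1

    condensed = []
    for (chrom, start, end, strand), n in sorted(counts.items()):
        condensed.append(f"{chrom}\t{start}\t{end}\tread\t{n}\t{strand}")
    return condensed
```

Three reads at the same locus on the same strand would come out as one line with a count of 3, so the file a user uploads scales with the number of distinct positions rather than with sequencing depth.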
I found this manuscript interesting for a couple of reasons. First, I had often wondered about how to make my own small RNA-seq program, ShortStack, available as a web service. I have not done this, primarily because the input for ShortStack is raw small RNA-seq data, or BAM files of aligned small RNA-seq data, along with the reference genome. These would be tedious for users to upload because of the file sizes. The large files could also place a big demand on the server, as could the large number of CPU cycles needed for the analysis. It looks like the authors of plantDARIO have gotten around this issue by outsourcing the alignments to the user and enforcing a read-condensation scheme.
The second thing I found interesting about this work was a brief mention of the alignment methods. In particular, the authors state “Unlike many other mapping tools, segemehl has full support for multiple-mapping reads which is very important for small RNA-seq”. I am quite interested in improving the treatment of how multi-mapped small RNA-seq reads are placed and used (see butter). I have not heard of the program “segemehl” before. The relevant paper is Otto et al., 2014, which I will need to put on my reading list.
The third thing I was interested in was the method for annotating small RNA clusters that didn’t overlap a known gene. The authors are using a tool called “blockbuster”, which was described in another earlier paper from this group, Langenberger et al. 2009. Will have to check this out too.
My final thoughts on this paper have to do with comparing a web-based service like plantDARIO to a stand-alone program like ShortStack. The authors of this paper make a plug for a web-based service and ding stand-alone programs by stating “The other sncRNA prediction tools need to be downloaded, installed and run locally, requiring more than basic computer skills.” Well yes, this is true. But a stand-alone program has significant advantages over their approach to web-based analysis. With a stand-alone program, you can use any genome assembly or assembly version you want. With their approach, you are limited to whatever they have pre-configured. Moving to a new species, or even updating to a newer genome assembly version, is not possible except by asking the authors to update their site. There is a lot more flexibility to be gained with a stand-alone program.
In any event, an interesting read. I’m looking forward to trying out the tool, and to reading some more of the background methods, especially alignments and de-novo cluster finding.
PS. One error: My ShortStack paper is erroneously cited as “Allen et al. (2013)” instead of “Axtell (2013)”. The author lists of my paper and a 2004 paper from the Carrington Lab, with Ed Allen as lead author, appear to have been swapped in the ref. cited section.