Category Archives: Uncategorized

Mobile sRNAs can induce methylation on a genome-wide scale

Posted on April 8, 2016 by Nate Johnson | Leave a comment

The paper I chose to discuss was “Mobile small RNAs regulate genome-wide DNA methylation”, from the Ecker and Baulcombe groups (PMID: 26787884, doi: 10.1073/pnas.1515072113). The goal of this study was to identify mobile sRNA loci and methylation loci, based on their interaction. To do this, the authors used shoot/root grafts of two wild types (Col-0 and C24) and a mutant lacking siRNA formation (dcl234), to test whether a mobile signal is required. Mobile sRNAs were identified as those produced in a WT shoot and then sequenced in dcl234 roots.

They identified 3 relevant classes of mobile sRNAs and targets: direct interaction, indirect interaction and de novo methylation. Direct interaction was called where mobile sRNAs were enriched at a methylation site, whereas indirect loci had methylation that depended on mobile sRNAs but had no clear sRNA culprit. De novo loci are those where methylation is induced by mobile sRNAs in a Col-0 root by a C24 shoot, but is not present with a Col-0 shoot. This study found widespread direct and indirect loci, as well as a significant number of de novo methylated sites.

The huge quantity of loci identified as indirectly methylated was an interesting point of this paper. The authors offer several explanations for why this might occur, focusing on a possible secondary signal bridging the mobile signal and methylation, or the possibility that an aggressive threshold for mobile-sRNA significance produced false negatives. Another point brought up is that requiring perfect matches in sRNA alignment may have missed valid secondary alignments. This certainly seems possible to me, as allowing a single mismatch opens up a much more inclusive set of sRNA targets. These could be biologically relevant despite the mismatch.
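
To make the perfect-match point concrete, here is a minimal Python sketch of how allowing one mismatch widens the set of candidate sites. This is not the authors' alignment pipeline: the sequences are made up, the function names are hypothetical, and it compares the sRNA string directly against the target rather than handling reverse-complement pairing.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_sites(srna: str, target: str, max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions where srna matches target
    with at most max_mismatches substitutions (no indels)."""
    k = len(srna)
    return [
        i
        for i in range(len(target) - k + 1)
        if hamming(srna, target[i:i + k]) <= max_mismatches
    ]


# Hypothetical toy sequences, for illustration only.
srna = "ACGTACGT"
target = "TTACGTACGTTTACGAACGTTT"

perfect = find_sites(srna, target, max_mismatches=0)
relaxed = find_sites(srna, target, max_mismatches=1)
print("perfect matches:", perfect)
print("one mismatch allowed:", relaxed)
```

In this toy case the relaxed search recovers a second site that the perfect-match search misses, which is the kind of locus that could end up classified as "indirect".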

Genomically, these directly and indirectly targeted loci are localized in transposable elements and depleted in coding regions. This is true for both CHH and CHG methylation. Using mutant libraries from another study (Stroud et al. 2013 – Cell), the authors made a clear connection that mobile-sRNA methylation targets are dependent on DRM1/2. This makes a strong case for methylation through an AGO-dependent pathway.

This was an interesting paper, which gave strong evidence to support several of its claims. As an observational study, it supports previous work which showed this kind of methylation, but only on a locus-specific scale. It is clear that this is taking place on a much broader level.

Posted in Uncategorized

NO BAR CHARTS! (Weissgerber et al. 2015)

Posted on November 17, 2015 by mja18 | 1 comment

I came across this article in PLoS Biology (Weissgerber et al. 2015 doi: 10.1371/journal.pbio.1002128) and it very clearly laid out the issues of using barcharts to display continuous data, especially with low sample sizes. Bottom line: barcharts are misleading summaries of the data, and it’s better to show the data themselves with univariate scatter plots or boxplots. Figure 1 from their paper dramatically shows how barcharts can mask very different distributions of the data:

Figure 1: Weissgerber et al. (2015).

Figure 3 from this paper is also very striking. Here we can easily see the fallacy of using standard error of the mean (SEM), or standard deviation (SD) for non-normally distributed, low sample size data:

Figure 3: Weissgerber et al. (2015)
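
As a small numerical illustration of the same point, here is a sketch using only the Python standard library. The two samples are invented for this example (they are not from Weissgerber et al.), and `summarize` is just a helper name I made up. Both samples have the same mean, so a bar chart would draw two bars of identical height, even though the underlying points look nothing alike.

```python
import math
import statistics


def summarize(values):
    """Return mean, median, SD and SEM for a small sample."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": sd,
        "sem": sd / math.sqrt(n),
    }


# Hypothetical measurements, n = 5 each.
symmetric = [8, 9, 10, 11, 12]           # evenly spread around 10
one_outlier = [8.5, 8.5, 8.5, 8.5, 16]   # four low points, one high point

for name, sample in [("symmetric", symmetric), ("one_outlier", one_outlier)]:
    print(name, summarize(sample))
```

The means agree, but the medians and SDs do not, and the SEM shrinks with the square root of n regardless of the shape of the data. Plotting the individual points shows all of this at a glance.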

Some people in my lab probably remember me complaining about barcharts for qRT-PCR experiments where n=3 or n=6 on one of our recent papers. I prevailed, and we had scatter plots instead of barcharts. Maybe now my motivations are clearer? 🙂

Figure 7D from our recent paper (Coruh et al. 2015 Plant Cell doi: 10.1105/tpc.15.00228) .. note avoidance of barchart!

Posted in Uncategorized

Annotation of miRNAs and phasiRNAs from Norway Spruce

Posted on October 2, 2015 by Saima Shahid | Leave a comment

This week I selected a recent article by Xia et al. (2015) titled “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants” (doi: 10.1093/molbev/msv164, PMID: 26318183) for our journal club discussion. This was a pretty neat paper, with some interesting insights about the phasiRNA and miR482/2118 superfamily network in spruce. I also looked up the Norway spruce genome paper (Nystedt et al. 2013) to check some details about the small RNA seq data, and it seemed like the genome also has some distinct characteristics. Firstly, the coniferous Norway spruce (Picea abies) has a very large genome of ~20 gigabase pairs (the biggest plant genome assayed up to now is that of the monocot Paris japonica, at 150 Gbp). Despite its huge size, the number of coding genes in spruce is comparable to that of Arabidopsis, which has a much smaller genome of ~135 Mbp (Nystedt et al. 2013). The bulk of the spruce genome is contributed by transposons, yet the overall frequency of repeat-associated 24 mer small RNAs is much lower in spruce than in angiosperms. There were previously some conflicting studies about the presence of 24 mers in spruce, but the small RNA seq data from as many as 22 samples in the Nystedt et al. (2013) study showed that 24 mer small RNAs are indeed expressed in spruce, but mainly in reproductive tissues. Based on these small RNA seq datasets, Nystedt et al. (2013) also reported 2,719 de novo miRNA gene annotations in spruce using UEA sRNA workbench tools, which is quite large compared to the typical range (~100 – 750) of miRNA annotations generated in diverse plant species listed in miRBase 21.

Xia et al. (2015) reanalyzed the small RNA sequencing data (~352 million reads) published in the spruce genome paper for de novo discovery of miRNA and phased siRNA loci. Their more stringent pipeline for de novo annotation generated a much smaller set of 585 miRNA loci than previously reported (Nystedt et al. 2013). Nearly half of these miRNA loci were new, without any known homologs in miRBase. Subsequent comparison of the mature sequences with ginkgo small RNAs indicated that one-third of the new loci may be conifer-specific. The miRNA discovery pipeline used PatMaN to map small RNAs to the spruce genome; the mapped reads were then filtered based on a set of conditions and passed on to mireap for de novo annotation (Xia et al. 2015). One of these conditions was selecting only the 20-22 mer fraction of the mapped small RNAs for miRNA gene discovery, but it was not clearly explained why the 23-24 mer fraction was discarded. Our analysis of Amborella small RNAs indicated that 24 mer miRNAs are expressed in the basal lineage of angiosperms (Amborella genome project, 2013), so I was interested to see if any such long miRNAs are also present in gymnosperms.
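
To show what the size-selection step does in practice, here is a minimal sketch of a 20-22 nt length filter that also tallies what gets thrown away. It is not the Xia et al. pipeline: the function name, default bounds as parameters, and the toy reads are my own assumptions for illustration.

```python
from collections import Counter


def size_filter(reads, min_len=20, max_len=22):
    """Split reads into those kept by a length window and a tally
    (by length) of those discarded."""
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    discarded = Counter(
        len(r) for r in reads if not (min_len <= len(r) <= max_len)
    )
    return kept, discarded


# Hypothetical reads: only the lengths matter here.
reads = ["A" * 20, "C" * 21, "G" * 22, "T" * 23, "A" * 24, "C" * 24]

kept, discarded = size_filter(reads)
print("kept:", len(kept))
print("discarded by length:", dict(discarded))
```

Any 23-24 mer miRNA candidates would fall into the discarded tally before the annotation tool ever sees them, which is why I was curious about this choice.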

Among the unique miRNA sequences reported by Xia et al. (2015), 21 mers were the most predominant, as expected (~46%), followed by 22 mers (20%, 119 miRNAs), and the rest were 20 mers. The most striking result was that spruce showed an expanded miR482/2118 superfamily (26 members) with relatively high mature miRNA sequence diversity (only half of the positions in the consensus sequence were deeply conserved). These members not only triggered phasiRNAs from hundreds of NB-LRR genes (similar to miR482/2118 function in dicots), but also targeted noncoding transcripts in reproductive tissues (similar to grasses). This led the authors to propose a dual function of the miR482/2118 family in gymnosperms. Some of the spruce miR482 precursors also showed strong evidence of evolutionary emergence from NB-LRR genes. The miR390-TAS3 phasiRNA network in spruce also showed high diversity in miR390 target sites (1-3 sites per transcript) as well as in tasiARF regions. The spruce TAS3 gene family was also the largest one (16 genes) identified to date. These results indicate an extensive network of miRNAs and phasiRNAs in spruce, and it will be really interesting to figure out their actual function. The spruce 24 mer siRNAs were not analyzed in detail by Xia et al. (2015), and I am curious about their role in heterochromatin silencing in spruce. The authors suggested that the phased siRNAs may have a role in regulating the transposons in the spruce genome, but there is no clear evidence of these targeting the transposons. Another interesting point is how the handling of multi-mappers by different tools affects the number of miRNAs discovered. It would be great to compare these annotations with de novo annotations from ShortStack, which has a relatively better approach for assigning multi-mappers than PatMaN, which I believe keeps all possible locations for multi-mapped reads.

References

Xia, Rui, et al. “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants.” Molecular biology and evolution (2015): msv164.

Nystedt, Björn, et al. “The Norway spruce genome sequence and conifer genome evolution.” Nature 497.7451 (2013): 579-584.

Posted in Uncategorized

Plant miRNA evolution update

Posted on September 25, 2015 by mja18 | Leave a comment

So here’s another figure I’ve prepared for the Plant Cell review article I am writing. This is an update on the patterns of miRNA conservation across land plants according solely to the information present in miRBase release 21.

For this analysis, I first placed each land plant miRNA family in miRBase 21 into one of eight broad plant groups (Eudicots-Rosids, Eudicots-Asterids, Basal Eudicots, Monocots, Basal Angiosperms, Gymnosperms, Lycophytes, and Bryophytes). I then defined a conserved miRNA family as one which had at least one high-confidence annotation, and which was annotated in two or more of these broad groups. By this definition, there are just 36 conserved families out of the more than 2,000 that are currently annotated.
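
The definition above is simple enough to write down as code. The sketch below is not the script I used for the figure; it assumes a hypothetical input of `(family, group, high_confidence)` records, with made-up family names, and applies the two rules: at least one high-confidence annotation, and annotation in two or more broad groups.

```python
from collections import defaultdict


def conserved_families(annotations, min_groups=2):
    """Return families with >= 1 high-confidence annotation that are
    annotated in at least min_groups broad plant groups.

    annotations: iterable of (family, group, high_confidence) tuples.
    """
    groups_by_family = defaultdict(set)
    has_high_confidence = set()
    for family, group, high_confidence in annotations:
        groups_by_family[family].add(group)
        if high_confidence:
            has_high_confidence.add(family)
    return sorted(
        family
        for family, groups in groups_by_family.items()
        if len(groups) >= min_groups and family in has_high_confidence
    )


# Hypothetical records, for illustration only (not real miRBase data).
toy = [
    ("famA", "Monocots", True),
    ("famA", "Bryophytes", False),
    ("famB", "Monocots", True),          # high confidence, but one group only
    ("famC", "Gymnosperms", False),      # two groups, but no high confidence
    ("famC", "Lycophytes", False),
]

print(conserved_families(toy))
```

Only `famA` passes both rules in the toy data. Note that a family annotated twice in the same group still counts as one group, which is what makes unequal sampling density such a problem.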

This figure also highlights a major issue in the large-scale analysis of land plant miRNA conservation .. highly unequal sampling density within the different groups. Rosids and monocots have received the most attention, with large numbers of species represented, and high numbers of overall annotations (barcharts at the top). In contrast, basal eudicots, basal angiosperms, lycophytes, and bryophytes are much less well-sampled, with each group represented at present by just one species in miRBase 21. Inferring secondary losses in some of these lineages (basal eudicots, basal angiosperms, lycophytes) is NOT believable.

This chart also illustrates how important the high-quality miRNA annotation set for bryophytes is (the sole species represented is Physcomitrella patens). Based on the presence of high-confidence annotations in both Physcomitrella and one or more angiosperm groups, we can confidently say that nine families (miR156, miR160, miR166, miR171, miR319, miR390, miR477, miR529, and miR535) were most likely present in the last common ancestor of all land plants. Another three families (miR167, miR395, and miR408) might also belong in the ultra-conserved set, but they are less certain because their Physcomitrella annotations are not high-confidence. Another two families (miR396 and miR482) clearly predate the divergence of all seed plants. Other patterns are less certain because of the frequent presence of annotations that are not (yet) known to be high confidence. I think this again highlights the need for a systematic review of miRNA annotations, based on re-analysis of all available small RNA-seq data with a single, high-confidence MIRNA identification methodology. We are working toward this goal in my lab.

Posted in Uncategorized

Printing on a Fabric Poster

Posted on September 2, 2015 by Nate Johnson | Leave a comment

Hi Everyone!

Mike suggested that I post some information on how and where to print fabric posters, like the one I made for ASPB this year.

There is a great article on this whole topic, written by Jessica Polka through the American Society for Cell Biology. It is extremely informative, and can easily walk you through the steps to printing your own.

Some thoughts of mine:

The cost for printing is comparable to, if not better than anywhere in town. About $25 for their slow service which takes about 2 weeks and about $45 for a rush order (around 3 days to turn-around).
Quality is excellent! As good as any poster you’ve seen. If you don’t believe me, come look at mine.
The only drawback is some size restrictions… The quoted price can print a poster that is 36″x 58″, which is fine, but to make it larger would cost more.

Posted in Uncategorized

CMA33/XCT Regulates Small RNA Production through Modulating the Transcription of Dicer-Like Genes in Arabidopsis.

Posted on March 31, 2015 by Ceyda Coruh | Leave a comment

Fang X¹, Shi Y¹, Lu X², Chen Z³, Qi Y⁴.

Mol Plant. 2015 Mar 11. pii: S1674-2052(15)00170-7.

doi: 10.1016/j.molp.2015.03.002

PMID:25770820

Using a forward genetic screen, this paper identifies a new component, XCT/CMA33, which seems to affect miRNA, tasiRNA and heterochromatic siRNA levels to some extent. This protein is highly conserved across eukaryotes and was previously shown to be involved in circadian rhythms and ethylene responses in Arabidopsis. Overall, the data suggest that XCT/CMA33 is required for the accumulation of miRNAs, tasiRNAs and heterochromatic siRNAs, through modulating the transcription of the DCL1, DCL4, and DCL3 genes, respectively (based on Pol II occupancy assays). Although miRNAs have been found to be associated with circadian rhythms in animals, there are no data yet to suggest a direct link (or indirect link via XCT) between miRNAs and circadian rhythms in plants. It’s important to point out that XCT seems to be specific to only these three DCL genes and not other components of the small RNA biogenesis machinery. Despite the low degree of change between wild-type and the xct mutant, I found this paper interesting, especially because all of the phenotypes could be rescued by the XCT transgene. Below are my detailed notes about the experiments performed.

They performed a forward genetic screen and analyzed one of the mutants, cma33 (compromised miRNA activity 33), which displayed decreased trichome clustering and plant stature, with curled leaves and shorter siliques (Fig. 1A, B). They also observed increased accumulation of some miRNA target transcripts, indicating an impairment in miRNA activity (Fig. 1C). Then, they found that cma33 carries an early stop codon in a gene encoding a nuclear-localized protein, XAP5 CIRCADIAN TIMEKEEPER (XCT), which is highly conserved across species. XCT was previously shown to be involved in the circadian clock and ethylene signaling. Transgenic expression of amiR-trichome causes increased trichome clustering in amiR-triOX. A cross between amiR-triOX and xct-2 (a T-DNA insertion mutant) showed reduced clustering of trichomes, indicating a role for XCT in miRNA activity.

Small RNA Northern blots showed reduced accumulation of amiR-trichome in cma33 compared to amiR-triOX (Fig. 2A). Endogenous miRNA levels were also decreased in xct-2 relative to Col-0 (Fig. 2B). They also observed a decrease in accumulation of the DCL4-dependent miRNA, miR822 (Fig. 2C). The small RNA phenotype of xct-2 was fully complemented by introducing the wild-type XCT gene. Increased accumulation of pri-miRNAs in the xct-2 mutant suggested that XCT/CMA33 is involved in regulating pri-miRNA processing (Fig. 2D).

The xct-2/ago1-25 double mutant displayed a more severe developmental phenotype than either single mutant. It also showed reduced accumulation of miRNAs and increased target transcript levels (Fig. 3). They further looked at the accumulation levels of tasiRNAs and several heterochromatic siRNA loci. Reduced tasiRNA levels, correlated with increased target RNA transcripts, seem to be an indirect effect, since tasiRNAs are dependent on miRNA cleavage (Fig. 4A, B). Reduced accumulation of heterochromatic siRNAs was rescued by the XCT transgene. Similarly, decreased methylation at the SIMPLEHAT2 and MEA-ISR loci was also rescued by the XCT transgene (Fig. 4C, D). However, I found the degree of change in both tasiRNA targets and methylation levels at the heterochromatin to be low.

In order to identify XCT/CMA33-interacting components in the small RNA biogenesis pathway, the authors tried Y2H, BiFC, and CoIP assays, but they all failed. Then, they decided to check expression levels of genes involved in the biogenesis of different small RNAs. Accumulation of DCL (both transcript and protein levels) was reduced in the xct-2 mutant, whereas miR168-targeted AGO1 transcript levels were increased, possibly because of decreased miR168 levels (Fig. 5A-C). From the tasiRNA and heterochromatic siRNA biogenesis pathways, only DCL4 and DCL3 levels were reduced in the xct-2 mutant, respectively (Fig. 5D, E). Reduction of DCL3 protein was rescued by the XCT transgene (Fig. 5F, G). Overall, the data suggest that XCT/CMA33 regulates miRNAs, tasiRNAs and heterochromatic siRNAs through regulating the expression of DCL1, DCL4, and DCL3, respectively.

They ChIP’ed the promoter and coding regions of DCL genes using an antibody against the largest subunit of Pol II. Pol II occupancy seemed to be decreased at all regions in DCL1, DCL3, and DCL4, but not in DCL2 (Fig. 6). Thus, the data suggest that XCT/CMA33 affects the accumulation of small RNAs via promoting Pol II occupancy at the DCL1, DCL3, and DCL4 genes.

Posted in Uncategorized

miRNA annotation in Capsella rubella (Camelineae) indicates rapid divergence

Posted on February 17, 2015 by Nate Johnson | Leave a comment

Rapid divergence and high diversity of miRNAs and miRNA targets in the Camelineae

Lisa M. Smith, Hernan A. Burbano, Xi Wang, Joffrey Fitz, George Wang, Yonca Ural-Blimke and Detlef Weigel

Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield S10 2TN, UK

Department of Molecular Biology, Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076 Tubingen

doi: 10.1111/tpj.12754

PMID: 25557441

This paper is from the most recent issue of The Plant Journal, and I thought it made some rather interesting points. The paper focused on small RNA seq of several tissues from Capsella rubella, a member of the Camelineae tribe and frequent outgroup to the Arabidopsis genus. With sRNA annotations from A. thaliana and A. lyrata, the authors look at the evolution of miRNAs within closely related species.

First of all, I was interested in the bioinformatics suite that this group chose to perform their annotation. After aligning unique reads with bowtie, they used several miRNA annotation tools (miR-deep 1.3, DSAP, UEA sRNA toolkit), resulting in some wildly different loci and annotations. In figure 1c, it appears that only half of the known miRNA loci annotated by DSAP or miR-deep are found by both, though the overlap is higher when looking at miRNA families. Is this because of failings within these tools (the authors mention a high false negative rate in miR-deep)?

The article goes on to look at the variation in miRNAs in relation to their target genes between the 3 species. They found that miRNA-target pairs were highly divergent, with most pairings being unique to a single species. Of the non-divergent pairs, almost all are more ancient pairings that are present outside of the Brassicaceae, leading the authors to hypothesize that there are two differentially evolving subsets of miRNAs: “young, evolutionarily dynamic miRNAs, and older miRNAs with a conserved subset of mRNA targets”.

The authors go on to look at the levels of polymorphism in miRNAs and their targets throughout A. thaliana. This analysis ultimately showed a higher mutation rate in the miRNA sequences themselves, leading the authors to conclude that the target sites are undergoing stronger selection. I thought this was a bit confusing, as you might expect higher conservation in target sites anyway, since they can lie in the CDS of genes, a point mentioned by the authors but not elaborated upon.

– Nate

Posted in Uncategorized

AGO4 and AGO6 are more specific in mediating RdDM than we expect

Posted on January 29, 2015 by Feng Wang | Leave a comment

Paper: Specific but interdependent functions for Arabidopsis AGO4 and AGO6 in RNA-directed DNA methylation by Duan et al.

EMBO J. doi: 10.15252/embj.201489453 PMID:25527293

The function of AGO6 has previously been considered redundant with that of AGO4. This paper, however, shows that the redundancy of AGO4 and AGO6 in mediating RdDM is much smaller than we would expect. AGO4- and AGO6-dependent methylation is profiled by genome-wide bisulfite sequencing. Interestingly, DNA methylation at only a small subset of loci is redundantly regulated by AGO4 and AGO6. At more than half of the hypomethylated loci, DNA methylation is similarly reduced in either ago4-6 or ago6-2, and no significant further reduction is observed in the double mutant. This result indicates that AGO4 and AGO6 have related yet specific functions in RdDM.

The authors also want to show the distinct functions of AGO4 and AGO6 by studying their subcellular localization. The conclusion of the paper is that AGO4 and AGO6 show different co-localization patterns with DNA-dependent RNA polymerases. However, I am not quite convinced by these immunostaining figures. The AGO4 and AGO6 signals are scattered in the nucleus, and the co-localization signal with Pol IV or Pol V is not obvious. Even though the co-localization data are not convincing to me, I do agree with the authors that studying the localization of AGO4 and AGO6, especially the co-localization pattern with Pol IV or Pol V, is very important.

The other thing I am interested in is that the authors studied the accumulation of Pol V transcripts, as well as Pol V occupancy, in the ago4 and ago6 mutants. A very interesting result is that Pol V occupancy at most tested IGN loci clearly decreases in the ago6 mutant. The accumulation of most tested Pol V transcripts decreases in the ago6 mutant but increases in the ago4 mutant. These results indicate that AGO6 is required for Pol V recruitment. It is very intriguing that AGO4 and AGO6 show such distinct effects on Pol V occupancy. In my own study, I am trying to pull down AGO4-associated Pol V transcripts. It might be interesting to see if Pol V transcripts could also be pulled down by AGO6. We have to note that only a small number of Pol V transcripts are studied here. It remains unclear whether this small subset can represent the real pattern.

Last thing to mention, another paper from the Slotkin lab (McCue et al. 2015) shows that AGO6 can load 21-22nt siRNAs and establish RdDM, which is also distinct from AGO4. In conclusion, these two proteins may have more specific functions than we expect.

Posted in Uncategorized

Areas Around MicroRNA Target Sites Are Typically Unstructured So As to Not Hinder RISC

Posted on January 28, 2015 by Seth Polydore | Leave a comment

One of my projects is to observe what (if any) effects the sequences flanking miRNA target sites and the structure of the RNA transcript have on a miRNA’s efficacy. I found a rather old paper (published in about 2013) that found that areas flanking miRNA target sites are typically unstructured. This paper uses a computational approach to reach that result. It goes without saying that I think this paper is interesting because my experiment could very well either substantiate this study with experimental data or disagree with it.

In this article, Selection on Synonymous Sites for Increased Accessibility around miRNA Binding Sites in Plants, the researchers retrieved the genomes and miRNAs for Arabidopsis thaliana, Zea mays, Oryza sativa, and Populus trichocarpa. They also downloaded expression data for miRNAs and their targets in A. thaliana from the Massively Parallel Signature Sequencing project. Using RNAfold, the researchers determined delta G open (the difference between the free energy of all secondary structures and the free energy of all structures in which the target site is unpaired), delta G local (the free energy of the local secondary structure of the miRNA target sites), and the GC content, since higher GC content typically correlates with more structure. They also calculated Z-scores for these values.
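
For readers who want to see the arithmetic, here is a minimal sketch of two of the quantities mentioned above: GC content of a window, and a Z-score of an observed value against a background set. This is an illustration under my own assumptions, not the paper's code. The function names and example numbers are made up, and the free-energy terms are left out because they need a folding program such as RNAfold.

```python
import statistics


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def z_score(observed: float, background: list[float]) -> float:
    """Standardize an observed value against background values,
    e.g. the same statistic computed on randomized sites."""
    mu = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    return (observed - mu) / sd


# Hypothetical example: a window around a target site versus
# the same statistic at three randomized control sites.
site_gc = gc_content("AUUUAGCAUUAAUUGA")
control_gc = [gc_content(s) for s in ("GGCAUCGA", "GCGCAUAU", "CCGGAAUG")]
print("site GC:", site_gc)
print("Z-score vs controls:", z_score(site_gc, control_gc))
```

A negative Z-score here means the real site has a lower value than the randomized controls. The same `z_score` helper would apply to delta G open or delta G local once those energies have been computed.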

Compared to the randomized sites, it was found that the areas near miRNA target sites are depleted in GC nucleotides and are typically unstructured. This trend holds regardless of the expression level of the mRNA target and of the miRNA which targets it. However, what is really interesting is that the Z-score of delta G open (which is essentially a measure of how much energy is needed to “open” the miRNA target site) shows an obvious trend of decreasing as one moves closer to the miRNA target site and increasing as one moves away from it (in either the 5′ or 3′ direction) in all the species analyzed. This trend was apparent but much “weaker” in Arabidopsis. Also of note, targets of miRNAs that repress by cleavage and targets of those that repress by translational repression show the exact same trend. I wonder if there is anything worth experimenting on this issue or if it’s merely an artifact of the data.

Again, I found this study interesting, but some problems jump out at me. Firstly, programs like RNAfold are not totally accurate in determining the structure of transcripts in vivo. I wonder how the data would change if they used Sally Assmann’s DMS-seq data for Arabidopsis. Another issue is that the study only takes the 17 nucleotides upstream and 13 nucleotides downstream of the target into account when doing these analyses. This is because Kertesz et al. 2008 found that this region played an important role in animal miRNA repression efficiency. I wonder how this squares with the collaboration we did with Christophe, in which he hypothesized that because plant miRNAs bind extensively to their targets, flanking sequence context doesn’t change miRNA efficacy.

In conclusion, I still think this paper is worth reading (or at least skimming through). One way or another, the experiment I mentioned will be important to this study. There are ways to transiently express genes in other species (such as Arabidopsis & rice), so it may be worth testing out the transient expression in these systems and see if the data is different from Nicotiana transient expression.

Note the papers can be found here:

PMID:22490819

doi: 10.1093/molbev/mss109

Posted in Uncategorized

plantDARIO – a web-based tool for small RNA-seq analysis in select plant genomes

Posted on January 19, 2015 by mja18 | Leave a comment

Patra et al. (2014). plantDARIO: web based quantitative and qualitative analysis of small RNA-seq data in plants. Frontiers in Plant science.

doi: 10.3389/fpls.2014.00708
PMID: 25566282

This manuscript describes a web-based service for the annotation of small RNA-producing genes in Arabidopsis thaliana, Beta vulgaris, and Solanum lycopersicum (the authors also state that they plan to extend the number of plant species to “…include most of the available plant genomes.”). Users provide aligned small RNA data in BAM or bed format, and the authors provide a script for condensing reads aligned to the same position. Thus the authors reduce the burden of large data transfers. The web server parses the aligned small RNA data with respect to several pre-loaded annotation tracks, including known miRNAs (from miRBase), known tasi-RNAs, tRNAs, and other ncRNAs from Rfam. Global stats are spit out for the library. Clusters of reads that don’t overlap any annotated regions are flagged, and some miRNA finding and snoRNA finding programs are run. Results can be integrated into other publicly available genome browsers for the species of interest, located on other servers.
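
To give a feel for why read condensation shrinks the upload, here is a minimal sketch of the general idea. It is not the plantDARIO script, whose details I have not looked at. It assumes alignments have already been parsed into hypothetical `(chrom, start, end, strand)` tuples and simply collapses identical placements into one record with a count.

```python
from collections import Counter


def condense(alignments):
    """Collapse alignments with identical coordinates and strand
    into single records of (chrom, start, end, strand, count)."""
    counts = Counter(alignments)
    return sorted(key + (n,) for key, n in counts.items())


# Hypothetical alignments, for illustration only.
alignments = [
    ("chr1", 100, 121, "+"),
    ("chr1", 100, 121, "+"),
    ("chr1", 100, 121, "-"),
    ("chr2", 5, 26, "+"),
]

for record in condense(alignments):
    print(*record, sep="\t")
```

Small RNA-seq libraries tend to contain many reads stacked at exactly the same coordinates, so one line per unique placement plus a count can be far smaller than one line per read.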

I found this manuscript interesting for a couple of reasons. First, I had often wondered about how to make my own small RNA-seq program, ShortStack, available as a web service. I have not done this, primarily because the input for ShortStack is raw small RNA-seq data, or BAM files of aligned small RNA-seq data, along with the reference genome. This would be tedious for users to upload because of the file sizes. The large file sizes could also place a big demand on the server, as could the large number of CPU cycles that might be needed. It looks like the authors of plantDARIO have gotten around this issue by outsourcing the alignments to the user, and enforcing a read-condensation scheme.

The second thing I found interesting about this work was a brief mention of the alignment methods. In particular, the authors state “Unlike many other mapping tools, segemehl has full support for multiple-mapping reads which is very important for small RNA-seq”. I am quite interested in improving the treatment of how multi-mapped small RNA-seq reads are placed and used (see butter). I have not heard of the program “segemehl” before. The relevant paper is Otto et al., 2014, which I will need to put on my reading list.

The third thing I was interested in was the method for annotating small RNA clusters that didn’t overlap a known gene. The authors are using a tool called “blockbuster”, which was described in another earlier paper from this group, Langenberger et al. 2009. Will have to check this out too.

My final thoughts on this paper have to do with comparing a web-based service like plantDARIO to a stand-alone program like ShortStack. The authors of this paper make a plug for a web-based service and ding stand-alone programs by stating “The other sncRNA prediction tools need to be downloaded, installed and run locally, requiring more than basic computer skills.” Well yes, this is true. But there are significant advantages of a stand-alone vs. their approach to web-based analysis. With a standalone, you can use any genome assembly or assembly version you want. But with their approach, you are limited to whatever they have pre-configured. Moving to new species, or even updating with a newer genome assembly version, is not possible except by requesting the authors to update their site. There is a lot more flexibility to be gained with a standalone.

In any event, an interesting read. I’m looking forward to trying out the tool, and to reading some more of the background methods, especially alignments and de-novo cluster finding.

PS. One error: My ShortStack paper is erroneously cited as “Allen et al. (2013)” instead of “Axtell (2013)”. The author lists of my paper and a 2004 paper from the Carrington Lab, with Ed Allen as lead author, appear to have been swapped in the ref. cited section.

–Mike Axtell

Posted in Uncategorized

Tagged bioinformatics, ShortStack