Monthly Archives: October 2015

A One Precursor One siRNA Model for Pol IV-Dependent siRNA Biogenesis

A One Precursor One siRNA Model for Pol IV-Dependent siRNA Biogenesis (Zhai J, Bischof S, Wang H, Feng S, Lee TF, Teng C, Chen X, Park SY, Liu L, Gallego-Bartolome J, Liu W, Henderson IR, Meyers BC, Ausin I, Jacobsen SE, PMID: 26451488)

In this work the authors demonstrate that the Arabidopsis Pol IV-dependent siRNA precursors, named P4RNAs, are not as long as previously assumed: P4RNAs are only 30-40 nt long. Characterizing P4RNA length and sequence composition gives insights into the mechanisms of Pol IV transcription initiation and termination, and of DCL processing of P4RNAs into siRNAs.

P4RNAs are the precursors of Pol IV siRNAs

P4RNAs are 30-40 nt long, as shown by the size distribution of the PATH libraries, and depend on both Pol IV and RDR2, suggesting that in vivo the two enzymes work in tight association.

Multiple experiments confirm that these longer RNAs are siRNA precursors rather than misprocessed siRNAs. For example, in the dcl2/3/4 mutant, siRNAs are largely lost while P4RNAs increase in abundance, yet AGO4 still selectively binds the remaining 22-24-nt siRNAs and not the longer RNAs. At Pol IV siRNA loci, siRNA and P4RNA abundances are positively correlated. Interestingly, when the analysis is restricted to Pol IV siRNA loci whose siRNA accumulation and DNA methylation show a strand bias, P4RNA accumulation shows the same strand bias. This result suggests that Pol IV-derived strands, rather than RDR2-derived strands, are strongly favored to become the final 24-nt siRNAs.
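To make the strand-bias comparison concrete, here is a minimal sketch of how one might check it. It is not the authors' analysis. I assume per-locus (plus, minus) read counts are already available, and that a locus counts as strand-biased when at least 80% of its siRNA reads fall on one strand. The cutoff, the helper names and the toy loci are all hypothetical.

```python
from typing import Dict, Tuple

# Per-locus read counts as (plus_strand, minus_strand); hypothetical input format.
Counts = Tuple[int, int]


def strand_bias(counts: Counts) -> float:
    """Fraction of reads on the plus strand (0.5 means no bias or no reads)."""
    plus, minus = counts
    total = plus + minus
    return plus / total if total else 0.5


def biased_loci_agree(
    sirna: Dict[str, Counts],
    p4rna: Dict[str, Counts],
    threshold: float = 0.8,  # assumed cutoff for calling a locus strand-biased
) -> Dict[str, bool]:
    """For strand-biased siRNA loci, report whether P4RNAs at the same locus
    favor the same strand."""
    agree: Dict[str, bool] = {}
    for locus, counts in sirna.items():
        s = strand_bias(counts)
        if locus not in p4rna or 1 - threshold < s < threshold:
            continue  # skip loci without P4RNA data or without siRNA strand bias
        p = strand_bias(p4rna[locus])
        agree[locus] = (s > 0.5) == (p > 0.5)
    return agree


# Toy counts, made up for illustration only.
sirna = {"locusA": (90, 10), "locusB": (50, 50), "locusC": (5, 95)}
p4rna = {"locusA": (70, 30), "locusB": (10, 90), "locusC": (60, 40)}
print(biased_loci_agree(sirna, p4rna))  # {'locusA': True, 'locusC': False}
```

In this toy example locusB is skipped because its siRNAs are not strand-biased. The fraction of `True` values over many real loci would be the quantity of interest.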

Because P4RNAs are so short, each P4RNA precursor gives rise on average to only one 24-nt siRNA.
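The reasoning is simple length arithmetic. A 30-40 nt precursor has room for only one 24-nt window, and a second non-overlapping 24-mer would need at least 48 nt. The sketch below is only a back-of-the-envelope check, not a model of DCL3 processing: it ignores duplex overhangs and where cleavage actually occurs. `max_sirnas` is a hypothetical helper.

```python
SIRNA_LEN = 24  # length of the mature Pol IV-dependent siRNAs


def max_sirnas(precursor_len: int, sirna_len: int = SIRNA_LEN) -> int:
    """Upper bound on non-overlapping siRNAs from one strand of a precursor."""
    return precursor_len // sirna_len


for length in (30, 35, 40, 48):
    print(length, max_sirnas(length))
# prints: 30 1 / 35 1 / 40 1 / 48 2 (one pair per line)
```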

P4RNA 5’ end

Pol IV is shown to have retained the transcription start site (TSS) preference of its evolutionary ancestor Pol II (the Y/R rule), but the two polymerases are here shown to occupy different genomic territories.
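The Y/R rule means a pyrimidine (C or T) at position -1 and a purine (A or G) at the +1 start site. Here is a minimal check, assuming a plus-strand genomic sequence string and a 0-based index of the +1 nucleotide. Both inputs and the helper name are hypothetical. For minus-strand transcripts you would first reverse-complement the sequence.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def follows_yr_rule(genome: str, tss: int) -> bool:
    """True if the -1 base is a pyrimidine and the +1 base (at index tss) is a purine.

    genome: plus-strand DNA sequence; tss: 0-based index of the +1 nucleotide.
    """
    if tss < 1 or tss >= len(genome):
        return False  # no -1 position, or TSS outside the sequence
    return genome[tss - 1].upper() in PYRIMIDINES and genome[tss].upper() in PURINES


print(follows_yr_rule("GGCATT", 3))  # True: C at -1, A at +1
print(follows_yr_rule("GGAATT", 3))  # False: A at -1 is a purine
```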

At their 5’ ends, P4RNAs are enriched in A, as is known for siRNAs, and the majority appear to carry a 5’ monophosphate. I think this last result was somewhat expected, given the cloning technique used to construct the PATH libraries.

P4RNA 3’ end

P4RNAs that perfectly match the genome are enriched in ACU at their last three positions. However, more than 50% of all P4RNAs have mismatches at their 3’ ends, and these non-templated P4RNAs show a different 3’ end nucleotide pattern. The 3’ end mismatches are enriched in CG dinucleotides, with C as the last matched base and G as the first mismatched base, i.e., at sites where a C is found on the template DNA. The level of 3’ end mismatches is strongly reduced in ddm1/dcl3 compared to dcl3, indicating that DNA methylation influences nucleotide misincorporation by Pol IV. In this model, DNA cytosine methylation causes Pol IV transcription to terminate, giving rise to the short siRNA precursors. What still remains unclear to me is why termination happens specifically after 30-40 nt. It would be interesting to know how frequently a methylated C is found downstream of a Pol II-like TSS in the genome.
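Here is a simplified sketch of how the 3’ end junction could be classified. It assumes the read matches the genome perfectly up to some position, and that everything after the first mismatch is a non-templated 3’ tail. The real alignment handling in the paper may differ. Inputs are the read (written as DNA, T for U) and the genomic sequence on the same strand starting at the read's 5’ end. The helper name is hypothetical.

```python
from typing import Optional, Tuple


def three_prime_tail(read: str, genome: str) -> Tuple[int, Optional[str]]:
    """Split a read into a templated prefix and a 3' mismatch tail.

    read: small RNA sequence written as DNA (T instead of U).
    genome: same-strand genomic sequence starting at the read's 5' end,
            assumed to be at least as long as the read.
    Returns (matched prefix length, junction dinucleotide or None), where the
    junction is the last matched base followed by the first mismatched base.
    """
    matched = 0
    for r, g in zip(read, genome):
        if r != g:
            break
        matched += 1
    if matched == len(read) or matched == 0:
        return matched, None  # perfect match, or no templated 5' portion
    return matched, read[matched - 1] + read[matched]


print(three_prime_tail("AAGTCCGAT", "AAGTCCTTT"))  # (6, 'CG')
print(three_prime_tail("ACGT", "ACGTAA"))          # (4, None)
```

Tallying the junction dinucleotides over all mismatched reads would show whether CG is over-represented, as described above.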

In contrast to P4RNAs, only 1% of all siRNAs have mismatches at their 3’ ends. This result, together with the 5’ A enrichment and strand bias shared by siRNAs and P4RNAs, suggests that siRNAs are preferentially cleaved from the 5’ portion of their P4RNA precursors.


Another recent study, “Identification of Pol IV and RDR2-dependent precursors of 24 nt siRNAs guiding de novo DNA methylation in Arabidopsis” (Blevins T, Podicheti R, Mishra V, Marasco M, Wang J, Rusch D, Tang H, Pikaard CS, PMID: 26430765), confirms the short nature of the siRNA precursors, with one main difference: there, siRNA precursors show a strong preference for a 5’ purine, with similar frequencies of A and G. Compared to precursors with a 5’ A, those with a 5’ G have a 3’ end pattern more similar to that of siRNAs, suggesting that these 5’ G precursors might be processed from their 3’ portion to give rise to siRNAs. It would be interesting to understand why these 5’ G siRNA precursors were not observed in the work described above.


ShortStack now on iPlant

A big thanks to Nate Johnson for coordinating the loading of ShortStack 3.3 onto iPlant’s Discovery Environment. There is now a ‘public app’ for ShortStack on the Discovery Environment; just search for ‘ShortStack’ in the apps.

This allows users to run ShortStack on iPlant’s compute resources instead of their local machines. The app also provides a GUI, so the software can be run without the command line.

If you use this, please let me know! I would love to track how much use the community is getting out of the iPlant app.

CRISPR/Cas9 ‘toolbox’ of vectors for plants

Since our seminar speaker today is Dr. Daniel Voytas from Minnesota, I went looking for some of his recent papers. I came across this one, from him and his collaborators, reporting a series of vectors optimized for plant transgenesis with various types of CRISPR activity, including multiplexed targeting and transcriptional activation. At a glance the experiments look convincing, and the vector series is available from Addgene. Folks in my lab who are interested in multiplexed CRISPR/Cas9 targeting might want to look at this study.

A CRISPR/Cas9 Toolbox for Multiplexed Plant Genome Editing and Transcriptional Regulation. (2015; Plant Physiology. doi: 10.1104/pp.15.00636)


Annotation of miRNAs and phasiRNAs from Norway Spruce

This week I selected a recent article by Xia et al. (2015) titled “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants” (doi: 10.1093/molbev/msv164, PMID: 26318183) for our journal club discussion. This was a pretty neat paper, with some interesting insights into the phasiRNA and miR482/2118 superfamily network in spruce. I also looked up the Norway spruce genome paper (Nystedt et al. 2013) to check some details about the small RNA-seq data, and the genome itself seems to have some distinct characteristics. First, the coniferous Norway spruce (Picea abies) has a very large genome of ~20 gigabase pairs (the largest plant genome assayed to date is that of the monocot Paris japonica, at 150 Gbp). Despite its huge size, spruce has a number of coding genes comparable to that of Arabidopsis, which has a much smaller genome of ~135 Mbp (Nystedt et al. 2013). The bulk of the spruce genome is made up of transposons, yet the overall frequency of repeat-associated 24-mer small RNAs is much lower in spruce than in angiosperms. There were previously some conflicting studies about the presence of 24-mers in spruce, but the small RNA-seq data from as many as 22 samples in the Nystedt et al. (2013) study showed that 24-mer small RNAs are indeed expressed in spruce, mainly in reproductive tissues. Based on these small RNA-seq datasets, Nystedt et al. (2013) also reported 2,719 de novo miRNA gene annotations in spruce using the UEA sRNA workbench tools, which is large compared to the typical range (~100-750) of miRNA annotations for the diverse plant species listed in miRBase 21.

Xia et al. (2015) reanalyzed the small RNA sequencing data (~352 million reads) published in the spruce genome paper for de novo discovery of miRNA and phased siRNA loci. Their more stringent de novo annotation pipeline yielded a much smaller set of 585 miRNA loci than previously reported (Nystedt et al. 2013). Nearly half of these miRNA loci were novel, with no known homologs in miRBase. A subsequent comparison of the mature sequences with ginkgo small RNAs indicated that one-third of the new loci may be conifer-specific. The miRNA discovery pipeline used PatMaN to map small RNAs to the spruce genome. The mapped reads were then filtered on a set of conditions and passed to mireap for de novo annotation (Xia et al. 2015). One of these conditions was to keep only the 20-22-mer fraction of the mapped small RNAs for miRNA gene discovery, but it was not clearly explained why the 23-24-mer fraction was discarded. Our analysis of Amborella small RNAs indicated that 24-mer miRNAs are expressed in this basal angiosperm lineage (Amborella Genome Project, 2013), so I was interested to see whether any such long miRNAs are also present in gymnosperms.
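The size-selection step amounts to a simple length filter. The sketch below is not the Xia et al. pipeline. It assumes reads are plain sequence strings that have already been mapped, and uses a hypothetical helper name. It keeps the 20-22 nt fraction and tallies what gets dropped, which is one quick way to see how many 23-24 nt candidates such a filter would remove before miRNA annotation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_by_length(
    reads: Iterable[str], keep: range = range(20, 23)
) -> Tuple[List[str], Counter]:
    """Return reads whose length is in `keep` (20-22 nt by default)
    and a length histogram of the discarded reads."""
    kept: List[str] = []
    discarded: Counter = Counter()
    for seq in reads:
        if len(seq) in keep:
            kept.append(seq)
        else:
            discarded[len(seq)] += 1
    return kept, discarded


# Toy reads of lengths 20, 21, 22, 23, 24, 24 (made up for illustration).
reads = ["A" * n for n in (20, 21, 22, 23, 24, 24)]
kept, dropped = split_by_length(reads)
print(len(kept), dict(dropped))  # 3 {23: 1, 24: 2}
```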

Among the unique miRNA sequences reported by Xia et al. (2015), 21-mers were the most abundant, as expected (~46%), followed by 22-mers (20%, 119 miRNAs), and the rest were 20-mers. The most striking result was that spruce has an expanded miR482/2118 superfamily (26 members) with relatively high mature miRNA sequence diversity (only half of the positions in the consensus sequence were deeply conserved). These members not only triggered phasiRNAs from hundreds of NB-LRR genes (similar to miR482/2118 function in dicots), but also targeted noncoding transcripts in reproductive tissues (similar to grasses). This led the authors to propose a dual function for the miR482/2118 family in gymnosperms. Some of the spruce miR482 precursors also showed strong evidence of having evolved from NB-LRR genes. The miR390-TAS3 phasiRNA network in spruce also showed high diversity in miR390 target sites (1-3 sites per transcript) as well as in tasiARF regions. The spruce TAS3 gene family (16 genes) is also the largest identified to date. These results indicate an extensive network of miRNAs and phasiRNAs in spruce, and it will be really interesting to figure out their actual functions. The spruce 24-mer siRNAs were not analyzed in detail by Xia et al. (2015), and I am curious about their role in heterochromatin silencing in spruce. The authors suggested that the phased siRNAs may have a role in regulating transposons in the spruce genome, but there is no clear evidence of them targeting transposons. Another interesting point is how the handling of multi-mappers by different tools affects the number of miRNAs discovered. It would be great to compare these annotations with de novo annotations from ShortStack, which has a relatively better approach for assigning multi-mappers than PatMaN, which I believe keeps all possible locations for multi-mapped reads.
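To make the multi-mapper contrast concrete, here is a simplified sketch of two ideas. The first counts a multi-mapped read at every location it aligns to. The second gives each candidate location a placement probability proportional to the number of uniquely mapped reads nearby. As I understand it, that second idea is the general strategy behind ShortStack's default `--mmap u` mode. This is not ShortStack's code, though: it stops at the probabilities rather than sampling one location. The equal-share fallback, the helper names and the locus labels are all assumptions for illustration.

```python
from typing import Dict, List


def keep_all(loci: List[str]) -> Dict[str, float]:
    """'Keep all' strategy: the read is counted once at every location."""
    return {locus: 1.0 for locus in loci}


def placement_weights(loci: List[str], unique_counts: Dict[str, int]) -> Dict[str, float]:
    """Probability of placing one multi-mapped read at each candidate locus,
    proportional to the uniquely mapped reads near that locus."""
    if not loci:
        return {}
    counts = [unique_counts.get(locus, 0) for locus in loci]
    total = sum(counts)
    if total == 0:
        return {locus: 1 / len(loci) for locus in loci}  # assumed fallback: equal shares
    return {locus: c / total for locus, c in zip(loci, counts)}


loci = ["chr1:100", "chr5:900"]  # hypothetical locus labels
print(keep_all(loci))  # {'chr1:100': 1.0, 'chr5:900': 1.0}
print(placement_weights(loci, {"chr1:100": 30, "chr5:900": 10}))  # {'chr1:100': 0.75, 'chr5:900': 0.25}
```

Under "keep all", one read contributes a full count to both loci, which can inflate expression at repetitive candidate miRNA loci. The weighted approach keeps each read's total contribution at one.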


Xia, Rui, et al. “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants.” Molecular Biology and Evolution (2015): msv164.

Nystedt, Björn, et al. “The Norway spruce genome sequence and conifer genome evolution.” Nature 497.7451 (2013): 579-584.