Author Archives: Saima Shahid

Annotation of miRNAs and phasiRNAs from Norway Spruce

This week I selected a recent article by Xia et al. (2015) titled “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants” (doi: 10.1093/molbev/msv164, PMID: 26318183) for our journal club discussion. This was a pretty neat paper, with some interesting insights about the phasiRNA and miR482/2118 superfamily network in spruce. I also looked up the Norway spruce genome paper (Nystedt et al. 2013) to check some details about the small RNA seq data, and the genome itself turns out to have some distinctive characteristics. Firstly, the coniferous Norway spruce (Picea abies) has a very large genome of about 20 gigabase pairs (the biggest plant genome assayed up to now is that of the monocot Paris japonica, at 150 Gbp). Despite its huge size, the number of coding genes in spruce is comparable to that of Arabidopsis, which has a much smaller genome of ~135 Mbp (Nystedt et al. 2013). The bulk of the spruce genome is contributed by transposons, yet the overall frequency of repeat-associated 24 mer small RNAs is much lower in spruce than in angiosperms. Earlier studies gave conflicting results about the presence of 24 mers in spruce, but the small RNA seq data from as many as 22 samples in the Nystedt et al. (2013) study showed that 24 mer small RNAs are indeed expressed in spruce, mainly in reproductive tissues. Based on these small RNA seq datasets, Nystedt et al. (2013) also reported 2,719 de novo miRNA gene annotations in spruce using the UEA sRNA workbench tools, which is a lot compared to the typical range (~100-750) of miRNA annotations for the diverse plant species listed in miRBase 21.

Xia et al. (2015) reanalyzed the small RNA sequencing data (~352 million reads) published in the spruce genome paper for de novo discovery of miRNA and phased siRNA loci. Their more stringent de novo annotation pipeline generated a much smaller set of 585 miRNA loci than previously reported (Nystedt et al. 2013). Nearly half of these miRNA loci were new, without any known homologs in miRBase. Subsequent comparison of the mature sequences with ginkgo small RNAs indicated that one-third of the new loci may be conifer-specific. The miRNA discovery pipeline used PatMaN to map small RNAs to the spruce genome; the mapped reads were then filtered on a set of conditions and passed on to mireap for de novo annotation (Xia et al. 2015). One of these conditions was to select only the 20-22 mer fraction of the mapped small RNAs for miRNA gene discovery, but it was not clearly explained why the 23-24 mer fraction was discarded. Our analysis of Amborella small RNAs indicated that 24 mer miRNAs are expressed in the basal lineage of angiosperms (Amborella genome project, 2013), so I was interested to see if any such long miRNAs are also present in gymnosperms.
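To make the size-selection step concrete, here is a minimal Python sketch of a 20-22 nt length filter. It is only an illustration of the idea, not the authors' pipeline: the function name `split_by_length` and the toy reads are my own assumptions, and the real filtering in Xia et al. (2015) involved additional conditions that are not reproduced here.

```python
from collections import Counter

def split_by_length(reads, min_len=20, max_len=22):
    """Partition read sequences into those within [min_len, max_len] nt
    and those outside that window (hypothetical helper for illustration)."""
    kept, discarded = [], []
    for seq in reads:
        if min_len <= len(seq) <= max_len:
            kept.append(seq)
        else:
            discarded.append(seq)
    return kept, discarded

# Toy reads of known length (made up, not taken from the spruce data)
toy_reads = [
    "ACGT" * 5,          # 20 nt
    "ACGT" * 5 + "A",    # 21 nt
    "ACGT" * 5 + "AC",   # 22 nt
    "ACGT" * 5 + "ACG",  # 23 nt
    "ACGT" * 6,          # 24 nt
]

kept, discarded = split_by_length(toy_reads)

# Tally the lengths that a 20-22 nt window would throw away
discarded_lengths = Counter(len(seq) for seq in discarded)
print(len(kept), dict(discarded_lengths))
```

With a filter like this, any 23 or 24 nt read never reaches the miRNA caller, so a 24 mer miRNA of the kind we saw in Amborella could not be discovered regardless of how strongly it is expressed.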

Among the unique miRNA sequences reported by Xia et al. (2015), 21 mers were the most abundant, as expected (~46%), followed by 22 mers (20%, 119 miRNAs), and the rest were 20 mers. The most striking result was that spruce showed an expanded miR482/2118 superfamily (26 members) with relatively high diversity in the mature miRNA sequence (only half of the positions in the consensus sequence were deeply conserved). These members not only triggered phasiRNAs from hundreds of NB-LRR genes (similar to miR482/2118 function in dicots), but also targeted noncoding transcripts in reproductive tissues (similar to grasses). This led the authors to propose a dual function for the miR482/2118 family in gymnosperms. Some of the spruce miR482 precursors also showed strong evidence of evolutionary emergence from NB-LRR genes. The miR390-TAS3 phasiRNA network in spruce also showed high diversity in miR390 target sites (1-3 sites per transcript) as well as in tasiARF regions. The spruce TAS3 gene family was also the largest one (16 genes) identified to date. These results indicate an extensive network of miRNAs and phasiRNAs in spruce, and it will be really interesting to figure out their actual function. The spruce 24 mer siRNAs were not analyzed in detail by Xia et al. (2015), and I am curious about their role in heterochromatin silencing in spruce. The authors suggested that the phased siRNAs may have a role in regulating transposons in the spruce genome, but there is no clear evidence that they target transposons. Another interesting point is how the handling of multi-mappers by different tools affects the number of miRNAs discovered. It would be great to compare these annotations with de novo annotations from ShortStack, which has a relatively better approach for assigning multi-mappers than PatMaN, which I believe keeps all possible locations for multi-mapped reads.
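As a rough illustration of why multi-mapper handling matters, the Python sketch below contrasts two simple counting policies: crediting a read in full at every location it maps to, versus splitting each read evenly across its locations. Both functions and the toy alignments are hypothetical. Neither reproduces what PatMaN or ShortStack actually do; in particular, even splitting is just a stand-in for any scheme that avoids counting a read more than once.

```python
from collections import defaultdict

def count_all_locations(alignments):
    """Credit each read in full at every locus it maps to."""
    counts = defaultdict(float)
    for loci in alignments.values():
        for locus in loci:
            counts[locus] += 1.0
    return dict(counts)

def count_fractional(alignments):
    """Split each read evenly across its loci, so every read sums to 1."""
    counts = defaultdict(float)
    for loci in alignments.values():
        if not loci:
            continue  # unmapped read, nothing to credit
        share = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += share
    return dict(counts)

# Toy alignments: read ID -> loci it maps to (made up for illustration)
toy_alignments = {
    "read1": ["locusA"],
    "read2": ["locusA", "locusB", "locusC"],
    "read3": ["locusB", "locusC"],
}

all_counts = count_all_locations(toy_alignments)
frac_counts = count_fractional(toy_alignments)
print(all_counts)
print(frac_counts)
```

In this toy case three reads produce a total of six counts under the keep-everything policy, and all three loci look equally supported, whereas fractional counting keeps the total at three and leaves locusA, the only locus with a uniquely mapped read, with the most support. In a repeat-rich genome like spruce, that kind of inflation could plausibly change which loci pass an abundance threshold.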

References

Xia, Rui, et al. “Extensive Families of miRNAs and PHAS Loci in Norway Spruce Demonstrate the Origins of Complex phasiRNA Networks in Seed Plants.” Molecular Biology and Evolution (2015): msv164.

Nystedt, Björn, et al. “The Norway spruce genome sequence and conifer genome evolution.” Nature 497.7451 (2013): 579-584.