Siwaret Arikit (a,b), Rui Xia (a,b), Atul Kakrana (b), Kun Huang (a,b), Jixian Zhai (a,b), Zhe Yan (c), Oswaldo Valdés-López (d), Silvas Prince (e), Theresa A. Musket (e), Henry T. Nguyen (e), Gary Stacey (c), and Blake C. Meyers (a,b)
a Department of Plant and Soil Sciences, University of Delaware, Newark, Delaware 19711
b Delaware Biotechnology Institute, University of Delaware, Newark, Delaware 19711
c Division of Plant Science, University of Missouri, Columbia, Missouri 65211
d Unidad de Morfologia y Función, FES Iztacala, Universidad Nacional Autónoma de México, Los Reyes Iztacala, Tlalnepantla 54090, Mexico
e National Center for Soybean Biotechnology and Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211
This article from Blake Meyers's group is an interesting take on the challenges of small RNA annotation. Carried out at large scale in soybean, the project sought to classify small RNA loci according to more modern interpretations, using both small RNA-seq libraries and degradome (PARE) sequencing.
First, the authors re-evaluated miRBase-20 genes against several rules, trying to separate genes that act canonically as miRNAs from ones that don't. This produced four classes: (1) miRNAs that are weakly expressed but resemble siRNAs, (2) genes that are likely siRNAs, (3) genes that marginally meet the strict definition of a miRNA and (4) well-characterized and well-defined miRNAs. In total, 530 plant miRNAs aligned to the soybean genome and were classified as follows: (1) 191 weakly expressed, (2) 203 siRNA-like, (3) 15 marginal miRNAs and (4) 121 highly expressed, canonical miRNAs. This breakdown makes some of the failings of miRBase pretty apparent, since fewer than a quarter of these genes (121 of 530) could be clearly defined as miRNAs in soy. It also seems clear that these genes fall along a spectrum rather than into discrete groups, because the classes had to be defined by somewhat arbitrary cutoffs for strand and abundance ratios. Defining these classes is a genuine challenge. The authors also used the same cutoffs to filter and identify new candidate miRNAs, which yielded numerous canonical and novel genes.
The mapping procedure is described only briefly: the authors mention using Bowtie to map perfectly matching reads and filtering out structural RNAs. It looks like they allow multi-mapping reads with up to 20 alignments. I would expect that if this procedure were refined with a method like Butter, we might see fewer ambiguous and weakly expressed miRNAs. Are those weak or ambiguous calls simply artifacts of multi-mapping?
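To make the four-way classification above concrete, here is a minimal sketch of how cutoffs on abundance, strand ratio and abundance ratio could sort loci into those classes. The threshold values, the `Locus` fields and the `classify` function are all assumptions for illustration; they are not the cutoffs or the code used in the paper.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration (not the paper's values).
MIN_ABUNDANCE = 100        # normalized reads needed to call a locus expressed
MIN_STRAND_RATIO = 0.9     # fraction of locus reads on the hairpin strand
MIN_ABUNDANCE_RATIO = 0.5  # fraction of locus reads from the miRNA/miRNA* pair
MARGIN = 0.1               # how far below a cutoff still counts as "marginal"


@dataclass
class Locus:
    name: str
    total_reads: float   # all reads mapped to the locus
    sense_reads: float   # reads on the same strand as the precursor
    duplex_reads: float  # reads matching the miRNA or miRNA* sequence


def classify(locus: Locus) -> str:
    """Assign one of the four classes discussed in the review."""
    if locus.total_reads < MIN_ABUNDANCE:
        return "weakly expressed"
    strand_ratio = locus.sense_reads / locus.total_reads
    abundance_ratio = locus.duplex_reads / locus.total_reads
    if strand_ratio >= MIN_STRAND_RATIO and abundance_ratio >= MIN_ABUNDANCE_RATIO:
        return "canonical miRNA"
    if (strand_ratio >= MIN_STRAND_RATIO - MARGIN
            and abundance_ratio >= MIN_ABUNDANCE_RATIO - MARGIN):
        return "marginal miRNA"
    return "siRNA-like"
```

Shifting `MARGIN` or either ratio cutoff by a small amount moves loci between the marginal, canonical and siRNA-like classes, which is exactly why the boundaries feel arbitrary.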
Another part of this article I found interesting was the attempt to identify phasiRNA loci: the authors report 504 loci that pass a "stringent threshold" on the phasing P-value. Almost all of these loci overlapped protein-coding genes. The intriguing part of their PHAS locus identification is that they found some non-canonical phasing patterns in variants of TAS3 loci. These included cases that required three hits from a miRNA to trigger phasiRNA production, as well as phasing in the downstream direction. If PHAS loci like these occur in a dataset analyzed with ShortStack, my understanding is that they should be annotated without a problem… (is this correct?).
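As a rough illustration of what "phasing" means computationally, the sketch below bins the 5' start positions of 21-nt reads from one strand into the 21 possible registers and reports the dominant one. This is a simplification that I am assuming for illustration: it is not the P-value calculation used in the paper, it ignores the 2-nt offset between strands, and the function names are hypothetical.

```python
from collections import Counter

PHASE_LENGTH = 21  # phasiRNAs discussed here are spaced at 21-nt intervals


def register_counts(starts, phase=PHASE_LENGTH):
    """Count reads per phasing register (start position modulo phase length)."""
    return Counter(pos % phase for pos in starts)


def dominant_register(starts, phase=PHASE_LENGTH):
    """Return (register, fraction of reads in that register)."""
    counts = register_counts(starts, phase)
    if not counts:
        return None, 0.0
    register, n = counts.most_common(1)[0]
    return register, n / sum(counts.values())


# Five reads spaced exactly 21 nt apart plus one out-of-phase read.
example_starts = [100, 121, 142, 163, 184, 110]
register, fraction = dominant_register(example_starts)
```

A locus where most reads fall into a single register is a candidate PHAS locus; a real pipeline would then test that enrichment statistically rather than rely on the raw fraction.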
The most highly represented group of genes targeted by phasiRNAs in soybean encodes NB-LRR proteins, a family with over 300 characterized members in legumes. The authors cite two hypotheses for why this family is so heavily targeted: that the phasiRNAs act as regulators in the absence of a pathogen trigger (Shivaprasad et al., 2012), or that they keep a rapidly expanding gene family under control (Kallman et al., 2013). Could it be both? I will have to read some of the cited papers to get more context on phasiRNA gene regulation.
This paper also has a huge amount of information on the tissue specificity of phasiRNA and miRNA genes, giving a more complete picture of this regulation in soy. The authors saw wide diversity in tissue-specific small RNA expression and found several subgroups of highly tissue-specific sRNA genes. Overall, this is a very interesting article with far more content than I can cover here.