
Plant microRNAs in miRBase 21

So I’ve been asked to contribute a review article to The Plant Cell on the general subject of microRNA / small RNA evolution in plants. I set out to make an up-to-date figure of microRNA annotations across plants, based solely on the annotations in miRBase 21. The results surprised me.

72 different land plant species are represented in miRBase 21, with a total of 2,247 different miRNA FAMILIES annotated. Note that this counts families, not loci. On its face, that is a huge diversity of different miRNA sequences.

But the issue is that many of these annotations are likely to be false positives: either other types of regulatory small RNAs or, even worse, just degraded garbage. Since release 20, the curators of miRBase have been trying to build a ‘high-confidence’ list of microRNA loci, based on their internal parsing of public small RNA-seq data (see their paper in NAR). In miRBase 21, there are just 176 high-confidence plant miRNA families (a high-confidence family is one for which at least one locus has a high-confidence designation). Furthermore, only 17 of the 72 plant species have ANY high-confidence annotations at all! The scatterplot below illustrates this: most species have few annotated families and no high-confidence ones.


Scatterplot of numbers of high-confidence miRNA families from 72 plant species as a function of the total number of annotated families. Data are coded according to broad taxonomic groups. A few species of interest are labeled. Data were processed from miRBase 21.
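The family-level tally behind a plot like this can be sketched in a few lines of Python. This is only a minimal illustration of the counting rule stated above (a family is high-confidence if at least one of its loci is), not the actual processing used for the figure. The record layout, the function name `count_families`, and the toy records are all assumptions made up for the example; they are not real miRBase entries or file formats.

```python
from collections import defaultdict

# Toy, made-up records: (species, family, locus, locus_is_high_confidence).
# These are NOT real miRBase entries; the tuple layout is an assumption.
LOCI = [
    ("species_A", "fam1", "locus1", True),
    ("species_A", "fam1", "locus2", False),
    ("species_A", "fam2", "locus3", False),
    ("species_B", "fam1", "locus4", False),
    ("species_B", "fam3", "locus5", False),
]


def count_families(loci):
    """Return {species: (total_families, high_confidence_families)}.

    A family counts as high-confidence in a species if at least one
    of its loci in that species is flagged high-confidence.
    """
    all_fams = defaultdict(set)
    hc_fams = defaultdict(set)
    for species, family, _locus, is_hc in loci:
        all_fams[species].add(family)
        if is_hc:
            hc_fams[species].add(family)
    # Species with no high-confidence loci get a count of zero.
    return {sp: (len(fams), len(hc_fams[sp])) for sp, fams in all_fams.items()}


counts = count_families(LOCI)
for species, (total, high_conf) in sorted(counts.items()):
    print(species, total, high_conf)
```

Each species then contributes one point to the scatterplot, with the total family count on the x-axis and the high-confidence family count on the y-axis.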

Of course, there are some caveats here. The biggest is that the ‘high-confidence’ designation depends on whether or not the miRBase folks have analyzed the available high-throughput small RNA-seq data for a given species. In some cases, they may not have (though I haven’t dug into that specifically). The second caveat is that all species are NOT equally treated here. Some species (for instance, Oryza sativa and Arabidopsis thaliana) have high-quality reference genomes and have had lots of experimental attention over the years. Many others have neither, and so their annotations are necessarily more piecemeal. Overall, I think this clearly points to the need for a more uniform approach to retrospective analysis of miRBase annotations.

I was also pleased to see that Physcomitrella patens has a very high percentage of high-confidence miRNA families, since my lab annotated the vast majority of those families!

— Mike Axtell