Software

Most of our tools and software are hosted on GitHub: https://github.com/Shao-Group

  • DCJ-SAT implements an exact and fast algorithm, based on a SAT formulation, to compute the DCJ (double-cut-and-join) distance between two genomes with duplicate genes. It is available at github.
  • Beaver is an assembler for single-cell RNA-seq data, featuring accurate assembly at single-cell resolution. It is available at github.
  • EquiRep implements an algorithm to detect tandem repeats from error-prone sequences, available at github.
  • TENNIS is an evolution-based model that can also predict isoforms missing from an annotation. It is available at TENNIS.
  • SubseqHash2 implements an improved algorithm to find the smallest subsequence to use as a seed. It is about 10 to 50 times faster than SubseqHash while preserving its high accuracy. It is available at github.
  • Aletsch is a meta-assembler (i.e., one that assembles multiple RNA-seq samples or cells), available at github.
  • Anchorage is an assembler for synthetic long reads (SLR) with anchors, available at github.
  • lsb-learn implements an approach to learn locality-sensitive bucketing (LSB) functions, available at github.
  • TERRACE is an assembler for circular RNAs, available at github.
  • SubseqHash implements a new seeding algorithm for sequencing data with high error rates. The tool is available at github.
  • Scallop2 is an improved transcript assembler that is optimized for paired-end RNA-seq data and multi-end RNA-seq data (such as Smart-seq3 data). Software is available at github.
  • Altai is an allele-specific transcript assembler. Software is available at github.
  • rnabridge-denovo implements an efficient algorithm to reconstruct the full sequences of fragments given paired-end RNA-seq reads. Software is available at github.
  • rnabridge-align implements an efficient algorithm to reconstruct the alignments of fragments given the alignments of paired-end RNA-seq reads. Software is available at github.
  • Scallop-LR is a reference-based transcriptome assembler for PacBio Iso-Seq data. Software is available at github. The manuscript is published in Genome Biology.
  • Scallop is an accurate reference-based transcriptome assembler. Software is available at github. Scallop has been published in Nature Biotechnology. A podcast about Scallop (thanks to Roman Cheplyaka for the interview) is available on both bioinformatics.chat and iTunes.
  • DeepBound presents a new framework to identify boundaries of expressed transcripts from RNA-seq alignments using convolutional neural networks. Software is available at github. DeepBound has been published in Bioinformatics.
  • Catfish implements an efficient algorithm for the flow decomposition problem, an abstract mathematical formulation of transcript assembly. Software is available at github. Catfish has been published in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
  • SQUID presents an algorithm to identify transcriptomic structural variations from RNA-seq alignments. Software is available at github. SQUID has been published in Genome Biology.
  • GREDU (Genome REarrangements with DUplications) is a software package that implements fast and exact algorithms for five edit distance problems between pairs of genomes with duplicate genes. The released source code is at github; the reference manual is here.