Small RNA-seq analysis

This is just a memo of my analysis workflow. Investigators are welcome to use the steps below for their own analysis! :)

(run on our high-performance computing cluster)

module load fastx_toolkit

fastq_quality_filter -Q33 -q 20 -p 80 -i test.fastq | fastq_quality_trimmer -Q33 -t 20 -l 10 -o test_filtered.fastq #filtering poor quality reads

fastq_to_fasta -r -n -Q33 -i test_filtered.fastq -o test.fa #convert to multi-fasta format

fastx_clipper -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -l 18 -i test.fa -o test_clipped.fa #clip adapter (BioO)

fastx_trimmer -t 4 -i test_clipped.fa -o test_clipped_trimmed.fa #trim 4 bases from the 3’ end (each BioO adapter carries 4 random bases)

fastx_trimmer -f 5 -i test_clipped_trimmed.fa -o test_clipped_trimmed2.fa #trim 4 bases from the 5’ end by keeping bases from position 5 onward (each BioO adapter carries 4 random bases)

#load test_clipped_trimmed2.fa into miRanalyzer

#read counts from miRanalyzer are then analyzed for differential expression with DESeq
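
Before loading the final FASTA into miRanalyzer, a quick tally of read lengths can serve as a sanity check. Mature miRNAs are about 22 nt, and because the -l 18 cutoff in fastx_clipper is applied before the 8 random bases are trimmed, reads as short as 10 nt can remain in the output. The following is only a minimal sketch: the helper name read_length_hist is made up for illustration, and it assumes each sequence sits on a single line of the FASTA file.

```shell
# Hypothetical helper (not part of FASTX-Toolkit):
# print "length<TAB>count" for every read length in a FASTA file.
# Assumes one sequence per line; header lines start with ">".
read_length_hist() {
    awk '!/^>/ && length($0) > 0 { count[length($0)]++ }
         END { for (len in count) printf "%d\t%d\n", len, count[len] }' "$1" | sort -n
}

# Run on the final file from the workflow above, if it exists.
if [ -f test_clipped_trimmed2.fa ]; then
    read_length_hist test_clipped_trimmed2.fa
fi
```

If most reads fall well outside the expected range, the adapter sequence or the trimming positions are worth rechecking.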

Illumina user group meeting 100214 @NYC

Took a 5am train to go to Manhattan!
Mike Smith, whom I have known since my time at Yale, gave a beautiful presentation on their portfolio, and below are the notes I took from his talk.

There were a couple of exciting scientific presentations as well!
Worth the trip! :)


Elana Simons: 15 WGS and RNA-seq to find her own disease gene (Science)
Baylor VCRome: only 25% found causative mutations
FORGE: 264 rare disorders
Ebola: Aug 2014 paper, 314 SNVs
Coffee genome

1/3 price of HiSeq
First paper: ribosome profiling in S. pombe
NIPT: fetal cell-free DNA (cfDNA) in maternal blood
Oncology: cfDNA as non-invasive biomarkers, circulating DNA, circulating tumor cells, exosomes, immune cells

Government agencies (FDA, CDC, etc.) are applying genomics to public health: genomic epidemiology, foodborne pathogen outbreaks
Host lifestyle affects human microbiota on daily timescales: sampled their oral and gut microbiome daily for one year, wrote an iOS app to chronicle daily activities
Metatranscriptomics of the human oral microbiome during health and disease: compared OTUs of oral microbiota between healthy and periodontal disease samples, OTUs present varied across samples
Pathogen detection using MiSeq (NEJM paper): Leptospira infection found to be causative and treated with penicillin G

TruSight sequencing panels
One, Inherited Disease, Autism, Cancer, Cardiomyopathy, Tumor, Myeloid

MiSeqDx, FDA-cleared NGS system

pre-implantation genetic screening
VeriSeq PGS: chromosome aneuploidy screen
HLA typing coming in 2014!
Forensics: new kit coming, new forensic MiSeq coming too

BaseSpace: over 30 apps available
VariantStudio: variants, filter, interpret, report, biological insight (available in BaseSpace)

MEGA consortium, multi-ethnic GWAS exome array
ImmunoChip v2
Ag Consortia

Illumina grant opportunities

You can apply on your own, and win your own reagents!

Good luck!


Agilent technology seminar, Sept 23rd, 1-2pm

I am helping Agilent host a seminar next Tuesday (Sept 23rd, 1-2pm).
Josh from Agilent and Kathryn from IPM/Biochem will present NGS applications using Agilent’s products.
Both of them are AWESOME!

Not only that, Agilent will be taking applications from PSU COM investigators to support their research and future grant applications.

Please come join us for more details!


Agilent PSU NGS seminar Sept 23 2014FF

Partnership with GENEWIZ

I’m excited to announce that our official partnership with GENEWIZ for Sanger DNA sequencing and genomics services kicked off on July 1, 2014.

We will keep running our in-house Sanger service as well, but since a greater part of our time now goes into NGS services, we have decided to ask GENEWIZ for help.

If you have any questions, please feel free to contact me or Xuan. She is AWESOME!

Also, don’t forget that all new customers on campus get 10 free reactions!


Dear Researcher,

We are excited to announce that GENEWIZ is now partnering with the Penn State College of Medicine for Sanger DNA sequencing and genomics services, effective July 1, 2014.

Working with GENEWIZ, Penn State College of Medicine benefits include, but will not be limited to:

  • Cost Savings: Competitive pricing for Sanger DNA sequencing services.
  • Fast & Reliable Turnaround Time: Results delivered by 5:00 p.m. the day following sample submission.
  • High Quality: Reads from 750-1000 bp with Quality Score & Contiguous Read Length.
  • Superior Support: Easily accessible, award-winning technical support available Monday – Friday, 8:00 a.m. – 8:00 p.m.
  • Free Sample Submission: Free shipping via GENEWIZ Drop Boxes.

Convenient GENEWIZ Drop Box location:

  •  College of Medicine, Basic Science Wing, Room C2705
    Sample pick-up: 5:00 p.m., Monday – Friday

New to GENEWIZ? Register for your new account today, and receive 10 free reactions with your first Sanger DNA sequencing order!

For more information about GENEWIZ research services, please visit the Penn State College of Medicine Genome Science Facility website. If you have any questions or would like assistance registering for a new GENEWIZ account, please contact GENEWIZ Technical Support.

We look forward to helping advance your research!

Kind regards,

Xuan Pan
Sales Operations, Team Leader
Phone: 877-436-3949 ext 3555

Success Story @ BioRadiations

My experience detecting genomic variations with Bio-Rad’s droplet digital PCR was featured in the BioRadiations journal.

The title of my article is:

Validation of Allele-Specific Expression Predicted by RNA-Seq in Human Brain Specimens

and in it I discuss how you can make good use of digital PCR, especially Bio-Rad’s QX100 ddPCR system! :) I used ddPCR to accurately and directly quantify ASE (allele-specific expression, a.k.a. allelic imbalance) after struggling quite a bit to get convincing data with real-time PCR. The costs and time constraints of ddPCR are negligible compared to other known methods, such as the SNaPshot multiplex system, for accurate ASE detection.

I have also successfully detected gene expression in single cells without any pre-amplification. When it comes to sensitivity and accuracy, ddPCR is the way to go! :)
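
To give a feel for how droplet counts turn into an allelic ratio, here is a rough sketch. It is not taken from the article: it applies the standard Poisson correction for digital PCR (mean copies per droplet = -ln(1 - positive/total)) to each allele, then reports allele A's share of the total. The function name ase_fraction and the droplet counts are made up for illustration. The sketch assumes each count includes every droplet positive for that allele's probe and that not all droplets are positive.

```shell
# Hypothetical helper: allelic fraction from ddPCR droplet counts.
# Args: droplets positive for allele A, droplets positive for allele B,
#       total accepted droplets.
ase_fraction() {
    awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
        la = -log(1 - a / n)   # mean copies per droplet, allele A (Poisson correction)
        lb = -log(1 - b / n)   # mean copies per droplet, allele B
        printf "%.3f\n", la / (la + lb)   # fraction of transcripts from allele A
    }'
}

# Illustrative counts only, not real data.
ase_fraction 3000 1500 15000
```

A value near 0.5 would indicate balanced expression of the two alleles, and values away from 0.5 indicate allelic imbalance.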