DNA/RNA QC
|
DNA/RNA QC – High Sensitivity
|
Windows users:
\\smb.files.pennstatehealth.net\cores\ResultsDropoffPickup\GenomeSciences\Results
If you are on a Mac, there are three ways to access your Core folder, depending on how you log in to your MacBook:
1. You log in with a local computer account NOT associated with your epass ID.
2. You log in with an epass account and your MacBook is part of the PennStateHealth domain.*
3. You log in with an epass account and your MacBook is part of the Hersheymed domain.*
Please proceed to the appropriate section for your situation:
—————————————-Scenario 1—————————————-
In Finder, select Go…Connect to Server.
Enter:
smb://smb.files.pennstatehealth.net/cores
When prompted, enter your epass id and password.
—————————————-Scenario 2—————————————-
In Finder, select Go…Connect to Server.
Enter:
smb://smb.files.pennstatehealth.net/cores
You should not be prompted, and your lab folder should connect automatically.
—————————————-Scenario 3—————————————-
In Finder, select Go…Connect to Server.
Enter:
smb://*:*@smb.files.pennstatehealth.net/cores
When prompted, enter your epass ID in this format (see the screenshot on the next page):
<epass>@adds.pennstatehealth.net, followed by your current epass password.
Since we often get asked, here is a quick reference for MiSeq (we have two), NextSeq (UP Core has one), and NovaSeq (we have one).
flow cell type | number of reads | available read lengths
MiSeq Reagent Kit v2 | 12–15 million | 1 × 36 bp, 2 × 25 bp, 2 × 150 bp, 2 × 250 bp
MiSeq Reagent Kit v3 | 22–25 million | 2 × 75 bp, 2 × 300 bp
MiSeq Reagent Kit v2 Micro | 4 million | 2 × 150 bp
MiSeq Reagent Kit v2 Nano | 1 million | 2 × 150 bp, 2 × 250 bp
NextSeq 550 System Mid-Output Kit* | 130 million | 2 × 75 bp, 2 × 150 bp
NextSeq 550 System High-Output Kit* | 400 million | 1 × 75 bp, 2 × 75 bp, 2 × 150 bp
NovaSeq SP | 650–800 million | 2 × 50 bp, 2 × 100 bp, 2 × 150 bp, 2 × 250 bp
NovaSeq S1 | 1.3–1.6 billion | 2 × 50 bp, 2 × 100 bp, 2 × 150 bp
NovaSeq S2 | 3.3–4.1 billion | 2 × 50 bp, 2 × 100 bp, 2 × 150 bp
NovaSeq S4 | 8–10 billion | 1 × 35 bp, 2 × 100 bp, 2 × 150 bp
* Available at the UP Genomics Core
Since I get asked often and the official workflow is not yet available in our iLab portal, here is a description you may find handy for planning your 10x Genomics single-cell RNA-seq projects!
The library prep cost can be found in the iLab:
10x Genomics Library Prep: $1,713.00 (Internal)
For RNA-seq we sequence 20K reads per cell for up to 10K cells per prep, so ~200M reads on a NovaSeq 100-cycle run will also need to be requested.
NovaSeq_SP (100 cycles – SR100, PE 2×50) (per flow cell), $2,751.38 (Internal), is the smallest flow cell you will likely be using, and it holds ~750M reads. Even if you occupy 400M reads, we will have to find someone else to fill the remaining 350M. Depending on how soon you want your sequencing to run, you may have to pay more to fill up the flow cell.
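The read budgeting above is simple arithmetic, and a small sketch may help when planning several preps. The function names below are hypothetical, and the only figures used are the ones quoted in this post (20K reads per cell, 10K cells, a ~750M-read flow cell).

```shell
#!/bin/sh
# Hypothetical helpers for budgeting a 10x run; names are illustrative only.

# reads_needed CELLS READS_PER_CELL -> total reads to request
reads_needed() {
    echo $(( $1 * $2 ))
}

# reads_left CAPACITY REQUESTED -> reads that still have to be filled by someone
reads_left() {
    echo $(( $1 - $2 ))
}

reads_needed 10000 20000         # one prep at full cell load -> 200000000
reads_left 750000000 400000000   # ~750M flow cell, 400M occupied -> 350000000
```

Multiplying the first number by your number of preps gives a rough idea of how much of the flow cell you would occupy on your own.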
This is what I just shared with my staff! Hope this helps everyone working in the NGS field!
Above is a must-have for working on bead cleanups! The more beads you add, the smaller the DNA you can bind. (Strictly, it is not the beads themselves that matter but the volume of PEG/NaCl buffer they are suspended in; I am just calling that buffer 'beads' here.) 1x is good for getting rid of primer dimers (<100bp), which is mostly what we have to remove after PCR. But sometimes we also see adapter dimers, which usually run around 120bp. To get rid of those, we have to lower the bead amount to 0.8x or 0.9x. The lower the bead volume, the more of your target fragment you lose as well, so I switch between 0.8x and 0.9x depending on how much target there is (the more you have, the more aggressive, i.e. the lower the bead ratio, you can go).
0.6x binds fragments >400bp, so it is often used to remove bigger fragments in a 'dual size selection' setup. The supernatant after 0.6x contains the <400bp fragments, and to it we add beads to a *final* 1x or 0.8x to bind them. 0.6x-1x dual size selection captures 100-400bp fragments, and 0.6x-0.8x captures 200-400bp.
e.g. 0.6x-1x dual size selection: Prepare your DNA in 100ul Tris (or Elution buffer from BioO etc.), add 60ul of beads, wait 15min, and transfer the 160ul supernatant to a new well. Add 40ul of beads to this supernatant and wait 15min, then pellet the beads, wash with EtOH, and elute the DNA in x ul of Tris. You can cut everything in half if you want, but watch carefully that the beads do not dry out, as 20ul of beads can dry up fast (unless you use 2x concentrated beads :).
When we are sequencing long inserts, e.g. 2x150, the dual size selection can be tweaked to 0.5x-0.7x to enrich for bigger fragments.
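For anyone who wants to double-check the bead volumes in a dual size selection, here is a rough sketch of the arithmetic. The function name is hypothetical, and it assumes, as in the 100ul example in this post, that both ratios are relative to the starting sample volume, so the second addition is (final ratio - first ratio) x sample volume.

```shell
#!/bin/sh
# Hypothetical helper: bead volumes for a dual size selection.
# Assumption: both ratios are relative to the ORIGINAL sample volume.

# dual_size_selection SAMPLE_UL FIRST_RATIO FINAL_RATIO
dual_size_selection() {
    awk -v s="$1" -v r1="$2" -v r2="$3" 'BEGIN {
        first  = s * r1          # beads that bind the large fragments
        second = s * (r2 - r1)   # extra beads added to the supernatant
        printf "first add: %.1f ul beads (transfer %.1f ul supernatant)\n", first, s + first
        printf "second add: %.1f ul beads\n", second
    }'
}

dual_size_selection 100 0.6 1.0
# first add: 60.0 ul beads (transfer 160.0 ul supernatant)
# second add: 40.0 ul beads
```

Calling it with `50 0.5 0.7` would give the volumes for a half-scale long-insert setup; treat the output as a planning aid only, not a validated protocol.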
https://sites.psu.edu/yuka/useful-resources…-hershey-library/
Our new library is doing a great job!!!
-Yuka
A total of 7 sessions have been running this spring! This is part of my education project, EIBDG (Education Initiative in Big Data Genomics). Since these were my first sessions, I invited only existing Core users, but for future sessions (coming back in the fall!) I will welcome participants of any level! Space is limited, though, so participation will be first come, first served.
session | date | contents
NAT1 | 3/3/17 | useful unix commands
NAT2 | 3/24/17 | FastQC, Fastq filtering, RNA-seq alignment, Cufflinks
NAT3 | 4/7/17 | smRNA-seq, how to transfer files, RNA-seq: more on manipulating .bam, QC with Picard, visualizing in IGV, UCSC track, subfractioning
NAT4 | 4/21/17 | Reviewing NAT1-3, more analysis in R: PCA, heatmap, etc., differential gene expression (other than Cufflinks)
NAT5 | 5/10/17 (Wed) in HG318 | Exome alignment, variant calling, variant annotation
NAT6 | 6/2/17 (Fri) in C2610 | SRA, dbGaP, TCGA, optional: GSEA
NAT7 | 6/9/17 | ChIP-seq alignment, peak calling, differential binding analysis
Some bioinformatic tips I shared with my client:
The task is to split .bam files by chromosome (1-22, X, and Y for humans). I referred to this site:
https://www.biostars.org/p/9130/
But what if you have many .bam files in a directory? We had ~40 .bams to start with, so you would definitely want to run the job in one loop command instead of running the above 40 times manually.
This is what worked:
#split bams (region queries need each input .bam to be coordinate-sorted and indexed)
for file in *_chr.bam; do filename=$(echo "$file" | cut -d "." -f 1); for chrom in $(seq 1 22) X Y; do samtools view -bh "$file" "chr${chrom}" > "${filename}_${chrom}.bam"; done; done
#make index file
for file in *_chr_*.bam; do samtools index "$file"; done
The result: 24 .bam files + 24 .bai files × ~40 samples = about 1,500 files created overnight! 🙂
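Before launching a loop like this on ~40 files, it can be reassuring to preview what it will do. The sketch below only prints the commands for one input file rather than running samtools; `split_plan` and `sample1_chr.bam` are hypothetical names used for illustration.

```shell
#!/bin/sh
# Dry run: print one samtools command per chromosome without executing it.
# split_plan and sample1_chr.bam are illustrative names, not part of samtools.

split_plan() {
    file=$1
    # same naming rule as the loop above: keep everything before the first "."
    filename=$(echo "$file" | cut -d "." -f 1)
    for chrom in $(seq 1 22) X Y; do
        echo "samtools view -bh $file chr${chrom} > ${filename}_${chrom}.bam"
    done
}

split_plan sample1_chr.bam
```

This prints 24 lines, one per chromosome, so a quick look confirms the output names match the `*_chr_*.bam` pattern used by the indexing loop.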
-Yuka
is saved in here for Hershey users to grab it!
Next session is scheduled on Nov 17 (Thu) at 4pm!
Thank you,
Yuka
This is just a memo of my analysis workflow… investigators are welcome to use the steps below for their own analysis! 🙂
(run on our High Performance Computing cluster)
module load fastx_toolkit
fastq_quality_filter -Q33 -q 20 -p 80 -i test.fastq | fastq_quality_trimmer -Q33 -t 20 -l 10 -o test_filtered.fastq #filtering poor quality reads
fastq_to_fasta -r -n -Q33 -i test_filtered.fastq -o test.fa #convert to multi-fasta format
fastx_clipper -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -l 18 -i test.fa -o test_clipped.fa #clip adapter (BioO)
fastx_trimmer -t 4 -i test_clipped.fa -o test_clipped_trimmed.fa #clip 4 bases from 3' end (BioO adds 4 random bases to both of its adapters)
fastx_trimmer -f 5 -i test_clipped_trimmed.fa -o test_clipped_trimmed2.fa #clip 4 bases from 5' end (same reason; -f 5 keeps the read from base 5 onward)
#load test_clipped_trimmed2.fa to mirAnalyzer
#read count from mirAnalyzer to be analyzed for differential expression by DESeq
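To see what the two fastx_trimmer steps do to a single read, here is a toy illustration using only standard shell tools. It is not a replacement for fastx_trimmer; the function name and the example sequence are made up for the demonstration.

```shell
#!/bin/sh
# Toy illustration of the two trimming steps on one sequence string.
# strip_random4 and the example read are hypothetical.

# strip_random4 SEQ -> SEQ without its last 4 and first 4 bases
strip_random4() {
    printf '%s\n' "$1" |
        sed 's/....$//' |   # like "fastx_trimmer -t 4": drop 4 bases at the 3' end
        cut -c5-            # like "fastx_trimmer -f 5": keep from base 5 onward
}

strip_random4 AAAATGAGGTAGTAGGTTCCCC
# TGAGGTAGTAGGTT
```

The four leading A's and four trailing C's stand in for the random bases, and what remains is the insert that would go on to mirAnalyzer.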