Kaitlyn’s notebook: sex/stage plots on primers

Primers by sex & stage:

sex-stage-cqmean

Female Cq.Means by Stage:

female-cqmean

Male Cq.Means by Stage:

male-cqmean

Sam’s Notebook: Trimming/FastQC/MultiQC – C.bairdi RNAseq FastQ with fastp on Mox

After receiving our RNAseq data from Genewiz earlier today, I needed to run FastQC on the raw reads, trim them, and check the trimmed reads with FastQC.

FastQC on raw reads was run locally and files were kept on owl/nightingales/C_bairdi.

fastp trimming was run on Mox, followed by MultiQC.

FastQC on trimmed reads was run locally, followed by MultiQC.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=cbai_fastp_trimming_RNAseq
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200318_cbai_RNAseq_fastp_trimming

### C.bairdi RNAseq trimming using fastp.

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

# Set number of CPUs to use
threads=27

# Input/output files
trimmed_checksums=trimmed_fastq_checksums.md5
raw_reads_dir=/gscratch/scrubbed/samwhite/data/C_bairdi/RNAseq/

# Paths to programs
fastp=/gscratch/srlab/programs/fastp-0.20.0/fastp
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc

## Initialize arrays
fastq_array_R1=()
fastq_array_R2=()
programs_array=()
R1_names_array=()
R2_names_array=()

# Programs array
programs_array=("${fastp}" "${multiqc}")

# Capture program options
for program in "${!programs_array[@]}"
do
  {
  echo "Program options for ${programs_array[program]}: "
  echo ""
  ${programs_array[program]} -h
  echo ""
  echo ""
  echo "

Kaitlyn’s notebook: boxplots on sex & stage for geoduck primers

All primer pairs by sex and stage:

primers-Cqmean

Cq. mean values based on sex for each primer pair tested on either just pooled or both pooled and known samples. Pooled samples are a combination of males and females across all stages.

stage-Cqmean

Cq. mean values based on reproductive stage for each primer pair tested on either just pooled or both pooled and known samples. Pooled samples are a combination of males and females across all stages.


Primers by Stage:


Primers by Sex:

sex-Cqmean

Primers tested on known samples and reported Cq.mean values.

sex-peakheight-zeros

A value of 0 was plotted for values not reported.

sex-melttemp-zeros

A value of 0 was plotted for values not reported.

Sam’s Notebook: Data Received – C.bairdi RNAseq Data from Genewiz

We received the RNAseq data from the RNA that was sent out by Grace on 20200212.

Sequencing is 150bp PE.

Grace has a Google Sheet that describes what the samples constitute (e.g. ambient/cold/warm, infected/uninfected, day, etc.).

Genewiz report:

Project       Sample ID  Barcode Sequence   # Reads     Yield (Mbases)  Mean Quality Score  % Bases >= 30
30-343338329  72         ACTCGCTA+TCGACTAG  27,249,335  8,175           34.16               85.82
30-343338329  73         ACTCGCTA+TTCTAGCT  25,856,008  7,757           33.87               84.36
30-343338329  113        ACTCGCTA+CCTAGAGT  31,638,462  9,492           32.38               77.77
30-343338329  118        ACTCGCTA+GCGTAAGA  29,253,455  8,776           33.50               82.62
30-343338329  127        ACTCGCTA+CTATTAAG  27,552,329  8,266           33.14               81.13
30-343338329  132        ACTCGCTA+AAGGCTAT  27,518,702  8,256           34.86               88.87
30-343338329  151        ACTCGCTA+GAGCCTTA  33,430,314  10,029          35.01               89.35
30-343338329  173        ACTCGCTA+TTATGCGA  33,262,459  9,979           34.45               87.06
30-343338329  178        GGAGCTAC+TCGACTAG  29,495,389  8,849           35.01               89.62
30-343338329  221        GGAGCTAC+TTCTAGCT  25,902,415  7,771           34.76               88.40
30-343338329  222        GGAGCTAC+CCTAGAGT  53,808,137  16,142          30.90               71.11
30-343338329  254        GGAGCTAC+GCGTAAGA  16,771,613  5,031           35.14               90.03
30-343338329  272        GGAGCTAC+CTATTAAG  27,818,893  8,346           33.30               81.70
30-343338329  280        GGAGCTAC+AAGGCTAT  61,008,799  18,303          30.85               70.86
30-343338329  294        GGAGCTAC+GAGCCTTA  28,539,233  8,562           35.12               90.04
30-343338329  334        GGAGCTAC+TTATGCGA  25,916,895  7,775           34.98               89.39
30-343338329  349        GCGTAGTA+TCGACTAG  32,868,756  9,861           33.53               82.69
30-343338329  359        GCGTAGTA+TTCTAGCT  27,274,149  8,182           34.96               89.20
30-343338329  425        GCGTAGTA+CCTAGAGT  66,224,932  19,867          29.54               65.13
30-343338329  427        GCGTAGTA+GCGTAAGA  18,918,640  5,676           33.31               80.87
30-343338329  445        GCGTAGTA+CTATTAAG  30,745,388  9,224           33.07               80.83
30-343338329  463        GCGTAGTA+AAGGCTAT  19,531,145  5,859           34.27               86.08
30-343338329  481        GCGTAGTA+GAGCCTTA  50,592,084  15,178          31.92               75.59
30-343338329  485        GCGTAGTA+TTATGCGA  26,010,208  7,803           34.63               87.48
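
A quick sanity check on the report: the yield column is consistent with the read counts if "# Reads" counts read pairs, each contributing 2 x 150 bases. That interpretation is an assumption inferred from the arithmetic, not something the Genewiz report states. A minimal sketch (the function name `expected_yield_mbases` is made up for illustration):

```shell
#!/bin/bash
# Assumption: "# Reads" counts read pairs, and each pair is 2 x 150 bp.
read_len=150

# Print the expected yield in Mbases (rounded) for a given number of read pairs.
expected_yield_mbases() {
  local read_pairs=$1
  awk -v n="${read_pairs}" -v len="${read_len}" \
    'BEGIN { printf "%.0f\n", n * 2 * len / 1e6 }'
}

expected_yield_mbases 27249335   # sample 72, report lists 8,175
expected_yield_mbases 25856008   # sample 73, report lists 7,757
```

Both values round to the yields listed in the report, which supports reading the counts as read pairs.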

Confirmed that SFTP transfer from Genewiz to owl/nightingales/C_bairdi/ was successful:

screencap of md5sum output
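
For reference, a checksum comparison like the one in the screencap can be sketched as follows. This is a self-contained demo on a throwaway file, assuming GNU `md5sum`; the file names (`example_R1.fastq`, `checksums.md5`) are placeholders, not the actual Genewiz files.

```shell
#!/bin/bash
# Demo only: create a throwaway FastQ-like file in a temporary directory.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "${demo_dir}/example_R1.fastq"

# "Sender" side: record the checksum alongside the file.
(cd "${demo_dir}" && md5sum example_R1.fastq > checksums.md5)

# "Receiver" side: recompute and compare; --quiet prints only failures.
if (cd "${demo_dir}" && md5sum --check --quiet checksums.md5); then
  verify_status="OK"
else
  verify_status="MISMATCH"
fi

echo "Transfer check: ${verify_status}"
rm -r "${demo_dir}"
```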

Shelly’s Notebook: Mon. Mar. 16, 2020 Trimming Geoduck RRBS data

This entry is about trimming for the 2016 juvenile geoduck RRBS data Hollie generated.

Multi-core TrimGalore!

TrimGalore! can be run with multi-core settings if you use version 0.6.1 or newer. Reference to the multi-core TrimGalore! update is here: https://github.com/FelixKrueger/TrimGalore/pull/39. Using 8 cores reduced the run time for 100M reads from ~2hr10min to ~30min.
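
As a rough sketch of what a multi-core call looks like: `--cores`, `--paired`, and `--output_dir` are TrimGalore! options, while the file names and output directory below are placeholders rather than the actual command used here. The block only builds and prints the command so it can be reviewed before running.

```shell
#!/bin/bash
# Placeholders: substitute real FastQ paths and an output directory.
cores=8
r1=sample_R1.fastq.gz
r2=sample_R2.fastq.gz

# Build the command as an array so arguments stay correctly quoted.
trim_cmd=(trim_galore --cores "${cores}" --paired --output_dir trimmed "${r1}" "${r2}")

# Print for review; run with "${trim_cmd[@]}" once TrimGalore! 0.6.1 or newer is on PATH.
printf '%s ' "${trim_cmd[@]}"
echo
```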

Trimming history of RRBS data

Illumina Recommended Trimming:

  • I spoke with Dina from Illumina tech support today and she found trimming recommendations for the Illumina Truseq Methylation Kit here on page 45: Illumina adapter sequences reference
    • R1 Adapter: AGATCGGAAGAGCACACGTCTGAAC
    • R2 Adapter: AGATCGGAAGAGCGTCGTGTAGGGA
      • The first 13 bases of each adapter (AGATCGGAAGAGC) correspond to the universal Illumina adapter sequence you can specify in TrimGalore
      • There is an additional 12bp added on by the Illumina Truseq Methylation Kit that is recommended to be trimmed off
  • Bismark User Guide sections "TruSeq DNA-Methylation Kit (formerly EpiGnome)" and "Random priming and 3’ Trimming in general"
  • Illumina Truseq Methylation Kit workflow
  • Adaptor-tagged TruSeq DNA Methylation LIbrary Kit Workflow.png
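
The relationship between the two kit adapters and the universal Illumina adapter can be checked directly with bash substring expansion. The sequences are the ones listed above; the variable names are only for illustration. Each listed adapter is 25 bases, so 12 bases remain after the 13-base universal prefix, which appears to correspond to the additional 12bp mentioned above.

```shell
#!/bin/bash
universal=AGATCGGAAGAGC
r1_adapter=AGATCGGAAGAGCACACGTCTGAAC
r2_adapter=AGATCGGAAGAGCGTCGTGTAGGGA

for adapter in "${r1_adapter}" "${r2_adapter}"; do
  # Split each adapter into the universal-length prefix and the remainder.
  prefix=${adapter:0:${#universal}}
  kit_specific=${adapter:${#universal}}
  if [[ "${prefix}" == "${universal}" ]]; then
    echo "${adapter}: universal prefix OK, ${#kit_specific} extra bases (${kit_specific})"
  else
    echo "${adapter}: does not start with ${universal}"
  fi
done
```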

Testing Recommended Trimming Parameters

TrimGalore! with new parameters

I performed a test on just one sample: EPI-167

(base) [strigg@mox2 raw]$ wget --no-check-certificate https://owl.fish.washington.edu/nightingales/P_generosa/EPI-167_S10_L002_R1_001.fastq.gz
--2020-03-16 20:29:13-- https://owl.fish.washington.edu/nightingales/P_generosa/EPI-167_S10_L002_R1_001.fastq.gz
Resolving owl.fish.washington.edu (owl.fish.washington.edu)... 128.95.149.83
Connecting to owl.fish.washington.edu (owl.fish.washington.edu)|128.95.149.83|:443... connected.
WARNING: cannot verify owl.fish.washington.edu's certificate, issued by ‘/C=US/ST=MI/L=Ann Arbor/O=Internet2/OU=InCommon/CN=InCommon RSA Server CA’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 1451174652 (1.4G) [application/x-gzip]
Saving to: ‘EPI-167_S10_L002_R1_001.fastq.gz’
100%[=============================================================================================>] 1,451,174,652 27.6MB/s in 51s
2020-03-16 20:30:04 (26.9 MB/s) - ‘EPI-167_S10_L002_R1_001.fastq.gz’ saved [1451174652/1451174652]

(base) [strigg@mox2 raw]$ wget --no-check-certificate https://owl.fish.washington.edu/nightingales/P_generosa/EPI-167_S10_L002_R2_001.fastq.gz
--2020-03-16 20:30:08-- https://owl.fish.washington.edu/nightingales/P_generosa/EPI-167_S10_L002_R2_001.fastq.gz
Resolving owl.fish.washington.edu (owl.fish.washington.edu)... 128.95.149.83
Connecting to owl.fish.washington.edu (owl.fish.washington.edu)|128.95.149.83|:443... connected.
WARNING: cannot verify owl.fish.washington.edu's certificate, issued by ‘/C=US/ST=MI/L=Ann Arbor/O=Internet2/OU=InCommon/CN=InCommon RSA Server CA’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 1496018906 (1.4G) [application/x-gzip]
Saving to: ‘EPI-167_S10_L002_R2_001.fastq.gz’
100%[=============================================================================================>] 1,496,018,906 27.2MB/s in 53s
2020-03-16 20:31:01 (27.1 MB/s) - ‘EPI-167_S10_L002_R2_001.fastq.gz’ saved [1496018906/1496018906]

Alignments with new trimming

  • ran this script 20200316_BmrkAln_EpiTest2.sh
  • NEXT STEPS:
    • check Mbias plots in report
    • check percent methylation
      • previous
      • new trimming
        Trim date                            03/16/2020   05/16/2018
        Read pairs analyzed                  23436512     24481250
        mapping efficiency (%)               40.9         42.6
        ambiguously mapped read pairs (%)    11.8         8.2
        unaligned read pairs (%)             47.3         49.2
        mC in CpG (%)                        25.3         27.9
        mC in CHG (%)                        1.7          2.9
        mC in CHH (%)                        2.7          3
        mC in CN or CHN (%)                  4.9          8.5
    • determine if deduplicating should be done
      • previous report showed 26.85% duplicate alignments were removed – NOTE: previous alignments were done using genome v074. Although there shouldn’t be a difference between this genome and the one on OFS (Panopea-generosa-v1.0.fa), I am currently performing alignment of the 5/16/19 trimmed reads and the 9/23/19 trimmed reads for EPI-167/
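
One consistency check on the comparison table above: the mapping efficiency, ambiguously mapped, and unaligned values should together account for all read pairs analyzed, which also confirms that the unaligned row is a percentage. A minimal sketch using the table values (the helper name `sum_pct` is made up for illustration):

```shell
#!/bin/bash
# Sum the three alignment categories for one trimming run, to one decimal place.
sum_pct() {
  awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN { printf "%.1f\n", a + b + c }'
}

new_total=$(sum_pct 40.9 11.8 47.3)    # 03/16/2020 trimming
prev_total=$(sum_pct 42.6 8.2 49.2)    # 05/16/2018 trimming

echo "03/16/2020 trimming: ${new_total}% of read pairs accounted for"
echo "05/16/2018 trimming: ${prev_total}% of read pairs accounted for"
```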


Kaitlyn’s notebook: Primers next steps/goals

  1. Look at product and compare to theoretical size
    • Must do visually (order based on sequence)
  2. Make plots for each primer w/ pooled
    • Cq.mean
    • Melt temp
    • Melt peak height
  3. Analyze Cq values
    • Sex or development differences (ANOVA?)
      • mean, melt temp and melt peak height by sex
        • And then by dev. stage
  4. Make summary of performance of each primer
    • Via a table (rank performances of each primer w/ grade & notes on grade [3 columns])
      • With pooled sample
      • And known samples

Kaitlyn’s notebook: primers on known geoduck hemolymph samples

Previous qPCR test on pooled samples determined these 4 primers would be run on known samples. Ran qPCR as previously described.

Results from qPCR on known samples:

Two runs had to be done to be able to include all samples.


Results

    • contamination in no template controls (NTC)
      • unknown when contamination could have occurred
        • Water used for NTC seems unlikely since similar amplification is occurring in each sample
        • Possible contamination in primers, master mix, or DNase
          • Tris-EDTA added to lyophilized primers?
            • used 2017 DNase for RT

Regardless, there is non-specific amplification which also occurred in the last qPCR run on pooled samples.

Alternatively, the non-specific amplification could come from leftover DNA that a less effective DNase (it was from 2017) failed to remove. To test this:

    • one RNA sample left (G38) that hasn’t been DNased
      • run qPCR on 4ng RNA from G38 (equivalent to RNA amount run in previous qPCRs)
        • Include NTC to test if primers are contaminated as well

I can dilute my cDNA 1:10 to try to make it last longer (the 1:10 dilution can be seen in the amplification image). I used up a substantial portion of it testing the 4 selected primers during these runs.

Next, Sam suggested I run pooled samples at a gradient of temperatures to identify an ideal temperature with reduced non-specific binding.