Sam’s Notebook: Transcriptome Annotation – C.bairdi Trinity Assembly BLASTx on Mox

In preparation for complete transcriptome annotation of the C.bairdi de novo assembly from 20191218, I needed to run BLASTx. The assembly was BLASTed against the SwissProt database that comes with Trinotate.
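
The SBATCH script below assumes the Trinotate SwissProt peptide file has already been formatted as a BLAST protein database (Trinotate's setup normally handles this). In case the index files were missing, a minimal prep step would look like the sketch below; this is my addition, not part of the job script, and it assumes makeblastdb sits in the same BLAST+ bin directory as blastx:

# One-time prep (sketch): format the SwissProt peptide FastA as a BLAST protein DB
/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin/makeblastdb \
-in /gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep \
-dbtype prot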

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=blastx_cbai
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=25-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20191224_cbai_blastx_outfmt-11

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

wd="$(pwd)"
timestamp=$(date +%Y%m%d)

# Paths to input/output files
blastx_out="${wd}/${timestamp}-20191218.C_bairdi.Trinity.fasta.blastx.asn"
sp_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep"
trinity_fasta="/gscratch/scrubbed/samwhite/outputs/20191218_cbai_trinity_RNAseq/trinity_out_dir/20191218.C_bairdi.Trinity.fasta"

# Paths to programs
blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
blastx="${blast_dir}/blastx"

threads=28

# Run blastx on Trinity fasta
"${blastx}" \
-query "${trinity_fasta}" \
-db "${sp_db}" \
-max_target_seqs 1 \
-outfmt 11 \
-evalue 1e-4 \
-num_threads "${threads}" \
> "${blastx_out}"
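
Output format 11 (-outfmt 11) writes a BLAST archive (ASN.1) rather than a plain table, which means it can be converted to any other BLAST output format later without re-running the search. A minimal sketch of that conversion with blast_formatter (which ships with BLAST+); the filenames here are illustrative, since the actual output name includes the job's run date:

# Convert the ASN.1 archive to tab-delimited format 6 (sketch; adjust filenames)
/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin/blast_formatter \
-archive 20191224-20191218.C_bairdi.Trinity.fasta.blastx.asn \
-outfmt 6 \
-out 20191224-20191218.C_bairdi.Trinity.fasta.blastx.outfmt6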

Sam’s Notebook: Transdecoder – C.bairdi De Novo Transcriptome from 20191218 on Mox

I ran Trinity on 20191218 to de novo assemble the C.bairdi RNAseq data we had, and will now begin annotating the transcriptome using TransDecoder on Mox.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=transdecoder_cbai
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=25-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20191220_cbai_transdecoder

# Exit script if a command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

# Set working directory as current directory
wd="$(pwd)"

# Capture date as YYYYMMDD
timestamp=$(date +%Y%m%d)

# Set input file locations and species designation
trinity_fasta="/gscratch/srlab/sam/data/C_bairdi/transcriptomes/20191218.C_bairdi.Trinity.fasta"
trinity_gene_map="/gscratch/srlab/sam/data/C_bairdi/transcriptomes/20191218.C_bairdi.Trinity.fasta.gene_trans_map"
species="cbai"

# Capture trinity file name
trinity_fasta_name=${trinity_fasta##*/}

# Paths to input/output files
blastp_out_dir="${wd}/blastp_out"
transdecoder_out_dir="${wd}/${trinity_fasta_name}.transdecoder_dir"
pfam_out_dir="${wd}/pfam_out"
blastp_out="${blastp_out_dir}/${timestamp}.${species}.blastp.outfmt6"
pfam_out="${pfam_out_dir}/${timestamp}.${species}.pfam.domtblout"
lORFs_pep="${transdecoder_out_dir}/longest_orfs.pep"
pfam_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/Pfam-A.hmm"
sp_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep"

# Paths to programs
blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
blastp="${blast_dir}/blastp"
hmmer_dir="/gscratch/srlab/programs/hmmer-3.2.1/src"
hmmscan="${hmmer_dir}/hmmscan"
transdecoder_dir="/gscratch/srlab/programs/TransDecoder-v5.5.0"
transdecoder_lORFs="${transdecoder_dir}/TransDecoder.LongOrfs"
transdecoder_predict="${transdecoder_dir}/TransDecoder.Predict"

# Make output directories
mkdir "${blastp_out_dir}"
mkdir "${pfam_out_dir}"

# Extract long open reading frames
"${transdecoder_lORFs}" \
--gene_trans_map "${trinity_gene_map}" \
-t "${trinity_fasta}"

# Run blastp on long ORFs
"${blastp}" \
-query "${lORFs_pep}" \
-db "${sp_db}" \
-max_target_seqs 1 \
-outfmt 6 \
-evalue 1e-5 \
-num_threads 28 \
> "${blastp_out}"

# Run pfam search
"${hmmscan}" \
--cpu 28 \
--domtblout "${pfam_out}" \
"${pfam_db}" \
"${lORFs_pep}"

# Run Transdecoder with blastp and Pfam results
"${transdecoder_predict}" \
-t "${trinity_fasta}" \
--retain_pfam_hits "${pfam_out}" \
--retain_blastp_hits "${blastp_out}"
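
As a quick sanity check after a run like this (my addition, not part of the script above), the number of peptides going into and coming out of the pipeline can be compared by counting FastA headers. Paths assume the job's working directory and the output naming TransDecoder uses for the input FastA:

# Long ORFs extracted by TransDecoder.LongOrfs
grep -c ">" 20191218.C_bairdi.Trinity.fasta.transdecoder_dir/longest_orfs.pep

# Final coding regions predicted by TransDecoder.Predict
grep -c ">" 20191218.C_bairdi.Trinity.fasta.transdecoder.pep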

Shelly’s Notebook: Thur. Dec. 19, Geoduck Brood Conditioning

Broodstock Experimental Setup

New algae feeding system

Matt set up a new algae feeding system!

  • all parts are listed below under the heading “Equipment received”
  • This is an image of the empty tank. (image)
  • This is an image of the tank with algae being pumped in. The algae flows in by the float valve and is then pumped straight up through the flow meter. Excess algae (not going through the flow meter) gets overflowed back into the tank. (image)
  • The next two images show where the algae goes after it passes the flow meter: up and over into the manifold, which feeds the totes. (images)

Distributing Broodstock into treatment tanks

  • There were a total of ~120 broodstock that came in on 12/11/19 from Port Gamble, plus 16 that came in on 10/17/19 (not sure where these came from; they will be used for commercial spawning)
  • Matt and I split out the broodstock randomly into the 4 experimental tanks:
    • B1 (pH7.2)
    • B2 (amb)
    • B4 (amb)
    • B5 (pH7.2)
    • we added 23 healthy-looking animals per tank
  • The remaining 12/11/19 animals (~20) were put back into tank B6 to be used for commercial spawning

B1: (images)

B2: (image)

B4: (image)

B5: (image)

B6: (images)

B3 (silos with juveniles from Fall exp): (image)

Water chem

FFAR meeting

Equipment received

Next steps:

  • check Apex probe calibration
  • run water chem samples from 10/3/2019 and 10/18/2019
  • label broodstock with shellfish tags
  • take hemolymph samples from broodstock?
  • screen, clean and count juveniles in silos from Fall experiment


Sam’s Notebook: Transcriptome Assembly – C.bairdi Trimmed RNAseq Using Trinity on Mox

Earlier today, I trimmed our existing C.bairdi RNAseq data, as part of generating a transcriptome (per this GitHub issue). After trimming, I performed a de novo assembly using Trinity (v2.9.0) with the stranded library option (--SS_lib_type RF) on Mox.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=trin_cbai
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20191218_cbai_trinity_RNAseq

# Exit script if a command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

# User-defined variables
reads_dir=/gscratch/scrubbed/samwhite/outputs/20191218_cbai_fastp_RNAseq_trimming
threads=27
assembly_stats=assembly_stats.txt
timestamp=$(date +%Y%m%d)
fasta_name="${timestamp}.C_bairdi.Trinity.fasta"

# Paths to programs
trinity_dir="/gscratch/srlab/programs/trinityrnaseq-v2.9.0"
samtools="/gscratch/srlab/programs/samtools-1.10/samtools"

## Initialize arrays
R1_array=()
R2_array=()

# Variables for R1/R2 lists
R1_list=""
R2_list=""

# Create array of fastq R1 files
R1_array=(${reads_dir}/*_R1_*.gz)

# Create array of fastq R2 files
R2_array=(${reads_dir}/*_R2_*.gz)

# Create list of fastq files used in analysis
## Uses parameter substitution to strip leading path from filename
for fastq in ${reads_dir}/*.gz
do
  echo "${fastq##*/}" >> fastq.list.txt
done

# Create comma-separated lists of FastQ reads
R1_list=$(echo "${R1_array[@]}" | tr " " ",")
R2_list=$(echo "${R2_array[@]}" | tr " " ",")

# Run Trinity using "stranded" setting (--SS_lib_type)
${trinity_dir}/Trinity \
--seqType fq \
--max_memory 500G \
--CPU ${threads} \
--SS_lib_type RF \
--left "${R1_list}" \
--right "${R2_list}"

# Rename generic assembly FastA
mv trinity_out_dir/Trinity.fasta trinity_out_dir/${fasta_name}

# Assembly stats
${trinity_dir}/util/TrinityStats.pl trinity_out_dir/${fasta_name} \
> ${assembly_stats}

# Create gene map files
${trinity_dir}/util/support_scripts/get_Trinity_gene_to_trans_map.pl \
trinity_out_dir/${fasta_name} \
> trinity_out_dir/${fasta_name}.gene_trans_map

# Create FastA index
${samtools} faidx \
trinity_out_dir/${fasta_name}
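
Beyond TrinityStats.pl, a quick way to compare "gene" and transcript counts is to tally the two columns of the gene_trans_map produced above. A sketch (my addition), assuming it is run from the job's working directory; the map is tab-delimited with gene ID in column 1 and transcript ID in column 2:

# Unique Trinity "genes"
cut -f1 trinity_out_dir/20191218.C_bairdi.Trinity.fasta.gene_trans_map | sort -u | wc -l

# Total transcripts
cut -f2 trinity_out_dir/20191218.C_bairdi.Trinity.fasta.gene_trans_map | wc -l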

Sam’s Notebook: Trimming/FastQC/MultiQC – C.bairdi RNAseq FastQ with fastp on Mox

Grace/Steven asked me to generate a de novo transcriptome assembly of our current C.bairdi RNAseq data in this GitHub issue. As part of that, I needed to quality trim the data first. Although I could automate this as part of the transcriptome assembly (Trinity has Trimmomatic built-in), I would be unable to view the post-trimming results until after the assembly was completed. So, I opted to do the trimming step separately, to evaluate the data prior to assembly.

Trimming was performed using fastp (v0.20.0) on Mox.

I used the following Bash script to initiate file transfer to Mox and then call the SBATCH script for trimming:

#!/bin/bash

## Script to transfer C.bairdi RNAseq files and then run SBATCH script for fastp trimming.

# Exit script if any command fails
set -e

# Transfer files
rsync -av --progress owl:/volume1/web/nightingales/C_bairdi/*.gz .

# Run SBATCH script to begin fastp trimming
sbatch 20191218_cbai_fastp_RNAseq_trimming.sh
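
One step worth adding between the transfer and the trimming job (my suggestion, not part of the original script) is verifying the transferred FastQ files against their checksums. This assumes an MD5 file accompanies the data on Owl; the filename below is hypothetical. Because the script uses set -e, a failed check would halt the run before trimming starts:

# Verify transferred files against a checksums file (hypothetical filename)
md5sum -c C_bairdi_fastq_checksums.md5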

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=pgen_fastp_trimming_EPI
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20191218_cbai_fastp_RNAseq_trimming

### C.bairdi RNAseq trimming using fastp.
# This script is called by 20191218_cbai_RNAseq_rsync.sh. That script transfers the FastQ files
# to the working directory from: https://owl.fish.washington.edu/nightingales/C_bairdi/

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

# Set number of CPUs to use
threads=27

# Input/output files
trimmed_checksums=trimmed_fastq_checksums.md5

# Paths to programs
fastp=/gscratch/srlab/programs/fastp-0.20.0/fastp

## Initialize arrays
fastq_array_R1=()
fastq_array_R2=()
R1_names_array=()
R2_names_array=()

# Create array of fastq R1 files
for fastq in *R1*.gz
do
  fastq_array_R1+=("${fastq}")
done

# Create array of fastq R2 files
for fastq in *R2*.gz
do
  fastq_array_R2+=("${fastq}")
done

# Create array of sample names
## Uses awk to parse out sample name from filename
for R1_fastq in *R1*.gz
do
  R1_names_array+=($(echo "${R1_fastq}" | awk -F"." '{print $1}'))
done

# Create array of sample names
## Uses awk to parse out sample name from filename
for R2_fastq in *R2*.gz
do
  R2_names_array+=($(echo "${R2_fastq}" | awk -F"." '{print $1}'))
done

# Create list of fastq files used in analysis
for fastq in *.gz
do
  echo "${fastq}" >> fastq.list.txt
done

# Run fastp on files
for index in "${!fastq_array_R1[@]}"
do
  timestamp=$(date +%Y%m%d%M%S)
  R1_sample_name=$(echo "${R1_names_array[index]}")
  R2_sample_name=$(echo "${R2_names_array[index]}")
  ${fastp} \
  --in1 "${fastq_array_R1[index]}" \
  --in2 "${fastq_array_R2[index]}" \
  --detect_adapter_for_pe \
  --thread ${threads} \
  --html "${R1_sample_name}".fastp-trim."${timestamp}".report.html \
  --json "${R1_sample_name}".fastp-trim."${timestamp}".report.json \
  --out1 "${R1_sample_name}".fastp-trim."${timestamp}".fq.gz \
  --out2 "${R2_sample_name}".fastp-trim."${timestamp}".fq.gz

  # Generate md5 checksums for newly trimmed files
  {
    md5sum "${R1_sample_name}".fastp-trim."${timestamp}".fq.gz
    md5sum "${R2_sample_name}".fastp-trim."${timestamp}".fq.gz
  } >> "${trimmed_checksums}"

  # Remove original FastQ files
  rm "${fastq_array_R1[index]}" "${fastq_array_R2[index]}"
done
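
The post title mentions FastQC/MultiQC: fastp writes per-sample HTML/JSON reports, and MultiQC can aggregate the JSON files into a single report. A minimal sketch (assuming multiqc is installed and on the PATH; the original run may have invoked it differently):

# Aggregate all fastp JSON reports in the working directory into one report
multiqc .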

Ronit’s Notebook: Identifying Unknown Oyster Sample from Marinelli Shellfish Company (C. gigas vs C. sikamea)

The Marinelli Shellfish Company had an issue with one of their bags of oysters being labelled as Kumamoto oysters (C. sikamea) AND as Pacific oysters (C. gigas). Obviously, confusion abounded, and ultimately we were tasked with figuring out the true identity of these mystery oysters. To do so, DNA was isolated from mantle tissue from the unknown oysters, a sample set of known C. gigas, and a sample set of known C. sikamea. Four PCR primers targeting the cytochrome oxidase gene were used: universal forward and reverse primers (HC02198, LCO1490); a reverse primer specific to C. gigas (COCgi269r); and a reverse primer specific to C. sikamea (COCsi546r). Note: this was a multiplex PCR.

Cycling parameters were as follows:

95°C for 10 mins; 30 cycles of 95°C (1 min), 51°C (1 min), 72°C (1 min); 72°C (10 min).

PCR reactions were run on a gel and results are visualized below:

(Gel image: IMG_0493.jpg)

The first set of 4 samples (offset by ladders) are the unknown samples; the second set of 4 samples are C. gigas; and the third set of 4 samples are C. sikamea.

Using the GeneRuler DNA Ladder as a guide:

(Ladder reference image: Screen Shot 2019-12-17 at 8.04.15 PM.png)

First, we can see that there is a band of approximately 700 bp in all samples, indicating that the universal forward and reverse primers did their job (serving as a positive control). Next, we expect to see a band of approximately 260-270 bp in the known C. gigas samples, which we do! Similarly, we expect to see a band of approximately 550 bp in the known C. sikamea samples, which we also do. (Note: there appears to be a faint band at ~270 bp in the C. sikamea samples. Could this be a sign of contamination from the C. gigas samples?)

In the unknown samples, a prominent band at ~270 bp is clearly visible, which is what we should see in C. gigas samples. Thus, it seems that these mystery samples are in fact C. gigas. Case closed!

Kaitlyn’s notebook: Geoduck RNA extraction

RNA was isolated from geoduck samples with a Quick-DNA/RNA Microprep Plus Kit (Zymo Research) according to the manufacturer's protocol.

For hemolymph, 150ul of sample was used and 600ul of lysis buffer was added for the prep, except for sample 27, which had 120ul of sample and 480ul of lysis buffer added (keeping the same 1:4 sample-to-lysis-buffer ratio). All centrifuge steps were done at 16,000 rpm.

The on-column DNase step was performed, and the elution volume was 15ul.

Samples were quantified with the Qubit hsRNA Assay according to the manufacturer's protocol: 1ul of sample and 199ul of working solution were used in each assay tube.

The samples are stored in a box in the -80C freezer at position 3, 3, 2, labelled “RNA isolations; geoduck 12/17”.

Sample         RNA (ng/ul)
Star 11/15     47.2
Chewy 11/15    92.6
Star 11/21     170
Chewy 11/21    180
1              12.5
2              33.8
40             low
62             4.2
58             low
47             low
53             low
50             low
19             4.6
66             4.2
22             low
29             low
32             low
59             100
63             low
27             134
64             low
Standard 1     38.07
Standard 2     395.24

Samples are located in:

  • Box 8,1,1 (2014): Geo-27, 31, 29
  • Box 8,1,3 (2015): Geo-50, 58, 62, 19, 64, 40, 59, 47, 53, 63, 22, 66.

(Image: 20191217_082151)

Box 3,1,1 on the left and Box 8,1,3 on the right, separated by the Star and Chewy samples. Samples 1 and 2, and the Star and Chewy samples, are in box 5,3,1.

Sample Geo-30 H did not exist; a Geo-30 did, but it looked like gonad tissue, so I did not extract from it. (Image: 20191217_082036.jpg)

Notes:

Sample 53 was dark green.