Sam’s Notebook: Transcriptome Annotation – Trinotate Hematodinium MEGAN6 Taxonomic-specific Trinity Assembly on Mox

After performing de novo assembly on our Hematodinium MEGAN6 taxonomic-specific RNAseq data on 20200122 and performing BLASTx annotation on 20200123, I continued the annotation process by running Trinotate.

Trinotate will perform functional annotation of the transcriptome assembly, including GO terms and an annotation feature map that can be used in subsequent Trinity-based differential gene expression analysis so that functional annotations are carried downstream through that process.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=trinotate_cbi
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=05-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200126_hemat_trinotate_megan

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

wd="$(pwd)"
timestamp=$(date +%Y%m%d)
species="hemat"
prefix="${timestamp}.${species}.trinotate"

## Paths to input/output files

## New folders for working directory
rnammer_out_dir="${wd}/RNAmmer_out"
signalp_out_dir="${wd}/signalp_out"
tmhmm_out_dir="${wd}/tmhmm_out"

# Input files
blastp_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/blastp_out/20200123.hemat.blastp.outfmt6"
blastx_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_diamond_blastx_megan/20200122.hemat.megan.Trinity.blastx.outfmt6"
pfam_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/pfam_out/20200123.hemat.pfam.domtblout"
lORFs_pep="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/20200122.hemat.megan.Trinity.fasta.transdecoder_dir/longest_orfs.pep"
trinity_fasta="/gscratch/srlab/sam/data/Hematodinium/transcriptomes/20200122.hemat.megan.Trinity.fasta"
trinity_gene_map="/gscratch/srlab/sam/data/Hematodinium/transcriptomes/20200122.hemat.megan.Trinity.fasta.gene_trans_map"

rnammer_prefix=${trinity_fasta##*/}

# Output files
rnammer_out="${rnammer_out_dir}/${rnammer_prefix}.rnammer.gff"
signalp_out="${signalp_out_dir}/${prefix}.signalp.out"
tmhmm_out="${tmhmm_out_dir}/${prefix}.tmhmm.out"
trinotate_report="${wd}/${prefix}_annotation_report.txt"

# Paths to programs
rnammer_dir="/gscratch/srlab/programs/RNAMMER-1.2"
rnammer="${rnammer_dir}/rnammer"
signalp_dir="/gscratch/srlab/programs/signalp-4.1"
signalp="${signalp_dir}/signalp"
tmhmm_dir="/gscratch/srlab/programs/tmhmm-2.0c/bin"
tmhmm="${tmhmm_dir}/tmhmm"
trinotate_dir="/gscratch/srlab/programs/Trinotate-v3.1.1"
trinotate="${trinotate_dir}/Trinotate"
trinotate_rnammer="${trinotate_dir}/util/rnammer_support/RnammerTranscriptome.pl"
trinotate_GO="${trinotate_dir}/util/extract_GO_assignments_from_Trinotate_xls.pl"
trinotate_features="${trinotate_dir}/util/Trinotate_get_feature_name_encoding_attributes.pl"
trinotate_sqlite_db="Trinotate.sqlite"

# Make output directories
mkdir "${rnammer_out_dir}" "${signalp_out_dir}" "${tmhmm_out_dir}"

# Copy sqlite database template
cp ${trinotate_dir}/admin/Trinotate.sqlite .

# Run signalp
${signalp} \
-f short \
-n "${signalp_out}" \
${lORFs_pep}

# Run tmHMM
${tmhmm} \
--short \
< ${lORFs_pep} \
> "${tmhmm_out}"

# Run RNAmmer
cd "${rnammer_out_dir}" || exit
${trinotate_rnammer} \
--transcriptome ${trinity_fasta} \
--path_to_rnammer ${rnammer}
cd "${wd}" || exit

# Run Trinotate
## Load transcripts and coding regions into database
${trinotate} \
${trinotate_sqlite_db} \
init \
--gene_trans_map "${trinity_gene_map}" \
--transcript_fasta "${trinity_fasta}" \
--transdecoder_pep "${lORFs_pep}"

## Load BLAST homologies
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastp \
"${blastp_out}"

"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastx \
"${blastx_out}"

## Load Pfam
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_pfam \
"${pfam_out}"

## Load transmembrane domains
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_tmhmm \
"${tmhmm_out}"

## Load signal peptides
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_signalp \
"${signalp_out}"

## Load RNAmmer
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_rnammer \
"${rnammer_out}"

## Create annotation report
"${trinotate}" \
"${trinotate_sqlite_db}" \
report \
> "${trinotate_report}"

# Extract GO terms from annotation report
"${trinotate_GO}" \
--Trinotate_xls "${trinotate_report}" \
-G \
--include_ancestral_terms \
> "${prefix}".go_annotations.txt

# Make transcript features annotation map
"${trinotate_features}" \
"${trinotate_report}" \
> "${prefix}".annotation_feature_map.txt
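
The RNAmmer output path in the script above relies on bash's `${var##*/}` parameter expansion, which strips everything up to and including the last slash. A minimal sketch of how that filename is derived; the function name `rnammer_gff_path` is hypothetical and not part of the original script, while the example path is taken from it:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): build the RNAmmer GFF
# path the same way the script does.
rnammer_gff_path() {
  local out_dir="$1"
  local trinity_fasta="$2"
  # ${var##*/} removes the longest prefix matching "*/", leaving the file name
  local rnammer_prefix="${trinity_fasta##*/}"
  printf '%s/%s.rnammer.gff\n' "${out_dir}" "${rnammer_prefix}"
}

# Example using the transcriptome path from the script
rnammer_gff_path "RNAmmer_out" \
  "/gscratch/srlab/sam/data/Hematodinium/transcriptomes/20200122.hemat.megan.Trinity.fasta"
```

This prints `RNAmmer_out/20200122.hemat.megan.Trinity.fasta.rnammer.gff`, which is the file later passed to `LOAD_rnammer`.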

Sam’s Notebook: Transcriptome Annotation – Trinotate C.bairdi MEGAN6 Taxonomic-specific Trinity Assembly on Mox

After performing de novo assembly on our Tanner crab MEGAN6 taxonomic-specific RNAseq data on 20200122 and performing BLASTx annotation on 20200123, I continued the annotation process by running Trinotate.

Trinotate will perform functional annotation of the transcriptome assembly, including GO terms and an annotation feature map that can be used in subsequent Trinity-based differential gene expression analysis so that functional annotations are carried downstream through that process.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=trinotate_cbi
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=05-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200126_cbai_trinotate_megan

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

wd="$(pwd)"
timestamp=$(date +%Y%m%d)
species="cbai"
prefix="${timestamp}.${species}.trinotate"

## Paths to input/output files

## New folders for working directory
rnammer_out_dir="${wd}/RNAmmer_out"
signalp_out_dir="${wd}/signalp_out"
tmhmm_out_dir="${wd}/tmhmm_out"

# Input files
blastp_out="/gscratch/scrubbed/samwhite/outputs/20200123_cbai_transdecoder_megan/blastp_out/20200123.cbai.blastp.outfmt6"
blastx_out="/gscratch/scrubbed/samwhite/outputs/20200123_cbai_diamond_blastx_megan/20200122.C_bairdi.megan.Trinity.blastx.outfmt6"
pfam_out="/gscratch/scrubbed/samwhite/outputs/20200123_cbai_transdecoder_megan/pfam_out/20200123.cbai.pfam.domtblout"
lORFs_pep="/gscratch/scrubbed/samwhite/outputs/20200123_cbai_transdecoder_megan/20200122.C_bairdi.megan.Trinity.fasta.transdecoder_dir/longest_orfs.pep"
trinity_fasta="/gscratch/srlab/sam/data/C_bairdi/transcriptomes/20200122.C_bairdi.megan.Trinity.fasta"
trinity_gene_map="/gscratch/srlab/sam/data/C_bairdi/transcriptomes/20200122.C_bairdi.megan.Trinity.fasta.gene_trans_map"

rnammer_prefix=${trinity_fasta##*/}

# Output files
rnammer_out="${rnammer_out_dir}/${rnammer_prefix}.rnammer.gff"
signalp_out="${signalp_out_dir}/${prefix}.signalp.out"
tmhmm_out="${tmhmm_out_dir}/${prefix}.tmhmm.out"
trinotate_report="${wd}/${prefix}_annotation_report.txt"

# Paths to programs
rnammer_dir="/gscratch/srlab/programs/RNAMMER-1.2"
rnammer="${rnammer_dir}/rnammer"
signalp_dir="/gscratch/srlab/programs/signalp-4.1"
signalp="${signalp_dir}/signalp"
tmhmm_dir="/gscratch/srlab/programs/tmhmm-2.0c/bin"
tmhmm="${tmhmm_dir}/tmhmm"
trinotate_dir="/gscratch/srlab/programs/Trinotate-v3.1.1"
trinotate="${trinotate_dir}/Trinotate"
trinotate_rnammer="${trinotate_dir}/util/rnammer_support/RnammerTranscriptome.pl"
trinotate_GO="${trinotate_dir}/util/extract_GO_assignments_from_Trinotate_xls.pl"
trinotate_features="${trinotate_dir}/util/Trinotate_get_feature_name_encoding_attributes.pl"
trinotate_sqlite_db="Trinotate.sqlite"

# Make output directories
mkdir "${rnammer_out_dir}" "${signalp_out_dir}" "${tmhmm_out_dir}"

# Copy sqlite database template
cp ${trinotate_dir}/admin/Trinotate.sqlite .

# Run signalp
${signalp} \
-f short \
-n "${signalp_out}" \
${lORFs_pep}

# Run tmHMM
${tmhmm} \
--short \
< ${lORFs_pep} \
> "${tmhmm_out}"

# Run RNAmmer
cd "${rnammer_out_dir}" || exit
${trinotate_rnammer} \
--transcriptome ${trinity_fasta} \
--path_to_rnammer ${rnammer}
cd "${wd}" || exit

# Run Trinotate
## Load transcripts and coding regions into database
${trinotate} \
${trinotate_sqlite_db} \
init \
--gene_trans_map "${trinity_gene_map}" \
--transcript_fasta "${trinity_fasta}" \
--transdecoder_pep "${lORFs_pep}"

## Load BLAST homologies
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastp \
"${blastp_out}"

"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastx \
"${blastx_out}"

## Load Pfam
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_pfam \
"${pfam_out}"

## Load transmembrane domains
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_tmhmm \
"${tmhmm_out}"

## Load signal peptides
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_signalp \
"${signalp_out}"

## Load RNAmmer
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_rnammer \
"${rnammer_out}"

## Create annotation report
"${trinotate}" \
"${trinotate_sqlite_db}" \
report \
> "${trinotate_report}"

# Extract GO terms from annotation report
"${trinotate_GO}" \
--Trinotate_xls "${trinotate_report}" \
-G \
--include_ancestral_terms \
> "${prefix}".go_annotations.txt

# Make transcript features annotation map
"${trinotate_features}" \
"${trinotate_report}" \
> "${prefix}".annotation_feature_map.txt
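
Since the script runs with `set -e`, a wrong input path only surfaces when the command that needs it fails, which can be well after the earlier steps have finished. One possible safeguard is a preflight check before any program runs. This is a sketch only; the function `check_inputs` is hypothetical and not part of the original script:

```shell
#!/bin/bash
# Hypothetical preflight check (not in the original script):
# returns non-zero if any listed file is missing or empty.
check_inputs() {
  local status=0
  local f
  for f in "$@"; do
    # -s is true only when the file exists and has a size greater than zero
    if [[ ! -s "${f}" ]]; then
      echo "Missing or empty input: ${f}" >&2
      status=1
    fi
  done
  return "${status}"
}

# Possible usage inside the SBATCH script, before the "Run signalp" step:
# check_inputs "${blastp_out}" "${blastx_out}" "${pfam_out}" \
#   "${lORFs_pep}" "${trinity_fasta}" "${trinity_gene_map}" || exit 1
```

Reporting every missing file before returning, rather than stopping at the first, means a single failed submission reveals all of the broken paths.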

Sam’s Notebook: DNA Isolation and Quantification – C.bairdi Hemocyte Pellets in RNAlater

Isolated DNA from 56 samples (see Qubit spreadsheet in “Results” below for sample IDs) using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.

These samples were from RNA isolations on the following dates:

Brief rundown of method:

  • Used 35uL from each RNAlater/hemocyte slurry.
  • Mixed with equal volume of H2O (35uL).
  • Retained DNA on the Zymo-Spin IC-XM columns at 4°C for isolation after RNA isolation.
  • DNA was eluted in 15uL H2O.

DNA was quantified on the Roberts Lab Qubit 3.0 using the 1x DNA High Sensitivity Assay (Invitrogen), using 1uL of each sample.
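
Because the Qubit reports a concentration (ng/uL), the total yield for a sample follows from multiplying by the elution volume. A minimal sketch of that arithmetic, assuming a hypothetical helper `total_yield_ng` and a made-up concentration of 12.5 ng/uL (not a measured value from these samples):

```shell
#!/bin/bash
# Hypothetical helper: total yield (ng) = concentration (ng/uL) x volume (uL).
total_yield_ng() {
  local conc_ng_per_ul="$1"
  local volume_ul="$2"
  # awk handles the floating-point multiplication that bash arithmetic cannot
  awk -v c="${conc_ng_per_ul}" -v v="${volume_ul}" 'BEGIN { printf "%.1f\n", c * v }'
}

# Made-up example: 12.5 ng/uL in the 15uL elution volume
total_yield_ng 12.5 15
```

Note that 1uL of each eluate was consumed by the assay, so the volume remaining for downstream use would be 14uL rather than 15uL.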

Kaitlyn’s notebook: C. bairdi hemocyte pellet RNA isolation and quantification

Samples

Isolated RNA from the following hemolymph pellet samples:

  • 58
  • 61
  • 64
  • 72
  • 79
  • 93
  • 94
  • 96
  • 99
  • 110
  • 114
  • 126
  • 140
  • 155
  • 165
  • 173
  • 180
  • 208
  • 252
  • 256

Samples 114 and 126 were previously done successfully, but due to changes in spreadsheet organization, they were repeated accidentally.

The following samples could not be found, all of which have green caps:

  • 6
  • 43
  • 76
  • 130

Samples 58 and 61 were clear (left) in contrast to the cloudy appearance of the other samples (right).

[Image: 20200127_134701.jpg]

RNA Isolations

Isolated RNA using the Quick DNA/RNA Microprep Kit (ZymoResearch) according to the manufacturer’s protocol for liquids/cells in RNAlater.

  • Used 35uL from each RNAlater/hemocyte slurry.
    • Except for sample 58, where all 20uL of sample was used.
  • Mixed with equal volume of H2O and 8x lysis buffer
  • Retained DNA on the Zymo-Spin IC-XM columns for isolation after RNA isolation.
    • Held at 4C
  • Performed on-column DNase step.
  • RNA was eluted in 15uL H2O

RNA quantification: HS Assay on Qubit

RNA was quantified on the Roberts Lab Qubit 3.0 using the RNA High Sensitivity Assay (Invitrogen), using 2uL of each sample.

  • For samples 58 and 61, which registered LOW:
    • Added 2uL more, for a total of 4uL in the assay tube.
      • Still too low.
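
Adding more sample to the assay tube raises the amount of RNA the instrument sees, which is why doubling from 2uL to 4uL is a reasonable rescue attempt for a LOW reading. A rough sketch of the dilution arithmetic, assuming the standard 200uL final assay volume for Qubit tubes; the helper name `stock_conc_ng_per_ul` and the example reading are hypothetical, not values from these samples:

```shell
#!/bin/bash
# Hypothetical helper: back-calculate stock concentration from the
# in-tube concentration, assuming a 200uL final assay volume.
stock_conc_ng_per_ul() {
  local tube_conc_ng_per_ul="$1"   # concentration in the assay tube
  local sample_ul="$2"             # volume of sample added to the tube
  awk -v t="${tube_conc_ng_per_ul}" -v s="${sample_ul}" \
    'BEGIN { printf "%.2f\n", t * 200 / s }'
}

# Made-up in-tube reading of 0.05 ng/uL, with 2uL and then 4uL of sample
stock_conc_ng_per_ul 0.05 2
stock_conc_ng_per_ul 0.05 4
```

For the same in-tube reading, twice the sample volume corresponds to half the stock concentration, so a sample that still reads LOW at 4uL is below roughly half of the lowest stock concentration detectable at 2uL.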

Samples are currently in a box in front of Rack 9 (all racks in the -80C are full), but I will add them to the Shellfish RNA boxes!