Sam’s Notebook: Transcriptome Annotation – Geoduck Gonad with BLASTx on Mox

I’ll be annotating the transcriptome assembly (from 20190215) using Trinotate. Part of that process requires BLASTx output for the Trinity assembly, so I ran BLASTx on Mox.

SBATCH script:

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=blastx_gonad_01
  ## Allocation Definition
  #SBATCH --account=coenv
  #SBATCH --partition=coenv
  ## Resources
  ## Nodes
  #SBATCH --nodes=1
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=25-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ## Turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190318_blastx_geoduck_gonad_01_RNAseq

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  wd="$(pwd)"

  # Paths to input/output files
  blastx_out="${wd}/blastx.outfmt6"
  sp_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep"
  trinity_fasta="/gscratch/scrubbed/samwhite/outputs/20190215_trinity_geoduck_gonad_01_RNAseq/trinity_out_dir/Trinity.fasta"

  # Paths to programs
  blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
  blastx="${blast_dir}/blastx"

  # Run blastx on Trinity fasta
  ${blastx} \
  -query ${trinity_fasta} \
  -db ${sp_db} \
  -max_target_seqs 1 \
  -outfmt 6 \
  -evalue 1e-3 \
  -num_threads 28 \
  > ${blastx_out}
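With `-outfmt 6`, BLASTx writes a tab-delimited table whose twelve standard columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. A minimal Python sketch (file path and function name are mine, not from the job above) for pulling the best hit per Trinity transcript from such a table:

```python
import csv

# Standard BLAST tabular (-outfmt 6) column names
COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
        "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(path):
    """Return the lowest-e-value hit per query from a tab-delimited outfmt 6 file.

    With -max_target_seqs 1 there is usually one hit per query already,
    but this also handles multi-hit tables safely.
    """
    best = {}
    with open(path) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            rec = dict(zip(COLS, row))
            q = rec["qseqid"]
            if q not in best or float(rec["evalue"]) < float(best[q]["evalue"]):
                best[q] = rec
    return best
```

Something like `best_hits("blastx.outfmt6")` then gives one UniProt accession per transcript, which is the shape Trinotate expects downstream.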

Sam’s Notebook: Transcriptome Annotation – Geoduck Ctenidia with BLASTx on Mox

I’ll be annotating the transcriptome assembly (from 20190215) using Trinotate. Part of that process requires BLASTx output for the Trinity assembly, so I ran BLASTx on Mox.

SBATCH script:

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=blastx_ctendia
  ## Allocation Definition
  #SBATCH --account=coenv
  #SBATCH --partition=coenv
  ## Resources
  ## Nodes
  #SBATCH --nodes=1
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=25-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ## Turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190318_blastx_geoduck_ctenidia_RNAseq

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  wd="$(pwd)"

  # Paths to input/output files
  blastx_out="${wd}/blastx.outfmt6"
  sp_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep"
  trinity_fasta="/gscratch/scrubbed/samwhite/outputs/20190215_trinity_geoduck_ctenidia_RNAseq/trinity_out_dir/Trinity.fasta"

  # Paths to programs
  blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
  blastx="${blast_dir}/blastx"

  # Run blastx on Trinity fasta
  ${blastx} \
  -query ${trinity_fasta} \
  -db ${sp_db} \
  -max_target_seqs 1 \
  -outfmt 6 \
  -evalue 1e-3 \
  -num_threads 28 \
  > ${blastx_out}

Shelly’s Notebook: Thurs. Mar 14, 2019, Oyster Seed Proteomics

Mapping to SR lab GO slim terms

  • I was unable to get semantic similarity for these terms (which is needed in order to relate proteins to one another through their terms) because they don’t map to terms in the goslim_generic.obo file or to the GO data that ontologyX parses. So I’m not going to use these.

Cleaning up analysis for poster figs

See analysis “CreateNodeAnd0.5GoSemSimEdgeAttr_ChiSqPval0.1_ASCA_EvalCutoff.R” to make files to import to cytoscape to generate poster figs.

quick summary of code:

make protein node attribute file

  1. read in fold-change and chisq pvalue data
  2. extract proteins that mapped to uniprot DB with an e-value cutoff of 10^-10
  3. select proteins with FDR-corrected ChiSq prop test pvalue < 0.1
  4. read in ASCA data and combine with ChiSq prop test pvalue 0.1 selected proteins to make a comprehensive list of selected proteins with their GO terms
  5. Translate all GO terms of selected proteins to GO slim terms:
  • use GSEA to make a list of BP and MF GO slims in the data
  • use OntologyX to make a list of GO IDs in the data and the other GO IDs (ancestor, parent, child) that they map to
  • parse out GO slims from the ‘other GO IDs’ column to make a file that contains the original GO ID and the GO slim ID
  • ignore GO slim terms that are too broad (“GO:0008150 GO:0003674 GO:0005575”)
  • 130 unique proteins remain after steps above
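The filtering in steps 2, 3, and the broad-term drop above can be sketched as follows (a Python sketch with hypothetical field names; the actual analysis is the R script named above):

```python
# Roots of the three GO ontologies (biological_process, molecular_function,
# cellular_component) -- too broad to be informative as slims
BROAD_SLIMS = {"GO:0008150", "GO:0003674", "GO:0005575"}

def select_proteins(records, evalue_cutoff=1e-10, pval_cutoff=0.1):
    """Keep proteins with a strong uniprot mapping (step 2) and a significant
    FDR-corrected ChiSq prop test p-value (step 3). Field names are hypothetical."""
    return [r for r in records
            if r["evalue"] <= evalue_cutoff and r["fdr_pval"] < pval_cutoff]

def drop_broad_slims(goslim_map):
    """Remove the three ontology-root GO slim terms from each GO ID's slim set."""
    return {go: slims - BROAD_SLIMS for go, slims in goslim_map.items()}
```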

make edge attribute file

  1. Calculate GO semantic similarities for all GO slim terms using OntologySimilarity function, and only select GO relationships that have >=0.5 similarity
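The >=0.5 edge filter can be sketched like this (Python; the notebook’s actual calculation uses the OntologySimilarity function in R, and `sim` here stands in for any pairwise GO semantic-similarity function):

```python
from itertools import combinations

def similarity_edges(terms, sim, threshold=0.5):
    """Build a cytoscape-style edge list, keeping only GO slim pairs whose
    semantic similarity meets the threshold. `sim(a, b)` is any pairwise
    similarity function returning a value in [0, 1]."""
    return [(a, b, sim(a, b))
            for a, b in combinations(terms, 2)
            if sim(a, b) >= threshold]
```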

make GO node attribute file

  1. Calculate magnitude foldchange normalized by number of proteins to use as a GO node attribute
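One plausible reading of “magnitude foldchange normalized by number of proteins”, sketched in Python (the real computation lives in the R script above; names here are mine):

```python
def go_node_magnitude(protein_fc, go_to_proteins):
    """For each GO slim term, sum |fold change| over the proteins annotated
    to it, then divide by the number of those proteins, so big terms don't
    dominate just by having more members."""
    return {go: sum(abs(protein_fc[p]) for p in prots) / len(prots)
            for go, prots in go_to_proteins.items() if prots}
```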

loaded data into cytoscape to make poster figs

Should all stat. methods (clustering, ASCA, and Chi sq. proportion test) be used for selecting altered proteins?

Going back to the initial protein selection to determine which methods should be used, by looking at the differences in protein abundance between temperatures for the proteins each method identifies.

  • Originally there were some proteins selected (either by ASCA, Clustering, or ChiSq prop test) that don’t appear to show a significant fold change between temperatures (see fig in ppt). It would be good to see how these proteins were identified as significantly changed and determine if one method should not be used. Refer to VerifyStatsProteinSelection.R for analysis.

I plotted protein abundances as total number of spectra AND as average NSAF values because the Chi sq proportions test was done on total number of spectra and ASCA was done on NSAF values. There’s no reason ASCA or clustering couldn’t be done on total number of spectra, but we did it on average NSAF values per the pipeline developed by Emma.
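For reference, NSAF (Normalized Spectral Abundance Factor) is conventionally computed as each protein’s spectral count divided by its length, rescaled so values sum to 1 within a sample. A minimal sketch (not the pipeline code itself):

```python
def nsaf(spectral_counts, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j).

    spectral_counts: protein -> total spectra in one sample
    lengths: protein -> protein length (residues)
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}
```

Because NSAF divides by protein length and by the per-sample total, the same proteins can rank differently on NSAF than on raw total spectra, which may be part of why the two sets of plots below show different patterns.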

Protein abundances (total number of spectra) of ASCA selected proteins

Protein abundances (average NSAF values) of ASCA selected proteins

Protein abundances (total number of spectra) of Chi sq pval 0.1 selected proteins

Protein abundances (average NSAF values) of Chi sq pval 0.1 selected proteins

Protein abundances (total number of spectra) of clustering selected proteins

Protein abundances (average NSAF values) of clustering selected proteins

  • interestingly, some average NSAF values show a different pattern over time than total num spectra do. I don’t really know what this means
  • also these plots show the selected proteins seem to show a difference in abundance (whether avg NSAF value or TotNumSpec value) between treatments on at least one day. So maybe something went wrong the first time I plotted this in cytoscape and it wasn’t displaying correctly?
  • clustering only identified one protein uniquely, so we don’t gain much by including that method, except for validation of the other methods, since 13/14 proteins identified by clustering overlap with the other methods
    • kaitlyn proposed we could use clustering after ASCA and prop test selection of proteins
      • we could take all proteins, mapped and unmapped and cluster, then assign a cluster number that can be used in network mapping
        • this might be cool for trying to infer functions of unknown proteins:
        • for each cluster, we could calculate the normalized magnitude foldchange and convert from protein to GO term (for unknown proteins, we would assign the same GO terms as are in the cluster) and see how including magnitude FC of unknown proteins changes the GO networks. This could be too much of a tangent though.

from shellytrigg https://ift.tt/2USdXBa
via IFTTT

Grace’s Notebook: RNeasy Extraction Day 1 on 24 samples

Today I tried out the new plan for extracting RNA. It took quite a long time and none of the 24 samples had detectable RNA. Details in post.

Set up and preparation

The longest part of this whole thing was labeling tubes.

I labeled:

  • 24 RNase-free snap cap tubes (for the 15ul of slurry)
  • 24 QIA shredder columns (cap and side of tube)
  • 24 gDNA columns
  • 24 RNeasy MinElute columns (had to do this right before use because they’re supposed to be cold… but it took forever, so they probably weren’t cold)
  • 24 1.5ml snap cap tubes that contain the eluted RNA

I prepared solutions for 24 samples plus extra:

  • 70% ethanol (7mL ethanol and 3mL DEPC-H2O)
  • 80% ethanol (10mL ethanol and 2.5mL DEPC-H2O)
  • Buffer RLT Plus and B-ME (9mL Buffer RLT and 90ul B-ME)
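A quick arithmetic check that the listed volumes give the intended ethanol percentages (just a sketch, not part of the protocol):

```python
def percent_ethanol(ethanol_ml, water_ml):
    """Volume percent of ethanol in the final mix."""
    return 100 * ethanol_ml / (ethanol_ml + water_ml)

# 7 mL ethanol + 3 mL DEPC-H2O   -> 70% in 10 mL
# 10 mL ethanol + 2.5 mL DEPC-H2O -> 80% in 12.5 mL
```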

Sampling out slurry

I selected 24 samples: two from each of the 12 temp treatment/infection status groups

  Tube number   sample day   infection status   temp trtmnt
  135           9            0                  NA
  31            9            0                  NA
  90            9            1                  NA
  142           9            1                  NA
  317           12           0                  Amb
  342           12           0                  Amb
  352           12           1                  Amb
  326           12           1                  Amb
  242           12           0                  Cold
  236           12           0                  Cold
  234           12           1                  Cold
  212           12           1                  Cold
  285           12           0                  Warm
  266           12           0                  Warm
  271           12           1                  Warm
  291           12           1                  Warm
  499           26           0                  Amb
  506           26           0                  Amb
  468           26           1                  Amb
  503           26           1                  Amb
  458           26           0                  Cold
  419           26           0                  Cold
  438           26           1                  Cold
  455           26           1                  Cold


This part also took a really long time. For one, finding the tubes in the -80 took some time because I did not place them in there in number order.

Additionally, it took a long time to let them thaw, vortex for a few seconds, and then sample out 15ul of the slurry.

Thawed hemolymph slurry (photo in original post).


Starting protocol finally (1.5hrs for 24 samples)

  1. Added 250ul of Buffer RLT + B-ME (did under hood in 209 because it smells awful)
  2. Vortexed all for a few seconds
  3. Transferred contents to QIA shredder columns (under hood as well because stinky)
  4. Centrifuge 2min full speed (takes a while putting in and taking out 24 tubes)
  5. Transfer flow-through to gDNA eliminator column with 2ml collection tubes. Centrifuge 30s at full speed. Discard column. Save flow-through. (While this was happening, I was furiously unwrapping and labeling RNeasy MinElute columns… took a long time… samples sat in centrifuge for a few mins…)
  6. Add 350ul 70% ethanol (pipetted individually). Mix by pipetting.
  7. Transfer sample to RNeasy MinElute column. Close lids. Vacuum.
  8. Add 750ul of Buffer RW1 (used repeat pipet- amazing!). Close lid. Vacuum.
  9. Add 500ul of Buffer RPE (used repeat pipet). Close lid. Vacuum.
  10. Add 500ul 80% ethanol (used repeat pipet). I think I miscalculated my volumes in the preparation, because I ran out of it and had to run and make more. Close lid. Vacuum 5mins. (While vacuuming 5mins, I was labeling the 1.5ml snap cap tubes).
  11. Put RNeasy column in new 1.5ml snap cap. Add 14ul RNase free water to center of membrane. Cut off pink lids. Centrifuge for 1min at full speed.


Qubit

Made working solution: 5.6mL Buffer for RNA HS + 28ul RNA HS dye

Made standards: 10ul of each + 190ul working solution

Ran 1ul of each sample (added 199ul working solution)

Vortexed all.
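A sanity check on working-solution volumes (a sketch using the per-tube volumes stated above: 199ul per sample tube and 190ul per standard tube):

```python
def qubit_ws_needed(n_samples, n_standards=2, sample_ws_ul=199, standard_ws_ul=190):
    """Total working solution (ul) needed: each sample tube gets 199 ul
    working solution + 1 ul sample; each standard gets 190 ul + 10 ul standard."""
    return n_samples * sample_ws_ul + n_standards * standard_ws_ul

# 5.6 mL buffer + 28 ul dye made here (~5628 ul total) comfortably covers
# the 5156 ul needed for 24 samples + 2 standards.
```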

ALL TUBES READ “OUT OF RANGE, TOO LOW”

-80
Put the hemolymph pellets that I thawed and used 15ul of in -80 (Rack 7, col 2, row 2)
Put eluted “RNA” samples in -80 (Rack 7, col 3, row 1)

from Grace’s Lab Notebook https://ift.tt/2Yb0Pcr
via IFTTT