Grace’s Notebook: Submitted 6 pooled crab samples to NWGC for QC

Today I submitted 6 pooled samples of Crab RNA to NWGC for QC. After they run QC, they’ll let us know what our sequencing options are.

GitHub Issue: #798

Pooled the samples originally on Nov 22nd, 2019

After pooling, I realized the samples had to be concentrated to meet NWGC’s minimum requirement of 50 ng/ul… so I attempted to concentrate them using a kit from Zymo after it arrived in the mail.

The attempt at concentrating was unsuccessful because I didn’t use the correct volume of RNA Binding Buffer (post: here).

Sam recommended I contact NWGC and ask if we can still get some sequencing done even though our samples are not at their requirements, which brought us to today!

I put the 6 pooled samples in a plate that was provided by Jeff Weiss at NWGC, and walked them over there.

They will do QC testing, and then let us know what our sequencing options are afterwards.

Here’s the manifest of what I submitted today:

Plate Well Location | Investigator Sample ID | Sex | Organism | Concentration (ng/uL) | Volume (uL) | Sample Source | Type of Sample | Suspended In | Extraction Method | Investigator Last Name
A:1 | D9_0 | Male | Chionoecetes bairdi | 15.7 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall
B:1 | D9_1 | Male | Chionoecetes bairdi | 17.6 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall
C:1 | D12_cold_0 | Male | Chionoecetes bairdi | 24.9 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall
D:1 | D12_cold_1 | Male | Chionoecetes bairdi | 26 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall
E:1 | D12_warm_0 | Male | Chionoecetes bairdi | 26 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall
F:1 | D12_warm_1 | Male | Chionoecetes bairdi | 24.4 | 33 | crab hemolymph | RNA | TE | Zymo Research: Quick-DNA/RNA Microprep Plus Kit | Crandall

Columns left blank in the manifest: Additional Sample ID, Family Number, Replacement, Date of Birth, Race, RNA Quality Score, Certified for dbGaP, dbGaP ID. The manifest template marks eight columns as "(Required)" and one as "(Required if RNAseq)".
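
As a rough illustration of the gap between these pools and the 50 ng/ul minimum, the sketch below computes the volume each pool would have to be reduced to in order to reach that concentration. It assumes no RNA is lost during concentration, which is optimistic, and `final_volume` is a hypothetical helper name, not something from the original post.

```shell
#!/bin/bash
# Hypothetical helper: volume (uL) a sample must be reduced to in order to
# reach a target concentration, ASSUMING no RNA is lost along the way.
# Usage: final_volume <conc ng/uL> <volume uL> <target ng/uL>
final_volume() {
  awk -v c="$1" -v v="$2" -v t="$3" 'BEGIN { printf "%.1f\n", (c * v) / t }'
}

# Concentrations and the 33 uL volume come from the manifest;
# 50 ng/uL is the NWGC minimum mentioned in the post.
for conc in 15.7 17.6 24.9 26 26 24.4; do
  printf '%s ng/uL x 33 uL -> %s uL at 50 ng/uL\n' \
    "${conc}" "$(final_volume "${conc}" 33 50)"
done
```

For the lowest-concentration pool (15.7 ng/uL in 33 uL, about 518 ng total), that works out to roughly 10.4 uL.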

from Grace’s Lab Notebook

Laura’s Notebook: February 2020 goals

Yikes, it’s been a few months …

  • Finish QuantSeq libraries – last step is to process some deployed juvenile Olys (RNA isolation, library prep)
  • Coordinate sequencing – UW or UMinnesota?
  • DMG and DMR analysis on Oly methylation data. Make sure that I’ve controlled for genotype (i.e. differences aren’t due to presence/absence of certain genes/loci) – does the filtering accomplish this?
  • Prepare and deliver presentation at Aquaculture America 2020
  • Revise Oly Temp/Food draft, and rough draft of introduction
  • Submit Polydora paper to Aquaculture Research
  • Meet OA/Reproduction deadlines

Also … Met with Krista – we are a go on the internship. I will get my hands on the data as soon as it’s ready (April?). She supports me doing the NSF INTERN in Fall/Winter, so I should continue pursuing that. I need to have a full analysis of the data by November at the latest.

from The Shell Game

Sam’s Notebook: Data Wrangling – Arthropoda and Alveolata Day and Treatment Taxonomic RNAseq FastQ Extractions

After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114, I extracted taxonomic-specific reads and aggregated each into basic Read 1 and Read 2 FastQs to simplify transcriptome assembly for C. bairdi and for Hematodinium. That was fine and all, but wasn’t fully thought through.

For gene expression analysis, I need the FastQs based on infection status and sample days. So, I need to modify the read extraction procedure to parse reads based on those conditions. I could’ve/should’ve done this originally, as I could’ve just assembled the transcriptome from the FastQs I’m going to generate now. Oh well.

As a reminder, the reason I’m doing this is that I realized that the FastA headers were incomplete and did not distinguish between paired reads. Here’s an example:

R1 FastQ header:

@A00147:37:HG2WLDMXX:1:1101:5303:1000 1:N:0:AGGCGAAG+AGGCGAAG

R2 FastQ header:

@A00147:37:HG2WLDMXX:1:1101:5303:1000 2:N:0:AGGCGAAG+AGGCGAAG

However, the reads extracted via MEGAN have FastA headers like this:

>A00147:37:HG2WLDMXX:1:1101:5303:1000
SEQUENCE1
>A00147:37:HG2WLDMXX:1:1101:5303:1000
SEQUENCE2

Those are a set of paired reads, but there’s no way to distinguish between R1/R2. This may not be an issue, but I’m not sure how downstream programs (e.g. Trinity) will handle duplicate FastA IDs as inputs. To avoid any headaches, I’ve decided to parse out the corresponding FastQ reads, which have the full header info.
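
To see whether a given MEGAN6 FastA file actually contains repeated IDs, something like the following could be used. This is only a sketch: `duplicated_fasta_ids` is a hypothetical helper name, and the two-record file is made up to mirror the example headers.

```shell
#!/bin/bash
# Hypothetical helper: print FastA record IDs that occur more than once.
# Keeps header lines, strips the leading ">", and uses only the first
# whitespace-delimited field as the ID.
duplicated_fasta_ids() {
  grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Made-up two-record file mirroring the MEGAN6 example headers.
example_fasta="$(mktemp)"
printf '%s\n' \
  '>A00147:37:HG2WLDMXX:1:1101:5303:1000' 'SEQUENCE1' \
  '>A00147:37:HG2WLDMXX:1:1101:5303:1000' 'SEQUENCE2' > "${example_fasta}"

# Prints the shared ID once, because it appears in both records.
duplicated_fasta_ids "${example_fasta}"
```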

Anyway, here’s a brief rundown of the approach:

  1. Create list of unique read headers from MEGAN6 FastA files.
  2. Use list with seqtk program to pull out corresponding FastQ reads from the trimmed FastQ R1 and R2 files.
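
The two steps above might look roughly like this in bash. It is a minimal sketch and not the code from the Jupyter notebook: the function names and file arguments are hypothetical. The only external call is `seqtk subseq <in.fq> <name.lst>`, which prints the records whose names appear in the list.

```shell
#!/bin/bash
# Step 1 (hypothetical helper): unique read IDs from a MEGAN6 FastA file.
# Strips the leading ">" and anything after the first space.
unique_read_ids() {
  grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# Step 2 (hypothetical helper): pull the matching records out of a trimmed
# FastQ file. Requires seqtk to be installed; not run in the demo below.
extract_fastq_reads() {
  local fastq="$1" id_list="$2" out="$3"
  seqtk subseq "${fastq}" "${id_list}" > "${out}"
}

# Demo of step 1 on a made-up FastA file with one paired set and one extra read.
demo_fasta="$(mktemp)"
printf '%s\n' \
  '>A00147:37:HG2WLDMXX:1:1101:5303:1000' 'ACGT' \
  '>A00147:37:HG2WLDMXX:1:1101:5303:1000' 'TTGA' \
  '>A00147:37:HG2WLDMXX:1:1101:6000:1000' 'GGCC' > "${demo_fasta}"
unique_read_ids "${demo_fasta}"
```

The same ID list would be passed to `extract_fastq_reads` once for the R1 file and once for the R2 file, so the extracted pairs stay in sync.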

The entire procedure is documented in a Jupyter Notebook below.

Jupyter notebook (GitHub):

Laura’s Notebook: Oly DMG analysis, Jan. 30th, 2020

Today I identified 46 differentially methylated genes between two Olympia oyster populations, Hood Canal and South Sound. This was performed using a binomial GLM and Chi-square tests. The script was adapted from Hollie Putnam’s script (/hputnam/Geoduck_Meth/master/RAnalysis/Scripts/GM.Rmd), which may have been adapted from the Lieu et al. 2018 paper.
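
For readers unfamiliar with the approach, a per-gene binomial GLM of this kind is often written as below. This is a generic sketch and is not taken from the analysis script: the symbols and the 0/1 coding of population are assumptions introduced here for illustration.

```latex
% Generic per-gene binomial GLM (assumed form, not copied from the script)
% m_{gs}: methylated read count for gene g in sample s
% n_{gs}: total read count (methylated + unmethylated)
% x_s:    0 for Hood Canal, 1 for South Sound (assumed coding)
\begin{align}
m_{gs} &\sim \mathrm{Binomial}(n_{gs},\, p_{gs}) \\
\log\frac{p_{gs}}{1 - p_{gs}} &= \beta_{0g} + \beta_{1g}\, x_s
\end{align}
% A Chi-square (likelihood-ratio) test of \beta_{1g} = 0 then asks whether
% methylation of gene g differs between the two populations.
```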

The analysis was performed in an RMarkdown notebook; please see it here: 09-DMG-analysis

Here are the GO terms associated with genes of known function. Some notes:
– 18 out of the 46 genes were annotated with GO terms
– 9 out of the 46 genes were annotated but did not have associated GO terms (may have to find those manually …)
– 19 out of the 46 genes were of unknown function

term ID | description | frequency | log10 p-value | uniqueness | dispensability
GO:0006468 | protein phosphorylation | 4.137% | -3.7877 | 0.40 | 0.00
GO:0006807 | nitrogen compound metabolic process | 38.744% | -2.2764 | 0.78 | 0.03
GO:0006207 | ‘de novo’ pyrimidine nucleobase biosynthetic process | 0.192% | -2.2764 | 0.46 | 0.06
GO:0006281 | DNA repair | 2.234% | -2.4853 | 0.50 | 0.20
GO:0006030 | chitin metabolic process | 0.077% | -1.6311 | 0.49 | 0.21
GO:0006520 | cellular amino acid metabolic process | 5.591% | -2.2764 | 0.42 | 0.35
GO:0006412 | translation | 5.686% | -2.4853 | 0.28 | 0.55
GO:0016567 | protein ubiquitination | 0.523% | -1.4336 | 0.44 | 0.56


from The Shell Game

Yaamini’s Notebook: Figuring Out ATAC-Seq

Looking at labwork and sequencing options

Previously, I hashed out a plan that involved doing ATAC-Seq with C. gigas tissues. Shelly helped me get some answers on what ATAC-Seq entails from some experts, and put those resources in this issue. The consensus is that:

  • frozen tissue is difficult to work with
  • the protocol will need to be optimized several times for different tissue types
  • it would be easier to optimize a protocol with live tissue

I shadowed Mac today while she worked on her embryo trial! Since there’s a chance I can do scATAC-Seq, I would need to learn how to dissociate cells. In talking to her, it seems like I’ll need to isolate nuclei no matter what I end up doing, so that’s a better method to look into first. Mac sent me a nuclei isolation protocol that I’ll review and figure out how to translate to frozen C. gigas tissues.

Going forward

  1. Determine what kits to use for RNA and DNA extractions and order necessary materials
  2. Test RNA extraction protocol with tissue in histology blocks
  3. Start processing frozen tissues
  4. Extract DNA and RNA from larvae
  5. Identify an ATAC-Seq protocol and start testing it
  6. Figure out what to do with C. virginica sperm and other potential samples


from the responsible grad student

Laura’s Notebook: Oly methylation analysis, Jan. 29th 2020

From the last meeting with Katherine and Steven, these were my tasks:

  • Re-do MACAU with pre-filtered count files <-- DONE
  • Reannotate new MACAU results <-- DONE
  • Reannotate DMLs <-- DONE (unless I need to re-do DML filter settings)
  • DMG analysis
  • Figure out something comparable to Fst on DMLs and MACAU to do a correlation analysis
  • GO_MWU analysis redo with genes:
  • New methylation distance matrix to include just loci with new 75% threshold
  • Get Manhattan distance for just DMLs
  • Get methods down on paper ASAP

Sam’s Notebook: Transcriptome Annotation – Trinotate Hematodinium MEGAN6 Taxonomic-specific Trinity Assembly on Mox

After performing de novo assembly on our Hematodinium MEGAN6 taxonomic-specific RNAseq data on 20200122 and performing BLASTx annotation on 20200123, I continued the annotation process by running Trinotate.

Trinotate will perform functional annotation of the transcriptome assembly, including GO terms and an annotation feature map that can be used in subsequent Trinity-based differential gene expression analysis so that functional annotations are carried downstream through that process.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=trinotate_cbi
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=05-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200126_hemat_trinotate_megan

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

wd="$(pwd)"
timestamp=$(date +%Y%m%d)
species="hemat"

prefix="${timestamp}.${species}.trinotate"

## Paths to input/output files

## New folders for working directory
rnammer_out_dir="${wd}/RNAmmer_out"
signalp_out_dir="${wd}/signalp_out"
tmhmm_out_dir="${wd}/tmhmm_out"

# Input files
blastp_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/blastp_out/20200123.hemat.blastp.outfmt6"
blastx_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_diamond_blastx_megan/20200122.hemat.megan.Trinity.blastx.outfmt6"
pfam_out="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/pfam_out/20200123.hemat.pfam.domtblout"
lORFs_pep="/gscratch/scrubbed/samwhite/outputs/20200123_hemat_transdecoder_megan/20200122.hemat.megan.Trinity.fasta.transdecoder_dir/longest_orfs.pep"
trinity_fasta="/gscratch/srlab/sam/data/Hematodinium/transcriptomes/20200122.hemat.megan.Trinity.fasta"
trinity_gene_map="/gscratch/srlab/sam/data/Hematodinium/transcriptomes/20200122.hemat.megan.Trinity.fasta.gene_trans_map"

rnammer_prefix=${trinity_fasta##*/}

# Output files
rnammer_out="${rnammer_out_dir}/${rnammer_prefix}.rnammer.gff"
signalp_out="${signalp_out_dir}/${prefix}.signalp.out"
tmhmm_out="${tmhmm_out_dir}/${prefix}.tmhmm.out"
trinotate_report="${wd}/${prefix}_annotation_report.txt"

# Paths to programs
rnammer_dir="/gscratch/srlab/programs/RNAMMER-1.2"
rnammer="${rnammer_dir}/rnammer"
signalp_dir="/gscratch/srlab/programs/signalp-4.1"
signalp="${signalp_dir}/signalp"
tmhmm_dir="/gscratch/srlab/programs/tmhmm-2.0c/bin"
tmhmm="${tmhmm_dir}/tmhmm"
trinotate_dir="/gscratch/srlab/programs/Trinotate-v3.1.1"
trinotate="${trinotate_dir}/Trinotate"
trinotate_rnammer="${trinotate_dir}/util/rnammer_support/"
trinotate_GO="${trinotate_dir}/util/"
trinotate_features="${trinotate_dir}/util/"
trinotate_sqlite_db="Trinotate.sqlite"

# Make output directories
mkdir "${rnammer_out_dir}" "${signalp_out_dir}" "${tmhmm_out_dir}"

# Copy sqlite database template
cp ${trinotate_dir}/admin/Trinotate.sqlite .

# Run signalp
${signalp} \
-f short \
-n "${signalp_out}" \
${lORFs_pep}

# Run tmHMM
${tmhmm} \
--short \
< ${lORFs_pep} \
> "${tmhmm_out}"

# Run RNAmmer
cd "${rnammer_out_dir}" || exit
${trinotate_rnammer} \
--transcriptome ${trinity_fasta} \
--path_to_rnammer ${rnammer}
cd "${wd}" || exit

# Run Trinotate
## Load transcripts and coding regions into database
${trinotate} \
${trinotate_sqlite_db} \
init \
--gene_trans_map "${trinity_gene_map}" \
--transcript_fasta "${trinity_fasta}" \
--transdecoder_pep "${lORFs_pep}"

## Load BLAST homologies
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastp \
"${blastp_out}"

"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_swissprot_blastx \
"${blastx_out}"

## Load Pfam
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_pfam \
"${pfam_out}"

## Load transmembrane domains
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_tmhmm \
"${tmhmm_out}"

## Load signal peptides
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_signalp \
"${signalp_out}"

## Load RNAmmer
"${trinotate}" \
"${trinotate_sqlite_db}" \
LOAD_rnammer \
"${rnammer_out}"

## Create annotation report
"${trinotate}" \
"${trinotate_sqlite_db}" \
report \
> "${trinotate_report}"

# Extract GO terms from annotation report
"${trinotate_GO}" \
--Trinotate_xls "${trinotate_report}" \
-G \
--include_ancestral_terms \
> "${prefix}".go_annotations.txt

# Make transcript features annotation map
"${trinotate_features}" \
"${trinotate_report}" \
> "${prefix}".annotation_feature_map.txt
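
Because this job depends on output files from several earlier runs (BLASTp, BLASTx, Pfam, TransDecoder, Trinity), a pre-flight check near the top of the script could catch a bad path before any of the longer steps start. The snippet below is a sketch of that idea and is not part of the original script; `check_inputs` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): return non-zero if any
# of the listed files is missing or empty, reporting each one on stderr.
check_inputs() {
  local missing=0
  local f
  for f in "$@"; do
    if [[ ! -s "${f}" ]]; then
      echo "Missing or empty input: ${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# In the SBATCH script this might be called right after the input variables
# are defined, for example:
# check_inputs "${blastp_out}" "${blastx_out}" "${pfam_out}" "${lORFs_pep}" \
#   "${trinity_fasta}" "${trinity_gene_map}"
```

With `set -e` active, a non-zero return from `check_inputs` would stop the job before signalp, tmhmm, or RNAmmer are run.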