Kaitlyn’s notebook: 20190419 Pt. Whitney geoducks

Broodstock

Tank 6:


Tank 5:

20190419_090053.jpg

Tank 4:

20190419_090105-e1556035508146.jpg

Juveniles overwintered in heath stacks:

20190419_093604

Some signs of buildup/infestation:

  • The first geoduck is alive.
  • The second image contains two shells and a dead geoduck.

20190419_093750

20190419_093952.jpg

KCl priming experiment:

Additionally, Sam worked on an experiment to determine the optimum time for KCl egg priming. These are some key points about the experiment (more info is on Slack):

  • 6 x 10 experiment
  • Screened fertilized eggs (~1500) and took a sample from the screen 1 hour post-fertilization (concentrated sample).
  • Added eggs to glass bottles (0.3 μm filtered seawater)
  • Bottles stored in wine cooler

Sam screening bottles:

20190419_110109.jpg

Bottles were discarded because we did not see many D-hinge larvae:

  • Sam looked and believes he saw some evidence of fertilization (formation of polar bodies) in samples taken 1 hour post-fertilization, and will look through those samples instead.

 

Kaitlyn’s notebook: Geoduck heath tray setup

Elevated header tank:

IMG_20190419_085330

Dripper and outflow pipe to heath trays:

IMG_20190419_090149.jpg

Heath stack 1:

  • Red lines are elevated CO2; blue lines are ambient.
  • Nozzles on the line control flow into the tray.
  • Boards are covering the top trays.

20190419_090944

Elbow pipes on heath trays:

IMG_20190419_091714

Accessing the water lines:

  • It is easy to accidentally pull on the water line system when pulling out individual trays. There is access at the back of the trays so the water line can be detached, allowing a tray to be pulled out.
  • However, this does make the trays stick out into the walkway. We zip-tied poles onto the edge of the stack in front of the elbow pipes to prevent people from directly hitting anything.
  • Also, the elevated water treatment has some bubbles that are generated in the line.

Line from algal header tank:

  • Sam added a thinner line (connected below the green zip tie) to improve flow.
  • The line runs to an algal header tank in the breezeway that is filled in the mornings and feeds everything in this room. The tank runs out in the evening.

20190419_091615

Juvenile geoduck in H1T1:

20190419_141734.jpg

For comparison, here are 5-week-old industry geoducks:

20190419_100128.jpg

Interestingly, juveniles were crawling up the side in one industry tank:

20190419_100231.jpg

 

 

Sam’s Notebook: Data Analysis – C.virginica MBD Sequencing Coverage

A while ago, Steven tasked me with assessing some questions related to the sequencing coverage we get doing MBD-BSseq in this GitHub issue. At the heart of it was getting an idea of how much usable data we actually get from an MBD-BSseq project. Yaamini happened to have done an MBD-BSseq project relatively recently, and it’s one she’s actively analyzing, so we went with that data set.

Data was initially trimmed:

Subsequently, the data was concatenated, subset, and aligned using Bismark:

Today, I finally found the time to evaluate the alignment coverage of each of the Bismark sequence subsets. It was done in a Jupyter Notebook, solely with Bash/Python! I used this project as an excuse to dive into Python a bit more, instead of using R. For what I needed to accomplish, this approach was simpler (instead of creating an R project and all of that).
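For flavor, the core of this kind of coverage evaluation can be sketched in Bash alone. A minimal example, assuming Bismark’s standard six-column coverage format (chrom, start, end, %methylation, count methylated, count unmethylated); the file name and `summarize_cov` helper are hypothetical, not the actual notebook code:

```shell
# Summarize per-CpG read depth from a Bismark coverage file.
# Assumes the standard six-column .cov format:
#   chrom  start  end  %methylation  count_methylated  count_unmethylated
summarize_cov() {
  awk '{
    depth = $5 + $6              # reads covering this CpG
    total += depth
    n += 1
    if (depth >= 5) covered5x += 1
  }
  END {
    printf "CpGs: %d\n", n
    printf "Mean depth: %.2f\n", total / n
    printf "CpGs at >=5x: %d (%.1f%%)\n", covered5x, 100 * covered5x / n
  }' "$1"
}

# Hypothetical file name:
cov_file="sample1.bismark.cov"
[ -f "${cov_file}" ] && summarize_cov "${cov_file}" || true
```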

Sam’s Notebook: Metagenomics Annotation – P.generosa Water Samples Using BLASTn on Mox and KronaTools Visualization to Compare pH Treatments

Nearing the end of this quick metagenomics comparison of taxonomic differences between the two pH treatments (pH=7.1 and pH=8.2). Previously ran:

After this completes, I’ll run KronaTools to get a rundown of the taxonomic makeup of these two different pH treatments. I don’t expect BLASTn to take terribly long (based on previous metagenomics runs with this data set); I’d guess around 6 hrs.

SBATCH script (GitHub):

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=blastn_metagenomics
  ## Allocation Definition
  #SBATCH --account=coenv
  #SBATCH --partition=coenv
  ## Resources
  ## Nodes
  #SBATCH --nodes=1
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=25-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ##turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190416_metagenomics_pgen_blastn

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  wd="$(pwd)"
  threads=28

  # Paths to programs
  blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
  blastn="${blast_dir}/blastn"

  # Paths to blastdbs
  blastdb_dir="/gscratch/srlab/blastdbs/ncbi-nr-nt-v5"
  blast_db="${blastdb_dir}/nt"

  # Directory with MetaGeneMark FastAs
  fasta_dir="/gscratch/scrubbed/samwhite/outputs/20190416_metagenomics_pgen_metagenemark"

  # Export BLAST database directory
  export BLASTDB=${blastdb_dir}

  # Loop through MetaGeneMark nucleotide FastAs
  # Create list of those FastAs for reference
  # Parse out sample names
  # Run BLASTn on each FastA
  for fasta in ${fasta_dir}/*nucleotides.fasta
  do
    echo ${fasta} >> input.fasta.list.txt
    no_ext=${fasta%%.*}
    sample_name=$(echo ${no_ext##*/})

    # Run BLASTn on FastA
    ${blastn} \
    -query ${fasta} \
    -db ${blast_db} \
    -max_target_seqs 1 \
    -outfmt "6 std staxids" \
    -evalue 1e-10 \
    -num_threads ${threads} \
    > ${wd}/${sample_name}.blastn.outfmt6
  done
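Once the BLASTn jobs finish, the `staxids` column is what KronaTools cares about. A minimal sketch of pulling the query/taxid pairs out of the output (with `-outfmt "6 std staxids"`, the query ID is column 1 and the taxid is column 13); the file name and `extract_taxids` helper here are hypothetical:

```shell
# Extract (query ID, taxid) pairs from a BLASTn "-outfmt 6 std staxids" file:
# the 12 standard columns come first, so staxids lands in column 13.
extract_taxids() {
  awk -F'\t' '{ print $1 "\t" $13 }' "$1"
}

# Hypothetical output file name from the job above:
blast_out="pH71.blastn.outfmt6"

# A common KronaTools pattern reads those columns directly, e.g.:
#   ktImportTaxonomy -q 1 -t 13 ${blast_out} -o pH71.krona.html
[ -f "${blast_out}" ] && extract_taxids "${blast_out}" || true
```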

Sam’s Notebook: Metagenomics Gene Prediction – P.generosa Water Samples Using MetaGeneMark on Mox to Compare pH Treatments

Continuing with a relatively quick comparison of pH treatments (pH=7.1 vs. pH=8.2), I wanted to run gene prediction on the MEGAHIT assemblies I made yesterday. I ran MetaGeneMark on the two pH-specific assemblies on Mox. This should be a very fast process (I’m talking, like a couple of minutes fast), so it enhances the annotation with very little effort and time.

This will output a nucleotides FastA, a proteins FastA, and a GFF for each of the two assemblies (i.e. pH treatments).

SBATCH script (GitHub):

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=mgm
  ## Allocation Definition
  #SBATCH --account=coenv
  #SBATCH --partition=coenv
  ## Resources
  ## Nodes
  #SBATCH --nodes=1
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=20-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ##turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190416_metagenomics_pgen_metagenemark

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Load Open MPI module for parallel, multi-node processing
  module load icc_19-ompi_3.1.2

  # SegFault fix?
  export THREADS_DAEMON_MODEL=1

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  # Programs
  gmhmmp="/gscratch/srlab/programs/MetaGeneMark_linux_64_3.38/mgm/gmhmmp"
  mgm_mod="/gscratch/srlab/programs/MetaGeneMark_linux_64_3.38/mgm/MetaGeneMark_v1.mod"
  samtools="/gscratch/srlab/programs/samtools-1.9/samtools"

  # Variables
  assemblies_dir=/gscratch/scrubbed/samwhite/outputs/20190415_metagenomics_pgen_megahit

  ## Initialize array
  assemblies_array=()

  # Populate array with MegaHit FastAs
  assemblies_array=$(find ${assemblies_dir} -maxdepth 3 -name "*.contigs.fa")

  # List of files in array
  printf "%s\n" "${assemblies_array[@]}" >> fastas.list.txt

  # Loop through array and run MetaGeneMark
  # Parse out sample name by removing .contigs.fa from filename
  # and remove path
  for sample in ${assemblies_array[@]}
  do
    no_ext=${sample%%.*}
    sample_name=$(echo ${no_ext##*/})

    # Run MetaGeneMark
    ## Specifying the following:
    ### -a : output predicted proteins
    ### -A : write predicted proteins to designated file
    ### -d : output predicted nucleotides
    ### -D : write predicted nucleotides to designated file
    ### -f 3 : Output format in GFF3
    ### -m : Model file (supplied with software)
    ### -o : write GFF3 to designated file
    ${gmhmmp} \
    -a \
    -A ${sample_name}.mgm-proteins.fasta \
    -d \
    -D ${sample_name}.mgm-nucleotides.fasta \
    -f 3 \
    -m ${mgm_mod} \
    ${sample} \
    -o ${sample_name}.mgm.gff
  done

  # Index FastAs
  for fasta in *.fasta
  do
    ${samtools} faidx ${fasta}
  done
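Once the GFFs exist, a quick way to compare the two treatments is to tally predicted genes per assembly. A minimal sketch that counts feature lines (non-comment, non-blank) in each GFF3, assuming the `*.mgm.gff` naming from the script above; the `count_gff_features` helper is hypothetical:

```shell
# Approximate gene count per assembly: count feature lines (lines that are
# neither comments nor blank) in each MetaGeneMark GFF3.
count_gff_features() {
  grep -c -v -e '^#' -e '^[[:space:]]*$' "$1"
}

for gff in *.mgm.gff
do
  [ -f "${gff}" ] || continue   # skip if the glob matched nothing
  printf "%s\t%s\n" "${gff}" "$(count_gff_features "${gff}")"
done
```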

Sam’s Notebook: Metagenome Assemblies – P.generosa Water Samples Trimmed HiSeqX Data Using Megahit on Mox to Compare pH Treatments

A report involving our work on the geoduck water metagenomics is due later this week and our in-depth analysis for this project using Anvi’o is likely to require at least another week to complete. Even though we have a broad overview of the metagenomic taxa present in these water samples, we don’t have data in a format for comparing across samples/treatments. So, I initiated our simplified pipeline (MEGAHIT > MetaGeneMark > BLASTn > KronaTools) for examining our metagenomic data of the two treatments:

  • pH=7.1
  • pH=8.2

I ran MEGAHIT on the trimmed HiSeqX data, but concatenated the corresponding pH treatment FastQ files to create a single assembly for each pH treatment.

SBATCH script (GitHub):

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=megahit
  ## Allocation Definition
  #SBATCH --account=coenv
  #SBATCH --partition=coenv
  ## Resources
  ## Nodes
  #SBATCH --nodes=1
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=25-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ##turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190415_metagenomics_pgen_megahit

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Load Open MPI module for parallel, multi-node processing
  module load icc_19-ompi_3.1.2

  # SegFault fix?
  export THREADS_DAEMON_MODEL=1

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  # Variables
  wd=$(pwd)
  fastq_dir=/gscratch/srlab/sam/data/metagenomics/P_generosa
  megahit=/gscratch/srlab/programs/megahit_v1.1.4_LINUX_CPUONLY_x86_64-bin/megahit
  bbmap_dir=/gscratch/srlab/programs/bbmap_38.34
  samtools=/gscratch/srlab/programs/samtools-1.9/samtools
  cpus=28

  ## Initialize arrays
  fastq_array_R1=()
  fastq_array_R2=()
  names_array=(pH71 pH82)

  # Create list of input FastQs used for concatenation
  # Concatenate all pH7.1 R1 FastQs
  # Uses parameter substitution to strip path
  for fastq in ${fastq_dir}/Library_Geoduck_MG_[367]*_R1_*.gz
  do
    echo ${fastq#${fastq_dir}} >> fastq.list.pH71.txt
    cat ${fastq} >> pH71.all.R1.fq.gz
  done

  # Concatenate all pH7.1 R2 FastQs
  for fastq in ${fastq_dir}/Library_Geoduck_MG_[367]*_R2_*.gz
  do
    echo ${fastq#${fastq_dir}} >> fastq.list.pH71.txt
    cat ${fastq} >> pH71.all.R2.fq.gz
  done

  # Concatenate all pH8.2 R1 FastQs
  for fastq in ${fastq_dir}/Library_Geoduck_MG_[125]*_R1_*.gz
  do
    echo ${fastq#${fastq_dir}} >> fastq.list.pH82.txt
    cat ${fastq} >> pH82.all.R1.fq.gz
  done

  # Concatenate all pH8.2 R2 FastQs
  for fastq in ${fastq_dir}/Library_Geoduck_MG_[125]*_R2_*.gz
  do
    echo ${fastq#${fastq_dir}} >> fastq.list.pH82.txt
    cat ${fastq} >> pH82.all.R2.fq.gz
  done

  # Populate R1 array with concatenated R1 FastQs
  for fastq in *R1*.fq.gz
  do
    fastq_array_R1+=(${fastq})
  done

  # Populate R2 array with concatenated R2 FastQs
  for fastq in *R2*.gz
  do
    fastq_array_R2+=(${fastq})
  done

  # Loop through samples
  for sample in ${!names_array[@]}
  do
    sample_name=$(echo ${names_array[sample]})
    mkdir ${sample_name} && cd ${sample_name}

    # Run Megahit using paired-end reads
    ${megahit} \
    -1 ${wd}/${fastq_array_R1[sample]} \
    -2 ${wd}/${fastq_array_R2[sample]} \
    --num-cpu-threads ${cpus} \
    --out-prefix ${sample_name}

    # Create FastA index file
    ${samtools} faidx megahit_out/${sample_name}.contigs.fa

    # Determine coverage
    ## Align reads with BBmap BBwrap
    ${bbmap_dir}/bbwrap.sh \
    ref=megahit_out/${sample_name}.contigs.fa \
    in1=${fastq_array_R1[sample]} \
    in2=${fastq_array_R2[sample]} \
    out=${sample_name}.aln.sam.gz

    ## Output contig coverage
    ${bbmap_dir}/pileup.sh \
    in=${sample_name}.aln.sam.gz \
    out=${sample_name}.coverage.txt

    # Return to working directory
    cd ${wd}
  done
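A quick sanity check after a run like this is to count contigs and total assembled bases in each contigs FastA. A minimal Bash sketch, assuming the job's output layout (e.g. `pH71/megahit_out/pH71.contigs.fa`); the `fasta_stats` helper is hypothetical:

```shell
# Rough assembly summary from a contigs FastA: number of contigs and
# total assembled bases.
fasta_stats() {
  awk '/^>/ { contigs += 1; next } { bases += length($0) }
       END { printf "contigs: %d\nbases: %d\n", contigs, bases }' "$1"
}

# Paths follow the job's expected output layout:
for fa in pH71/megahit_out/pH71.contigs.fa pH82/megahit_out/pH82.contigs.fa
do
  [ -f "${fa}" ] || continue   # skip assemblies that are not present
  echo "${fa}"
  fasta_stats "${fa}"
done
```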

Sam’s Notebook: FastQC – WGBS Sequencing Data from Genewiz Received 20190408

We received whole genome bisulfite sequencing (WGBS) data from Genewiz last week (20190408), so I ran FastQC on the files on my computer (swoose). FastQC results will be added to the Nightingales Google Sheet.

Each set of FastQs was processed with a Bash script. The script (ending in .sh) can be found in each corresponding output folder (see below).