Sam’s Notebook: Assembly – Geoduck NovaSeq using SparseAssembler (TL;DR – it worked!)


The prior attempt using SparseAssembler failed due to a kmer size that was deemed too large.

For this run, I arbitrarily reduced the kmer size by ~half (k 61) in hopes that this will at least get through an assembly. We can potentially explore the effects of kmer size on assemblies if/when this runs, depending on how the assembly looks.

The job was run on our Mox node.

Here’s the batch script to initiate the job:

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=20180313_sparse_assembler_geo_novaseq
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes (We only get 1, so this is fixed)
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=30-00:00:00
    ## Memory per node
    #SBATCH --mem=500G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/scrubbed/samwhite/20180312_SparseAssembler_novaseq_geoduck

    /gscratch/srlab/programs/SparseAssembler/SparseAssembler \
    LD 0 NodeCovTh 1 EdgeCovTh 0 k 61 g 15 PathCovTh 100 GS 2200000000 \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R2_001_val_2_val_2.fastq \
    i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R1_001_val_1_val_1.fastq i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R2_001_val_2_val_2.fastq
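The i1/i2 input list follows a fixed naming pattern, so it could be generated rather than typed by hand. A minimal Python sketch, assuming every trimmed file follows the `<sample>_L00<lane>_R<read>_001_val_<read>_val_<read>.fastq` pattern seen in the script; the helper names (`fastq_path`, `sparse_assembler_inputs`) are hypothetical, not part of SparseAssembler.

```python
from pathlib import PurePosixPath

# Directory and sample prefixes copied from the batch script.
TRIMMED_DIR = "/gscratch/scrubbed/samwhite/20180129_trimmed_again"
SAMPLES = [
    "AD002_S9", "NR005_S4", "NR006_S3", "NR012_S1", "NR013_AD013_S2",
    "NR014_AD014_S5", "NR015_AD015_S6", "NR019_S7", "NR021_S8",
]
LANES = (1, 2)


def fastq_path(sample: str, lane: int, read: int) -> str:
    """Build one trimmed FASTQ path (assumes the double-trimmed naming pattern)."""
    name = f"{sample}_L00{lane}_R{read}_001_val_{read}_val_{read}.fastq"
    return str(PurePosixPath(TRIMMED_DIR) / name)


def sparse_assembler_inputs(samples, lanes=LANES):
    """Return the flat i1/i2 argument list: one R1/R2 pair per sample and lane."""
    args = []
    for sample in samples:
        for lane in lanes:
            args += ["i1", fastq_path(sample, lane, 1),
                     "i2", fastq_path(sample, lane, 2)]
    return args


if __name__ == "__main__":
    # Print one "i1 <R1> i2 <R2>" pair per line, backslash-continued for a shell script.
    args = sparse_assembler_inputs(SAMPLES)
    pairs = [" ".join(args[i:i + 4]) for i in range(0, len(args), 4)]
    print(" \\\n".join(pairs))
```

With nine samples sequenced on two lanes, this yields the same 18 read pairs passed to SparseAssembler above.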

Output folder: 20180312_SparseAssembler_novaseq_geoduck

IT WORKED!!! At last, we have an assembly of the geoduck NovaSeq data!! It took ~10 days to complete.

The primary output file of interest is this FASTA file:

In order to get a rough idea of how this assembly looks, I ran it through QUAST (Version: 4.5, 15ca3b9):

python software/quast-4.5/ \
-t 16

Quast output folder: results_2018_03_22_08_12_12

Here’re the stats on the assembly:

Quast output (text): results_2018_03_22_08_12_12/report.txt

Quast output (HTML): results_2018_03_22_08_12_12/report.html


Overall, the assembly doesn’t look great. The N50 = 645 is really, really low; one would hope for a much larger number in a quality assembly. As it stands, this assembly consists of many small contigs.
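For reference, N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition (QUAST reports it after applying its `--min-contig` length filter, 500 bp by default); the contig lengths in the test are toy values, not from this assembly.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # reached half the total assembly size
            return length
```

A low N50 like 645 means half of the assembled bases sit in contigs only a few hundred bp long.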

Looks like we’ll have to fiddle with the kmer size used for SparseAssembler and see if we can improve upon this.

Despite that, it’s an accomplishment to finally get any sort of assembler to run to completion for this data set!

from Sam’s Notebook


Grace’s Notebook: RNA Isolation and DIA Continued

RNA Isolation

I isolated RNA from 9 more samples.

Note that sample 305 is actually labeled as “#4”.

During the isolation, I wasn’t able to see any obvious pellets in tubes 66 and 26 (infected, first-day samples), unlike in all the other tubes.


I am in the process of checking the error rate. A lot of the peptides look good.

Below is an example of one that I’m not 100% sure about, but I am giving it a “0” score because the peak boundaries are off and because the peaks are slightly different. So, currently, I am only giving “1” scores to peptides that are perfect across all samples. Not sure if this is reasonable.

from Grace’s Lab Notebook

Sam’s Notebook: DNA Isolation & Quantification – Geoduck larvae metagenome filter rinses


Isolated DNA from two of the geoduck hatchery metagenome samples Emma delivered on 20180313 to get an idea of what type of yields we might get from these.

  • MG 5/15 #8
  • MG 5/19 #6

As mentioned in my notebook entry upon receipt of these samples, I’m a bit skeptical that we’ll get any sort of recovery, based on how the samples were preserved.

Isolated DNA using DNAzol (MRC, Inc.) in the following manner:

  1. Added 1mL of DNAzol to each sample; mixed by pipetting.
  2. Added 0.5mL of 100% ethanol; mixed by inversion.
  3. Pelleted DNA 5,000g x 5mins @ RT.
  4. Discarded supernatants.
  5. Washed pellets (not visible) with 1mL 75% ethanol by dribbling down side of tubes.
  6. Pelleted DNA 5,000g x 5mins @ RT.
  7. Discarded supernatants and dried pellets for 5mins.
  8. Resuspended DNA in 20uL of Buffer EB (Qiagen).

Samples were quantified using the Roberts Lab Qubit 3.0 with the Qubit High Sensitivity dsDNA Kit (Invitrogen).

5uL of each sample was used.
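To put a Qubit reading in terms of total yield, the assay-tube concentration has to be scaled back to the original sample. A minimal Python sketch, assuming the standard 200 µL Qubit assay volume, the 5 µL input used here, and the 20 µL Buffer EB resuspension from step 8. If the instrument is set to report the original-sample concentration directly, skip the dilution step. The 50 ng/mL reading is hypothetical.

```python
ASSAY_VOLUME_UL = 200.0   # assumed Qubit assay tube final volume (sample + working solution)
SAMPLE_INPUT_UL = 5.0     # volume of each DNA sample added to the assay tube
ELUTION_VOLUME_UL = 20.0  # Buffer EB used to resuspend each pellet


def stock_concentration(assay_ng_per_ml: float) -> float:
    """Convert an assay-tube reading (ng/mL) to the sample's own concentration (ng/uL)."""
    dilution = ASSAY_VOLUME_UL / SAMPLE_INPUT_UL  # 40x dilution into the assay tube
    return assay_ng_per_ml * dilution / 1000.0   # ng/mL -> ng/uL


def total_yield_ng(assay_ng_per_ml: float) -> float:
    """Total DNA recovered, assuming the full 20 uL resuspension volume."""
    return stock_concentration(assay_ng_per_ml) * ELUTION_VOLUME_UL


# Hypothetical reading of 50 ng/mL in the assay tube.
print(stock_concentration(50.0), total_yield_ng(50.0))
```

Here a 50 ng/mL assay reading would correspond to 2 ng/uL in the sample and 40 ng recovered in total.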


As expected, neither sample yielded any detectable DNA.

Will discuss with Steven what should be done with the remaining samples.

from Sam’s Notebook

Laura’s Notebook: Update on Olympia oyster temp/feeding experiment

Olympia oyster experiment is running well out at Manchester. Quick recap:

  • Question: how do temperature & food availability during winter (or during “pre-conditioning”) impact Olympia oyster gonad quality, and subsequent larval survival through metamorphosis?
  • Approach: precondition oysters in 4 groups, high/low temperature (~7C & ~10C) + high/low food availability (need from PSRF!) for 6 weeks and 12 weeks. Monitor gonad for evidence of resorption and maturation via histology throughout (every 2-3 weeks); also collect and sample oysters from Mud Bay on the same schedule to compare gonads in the experiment to wild. Condition oysters up to 18C, monitoring gonad development via histology, and freeze 1/2 of gonad for lipid/glycogen/protein content. Spawn, collect & count larvae. Rear larvae in small static silos separated by day (~family). Measure survival to juvenile stage. Preserve subsample of larvae immediately upon collection from broodstock for lipid content (need to know how many/mass is necessary).
  • Side experiment: larval trials! Subject larvae to various stressors, measure mortality rate: salinity, heat, pH – Erin Horken (PSRF) may help with this project.
  • 1,700 Olympia oysters were collected from Mud Bay in Dyes Inlet on November 6th, 2017. The next day they underwent standard intake procedure (scrubbing, rinsing with fresh water, 1-hour freshwater+bleach soak to kill epibionts), then were acclimated to hatchery conditions in flow-through tanks (ask PSRF: feeding rate during this acclimation period?)
  • On November 30th, a subsample was sacrificed for histology (n=20) and DNA samples (n=100, including 20 used for histology) (DNA collected for PSRF’s purposes)
  • 1,600 oysters were randomly sorted into 32 bags of 50, volume displacement of each bag was measured.
  • The 32 bags were randomly sorted into eight 50L tanks (4/tank), for the following 4 treatments:
    • A1 & A2: Low food, low temp
    • B1 & B2: Low food, high temp
    • C1 & C2: High food, low temp
    • D1 & D2: High food, high temp
    • Used recirculating chiller/heater to regulate temperature in 50L reservoirs, which then distributed SW to culture tanks.
  • Starting on December 1st, temperature was gradually adjusted (1C/day) to achieve ~7C & ~10C.
  • Animals were cleaned 3x per week, and checked for morts
  • Half of the animals were removed at week 6 for spawning – PSRF managed this phase. They have been collecting and counting all larvae, but not rearing/monitoring survival.
  • I will manage the 12-week treatment groups.
  • See my Feb. 27th post for more detail on terminating the treatments and moving groups to conditioning/spawning buckets.
  • Below are temperature plots from the Avtech system. I also have HOBO data loggers recording temperature on half of the buckets (one logger per treatment replicate), AND out at Mud Bay.
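The bag-to-tank randomization described above can be made reproducible with a seeded shuffle. A minimal Python sketch under stated assumptions: bags are numbered 1-32, the seed is arbitrary, and this is not necessarily how the sorting was actually done.

```python
import random

TREATMENTS = {
    "A": "Low food, low temp",
    "B": "Low food, high temp",
    "C": "High food, low temp",
    "D": "High food, high temp",
}
OYSTERS = 1600
BAG_SIZE = 50
BAGS_PER_TANK = 4


def assign_bags(seed: int) -> dict[str, list[int]]:
    """Randomly split numbered bags into tanks A1..D2, 4 bags per tank."""
    n_bags = OYSTERS // BAG_SIZE                               # 32 bags of 50 oysters
    tanks = [f"{t}{rep}" for t in TREATMENTS for rep in (1, 2)]  # 8 tanks
    bags = list(range(1, n_bags + 1))                          # hypothetical bag IDs
    random.Random(seed).shuffle(bags)                          # seeded, so the layout is reproducible
    return {
        tank: sorted(bags[i * BAGS_PER_TANK:(i + 1) * BAGS_PER_TANK])
        for i, tank in enumerate(tanks)
    }


if __name__ == "__main__":
    for tank, bag_ids in assign_bags(seed=2017).items():
        print(tank, TREATMENTS[tank[0]], bag_ids)
```

Recording the seed alongside the printed layout makes it easy to reconstruct which bags went where.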

Temperatures from November 30th -> March 15th, when they reached 18C, the temperature used to induce spawning and rear larvae. The spikes correspond to cleaning events where the probes recorded temperature while they were out of the water


Here’s a closer look at temperatures during the conditioning phase


from LabNotebook

Laura’s Notebook: Assessing larval DNA integrity

Today Sam walked me through the process of using the Agilent 2100 Bioanalyzer and kit to assess DNA integrity (fragment length in bp) via fluorescence signal. A useful aspect of this kit/analyzer is that it requires only 1ul of sample, at DNA concentrations between 0.5-50 ng/ul.


  • Sam removed kit reagents from fridge to sit at room temperature for 30 minutes
  • Prepared new gel-dye mix as per kit instructions. Prepared mix can be held at 4C for up to 4 weeks. All dye vials should be protected from light.
  • Combined my sample reps into the -A vial, e.g. transferred DNA from 1-B and 1-C to vial 1-A.
  • Walked to the Seeb lab on the 1st floor of the Marine Science building. Materials I needed: DNA samples, new DNA chip, prepared gel-dye mix, DNA marker & ladder from kit, DI water (for chip wells without my samples), 10ul pipette, 10ul pipette tips.
  • Followed kit instructions – loaded gel, marker, ladder, samples, and DI water onto chip. Samples 1-8 were loaded into the wells of the same number. Wells 9-12 were loaded with DI water.
  • Turned computer on, started DNA 1200 series software. NOTE: computer has login password – make sure I have that for future uses.
  • Inserted chip into analyzer, closed lid carefully.
  • For Assay Selection file, selected dsDNA -> DNA 1200 series 11.xsy
  • For Destination file location, navigated to Roberts Lab folder
  • Selected Start
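Before loading, it can help to lay out the 12 sample wells and flag any sample outside the 0.5-50 ng/ul range noted above. A minimal Python sketch; the function name and the concentrations in the test are hypothetical, and the range check uses only the limits stated in these notes.

```python
# Kit's stated concentration range for the 1 ul sample input (from the notes above).
MIN_NG_PER_UL = 0.5
MAX_NG_PER_UL = 50.0
N_WELLS = 12


def chip_layout(sample_concs: dict[int, float]) -> dict[int, str]:
    """Map wells 1-12 to their contents, flagging samples outside the kit's range."""
    layout = {}
    for well in range(1, N_WELLS + 1):
        if well not in sample_concs:
            layout[well] = "DI water"  # wells without samples get DI water, as done here
            continue
        conc = sample_concs[well]
        if conc < MIN_NG_PER_UL:
            layout[well] = f"sample {well}: too dilute ({conc} ng/uL)"
        elif conc > MAX_NG_PER_UL:
            layout[well] = f"sample {well}: dilute first ({conc} ng/uL)"
        else:
            layout[well] = f"sample {well}: OK ({conc} ng/uL)"
    return layout
```

Passing concentrations for samples 1-8 reproduces the loading scheme above, with wells 9-12 filled with DI water.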

The software displays data in real time, with fluorescence on the y-axis and time on the x-axis. Smaller DNA fragments “elute” faster, and longer fragments take longer. Each sample well also contains small (50bp) and large (17,000bp) standards/markers. We were hoping my DNA fragments would be long/as intact as possible. Here are some screenshots of each fluorescence/time plot, along with the software’s calculated bp/concentrations:
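Fragment sizes are read off the fluorescence/time trace by comparing migration times with the ladder run on the same chip. To illustrate the general idea (not necessarily the exact algorithm the Agilent software uses), here is a minimal Python sketch that interpolates log10(bp) linearly between the two ladder peaks bracketing a given time. The ladder times in the test are hypothetical.

```python
import bisect
import math


def size_from_time(t: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate fragment size (bp) at migration time t from (time, bp) ladder peaks.

    Interpolates log10(bp) linearly between the two ladder peaks that bracket t;
    longer fragments migrate later, so bp increases with time along the ladder.
    """
    ladder = sorted(ladder)
    times = [time for time, _ in ladder]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside ladder range; size cannot be interpolated")
    i = bisect.bisect_left(times, t)  # first ladder peak at or after t
    if times[i] == t:
        return ladder[i][1]
    (t0, bp0), (t1, bp1) = ladder[i - 1], ladder[i]
    frac = (t - t0) / (t1 - t0)
    log_bp = math.log10(bp0) + frac * (math.log10(bp1) - math.log10(bp0))
    return 10 ** log_bp
```

Interpolating on a log scale reflects that migration time changes roughly with the logarithm of fragment length, so a straight-line fit in bp would overestimate sizes between widely spaced ladder peaks.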

Screenshot of all plots; my samples are labeled Sample 1 through Sample 8. Samples 9-12 are just DI water. Samples 1-7 look somewhat consistent, but Sample 8 looks very weird. Note different y-axis range.


Screenshots of the bp/concentration break-down:

Grace’s Notebook: DIA Analysis Redo

I re-did the DIA analysis from the beginning. This time I used only the settings Emma used in the Skyline document she sent me.

I think I must have done something better this time around, because a lot of the chromatograms are looking much better. Now I will do the error rate calculation and finally get into the data analysis after that.


Error rate assessment

Skyline doesn’t always do the best job of identifying peaks (where transitions align, meaning that the peptide is real and can be identified). So, an error rate has to be calculated.

In order to do this, I will look at ~100 randomly selected peptides and give them a rating of “1” or “0”.

A rating of “1” will only be applied if the following three conditions are met:

  • The peak selected is likely a real peptide (either has an ID and/or the transitions align well)
  • The peak boundaries encompass the peak well
  • The same peak is at about the same retention time and is selected across all replicates (in my case, the four replicates)

A rating of “0” will be given if not all three conditions are met.

I’m not sure if I should figure out a way to identify which condition wasn’t met.
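The scoring scheme above can be tracked so that each “0” also records which condition failed. A minimal Python sketch; the class, field, and peptide names are hypothetical, and the error rate is taken as the fraction of checked peptides scored “0”.

```python
import random
from dataclasses import dataclass

CONDITIONS = ("real_peptide", "good_boundaries", "consistent_across_reps")


@dataclass
class PeptideCheck:
    peptide: str
    real_peptide: bool            # has an ID and/or transitions align well
    good_boundaries: bool         # peak boundaries encompass the peak well
    consistent_across_reps: bool  # same peak at ~same retention time in all replicates

    def score(self) -> int:
        """1 only if all three conditions are met, otherwise 0."""
        return int(all(getattr(self, c) for c in CONDITIONS))

    def failed(self) -> list[str]:
        """Names of the conditions that were not met (empty for a '1')."""
        return [c for c in CONDITIONS if not getattr(self, c)]


def pick_peptides(all_peptides: list[str], n: int = 100, seed: int = 0) -> list[str]:
    """Randomly choose up to n peptides (without replacement) to check by eye."""
    return random.Random(seed).sample(all_peptides, min(n, len(all_peptides)))


def error_rate(checks: list[PeptideCheck]) -> float:
    """Fraction of checked peptides that scored 0."""
    return 1 - sum(c.score() for c in checks) / len(checks)
```

Tallying `failed()` across all checks would show whether most “0” scores come from boundaries, identification, or replicate consistency.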


Yaamini’s Notebook: Gonad Histology Update 4

Another classification revision

I met with Brent last week to go through my histology classifications. He recommended that I review some specimens under the microscope.

Retaking histology images

I read another paper he recommended, Coe 1932, to understand the differences between primary ovogonia and spermatogonia, as well as some guides on tissue types from Carolyn Friedman, then had some dedicated microscope time. Here are the revised classifications and some notes. The classification spreadsheet can be found here, and the new images I took can be found here for pre-experiment sampling and here for post-experiment sampling. Shoutout to Grace for helping me set up the iPhone + microscope contraption!

  • Gigas_02: Stage 1 Female. Need to verify that it’s not male. Ovogonia are closer to the walls of the acini, so I’m pretty sure it’s female
  • Gigas_04: Same as above
  • Gigas_05: Looking at the primary sex cells, they’re farther away from the walls of the acini. This may be male? There’s also a chance that I didn’t find any acini and I’m looking at the digestive gland or intestines instead. Need to clarify with Brent.
  • Gigas_06: Stage 1 Female
  • Gigas_07: No acini structure, so Stage 0
  • Gigas_08: Same as above
  • Gigas_10: Same as above
  • Gigas_15: Found some spermatozoa! No acini structure, so it’s a spent oyster. Stage 4 Male!
  • Gigas_18: No acini structure, so Stage 0
  • Gigas_20: Same as above
  • 4-T3: Same as above
  • 5-T3: Same as above
  • 9-T2: Same as above
  • 10-T3: Same as above
  • 12-T6: Spermatozoa but no acini structure. Stage 4 Male
  • UK-03: No acini structure, so Stage 0
  • UK-05: Spermatozoa but no acini structure. Stage 4 Male

Gonad maturation analyses

Not much changed from my previous analysis. Sex is still the only factor that explains differences in maturation. However, I now have different mature and immature classifications between treatments. None of my low pH animals were mature. In my ambient pH treatment, six individuals were immature and four were mature. When I was building my binomial GLM using stepwise regression methods, I got a p-value of 0.03 when I had a Mature ~ Treatment model. However, Sex was a more significant factor, so I added that in first to create my base model. When I used add1 to identify which covariates to add, none of them were significant! Guess I don’t have to change the story in my NSA presentation.
