Sam’s Notebook: Assembly Stats – Geoduck Genome Assembly Comparisons w/Quast – SparseAssembler, SuperNova, Hi-C


Steven requested a comparison of geoduck genome assemblies.

Ran the following Quast command:

 /home/sam/software/quast-4.5/ \
 -t 24 \
 --labels 20180405_sparse_kmer101,supernova_pseudohap_duck4-p,20180421_Hi-C \
 /mnt/owl/Athaliana/20180405_sparseassembler_kmer101_geoduck/Contigs.txt \
 /mnt/owl//halfshell/bu-mox/analyses/0305b/duck4-p.fasta.gz \
 /mnt/owl/Athaliana/20180419_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta
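The Hi-C results folder has spaces and colons in its name, which is why the path is backslash-escaped in the command. As a small sketch (using Python's standard `shlex` module, which was not part of the original run), the escaped form can be checked to parse as a single argument:

```python
import shlex

# File path exactly as escaped in the Quast command (folder name plus file).
escaped = r"geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta"

# POSIX-style splitting treats a backslash as escaping the next character,
# so the escaped spaces do not split the path into separate arguments.
parts = shlex.split(escaped)
print(len(parts))
print(parts[0])

# Going the other way: quote a raw path so it is safe to paste into a shell.
raw = "geoduck_roberts results 2018-04-21 18:09:04.514704/PGA_assembly.fasta"
quoted = shlex.quote(raw)
print(quoted)
```

Single-quoting the whole path, as `shlex.quote` does, is an alternative to escaping each space by hand.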

Quast output folder: results_2018_04_30_08_00_42/

Quast report (HTML): results_2018_04_30_08_00_42/report.html


The data’s pretty interesting and cool!

SparseAssembler has over 2x the amount of data (in base pairs), yet produces the worst assembly.

SuperNova and Hi-C assemblies are very close in nearly all categories. This isn’t surprising, as the SuperNova assembly was used as a reference assembly for the Hi-C assembly.

However, the Hi-C assembly is insanely better than the SuperNova assembly! For example:

  • Largest contig is ~7x larger than the SuperNova assembly.
  • The N50 size is ~243x larger than the SuperNova assembly!!
  • L50 is only 18, 46x smaller than the SuperNova assembly!
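For anyone unfamiliar with these stats: N50 is the length of the contig at which the running total of contig lengths (sorted longest first) reaches half of the assembly size, and L50 is how many contigs it takes to get there. Here is a minimal sketch of that textbook definition, using made-up toy lengths rather than geoduck data. Quast's own implementation may differ in details.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines N50; its rank in the sorted list is L50.
        if running >= half:
            return length, count
    raise ValueError("no contig lengths given")


# Toy contig lengths (hypothetical, for illustration only).
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy)
print(n50, l50)
```

A bigger N50 with a smaller L50 means the same amount of sequence is packed into fewer, longer pieces, which is why the Hi-C numbers stand out.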

This is pretty amazing, honestly. Even more amazing is that this data was sent over to us as some “preliminary” data for us to take a peek at!

from Sam’s Notebook

Sam’s Notebook: Assembly Stats – Geoduck Hi-C Assembly Comparison


Ran the following Quast command to compare the two geoduck assemblies provided to us by Phase Genomics:

 /home/sam/software/quast-4.5/ \
 -t 24 \
 --labels 20180403_pga,20180421_pga \
 /mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-03\ 11\:05\:41.596285/PGA_assembly.fasta \
 /mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta

Quast Output folder: results_2018_04_30_11_16_04/

Quast report (HTML): results_2018_04_30_11_16_04/report.html


The two assemblies are nearly identical. Interesting…

from Sam’s Notebook

Yaamini’s Notebook: Gonad Methylation Analysis Part 6

The end of a pipeline test

Yesterday I encountered a gunzip error when aligning sequences with bismark. I opened an issue and documented everything in my lab notebook post. Steven said that I shouldn’t worry about it because I got .bam files! Today, I moved on in my Jupyter notebook to the bismark_methylation_extractor step. I successfully used the following parameters:


Once again, I encountered a gunzip error:


And like last time, I have outputs! You can find them in this folder. I ignored the error and moved on in the pipeline to the bismark2report step.

This step is fairly simple if you don’t want to customize the command. I used the following parameters:

  1. Path to bismark2report
  2. --dir + path to output directory

The reports generated can be found in this folder.

The last part of the pipeline is bismark2summary. I’m not sure how this differs from bismark2report, but I’m gonna use it anyways. It generated a report that can be found as a .txt file and .html report.

The next steps are to understand the outputs and start the full pipeline. I posted this issue to get Steven’s advice.


from the responsible grad student

Grace’s Notebook: April 29, 2018, RNA Isolation and DecaPod S1E6

RNA Isolations

This weekend I did two sets of 9 isolations. I had to do these along with a few more sets in order to get a subset of samples that ALL have quantifiable RNA via the Qubit.

The first set from yesterday was great! All 9 samples (uninfected; ambient) had readable RNA:

Today’s set of nine (uninfected; cold) was mostly good! Just one tube had “Out of range”, and as a result, I will pick a new crab (3 samples) to replace it:


I also edited and published S1Ep6 of DecaPod during which Pam answers a few of my questions on the issue and project. This is part 1 of 2. There were a lot of questions that I had as well as some questions that others in my cohort have asked me that I didn’t know the answer to.

Crab Mtg #3

Crab Mtg #3 is on Thursday. By then I am hoping to have a subset of samples that all have quantifiable RNA, and to have run a couple of samples on the Bioanalyzer with Sam.

from Grace’s Lab Notebook

Yaamini’s Notebook: Gonad Methylation Analysis Part 5

When the stars (and sequences) align

I successfully completed my genome preparation in Bismark yesterday, so I started on my sequence alignment. Based on the Bismark manual and Steven’s lab notebook post, I used the following parameters in my Jupyter notebook:

  1. Path to Bismark
  2. -u 10000: Allows me to subset the first 10,000 reads from each input file
  3. --non_directional: See this issue
  4. --score_min L,0,-1.2: This was a parameter Steven used to allow for mismatches in alignment
  5. --genome + path to the folder with the .fa genome and bisulfite genome directories Bismark prepared earlier
  6. Path to sequence files for alignment
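As a rough sketch, the parameter list above could be assembled into a single command like this. Every path below is a hypothetical placeholder, not the real location on the Mac mini, and the flags use Bismark's documented spellings (`-u`, `--non_directional`, `--score_min`, `--genome`). The function only builds and prints the command; it does not run Bismark.

```python
import shlex


def bismark_align_command(bismark, genome_dir, reads, n_reads=10000):
    """Build the argument list for a subset Bismark alignment run."""
    return [
        bismark,
        "-u", str(n_reads),          # only align the first n_reads reads
        "--non_directional",         # libraries are non-directional
        "--score_min", "L,0,-1.2",   # relaxed minimum alignment score
        "--genome", genome_dir,      # folder with genome + bisulfite indexes
        *reads,                      # one or more sequence files
    ]


# Hypothetical placeholder paths, for illustration only.
cmd = bismark_align_command(
    "/path/to/bismark",
    "/path/to/genome_folder",
    ["sample_1.fastq.gz"],
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(cmd)` later without worrying about shell quoting.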

The Bismark run was completed for each file, but I did end up with a gunzip error for each of my files:


I’m not entirely sure what this means, so I opened a new issue. Once I fix this gunzip problem, I can move on to the methylation extraction and start to analyze all of the .bam files this alignment step produced!


from the responsible grad student

Laura’s Notebook: Oly temp experiment, 29 days of larvae


Larval production

Larval production data are in! It’s been 1 month of collecting and counting larvae released from four broodstock treatments, held for 3 months under the following conditions, then induced to spawn under common conditions (18C, food ad libitum).

  • Low food, low temp (~7C)
  • Low food, high temp (~10C)
  • High food, low temp (~7C)
  • High food, high temp (~10C)


The following plot shows cumulative larvae released over time for the four treatment groups, with four replicate spawning buckets per group. Numbers of released larvae have been normalized by the total oyster volume (measured via volume displacement). Total production does not appear to have been influenced by treatment, but timing was. This is consistent with my results from last year’s project; see 2017 spawning charts. It’s interesting to see that, within each temperature treatment, the low food groups spawned first, and that the limited feeding levels over winter did not significantly affect larval production.


Here is the total larval production data for each treatment/rep. Note, A1 and A2 (etc.) designate broodstock treatment tank reps 1 and 2, which I subsequently split into four total spawning buckets.

Treatment    # Larvae  ~# Oysters
A1-1        6,012,828          32
A1-2        3,136,527          34
A2-1        3,712,154          31
A2-2          995,396          28
B1-1        3,694,952          33
B1-2        3,423,074          34
B2-1        4,882,010          36
B2-2        2,952,781          37
C1-1        3,094,754          28
C1-2        2,463,644          23
C2-1        1,775,235          19
C2-2        1,759,988          19
D1-1        3,530,963          34
D1-2        3,388,931          37
D2-1        2,691,880          32
D2-2        3,879,053          35
Total      51,394,172         492
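As a quick sanity check on the table, here is a small sketch that re-adds the per-bucket numbers and groups them by the first letter of the bucket name. Note that larvae per oyster count is a cruder normalization than the oyster-volume normalization used in the plot, and grouping by letter is an assumption about how the buckets map to treatments. The per-bucket larvae counts as listed sum to 51,394,170, two fewer than the Total row, presumably from rounding.

```python
# (larvae released, approximate oyster count) per spawning bucket,
# copied from the table above.
buckets = {
    "A1-1": (6012828, 32), "A1-2": (3136527, 34),
    "A2-1": (3712154, 31), "A2-2": (995396, 28),
    "B1-1": (3694952, 33), "B1-2": (3423074, 34),
    "B2-1": (4882010, 36), "B2-2": (2952781, 37),
    "C1-1": (3094754, 28), "C1-2": (2463644, 23),
    "C2-1": (1775235, 19), "C2-2": (1759988, 19),
    "D1-1": (3530963, 34), "D1-2": (3388931, 37),
    "D2-1": (2691880, 32), "D2-2": (3879053, 35),
}

total_larvae = sum(larvae for larvae, _ in buckets.values())
total_oysters = sum(oysters for _, oysters in buckets.values())

# Sum larvae and oysters per group (first letter of the bucket name).
by_group = {}
for bucket, (larvae, oysters) in buckets.items():
    totals = by_group.setdefault(bucket[0], [0, 0])
    totals[0] += larvae
    totals[1] += oysters

for group, (larvae, oysters) in sorted(by_group.items()):
    print(f"{group}: {larvae:,} larvae, {oysters} oysters, "
          f"{larvae / oysters:,.0f} larvae per oyster")
print(f"All: {total_larvae:,} larvae, {total_oysters} oysters")
```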

Here is the raw larval production data for a better idea of timing. Cells highlighted in blue indicate dates that I visually observed sperm/spawning activity via froth in the larval collection bucket (as shown in the middle bucket in the image below). On some days a few groups released larvae and spawned on the same day.



Larvae samples for size, lipid

Every day I collected larvae, I preserved samples in the -80. Samples collected from 3/30-4/6 were killed with ethanol, transferred to a 2mL microcentrifuge tube, rinsed with ethanol to remove seawater, and centrifuged for 1 min @ 2000g, after which I discarded the supernatant. I then determined that ethanol may interfere with or remove some lipid content, so I altered the protocol: rinse larvae with only DI water into microcentrifuge tubes, discard the DI supernatant, then move them into the -80 immediately. Larvae are likely still alive upon transferring to the freezer; the DI water simply causes them to close and sink, allowing for removal of the supernatant. I preserved a total of 155 separate daily larval samples (with 2 or 3 reps for most). The plan is to look at lipid content and shell size.


Larval performance

To assess larval viability, I am rearing 12 families per broodstock treatment, 3 reps per family. I’ve stocked mini-silos with ~800 larvae. A family represents a subsample from one day of larval production from a bucket (must have spawned >100k larvae). Families were not stocked using two sequential release days from the same bucket, since I often observe larvae collected from the same bucket two days in a row (likely due to slow trickle/flow of larvae into catchment buckets). At week 2 post release, I rinse silos with fresh water, then sprinkle 0.5mL of 224 µm microcultch into each. When postset are 6 weeks old I will count survival and image for size.


Water Changes

Daily water changes are performed for the first 3 weeks in the larval rearing mini-silos. Each water change includes: collect 1L each of diatom (CM) and flagellate (609) from the algae carboys (to ensure algae is void of ciliates), dilute with 6-8L FSW and count subsamples to determine algae concentration. Then, mix 30 gallons FSW + algae mix (determined via the below equation) for a 100,000 cells/mL medium. Fill tripours with 800mL mix, and transfer silos to new medium. At 3 weeks, water changes move to every other day. At week 6 I will move larvae into nylon mesh patches made from 450 µm screen.

Volume Algae = (100,000 cells/mL) * (30 gal) / (algae concentration – 100,000 cells/mL)
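The equation is a simple mass balance: the cells added in the algae volume, spread over the FSW plus the algae volume, should equal the target concentration. Here is a minimal sketch of it, with an algae concentration of 1,100,000 cells/mL chosen only as a hypothetical example count. The result comes out in the same units as the FSW volume (gallons here).

```python
def algae_volume(algae_conc, target_conc=100_000, fsw_volume=30.0):
    """Volume of algae mix to add to fsw_volume of FSW.

    algae_conc and target_conc are in cells/mL; the returned volume
    is in the same units as fsw_volume.
    """
    if algae_conc <= target_conc:
        # The mix can never reach the target if the stock is too dilute.
        raise ValueError("algae concentration must exceed the target")
    return target_conc * fsw_volume / (algae_conc - target_conc)


# Hypothetical count: algae mix at 1,100,000 cells/mL.
volume = algae_volume(1_100_000)
print(volume)

# Check the mass balance: final concentration after mixing.
final_conc = volume * 1_100_000 / (30.0 + volume)
print(final_conc)
```

The guard matters in practice: if the diluted algae mix counts at or below 100,000 cells/mL, no amount added to the FSW will hit the target.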


  • Broodstock gonad sex & stage over time from November 30 -> March 23rd. 8 sampling dates for treatment groups, and 7 dates for Olys collected directly from Mud Bay on same day
  • Broodstock gonad lipid, glycogen content
  • Larval production
  • Larval shell size, upon release
  • Larvae lipid content, upon release
  • Larval survival
  • Post-set size, 6 weeks


from The Shell Game

Yaamini’s Notebook: Gonad Methylation Analysis Part 4

Preparing a genome (and my brain)

I forgot just how time-consuming (and brain pain-inducing) bioinformatics pipelines are! Good thing I set up my monitor. Screensharing on my laptop screen would not be ideal.

I created a new Jupyter notebook for the Bismark portion of my pipeline. Here’s the basic outline of the Bismark pipeline:

  1. Genome preparation
  2. Alignment
  3. Methylation extraction
  4. HTML report
  5. Summary report

I got started on the genome preparation portion. To do this, I first downloaded the genome from NCBI. I learned that when you download FASTA files from NCBI, it specifies what kind of FASTA it is. In my case, I downloaded a .fna, or nucleotide FASTA file for the genome. I had to convert the .fna to an .fa for Bismark using the handy cp command.

Obviously I needed Bismark and Bowtie2 on the Mac mini before I could use it. This part was obviously strugglesome (see this issue if you don’t believe me). I downloaded Bismark from this website, and Bowtie2 using this code:

conda install -c bioconda bowtie2

It was simple, but Steven suggested downloading the source code in the future to have more control over versioning and the full path.

Finally, I could use the bismark_genome_preparation command. Here’s what I needed:

  1. Path to bismark_genome_preparation
  2. --path_to_bowtie + the path to the folder with bowtie2, and not the path to bowtie2
  3. --verbose, which prints detailed status reports
  4. Path to the folder with the .fa or .fasta genome file
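The "folder, not the file" detail is easy to trip over, so here is a small sketch that derives the folder from the bowtie2 executable and builds the command. All paths are hypothetical placeholders. The `--path_to_bowtie` spelling is from older Bismark releases (newer ones call it `--path_to_aligner`), so check `--help` for the installed version. Nothing is executed here.

```python
import os
import shutil


def bowtie2_folder(bowtie2_path=None):
    """Return the folder containing bowtie2, not the executable itself."""
    path = bowtie2_path or shutil.which("bowtie2")
    if path is None:
        raise FileNotFoundError("bowtie2 not found on PATH")
    return os.path.dirname(path)


def genome_prep_command(prep_script, genome_dir, bowtie2_path=None):
    """Build the argument list for Bismark genome preparation."""
    return [
        prep_script,
        "--path_to_bowtie", bowtie2_folder(bowtie2_path),
        "--verbose",
        genome_dir,  # folder holding the .fa or .fasta genome file
    ]


# Hypothetical placeholder paths, for illustration only.
cmd = genome_prep_command(
    "/path/to/bismark_genome_preparation",
    "/path/to/genome_folder",
    bowtie2_path="/opt/conda/bin/bowtie2",
)
print(" ".join(cmd))
```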

And it’s running! I’ll check on it later tonight to see if I can move on to the alignment step.



from the responsible grad student

Yaamini’s Notebook: Gonad Methylation Analysis Part 3

Quality: controlled

Turns out there was a typo in the IP address I was using to connect to genefish. Whoops. Now that I’m in the office, I was able to complete my FastQC analysis! You can see my process in this Jupyter notebook.

On the Mac mini, I located the files I needed from Owl and ran the command-line version of FastQC. The results can be found in this folder. There are two outputs: an .html file, and a .zip file. The .html files can be viewed by dragging the file from Finder into a web browser. To compare FastQC reports, I ran MultiQC. The interactive .html output can be found here. In general, I think my data has relatively good quality and I can proceed to Bismark.


from the responsible grad student

Yaamini’s Notebook: Gonad Methylation Analysis

Time to analyze the C. virginica data

Now that my two papers do not require my constant attention, I can start analyzing the MBDSeq data from the C. virginica project. The goal is to see if experimental ocean acidification drove differential gonad methylation in adult oysters. This lab notebook entry will outline my plan and link to important information I’ll need down the road.

Sam received the FASTQ files and saved them here. The sample IDs follow numerical order, and are non-directional.

Here’s how I will process these samples:

  1. FastQC: I previously used FastQC with some O. lurida transcriptome data, so I can follow the general steps in this Jupyter notebook.
  2. Bismark: The purpose of Bismark is to align my sample files with the C. virginica genome, then extract data from methylated areas. I will first test my Bismark pipeline with a subset of one data file. Once I know it works, I will run all my samples.

Now that I know what I’m doing, I should probably do it…


from the responsible grad student

Yaamini’s Notebook: Gonad Methylation Analysis Part 2

Yaamini vs. Computers

Plan in hand, I went to Roadrunner to start working on my pipeline. Before that, I created a project-virginica-oa repository, and a notebook to detail my pipeline.

However, Roadrunner wasn’t working! Sam had to reboot it a few times. After rebooting, I was unable to open Jupyter Notebook. Sam couldn’t even run any anaconda commands! He rebooted again, but the screen wouldn’t display anything. My struggles are documented in this issue. Steven pointed out that I should transition to the Mac mini in the conference room, so that’s exactly what I did.

But the problems didn’t stop there! In my notebook, I was unable to change my working directory. In this issue, Sam and Steven pointed out that I need to use exclamation points before commands, and that I shouldn’t comment in the code box itself. By the time I solved all my computer problems, I had to rush to the graduate student town hall. I ensured that I could connect to the Mac mini using the Mac Screen Sharing app, then left. Before I left SAFS, I checked once more that my computer could connect to the Mac mini so I could work from home tonight.

When I got home, my computer could no longer connect to the Mac mini! My guess is that the screen turned off, or someone else tried accessing the computer, preventing me from logging in. So much for starting on my analysis today. I’ll focus on other things and return to this task (hopefully uninterrupted) this weekend.


from the responsible grad student