Sam’s Notebook: Assembly Stats – Geoduck Genome Assembly Comparisons w/Quast – SparseAssembler, SuperNova, Hi-C


Steven requested a comparison of geoduck genome assemblies.

Ran the following Quast command:

 /home/sam/software/quast-4.5/ \
 -t 24 \
 --labels 20180405_sparse_kmer101,supernova_pseudohap_duck4-p,20180421_Hi-C \
 /mnt/owl/Athaliana/20180405_sparseassembler_kmer101_geoduck/Contigs.txt \
 /mnt/owl//halfshell/bu-mox/analyses/0305b/duck4-p.fasta.gz \
 /mnt/owl/Athaliana/20180419_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta

Quast output folder: results_2018_04_30_08_00_42/

Quast report (HTML): results_2018_04_30_08_00_42/report.html


The data’s pretty interesting and cool!

SparseAssembler has over 2x the amount of data (in base pairs), yet produces the worst assembly.

SuperNova and Hi-C assemblies are very close in nearly all categories. This isn’t surprising, as the SuperNova assembly was used as a reference assembly for the Hi-C assembly.

However, the Hi-C assembly is insanely better than the SuperNova assembly! For example:

  • Largest contig is ~7x larger than the SuperNova assembly.
  • The N50 size is ~243x larger than the SuperNova assembly!!
  • L50 is only 18, 46x smaller than the SuperNova assembly!
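For anyone not used to these two metrics: N50 is the length of the contig at which the running total (contigs sorted longest to shortest) first reaches half of the assembly size, and L50 is how many contigs it takes to get there. The snippet below is only a minimal sketch of that definition, not Quast's own code; the function name `n50_l50` and the toy lengths are made up for illustration and are not geoduck data.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths (hypothetical helper)."""
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2

    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines N50; its rank is L50.
        if running >= half_total:
            return length, count

    raise ValueError("no contigs given")


# Toy example with invented lengths (total = 300, half = 150).
toy_lengths = [100, 80, 50, 30, 20, 10, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # 80 2
```

A bigger N50 together with a smaller L50 means more of the assembly sits in fewer, longer pieces, which is why the Hi-C numbers above stand out.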

This is pretty amazing, honestly. Even more amazing is that this data was sent over to us as some “preliminary” data for us to take a peek at!

from Sam’s Notebook

Sam’s Notebook: Assembly Stats – Geoduck Hi-C Assembly Comparison


Ran the following Quast command to compare the two geoduck assemblies provided to us by Phase Genomics:

 /home/sam/software/quast-4.5/ \
 -t 24 \
 --labels 20180403_pga,20180421_pga \
 /mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-03\ 11\:05\:41.596285/PGA_assembly.fasta \
 /mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta

Quast Output folder: results_2018_04_30_11_16_04/

Quast report (HTML): results_2018_04_30_11_16_04/report.html


The two assemblies are nearly identical. Interesting…

from Sam’s Notebook

Yaamini’s Notebook: Gonad Methylation Analysis Part 6

The end of a pipeline test

Yesterday I encountered a gunzip error when aligning sequences with bismark. I opened an issue and documented everything in my lab notebook post. Steven said that I shouldn’t worry about it because I got .bam files! Today, I moved on in my Jupyter notebook to the bismark_methylation_extractor step. I successfully used the following parameters:


Once again, I encountered a gunzip error:


And like last time, I have outputs! You can find them in this folder. I ignored the error and moved on in the pipeline to the bismark2report step.

This step is fairly simple if you don’t want to customize the command. I used the following parameters:

  1. Path to bismark2report
  2. --dir + path to output directory

The reports generated can be found in this folder.

The last part of the pipeline is bismark2summary. I’m not sure how this differs from bismark2report, but I’m gonna use it anyways. It generated a summary report in two formats: a .txt file and an .html file.

The next steps are to understand the outputs and start the full pipeline. I posted this issue to get Steven’s advice.


from the responsible grad student

Grace’s Notebook: April 29, 2018, RNA Isolation and DecaPod S1E6

RNA Isolations

This weekend I did two sets of 9 isolations. I had to do these along with a few more sets in order to get a subset of samples that ALL have quantifiable RNA via the Qubit.

The first set from yesterday was great! All 9 samples (uninfected; ambient) had readable RNA:

Today’s set of nine (uninfected; cold) was mostly good! Just one tube had “Out of range”, and as a result, I will pick a new crab (3 samples) to replace it:


I also edited and published S1Ep6 of DecaPod during which Pam answers a few of my questions on the issue and project. This is part 1 of 2. There were a lot of questions that I had as well as some questions that others in my cohort have asked me that I didn’t know the answer to.

Crab Mtg #3

Crab Mtg #3 is on Thursday. By then I am hoping to have a subset of samples that all have quantifiable RNA, as well as to run a couple of samples on the Bioanalyzer with Sam.

from Grace’s Lab Notebook