Sean’s Notebook: Restarting Canu

Looks like they took Mox down for a bit last night/this morning for maintenance so our Canu run got bumped. Thankfully it can fail gracefully and restart midway so hopefully it will be done soon.

Ive got my long read samples concatenated and ready for consensus calling, next I need to organize the short read stuff by insert sizes for polishing, and see what additional prep needs to be done for them.

Sean’s Notebook: Canu running, figuring…

Sean’s Notebook: Canu running, figuring out Racon and what it needs next.

So I got Racon, a consensus caller installed, and it looks like it requires mapping position information. The Racon developers suggested Minimap for that purpose. I installed Minimap on Emu so I could run it on the filtered reads while Canu is running on Hyak.

It looks like the general process will be as follows

  • Canu to make preliminary contigs
  • Map Raw reads to contigs with Minimap
  • Consensus call with Racon
  • Repeat the above steps ~ 3 times (that seems to be a common number)
  • Map the raw Illumina reads to the final Racon output with samtools
  • Use Pilon to polish with the mapped .sam files.
  • End with an assembly based on PacBio reads, and finished with Illumina, the opposite of the Platanus/Redundans workflow.

We’re currently on step 1, so this might take a little while…

Sean’s Notebook: Falcon out, Canu in.

I’ve been trying to get Falcon installed on Hyak, or Emu, or even my laptop for a couple days now with no success. There wasn’t much help on the Falcon GitHub, so after doing some reading, it looks like Canu may be an option that is as good, or better than Falcon, so lets give that a whirl!

Canu’s GitHub is here and documentation is here

To install was super simple, just cloned the GitHub repo on to Hyak via git clone https://github.com/marbl/canu.git in to /gscratch/srlab/programs/canu/, changed directory in to /gscratch/srlab/programs/canu/src/ and ran make.

The Canu developers supply a sample assembly data set which can be downloaded via

curl -L -o p6.25x.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

which I downloaded in to /gscratch/srlab/data/CanuTest.

To run the assembly, I spool up a 4 hour Interactive session (hopefully this is long enough) and run /gscratch/srlab/programs/canu/Linux-amd64/bin/canu -p ecoli -d ecoli-auto genomeSize=4.8m -pacbio-raw p6.25x.fastq .

This did not work, as Canu is built to run on a scheduler system, so it needs the --useGrid=FALSE argument added to the command. After changing that, everything looks like it’s working fine. After I make sure this works, I’ll get to work on the PacBio only assembly for the Oly genome.

Edit: It finished, and looks like it works with the sample data. Now to try it with our Oly PacBio stuff.

Sean’s Notebook: C. virginica MACAU…

Sean’s Notebook: C. virginica MACAU results.

So I finished running MACAU on the C. virginica oil spill samples a couple of different ways. First, I ran all 6 samples, 3 control and 3 oil exposed and secondly, 2 control and 3 oil exposed. The difference being one of the samples had really low read counts from the sequencing facility, and that brought down the number of available loci with a minimum of 10x coverage to look at from ~70,000 to ~17,000

Neither produced many loci that had meaningful priors, the 6 sample run had 4, and the 5 sample had 16. None had meaningful predictor parameters (beta), meaning that oil exposure was not a significant predictor of methylation status. After talking to Steven, this is not surprising, as other recent studies have shown similar lack of differentially methylated regions/loci.

Also, I’ve started revamping the Hyak wiki, trying to chunk it out to make it a little more readable by the average person. It’s on my personal Github wiki at the moment, but I will copy it over once I got some input on formatting and a few things.

Notebook for Methylation stuff: here
Hyak Wiki stuff: here

Sean’s Notebook: My FALCON won’t fly.

I’ve been trying to install FALCON, a PacBio based assembler, and it’s been a huge pain. Mostly typical Hyak permission issues, but also lots of errors with no way to figure out what they mean. LFS error code 32512?

That’s super useful, especially when it doesn’t specify if that’s Git’s LFS, or Lustre File System, or who knows what else (It’s likely the Git LFS option, but they don’t seem to have documented error codes easily accessible).

Going to look through the FALCON repo’s issues and see if there’s anything of use there, then post an issue of my own if not.

In other, more successful notes.

Uploaded more TA data to GitHub. The titratior seems to be behaving better, but still showing a ~40 point swing between beginning and end of day samples.

Finally finished the methylation count files for the C. virginica stuff and will start MACAU chewing on them tonight most likely. Played around with SQLShare way too long, but Tuesday I got a response from their helpdesk and learned that SQLShare converts all input to lower case letters, so if you’re trying to load the Loc column from your dataset, it sees dataset.Loc as dataset.loc, and then gets angry. Gotta use dataset.”Loc”. Even after figuring that out, it was still a giant pain, so I just subsetted my sample count files, and welded them back together at the end.

I started with 10x coverage, and will pare that back to probably 5x for a second run.

Notebook

Count Files

Sean’s Notebook: Oly Genome re-assembly, try 2.

Finished the Redundans run on the Oly genome with a smaller k-mer length, and things look much better!

Full Scaffold stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats scaffolds.fa
stats for scaffolds.fa
sum = 567422454, n = 337054, ave = 1683.48, largest = 74226
N50 = 3492, n = 44086
N60 = 2683, n = 62641
N70 = 1989, n = 87196
N80 = 1355, n = 121603
N90 = 737, n = 177120
N100 = 200, n = 337054
N_count = 30184511
Gaps = 286008

Reduced Scaffold stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats scaffolds.reduced.fa
stats for scaffolds.reduced.fa
sum = 546928670, n = 300593, ave = 1819.50, largest = 78788
N50 = 3852, n = 36149
N60 = 2917, n = 52500
N70 = 2137, n = 74420
N80 = 1455, n = 105359
N90 = 810, n = 155023
N100 = 200, n = 300593
N_count = 5238187
Gaps = 209203

Full Contig stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats contigs.fa
stats for contigs.fa
sum = 1633553496, n = 12395459, ave = 131.79, largest = 23341
N50 = 141, n = 2124062
N60 = 115, n = 3463771
N70 = 92, n = 5003872
N80 = 69, n = 7087338
N90 = 61, n = 9624083
N100 = 58, n = 12395459
N_count = 0
Gaps = 0

Reduced Contig stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats contigs.reduced.fa
stats for contigs.reduced.fa
sum = 532736649, n = 791403, ave = 673.15, largest = 23341
N50 = 940, n = 151868
N60 = 726, n = 216469
N70 = 553, n = 300627
N80 = 408, n = 412850
N90 = 289, n = 568264
N100 = 200, n = 791403
N_count = 0
Gaps = 0

Complete Contig fasta: here

Reduced Contig fasta: here

Complete scaffold fasta: here

Reduced scaffold fasta: here

SLURM execution script: here

Redundans output: here

All data: here

Sean’s Notebook: Calibrating the DNR titration pH probe:

Update: 4 CRMs run, last three were within +/- 10, but still ~ 30 points higher than the reference. It’s at least consistent in the degree off, but I believe that it can be accounted for with an offset. Hooray?

So Micah and I decided last week that there was a fair amount of drift in the pH probe week to week, potentially being one of the causes of the reference sample variation we see. To combat this, starting today we’ll recalibrate the probe at the beginning of each sample day.

Steps to recalibrate probe:

1. Obtain 4/7/10 buffer solutions. These are in stock bottles on the left hand side of the upper shelf in the lab.
2. Fill a sample cup ~1/3rd full with each solution. These don’t need to be weighed, they just need to cover the pH probe bulb completely.
3. Run the Probe Calibration Method on the Titratior. You’ll load the 4.0 buffer, hit ok, then it will run for a while, then load the 7.0 and then the 10.

IMG_2111

IMG_2112

IMG_2113

4. When the calibration has finished it will show a summary screen like:

IMG_2110

and the number we’re after is the second SLOPECAL value, -58.18mV/pH. This is the number that lets us equate the millivolt readings from the pH probe to the pH of the sample. For reference, last week the SLOPECAL value was -58.90mV/pH so a little change, but not much.

5. After we have our slope value, we need to calculate mV values for pH 3.5, where the initial offgas spin starts, and 3.0, where the titration ends. To do this, we just figure out how many pH units we move from 7 (7 has a mV reading of 0) to 3.5, or 3.5 units, and multiply that time the SLOPECAL value, and get 203.63. Then we calculate our end mV reading, a change of 4 units and get 232.72.

6. These numbers we take to the laptop, and go in to the Analysis Tab, then double click on the New TA Titration method. This brings up a graphical flow chart of the entire process, and we want the “Titration (EQP) [1] method. There we want to check the Predispense Potential, and the Termination Potential values. According to Micah we choose the nearest 10s value to our targets, due to granularity in the dispensing protocol, we won’t actually hit exactly 203 or 232. I chose 200, and 230.

IMG_2114

IMG_2115

IMG_2116

IMG_2117

Time to run some CRM’s and see if this helps at all!