Sean’s Notebook: Falcon out, Canu in.

I’ve been trying to get Falcon installed on Hyak, Emu, or even my laptop for a couple of days now, with no success. There wasn’t much help on the Falcon GitHub, and after doing some reading, it looks like Canu may be an option that is as good as, or better than, Falcon. So let’s give that a whirl!

Canu’s GitHub is here and documentation is here

Installation was super simple: I cloned the GitHub repo onto Hyak via git clone https://github.com/marbl/canu.git into /gscratch/srlab/programs/canu/, changed directory into /gscratch/srlab/programs/canu/src/, and ran make.
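For future reference, the install steps above can be collected into a small shell function. This is only a sketch of what I typed by hand: the function name install_canu and the CANU_DIR variable are made up for illustration, and it assumes git, make, and a compiler are already available on the node.

```shell
# Install location used in this entry.
CANU_DIR=/gscratch/srlab/programs/canu

# install_canu is a hypothetical helper name, not part of Canu.
# It clones the repo into CANU_DIR and builds from the src/ directory.
install_canu() {
  git clone https://github.com/marbl/canu.git "$CANU_DIR" &&
    cd "$CANU_DIR/src" &&
    make
}
```

Calling install_canu would run the three steps in order and stop at the first one that fails. After the build, the binaries end up under Linux-amd64/bin inside the repo, which is the path used to run Canu later in this entry.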

The Canu developers supply a sample assembly data set which can be downloaded via

curl -L -o p6.25x.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

which I downloaded into /gscratch/srlab/data/CanuTest.

To run the assembly, I spun up a 4-hour interactive session (hopefully this is long enough) and ran /gscratch/srlab/programs/canu/Linux-amd64/bin/canu -p ecoli -d ecoli-auto genomeSize=4.8m -pacbio-raw p6.25x.fastq.

This did not work, because Canu is built to run on a scheduler system, so it needs the --useGrid=FALSE argument added to the command. After adding that, everything looks like it’s working fine. Once I’ve confirmed this works, I’ll get to work on the PacBio-only assembly for the Oly genome.
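Putting the pieces together, the full test command with grid submission turned off might look like the sketch below. Two things here are assumptions rather than a record of what I ran: the wrapper name run_canu_test is made up, and the option is written as useGrid=false, which is how the Canu documentation spells it (key=value, no leading dashes) and differs from how I wrote it above.

```shell
# Path to the Canu binary built earlier; override CANU to point elsewhere.
CANU=${CANU:-/gscratch/srlab/programs/canu/Linux-amd64/bin/canu}

# run_canu_test is a hypothetical wrapper name, not part of Canu.
# $1 is the raw PacBio read file, e.g. p6.25x.fastq.
run_canu_test() {
  # useGrid=false keeps every step on the current node instead of
  # submitting jobs to the scheduler (spelling taken from the Canu docs).
  "$CANU" -p ecoli -d ecoli-auto genomeSize=4.8m useGrid=false -pacbio-raw "$1"
}
```

From inside /gscratch/srlab/data/CanuTest, running run_canu_test p6.25x.fastq would start the assembly, with output written to the ecoli-auto directory and files prefixed with ecoli.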

Edit: It finished, and it looks like it works with the sample data. Now to try it with our Oly PacBio stuff.