Sean’s Notebook: My FALCON won’t fly.

I’ve been trying to install FALCON, a PacBio-based assembler, and it’s been a huge pain. Mostly it’s the typical Hyak permission issues, but there are also lots of errors with no way to figure out what they mean. LFS error code 32512?

That’s super useful, especially when it doesn’t specify whether that’s Git LFS, the Lustre File System, or who knows what else. (It’s likely Git LFS, but its error codes don’t seem to be documented anywhere easily accessible.)
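One thing worth checking: 32512 is exactly 127 × 256, which is what a raw POSIX wait status looks like when a child process exits with code 127, and shells conventionally use 127 for "command not found". That only applies if the number really is a raw wait status (the kind returned by C's system() or Python's os.system on Unix), which is an assumption and not something the FALCON output confirms. The helper name below is made up for illustration. A minimal sketch of the decoding:

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit POSIX-style wait status into its two parts.

    Assumption: `status` is a raw wait status, not a code defined by
    FALCON, Git LFS, or Lustre.
    """
    return {
        "exit_code": (status >> 8) & 0xFF,  # high byte: the child's exit code
        "signal": status & 0x7F,            # low 7 bits: terminating signal, 0 if none
    }


print(decode_wait_status(32512))
```

If that reading is right, the installer tried to call a program that isn't on the PATH (possibly the git-lfs binary itself), which would be a different problem from a permissions error.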

Going to look through the FALCON repo’s issues and see if there’s anything of use there, then post an issue of my own if not.

In other, more successful news:

Uploaded more TA data to GitHub. The titrator seems to be behaving better, but it’s still showing a ~40-point swing between beginning-of-day and end-of-day samples.

Finally finished the methylation count files for the C. virginica stuff and will most likely start MACAU chewing on them tonight. Played around with SQLShare for way too long, but on Tuesday I got a response from their helpdesk and learned that SQLShare converts all input to lowercase, so if you’re trying to load the Loc column from your dataset, it sees dataset.Loc as dataset.loc and then gets angry. You have to use dataset."Loc" instead. Even after figuring that out, it was still a giant pain, so I just subsetted my sample count files and welded them back together at the end.

I started with 10x coverage, and will pare that back to probably 5x for a second run.

Notebook

Count Files

Sean’s Notebook: Oly Genome re-assembly, try 2.

Finished the Redundans run on the Oly genome with a smaller k-mer length, and things look much better!

Full Scaffold stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats scaffolds.fa
stats for scaffolds.fa
sum = 567422454, n = 337054, ave = 1683.48, largest = 74226
N50 = 3492, n = 44086
N60 = 2683, n = 62641
N70 = 1989, n = 87196
N80 = 1355, n = 121603
N90 = 737, n = 177120
N100 = 200, n = 337054
N_count = 30184511
Gaps = 286008

Reduced Scaffold stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats scaffolds.reduced.fa
stats for scaffolds.reduced.fa
sum = 546928670, n = 300593, ave = 1819.50, largest = 78788
N50 = 3852, n = 36149
N60 = 2917, n = 52500
N70 = 2137, n = 74420
N80 = 1455, n = 105359
N90 = 810, n = 155023
N100 = 200, n = 300593
N_count = 5238187
Gaps = 209203

Full Contig stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats contigs.fa
stats for contigs.fa
sum = 1633553496, n = 12395459, ave = 131.79, largest = 23341
N50 = 141, n = 2124062
N60 = 115, n = 3463771
N70 = 92, n = 5003872
N80 = 69, n = 7087338
N90 = 61, n = 9624083
N100 = 58, n = 12395459
N_count = 0
Gaps = 0

Reduced Contig stats:

D-173-250-161-130:NewOlyAssembly Sean$ assembly-stats contigs.reduced.fa
stats for contigs.reduced.fa
sum = 532736649, n = 791403, ave = 673.15, largest = 23341
N50 = 940, n = 151868
N60 = 726, n = 216469
N70 = 553, n = 300627
N80 = 408, n = 412850
N90 = 289, n = 568264
N100 = 200, n = 791403
N_count = 0
Gaps = 0
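For anyone reading the N50 through N100 lines above: a rough sketch of how an Nx value and its accompanying n can be computed from a list of sequence lengths. This assumes the standard definition (sort longest first, walk down until the running total reaches x% of the assembly) and is not taken from the assembly-stats source. The function name and the toy lengths are made up for illustration.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx, n) for a collection of sequence lengths.

    Nx is the length of the sequence at which the running total, taken
    longest-first, first reaches `fraction` of the total assembly size.
    n is how many sequences it took to get there.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no sequence lengths given")


# Toy lengths, not from the Oly assembly: total is 30, so the N50 target is 15.
print(nx_stats([10, 8, 6, 4, 2]))        # N50
print(nx_stats([10, 8, 6, 4, 2], 0.9))   # N90
```

Under this definition N100 is just the shortest sequence and its n is the total sequence count, which matches the N100 lines in the output above.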

Complete Contig fasta: here

Reduced Contig fasta: here

Complete scaffold fasta: here

Reduced scaffold fasta: here

SLURM execution script: here

Redundans output: here

All data: here

Kaitlyn’s Notebook: Unique Proteins that Appeared

I parsed out proteins from Rhonda’s data that had 0 abundance on day 1 but later had measurable abundance on at least one day of the experiment. I ran the list of proteins I identified for each silo through CompGO. Many of the proteins did not have associated GO terms, which was disappointing since some of those proteins were uniquely abundant in the experiment.

I recorded this in a Jupyter notebook entry.
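A minimal sketch of that kind of filter, assuming the abundances for one silo are held as a mapping from protein ID to a list of values ordered by sampling day, with day 1 first. The layout, the function name, and the toy values are all hypothetical; they are not from Rhonda’s data or from the actual notebook.

```python
def appeared_later(abundance_by_day):
    """Return proteins with 0 abundance on day 1 and > 0 on any later day.

    abundance_by_day: dict mapping protein ID -> list of abundances,
    ordered by sampling day with day 1 first (an assumed layout).
    """
    return [
        protein
        for protein, values in abundance_by_day.items()
        if values and values[0] == 0 and any(v > 0 for v in values[1:])
    ]


# Made-up example values for a single silo.
silo = {
    "protein_A": [0, 0, 3.2, 1.1],  # absent on day 1, appears later: kept
    "protein_B": [2.5, 0, 0, 0],    # present on day 1: dropped
    "protein_C": [0, 0, 0, 0],      # never detected: dropped
}
print(appeared_later(silo))
```

The resulting list per silo is what would then go into a GO tool such as CompGO.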