Grace’s Notebook: June 23, 2017

Today I scrounged around for missing data from the Heare et al. paper (Evidence of Ostrea lurida Carpenter, 1864 population structure in Puget Sound, WA).

Found it!

Am now going through all the revisions and making sure that they were addressed. Sending the final version to Brent and Steven today, and will resubmit by Monday!



Sean’s Notebook: BWA-Meth Output for EPI-135.

Found a different methylation aligner, BWA-meth, that’s based on a Burrows-Wheeler aligner that’s supposed to deal better with gap alignment than Bowtie2. Fired it up with the EPI-135 and 10k Geoduck genome. It gave an answer, but I *really* don’t believe it. The bamtools stats output claims an 87% mapping rate, compared to the 6% from Bismark.

sean@emu:~/Documents/Geoduck_Rerun/bwatest$ bamtools stats -in bwa-meth.bam

Stats for BAM file(s): 

Total reads:       55253237
Mapped reads:      48098666	(87.0513%)
Forward strand:    30884874	(55.8969%)
Reverse strand:    24368363	(44.1031%)
Failed QC:         16970759	(30.7145%)
Duplicates:        0	(0%)
Paired-end reads:  55253237	(100%)
'Proper-pairs':    30054258	(54.3937%)
Both pairs mapped: 47463686	(85.9021%)
Read 1:            27626448
Read 2:            27626789
Singletons:        634980	(1.14922%)

.bam file is available: here
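
For double-checking numbers like these, here is a minimal Python sketch that parses the text printed by bamtools stats and recomputes the mapping rate from the raw counts. The function name `parse_bamtools_stats` and the `SAMPLE` string are made up for illustration; the counts are copied from the output above, and the parsing assumes the "Label: count" layout shown there.

```python
import re

# A few lines copied from the bamtools stats output above (illustrative subset).
SAMPLE = """Stats for BAM file(s):

Total reads:       55253237
Mapped reads:      48098666\t(87.0513%)
Failed QC:         16970759\t(30.7145%)
'Proper-pairs':    30054258\t(54.3937%)
"""

def parse_bamtools_stats(text):
    """Return {label: count} for every 'Label: count' line in the text."""
    counts = {}
    for line in text.splitlines():
        # Label, a colon, whitespace, then an integer count.
        match = re.match(r"\s*(.+?):\s+(\d+)", line)
        if match:
            label = match.group(1).strip("'")
            counts[label] = int(match.group(2))
    return counts

counts = parse_bamtools_stats(SAMPLE)
mapped_rate = 100 * counts["Mapped reads"] / counts["Total reads"]
failed_qc_rate = 100 * counts["Failed QC"] / counts["Total reads"]
print(f"Mapped: {mapped_rate:.2f}%  Failed QC: {failed_qc_rate:.2f}%")
```

Recomputing from the counts confirms the roughly 87% figure is what bamtools reported, so the disagreement with Bismark is in the alignments themselves and not a misread of the output.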

Sean’s Notebook: Starting GARM meta-assembly of PacBio and BGI assemblies for Olys.

The Pilon polishing for the CANU assembly finished last night, and it seems pretty small, but hopefully between that, the BGI assembly, and the Platanus assembly, we can assemble ourselves up one decent genome. Fingers crossed at least.

Assembly stats on the Polished assembly:

D-69-91-159-59:BGI_Oly_Genome Sean$ assembly-stats oly_polished_.fasta 
stats for oly_polished_.fasta
sum = 46364927, n = 3388, ave = 13685.04, largest = 61211
N50 = 14126, n = 1230
N60 = 12962, n = 1573
N70 = 11906, n = 1947
N80 = 10932, n = 2352
N90 = 9590, n = 2803
N100 = 2074, n = 3388
N_count = 1
Gaps = 1

One gap is interesting, but with the assembly size being at least one, if not two, orders of magnitude smaller than the expected genome size, I think we’re short on coverage to allow for conservative error correction levels. Will have to reassemble with looser standards and see if we can bump it up.
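
As a reminder of what the N50-style numbers above mean, here is a minimal Python sketch of the usual Nx calculation: sort contigs from longest to shortest and report the length at which the running total first reaches the chosen fraction of the assembly size. The function name `nx` and the toy contig lengths are made up for illustration, and this is not necessarily how assembly-stats implements it internally.

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, contigs needed) for a list of contig lengths.

    fraction=0.5 gives N50, fraction=0.9 gives N90, and so on.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty")

# Toy contig lengths (hypothetical, not from the Oly assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = nx(toy, 0.5)
n90_length, n90_count = nx(toy, 0.9)

# Sanity check on the reported summary line: ave should equal sum / n.
reported_mean = round(46364927 / 3388, 2)
```

On the toy list the total is 300, so N50 is the contig length at which the running total first reaches 150.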

Polished CANU assembly found: here

Pilon output file: here

Next step is GARM, to see what that gives us. I think I’ll also re-assemble the PacBio stuff with much less stringent error correction to see if that gives any measurable difference in outputs.

Edit: Also, I finished the --non_directional runs for Bismark: no change in mapping rates and less than 1% complementary mapping, so it looks like the regular arguments are correct. Output .bam files are found here with the NonDir tag.