Sean’s Notebook: Trimming and Quality Checking EPI-135 and EPI-135WG.

We’re back to playing with Hollie’s Geoduck methylation data. We noticed some wonky nucleotide sequence at the beginning of the reads that may be hampering mapping efficiency downstream. Steven asked me to trim the first 18 nucleotides and the last nucleotide from each read, which I did here.
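The trim itself is just a fixed clip from both ends of every read. Here is a minimal Python sketch of the idea on a single FASTQ file. It is not necessarily the tool used for this run, and the constants and function names are hypothetical. In practice a dedicated trimmer is safer because it also handles quality filtering and keeps read pairs in sync; cutadapt's `-u` option, for example, removes bases from either end.

```python
import io
from typing import Iterator, TextIO, Tuple

# Hypothetical constants matching the trim described above.
FIVE_PRIME_CLIP = 18   # bases removed from the start of each read
THREE_PRIME_CLIP = 1   # bases removed from the end of each read


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a standard 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def hard_trim(seq: str, qual: str,
              five: int = FIVE_PRIME_CLIP,
              three: int = THREE_PRIME_CLIP) -> Tuple[str, str]:
    """Clip fixed numbers of bases (and their qualities) from both ends."""
    end = len(seq) - three
    if end <= five:
        return "", ""  # read is too short to survive the trim
    return seq[five:end], qual[five:end]


def trim_fastq(src: TextIO, dst: TextIO) -> None:
    """Write every record trimmed; zero-length reads are kept so R1/R2 stay in sync."""
    for header, seq, qual in read_fastq(src):
        tseq, tqual = hard_trim(seq, qual)
        dst.write(f"{header}\n{tseq}\n+\n{tqual}\n")


if __name__ == "__main__":
    demo = io.StringIO("@r1\n" + "N" * 18 + "ACGTACGT" + "G\n+\n"
                       + "#" * 18 + "IIIIIIII" + "#\n")
    out = io.StringIO()
    trim_fastq(demo, out)
    print(out.getvalue(), end="")  # @r1 / ACGTACGT / + / IIIIIIII
```

The sketch keeps zero-length records instead of dropping them. Dropping a read from R1 but not its mate from R2 would desynchronize the paired files.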

EPI-135 Pre Trimming:
Read 1:

Read 2:

EPI-135 Post Trimming:
Read 1:

Read 2:

EPI-135WG Pre Trimming:
Read 1:


EPI-135WG Post Trimming:
Read 1:

Read 2:

It looks like trimming cleaned up most of the problems. It might have been worth trimming another base pair or two off the 3′ end, though.
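One way to decide whether another base or two should come off the 3′ end is to tabulate base composition by read position, similar in spirit to a per-base sequence content plot. Bisulfite-converted reads are naturally C-poor, so this sketch compares each position against the read-wide average rather than expecting equal A/C/G/T. The 10% tolerance and the function names are arbitrary assumptions.

```python
from collections import Counter
from typing import Dict, List


def per_position_composition(reads: List[str]) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T/N observed at each read position."""
    max_len = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for i, base in enumerate(read.upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: c[b] / total for b in "ACGTN"})
    return table


def flag_biased_positions(table: List[Dict[str, float]],
                          tolerance: float = 0.10) -> List[int]:
    """Positions where any base deviates from its read-wide mean by > tolerance."""
    if not table:
        return []
    n = len(table)
    baseline = {b: sum(pos[b] for pos in table) / n for b in "ACGTN"}
    return [i for i, pos in enumerate(table)
            if any(abs(pos[b] - baseline[b]) > tolerance for b in "ACGTN")]
```

Positions flagged at the very end of the reads would suggest that a slightly longer 3′ clip is worth trying.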

I’ve started running Bismark on the trimmed outputs to see whether trimming improves mapping rates; hopefully it will. I’m also going to try adjusting the number of allowed mismatches in Bowtie to see if that helps.
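The following sketch could pull the headline number out of each Bismark report, which makes it easier to compare runs (untrimmed vs. trimmed, or different Bowtie mismatch settings). It assumes each report contains a line such as `Mapping efficiency: 45.2%`. The regex, the run names, and the helper functions are my own assumptions and should be checked against real report files.

```python
import re
from typing import Dict, Optional

# Assumed report line format, e.g. "Mapping efficiency:\t45.2%".
EFFICIENCY_RE = re.compile(r"Mapping efficiency:\s*([0-9.]+)%")


def mapping_efficiency(report_text: str) -> Optional[float]:
    """Return the mapping efficiency (percent) found in a report, or None."""
    match = EFFICIENCY_RE.search(report_text)
    return float(match.group(1)) if match else None


def best_run(reports: Dict[str, str]) -> Optional[str]:
    """Name of the run with the highest efficiency, ignoring unparsable reports."""
    parsed = {name: mapping_efficiency(text) for name, text in reports.items()}
    scores = {name: s for name, s in parsed.items() if s is not None}
    if not scores:
        return None
    return max(scores, key=lambda name: scores[name])
```

For example, `best_run({"untrimmed": untrimmed_text, "trimmed": trimmed_text})` would name whichever run mapped better. Both names are placeholders.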

On the Oly genome front, I talked with Katherine today and now have a pretty decent game plan going forward. It turns out I was under the wrong impression that all of the Illumina data was paired-end. The longest-insert libraries were actually mate-pair, which the Platanus devs say is not ideal for assembly. Oops.
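The practical difference shows up in read orientation once pairs are mapped. Illumina paired-end libraries normally align inward (forward/reverse, FR) with short inserts. Illumina mate-pair libraries normally align outward (reverse/forward, RF) across much longer distances. A rough check on a sample of aligned pairs might look like the sketch below. The `Mate` record is a hypothetical stand-in for whatever a BAM parser returns, and both mates are assumed to sit on the same contig.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Mate:
    """Minimal, hypothetical view of one aligned read of a pair."""
    start: int      # 0-based leftmost aligned position on the contig
    length: int     # aligned length on the reference
    reverse: bool   # True if the read aligned to the reverse strand


def pair_orientation(m1: Mate, m2: Mate) -> str:
    """Classify a same-contig pair as FR (inward), RF (outward) or tandem."""
    if m1.reverse == m2.reverse:
        return "tandem"
    fwd, rev = (m2, m1) if m1.reverse else (m1, m2)
    fwd_five_prime = fwd.start                    # 5' end of a forward read
    rev_five_prime = rev.start + rev.length - 1   # 5' end of a reverse read
    return "FR" if fwd_five_prime < rev_five_prime else "RF"


def orientation_summary(pairs: Iterable[Tuple[Mate, Mate]]) -> Counter:
    """Count orientations across a sample of aligned pairs."""
    return Counter(pair_orientation(a, b) for a, b in pairs)
```

A library dominated by RF pairs with multi-kilobase separations would be behaving like mate-pair data rather than paired-end data.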

The plan:

1. Finish polishing the Canu assembly with Pilon. I’m still mapping reads back to the Canu assembly with Bowtie.
2. Re-run Platanus using only the PE150 data from Illumina, then scaffold with the PE150 and MP50 data. Then run the result through Redundans **without** the reduction step. Katherine thinks Redundans may be throwing away too much data because of high heterozygosity, and turning that step off may prevent the loss.
3. Feed the BGI assembly, the Platanus/Redundans assembly, and the Canu/Pilon assembly into a meta-assembler such as GARM.
4. Have the best assembly ever. Or at least a better assembly.
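One simple way to judge whether the merged assembly is actually "better" is to compare contiguity stats (contig count, total length, and N50) across the individual and merged assemblies. The sketch below works on plain lists of contig lengths; extracting them from FASTA files is left out, and the function names are hypothetical. N50 says nothing about correctness, so it would need to be paired with other checks.

```python
from typing import Dict, Iterable, List, Tuple


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(assemblies: Dict[str, List[int]]) -> Dict[str, Tuple[int, int, int]]:
    """(contig count, total length, N50) for each named assembly."""
    return {name: (len(ls), sum(ls), n50(ls)) for name, ls in assemblies.items()}
```

For a toy assembly with contigs of lengths 2 through 10, the total is 54 and N50 is 8, because 10 + 9 + 8 = 27 is the first running total to reach half of 54.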