Sean’s Notebook: Bismark mapping efficiency with a hard-trimmed C. virginica sample

Yesterday Mackenzie Gavery came by and offered some suggestions for increasing Bismark mapping rates on our Virginica BS-Seq data. She suggested two things: using the --non_directional flag to account for the PBAT (post-bisulfite adapter tagging) nature of the libraries, which had a huge effect, and hard trimming the first 16 bases of our reads, because those bases look weird.
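Hard trimming just means dropping a fixed number of bases from the 5' end of every read, along with the matching quality characters. This post doesn't say which tool did the clipping (Trim Galore's --clip_R1 option is one common way to do it), so here is a minimal Python sketch of the operation itself, assuming uncompressed, strictly four-line FASTQ input. The function names are hypothetical, and dropping reads that end up empty is an assumption about sensible behavior, not something taken from the original run.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from an uncompressed, strictly four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def hard_trim_5prime(records: Iterable[Record], n: int = 16,
                     min_length: int = 1) -> Iterator[Record]:
    """Remove the first n bases (and their quality scores) from every read.

    Hypothetical helper: reads shorter than min_length after clipping are dropped.
    """
    for header, seq, plus, qual in records:
        # Slice sequence and quality identically so they stay the same length
        trimmed_seq, trimmed_qual = seq[n:], qual[n:]
        if len(trimmed_seq) >= min_length:
            yield header, trimmed_seq, plus, trimmed_qual


def write_fastq(records: Iterable[Record], out: TextIO) -> None:
    """Write records back out as four-line FASTQ."""
    for record in records:
        out.write("\n".join(record) + "\n")


if __name__ == "__main__":
    # Two toy reads: a 20 bp read (keeps 4 bases) and a 10 bp read (dropped)
    demo = io.StringIO(
        "@read1\n" + "ACGT" * 4 + "TTGG\n+\n" + "I" * 16 + "ABCD\n"
        + "@read2\nACGTACGTAC\n+\n" + "I" * 10 + "\n"
    )
    out = io.StringIO()
    write_fastq(hard_trim_5prime(read_fastq(demo), n=16), out)
    print(out.getvalue(), end="")
```

Working on streams rather than whole files keeps memory flat, which matters for samples with ~12 million reads like this one.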

I tried both on a single sample to save time, and the run finished this morning.

The weirdness


Default trimming:

Hard trimming of the first 16 bases:

That cleans things up for sure. Unfortunately, it didn’t do much for the mapping rate, which only went from 28% to 28.3%. It was worth a shot, though!

Final Alignment report
Sequences analysed in total:	12197930
Number of alignments with a unique best hit from the different alignments:	3456338
Mapping efficiency:	28.3%
Sequences with no alignments under any condition:	5760842
Sequences did not map uniquely:	2980750
Sequences which were discarded because genomic sequence could not be extracted:	0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT:	181719	((converted) top strand)
CT/GA:	166362	((converted) bottom strand)
GA/CT:	1588675	(complementary to (converted) top strand)
GA/GA:	1519582	(complementary to (converted) bottom strand)
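The headline numbers can be checked by hand: mapping efficiency is unique best hits divided by sequences analysed, and in this run the unique, unmapped, and non-unique reads add up to the total (0 were discarded). The strand breakdown also hints at why --non_directional helped so much: roughly 90% of the unique alignments are on the two complementary strands (GA/CT and GA/GA), which Bismark's default directional mode does not align against. A quick Python sketch, with the counts copied from the report above; the variable names are just for illustration.

```python
# Counts copied from the Bismark "Final Alignment report" above
total_analysed = 12_197_930
unique_best_hit = 3_456_338
no_alignment = 5_760_842
not_unique = 2_980_750

# Unique alignments split by the strand they came from
strand_counts = {
    "CT/CT": 181_719,    # (converted) top strand
    "CT/GA": 166_362,    # (converted) bottom strand
    "GA/CT": 1_588_675,  # complementary to (converted) top strand
    "GA/GA": 1_519_582,  # complementary to (converted) bottom strand
}

# In this run every read lands in one of these three outcomes (0 were discarded)
assert unique_best_hit + no_alignment + not_unique == total_analysed
# The per-strand counts partition the unique hits
assert sum(strand_counts.values()) == unique_best_hit

mapping_efficiency = 100 * unique_best_hit / total_analysed
complementary = strand_counts["GA/CT"] + strand_counts["GA/GA"]
complementary_pct = 100 * complementary / unique_best_hit

print(f"Mapping efficiency: {mapping_efficiency:.1f}%")
print(f"Unique hits on complementary strands: {complementary_pct:.1f}%")
```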

Final Cytosine Methylation Report
Total number of C's analysed:	61813350

Total methylated C's in CpG context:	12572131
Total methylated C's in CHG context:	4005979
Total methylated C's in CHH context:	12350257
Total methylated C's in Unknown context:	0

Total unmethylated C's in CpG context:	2271987
Total unmethylated C's in CHG context:	12442077
Total unmethylated C's in CHH context:	18170919
Total unmethylated C's in Unknown context:	5

C methylated in CpG context:	84.7%
C methylated in CHG context:	24.4%
C methylated in CHH context:	40.5%
C methylated in Unknown context (CN or CHN):	0.0%
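The context percentages are methylated calls over all calls in that context, methylated / (methylated + unmethylated). A small Python check using the counts from the report (the helper name is hypothetical, not part of Bismark) reproduces 84.7%, 24.4% and 40.5%, and shows that the "Total number of C's analysed" line here equals the CpG + CHG + CHH calls, without the 5 Unknown-context calls.

```python
from typing import Optional

# (methylated, unmethylated) call counts copied from the
# "Final Cytosine Methylation Report" above
counts = {
    "CpG": (12_572_131, 2_271_987),
    "CHG": (4_005_979, 12_442_077),
    "CHH": (12_350_257, 18_170_919),
    "Unknown": (0, 5),
}


def percent_methylated(methylated: int, unmethylated: int) -> Optional[float]:
    """Return methylated / (methylated + unmethylated) * 100, or None if there are no calls."""
    total = methylated + unmethylated
    return None if total == 0 else 100 * methylated / total


for context, (meth, unmeth) in counts.items():
    pct = percent_methylated(meth, unmeth)
    print(f"{context}: {pct:.1f}%" if pct is not None else f"{context}: no calls")

# In this report the CpG, CHG and CHH calls sum to "Total number of C's analysed";
# the Unknown-context calls are not included in that total.
analysed = sum(m + u for ctx, (m, u) in counts.items() if ctx != "Unknown")
print(f"C's analysed (CpG + CHG + CHH): {analysed}")
```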

Bismark output files located: here

Now it's time to run the rest of the samples!