Steven found a new program called epiTEome that simultaneously finds transposable element insertion points and their methylation levels, and asked me to see how applicable it is to our data. I read through some papers, and it looks like it should work and give us some interesting information about CHH context methylation at transposable element insertion points. I’ve been trying to get it installed on my laptop, but some of the Perl modules won’t install correctly. I’ll update this with a guide once it’s behaving properly.
Also, Mackenzie Gavery came by and offered some suggestions about our mapping efficiency issue in Bismark. We apparently have PBAT libraries, in which the reads are the complement of the strand of interest. This is obvious in the FastQC report once you know what to look for: the sequence files are G depleted, rather than C depleted as one would expect in bisulfite treated samples. I’m trying Bismark again on a single file with the --non_directional flag, which should report alignments to all 4 possible strands so we can see if anything changes.
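The G depleted versus C depleted check can also be roughed out directly from the reads. Below is a minimal sketch in Python, assuming plain 4-line FASTQ records; the function names `base_fractions` and `depleted_base` are made up for illustration and are not part of FastQC or Bismark. It pools all positions together, so it is a much cruder view than FastQC's per-base plot.

```python
from collections import Counter


def base_fractions(fastq_lines):
    """Return the overall fraction of A, C, G and T in a FASTQ stream.

    Assumes standard 4-line records, so the sequence is the 2nd line
    of every record. Ns and other symbols are ignored.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            counts.update(line.strip().upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


def depleted_base(fractions):
    """Return whichever of C or G is rarer (C on a tie)."""
    return min("CG", key=lambda b: fractions[b])


# Toy record, not real data: a read with no G at all.
toy = ["@read1", "CCAACTTCAA", "+", "IIIIIIIIII"]
print(depleted_base(base_fractions(toy)))
```

On a real file, something like `base_fractions(open(path))` would do for an uncompressed FASTQ. A result of G would be consistent with what the FastQC report showed for our files, while C is what a standard directional bisulfite library should give.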
On the Hyak front, platanus assemble finished after running only overnight, which is a little disturbing since the last time I ran it, it ran for a week before finally crashing due to lack of memory. I guess that’s the power of 500 GB of RAM and a pair of Skylake Xeons? Next we move on to the scaffolding step, which will hopefully be as fast!
Finished the Bismark run on that one file, and went from 8% mapping to 28% mapping. Pretty huge increase!
Bismark report for: trimmed-2112_lane1_ACAGTG_L001_R1.fastq (version: v0.16.3)
Option '--non_directional' specified: alignments to all strands were being performed (OT, OB, CTOT, CTOB)
Bismark was run with Bowtie 2 against the bisulfite genome of /home/srlab/Documents/C-virginica-BSSeq/genome/ with the specified options: -q -N 1 --score-min L,0,-0.2 --ignore-quals

Final Alignment report
======================
Sequences analysed in total: 12260444
Number of alignments with a unique best hit from the different alignments: 3439005
Mapping efficiency: 28.0%
Sequences with no alignments under any condition: 6508309
Sequences did not map uniquely: 2313130
Sequences which were discarded because genomic sequence could not be extracted: 4

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT: 194297 ((converted) top strand)
CT/GA: 177441 ((converted) bottom strand)
GA/CT: 1574633 (complementary to (converted) top strand)
GA/GA: 1492630 (complementary to (converted) bottom strand)

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 72855963

Total methylated C's in CpG context: 15487306
Total methylated C's in CHG context: 5008319
Total methylated C's in CHH context: 15081081
Total methylated C's in Unknown context: 5

Total unmethylated C's in CpG context: 2557484
Total unmethylated C's in CHG context: 14183192
Total unmethylated C's in CHH context: 20538581
Total unmethylated C's in Unknown context: 78

C methylated in CpG context: 85.8%
C methylated in CHG context: 26.1%
C methylated in CHH context: 42.3%
C methylated in Unknown context (CN or CHN): 6.0%
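
The strand breakdown in the report is the interesting part for the PBAT question. Here is a small sketch of the arithmetic in Python, with the counts copied by hand from the report above; the variable names and the `percent` helper are mine, not anything Bismark provides.

```python
# Counts copied from the Bismark report above.
total_reads = 12260444
unique_hits = 3439005
strand_counts = {
    "OT": 194297,     # CT/CT, (converted) top strand
    "OB": 177441,     # CT/GA, (converted) bottom strand
    "CTOT": 1574633,  # GA/CT, complementary to top strand
    "CTOB": 1492630,  # GA/GA, complementary to bottom strand
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


strand_total = sum(strand_counts.values())
complementary = strand_counts["CTOT"] + strand_counts["CTOB"]

mapping_efficiency = percent(unique_hits, total_reads)
complementary_share = percent(complementary, strand_total)

print(f"Mapping efficiency: {mapping_efficiency}%")      # 28.0%
print(f"Complementary strands: {complementary_share}%")  # 89.2%
```

So roughly 89% of the uniquely aligned reads came from the two complementary strands, which a directional run would not have aligned to at all. That fits Mackenzie's PBAT suggestion. The four strand counts sum to 3439001, which is the unique best hit total minus the 4 sequences discarded because genomic sequence could not be extracted.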