Yaamini’s Notebook: DML Analysis Part 28

Revisiting DML analyses

Perfecting the DML track

Based on this issue, I revised the destranded 5x DML track to show methylation differences instead of a false forward strand indiciation. Turns out I applied mutate on the wrong dataframe in my R Markdown script! My revised track, found here, matched the track Steven generated.

Untitled 2

Figure 1. DML tracks in IGV with 5x and 10x coverage sample tracks.

This is also interesting because Steven used filterByCoverage after processBismarkAln to generate the 5x coverage track, while I used mincov within processBismarkAln. It is interseting to know that we (presumably) get the same DML locations. I decided to go forward with the destranded 5x DML track, since both Claire and Mac used 5x coverage for their analyses.

CpG information

I returnedd to my R Markdown script and got CpG coverage and methylation information for the 5x processed samples.

s9-cpgcoverage

s9-percentmethylation

Figures 2-3. Example CpG coverage and percent methylation plots.

I also generated a correlation plots, full sample methylation clustering, PCA, and screeplots within methylKit.

full-sample-correlation

full-sample-clustering

full-sample-PCA

full-sample-screeplot

Figures 4-7. Full sample plots.

Characterizing DML locations

I revisited my bedtools Jupyter notebook to characterize the location of destranded 5x DML. I reset the variable path name (side note: HOLY FORK SUCH A HANDY TOOL I’M SO GLAD I USED IT) to the new destranded 5x track. I then went through the script and reran all of my DML code. All of the overlap locations can be found here. I also reran my flankBed and closestBed analysis, and saved the output here.

The track had 630 DML, instead of the 1398 in my 3x coverage track. Of the 630 DML, were 335 hypermethylated in the treatment and were 295 hypomethylated. Most of the DML, 576, were located in mRNA coding (genic) regions, with 1474 genes represented. In the genic regions, 388 DML overlapped with exons and 200 overlapped with introns. 61 DML overlapped with transposable elements generated using all species in the database, and 40 DML overlapped with transposable elements generated using C. gigas only. In 1000 bp flanking regions upstream and downstream of mRNA coding regions, 46 DML overlapped with upstream flanks and 50 overlapped with downstream flanks.

Going forward

  1. Conduct a proportion test for DML locations
  2. Describe methylation irrespective of treatment
  3. Update paper methods and results
  4. Work through gene-level analysis
  5. Figure out what’s going on with DMR
  6. Figure out what’s going on with the gene background

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2Ckf8T1
via IFTTT