Yaamini’s Notebook: DML Analysis Part 13

IN PROGRESS: Different mincov values in methylKit

Using this R Markdown file, I tested the effect of different mincov values on sample clustering and the DMLs produced. After discussing methods in this issue, I went through this process with both Steven’s samples and my own samples.
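For context, mincov sets the minimum number of reads that must cover a CpG before methylKit keeps that locus for a sample. The Python sketch below only illustrates that idea. The loci are invented and `filter_by_mincov` is a hypothetical helper, not part of methylKit or the R Markdown file.

```python
# Toy records: (chromosome, position, reads covering the CpG, methylated reads).
# All values are invented for illustration.
loci = [
    ("scaffold1", 101, 1, 1),
    ("scaffold1", 250, 3, 2),
    ("scaffold2", 57, 4, 0),
    ("scaffold2", 913, 8, 6),
]


def filter_by_mincov(records, mincov):
    """Keep only loci covered by at least `mincov` reads (hypothetical helper)."""
    return [rec for rec in records if rec[2] >= mincov]


for mincov in (1, 3, 5):
    kept = filter_by_mincov(loci, mincov)
    print(f"mincov = {mincov}: {len(kept)} of {len(loci)} loci kept")
```

A higher mincov throws away low-coverage loci, so fewer loci are left for clustering and DML testing, but each remaining percent methylation value rests on more reads.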

Steven’s samples

All of my output from this analysis can be found here. Below are some highlights:

Figures 1-3. Percent CpG coverage for all samples using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 4-6. Percent CpG methylation for all samples using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 7-9. Full sample CpG methylation clustering using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 10-12. PCA of full sample methylation using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

I also wrote out differentially methylated loci that were at least 50% different between my treatment and control for mincov = 1, mincov = 3, and mincov = 5. I haven’t dug into what the exact differences are between these files, but there are at least differences in the number of DMLs produced.
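To make the 50% cutoff concrete, here is a minimal sketch of the filtering idea using invented read counts. It assumes the methylation difference at a locus is the pooled percent methylation in treatment minus that in control. `percent_meth` and `is_dml_candidate` are hypothetical names, and the significance test that methylKit also applies is left out.

```python
def percent_meth(samples):
    """Pooled percent methylation from (methylated reads, total reads) pairs."""
    meth = sum(m for m, _ in samples)
    total = sum(n for _, n in samples)
    return 100.0 * meth / total


def is_dml_candidate(control, treatment, min_diff=50.0):
    """Return (passes cutoff, difference in percentage points)."""
    diff = percent_meth(treatment) - percent_meth(control)
    return abs(diff) >= min_diff, diff


# Invented counts for two loci, three samples per group.
locus_a = ([(1, 5), (0, 6), (1, 4)], [(4, 5), (5, 6), (3, 4)])
locus_b = ([(2, 5), (3, 6), (2, 4)], [(3, 5), (4, 6), (2, 4)])

for name, (control, treatment) in (("A", locus_a), ("B", locus_b)):
    passes, diff = is_dml_candidate(control, treatment)
    print(f"locus {name}: difference = {diff:.1f} points, passes = {passes}")
```

The cutoff is applied to the absolute difference, so both hypermethylated and hypomethylated loci can pass.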

Table 1. The mincov value, total number of loci produced, and the number of DMLs that were at least 50% different between treatment and control samples. More restrictive mincov values produced fewer significantly different DMLs.

| mincov | Total Loci | Number of Significantly Different DMLs |
|--------|------------|----------------------------------------|
| 1      | 1112085    | 4904                                   |
| 3      | 670301     | 1398                                   |
| 5      | 503780     | 816                                    |
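The counts in Table 1 are easier to compare as proportions. This short Python sketch uses only the numbers from the table. The variable names are mine, and "share of loci that are DMLs" is just one convenient way to summarize the table, not a metric produced by methylKit.

```python
# (mincov, total loci, DMLs) copied from Table 1.
table = [(1, 1112085, 4904), (3, 670301, 1398), (5, 503780, 816)]

baseline_loci = table[0][1]  # loci retained at mincov = 1
summary = {}
for mincov, loci, dmls in table:
    pct_loci_kept = 100.0 * loci / baseline_loci
    pct_loci_dml = 100.0 * dmls / loci
    summary[mincov] = (pct_loci_kept, pct_loci_dml)
    print(f"mincov = {mincov}: {pct_loci_kept:.1f}% of mincov = 1 loci kept, "
          f"{pct_loci_dml:.2f}% of loci are DMLs")
```

By that arithmetic, roughly 60% of the mincov = 1 loci remain at mincov = 3 and about 45% at mincov = 5. The share of loci called as DMLs also drops, from about 0.44% to 0.21% to 0.16%.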

One thing that was concerning about the pipeline is that I kept getting this warning:

glm.fit: fitted probabilities numerically 0 or 1 occurred
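R reports this message as a warning from glm.fit. It generally means a logistic regression fit pushed some fitted probabilities to essentially 0 or 1. One plausible cause, assuming the message comes from the per-locus logistic regression that methylKit fits, is a locus that is completely unmethylated in one group and completely methylated in the other. Low-coverage loci make that more likely. The Python sketch below is a toy Newton's method fit on invented counts, not R's glm.fit, and `fit_logistic` and `fitted` are hypothetical names. It shows the fitted probabilities running off to 0 and 1 in the separated case.

```python
import math


def fit_logistic(group, meth, total, iterations=25):
    """Fit logit(p) = b0 + b1 * group by Newton's method on binomial counts.

    The cap of 25 iterations matches R's default glm iteration limit, but this
    is a toy re-implementation for illustration only.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, m, n in zip(group, meth, total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = m - n * p            # observed minus expected methylated reads
            weight = n * p * (1.0 - p)   # binomial variance term
            g0 += resid
            g1 += resid * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:                   # information matrix is singular, stop
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fitted(b0, b1, x):
    """Fitted methylation probability for group x (0 = control, 1 = treatment)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


group = [0, 0, 0, 1, 1, 1]
total = [3, 3, 3, 3, 3, 3]             # invented low-coverage locus
separated = [0, 0, 0, 3, 3, 3]         # 0% methylated vs 100% methylated
mixed = [1, 0, 1, 2, 3, 2]             # some overlap between groups

for label, meth in (("separated", separated), ("mixed", mixed)):
    b0, b1 = fit_logistic(group, meth, total)
    print(f"{label}: control p = {fitted(b0, b1, 0):.3g}, "
          f"treatment p = {fitted(b0, b1, 1):.3g}")
```

In the mixed case the fit settles on the observed proportions. In the separated case there is no finite best estimate, so the coefficients keep growing until the iteration cap. If that is what is happening here, the affected loci would be worth a look before trusting their p-values.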

My samples

I went through the bismark pipeline in my Jupyter notebook to get my deduplicated and sorted files. Initially I tried using bismark_methylation_extractor, but I was unable to extract methylation data for all files before genefish ran out of space (again…RIP). I moved all my large files to gannet and decided it probably wasn’t worth extracting the methylation data from genefish since I already have the pipeline running on Mox. If I have some downtime, I can always change the code so I’m running bismark_methylation_extractor from gannet.

All output from methylKit testing for my samples can be found here.

Figures 13-15. Percent CpG coverage for all samples using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 16-18. Percent CpG methylation for all samples using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 19-21. Full sample CpG methylation clustering using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

Figures 22-24. PCA of full sample methylation using a) mincov = 1 b) mincov = 3 or c) mincov = 5.

from the responsible grad student https://ift.tt/2PBSgCW