Quantifying proportion of overlaps
Now that I know the DMLs and CGs intersect with various feature tracks, I need to quantify the overlap proportions. I answered the following questions in this Jupyter notebook:
- What proportion of total overlaps does a certain feature track represent?
- Out of the total number of CG motifs, how many overlapped with a feature track?
- Out of the total number of DMLs, how many overlapped with a feature track?
It was simple enough to conceptualize and carry out! I just needed to count the number of DMLs or CG motifs, then count the number of overlaps before summing and dividing in various ways. See my previous notebook post for code that counts the lines in BED files and overlaps.
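As a rough sketch of that counting and dividing (not the exact notebook code), the Python below assumes each locus or overlap sits on its own line of a BED file. The helper names `count_lines` and `overlap_proportions` are hypothetical, made up for illustration. For each feature track, it returns the track's share of all overlaps (the first question) and the share of all CG motifs or DMLs that overlap the track (the second and third questions).

```python
def count_lines(path):
    """Count non-empty lines in a BED-like file (assumes one locus or overlap per line)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def overlap_proportions(overlap_counts, total_loci):
    """Return {feature: (share of all overlaps, share of all loci)}.

    overlap_counts: dict mapping a feature track name to its number of overlaps.
    total_loci: total number of CG motifs or DMLs that were intersected.
    """
    total_overlaps = sum(overlap_counts.values())
    if total_overlaps == 0 or total_loci == 0:
        raise ValueError("need at least one overlap and one locus to compute proportions")
    return {
        feature: (count / total_overlaps, count / total_loci)
        for feature, count in overlap_counts.items()
    }
```

With made-up counts (not my real numbers), `overlap_proportions({"exon": 6, "intron": 3, "mRNA": 1}, 20)` gives exons 60% of all overlaps and 30% of all loci. In practice each count would come from calling `count_lines` on the matching overlap file.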
Here’s the answer to the first question:
- Proportion exon overlap with total CG overlaps: 67.55% (0.6754709569888477)
- Proportion intron overlap with total CG overlaps: 26.06% (0.2606253947864305)
- Proportion mRNA overlap with total CG overlaps: 6.39% (0.06390364822472172)
Most of our CG overlaps come from exons, which code for proteins. Interesting!
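A quick sanity check on those three numbers: since every CG overlap is assigned to one of the three tracks, the proportions should sum to 1. The snippet below is just a small sketch that uses the values reported above. The dictionary name `cg_overlap_share` is my own label for illustration.

```python
import math

# Proportions of total CG overlaps per feature track, copied from the results above
cg_overlap_share = {
    "exon": 0.6754709569888477,
    "intron": 0.2606253947864305,
    "mRNA": 0.06390364822472172,
}

# Print each proportion as a percentage with two decimal places
for feature, share in cg_overlap_share.items():
    print(f"{feature}: {share:.2%}")

# The three shares should account for all CG overlaps (allowing for float rounding)
total_share = sum(cg_overlap_share.values())
print("sums to 1:", math.isclose(total_share, 1.0))
```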
Here are the answers to the second and third questions:
| Feature | % CG Overlaps | % DML Overlaps |
Those numbers are different, so there’s definitely some sort of enrichment going on! Maybe when I figure out that gene set enrichment everything will make sense.