Grace’s Notebook: 2015 Oysterseed Skyline questions

Today I tried to figure out why my error rates from last week were so awful. I redid the Skyline DIA protocol many times and realized that some of my samples had only a few files listed in the protein, while others had all four. I asked Yaamini for clarification, and Emma told me to make sure that I have four tabs in Skyline. Each tab represents a sample, and each sample has four files associated with it.

Here’s the post where Emma shared new info with me: here.
Here’s the post where I figured out how to make 4 tabs: here.

Here is the GitHub issue detailing what I realized today: #290

Will revisit this issue when I’m back at SAFS on Tuesday. Tomorrow and Thursday I’m volunteering at Highline StormFest in Des Moines, WA, where I and other volunteers will teach 6th graders about stormwater and pollution!

from Grace’s Lab Notebook

DML Analysis: Possible Gene Enrichment Methods

topGO and GOstats look promising. This blog has some code for how to use these packages, but it’s a little unclear. The overall consensus on this forum is that DAVID is heavily used, but should no longer be used since it hasn’t been updated in several years. It would also be nice to have something in a script instead of a website.

DML Analysis: Notes

The mRNA track includes exons and introns. The exon track just has exons. See the issue.

Next steps in analysis (from this issue):

  • Proportion CGs in exons, introns, and mRNA
  • Find DMLs and CG motifs within 1000 bp of mRNA, possibly using -iobuf with intersectBed
  • Enrichment analysis of mRNA genomic regions. Need to find specific package?

Yaamini’s Notebook: DML Analysis

Finding overlaps between DMLs and other things

In this Jupyter notebook, I used intersectBed from bedtools to find overlaps between DMLs I identified in methylKit! I found the Genome Feature Tracks in the Genomic Resources wiki page. There were four feature tracks I could use:

  1. Exons: The coding regions
  2. Introns: Regions that are removed
  3. mRNA: The things that make proteins!
  4. CG motifs: Regions with CGs where methylation can occur

I used the following command to count the number of overlaps between my DMLs and the various feature tracks:

 # -u just reports the fact that one overlap was found in a region
 # -a is the path to the DML bed file; -b {} is the path to a feature track
 # wc -l counts the lines
 ! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
   -u \
   -a ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed \
   -b {} \
   | wc -l
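One way to run that count for every feature track from Python is sketched below. It is only a sketch: the track file names in `tracks` are hypothetical placeholders (the real tracks come from the Genomic Resources wiki page), the helper names are made up, and the loop only runs if the bedtools binary is present at the path used above.

```python
import subprocess
from pathlib import Path

# Paths taken from the command above
BEDTOOLS = "/Users/Shared/bioinformatics/bedtools2/bin/intersectBed"
DML_BED = "../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed"


def count_command(track_path: str) -> list:
    """Build the intersectBed call: -u reports each DML once if it overlaps anything in the track."""
    return [BEDTOOLS, "-u", "-a", DML_BED, "-b", track_path]


def count_overlaps(track_path: str) -> int:
    """Run intersectBed and count the output lines (the Python equivalent of `| wc -l`)."""
    result = subprocess.run(
        count_command(track_path), capture_output=True, text=True, check=True
    )
    return len(result.stdout.splitlines())


# Hypothetical file names, assumed for illustration only
tracks = {
    "exon": "exon-track.gff",
    "intron": "intron-track.gff",
    "mRNA": "mRNA-track.gff",
    "CG-motif": "CG-motif-track.gff",
}

if Path(BEDTOOLS).exists():
    for name, path in tracks.items():
        print(name, count_overlaps(path))
```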

I then used similar code to write out the overlaps to a .txt document:

 # -wb writes the entry information from File B
 # -a is the path to the DML bed file; -b {} is the path to a feature track
 # > creates the new .txt file
 ! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
   -wb \
   -a ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed \
   -b {} \
   > 2018-06-11-DML-{}.txt
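The two commands do not count the same thing: with -u each DML is reported at most once, while -wb writes one line for every DML and feature pair that overlaps. The toy Python sketch below mimics both behaviours on made-up coordinates. It is not bedtools itself, just an illustration of why the line counts from the two outputs can differ.

```python
def overlaps(a, b):
    """True if two BED-style intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


# Hypothetical DMLs (file A) and features (file B), for illustration only
dmls = [("scaffold1", 100, 101), ("scaffold1", 500, 501)]
features = [
    ("scaffold1", 50, 150),   # overlaps the first DML
    ("scaffold1", 90, 120),   # also overlaps the first DML
    ("scaffold1", 400, 450),  # overlaps neither DML
]

# Like -u: each DML is listed once if it overlaps any feature
unique_hits = [d for d in dmls if any(overlaps(d, f) for f in features)]

# Like -wb: one record per overlapping (DML, feature) pair
pair_hits = [(d, f) for d in dmls for f in features if overlaps(d, f)]

print(len(unique_hits), "DMLs with at least one overlap")
print(len(pair_hits), "DML-feature pairs")
```

In this toy case one DML sits inside two overlapping features, so the pair count is larger than the number of DMLs with a hit.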

I also found and counted overlaps between CG motifs and exons, introns, and mRNAs! The reason I did this was because if most of the CG motifs are in the exon regions, then we’d expect to find most of the methylation in the exons. All of my .txt files that weren’t too big for Github can be found in this folder.
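Once the CG motif overlaps are counted, turning them into proportions is a one-liner per track. The sketch below uses invented counts (none of these numbers come from my data) and a made-up helper name, just to show the arithmetic.

```python
def proportions(counts: dict, total: int) -> dict:
    """Divide each overlap count by the total number of CG motifs."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: count / total for name, count in counts.items()}


# Invented numbers, for illustration only
total_cg_motifs = 1000
cg_overlaps = {"exon": 400, "intron": 350, "mRNA": 750}

props = proportions(cg_overlaps, total_cg_motifs)
for name, value in props.items():
    print(f"{name}: {value:.1%} of CG motifs")
```

Because the mRNA track contains both exons and introns, these proportions are not expected to sum to 1.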

One thing I noticed is that I found more overlaps than Steven did for all categories. My guess is that the default score_min option I used led to more overlaps. It will be interesting to look into the differences. Another important thing to mention is that there are more exon overlaps than mRNA overlaps. This does not make sense to me, since exons are what code for mRNA. I’ll post an issue with this question.

I think my next step has to do with gene ontology, but I’m not entirely sure what that entails. Guess I’ll post another issue.

from the responsible grad student