Gene enrichment for mRNA overlaps
Back in analysis mode! I decided to tackle a gene enrichment for any feature files that overlapped with mRNA coding regions. I only chose files with mRNA coding region overlaps for two reasons: learning which coding regions are enriched is most interesting to me, and I needed Genbank IDs to match overlap results with my blastx output.
I did a gene enrichment before, but that analysis used the wrong background. For this analysis, I used a previously generated blastx output and the gene background from methylKit. I merged documents and isolated Uniprot codes in this R Markdown file.
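The merge-and-isolate step can be sketched roughly as follows. This is a Python illustration, not the R Markdown code itself. It assumes the blastx subject IDs look like `sp|ACCESSION|ENTRY`; the IDs, accessions, and function names are all made up for the example.

```python
# Hypothetical example data; the real files and column layout may differ.
overlaps = [
    # (Genbank ID of the mRNA, ID of the overlapping feature)
    ("XM_000001.1", "DMR_1"),
    ("XM_000002.1", "DMR_2"),
    ("XM_000003.1", "DMR_3"),  # no blastx hit, so it drops out of the gene list
]

blastx_hits = [
    # (query Genbank ID, subject ID in the assumed "sp|ACCESSION|ENTRY" form)
    ("XM_000001.1", "sp|ACC001|GENEA_HUMAN"),
    ("XM_000002.1", "sp|ACC002|GENEB_MOUSE"),
]


def uniprot_accession(subject_id):
    """Return the accession from an 'sp|ACCESSION|ENTRY' string, else None."""
    parts = subject_id.split("|")
    return parts[1] if len(parts) == 3 and parts[1] else None


def accessions_for(overlaps, blastx_hits):
    """Join overlaps to blastx hits on Genbank ID and collect unique accessions."""
    lookup = {query: uniprot_accession(subject) for query, subject in blastx_hits}
    found = {lookup[genbank] for genbank, _ in overlaps if lookup.get(genbank)}
    return sorted(found)


gene_list = accessions_for(overlaps, blastx_hits)
print(gene_list)
```

Running the same function on the full blastx table, in place of the overlap subset, would give the background list, provided the background is every annotated gene that methylKit had data for.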
I followed the instructions from my previous gene enrichment using DAVID. I downloaded the functional annotation table, functional annotation clustering, and GOterm information (biological processes, cellular components, and molecular functions) from DAVID for each analysis. The output from all my analyses can be found in this master folder. I’ve gone into detail about some of the GOterm results below, but there were barely any significantly enriched terms after correcting for multiple comparisons. There are a handful of enriched GOterms that are significant without correction that could be interesting to describe.
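To show why so few terms survive correction, here is a minimal Python sketch of the Benjamini-Hochberg procedure, one common way to control the false discovery rate. It does not reimplement DAVID; DAVID reports its own corrected values, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five GOterms.
raw = [0.001, 0.02, 0.03, 0.04, 0.5]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.3f}  adjusted={q:.3f}")
```

In this toy input, four terms have raw p-values below 0.05, but only the smallest stays clearly below that threshold after adjustment. That mirrors the pattern in the results below.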
Based on corrected p-values, there was only one significantly enriched GOterm for DMR-mRNA overlaps: cilium morphogenesis! That could be interesting, since impacts on cilia could affect cellular structure. The only other GOterm with an FDR below 10% was cellular projection organization, which may also be involved with cilia, flagella, and sperm motility.
There were no significantly enriched GOterms when I looked at DMLs instead of DMRs. For cellular components, cytoplasm had an FDR below 10%. The molecular function ubiquitin-protein transferase activity also had an FDR below 10%.
I previously conducted a flanking analysis to identify 100 bp flanks upstream and downstream of mRNA coding regions. I then intersected these flanks with DMRs and DMLs. Understanding what processes are enriched in the flanking regions can provide insight into regulatory mechanisms.
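Here is a minimal Python sketch of building 100 bp flanks and intersecting them with features. It assumes 0-based, half-open coordinates (BED-style) and that "upstream" is interpreted relative to strand; the original flanking analysis may have handled either point differently. All coordinates and function names are hypothetical.

```python
def flanks(start, end, strand, size=100):
    """Return (upstream, downstream) flanks as half-open (start, end) tuples.

    Upstream is strand-aware: before the start on '+', after the end on '-'.
    The left flank is clipped at 0; the right flank is not clipped to the
    chromosome length, which a real analysis would need to handle.
    """
    left = (max(0, start - size), start)
    right = (end, end + size)
    return (left, right) if strand == "+" else (right, left)


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical mRNA on the '+' strand and two hypothetical DMRs.
upstream, downstream = flanks(1000, 2000, "+")
dmrs = [(950, 980), (5000, 5100)]

upstream_hits = [d for d in dmrs if overlaps(upstream, d)]
downstream_hits = [d for d in dmrs if overlaps(downstream, d)]
print(upstream, downstream, upstream_hits, downstream_hits)
```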
There were no significantly enriched GOterms for the intersection of upstream flanks with DMRs or DMLs after correcting for multiple comparisons.
There were no significantly enriched GOterms for the intersection of downstream flanks with DMRs or DMLs after correcting for multiple comparisons.
When I conducted my flanking analysis, I also identified the closest non-overlapping DMR and DML to each mRNA coding region. Again, understanding what processes are enriched in these non-overlapping elements may provide information about regulatory mechanisms or related gene functions. I used this file for DMRs and this file for DMLs.
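The idea of a "closest non-overlapping" feature can be sketched in Python as below. It assumes half-open coordinates on a single chromosome and ignores strand. The distance convention (touching intervals count as distance 0) is a choice made for this example and may not match the tool used in the original analysis.

```python
def closest_non_overlapping(gene, features):
    """Return (feature, distance) for the nearest feature that does not
    overlap the gene, or None if every feature overlaps or the list is empty."""
    best = None
    for feature in features:
        if feature[0] < gene[1] and gene[0] < feature[1]:
            continue  # skip features that overlap the gene
        if feature[0] >= gene[1]:
            distance = feature[0] - gene[1]  # feature lies after the gene
        else:
            distance = gene[0] - feature[1]  # feature lies before the gene
        if best is None or distance < best[1]:
            best = (feature, distance)
    return best


# Hypothetical mRNA coding region and DMRs.
gene = (1000, 2000)
dmrs = [(1500, 1600), (2300, 2400), (700, 900)]
nearest = closest_non_overlapping(gene, dmrs)
print(nearest)
```

The DMR at (1500, 1600) is skipped because it sits inside the gene, so the nearest non-overlapping one is (700, 900), at a distance of 100 bp.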
There were also no significantly enriched GOterms for the closest non-overlapping DMRs or DMLs.
- Determine if this is the best gene enrichment approach
- Find a way to do a gene enrichment with exon, intron, and transposable element overlaps
- Describe functions of the most interesting genes with DMLs and DMRs