It’s not enough to understand where DMLs or CG motifs intersect with exons, introns and mRNAs! Regions upstream or downstream of mRNAs may contain promoters or transcription factor binding sites, and these areas could also be regulated by methylation. To explore this, I identified `flank` in `bedtools` in this issue as a method to extract the 1000 bp flanking the beginning and end of each mRNA region.
Before I could do this successfully, I had to create a “genome file”. For `flankBed`, the genome just needed to be a tab-delimited file with the chromosome name, start and stop positions. I used NCBI information to create the “genome file” in TextWrangler.
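The same kind of file could also be written from the notebook instead of by hand. This is a minimal sketch, assuming the two-column layout (chromosome name, chromosome length) that the bedtools documentation describes for genome files. The helper `write_genome_file`, the output file name, and the chromosome names and lengths are placeholders for illustration, not the real C. virginica values.

```python
import os
import tempfile


def write_genome_file(chrom_lengths, path):
    """Write one 'name<TAB>length' line per chromosome and return the lines."""
    lines = [f"{name}\t{length}" for name, length in chrom_lengths.items()]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


# Placeholder chromosome names and lengths (assumptions, not NCBI values).
example_lengths = {"chrom_A": 5000, "chrom_B": 12000}

# Hypothetical output location in the system temp directory.
genome_path = os.path.join(tempfile.gettempdir(), "example-genome-file.txt")
genome_lines = write_genome_file(example_lengths, genome_path)
print(genome_lines)
```

Writing the file from code keeps the tab delimiters exact, which is easy to get wrong in a text editor.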
I used the following code in my Jupyter notebook to generate the flanks:
! /Users/Shared/bioinformatics/bedtools2/bin/flankBed \
-i C_virginica-3.0_Gnomon_mRNA.gff3 \
-g 2018-06-15-bedtools-Chromosome-Lengths.txt \
-b 1000 \
> 2018-06-15-mRNA-1000bp-Flanks.bed
When I first ran the code over the weekend, I couldn’t get any output. Once I restarted Jupyter and reran the code, it worked!
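To make the flanking step concrete, here is a rough Python sketch of the idea behind `-b 1000`: for each interval, take up to 1000 bp on either side and clip at the chromosome ends. It is an illustration only, not how bedtools is implemented. It assumes 0-based, half-open BED-style coordinates, it drops zero-length flanks (my assumption), and the function name, chromosome name, length, and mRNA coordinates are all made up.

```python
def flank_intervals(intervals, chrom_lengths, flank_bp):
    """Return left and right flanks for (chrom, start, end) intervals.

    Coordinates are 0-based and half-open. Flanks are clipped to the
    chromosome bounds, and zero-length flanks are dropped.
    """
    flanks = []
    for chrom, start, end in intervals:
        chrom_len = chrom_lengths[chrom]
        left = (chrom, max(0, start - flank_bp), start)
        right = (chrom, end, min(chrom_len, end + flank_bp))
        for name, f_start, f_end in (left, right):
            if f_end > f_start:
                flanks.append((name, f_start, f_end))
    return flanks


# Made-up chromosome length and mRNA coordinates for illustration.
chrom_lengths = {"chrom_A": 5000}
mrna = [("chrom_A", 1500, 2500), ("chrom_A", 300, 4800)]

flanks = flank_intervals(mrna, chrom_lengths, 1000)
for row in flanks:
    print(*row, sep="\t")
```

The second mRNA sits close to both chromosome ends, so its flanks come out shorter than 1000 bp. That clipping is why the flanking step needs chromosome lengths from the genome file.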
I took my `flankBed` output and ran it through `intersectBed` to find overlaps between the output, DMLs, and CG motifs.
! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
-wb \
-a 2018-06-15-mRNA-1000bp-Flanks.bed \
-b ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed \
> 2018-06-19-mRNA-100bp-Flanks-DML.txt
! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
-wb \
-a 2018-06-15-mRNA-1000bp-Flanks.bed \
-b C_virginica-3.0_CG-motif.bed \
> 2018-06-19-mRNA-100bp-Flanks-CGmotif.txt
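For anyone wondering what `-wb` reports, here is a small Python sketch of the idea: for every pair of overlapping A and B intervals, keep the overlapping portion of the A interval followed by the original B entry. This is a simplified illustration under my own assumptions (0-based, half-open coordinates, a naive pairwise loop), not the bedtools implementation. The function name and all coordinates are invented.

```python
def intersect_wb(a_intervals, b_intervals):
    """Return (overlap within A, original B) rows for overlapping pairs.

    Intervals are (chrom, start, end) tuples with 0-based, half-open
    coordinates. Every A interval is compared against every B interval.
    """
    rows = []
    for a_chrom, a_start, a_end in a_intervals:
        for b_chrom, b_start, b_end in b_intervals:
            if a_chrom != b_chrom:
                continue
            ov_start = max(a_start, b_start)
            ov_end = min(a_end, b_end)
            if ov_start < ov_end:
                rows.append((a_chrom, ov_start, ov_end, b_chrom, b_start, b_end))
    return rows


# Invented flank and DML coordinates for illustration.
example_flanks = [("chrom_A", 500, 1500), ("chrom_A", 2500, 3500)]
example_dmls = [
    ("chrom_A", 1200, 1202),  # inside the first flank
    ("chrom_A", 2000, 2002),  # between the flanks, so no overlap
    ("chrom_B", 600, 602),    # different chromosome, so no overlap
]

overlaps = intersect_wb(example_flanks, example_dmls)
for row in overlaps:
    print(*row, sep="\t")
```

The pairwise loop is fine for a toy example but would be far too slow for something the size of the CG motif file, which is a good reason to leave the real work to `intersectBed`.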
You can find the flank-DML output on GitHub, but the CG motif output was too big to upload there. I’ll upload that to OWL shortly.
That’s a wrap on the gonad methylation stuff until I’m back in town!