Small tasks for future analyses
Since I don’t have much analysis to do right now, I decided to tackle two smaller tasks that will inform how we proceed.
Union bedgraph code
Steven generated union bedgraphs with 5x data for M. capitata and P. acuta. Hollie then imported the union files into R, calculated the average percent methylation at each locus for each method, and used those averages to generate Circos plots in this script. At our meeting last week, Hollie asked me and Mac to go over the code to get another pair of eyes on it. I went through the Jupyter notebooks and got acquainted with unionbedg. That code seemed to be working fine, so I went through the portions of Hollie’s R script where she imports the union files and calculates averages. As a sanity check, I looked at the range of methylation values at each step to make sure nothing exceeded 100% (or 1, where values were on a proportion scale). I didn’t see any outlier values! I also confirmed that the correct files and columns were used to distinguish between the different methods. Unless I missed something or there is something at play before this script, it looks good to me! I posted my recap in this issue.
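The range check described above can be sketched roughly as follows. This is a Python illustration, not Hollie’s R code: the column names, the method labels (`methodA`, `methodB`), the `N/A` filler, and the example values are all assumptions made up for the sketch.

```python
import csv
import io
from statistics import mean

# Hypothetical union bedgraph with a header row: three coordinate columns
# followed by one percent-methylation column per sample. "N/A" is an
# assumed filler for loci without data in a given sample.
EXAMPLE = (
    "chrom\tstart\tend\tmethodA_1\tmethodA_2\tmethodB_1\n"
    "scaffold1\t100\t102\t80.0\t90.0\t75.0\n"
    "scaffold1\t250\t252\tN/A\t10.0\t0.0\n"
    "scaffold2\t40\t42\t100.0\t100.0\tN/A\n"
)


def method_averages(handle, methods, filler="N/A"):
    """Return one dict per locus with the mean percent methylation per method."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {
            "chrom": record["chrom"],
            "start": int(record["start"]),
            "end": int(record["end"]),
        }
        for method in methods:
            # Collect this method's sample columns, skipping filler values.
            values = [
                float(value)
                for column, value in record.items()
                if column.startswith(method + "_") and value != filler
            ]
            row[method] = mean(values) if values else None
        rows.append(row)
    return rows


def out_of_range(rows, methods, low=0.0, high=100.0):
    """Return loci where any method average falls outside [low, high]."""
    return [
        row
        for row in rows
        if any(
            row[m] is not None and not (low <= row[m] <= high) for m in methods
        )
    ]


methods = ["methodA", "methodB"]
rows = method_averages(io.StringIO(EXAMPLE), methods)
flagged = out_of_range(rows, methods)
print(f"{len(flagged)} of {len(rows)} loci have out-of-range averages")
```

If the values were stored as proportions instead of percentages, the same check would apply with `high=1.0`.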
Earlier, I found out that neither the M. capitata nor the P. acuta genome annotation included UTRs. Hollie suggested I look at other coral methylation papers that characterized exon methylation to see how they defined exons. I looked at the Aiptasia and Stylophora pistillata genome feature files used by Li et al. 2018 and Liew et al. 2018, respectively. Looking at the GFFs, I saw that 1) there were distinct exon and CDS annotations in both genomes and 2) the exon track included UTR information.
Figures 1-2. Aiptasia and S. pistillata GFFs with exon and CDS information. The purple arrows and boxes indicate exons and respective CDS.
The authors who created these genomes are from the same institution, so I’m not surprised that both genomes have similar annotations and annotation criteria. I posted my quick recap in this issue. I don’t really know how we’ll be able to compare our findings to these papers if we’re unable to get UTR information for exons, but I guess we can always treat our CDS information like exons and look at CDS-intron boundaries.
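One way to treat CDS records like exons is to sort each transcript’s CDS segments and take the gaps between them as introns. Here is a minimal Python sketch of that idea, assuming GFF3-style lines in which a `Parent` attribute ties each CDS to its transcript. The example records, IDs, and function names are made up for illustration and are not taken from either genome’s annotation.

```python
from collections import defaultdict

# Made-up GFF3-style records (tab-separated, 1-based inclusive coordinates).
EXAMPLE_GFF = (
    "scaffold1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1\n"
    "scaffold1\tsrc\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=tx1\n"
    "scaffold1\tsrc\tCDS\t400\t500\t.\t+\t1\tID=cds1;Parent=tx1\n"
    "scaffold1\tsrc\tCDS\t800\t900\t.\t+\t2\tID=cds1;Parent=tx1\n"
)


def parse_attributes(field):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    return dict(pair.split("=", 1) for pair in field.split(";") if "=" in pair)


def cds_by_parent(lines):
    """Group CDS intervals by (sequence ID, parent transcript)."""
    cds = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        parent = parse_attributes(cols[8]).get("Parent")
        if parent is not None:
            cds[(cols[0], parent)].append((int(cols[3]), int(cols[4])))
    return cds


def introns_between_cds(segments):
    """Return the gaps between consecutive CDS segments as intron intervals."""
    ordered = sorted(segments)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # ignore adjacent or overlapping segments
            introns.append((prev_end + 1, next_start - 1))
    return introns


cds = cds_by_parent(EXAMPLE_GFF.splitlines())
introns = {key: introns_between_cds(segs) for key, segs in cds.items()}
for (seqid, transcript), gaps in introns.items():
    print(seqid, transcript, gaps)
```

Because only CDS records are used, the first and last segments of each transcript stop at the start and stop codons, so any comparison with exon-based results from annotations that include UTRs would need to keep that difference in mind.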
- Figure out how to meaningfully concatenate data for each method
- Generate repeat tracks for each species
- Rerun the pipeline with full samples once pan-genome output is assessed and find a way to generate tables programmatically
- Create figures for CpG characterization in various genome features
- Update code for methylation frequency distribution figure
- Figure out methylation island analysis