Preliminary follow-up GLMs
Previously, I tried using NMDS and ANOSIM to understand if sequencing method influenced differences in proportions of CpGs in various methylation statuses or genomic locations. I’m still waiting on a response to some questions I had, but in the meantime I’ll try using generalized linear models to get at some of these differences. Since we have small sample sizes, using a combination of approaches gives us more power. Specifically I will be testing if sequencing method or genomic location influences the proportion of CpGs in a particular methylation status.
In this script, I started creating my models. Evan suggested I use
glmmTMB to run beta family models with a logit link. Since the proportions of highly, moderately, and lowly methylated CpGs add up to 1, I need to run separate models:
McapCpGHighModel <- glmmTMB(percentMeth ~ CDS + Introns + Upstream.Flanks + Downstream.Flanks + Intergenic + seqMethod, family = beta_family(link = "logit"), data = McapCpGMasterHigh) #Run the full model (all genomic locations) using a beta distribution and a logit link
When I ran the full model, I got a convergence issue error. Digging into
glmmTMB vignettes, it suggested that my model could be overparametrized. This made sense, since I have minimal data and five model parameters. I decided to run two models: one for gene features and another for non-gene features. First, I ran a null model (without sequencing method) and a full model (with sequencing method):
McapCpGHighModel.GeneNull <- glmmTMB(percentMeth ~ CDS + Introns, family = beta_family(link = "logit"), data = McapCpGMasterHigh) #Run null model (without sequencing method) using a beta distribution and a logit link McapCpGHighModel.Gene <- glmmTMB(percentMeth ~ CDS + Introns + seqMethod, family = beta_family(link = "logit"), data = McapCpGMasterHigh) #Run model using a beta distribution and a logit link
Then, I compared these models with a likelihood ratio test:
McapCpGHighModel.GeneANOVA <- anova(McapCpGHighModel.Gene, McapCpGHighModel.GeneNull, test = "LRT") #Compare the null and treatment models with a likelihood ratio test. McapCpGHighModel.GeneANOVA #Look at ANOVA output
Finally, I looked at the summary output for the best model using
summary. Usualy, the best-fit model was one that included the sequencing method. I’m not sure how (or if) to compare the gene and non-gene models, but AIC might be useful in that respect.
Still have questions, but at least it’s something to discuss at our meeting!
- Finalize analysis
- Locate TE tracks
- [Characterize intersections between data and TE, and create summary tables]
- Look into program for mCpG identification