Revised WGCNA with GO-MWU analysis

https://github.com/eimd-2019/project-EWD-transcriptomics/blob/master/analyses/WGCNA/WGCNA.md

Yaamini’s Notebook: MethCompare Part 24

Another (and hopefully final) iteration of union bedgraph analyses

Last night, Shelly pointed out that there were different versions of union begraphs in this issue. Steven originally created union bedgraphs, but in one of our meetings we realized that there were duplicate entries in each file. Shelly removed those duplicates and saved another version of files. I knew about these versions, but only used the older version with duplicates for my analysis! Even though my bedtools code was written such that overlaps with genome features are only listed once, the presence of duplicates would affect the proportions of highly methylated, moderately methylated, and lowly methylated loci. I changed the file path I used to source the union bedfiles in this Jupyter notebook and reran the script. Then, I took that output and ran it through this R Markdown script to generate my figures and count information. As expected, no drastic changes to the methylation status stacked barplots and no change at all to the genomic location barplots. I updated the figures in the manuscript and started working on the results. Now that we have the correct input and all the code is (hopefully) refined, perhaps I won’t have to run this again in the future…

Going forward

  1. Look into program for mCpG identification

Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2ZTcgay
via IFTTT

Yaamini’s Notebook: MethCompare Part 23

Calculating median percent methylation

We’re wrapping up our analyses. Hollie is looking at gene methylation by method and asked me to calculate median methylation by gene for each sample, since I did that before for C. virginica. I set out to do just that in this script. I started by testing code on M. capitata samples, then used the finalized code for P. acuta as well.

Looking at my files on gannet, I realized I never transferred the intermediate genonme feature overlap files! I started rerunning this Jupyter notebook to get the gene-sample overlap files. Even after I generated those files, I was missing percent methylation information. I decided to include percent methylation information in the intermediate BEDfiles before using intersectBED. After using intersectBed, I had chromosome, start, stop, and percent methylation information for each sample.

I returned to my R Markdown script to import the overlap files. Again, I realized I was missing something! I needed to associate CpG loci with genes. I used intersectBed to match CpGs with genes:

#For each sample-gene overlap file (ends in *bedgraph.bed-mcGenes) #Use intersectBed to find where loci and genes intersect, allowing loci to be mapped to genes #wb: Print all lines in the second file #a: sample-gene overlap file #b: gene track #Save output in a new file that has the same base name and ends in -geneID for f in ../analyses/Characterizing-CpG-Methylation-5x/Mcap/*bedgraph.bed-mcGenes do /usr/local/bin/intersectBed \ -wb \ -a ${f} \ -b ../genome-feature-files/Mcap.GFFannotation.gene.gff \ > ${f}-geneID done 

I then took the files and moved them into a Median-Methylation-Calculations subdirectory. Next, I needed to import the nine files for each species into R. I used code I learned about when working with C. virginica to import all the files at once:

setwd("../analyses/Characterizing-CpG-Methylation-5x/Mcap/Median-Methylation-Calculations") #Set working directory within the notebook chunk for list.files to find the necessary files filesToImport <- list.files(pattern = "*geneID") #Create a file list for all 9 files to import. Only import overlaps for full samples (not divided by methylation status) list2env(lapply(setNames(filesToImport, make.names(gsub("_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcGenes-geneID", "", filesToImport))), read.delim, header = FALSE), envir = .GlobalEnv) #Import files to the .GlobalEnv with list2env. Use lapply to setNames of the files by taking all the common parts of their names out. Read files with read.delim and include header = FALSE. Files will be named Meth# head(Meth10) #Confirm import 

My next steps were simple: remove redundant columns, add column names, and use aggregate to calculate median percent methylation for each gene ID. That would amount to three lines of code for each sample, but I didn’t want to write an individual block of code for each sample! Based on this Stack Overflow post, I created a vector of dataframes, then pulled the dataframes out of the vector in a loop to format the files:

samplesMcap <- c("Meth10", "Meth11", "Meth12", "Meth13", "Meth14", "Meth15", "Meth16", "Meth17", "Meth18") #Create a vector of sample names 
for(sample in samplesMcap) { #For each sample listed in samplesMcap sample.tmp <- get(sample) #Extract sample based on vector contents sample.tmp <- sample.tmp[,-c(5:12)] #Remove extraneous columns colnames(sample.tmp) <- c("chr", "start", "stop", "percentMeth", "geneID") #Rename columns assign(sample, sample.tmp) #Replace sample with edited sample.tmp contents } head(Meth17) #Confirm formatting changes 

Finally, I used similar loops to calculate median percent methylation by gene ID and write out tab-delimited files for each sample:

for(sample in samplesMcap) { #For each sample listed in samplesMcap sample.tmp <- get(sample) #Extract sample based on vector contents sample.tmp <- aggregate(percentMeth ~ geneID, data = sample.tmp, FUN = median) #Use aggregate to group geneID and calculate median percent methylation assign(sample, sample.tmp) #Replace sample with edited sample.tmp contents } head(Meth17) #Confirm median methylation calculation 
for (i in 1:length(samplesMcap)) { #For each sample listed in samplesMcap sample <- get(samplesMcap[i]) #Extract sample based on vector contents fileName <- paste("../analyses/Characterizing-CpG-Methylation-5x/Mcap/Median-Methylation-Calculations/", samplesMcap[i], "-Median-Methylation", ".txt", sep = "") #Assign filename for each sample write.table(sample, file = fileName, sep = "\t", row.names = FALSE, col.names = TRUE) #Write out files into the Median-Methylation-Calculations subdirectory } 

The trick of putting dataframes in a vector to run the same commands on them is super useful. Definitely using that in the future. I posted the links to the M. capitata and P. acuta subdirectories in this issue for Hollie to use.

Aside from calculating methylation, I also modified figures based on suggestions from Hollie and updated the methods section. Not much left to work on (hopefully)!

Going forward

  1. Update results
  2. Look into program for mCpG identification

Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2ZdwSLo
via IFTTT

WGCNA code

https://github.com/eimd-2019/project-EWD-transcriptomics/blob/master/analyses/WGCNA/WGCNA.Rmd

Yaamini’s Notebook: July 2020 Goals

EWeLmQlU4AET-ym

Labs are opening back up so I’m wrapping up my bioinformatics time and transitioning into pipetting time.

JUne Goals Recap:

Coral Methylation Comparison:

Other:

  • Made progress on side projects (ocean acidification and reproduction review and eelgrass transcriptomics)
  • Collaboratively wrote a proposal for a DEI-themed Bevan Series
  • Put together a talk for the Friday Harbor Labs Summer Virtual Seminars!

July Goals

Coral Methylation Comparison:

  • Update the methods and results in the paper
  • Add to the discussion points

Gigas and Virginica Labwork:

  • Review and update health and safety plan
  • Pitch plan to Katie and Alan
  • Create an action plan for samples
  • Take inventory of materials and start ordering what’s needed

Other:

  • Wrap up side projects (ocean acidification and reproduction review and eelgrass transcriptomics)

Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2NRvc3W
via IFTTT