Graphing Day 10 DMRs by treatment.

Update: Better labels on graphs added.

Edit: just realized that my graphs are poorly titled. They are graphing the percent methylation at DMR sites. Will update them as soon as I can.

Worked on graphing Day 10 DMRs by treatment. I first tried a plot with means by treatment for all DMRs in a single plot but that was difficult to read, so I settled on individual box plots by location as I felt this captured both the location as well as the variation in the data.

A few example plots:




Code, and a walkthrough of the process is available here

A PDF of the plots can be downloaded here

DMR analysis for Day 10 Geoduck samples

Spent Friday looking for differentially methylated regions in the Day 10 geoduck samples of Hollie’s.

We’re using the MACAU algorithm for DMR identification, and that requires four files to properly run.

  • Vector of predictor variables (in our case, OA treatment of Ambient/Low/Super Low)
  • n x n matrix of pairwise relatedness values for n samples
  • Methylation counts for samples, with columns being samples, and rows being site IDS: A note regarding MethylExtract output, the output has a vector of scaffolds and a vector of nucleotide positions, these need to be concatenated in to a single vector to conform to MACAUs requirement.
  • Total Coverage counts for samples. Has the same requirements as the methylation counts for single scaffold/position vector.

I had built the predictor file previously. Ambient samples were coded 1, Low pH samples were coded 2, and Super-low pH samples were coded 3. Sample information was found [here](

For the relatedness matrix we’re using VCFtools based on SNPs:

I was unable to get MethylExtract to operate on a group of samples properly, so I had to run each sample individually. This necessitated an additional step in merging all of the sample SNPs in to a single file. This is also done in VCFtools via the `vcd-merge` command, which requires bgzipped SNP files, with tabix index files. The code below creates the bgzip SNP files and the index files in tabix, then runs vcf-merge. This takes a while, ~ 25 minutes for the Day 10 samples.


After that, we feed the aggregated SNP file in to vcftools


which results in


VCFtools output is not quite appropriate for MACAU, as it’s a 3 x ((n-1)(n)/2) length vector so I need to reformat it in Excel to the n x n format we want. This is just a whole lot of copy and transpose paste. The result in excel looks like:


Notice there are no column or row headers. Meaning for values is derived by the order put in, so count files, predictor files, and relatedness files have to have the same order.

Methylation and Total Coverage count files were created simultaneously in R.

A quick walkthrough of the code below. First, I make a vector of samples I want to iterate over. Then I work on my first sample outside of the loop, to avoid any fencepost issues. I read the data in, turn it in to a data.table (an expanded data frame, which deals with very large files better), sum across both the Watson and Crick strands for methylation and total coverage, saving these to new vectors in my data table and then save the Scaffold, location, context, and sum of methylation/coverage to a new data table.

After I have my first post created, I iterate through all other samples, using the same approach. The end result are two data tables with Columns for Scaffold, Position, Context, and counts for each sample.


This isn’t quite the proper format for MACAU, plus it has a bunch of locations that don’t have a complete set of reads across all samples at the proper sequencing depth, so I find all of the samples we want to actually analyze below


I begin by creating a new data table that only has positions that have data for each sample. Then I further subset the data by only accepting sites that have 10x coverage across all samples. I then created a new data table with a concatenated Scaffold/Position column, and count data from each sample that meets our requirements. These files are saved as tsv files for input into MACAU

Running Macau!

I now have all the files I need to run the MACAU algorithm, which I do below. The output isn’t very exciting, but it’s done!


The results look like:


MACAU creates a binomial mixed model for each location, and tests if the coefficient on the predictor value, beta, is significant. In essence, does the predictor/treatment add meaningful/significant information to the model for the location, which I believe indicates that the predictor/treatment has a significant effect on the methylation pattern at that site.

With the MACAU results, I want to go through and find the locations are significantly affected by the OA treatment. I do that below. First I save all files with beta p-values less than 0.05, then I create a new data table following the .bed format of scaffold, location a, location b, and a value, I chose the beta value, but that was just an off the cuff decision. I then save this as a bed file for viewing in IGV.


I had some percent methylation bed graph files I had created earlier that I compared to the bed file I created above in IGV. I didn’t save screenshots from IGV, but will update this post with screenshots once I’m done downloading the files.

-80 Organization

Continued with sorting and organizing -80 contents. Here is where I’ll be uploading pictures of metal tray configurations and their contents. Will get more specific pictures for boxes of relatively unknown samples.

Lab Enhancements

Re-labeled some boxes in the -80 that were deemed worthy of saving. Also transferred any samples that were kept in plastic boxes into cardboard so that they would fit in metal trays. Placed the boxes in the metal trays. So far the top shelf of the left -80 has been sorted through. I snapped some pictures of the trays, but the organization will likely change as we continue to go through boxes.  I would like to organize within the metal trays based on date and/or animal type.

Would like to sort -20 drawer contents better, but I am not entirely sure what the chemicals are used for, so I may need some assistance for organizing them in a way that makes sense.

Lab enhancements

Turns out the old QR Code generator website requires a monthly fee after a free trial in order to keep the code active. So, I made a new QR code for the lab inventory. Less fancy than the old version, but should remain free and active.

Steven and I have continued -80 organization. Top shelf of left -80 has been sorted through. Next step will be to clearly relabel (if needed) boxes of saved samples and place in metal trays, documenting contents with photos and/or spreadsheet (will begin tomorrow).

-20 in Rm 209: labeled the plastic drawers/bins 1 through 10 and added the documentation as a tab to the lab inventory which can be accessed through the QR code (new one posted in labs 209 and 213) or via the short URL: Will go through more in depth and reorganize in the containers so that related kits/chemicals/samples are together.

Assembling supplies and labeling tubes for C. gigas collection for Manchester trip for Wednesday. Will grab final things such as dry ice and a few other things tomorrow.

Other projects: Just waiting on a dissecting scope to continue Oyster seed measurements.

Perplexing results from Geoduck analysis.

Notebook entry here:

R notebook:

Data used and outputs:

Test data set results for Geoduck methylation

Been working on Hollie’s geoduck data, and while Emu grinds through the samples, I took a test set of data and brought it through identifying DMRs. The notebook entry describes what input files to the MACAU algorithm need to look like, and links to the results from a 4 sample data set.!