This week we hear about breakfast do and don’ts, everyone’s goals, and get to listen to a LIVE paper submission!
I’ve been sitting on the Hawaii
bismark data for a while, so it’s time I actually look at it! I referred to the MethCompare repository to remind myself what Shelly did to evaluate
bismark output quality. I found this script and realized all I needed to do was 1) get
multiqc information for
bismark output, and 2) look at the data in IGV!
When I looked at my script, I realized I never ran
multiqc on the output! I navigated to where I stored the output on
gannet and ran
multiqc * to get the reports. From this command, I got a HTML report with relevant summary statistics, as well as individual text files with the raw data (found in this folder). Looking at the general statistics, sample alignment averaged at 60%, which feels pretty good! About 12% of reads per sample aligned ambiguously, and 26% did not align at all. I was more interested in duplicated read alignment. Since I’m comparing diploid and triploid oysters, sequence duplication seems like it would be a bigger issue if there were duplicates removed from triploid oysters. Percent duplicated reads removed ranged from 10-12% for each treatment combination (ploidy x pH), so there doesn’t seem to be a treatment effect (at least at first glance). I did notice that sample 7 had more reads than any other sample by a long shot.
Figure 1. Reads aligned for reach sample
My next step was looking at the coverage files in IGV! I created coverage files that merged CpG locations (
*merged_CpG_evidence.cov), filtered the data to 5x coverage, then created bedgraphs (
*5x.bedgraph). I wanted to look at these bedgraphs in IGV with respect to genome feature files.
The first thing I did was download the lateset version of IGV (v 2.9.2) from this link. I found the links to the C. gigas genome and genome feature files in the Genomic Resources wiki. I created this IGV file using the
gannet links, and colored samples based on treatment (light blue (1-6): 2N high pH; light red (7-12), 2N low pH; dark blue (13-18): 3N high pH; dark red (19-24): 3N low pH). From my quick browse, most methylation seems to be in the gene regions (makes sense because invertebrates), but I did find some instances of transposable element methylation.
Figure 2-3. IGV tracks
Now that I’ve looked at the data, my next steps are to load data into R to explore with
methylKit. While I won’t be able to conduct a full analysis, I can obtain sample methylataion histograms and PCAs. To identify DML, I’m going to look into
ramwas. Addtionally, Steven brought up that there’s a revised C. gigas genome! While I work through DML identification with this dataset, I should also repeat
bismark with the new genome.
- Align samples to the revised C. gigas genome
- Obtain preliminary methylation assessment from
- Test-run DSS and ramwas
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Transfer scripts used to a nextflow workflow
- Update methods
- Update results