Yaamini’s Notebook: WGBS Analysis Part 4

Looking at preliminary DML

I have DML!. To clarify, I have 15,386 DML when filtering my data for 5x coverage. That’s more than twice as many DML as I got for my C. virginica data. Before I characterize the location of my DML in the C. gigas genome, I’m going to do two different things…

Trying different coverage in methylKit

Since I used WGBS, there’s a chance that I can use a more stringent coverage filter instead of 5x coverage. I returned to my R Markdown script and reprocessed the data using 10x coverage. I somehow ended up with more DML using 10x coverage! The files I generated can be found below:

Looks like the reason why I have more DML when using 10x coverage is because I gained 300 hypermethylated and 600 hypomethylated DML. The increase in hypomethylated DML is interesting. My C. virginica data had a pretty even split of hyper- and hypomethylated DML, whereas with 5x coverage I have 700 more hypomethylated DML and 1,000 more hypomethylated DML with 10x coverage. I don’t entirely know why this is happening so good thing I already planned on checking things out in IGV.

Table 1. Number of CpG loci in various categories

Coverage DML Background All DML Hypermethylated DML Hypomethylated DML
5 11,285,932 15,385 7,384 8,001
10 5,210,744 16,324 7,669 8,665

Verifying DML in IGV

I need to confirm that my DML are real, and the best way to do that is in IGV. I realized my IGV veresion was outdated, so I downloaded version 2.6.3 from the website. Since I’m looking at different coverage metrics, I also wanted to generate some bedgraphs that matched the coverage settinsg I used. I created this Jupyter notebook by copying my C. virginica equivalent and created 5x and 10x begraphs for each sample. After moving the contents of my repository to gannet, I used URLs to add the following tracks to a new IGV session:

I actually struggled quite a bit to load all of the tracks into the IGV session. I made the mistake of adding the full bedgraphs in the IGV, but couldn’t open the file to delete the tracks. When I tried editing the xml file itself, I could successfully add tracks but not remove them. I ended up deleting my session and starting a new one. Once I created a new session, I could do what I needed!

Screen Shot 2019-09-13 at 3 33 13 PMScreen Shot 2019-09-13 at 3 37 03 PMScreen Shot 2019-09-13 at 3 39 20 PMScreen Shot 2019-09-13 at 3 41 54 PMScreen Shot 2019-09-13 at 3 47 14 PMScreen Shot 2019-09-13 at 3 48 27 PM

Figures 1-6. DML on various chromosomes.

Looking at the DML, I can see that I didn’t export my BEDfiles correctly. The DML background and DML themselves don’t overlap neatly with CpGs (they’re 1 bp off). I need to change how I modify the base pairs in methylKit and resave the tracks.

I also don’t trust the 10x DML that were not in 5x DML. I don’t know what’s going on, but it’s flaming hot garbage. The other DML in the 10x track that were common to the 5x track looked good, so I think I want to remove all 10x DML that are not common to the 5x track. I will figure out a way to do that.

Quick bismark update

The alignments went well! My first file deduplicated successfully, but my second file was saved with the wrong filename. I realized my wildcard was not specific enough, so I created a modified script from deduplication onwards and saved it here. I moved the script to Mox and started the job. Once I get the files reprocssed I’ll not only run through my analyses, but also create one master script so I don’t have to piecemeal anymore.

Going forward

  1. Get revised bismark output from Mox and run through analysis pipeline
  2. Fix DML background and DML BEDfiles in methylKit
  3. Remove 10x DML that are not present in 5x DML track
  4. Obtain C. gigas feature tracks
  5. Characterize locations of DML
  6. Conduct a gene enrichment and/or identify genes with DML and discern what it means

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/32Fz517

Yaamini’s Notebook: WGBS Analysis Part 3

Rerunning bismark and starting with methylKit

My bismark headache

After Mox maintenance ended, I ran this script and hoped for the best. Somehow, only my ambient file was processed! I tried running just the alignment for my low pH sample with this script and got this error:

 Use of uninitialized value $path_to_bowtie in concatenation (.) or string at /gscratch/srlab/programs/Bismark-0.21.0/bismark line 6893. Failed to execute Bowtie 2 porperly (return code of 'bowtie2 --version' was 32512). Please install Bowtie 2 or HISAT2 first and make sure it is in the PATH, or specify the path to the Bowtie 2 with --path_to_bowtie2 /path/to/bowtie2, or --path_to_hisat2 /path/to/hisat2  

Stuck after changing path_to_bowtie to path_to_bowtie2 and not getting the script to run, I reopened this issue/. Steven had me add the following line of code to the top of my script:

 source /gscratch/srlab/programs/scripts/paths.sh  

Apparently it does something with the bash paths set on Mox such that the paths in my script are recognized. I was able to align, deduplicate, sort, and index the low pH file!…but I couldn’t get bismark2summary to work:

 It looks like there is a mix of samples, e.g. RRBS as well as WGBS, in this folder. Please consider running bismark2summary only on samples of the same data type, or specify the input files manually (--help for more information). Extiting...  

For some reason, bismark thinks that my files were made with different sequencing methods. I created this script and manually set the files I wanted the command to use. Even then, I still got the error! I posted this issue because I was concerned about the file formats. Steven suggested I reformat my scripts and process the files together.

I created a 2019-09-11-Gigas-Bismark directory and started running this script to hopefully process both files together. In the meantime, I transferred all files I’ve generated to start testing out methylKit. I used this code to transfer the files:

 rsync --archive --progress --verbose /gscratch/scrubbed/yaamini/analyses/Gigas-WGBS/2019-09-03-Gigas-Bismark/* yaamini@  

Trying methylKit

Even though my final bam files may not be identical to the ones I transferred from Mox, I wanted to use them to test out the methylKit pipeline I generated for C. virginica. I created a new R project and this R Markdown script, changed all the file names, and started processing files. I decided to keep the same settings I used for C. virgincia:

  • 5x coverage; generated using processBismarkAln with mincov = 2 to quickly process my files, then using filterByCoverage to set minimum and maximum coverage thresholds (5 and 100)
  • destrand = TRUE to separate forward and reverse strands

I started processBismarkAln and it’s chugging along. Here’s hoping I can solidify the pipeline and get data so I can make a presentation!

Going forward

  1. Finish testing methylKit pipeline
  2. Get revised bismark output from Mox
  3. Obtain C. gigas feature tracks
  4. Characterize locations of DML (if there are any)
  5. Conduct a gene enrichment and/or identify genes with DML and discern what it means

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/32Ogm3N

Shelly’s Notebook: Thur. Sept. 12, 2019 Geoduck Genome paper

Multi Sample BamQC

Generateed multi sample BamQC report with this jupyter notebook: https://github.com/shellytrigg/Shelly_Pgenerosa/blob/master/analyses/MultiBamQC_allEPIsamples.ipynb

Geoduck genome 9/12 call:

  • Decided we will focus on day10 comparison all groups
    • next week we will compare:
      • DMGs (Hollie)
      • DMLs (Steven)
      • DMRs (Shelly)
    • we will decide if analyses are redundant and if we should just go with one and then focus on the functional analysis
  • one idea for TE analysis: compare TEs with DMRs to all TEs
    • is there a specific TE family that has DMRs?
  • a method for plotting chromosomes with colored regions: karyoplotter
  • methylation plotter (web interface Steven posted)

from shellytrigg https://ift.tt/34G4nXF