Shelly’s Notebook: Nov. 7-9, Geoduck DMR feature analysis

Generate appropriate background

  • I previously used within-sample DMRs filtered for coverage in 3/4 individuals/group as the background.
    • HOWEVER, these did not include all sites that had the potential to be methylated
    • To create a more inclusive background, I need to look at all CG sites considered prior to determining within-sample DMRs
  • Within-sample DMRs were determined by:
    • having at least 3 differentially methylated sites (DMS)
      • DMS were determined by:
        • having 5x coverage
          • if sites are within a 30bp window their counts can be combined
        • passing an RMS test significance threshold of 0.01
    • within a max distance of 250bp
  • ATTEMPT 1: If I remove the significance threshold for DMS (setting --sig-cutoff to 1 instead of 0.01), then I should get results from all sites considered for DMS.
    • I attempted this using this mox script:
      • RESULTS: The resulting files indicate some other filtering is happening, because the number of regions is nearly the same as when I ran DMRfind with the significance level set to 0.01. I had expected a much longer list of regions.
  • ATTEMPT 2: To test whether the software only considers CG sites with a value in the methylation counts column of the allc file, I created new allc files with the coverage counts column copied into the methylation counts column (assuming 100% methylation at every site).
  • ATTEMPT 3: To adjust other parameters in an attempt to remove all filtering, I tried the following settings in this mox script, on ambient samples only as a test:
    • --resid-cutoff 1
    • --min-tests 1
    • --num-sims 1
    • --sig-cutoff 1
    • RESULTS: The resulting files contain only a header, which indicates that certain criteria in the software are not being met.
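
For reference, the within-sample DMR criteria listed above (at least 3 DMS, each within 250bp of the next) can be sketched as a simple grouping step. This is only a simplified illustration of my reading of those criteria, not DMRfind's actual implementation; the function name `group_dms` and the example positions are hypothetical.

```python
def group_dms(positions, max_dist=250, min_dms=3):
    """Chain DMS positions on one scaffold into candidate regions.

    Consecutive sites no more than max_dist apart join the same chain;
    chains with fewer than min_dms sites are dropped.
    Returns a list of (start, end, number_of_sites) tuples.
    """
    regions = []
    chain = []
    for pos in sorted(positions):
        # A gap larger than max_dist closes the current chain
        if chain and pos - chain[-1] > max_dist:
            if len(chain) >= min_dms:
                regions.append((chain[0], chain[-1], len(chain)))
            chain = []
        chain.append(pos)
    # Do not forget the final chain
    if len(chain) >= min_dms:
        regions.append((chain[0], chain[-1], len(chain)))
    return regions


# Hypothetical DMS positions on a single scaffold (not real data)
example = [100, 180, 300, 1000, 1100, 5000, 5100, 5200, 5460]
print(group_dms(example))
```

With these made-up positions, the pair at 1000 and 1100 is dropped for having too few sites, and 5460 is left out because it sits 260bp from the previous site.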

4th time’s the charm!

Plot background vs. sig. DMR features

I updated the Rmarkdown file I used previously to generate a bar plot showing the proportion of features that significant DMRs vs. background sites fall into.


RESULTS: There are no strong differences between significant DMRs and background CpG sites in the proportion of features they overlap with, except:

  • the Day145 comparison, where CDS and exon features are under-represented and repeat_regions are over-represented.
  • CDS and exon are slightly over-represented in Day135 and all ambient sample comparisons
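
The comparison above boils down to computing, for each feature type, the share of significant DMRs and the share of background sites that overlap it. Here is a minimal sketch, assuming the overlaps have already been computed (for example with bedtools intersect) and reduced to one feature label per overlap. The function names and the toy labels are hypothetical and are not results from this analysis.

```python
from collections import Counter


def feature_proportions(labels):
    """Return {feature: fraction of all overlaps} for a list of feature labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def representation_ratio(dmr_labels, background_labels):
    """Ratio of DMR proportion to background proportion per feature.

    Values above 1 suggest over-representation among DMRs, values below 1
    suggest under-representation. Features absent from the background are
    skipped because the ratio is undefined for them.
    """
    dmr = feature_proportions(dmr_labels)
    background = feature_proportions(background_labels)
    return {
        feature: dmr.get(feature, 0.0) / fraction
        for feature, fraction in background.items()
    }


# Toy labels only, to show the shape of the input
dmr_labels = ["exon"] * 1 + ["intron"] * 5 + ["repeat_region"] * 4
background_labels = ["exon"] * 20 + ["intron"] * 60 + ["repeat_region"] * 20

for feature, ratio in sorted(representation_ratio(dmr_labels, background_labels).items()):
    print(feature, round(ratio, 2))
```

A ratio alone says nothing about significance; with small DMR counts, a test such as a chi-square or Fisher's exact test per feature would be needed before calling a difference real.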

Next steps

  • Look deeper into repeat regions (there are ~6 different categories, so I can see if any are particularly different)
  • For DMRs not in features, check the nearest features
  • Compare genes with DMRs to the ones identified by Hollie’s method
  • Continue GO analysis
    • generate appropriate GO background to use for each comparison
    • run TopGO
  • Go through manuscript methods section
    • update new stuff
    • add comments to areas of concern

from shellytrigg

Grace’s Notebook: Still no clear RNA bands on Bioanalyzer; Also working to get diff. exp. for GSS

Today I went through the DNase-free plan with 4 of the extracted crab samples. There was initially quite a bit of DNA, then after I used the Turbo DNA-free Kit, there was significantly less DNA. The Bioanalyzer still didn’t look super great… detail in post. Also, I’m working through kallisto and have abundance files – working to figure out next steps in order to hopefully create a heatmap of differential expression between the infected and uninfected crabs (combining day 12 and day 26). Details in post.

DNA-free plan and results

GitHub Issue: #792


  1. Run four samples from the above group on Qubit using DNA HS
  2. Use Turbo DNA-free Kit on those samples
  3. Run those samples on Bioanalyzer
  4. Re-run those samples on Qubit using DNA HS


  1. Done –> results
  2. Done (Did the routine DNase treatment)
  3. Done (screenshots below) ran each of the 4 samples twice (##-a and ##-b)
  4. Done – samples were eluted in ~40-45ul
    (ran dsDNA HS results; ran RNA HS results)




There was a LOT less DNA in the samples after the DNA-free Kit process. However, it resulted in pretty dilute RNA because I had to add enough RNase-free water in the beginning in order to have 50ul of sample for the reagents. Also, the RNA still didn’t band very well on the Bioanalyzer. I think Steven said we’ll just pool them and send them off to the NWGC… will confirm next week.

I’ll have to re-extract those 11 samples that I worked with this week on the Qubit and Bioanalyzer because there is not much material left!! Not a big deal. I can do that Monday or Friday (I want to have enough time to prepare for my GSS talk next Thursday as well).

Working on new analyses for GSS: Kallisto

Sam helped me a TON today.

He helped me set up hummingbird in FTR 213 so that I have my own user account, and that made everything a lot easier. He also helped me get git working from command line so that I can keep all of my work on GitHub in project crab.

Here’s my jupyter notebook: 2019-11-06-bairdi-kallisto.ipynb

There are so many cells because each sample (4 total) has 2 files for each of the 2 lanes (4 files total per sample).

I made individual directories in project-crab/analyses for each sample (number starting with 3#####) and each lane.

Each directory contains three files:

  • abundance.h5
  • abundance.tsv
  • run_info.json

The interesting stuff is in the abundance.tsv files because they have all the count data.

Next steps are to create a table combining all the count data for all the samples, and then using that in DESeq2 in R to create a heatmap… I think. Still trying to figure that out…
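
As a starting point for that combined table, here is a minimal sketch that merges the `est_counts` column from each directory's abundance.tsv into one tab-separated table. It assumes the standard kallisto abundance.tsv header (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`); the function names, the output file name, and the directory names in the usage comment are hypothetical. Counts are rounded because DESeq2 expects integers; in R, the tximport package is the usual route for bringing kallisto output into DESeq2, so this is just one possible approach.

```python
import csv
from pathlib import Path


def combine_est_counts(sample_dirs):
    """Merge est_counts from the abundance.tsv in each directory.

    Returns (names, table): names are the directory names, and table maps
    each target_id to a list of counts in the same order as names.
    Transcripts missing from a file keep a count of 0.
    """
    names = [Path(d).name for d in sample_dirs]
    table = {}
    for i, d in enumerate(sample_dirs):
        with open(Path(d) / "abundance.tsv", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                counts = table.setdefault(row["target_id"], [0.0] * len(names))
                counts[i] = float(row["est_counts"])
    return names, table


def write_count_table(names, table, out_path):
    """Write one row per transcript and one column per sample directory."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["target_id"] + names)
        for target_id in sorted(table):
            # Round kallisto's estimated (fractional) counts to integers
            writer.writerow([target_id] + [round(c) for c in table[target_id]])


# Usage (hypothetical directory names):
# names, table = combine_est_counts(["analyses/sampleA_lane1", "analyses/sampleA_lane2"])
# write_count_table(names, table, "kallisto_counts.tsv")
```

Lanes from the same sample would still need to be summed (or handled as technical replicates) before testing for differential expression.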


GitHub Issue #790

from Grace’s Lab Notebook