Shelly’s Notebook: Fri. Sept. 13, 2019 Pt. Whitney Juvenile Var.low pH experiment

Water chemistry

-Took discrete measurments and TA – @11:40AM I moved Apex probes monitoring the trash cans directly into the silos – for TA samles I took directly from the discrete measurement water (filtered it). The discrete measurement water samples were directly from the headers and inside the silos (using clean TA cups)

  • will finish full water chem analysis ASAP, but generally things seem on point

Animal check and clean

  • silo 2.1:
    • open?id=14Mb9t5zylQKhKQPLNQ46blhBxcjND9ID
    • open?id=1-DgGJYdii42ygBgvm1vU6nog7W9pm-CC
    • Used transfer pipette spoon to sift through sand to see if animals are buried. Uncovered at least 10 in 2 different areas, so Matt is right that the animals are just buried.
    • Checked the mesh to see if it’s fouling. It’s not, it looks really clean.
    • open?id=1dHtxLxHTCY92566mhyJLrGiskEoj-VST
  • Cleaned all trash cans:
    • disconnected pipe from manifold, drained manifold, and lifted manifold off
    • lifted silos out and moved to empty tote
    • dumped all trash can water
    • rinsed trash cans with header tote hose
      • all trash cans had light brown rinse water
  • equalized flow to ~37mL/5 sec in B1 and B2, check multiple ports
  • 3.12: clamp fell into silo, hopefully it didn’t crush them
  • over all animals look good, especially those that are buried.

Algae count

  • cellometer counted 3-5E5 cells/mL in water from one port on each manifold.
  • Flow rate from the manifold port was 37 mL/5 sec = 7.4mL/sec
  • conclusion: they are getting 222,000 cells/sec. for at least 12 hours/day so it seems like they have plenty

from shellytrigg

Yaamini’s Notebook: WGBS Analysis Part 6

Preliminary data for PCSGA

My PCSGA practice talk is tomorrow, and my real talk is on Tuesday! Today I’ll finish processing my data and get tables and figures prepared for my talk.

Revised bismark output in methylKit

EVERYTHING FINALLY PROCESSED CORRECTLY (as far as I can tell)! I moved all my output off Mox and onto gannet with this code:

 rsync --archive --progress --verbose /gscratch/scrubbed/yaamini/analyses/Gigas-WGBS/2019-09-11-Gigas-Bismark/* yaamini@  

I used the revised deduplicated and sorted files in methylKit. Something to remember: methylKit uses a Fisher’s exact test instead of a logistic regression since there are no replicates.

Table 1. Number of DML identified by methylKit that are a 100% different between samples with 10x coverage.

Coverage Date Created Number of DML
5 2019-09-12 729
5 2019-09-15 732
10 2019-09-12 625
10 2019-09-15 628

Looks like the new deduplicated and sorted files gave me a few more DML. Although I looked at different coverage metrics, I’m still going to use the 10x coverage data with a 100% methylation difference. These settings are stringent to account for the fact that I only have two samples.

I tried the diffMethPerChr to obtain a breakdown of DML by chromosomes. The output doesn’t look that great, but at least it’s a visual that shows how sparse the DML are in the genome.

 jpeg(filename = "2019-09-15-Loci-Analysis/2019-09-15-DML-Distribution-by-Chromosome.jpeg", height = 1000, width = 1000) #Save file with designated name diffMethPerChr(differentialMethylationStatsFilteredCov10Destrand, plot = TRUE, qvalue.cutoff = 0.001, meth.cutoff = 99) #Look at distribution of hyper- and hypomethylated DML per chromosome. Create an accompanying plot.  


Figure 1. Percentage of DML by chromosome.

There are also still weird things going on with the DML background when using destrand = TRUE…I’ll worry about this after PCSGA.

Screen Shot 2019-09-15 at 9 34 10 PM

Figure 2. destrand = TRUE output.

Verifying DML in IGV

I returned to this notebook and used my new bismark output to revise 5x and 10x coverage tracks. I moved the finished tracks to gannet, then imported these tracks into IGV along with the revised BEDfiles from methylKit.

For the most part, things lined up and looked good! Weird inconsistencies do exist…again, something to think about after PCSGA.

Screen Shot 2019-09-15 at 9 33 33 PM

Figure 3. Tracks and CG motifs not lining up in IGV.

Characterizing DML locations

In this issue, Steven provided this link and this link to C. gigas feature tracks. I created this notebook to download feature tracks from eagle and characterize DML locations (P.S. by “created,” I mean I copied my C. virginica notebook and deleted all the cells I no longer wanted. I feel so accomplished).

Table 2. Counts for each genome feature.

Genome Feature Track Length
CG motifs 10,035,701
Exons 196,691
Introns 176,049
Genes 28,027
Promoters 28,023
Transposable Elements 119,786

What’s interesting at first glance is that 1) there are not an equal number of genes and promoters and 2) there are 4 million less CG motifs in the C. gigas genome than the C. virginica genome.

Table 3. Number of overlaps with various genome feature tracks. For example, the 1.7% overlap between exons and CG motifs means that exons contained 1.7% of all CG motifs.

Genome Feature Track Overlaps with CG motifs Overlaps with DML Overlaps with TE
CG motifs N/A 624 (99.4%) 82,554 (0.8%)
Exons 172,056 (1.7%) 157 (25%) 2,597 (2.2%)
Introns 150,884 (1.5%) 285 (45.4%) 18,989 (15.9%)
Genes 28,015 (0.3%) 442 (70.4%); 398 unique genes (61.9%%) 11,748 (9.8%)
Promoters 5,696 (0.06%) 24 (3.8%) 3,966 (3.3%)
Transposable Elements 82,554 (0.8%) 8 (1.3%) N/A
No overlaps 5,118,363 (51%) 165 (26.3%) N/A

It’s interesting to see that most of the DML are actually in introns. This makes me thing that DML could be involved with alternative splicing. There are also slightly more DML in unannotated intergenic regions than exons. I put the overlap proportions in this file and compared overlap proportions in this R Markdown script. I didn’t know how to effectively put files together to generally characterize methylation trends, so I compared total CpGs with DML instead of methylated CpGs with DML. The distributions are different, and I generated a figure which can be found here in pdf form.

Annotating output

Bottom line, this was unsuccessful. I posted this issue to see if we had any annotated gene information since the GFF I’m using has CGI IDs and does not match up with what I see on GigaTON. Steven ran a blast, and I used that output to try and annotate my DML-gene overlaps in this notebook. Something’s fishy though because I can’t do the annotation! I’m pretty sure I modified and joined all my files correctly until then, so there must be some reason why the IDs aren’t matching up. At this point, I think it’s time to make a talk instead of trying to work out analyses.

Going forward

  2. Present preliminary results at PCSGA
  3. Figure out how to clean up results for WSN and ASLO
  4. Potentially sequence more samples…?
  5. Return to C. virginica data and finish. that. paper.

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student