Yaamini’s Notebook: DML Analysis Part 34

Even more troubleshooting with IGV

Gene background issues

Before I could proceed with a gene enrichment, I needed to sort through my gene background issues. The gene background consists of all loci with at least 5x coverage in my dataset. Previously, my gene background was “hot flaming garbage” when I looked at it in IGV. Assuming that the gene background was working correctly in methylKit, I returned to my R Markdown file to see if the issues came from the way I exported the file.

Looking at the gene background, I noticed that there were columns with coverage metrics, as well as the number of cytosines and thymines at each locus. The start and stop positions for each locus were the same, which could be affecting how IGV visualizes the data.

Figure 1. Gene background information in methylKit.

I decided to subset the first four columns for exporting: chromosome (chr), start position (start), stop position (stop), and strand (strand). I did this by creating a new dataframe. I also subtracted 1 from each start position so the loci would display in IGV: BED files use 0-based start coordinates, so a single-base locus needs a start one less than its stop. Finally, I exported the gene background as a BED file and looked at it in IGV.

    library(readr) # write_delim() comes from readr

    # Subset chromosome, start, stop, and strand into a new dataframe
    # (note: BED readers treat column 4 as the feature name; strand is column 6)
    methylationInformationFilteredCov5DestrandReduced <- data.frame("chr" = methylationInformationFilteredCov5Destrand$chr,
                                                                    "start" = methylationInformationFilteredCov5Destrand$start,
                                                                    "stop" = methylationInformationFilteredCov5Destrand$end,
                                                                    "strand" = methylationInformationFilteredCov5Destrand$strand)

    # Subtract 1 from the start position for visualization (BED starts are 0-based)
    methylationInformationFilteredCov5DestrandReduced$start <- (methylationInformationFilteredCov5DestrandReduced$start - 1)

    # Write out all methylation information as a background to be used for gene enrichment analyses
    write_delim(methylationInformationFilteredCov5DestrandReduced, "2019-05-14-Methylation-Information-Filtered-Destrand-Cov5.bed", delim = "\t", col_names = FALSE)
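The coordinate shift can be checked on a toy table before writing the file. This is a minimal sketch assuming loci stored like the methylKit table in Figure 1 (1-based, with identical start and stop); the helper `toSingleBaseBed` and the toy coordinates are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert 1-based single-base loci (start == stop,
# as in the methylKit table) into 0-based BED starts and confirm each
# resulting interval is exactly 1 bp wide.
toSingleBaseBed <- function(loci) {
  # Input check: every locus should have identical start and stop
  stopifnot(all(loci$start == loci$stop))
  bed <- loci
  bed$start <- bed$start - 1  # BED start coordinates are 0-based
  # Output check: for half-open BED intervals, stop - start is the width
  stopifnot(all(bed$stop - bed$start == 1))
  bed
}

# Toy loci for illustration only (not real data)
toyLoci <- data.frame(chr = c("chr1", "chr1"),
                      start = c(100, 250),
                      stop = c(100, 250),
                      strand = c("+", "+"))
toyBed <- toSingleBaseBed(toyLoci)
print(toyBed)
```

Running the checks inside the helper means a malformed table stops the script before anything is written to disk.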

I opened this IGV session, which had my previous gene background file in it. I added the new file and compared the two versions:

Figures 2-4. Gene background visualization in IGV.

Looking at different sections, I found that the gene background aligned with CG motifs and various DML instead of looking like large regions of information! I feel confident in this new version of the gene background and think I can use it for gene enrichment.

Difference between gene and mRNA tracks

During lab meeting, I pointed out that there were coding sequences included in genes that were not part of the mRNA track. These coding sequences had defined exons. Kaitlyn explained that exons are the elements retained in mRNA, while the exons in the coding sequence (CDS) track are the parts that actually get translated. That clears up the question I posed in this issue.

Refining the intron track

Based on this information, I decided to use subtractBed with the gene and exon tracks to create my intron track. When I looked through that track in IGV, I found several areas where it looked good. Introns from coding sequences were included (unlike the intron track I generated using mRNA), and the first bp of every exon was no longer included in the intron track.
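`subtractBed` (the older name for `bedtools subtract`) reports the parts of each feature in `-a` that are not overlapped by features in `-b`; with `-s`, only same-strand overlaps are removed. The idea behind "gene minus exons = introns" can be sketched in base R for a single gene. This is an illustration under stated assumptions, not bedtools itself: coordinates are 0-based and half-open, the exons are assumed to already be on the gene's chromosome and strand, and the function name `subtractExons` and the toy coordinates are hypothetical.

```r
# Hypothetical illustration of gene-minus-exon subtraction for ONE gene.
# Coordinates are 0-based, half-open (as in BED); exons are assumed to be
# on the same chromosome and strand as the gene (what -s enforces).
subtractExons <- function(geneStart, geneEnd, exonStarts, exonEnds) {
  ord <- order(exonStarts)
  exonStarts <- exonStarts[ord]
  exonEnds <- exonEnds[ord]
  pieces <- data.frame(start = numeric(0), end = numeric(0))
  cursor <- geneStart  # left edge of the region not yet covered by an exon
  for (i in seq_along(exonStarts)) {
    # The gap between the cursor and the next exon is left over (an intron)
    if (exonStarts[i] > cursor && cursor < geneEnd) {
      pieces <- rbind(pieces,
                      data.frame(start = cursor,
                                 end = min(exonStarts[i], geneEnd)))
    }
    cursor <- max(cursor, exonEnds[i])  # jump past this exon
  }
  # Anything after the last exon but still inside the gene is left over too
  if (cursor < geneEnd) {
    pieces <- rbind(pieces, data.frame(start = cursor, end = geneEnd))
  }
  pieces
}

# Toy gene spanning 0-1000 with three exons (made-up coordinates)
introns <- subtractExons(0, 1000, c(0, 400, 900), c(200, 600, 1000))
print(introns)
```

Because the cursor only moves forward, overlapping exons from different isoforms are merged implicitly, so any base covered by at least one exon is removed.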

Figure 5. Example of introns generated from the gene and exon tracks that are not included in the intron track I generated from mRNA or the intron track I downloaded from the Genomic Resources wiki.

Figures 6-7. Instances where the intron track from the Genomic Resources wiki includes the first bp of each exon, but the intron track I generated does not.

However, there were instances of the intron track I generated still including exons:

Figures 8-9. Areas where the intron track still contained exons.

I posted my findings in this comment and asked if it was an artifact of enforcing same strandedness with -s, or if it was worth using subtractBed on the intron and exon tracks to get rid of the overlaps. Since exons and introns are on the same strand, I don’t think -s is the issue, but perhaps there’s a different way I need to code it.
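To list leftover spots like those in Figures 8-9 systematically, one option is to flag every intron interval that overlaps any exon on the same chromosome and strand, similar in spirit to `bedtools intersect -u -s`. This is a minimal base-R sketch assuming 0-based, half-open intervals stored in data frames with `chr`, `start`, `end`, and `strand` columns; the function name `flagExonOverlaps` and the toy intervals are hypothetical.

```r
# Hypothetical check: flag intron intervals that still overlap an exon on
# the same chromosome and strand (0-based, half-open coordinates).
flagExonOverlaps <- function(introns, exons) {
  vapply(seq_len(nrow(introns)), function(i) {
    # Half-open intervals overlap when each starts before the other ends
    hit <- exons$chr == introns$chr[i] &
      exons$strand == introns$strand[i] &
      exons$start < introns$end[i] &
      exons$end > introns$start[i]
    any(hit)
  }, logical(1))
}

# Toy intervals (made-up); the exon at 850-1000 overlaps the second intron
introns <- data.frame(chr = "chr1", start = c(200, 600), end = c(400, 900),
                      strand = "+")
exons <- data.frame(chr = "chr1", start = c(0, 400, 850),
                    end = c(200, 600, 1000), strand = "+")
overlaps <- flagExonOverlaps(introns, exons)
print(introns[overlaps, ])
```

Intervals that merely touch (an intron ending exactly where an exon starts) are not flagged, which matches how adjacent BED intervals share no bases.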

Going forward

  1. Finalize the intron track
  2. Conduct a gene enrichment for DML
  3. Figure out what’s going on with DMR
  4. Work through gene-level analysis
  5. Update paper repository
  6. Update methods and results
  7. Start writing the discussion

from the responsible grad student http://bit.ly/2HuOOXP

Sam’s Notebook: Library Decisions – C.bairdi RNAs for Library Pools

I isolated a bunch of Tanner crab RNA on 20190430, and Steven asked me to figure out some options for RNAseq library pools.

Using Grace’s table join of my Qubit data (hemo_qub-for-libs.csv), I created a pivot table to quickly get an idea of how things looked (note: I added a column with total RNA yield for each sample – 10uL * concentration):

C.bairdi pivot table screen cap

Assuming a minimum of 1000ng is required by the UW’s Northwest Genomics Center (that was the requirement the last time they performed RNAseq for us; I have emailed them to confirm it’s still required), here are the libraries I think we can/should make.
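The yield-versus-minimum reasoning can be sketched as a small table calculation. The 10 uL volume and the 1000 ng minimum come from the notes above; the columns of hemo_qub-for-libs.csv are not shown, so the column names `day`, `infection`, and `concNgPerUl` and all values below are hypothetical, and a library's RNA is assumed to be the sum of its samples' yields.

```r
# Hypothetical sketch of the pooling check. Total RNA per sample is
# 10 uL x concentration (ng/uL); yields are summed within each candidate
# library and compared against the assumed 1000 ng minimum.
minLibraryNg <- 1000
volumeUl <- 10

# Made-up example rows standing in for the real Qubit table
rna <- data.frame(
  day = c(9, 9, 9, 9),
  infection = c("Infected", "Infected", "Uninfected", "Uninfected"),
  concNgPerUl = c(30, 75, 60, 20)
)
rna$yieldNg <- volumeUl * rna$concNgPerUl

# Sum yields per Day x infection group (one candidate library each)
libraries <- aggregate(yieldNg ~ day + infection, data = rna, FUN = sum)
libraries$enoughRna <- libraries$yieldNg >= minLibraryNg
print(libraries)
```

Swapping `infection` for a temperature-treatment column would give the Ambient vs. Cold comparison in the same way.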

Possible libraries, by “Day”:

  • Day 9
    • Infected vs Uninfected
  • Day 12
    • Infected vs Uninfected
    • Ambient vs Cold vs Warm
  • Day 26
    • Infected vs Uninfected
    • Ambient vs Cold

The “Day 9” samples are named poorly, as these are actually the samples prior to temperature exposures. Thus, the “trtmnt_tank” column (i.e. temperature treatment) in the table above doesn’t really apply. With that in mind, Day 9 samples can only be compared as Infected/Uninfected.

So, by default, those should be two libraries.

Then, we have the Day 12 and Day 26 samples. We can either do Infected/Uninfected OR Ambient vs. Cold, but can’t do both, as there isn’t sufficient RNA…

Admittedly, I’m not sure which aspect of this project was given more weight in the grant proposal:

  • response to disease over time
  • response to temperature over time

The answer to that question will guide library pooling.

With that said, if we do the following, we’ll have a bit of both pathogen response and temperature response:

  • Day 9 Infected/Uninfected
  • Day 12 Ambient/Cold
  • Day 26 Ambient/Cold

from Sam’s Notebook http://bit.ly/2HlPQXv

Shelly’s Notebook: Tues. May 14, 2019 Salmon-sea lice salinity x temp methylomes

Extracted gDNA from salmon skin samples

These samples are from Christian and are from animals that were infected with sea lice for the duration of the sea lice life cycle at two different temperatures (8C and 16C) and two different salinities (26 psu and 32 psu). Sample info here

Kaitlyn helped with the extraction.

Followed the EZNA Mollusc DNA kit protocol

  • weighed out 40-50mg tissue and added to microfuge tube
    • remaining ethanol-fixed tissue is in a box on the bottom shelf of the 4C in 213
  • froze samples in liquid N2 and used disposable microtube pestle to pulverize sample
    • this was a little challenging, as the pestle didn’t really seem to break up the tissue, just squish it. Also used a pipette tip to smash the tissue further.
    • samples sat on ice after being pulverized until we got through all the samples
  • incubated samples in 305uL ML1 buffer and 25uL Proteinase K solution for 1 hour at 60C
    • not all the sample was solubilized but a lot of it was
  • did the optional column equilibration step
  • eluted in 50uL elution buffer for 5 min at RT before spinning for 2 min at 10,000xg

DNA concentration

  • Qubit dsDNA BR kit for 20 samples + 2 standards
    • the standard initially read 99.7ng/uL
      • read it again after every 5 samples and it consistently read 102ng/uL
    • sample concentrations are here: 20190408_SalmonSamples
      • lowest yield was 61ng/uL * 50uL = 3050ng (~3ug), so we have plenty of DNA
  • DNA are in a box in the -20C in 213 in the bottom drawer
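As a worked check of the lowest-yield sample, multiplying concentration by elution volume gives the total in nanograms, which then converts to micrograms. The 61 ng/uL and 50 uL values are from the notes above.

```r
# Yield (ng) = concentration (ng/uL) x elution volume (uL)
elutionUl <- 50
lowestConcNgPerUl <- 61
lowestYieldNg <- lowestConcNgPerUl * elutionUl  # 3050 ng
lowestYieldUg <- lowestYieldNg / 1000           # 3.05 ug
cat(lowestYieldNg, "ng =", lowestYieldUg, "ug\n")
```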

Next steps

  • Digest 1ug of each sample with MspI and TaqαI (overnight)
  • Zymo pico methyl library prep kit on salmon samples and sea lice samples
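For the 1ug digest input, the volume of each gDNA sample to add is the target mass divided by its concentration. This is a minimal sketch: only the 61 ng/uL value comes from the notes, while the sample names and the other concentrations are hypothetical placeholders for the values in the 20190408_SalmonSamples sheet.

```r
# Hypothetical sketch: volume (uL) of each sample for a 1 ug (1000 ng)
# digest input, computed as 1000 ng / concentration (ng/uL).
targetNg <- 1000
# Only 61 ng/uL comes from the notes; the other values are made up
concNgPerUl <- c(sampleA = 61, sampleB = 150, sampleC = 240)
inputUl <- targetNg / concNgPerUl
print(round(inputUl, 1))
```

Keeping the result as a named vector makes it easy to line the volumes up against sample IDs when setting up the reactions.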

from shellytrigg http://bit.ly/2Hk9e7d