Sam’s Notebook: Samples Received – C.bairdi Hemolymph and Tissue in Ethanol from Pam Jensen

Pam Jensen (NOAA) brought by a variety of hemolymph and tissue samples stored in ethanol for DNA sequencing with the new MinION (Oxford Nanopore) to aid our transcriptomics work.

The box of samples was stored in FTR 26 (underneath the fume hood).

Pam will provide a digitized version of the sample spreadsheet, as well as column name explanations. I’ll add that info here when I receive it.

Image of sample storage box

from Sam’s Notebook https://ift.tt/35CxXwT
via IFTTT

Sam’s Notebook: qPCR – Geoduck hemolymph and hemocyte cDNA with vitellogenin primers

Previously isolated RNA on 20191125 and made cDNA on 20191126 from some geoduck hemolymph and hemocyte samples that Shelly asked me to run qPCRs on.

Ran qPCR with vitellogenin primers I previously designed on 20181129 and tested on 20181206.

Primers:

SR IDs:

  • 1712 (Pg_DN51983i8_1471)
  • 1711 (Pg_DN51983i8_1347)

Positive control was the same pooled gonad cDNA used successfully on 20181206 to test the assay. Due to sample limitations, the positive control was not run in duplicate!

All sample qPCR reactions were run in duplicate. See qPCR Report (Results section below) for plate layout, cycling params, etc.

qPCR Master Mix calcs (Google Sheet):

NOTE: I actually ended up running this twice (55°C and 60°C) due to poor melt curves when run with a 55°C anneal.

Sam’s Notebook: Reverse Transcription – P.generosa DNased Hemolymph and Hemocyte RNA from 20191125

Performed reverse transcription on the DNased hemolymph and hemocyte RNA from yesterday.

Reverse transcription was performed using 100ng of each sample with M-MLV Reverse Transcriptase from Promega.

Briefly, 100ng of DNased RNA was combined with a 1:10 dilution of oligo dT primers (Promega) and brought up to a final volume of 15uL in standard 0.5mL snap cap tubes. Tubes were incubated for 10mins at 70°C in a PTC-200 thermal cycler (MJ Research), using a heated lid. Samples were immediately placed on ice.

A master mix of buffer, dNTPs, water, and M-MLV reverse transcriptase was made; 10uL of the master mix was added to each sample and mixed via finger flicking. Samples were incubated for 1hr at 42°C in a PTC-200 thermal cycler (MJ Research), using a heated lid, followed by a 5min incubation at 65°C.
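
The first-strand setup arithmetic can be sketched in a few lines of R. This is only an illustration: the RNA concentrations and the oligo dT volume below are hypothetical inputs (the real values live in the Google Sheet), and `rt_setup_volumes` is a made-up helper name.

```r
# Volumes for the primer/RNA mix: a fixed mass of RNA plus oligo dT, brought to 15 uL.
# conc_ng_per_ul and primer_ul are hypothetical inputs, not values from the notebook.
rt_setup_volumes <- function(conc_ng_per_ul, primer_ul, rna_ng = 100, final_ul = 15) {
  rna_ul <- rna_ng / conc_ng_per_ul              # volume that contains rna_ng of RNA
  water_ul <- final_ul - rna_ul - primer_ul      # water needed to reach the final volume
  if (any(water_ul < 0)) {
    stop("RNA too dilute: RNA plus primer exceeds the final volume")
  }
  data.frame(conc_ng_per_ul, rna_ul, primer_ul, water_ul)
}

# Example with two invented RNA concentrations (ng/uL) and an assumed 0.5 uL of primer
rt_volumes <- rt_setup_volumes(conc_ng_per_ul = c(20, 50), primer_ul = 0.5)
rt_volumes
```

Adding the 10uL of master mix to each 15uL tube then gives a 25uL reaction.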

Reverse transcription calcs (Google Sheet):

Resulting cDNA was stored in the small -20°C freezer in:

from Sam’s Notebook https://ift.tt/2rwu91a

Sam’s Notebook: RNA Isolation and Quantification – Geoduck hemolymph and hemocyte samples

Shelly asked me to isolate RNA and run some qPCRs on the following samples:

-80°C Location (rack, column, row)   Sample Name      Tissue                           Reproductive Stage   Sex
8, 1, 3                              G-57 H           hemolymph (~1mL cells + lymph)   7                    F
8, 1, 3                              G-61 H           hemolymph (~1mL cells + lymph)   7                    F
8, 1, 1                              G-39 H           hemolymph (~1mL cells + lymph)   5                    F
8, 1, 1                              G-31 H           hemolymph (~1mL cells + lymph)   4                    F
5, 3, 1                              11/01/2018_1H    pelleted hemocytes               unknown              unknown
5, 3, 1                              11/01/2018_2H    pelleted hemocytes               unknown              unknown
5, 3, 1                              11/8/2018_1H     pelleted hemocytes               unknown              unknown
5, 3, 1                              11/8/2018_2H     pelleted hemocytes               unknown              unknown
5, 3, 1                              11/15/2018_1H    pelleted hemocytes               unknown              unknown
5, 3, 1                              11/15/2018_2H    pelleted hemocytes               unknown              unknown

I was unable to find the 11/01 and the 11/15 samples.

Hemo samples used for RNA isolation in tube rack

Yaamini’s Notebook: DML Analysis Part 43

Revising manuscript figures

The new submission deadline is Dec. 31! I’d like to get a full draft in for comments before the end of the month, so I’ve been regaining motivation and slowly chipping away at manuscript edits. The task I focused on this week was revising figures and making some new ones. I chose this task because I wanted to finalize my methods and results sections before tackling my introduction and discussion, and because making figures is fun and it seemed like the best way to trick myself into being productive :satisfied:

Recoloring plots!

First task: creating color schemes for my plots! I picked the Green-Blue color scheme from RColorBrewer, and checked that it was colorblind-friendly with dichromat.

plotColors <- rev(RColorBrewer::brewer.pal(5, "GnBu")) #Create a color palette for the barplots. Use 5 green-blue shades from RColorBrewer. Reverse the order so the darkest shade is used first.
barplot(t(t(proportionData)), col = dichromat::dichromat(plotColors)) #Check that the plot colors will be easy to interpret for those with color blindness

Figure 1. Plot color scheme with dichromat adjustment.

I started by recoloring previously-created figures: my methylation distribution (code here), stacked barplots (code here), and DML location figures (code here).

Figures 2-6. Recolored versions of CpG locus distribution, CpG location stacked barplot, DML location stacked barplot, multipanel with DML locations in chromosomes and genes and hyper-hypomethylation breakdown, and scaled DML distribution.

GOslim reanalysis

The next task I tackled was reanalyzing my GOslim information by following methods from my C. gigas GOslim analysis. In this Jupyter notebook, I modified my GOslim annotations. I used GOslim annotations that included repeat GOslim terms for each gene, and included “uncharacterized biological processes” as a term in the analysis. Based on feedback from Steven and the rest of my committee, I first removed duplicate entries:

#Remove duplicate entries
!uniq Blastquery-GOslim-BP.sorted > Blastquery-GOslim-BP.sorted.unique
#Count the number of unique IDs with GOSlim terms
!uniq -f1 Blastquery-GOslim-BP.sorted.unique | wc -l

There were 126,112 unique C. virginica transcripts matched with GOslim terms. Once I removed duplicate entries, I removed any entry labeled “other biological processes” (the uncharacterized catch-all term) using grep --invert-match:

#Remove all "other biological processes"
!grep --invert-match "other biological processes" Blastquery-GOslim-BP.sorted.unique \
> Blastquery-GOslim-BP.sorted.unique.noOther
#Count the number of unique CGI IDs with defined GOSlim terms
!uniq -f1 Blastquery-GOslim-BP.sorted.unique.noOther | wc -l

There were 105,959 unique C. virginica transcripts matched with defined GOslim terms. I moved on to this R Markdown file to annotate all tested genes and genes with DML with GOslim information. For all genes tested with GO-MWU, 27/2281 GO-MWU entries did not have accompanying GOslim entries, and 425/5283 GO-MWU entries did not have matching GOslim entries for genes with DML. After calculating the percentage of genes involved in each biological process, I created bar plots.
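
For readers who want the deduplicate, filter, and percentage steps in one place, here is a minimal base R sketch on a toy table. The gene IDs, terms, and column names are invented for illustration, and the percentage is only one plausible definition (genes with a given term divided by genes with any defined term), not necessarily the one used in the R Markdown file. Note that `unique()` drops all duplicate rows, whereas `uniq` only drops adjacent ones, which is why the shell version works on a sorted file.

```r
# Toy annotation table; IDs and terms are invented for illustration only.
goslim <- data.frame(
  gene_id = c("g1", "g1", "g1", "g2", "g2", "g3"),
  term = c("cell adhesion", "cell adhesion", "other biological processes",
           "transport", "cell adhesion", "other biological processes"),
  stringsAsFactors = FALSE
)

# Drop duplicate gene-term rows (the role uniq plays on the sorted file)
goslim_unique <- unique(goslim)

# Drop the catch-all term (the role of grep --invert-match)
goslim_defined <- goslim_unique[goslim_unique$term != "other biological processes", ]

# Percent of genes (with at least one defined term) involved in each process
n_genes <- length(unique(goslim_defined$gene_id))
percent_by_term <- 100 * table(goslim_defined$term) / n_genes
percent_by_term
```

Because one gene can carry several terms, these percentages can sum to more than 100.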

Figure 7. Percent of all genes tested involved in various biological processes.

Figure 8. Percent of genes with DML involved in various biological processes.

I figured it made sense to have a plot with biological process information for all tested genes and genes with DML side-by-side, so I created a multipanel plot. Instead of using mirror plot code like I did with the scaled DML distribution, I just created a multipanel with no margin between the two plots. I extended the left-side outer margin to make space for biological process labels. The code can be found in this R Markdown file.
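
A rough base-graphics sketch of that layout, assuming horizontal bar plots: two panels in one row, zero side margins on each panel, and a wide left outer margin that holds the process labels. The percentages and the `plotSideBySide` helper are invented for illustration and are not the code from the R Markdown file.

```r
# Invented percentages for illustration; these are not the notebook's data.
processes <- c("cell adhesion", "transport", "signal transduction")
allGenes <- c(20, 35, 45)
genesWithDML <- c(22, 33, 45)

plotSideBySide <- function(processes, left, right) {
  # Two panels in one row; wide left outer margin for labels, no side margins on panels
  oldPar <- par(mfrow = c(1, 2), oma = c(0, 12, 0, 1), mar = c(4, 0, 1, 0))
  on.exit(par(oldPar))
  xMax <- max(left, right)
  mids <- barplot(left, horiz = TRUE, xlim = c(0, xMax), xlab = "% of genes")
  # xpd = NA lets the labels be drawn outside the panel, in the outer margin
  text(x = 0, y = mids, labels = processes, pos = 2, xpd = NA)
  barplot(right, horiz = TRUE, xlim = c(0, xMax), xlab = "% of genes")
  invisible(mids)
}

plotSideBySide(processes, allGenes, genesWithDML)
```

Sharing one `xlim` across both panels keeps the bars directly comparable.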

Figure 9. Side-by-side comparison of biological processes represented by all genes tested using GO-MWU enrichment and genes with DML.

I think this plot clearly shows why we didn’t see any significantly enriched gene ontology categories: the distributions are so similar! Having the background to compare to the GOslim categories of genes with DML drives the point home.

Julian suggested I give more weight to genes with multiple DML, or less weight to genes with DML that are part of multiple biological processes. The first suggestion is good, but because we don’t know the functional role of genes with multiple DML for gene expression in this species, I don’t know if giving more weight to genes with multiple DML makes sense. I think my new process takes into account genes with multiple processes, since I’m looking at how many times each process is represented in my data.

Principal component analyses with methylKit

A while back, Katie suggested I compare PCAs for all my percent methylation data and DML. Instead of writing code from scratch using my multivariate class notes, I decided to use functions built into methylKit. Turns out this should have been easy, but in reality it was way more complicated than it needed to be. Understanding the matrix the data was in and what the functions were doing took me at least a day and a half. The documentation in the methylKit manual is somewhat helpful, but I found I had more questions than it could answer! I dug through specific function pages on the R Documentation page and even the methylKit publication. But I digress…

I returned to my methylKit R Markdown file and examined the structure of the methylation information matrix. It’s a methylBase object, which (spoiler alert) means it’s a pain in the ass. The function PCASamples calculates a percent methylation matrix by dividing the number of cytosines by the sum of cytosines and thymines at that locus (C / (C + T)). Then, it uses prcomp to create the PCA. I struggled for a while to figure out how to customize the PCA that methylKit creates! FINALLY after hours of searching, I learned I could use the argument obj.return = TRUE to save the PCA object! I could then use summary to look at how much variance each PC explained.

allDataPCA <- PCASamples(methylationInformationFilteredCov5Destrand, obj.return = TRUE) #Run a PCA analysis on percent methylation for all samples. methylKit uses prcomp to create the PCA matrix
summary(allDataPCA) #Look at summary statistics. The first PC explains 18.1% of variation, the second PC explains 11.7% of variation
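
To make the percent-methylation and prcomp steps concrete, here is a toy base R sketch with simulated read counts. It does not use methylKit at all, the counts are random, and the choice to center but not scale is an assumption for this example rather than a statement about what PCASamples does internally.

```r
set.seed(1)
nLoci <- 50
nSamples <- 10

# Invented read counts: methylated (C) and unmethylated (T) calls per locus and sample.
# Adding 1 guarantees every locus has coverage, so the denominator is never zero.
numCs <- matrix(rpois(nLoci * nSamples, lambda = 8) + 1, nrow = nLoci)
numTs <- matrix(rpois(nLoci * nSamples, lambda = 4) + 1, nrow = nLoci)
colnames(numCs) <- colnames(numTs) <- paste0("sample", seq_len(nSamples))

# Percent methylation at each locus: C / (C + T)
percMeth <- 100 * numCs / (numCs + numTs)

# prcomp expects observations in rows, so transpose to put samples in rows
toyPCA <- prcomp(t(percMeth), center = TRUE, scale. = FALSE)

# Proportion of variance explained by the first two PCs
toyImportance <- summary(toyPCA)$importance
toyImportance["Proportion of Variance", 1:2]
```

The `importance` matrix from `summary()` is where numbers like "PC 1 (18.1%)" come from when labelling axes.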

According to Alan, my PCA explains more variance than his does. That’s interesting, but I don’t know what it means. I then used ordiplot to create a better looking PCA. I modeled my code after code I used for the DNR paper.

fig.allDataPCA <- ordiplot(allDataPCA, choices = c(1, 2), type = "none", display = "sites", cex = 0.5, xlim = c(-400, 200), xlab = "", ylab = "", xaxt = "n", yaxt = "n") #Use ordiplot to create base biplot. Do not add any points
points(fig.allDataPCA, "sites", col = c(rep(plotColors[2], times = 5), rep(plotColors[4], times = 5)), pch = c(rep(16, times = 5), rep(17, times = 5)), cex = 3) #Add each sample. Darker samples are ambient, lighter samples are elevated pCO2
#Add multiple white boxes on top of the default black box to manually change the color
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
box(col = "white")
ordiellipse(allDataPCA, plotCustomization$treatment, show.groups = "1", col = plotColors[4]) #Add confidence ellipse around the samples in elevated pCO2
ordiellipse(allDataPCA, plotCustomization$treatment, show.groups = "0", col = plotColors[2]) #Add confidence ellipse around the samples in ambient pCO2
axis(side = 1, at = seq(-400, 200, 200), col = "grey80", cex.axis = 1.7) #Add x-axis
mtext(side = 1, text = "PC 1 (18.1%)", line = 3, cex = 1.5) #Add x-axis label
axis(side = 2, labels = TRUE, col = "grey80", cex.axis = 1.7) #Add y-axis
mtext(side = 2, text = "PC 2 (11.7%)", line = 3, cex = 1.5) #Add y-axis label
mtext(side = 3, line = -5, adj = c(-100,0), text = " a. All CpG Loci", cex = 1.5)

Figure 10. PCA with percent methylation data for all CpG loci with 5x data across all samples.

I then decided to do the same thing with just DML information. This is when it got more complicated than it needed to be. The original methylation data is in a methylBase object, and PCASamples only accepts input in that format. I needed a way to subset DML without compromising the format. According to the methylKit manual, you can only subset rows using either select or []. After two hours of banging my head against a wall trying to join or merge my list of DML with all methylation data, I left the office and immediately had an epiphany. I realized I could get the row numbers of DML in the original methylation data, save the row numbers as a vector, then use that vector to select the rows I want, and save it as a methylBase object.

DMLPositions <- rep(0, times = length(diffMethStats50FilteredCov5Destrand$chr)) #Create an empty vector with 598 places to store row numbers
for (i in 1:length(DMLPositions)) {
  DMLPositions[i] <- which(getData(diffMethStats50FilteredCov5Destrand)$start[i] == getData(methylationInformationFilteredCov5Destrand)$start)
} #For each DML, save the row number where that DML is found in methylationInformationFilteredCov5Destrand
tail(DMLPositions) #Confirm vector was created
DMLMatrix <- methylationInformationFilteredCov5Destrand[DMLPositions,] #Subset methylationInformationFilteredCov5Destrand to only include DML and save as a new methylBase object
sum((DMLMatrix$start) == (diffMethStats50FilteredCov5Destrand$start)) == length(diffMethStats50FilteredCov5Destrand$start) #Confirm that start columns are identical. If they are identical, the sum of all TRUE statements should equal the length of the original methylBase object
tail(DMLMatrix) #Confirm methylBase object creation
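
One caveat with matching on start alone: if the same start position occurs on more than one chromosome, `which()` returns several rows for a single DML. A possible alternative, sketched here on toy data frames standing in for the `getData()` output (all coordinates invented), is to match on a combined chr:start key with `match()`.

```r
# Toy stand-ins for the getData() output of the two objects (invented coordinates)
allLoci <- data.frame(chr = c("chr1", "chr1", "chr2", "chr2", "chr3"),
                      start = c(100, 250, 100, 400, 250),
                      stringsAsFactors = FALSE)
dmlLoci <- data.frame(chr = c("chr2", "chr3"),
                      start = c(100, 250),
                      stringsAsFactors = FALSE)

# Build a chr:start key so positions shared by several chromosomes stay distinct
allKeys <- paste(allLoci$chr, allLoci$start, sep = ":")
dmlKeys <- paste(dmlLoci$chr, dmlLoci$start, sep = ":")

# Row of each DML in the full table (NA would mean the DML was not found)
dmlRows <- match(dmlKeys, allKeys)
stopifnot(!anyNA(dmlRows))
dmlRows  # 3 5
```

The resulting vector could be used for row subsetting in the same way as DMLPositions, and it avoids the explicit loop.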

I then used similar code with PCASamples and ordiplot to create a refined figure.

It’s interesting that the first PC explains almost half of the variance in the data! PC 1 is definitely related to experimental conditions.

Figure 11. PCA with percent methylation data for DML.

And you betcha I made a multipanel plot.

Figure 12. Multipanel plot with PCAs for all CpG loci with 5x data across all samples and DML only.

Heatmaps for DML

The last thing I wanted to do was make a heatmap of DML! I used percMethylation within methylKit to obtain the percent methylation matrix PCASamples uses.

percMethDML <- percMethylation(DMLMatrix, rowids = TRUE) #Get percent methylation for all samples at DML. Include row IDs (chr, start, end) information 

I used pheatmap and heatmap.2 (from gplots) to create heatmaps:

pheatmap(percMethDML, color = rev(plotColors), cluster_rows = TRUE, clustering_distance_rows = "euclidean", treeheight_row = 70, show_rownames = FALSE, cluster_cols = TRUE, clustering_distance_cols = "euclidean", treeheight_col = 40, show_colnames = FALSE, annotation_col = data.frame(pCO2 = factor(rep(c("Ambient","Treatment"), each = 5))), annotation_colors = list(pCO2 = c(Ambient = "grey90", Treatment = "grey10")), annotation_legend = FALSE, annotation_names_col = FALSE, legend = TRUE) #Create heatmap using pheatmap using percMethDML and plotColors color scheme. Cluster rows and columns using euclidean distances. Adjust the dendrogram tree heights and do not show any row or column names. Create a dataframe with treatment information using annotation_col. Use annotation_colors to indicate colors for ambient ("grey90") and treatment ("grey10") samples. Do not include an annotation_legend or name for annotations (annotation_names_col). Include a legend.
par(oma = c(0, 1, 0, 0)) #Adjust outer margins
heatmap.2(percMethDML, col = plotColors, scale = "none", margins = c(1,1), trace = "none", tracecol = "black", labRow = FALSE, labCol = FALSE, ColSideColors = c(rep("grey90", times = 5), rep("grey10", times = 5)), key = TRUE, keysize = 1.8, density.info = "density", key.title = "", key.xlab = "% Methylation", key.ylab = "", key.par = list(cex.lab = 2.0, cex.axis = 1.5)) #Create heatmap using heatmap.2 from gplots package using percMethDML data. Use plotColors but do not scale data, label rows, or label columns. Use ColSideColors to indicate colors for treatment and ambient samples. Add a legend using key, and adjust keysize. Have key display density data with density.info. Do not add a key title or y-axis label, and label x axis with key.xlab.
mtext("Density", cex = 1.6, las = 3, adj = 0.8, padj = -29) #Manually add y-axis label for key since heatmap.2 doesn't let you change font size

I’ve used pheatmap before, but I like how heatmap.2 also generates a density plot as part of the legend. I decided to save the heatmap.2 version for the paper.

Figure 13. Heatmap of DML with percent methylation density plot in legend.

I tried making a multipanel plot with the heatmap and PCAs, but heatmap.2 calls plot.new within its code so I couldn’t create a multipanel plot! I could use gridGraphics, but I didn’t want to figure out an entirely new package. If I wanted to create a multipanel in the future, I might just use InDesign or PowerPoint.

Going forward

  1. Update methods and results
  2. Revise the discussion
  3. Revise the introduction
  4. Revise my abstract
  5. Send to collaborators and committee members
  6. Address any new edits and clean up the text
  7. Update paper repository
  8. Submit to the Special Issue
  9. Post the paper on bioRxiv

from the responsible grad student https://ift.tt/2OfQLfF

Grace’s Notebook: Pooled 6 new samples for sequencing

Today I pooled samples for the 6 new libraries for sequencing. There is over 1000ng in each pooled sample, so everything is ready to go! Details in post.

Pooling

I pooled 110ng of RNA from 10 samples within each of the 6 libraries. Math below:

FRP pool_number trtmnt_tank day infection_status maturity tube_number total-yield_ng elution_vol vol_for_pool
6112 5 NA 9 0 I 28 287.3 13 4.977
6136 5 NA 9 0 M 91 204.1 13 7.006
6143 5 NA 9 0 M 10 297.7 13 4.803
6150 5 NA 9 0 M 119 419.9 13 3.406
6161 5 NA 9 0 M 53 405.6 13 3.526
6213 5 NA 9 0 I 108 176.8 13 8.088
6106 5 NA 9 0 M 5 174.2 13 8.209
6156 5 NA 9 0 I 24 366.6 13 3.901
6157 5 NA 9 0 M 169 365.3 13 3.915
6178 5 NA 9 0 M 124 296.4 13 4.825
6122 6 NA 9 1 I 133 305.5 13 4.681
6201 6 NA 9 1 I 163 211.9 13 6.748
6206 6 NA 9 1 I 103 208 13 6.875
6174 6 NA 9 1 I 20 360.1 13 3.971
6177 6 NA 9 1 I 54 327.6 13 4.365
6187 6 NA 9 1 I 158 241.8 13 5.914
6199 6 NA 9 1 I 151 241.8 13 5.914
6231 6 NA 9 1 I 102 188.5 13 7.586
6233 6 NA 9 1 I 117 300.3 13 4.762
6237 6 NA 9 1 I 17 226.2 13 6.322
6104 7 cold 12 0 I 259 412.1 13 3.470
6106 7 cold 12 0 M 241 169 13 8.462
6153 7 cold 12 0 M 209 315.9 13 4.527
6157 7 cold 12 0 M 224 249.6 13 5.729
6160 7 cold 12 0 M 238 494 13 2.895
6172 7 cold 12 0 M 316 373.1 13 3.833
6175 7 cold 12 0 I 246 269.1 13 5.314
6178 7 cold 12 0 M 218 162.5 13 8.800
6189 7 cold 12 0 I 216 577.2 13 2.477
6191 7 cold 12 0 I 227 234 13 6.111
6118 8 cold 12 1 I 240 280.8 13 5.093
6128 8 cold 12 1 I 231 492.7 13 2.902
6141 8 cold 12 1 I 249 241.8 13 5.914
6148 8 cold 12 1 I 213 442 13 3.235
6149 8 cold 12 1 I 226 340.6 13 4.198
6163 8 cold 12 1 I 228 364 13 3.929
6164 8 cold 12 1 I 257 353.6 13 4.044
6173 8 cold 12 1 I 210 432.9 13 3.303
6174 8 cold 12 1 I 233 457.6 13 3.125
6177 8 cold 12 1 I 250 410.8 13 3.481
6223 9 warm 12 0 M 270 380.9 13 3.754
6228 9 warm 12 0 I 290 370.5 13 3.860
6230 9 warm 12 0 M 292 223.6 13 6.395
6232 9 warm 12 0 M 286 663 13 2.157
6234 9 warm 12 0 I 263 345.8 13 4.135
6235 9 warm 12 0 I 297 531.7 13 2.689
6238 9 warm 12 0 M 375 301.6 13 4.741
6242 9 warm 12 0 M 377 289.9 13 4.933
6253 9 warm 12 0 I 275 387.4 13 3.691
6265 9 warm 12 0 I 282 317.2 13 4.508
6231 10 warm 12 1 I 278 230.1 13 6.215
6233 10 warm 12 1 I 371 370.5 13 3.860
6244 10 warm 12 1 I 284 358.8 13 3.986
6245 10 warm 12 1 I 365 481 13 2.973
6249 10 warm 12 1 I 279 461.5 13 3.099
6255 10 warm 12 1 I 262 536.9 13 2.663
6256 10 warm 12 1 I 283 247 13 5.789
6257 10 warm 12 1 I 363 269.1 13 5.314
6261 10 warm 12 1 I 273 469.3 13 3.047
6262 10 warm 12 1 I 289 352.3 13 4.059
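
The vol_for_pool column follows from the other columns: each sample's concentration is its total yield divided by the elution volume, and the volume to pool is the 110ng target divided by that concentration. A short R check using the first three rows of pool 5 from the table above (the data frame and column names are just for this sketch):

```r
# First three samples of pool 5, copied from the table above
pool5 <- data.frame(
  FRP = c(6112, 6136, 6143),
  total_yield_ng = c(287.3, 204.1, 297.7),
  elution_vol_ul = 13
)

target_ng <- 110  # RNA contributed by each sample

# Concentration (ng/uL), then the volume that contains target_ng
pool5$conc_ng_per_ul <- pool5$total_yield_ng / pool5$elution_vol_ul
pool5$vol_for_pool_ul <- target_ng / pool5$conc_ng_per_ul

round(pool5$vol_for_pool_ul, 3)  # 4.977 7.006 4.803
```

Ten samples at 110ng each gives a nominal 1100ng per pool before any is used for quantification.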

Results:

Ran 2ul of each pooled sample on the Qubit RNA HS assay.

trtmnt_tank day infection_status total-yield_ng elution_vol vol_for_pool total_vol_ul qubit_vol_ul qubit_sample_ng total_ng_RNA_pool final_sample_vol
NA 9 0 287.3 13 4.977375566 52.65550411 2 21 1063.765586 50.65550411
NA 9 1 305.5 13 4.680851064 57.13842355 2 21.5 1185.476106 55.13842355
cold 12 0 412.1 13 3.470031546 51.61757335 2 23.6 1170.974731 49.61757335
cold 12 1 280.8 13 5.092592593 39.22471819 2 27.2 1012.512335 37.22471819
warm 12 0 380.9 13 3.754266212 40.86453844 2 26.9 1045.456084 38.86453844
warm 12 1 230.1 13 6.214689266 41.00446376 2 26.3 1025.817397 39.00446376

There’s at least 1000ng of RNA in all pooled samples!! :fireworks: :tada:
Things are ready to go to a sequencing facility.
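
The totals in the results table can be reproduced in R. This sketch assumes the qubit_sample_ng column is a concentration in ng/uL, which is an interpretation rather than something stated in the post, although the table's numbers are consistent with it: the reading multiplied by the volume left after removing 2ul for the Qubit equals total_ng_RNA_pool.

```r
# Values copied from the results table; qubit_reading is assumed to be ng/uL
pools <- data.frame(
  trtmnt_tank = c(NA, NA, "cold", "cold", "warm", "warm"),
  infection_status = c(0, 1, 0, 1, 0, 1),
  total_vol_ul = c(52.65550411, 57.13842355, 51.61757335,
                   39.22471819, 40.86453844, 41.00446376),
  qubit_reading = c(21, 21.5, 23.6, 27.2, 26.9, 26.3)
)

qubit_vol_ul <- 2  # volume used up by the Qubit measurement

pools$final_vol_ul <- pools$total_vol_ul - qubit_vol_ul
pools$total_ng <- pools$qubit_reading * pools$final_vol_ul

round(pools$total_ng, 1)
all(pools$total_ng >= 1000)  # TRUE
```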

from Grace’s Lab Notebook https://ift.tt/2rfM2RY

Sam’s Notebook: PCR – Crassostrea gigas and sikamea Mantle gDNA from Marinellie Shellfish Company – No Multiplex

Primers and cycling parameters were taken from this publication:

SR ID Primer Name Sequence
1727 COreverse CAGGGGGCCGTTCGCGGTCAACGCT
1726 COCsi546r AAGTAACCTTAATAGATCAGGGAACC
1725 COCgi269r TCGAGGAAATTGCATGTCTGCTACAA
1724 COforward GGGACTACCCCCTGAATTTAAGCAT

Instead of running a multiplex PCR as before, I ran each set of species-specific primer pairs independently.

The COforward/COreverse primers should amplify any Crassostrea spp. DNA (i.e. a positive control – 697bp) and the other two reverse primers will amplify either C.gigas (COCgi269r – 269bp) or C.sikamea (COCsi546r – 546bp).

Master mix calcs:

Component                 Single Rxn Vol. (uL)   Num. Rxns   Total Volume (uL)
DNA                       4                      NA          NA
2x Apex Master Mix        12.5                   18          225
COforward (100uM)         0.15                   18          2.7
reverse primer (100uM)    0.10                   18          1.8
H2O                       8.25                   18          148.5
Total                     25                                 Add 21uL to each PCR tube
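
The master mix arithmetic is easy to double check in R. The per-reaction volumes and reaction count come straight from the table; the data frame itself is just scaffolding for this sketch.

```r
# Per-reaction volumes (uL) for everything except the DNA template
master_mix <- data.frame(
  component = c("2x Apex Master Mix", "COforward (100uM)",
                "reverse primer (100uM)", "H2O"),
  single_rxn_ul = c(12.5, 0.15, 0.10, 8.25)
)

num_rxns <- 18
dna_ul <- 4

# Scale each component up to the full batch
master_mix$total_ul <- master_mix$single_rxn_ul * num_rxns
master_mix

mix_per_tube_ul <- sum(master_mix$single_rxn_ul)  # master mix added to each tube
final_rxn_ul <- mix_per_tube_ul + dna_ul          # final reaction volume with DNA
c(mix_per_tube_ul, final_rxn_ul)                  # 21 25
```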

Cycling params:

95°C for 10mins

30 cycles of:

  • 95°C 1min
  • 51°C 1min
  • 72°C 1min

72°C 10mins

Used the GeneRuler DNA Ladder Mix (ThermoFisher) for all gels:

GeneRuler DNA Ladder Mix