Add legend to outside of multipanel plot

Google led me to a solution from Katie Lotterhos! I found it extremely useful.

http://dr-k-lo.blogspot.com/2014/03/the-simplest-way-to-plot-legend-outside.html
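The linked post works in R. This is a rough Python/matplotlib analogue for when I need the same thing in Python, not the method from the post. It draws the panels, then attaches one figure-level legend anchored just outside the right edge. The 2x2 layout and the "series A"/"series B" names are placeholders.

```python
import io
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt


def multipanel_with_outside_legend(n_rows=2, n_cols=2):
    """Draw a grid of panels that share one legend placed outside all of them."""
    x = [i / 10 for i in range(101)]
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(8, 6), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.plot(x, [math.sin(v + i) for v in x], label="series A")
        ax.plot(x, [math.cos(v + i) for v in x], label="series B")
        ax.set_title(f"Panel {i + 1}")
    # Every panel draws the same series, so one panel's handles/labels suffice.
    handles, labels = axes[0, 0].get_legend_handles_labels()
    fig.tight_layout()
    # Figure-level legend whose left edge is anchored at the figure's right edge,
    # so it sits outside every panel instead of covering data.
    legend = fig.legend(handles, labels, loc="center left", bbox_to_anchor=(1.0, 0.5))
    return fig, legend


fig, legend = multipanel_with_outside_legend()
buffer = io.BytesIO()
# bbox_inches="tight" enlarges the saved area so the outside legend is kept.
fig.savefig(buffer, format="png", bbox_inches="tight")
```

Using a figure-level legend (fig.legend) instead of a legend on one panel keeps the panels the same size as each other.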

Information for DNR Revision

From email communication with Alex and Micah.

Alex:

1. Time of day and height 2016 (from calendar and field notes):
Case inlet: T July 19 12:15 -1.8
Skok: Wed July 20 11:51 -1.8
Willapa: Th July 21 9:28 -1.7

I believe Port Gamble was T July 19 11:11 -1.6 and Fidalgo Bay was W July 20 12:12 -1.7′

2. The animals were pretty uniformly placed ~0.5m from the sensors at the same tide height. We had instruments in all habitat patches but did not get data from all of them.

3. Oysters were young of the year – probably ~2 months old.

4. I hate questions like that. I would answer along the lines of "it was beyond the scope of this paper" which is true since this paper is a lot about the development of the method and exploring protein results, ya? I would love to see an analysis combining the different things that we measured but that is complicated. For one, I don’t think we did protein and fatty acid on all of the same individuals. I remember adjusting the samples to try to do the same oysters but can’t remember how much overlap there was. If you would like, we could see if looking at growth or FA helps interpret some of the patterns you talk about in the current paper.

Micah:

Collection dates for the first outplant (June to July 2016) and the second outplant (July to August 2016):

Round 1:
07.19.16: Oysters collected from Case Inlet at ~12:15pm
07.20.16: Oysters collected from Fidalgo Bay at ~12:15pm, and from Skokomish at ~12:00pm
07.21.16: Oysters collected from Port Gamble Bay at ~12:30pm, and from Willapa at ~9:15am

Round 2:
08.17.16: Oysters collected from Case Inlet at ~11:45am
08.18.16: Oysters collected from Fidalgo Bay at ~11:45am, and from Skokomish at ~11:30am
08.19.16: Oysters collected from Port Gamble Bay at ~12:00pm, and from Willapa Bay at ~8:45am

The timing of dissections varied a little, depending on how many people were there to help, but I believe that they were always completed within 3hrs of collection. Alex – I know you did some solo dissections at SK/WB. Do your field notes/memories reflect a 3hr window?

The tidal height at PG, SK, and WB was roughly -1ft MLLW; at CI & FB it was roughly -2.5ft MLLW.

The instruments and oysters were very close to each other (photo from PGE attached). The lateral distance was less than 1m, and the tidal height was the same for the animals and the instruments.

The oysters were 2-3 months old at the time of outplant.

Regarding the reviewer comments, you could respond by circling back to the central goals and questions of the proteomics work. From the perspective of WDNR, we wanted to explore whether proteomics could provide a tool to diagnose stress in a species of high ecological and economic importance. If the techniques developed through this project were used as a diagnostic tool on animals collected from the field, we wouldn’t know the growth rate of that animal – we’d have expression patterns only. So to me, it actually makes sense to confine the manuscript to proteomics.

Update to project-virginica-oa Repository Contents

Found out that there were several large files in the project-virginica-oa repository (e.g., BAM files from bismark runs) that maxed out hummingbird's hard drive. The solution was to stop running bismark on hummingbird and use Mox instead. To keep using hummingbird for other analyses, I needed to move the large files from hummingbird to gannet. I created checksums for all of my large-file folders, then moved them to this directory on gannet. Once the files were on gannet, I deleted them from hummingbird and synced my GitHub repository. Lab notebook links may be broken. In terms of workflow: any time I generate a large file in a repository that cannot be synced to GitHub, I should move it to gannet.
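The checksum step is what makes it safe to delete files from hummingbird. This is a minimal Python sketch of that check. It assumes MD5 through hashlib, which is my choice here and not necessarily the tool I actually ran. It records a checksum for every file in a folder before the move, then lists any file whose checksum is missing or different at the destination.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MB chunks so large BAM files never load fully."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_folder(folder):
    """Map each file's path (relative to folder) to its MD5 digest."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): md5sum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(src_sums, dest_folder):
    """Relative paths whose checksum is missing or different in dest_folder."""
    dest_sums = checksum_folder(dest_folder)
    return [name for name, digest in src_sums.items() if dest_sums.get(name) != digest]
```

Run checksum_folder on the source before moving, and delete from hummingbird only when verify_copy on the gannet copy returns an empty list.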

DML Analysis: Analysis Methods

Read "The epigenetic landscape of transgenerational acclimation to ocean warming" to get ideas for potential analysis methods. The authors used several methylKit functions to obtain DMRs:

"Briefly, the ‘methRead’ function of methylKit reads the mapping results with 10 reads per cytosine as a minimum coverage threshold. High coverage bases (99.9%) were filtered to exclude potential PCR bias and then normalized using ‘filterByCoverage’ and ‘normalizeCoverage’ functions, respectively. Genomic regions were categorized as CpG island, CpG shore, promoter, 5′ untranslated region (UTR), exon, introns, 3′ UTR and repeats. Methylated or unmethylated cytosines in each genomic region were summed for each sample by the ‘regionCounts’ function of methylKit. The P values of methylation differences for each region between two samples were calculated using a chi-squared test in the ‘calculateDiffMeth’ function."

They also generated heatmaps with DMR data, which would be useful in my case as well.
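This is a stripped-down Python sketch to keep the quoted methylKit steps straight. It covers three steps:

- a minimum-coverage filter plus a high-percentile coverage filter;
- per-region summing of methylated and unmethylated reads;
- a chi-squared test between two samples for one region.

It is not methylKit's implementation. Several choices here are my own assumptions:

- the site dictionaries, with keys loosely named after methylKit's coverage/numCs/numTs columns;
- the nearest-rank percentile;
- the half-open region intervals.

The normalizeCoverage step is left out.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def filter_by_coverage(sites, lo_count=10, hi_perc=99.9):
    """Drop sites below lo_count reads or above the hi_perc percentile of coverage."""
    cutoff = percentile([s["coverage"] for s in sites], hi_perc)
    return [s for s in sites if lo_count <= s["coverage"] <= cutoff]


def region_counts(sites, regions):
    """Sum methylated (numCs) and unmethylated (numTs) reads per region.

    regions maps a region name to (chrom, start, end), a half-open interval.
    """
    totals = {name: [0, 0] for name in regions}
    for s in sites:
        for name, (chrom, start, end) in regions.items():
            if s["chrom"] == chrom and start <= s["pos"] < end:
                totals[name][0] += s["numCs"]
                totals[name][1] += s["numTs"]
    return totals


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Rows are samples; columns are methylated / unmethylated read counts.
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a zero margin leaves the test undefined; report no difference
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For one region, the comparison would be chi2_2x2(*sample1_counts["r1"], *sample2_counts["r1"]). Here each (hypothetical) counts dict comes from region_counts on one sample's filtered sites.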

They used the closest function in BEDTools to pair DMRs with genes:

"The closest gene to a DMR on the same scaffold was identified using ‘closest’ from BEDTools v. 2.23 [39]. This resulted in 1,563 genes from 2,078 CpG DMRs, while 115 DMRs were on scaffolds without annotated genes (Supplementary Table 4)."

Because they paired gene expression data with epigenetic data, they did not do any gene enrichment. I’ll need to refer to Emma’s geoduck paper for those methods.
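For my own DMLs, the BEDTools closest pairing quoted above boils down to this logic. For each DMR, find the nearest gene on the same scaffold, or report nothing when the scaffold has no genes (like the 115 DMRs in the paper). This is a plain Python sketch, not BEDTools itself. These conventions are my assumptions and may differ from what closest reports:

- the tuple layout and half-open coordinates;
- the gap convention, which gives 0 for overlapping or book-ended features;
- first-listed-wins tie handling.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def closest_genes(dmrs, genes):
    """Nearest gene on the same scaffold for each DMR.

    dmrs and genes are lists of (name, scaffold, start, end) tuples.
    Returns {dmr_name: (gene_name, gap)} or {dmr_name: None} when the
    DMR's scaffold has no genes. Ties go to the gene listed first.
    """
    by_scaffold = {}
    for name, scaffold, start, end in genes:
        by_scaffold.setdefault(scaffold, []).append((name, start, end))
    result = {}
    for dmr_name, scaffold, start, end in dmrs:
        candidates = by_scaffold.get(scaffold)
        if not candidates:
            result[dmr_name] = None
            continue
        gene, g_start, g_end = min(
            candidates, key=lambda g: interval_gap(start, end, g[1], g[2])
        )
        result[dmr_name] = (gene, interval_gap(start, end, g_start, g_end))
    return result
```

This checks every gene on the scaffold for every DMR, which is fine for a few thousand DMRs. BEDTools relies on sorted input to avoid that cost on whole genomes.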

DML Analysis: GOterm Update

Sent blastx results and C. virginica genome to Mike Riffle in Genome Sciences. He is going to build me a portal that will do a GOterm enrichment on any set of genes I provide. This is similar to the portal he built for Emma’s geoduck paper.

Linear Mixed Effect Models in R

A good tutorial from Bodo Winter at UC Merced can be found here. He recommends using likelihood ratio tests to obtain p-values.
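Winter's tutorial does this in R by fitting a full and a reduced model with maximum likelihood (REML = FALSE) and comparing them with anova(). The Python sketch below only shows the arithmetic behind the resulting p-value, assuming the two log-likelihoods are already in hand. The chi-squared tail uses the closed forms for integer degrees of freedom, and the log-likelihood values in the usage line are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{k=1}^{(df-1)/2} x^(k - 1/2) / (1 * 3 * ... * (2k - 1))
    term, acc = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        acc += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * acc


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff=1):
    """Compare nested models fit by maximum likelihood (not REML).

    The statistic 2 * (logL_full - logL_reduced) is referred to a chi-squared
    distribution whose df equals the number of parameters dropped.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


stat, p = likelihood_ratio_test(-105.0, -100.0)  # made-up log-likelihoods
```

Here stat is 10.0 and p is about 0.0016. Dropping one fixed effect is the df_diff = 1 case.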

DML Analysis: How to get GOterms

Gene Set Enrichment Analysis Workflow:

  • Get Entrez Gene IDs
  • Match IDs with GOterms
  • Use both topGO and DAVID for enrichment
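The core of the enrichment step in this workflow is an over-representation test. This Python sketch uses the plain one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. It is not topGO's or DAVID's exact algorithm, since both add their own refinements. The gene-to-GO dictionary stands in for whatever table comes out of the ID-matching step.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_term_background, n_list, n_term_list):
    """P(X >= n_term_list) for X ~ Hypergeometric(population n_background,
    successes n_term_background, draws n_list): the one-sided over-representation p-value."""
    upper = min(n_list, n_term_background)
    hits = sum(
        comb(n_term_background, k) * comb(n_background - n_term_background, n_list - k)
        for k in range(n_term_list, upper + 1)
    )
    return hits / comb(n_background, n_list)


def count_terms(gene_to_go, genes):
    """Count how many of the given genes are annotated with each GO term."""
    counts = {}
    for gene in genes:
        for term in gene_to_go.get(gene, ()):
            counts[term] = counts.get(term, 0) + 1
    return counts


def go_enrichment(gene_to_go, background, gene_list):
    """Over-representation p-value for every GO term seen in gene_list.

    gene_list is intersected with background so both counts share one universe.
    """
    universe = set(background)
    selected = set(gene_list) & universe
    bg_counts = count_terms(gene_to_go, universe)
    list_counts = count_terms(gene_to_go, selected)
    return {
        term: hypergeom_upper_tail(len(universe), bg_counts[term], len(selected), k)
        for term, k in list_counts.items()
    }
```

With thousands of terms these p-values still need a multiple-testing correction before anything is called enriched.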

Problem:

  • The gene IDs found in the C. virginica GFF files are not official NCBI Entrez Gene IDs. Not sure what LOC{} is, but XM_{} are NCBI RefSeq (predicted mRNA) accessions. The XM_{} IDs from the GFF were not recognized by DAVID.
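To see which LOC{} and XM_{} identifiers the GFF actually contains before trying DAVID, a small Python sketch can help. It parses the GFF3 attribute column (key=value pairs separated by semicolons, URL-encoded) and collects both ID patterns for one feature type. The regular expressions and the default mRNA feature type are my assumptions. The GFF lines in the test are made up, not taken from the C. virginica files.

```python
import re
from urllib.parse import unquote

LOC_RE = re.compile(r"LOC\d+")
XM_RE = re.compile(r"XM_\d+(?:\.\d+)?")


def parse_gff_attributes(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a URL-decoded dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def collect_ids(gff_lines, feature_type="mRNA"):
    """Sorted LOC{} and XM_{} identifiers found in the attributes of one feature type."""
    locs, xms = set(), set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        text = " ".join(parse_gff_attributes(cols[8]).values())
        locs.update(LOC_RE.findall(text))
        xms.update(XM_RE.findall(text))
    return sorted(locs), sorted(xms)
```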

Solution:

  • blastx to get UniProt accession codes and GOterms
  • Use UniProt accession codes and GOterms in DAVID
  • Convert UniProt accession codes to Entrez IDs
  • Use Entrez IDs and GOterms in DAVID
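The first and third steps of this solution can be sketched in Python under a few assumptions:

- the blastx run wrote tabular output (-outfmt 6, default 12 columns);
- it ran against a Swiss-Prot-style database whose subject IDs look like sp|ACCESSION|ENTRY_NAME;
- the 1e-10 e-value cutoff is only a placeholder;
- uniprot_to_entrez is a hypothetical mapping table, for example one exported from UniProt's ID mapping service.

The sketch keeps the best hit per query, pulls out the UniProt accession, and then translates accessions through that table.

```python
def uniprot_accession(sseqid):
    """'sp|P12345|NAME_HUMAN' -> 'P12345'; subject IDs without pipes are returned unchanged."""
    parts = sseqid.split("|")
    return parts[1] if len(parts) >= 3 else sseqid


def best_hits(blast_lines, max_evalue=1e-10):
    """Best hit per query from BLAST tabular output (-outfmt 6, default 12 columns).

    The best hit has the lowest evalue (column 11), with ties broken by the
    highest bitscore (column 12). Hits above max_evalue are ignored.
    Returns {query_id: UniProt accession}.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return {query: uniprot_accession(hit[0]) for query, hit in best.items()}


def to_entrez(query_to_uniprot, uniprot_to_entrez):
    """Translate accessions through a mapping table; queries with no mapping are dropped."""
    return {
        query: uniprot_to_entrez[acc]
        for query, acc in query_to_uniprot.items()
        if acc in uniprot_to_entrez
    }
```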