Laura’s Notebook: Running list of R tips from super-advanced class

Cheat Seets:


here() :
read_csv(): the way to read in a csv file tidily (as opposed to ‘read.csv’). Read in all data using “read_THING()”
unite(): reshape a dataframe
filter(): subset a dataframe
select(): select certain columns in a tibble
mutate(): create a new column as a function of another function
pivot_longer(): convert a tibble from wide to long format
pivot_wider(): covert a tibble from long to wide
left_join(): take data set 1, and join matching values from data set 2 <– like bind/merge functions (but smarter)
right_join(): take data set 2, and join matching values from data set 1
inner_join(): join rows with matching values
full_join(): join all columns and rows

stringr package

lubridate package

years() year()

Shelly’s Notebook: Fri. Sept. 27, Pt. Whitney Juvenile Geoduck var.low pH experiment

Water chem

  • Water level was low in the low pH header tank uc?export=view&id=1Xd1XBBGEwIlfxi-sW1oZ9WzkTtE-ucOU
    • Appears to be the same thing that happened to the ambient tank last week. I turned flow up so it can maintain the water level
  • Took discrete measurements and TA between 10:45-11.
  • for TA, I took directly from the discrete measurment water after filtering it through 20uM. The discrete measurement water samples were directly from the headers and inside the silos (using clean TA cups)
    • everything looked as it should
  • 12-12:30pm calibrated all Apex probes

algae check

Count for both B2 and B3 were ~ 6 x 10^5 cells/mL with a 7.25mL/sec. flow rate. Each second, animals are seeing ~ 4.3 x10^6 cells.

animal check

Var.low pH treatment

ambient offspring:

  • 2.1: uc?export=view&id=1sx8pmxnmp-1jFnhUJTiBtbP-n1PlEOgI
  • 2.2: uc?export=view&id=1dm1_0RD6kBU3VFWu7KSF9-2QQdGg77lL
  • 2.3: uc?export=view&id=1s6Z6yAzp7lCf40YBvLcnbToaVCa7eIT0
  • 2.4: uc?export=view&id=1hBoGdOtx6fO18fibnYBfNQc9CBNbuBvP
  • 2.5: uc?export=view&id=15f9NxAjAZaFqteIyFK-XhfGSXhhXpzI3

var.low pH offspring:

  • 2.8: uc?export=view&id=1vPxaFJZZJa6sP3ZhRI-j9CH0PDN2tmSs
  • 2.9: uc?export=view&id=1gK-QfzzwjOIDE8gVcGETlD79Ik9-pJ00
  • 2.10: uc?export=view&id=1l1G4TFCnYrKk_TfJeQRUsOz6IK5C5rNC
  • 2.11: uc?export=view&id=1YFM01VQYDNv1vXbtOo231HHgn77TfI0E
  • 2.12: uc?export=view&id=1DtRxoycd8hGn_MERXMP79pbUE-lQmvEv

Ambient pH treatment

ambient offspring:

  • 3.1: uc?export=view&id=1l1hLDnhetVK69jQCaW0tfAW5OJSKvLnJ
  • 3.2: uc?export=view&id=18l6p4nUnFxUVAp1pPfqhGvpNAx7XjhiC
  • 3.3: uc?export=view&id=1ENKVSbDTGsz6pxUOTYKB0clE16Cg8MyQ
  • 3.4: uc?export=view&id=18nf55Hlgu9AWYRolw4vybzxdmARDOqVb
  • 3.5: uc?export=view&id=17AWzOXihJCnFJSEIFN3n6_OBbjV9P8vp

var.low pH offspring:

  • 3.8: uc?export=view&id=1Jsi7rlUeS9HIv6TEQJf30WrNaDAbR60J
  • 3.9: uc?export=view&id=1GkPUf53tRsBpp9JGE6fvYCnn1CddcU0Q
  • 3.10: uc?export=view&id=1W7NqZud86OM1DWvdK0bGwl7Wz3b-_QNM
  • 3.11: uc?export=view&id=1z7sCu8CNm-z9RNELejGcxyFhVnKHNY7Z
  • 3.12: uc?export=view&id=1yu05IV8d8sVuOymezh-ByE9aXQHdSulE
  • Shells of animals in the var.low pH treatment are very fragile. Many are broken.
  • I tried to clean out the dead animals from all silos, but couldn’t get everything.
  • The animals from the var.low pH parents seem to be doing better (survivorship and size judged by eye) than the animals from ambient parents when both are exposed to the var.low pH treatment.

from shellytrigg

Grace’s Notebook: Oysterseed DIA – New Analyses and wrapping up paper

I’ve been working on finishing up the Oysterseed DIA paper, and met with Steven today to discuss it. Details in post below:

Meeting with Steven notes:

He added a lot of helpful comments and suggestions to my paper: Oyster-Larval-Proteomics-2015

I tried addressing some of the comments this morning and up until we met at 2pm.

I added 2 tables:

  1. Enriched GO terms (determined by DAVID) from the 33 significantly differentially abundant 23C proteins
  2. Enriched GO terms (determined by DAVID) from the 36 significantly differentially abundant 29C proteins

I also added an experimental timeline that I will edit this weekend after getting feedback from Steven today.

Other things to do:

  • Methods: remove MS Stats figure part – just have MS Stats in methods as a way to get list of 2808 proteins that were sig. diff. abundant using our set threshold of >2.00 and <-2.00 log-2 fold change
  • Results: add more DAVID and REVIGO results using 2808 proteins annotated with Uniprot (proteins_comp_annot_threshold.csv) as the gene list and the blast results with Uniprot IDs from Steven – blast with C. gigas proteome against uniprot/sprot database ( (ADD THIS TO METHODS, TOO)
  • Discussion: make it better (notes for that and outline in Oyster-Larval-Proteomics-2015)

Things I did today:

Starting to address “Results” for describing sig. diff. abundant proteins relative to proteome

I took the Uniprot IDs (Uniprot ACCESSION) from proteins_comp_annot_threshold.csv (Uniprot IDs for the 2,808 sig. diff. abundant proteins) and put that into DAVID as the gene list.

I took the Uniprot IDs (Uniprot ACCESSION) from (Blast that Steven did with proteome and uniprot database), and put those as the background in DAVID.

The resulting file came from that:

It’s a huge list. I copy and pasted it into excel to separate out the GO:IDs and the fold enrichment values.

I input the list of GO IDs and fold enrichment values into REVIGO, and got some visuals, but also got this warning message:

While parsing your data, warning(s) were encountered: You have provided a huge list of GO terms, containing 583 terms in the BIOLOGICAL_PROCESS namespace, and only approx. 350 of these terms will be shown in the table below, or be available in the exported text table. REVIGO keeps the terms with the strongest p-values (or enrichements, depending what you had specified on the input form). Consider filtering the list by an external criterion (e.g. enrichment) before submitting to REVIGO. 

And I got this tip from REVIGO following the warning:

Tip: your resulting list of GO terms seems to be quite long. If you want to reduce it further, press the Back button in your browser and choose a different setting for the "allowed similarity" parameter. 

I tried using that tip and selected a parameter of 0.5 (the value less than that gave a warning that I could be missing some important GO terms), and still got the same warning and tip messages.

Here are screenshots of the visuals that REVIGO came up with (using an allowed similarity parameter of 0.5):

from Grace’s Lab Notebook

Grace’s Notebook: Crab Project – What I did for PCSGA 2019

In this post I will detail what I did for PCSGA 2019. I did some new analyses and found some new results from our first assembled transcriptome.


Steven made a BLAST database from the Dinophyceae proteins (Dinophyceae is class under which dinoflagellates are), and did a BLASTx against the first assembled transcriptome (Day 26, infected and uninfected, cold and ambient).

I used this script to separate the transcriptome into putative crab and putative Hematodiniium genes: sep_crab-hemat-genes.Rmd

That script resulted in the following files:

Annotating BLAST outputs

In jupyter notebooks, I annotated the two blastx outputs (crab and Hematodinium) with GOslim terms.

I then used the following R script to create files for creating pie charts for biological process GO slim terms:

The script resulted in the following files:

Creating pie charts

I used the count.csv files to create pie charts in google sheets:

For the talk, I went more in-depth into the “stress response” slice in the crab GOslim pie. I made a table of some crab genes with notes on their names and functions, and links to the uniprot database on those genes:

The ones highlighted are the ones I chose to talk about in the talk.

The talk and slides

Link to final google slides: Crandall_Wed_1645

Link to slidedeck on figshare: Effects_of_Bitter_Crab_Disease_on_the_gene_expression_of_Alaskan_Tanner_Crabs

Thoughts on talk

It was supposed to be 12 minutes, with 3 minutes for questions. I have no idea how much time I took, but I did get 2 clarifying questions at the end of the talk.

It was my first time presenting at a conference, and my slot was on day 2 (Wednesday) at 4:45pm. It was a struggle to stay energized and it was also a struggle to stay calm – I was really nervous!

I think I had good background information and that I explained things fairly well, but I wish I had gone a little more in depth into the genes that we found. I also think that for next time, I need to find ways to combat the nerves because they got a bit in the way of my ability to speak at the beginning – I collected myself well-enough after the third slide or so… but I want to improve!

from Grace’s Lab Notebook

Shelly’s Notebook: Tues. Sept. 24, Geoduck Broodstock Histology

This post is in reference to the histology done on the Fall-Winter 2018-2019 Broodstock conditioned in constant low pH. This is data analysis for the manuscript on pH effect on reproductive development

Past histology analysis

Scoring females with imageJ

Analysis of scores

from shellytrigg

Shelly’s Notebook: Mon. Sept. 23, Geoduck Broodstock Histology and Juv. low pH DMRs

Broodstock histology

Met with Kaitlyn in the am and came up with the following plan for scoring gonad histology:

  1. Females:
    • quantify follicle area
    • quantify egg area
    • calculate egg/follicle ratio
    • calculate follicle/tissue ratio
  2. Males:
    • quantify acini/tissue ratio
    • quantify spermatagonia/spermatid ratio (dark purple to light purple)

Comparing allc DMRs and 5x cov files

  • Steven’s 5x cov files:
    • still unsure about what happened in the strandedness code
    • include bases where MAPQ < 30
  • Allc files generated by methylpy (here):
    • filtered for bases with MAPQ >= 30 (–min-mapq 30 default)
  • Methylkit processBismarkAln default includes a ‘minqual’ filter for MAPQ >= 20
  • Not sure if [DMG pipeline] includes a MAPQ cutoff


  • whatever files we use to validate our DMRs, DMLs, or DMGs, they must be filtered the same way.
    • Otherwise differences may not be apparent when all reads are included…

Some lit on whether to use a MapQ score threshold or not:

from shellytrigg

Sam’s Notebook: Trimming/FastQC/MultiQC – P.generosa EPI FastQs with FASTP on Mox

Steven noticed that the M-Bias plots generated by Bismark from these files was a little wonky and asked that I try trimming them a bit more. The files were originally quality/adaptor trimmed with TrimGalore! on 20180516.

I decided to use some trimming software called fastp, since it had a lot of options (including trimming without the need for quality filtering/trimming, as well as being multi-threaded).

Looking at the M-Bias plots and the original FastQC assessment, I opted to trim the first 20bp from the 5’ end of all reads.

I followed this up with FastQC and MultiQC.