Laura’s Notebook: Running list of R tips from super-advanced class

Cheat Seets:


here() :
read_csv(): the way to read in a csv file tidily (as opposed to ‘read.csv’). Read in all data using “read_THING()”
unite(): reshape a dataframe
filter(): subset a dataframe
select(): select certain columns in a tibble
mutate(): create a new column as a function of another function
pivot_longer(): convert a tibble from wide to long format
pivot_wider(): covert a tibble from long to wide
left_join(): take data set 1, and join matching values from data set 2 <– like bind/merge functions (but smarter)
right_join(): take data set 2, and join matching values from data set 1
inner_join(): join rows with matching values
full_join(): join all columns and rows

stringr package

lubridate package

years() year()

Shelly’s Notebook: Fri. Sept. 27, Pt. Whitney Juvenile Geoduck var.low pH experiment

Water chem

  • Water level was low in the low pH header tank uc?export=view&id=1Xd1XBBGEwIlfxi-sW1oZ9WzkTtE-ucOU
    • Appears to be the same thing that happened to the ambient tank last week. I turned flow up so it can maintain the water level
  • Took discrete measurements and TA between 10:45-11.
  • for TA, I took directly from the discrete measurment water after filtering it through 20uM. The discrete measurement water samples were directly from the headers and inside the silos (using clean TA cups)
    • everything looked as it should
  • 12-12:30pm calibrated all Apex probes

algae check

Count for both B2 and B3 were ~ 6 x 10^5 cells/mL with a 7.25mL/sec. flow rate. Each second, animals are seeing ~ 4.3 x10^6 cells.

animal check

Var.low pH treatment

ambient offspring:

  • 2.1: uc?export=view&id=1sx8pmxnmp-1jFnhUJTiBtbP-n1PlEOgI
  • 2.2: uc?export=view&id=1dm1_0RD6kBU3VFWu7KSF9-2QQdGg77lL
  • 2.3: uc?export=view&id=1s6Z6yAzp7lCf40YBvLcnbToaVCa7eIT0
  • 2.4: uc?export=view&id=1hBoGdOtx6fO18fibnYBfNQc9CBNbuBvP
  • 2.5: uc?export=view&id=15f9NxAjAZaFqteIyFK-XhfGSXhhXpzI3

var.low pH offspring:

  • 2.8: uc?export=view&id=1vPxaFJZZJa6sP3ZhRI-j9CH0PDN2tmSs
  • 2.9: uc?export=view&id=1gK-QfzzwjOIDE8gVcGETlD79Ik9-pJ00
  • 2.10: uc?export=view&id=1l1G4TFCnYrKk_TfJeQRUsOz6IK5C5rNC
  • 2.11: uc?export=view&id=1YFM01VQYDNv1vXbtOo231HHgn77TfI0E
  • 2.12: uc?export=view&id=1DtRxoycd8hGn_MERXMP79pbUE-lQmvEv

Ambient pH treatment

ambient offspring:

  • 3.1: uc?export=view&id=1l1hLDnhetVK69jQCaW0tfAW5OJSKvLnJ
  • 3.2: uc?export=view&id=18l6p4nUnFxUVAp1pPfqhGvpNAx7XjhiC
  • 3.3: uc?export=view&id=1ENKVSbDTGsz6pxUOTYKB0clE16Cg8MyQ
  • 3.4: uc?export=view&id=18nf55Hlgu9AWYRolw4vybzxdmARDOqVb
  • 3.5: uc?export=view&id=17AWzOXihJCnFJSEIFN3n6_OBbjV9P8vp

var.low pH offspring:

  • 3.8: uc?export=view&id=1Jsi7rlUeS9HIv6TEQJf30WrNaDAbR60J
  • 3.9: uc?export=view&id=1GkPUf53tRsBpp9JGE6fvYCnn1CddcU0Q
  • 3.10: uc?export=view&id=1W7NqZud86OM1DWvdK0bGwl7Wz3b-_QNM
  • 3.11: uc?export=view&id=1z7sCu8CNm-z9RNELejGcxyFhVnKHNY7Z
  • 3.12: uc?export=view&id=1yu05IV8d8sVuOymezh-ByE9aXQHdSulE
  • Shells of animals in the var.low pH treatment are very fragile. Many are broken.
  • I tried to clean out the dead animals from all silos, but couldn’t get everything.
  • The animals from the var.low pH parents seem to be doing better (survivorship and size judged by eye) than the animals from ambient parents when both are exposed to the var.low pH treatment.

from shellytrigg

Grace’s Notebook: Oysterseed DIA – New Analyses and wrapping up paper

I’ve been working on finishing up the Oysterseed DIA paper, and met with Steven today to discuss it. Details in post below:

Meeting with Steven notes:

He added a lot of helpful comments and suggestions to my paper: Oyster-Larval-Proteomics-2015

I tried addressing some of the comments this morning and up until we met at 2pm.

I added 2 tables:

  1. Enriched GO terms (determined by DAVID) from the 33 significantly differentially abundant 23C proteins
  2. Enriched GO terms (determined by DAVID) from the 36 significantly differentially abundant 29C proteins

I also added an experimental timeline that I will edit this weekend after getting feedback from Steven today.

Other things to do:

  • Methods: remove MS Stats figure part – just have MS Stats in methods as a way to get list of 2808 proteins that were sig. diff. abundant using our set threshold of >2.00 and <-2.00 log-2 fold change
  • Results: add more DAVID and REVIGO results using 2808 proteins annotated with Uniprot (proteins_comp_annot_threshold.csv) as the gene list and the blast results with Uniprot IDs from Steven – blast with C. gigas proteome against uniprot/sprot database ( (ADD THIS TO METHODS, TOO)
  • Discussion: make it better (notes for that and outline in Oyster-Larval-Proteomics-2015)

Things I did today:

Starting to address “Results” for describing sig. diff. abundant proteins relative to proteome

I took the Uniprot IDs (Uniprot ACCESSION) from proteins_comp_annot_threshold.csv (Uniprot IDs for the 2,808 sig. diff. abundant proteins) and put that into DAVID as the gene list.

I took the Uniprot IDs (Uniprot ACCESSION) from (Blast that Steven did with proteome and uniprot database), and put those as the background in DAVID.

The resulting file came from that:

It’s a huge list. I copy and pasted it into excel to separate out the GO:IDs and the fold enrichment values.

I input the list of GO IDs and fold enrichment values into REVIGO, and got some visuals, but also got this warning message:

While parsing your data, warning(s) were encountered: You have provided a huge list of GO terms, containing 583 terms in the BIOLOGICAL_PROCESS namespace, and only approx. 350 of these terms will be shown in the table below, or be available in the exported text table. REVIGO keeps the terms with the strongest p-values (or enrichements, depending what you had specified on the input form). Consider filtering the list by an external criterion (e.g. enrichment) before submitting to REVIGO. 

And I got this tip from REVIGO following the warning:

Tip: your resulting list of GO terms seems to be quite long. If you want to reduce it further, press the Back button in your browser and choose a different setting for the "allowed similarity" parameter. 

I tried using that tip and selected a parameter of 0.5 (the value less than that gave a warning that I could be missing some important GO terms), and still got the same warning and tip messages.

Here are screenshots of the visuals that REVIGO came up with (using an allowed similarity parameter of 0.5):

from Grace’s Lab Notebook

Grace’s Notebook: Crab Project – What I did for PCSGA 2019

In this post I will detail what I did for PCSGA 2019. I did some new analyses and found some new results from our first assembled transcriptome.


Steven made a BLAST database from the Dinophyceae proteins (Dinophyceae is class under which dinoflagellates are), and did a BLASTx against the first assembled transcriptome (Day 26, infected and uninfected, cold and ambient).

I used this script to separate the transcriptome into putative crab and putative Hematodiniium genes: sep_crab-hemat-genes.Rmd

That script resulted in the following files:

Annotating BLAST outputs

In jupyter notebooks, I annotated the two blastx outputs (crab and Hematodinium) with GOslim terms.

I then used the following R script to create files for creating pie charts for biological process GO slim terms:

The script resulted in the following files:

Creating pie charts

I used the count.csv files to create pie charts in google sheets:

For the talk, I went more in-depth into the “stress response” slice in the crab GOslim pie. I made a table of some crab genes with notes on their names and functions, and links to the uniprot database on those genes:

The ones highlighted are the ones I chose to talk about in the talk.

The talk and slides

Link to final google slides: Crandall_Wed_1645

Link to slidedeck on figshare: Effects_of_Bitter_Crab_Disease_on_the_gene_expression_of_Alaskan_Tanner_Crabs

Thoughts on talk

It was supposed to be 12 minutes, with 3 minutes for questions. I have no idea how much time I took, but I did get 2 clarifying questions at the end of the talk.

It was my first time presenting at a conference, and my slot was on day 2 (Wednesday) at 4:45pm. It was a struggle to stay energized and it was also a struggle to stay calm – I was really nervous!

I think I had good background information and that I explained things fairly well, but I wish I had gone a little more in depth into the genes that we found. I also think that for next time, I need to find ways to combat the nerves because they got a bit in the way of my ability to speak at the beginning – I collected myself well-enough after the third slide or so… but I want to improve!

from Grace’s Lab Notebook

Shelly’s Notebook: Tues. Sept. 24, Geoduck Broodstock Histology

This post is in reference to the histology done on the Fall-Winter 2018-2019 Broodstock conditioned in constant low pH. This is data analysis for the manuscript on pH effect on reproductive development

Past histology analysis

Scoring females with imageJ

Analysis of scores

from shellytrigg

Shelly’s Notebook: Mon. Sept. 23, Geoduck Broodstock Histology and Juv. low pH DMRs

Broodstock histology

Met with Kaitlyn in the am and came up with the following plan for scoring gonad histology:

  1. Females:
    • quantify follicle area
    • quantify egg area
    • calculate egg/follicle ratio
    • calculate follicle/tissue ratio
  2. Males:
    • quantify acini/tissue ratio
    • quantify spermatagonia/spermatid ratio (dark purple to light purple)

Comparing allc DMRs and 5x cov files

  • Steven’s 5x cov files:
    • still unsure about what happened in the strandedness code
    • include bases where MAPQ < 30
  • Allc files generated by methylpy (here):
    • filtered for bases with MAPQ >= 30 (–min-mapq 30 default)
  • Methylkit processBismarkAln default includes a ‘minqual’ filter for MAPQ >= 20
  • Not sure if [DMG pipeline] includes a MAPQ cutoff


  • whatever files we use to validate our DMRs, DMLs, or DMGs, they must be filtered the same way.
    • Otherwise differences may not be apparent when all reads are included…

Some lit on whether to use a MapQ score threshold or not:

from shellytrigg

Sam’s Notebook: Trimming/FastQC/MultiQC – P.generosa EPI FastQs with FASTP on Mox

Steven noticed that the M-Bias plots generated by Bismark from these files was a little wonky and asked that I try trimming them a bit more. The files were originally quality/adaptor trimmed with TrimGalore! on 20180516.

I decided to use some trimming software called fastp, since it had a lot of options (including trimming without the need for quality filtering/trimming, as well as being multi-threaded).

Looking at the M-Bias plots and the original FastQC assessment, I opted to trim the first 20bp from the 5’ end of all reads.

I followed this up with FastQC and MultiQC.

[code]#!/bin/bash ## Job Name #SBATCH...

## Job Name
#SBATCH --job-name=bm
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=06-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/sr320/0923/

# Directories and programs

source /gscratch/srlab/programs/scripts/

find ${reads_dir}*_L001_R1_001_val_1_val_1.fq.gz \
| xargs basename -s _L001_R1_001_val_1_val_1.fq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome /gscratch/srlab/sr320/data/geoduck/v074 \
-p 4 \
-score_min L,0,-0.6 \
-1 /gscratch/srlab/sr320/data/caligus/{}_L001_R1_001_val_1_val_1.fq.gz \
-2 /gscratch/srlab/sr320/data/caligus/{}_L001_R2_001_val_2_val_2.fq.gz \

find *.bam | \
xargs basename -s .bam | \
xargs -I{} ${bismark_dir}/deduplicate_bismark \
--bam \
--paired \

${bismark_dir}/bismark_methylation_extractor \
--bedGraph --counts --scaffolds \
--multicore 14 \
--buffer_size 75% \

# Bismark processing report


#Bismark summary report


# Sort files for methylkit and IGV

find *deduplicated.bam | \
xargs basename -s .bam | \
xargs -I{} ${samtools} \
sort --threads 28 {}.bam \
-o {}.sorted.bam

# Index sorted files for IGV
# The "-@ 16" below specifies number of CPU threads to use.

find *.sorted.bam | \
xargs basename -s .sorted.bam | \
xargs -I{} ${samtools} \
index -@ 28 {}.sorted.bam

#bismark, #sbatch

Shelly’s Notebook: Thur. Sept. 19, 2019 Pt. Whitney Juvenile Var.low pH experiment

Water chemistry

  • Took discrete measurements and TA around 2:30pm
    • for TA, I took directly from the discrete measurment water after filtering it through 20uM. The discrete measurement water samples were directly from the headers and inside the silos (using clean TA cups)
    • everything looked as it should
  • @5:40pm calibrated B5 which was off by 0.08. All other probes were within 0.01.

Animal check

  • Tote B5 had a pretty low water level and Matt and I weren’t really sure why. He said the header upstairs was overflowing so he turned that down and it should give more water to the totes. He also turned the pump up a little bit. The animals in B3 were still receiving water although the flow felt a little weaker by the touch. I doubt it has been like this for a whole week, it seems anomlymous.
  • uc?export=view&id=1qJcx4kZI58M1znUqjQizEcRYnqCE292a
  • Algae line looked a little clogged, like it wasn’t really flowing. However disconnecting from the tote inflow pipe looked like it was dispensing the same volume as the lines going into the conical headers. Also algae came out from the tap near the pump head itself when turned on. So all good here.

Respirometry and sampling

  • filled 5L beakers with 1-2L 1uM FSW.
  • Submerged vials in well plate holders
  • turned off flow in silos and removed manifolds
  • took 1 animal/silo with transfer pipette spoon and then used 2 transfer pipettes like chopsticks to transfer the animal into the corresponding submerged vial
    • Did this for all silos, took about 20 minutes
      • Turned flow back on immediately after transferring animals and adjusted to 7mL/sec.
    • Noticed animals in low pH treatment had very fragile shells and are a little white like Sam has previously seen
      • it seemed like animals from ambient parents were more fragile (many broken shells on the sand surface.
  • Ran 1 respirometry trial around 5pm for about 30 minutes.
    • SDR plate positions here (A1 = bottom right corner):
      • uc?export=view&id=1-pLPP4zsD3X0uyL_HakmZWjEp5Eqm5SG
      • on SDR plate: uc?export=view&id=1TBMmtUUk-HM8rBXHE348p2JbSacWTSfq
  • After trial, emptied vials and animals onto a well plate lid (to track their positions; A1 = upper right corner and order listed on size measurement data table linked below) for shell length measurements
  • uc?export=view&id=1nT7hq6Htii_Wnno32OnSO6ripnLZ5uyx
  • Got wet weight for each animal by transfering animals onto paper towel (using the transfer pipette chopstick method) to gently blot off excess water. Then transferred animal into tared snap cap tube, recorded weight, and immediately froze in liquid N2.

Geoduck genome call at 12pm

  • We shared our progress with our three different DM methods:
    • loci (Steven)
    • DMGs (Hollie)
    • DMRs (Shelly)
    • DMLs are more just for curiousity and confirming sites for other methods at this point
    • DMRs need to be more stringently defined
      • I will continue working on this
    • We will focus on DMGs now (Hollie is leading this effort)
      • She will make a heatmap with treatment averages of % gene body methylation (with and without filtering for low variance)
      • Then she will do GO enrichment

from shellytrigg

Sam’s Notebook: SRA Submission – Hollie’s Juvenile OA BS-seq Data

Submitted sequencing data (52 samples; see table below) from Hollie’s BS-seq project looking at differences in DNA methylation between juvenile geoduck exposed to different pHs over time.

BioProject Accession: