Sam’s Notebook: Primer Design – Geoduck Vitellogenin using Primer3

In preparation for designing primers for developing a geoduck vitellogenin qPCR assay, I annotated a de novo geoduck transcriptome assembly last week. Next up, identify vitellogenin genes, design primers, confirm their specificity, and order them!

All of this was done in a Jupyter Notebook on my computer (Swoose).

Jupyter notebook (GitHub):

Annotated transcriptome FastA (271MB):

Although everything is explained pretty well in the Jupyter Notebook, here’s the brief rundown of the process:

  1. Download FastA file.
  2. Split into individual FastA files for each sequence (used pyfaidx v0.5.5.2).
  3. Identify sequences (in original FastA file, not individual files) annotated as vitellogenin.
  4. Design primers on best vitellogenin match (based on TransDecoder score and BLASTp e-values) using Primer3.
  5. Confirm primer specificity using the EMBOSS (v6.6.0) primersearch tool to check all individual sequence files for possible matches.
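The split-and-search steps (2 and 3) are simple enough to sketch with plain Python; the notebook itself uses pyfaidx for the splitting, and the function names below are purely illustrative:

```python
def split_fasta(fasta_text):
    """Split multi-sequence FastA text into {seq_id: record} (step 2)."""
    records = {}
    for block in fasta_text.split(">"):
        if not block.strip():
            continue
        header, _, _seq = block.partition("\n")
        # Sequence ID is the first whitespace-delimited token of the header
        records[header.split()[0]] = ">" + block.rstrip("\n") + "\n"
    return records


def find_annotated(fasta_text, keyword="vitellogenin"):
    """Return IDs of sequences whose header mentions the keyword (step 3)."""
    return [line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">") and keyword.lower() in line.lower()]
```

In practice you would stream a 271MB file rather than hold it in memory, which is part of why pyfaidx is the better tool for the real job.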

Yaamini’s Notebook: MEPS Revisions Part 3

Final unconstrained and constrained ordinations

I finally figured out all of my N/A problems and completed my unconstrained and constrained ordinations!

Protein abundance

I didn’t have any problems with my protein abundance NMDS and previously obtained pairwise ANOSIM R-statistic and p-values. In this R script, I visualized my ordinations. You can find one version with sample numbers here, and another with site and habitat distinctions here. Because I exported my files as pdfs and don’t feel like changing the code and re-exporting files, I’m also including screenshots of the ordinations.


Figures 1 and 2. Protein abundance ordinations with confidence ellipses.

Environmental data

My original problem with the environmental data was that I had N/As in my dataframe, but was unable to use method = "gower" with the metaMDS function in the package vegan. Turns out vegan uses vegdist, which doesn’t handle N/As. Instead, I used the daisy function in the cluster library to calculate a Gower’s dissimilarity (distance) matrix. I then inputted that matrix directly into metaMDS with method = "euclidean". It worked! My code can be found here. I then conducted an Analysis of Similarity (ANOSIM) to assess the significance of my results.
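For intuition, here is why Gower's sidesteps the N/A problem: each variable pair with a missing value is simply dropped from the average rather than poisoning the whole comparison. A stdlib Python sketch of the numeric case (not the daisy implementation, which also handles categorical variables):

```python
def gower_distance(x, y, ranges):
    """Gower dissimilarity between two numeric vectors.

    Pairs where either value is missing (None) are excluded from the
    average, unlike vegan::vegdist, which cannot handle N/As.
    ranges[i] is the observed range of variable i (used to scale to 0-1).
    """
    num = den = 0.0
    for xi, yi, r in zip(x, y, ranges):
        if xi is None or yi is None or r == 0:
            continue  # pairwise deletion of missing values
        num += abs(xi - yi) / r
        den += 1
    return num / den if den else float("nan")
```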

Table 1. One-way ANOSIM results for environmental data NMDS based on site (Case Inlet vs. Fidalgo Bay vs. Port Gamble Bay vs. Skokomish River Delta vs. Willapa Bay), habitat (bare vs. eelgrass), parameter (mean vs. variance), and environmental variable (pH vs. dissolved oxygen vs. salinity vs. temperature). Significant p-values are bolded.

One-way ANOSIM R p-value
Site -0.03428841 0.943
Habitat -0.02037461 0.982
Parameter 0.7378174 0.001
Environmental Variable 0.1796322 0.001

Only the parameter and environmental variable ANOSIMs were significant. My guess is that ordinating means and variances in the same space is skewing my results. I decided to conduct two-way ANOSIMs to go further.
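For reference, the R statistic ANOSIM reports is the difference between the mean between-group and mean within-group ranks of the dissimilarities, scaled by M/2 (M = number of pairs); values near 0 (like Site and Habitat above) mean no group structure. A minimal Python sketch, with no permutation p-value and ordinal tie-breaking instead of vegan's average ranks:

```python
from itertools import combinations

def anosim_r(dist, groups):
    """ANOSIM R statistic from a square dissimilarity matrix.

    R = (mean between-group rank - mean within-group rank) / (M/2),
    computed on the ranked pairwise dissimilarities.
    """
    pairs = list(combinations(range(len(groups)), 2))
    d = [dist[i][j] for i, j in pairs]
    # Rank the dissimilarities (1 = smallest); ties broken by order
    order = sorted(range(len(d)), key=lambda k: d[k])
    ranks = [0.0] * len(d)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    between = [ranks[k] for k, (i, j) in enumerate(pairs) if groups[i] != groups[j]]
    within = [ranks[k] for k, (i, j) in enumerate(pairs) if groups[i] == groups[j]]
    m = len(d)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)
```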

Table 2. Two-way ANOSIM results for environmental data NMDS. Significant p-values are bolded.

Two-way ANOSIM R p-value
Site and Habitat -0.09991229 1
Parameter and Site 0.4778027 0.001
Parameter and Habitat 0.4778027 0.001
Environmental Variable and Site 0.02098092 0.315
Environmental Variable and Habitat 0.1089352 0.007
Environmental Variable and Parameter 0.7802082 0.001

When accounting for parameter, site and habitat were significant. When accounting for environmental variable, only habitat was significant. I included environmental variable and parameter, but I already figured that would be significant. I need to double check with Julian, but I believe conducting pairwise ANOSIMs based on the significant two-way ANOSIM results will help me understand what is driving the significant results.

I also visualized my ordination here.


Figure 3. Environmental data NMDS. Means are outlined in solid lines, variances are outlined in dashed lines.

Constrained ordination

In this R Markdown file, I conducted a constrained ordination to look at relationships in protein abundance based on my environmental data. To do this, I first created an environmental data matrix with mean and variance values from the entire outplant, and matched them with my objects, oyster sample IDs. This meant I had the same objects in both my protein abundance and environmental data matrices. I calculated the gradient length and found that the underlying model was linear, so I used an RDA. I specified na.action = na.exclude in my rda function so it could handle my missing values.
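Mechanically, an RDA is a PCA of the fitted values from a multivariate linear regression of the responses on the environmental descriptors, so the "constrained" proportion below is just fitted variance over total variance. A numpy sketch on fabricated matrices (not the real data; vegan's rda() additionally does the na.action bookkeeping and the ordination itself):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 6))    # fabricated environmental descriptors
Y = rng.normal(size=(26, 10))   # fabricated protein abundances
X -= X.mean(axis=0)             # center both matrices
Y -= Y.mean(axis=0)

# Multivariate regression of Y on X; fitted values carry the
# "constrained" variance, residuals the "unconstrained" variance
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ B

constrained = np.var(fitted, axis=0, ddof=1).sum()
total = np.var(Y, axis=0, ddof=1).sum()
prop = constrained / total      # analogous to the "Constrained" proportion
```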

Table 3. Variance explained by constrained ordination. The RDA explains 29.21% of all variance.

Variance Partition Inertia Proportion
Total 0.015616 1
Constrained 0.004561 0.2921
Unconstrained 0.011055 0.7079

Table 4. ANOVA results for overall RDA analysis. With F(6,19) = 1.3064 and p-value = 0.195, the RDA is not significant.

Partition Df Variance F Pr(>F)
Model 6 0.0045607 1.3064 0.195
Residual 19 0.0110550

Table 5. Significance of each RDA axis. No axis is significant.

Axis Df Variance F Pr(>F)
RDA1 1 0.0023293 4.0033 0.316
RDA2 1 0.0013380 2.2995 0.603
RDA3 1 0.0003888 0.6682 0.998
RDA4 1 0.0003065 0.5268 0.994
RDA5 1 0.0001395 0.2397 0.999
RDA6 1 0.0000586 0.1008 0.999

Table 6. Significance of each environmental descriptor in the RDA. Temperature mean and variance are marginally significant predictors.

Environmental Variable Df Variance F Pr(>F)
Temperature Mean 1 0.0013542 2.3275 0.065
Temperature Variance 1 0.0012584 2.1627 0.067
pH Mean 1 0.0003063 0.5264 0.781
pH Variance 1 0.0007082 1.2171 0.301
Dissolved Oxygen Mean 1 0.0006024 1.0354 0.349
Dissolved Oxygen Variance 1 0.0003312 0.5692 0.725
Residual 19

One weird thing I noticed is that my salinity variables do not show up in Table 6 or my ordinations below. That’s something I’ll have to figure out with Julian. Other than that, it’s interesting to note that my RDA really isn’t significant, which fits the loose interpretation we had earlier of environment affecting protein abundance in the manuscript.

To visualize my constrained ordination, I included all proteins and environmental descriptors here, and only significant proteins and environmental descriptors here.


Figures 4-5. RDA visualization including all proteins and descriptors (top) and only those that are significant (bottom).

Going forward

  1. Discuss all results with Julian
  2. Create a better ordination visual
  3. Update manuscript with new methods and results
  4. Tweak discussion


from the responsible grad student

Kaitlyn’s notebook: Proteomics paper


Referenced from Shelly’s post

  • Compare ASCA proteins (high loadings) with hierarchical cluster (differentially clustered) proteins
    • make raw abundance line plots facetted by protein
    • examine if GO enrichment changes when ASCA and cluster proteins are combined
  • Determine how time is factored into the cluster
  • Determine if the permutation test with ASCA tests needs to be improved for the high loadings proteins to be considered highly influential
  • Redo BLAST of CHOYP proteins to 2018 Uniprot database
  • Begin identifying and locating data files that we need to deposit in public protein repository (i.e. ProteomeXchange and PeptideAtlas)
    • need to regenerate ‘table_blastout_gigatonpep-uniprot’
    • need fasta file of the peptides ID’d by mass spec
    • make a simplified supplementary table containing CHOYP IDs, UniProt Accessions, e.val, Protein names, Gene names
      • Can modify current datasheet to get this info as well

Completed tasks

Paper questions

Question: Does temperature influence the proteome of larval C. gigas, and if so, how?

Referenced from Shelly’s post

  • Do we need to explain we did 2 x 4 treatments, or just say we did 1 x 2 treatments?
  • Do we have survival data for other silos to compare to silo 2, 3, and 9?
    • Can we rule out silo 2 as an anomaly or should we include it?

Kaitlyn’s notebook: clustering without day 0

I redid the hierarchical clustering of the combined silo 3 and 9 datasheet without day 0.

Include Day 0 Remove Day 0
Agglomerative coefficient 0.9964979 0.9976217
Cophenetic correlation 0.9477519 0.9630485
Clusters 41 84
Differentially clustered proteins 33 213

31 “differentially clustered proteins” remained differentially clustered whether day 0 was included or excluded. So, removing day 0 causes more proteins to cluster separately (the agglomerative coefficient is slightly increased).

  • i.e., removing day 0 causes more proteins to be identified as having different abundances.
    • All abundances are the same for both silos on day 0 since no treatment had been administered yet.
      • Removing day 0 means that only days following treatment are analyzed which makes more sense since we are attempting to identify proteins that have different abundances during treatment only.
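For reference, the quantities in the table above can be computed on toy data with scipy's hierarchical clustering tools (the notebook itself is in R; the abundance matrix here is fabricated):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import pdist

# Toy matrix: 4 "proteins" (rows) x 3 sampling days (columns)
X = np.array([[1.0, 1.1, 0.9],
              [1.0, 1.0, 1.0],
              [5.0, 5.2, 4.9],
              [5.1, 5.0, 5.0]])

d = pdist(X)                      # condensed pairwise distances between proteins
Z = linkage(d, method="average")  # agglomerative hierarchical clustering
coph_corr, _ = cophenet(Z, d)     # cophenetic correlation coefficient
clusters = fcluster(Z, t=2, criterion="maxclust")  # cut tree into 2 clusters
```

Removing day 0 just means dropping that column from X before pdist; everything downstream (linkage heights, cophenetic correlation, cluster memberships) then changes.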



Grace’s Notebook: Start new DIA analysis using Walnut, EncyclopeDIA, and Bri-line

Today I tried doing the mprophet model in Skyline Daily. It wasn’t super great and I got stuck, but Emma reminded me that a lot of people who do DIA analysis prefer a new method using Walnut, EncyclopeDIA, and Bri-line. I started this new protocol today and am currently letting Walnut make the new chromatogram library, which will take a while. I’ll move on to the next phase for this tomorrow when the library is done.


mprophet is part of the Advanced Peak Picking Tutorial. I finished the model, but didn’t go into checking the error rates because Emma reminded me that the Walnut, EncyclopeDIA, “Bri-line” method is what others have been preferring. The file is too large to add to OWL… but it’s on Woodpecker (FTR 209).

New method in a nutshell

  1. Convert .raw to .mzML using MSConvert (Check)
  2. Walnut –> make .elib file using the .mzML files made in Step 1. “Save Chromatogram Library” (in progress – will be done tomorrow)
  3. EncyclopeDIA –> Search wide-window data with library from Step 2.
  4. Bri-line –> ELIB browser: look at peaks; RAW file browser: look at RAW files

This method is pretty straightforward and seemingly easy to follow – will report back on that in tomorrow’s notebook post.

Goals for the week:


  1. Finish new DIA analysis method


  1. Extraction plan for Crab samples – get input from Steven and Sam
  2. Start extractions (?)

I have a million classes (3), so I won’t have time to fit in extractions, but:

  1. Work on R script for adding new Qubit data


  1. Extractions
  2. Anything from my previous goals that I couldn’t finish

from Grace’s Lab Notebook

Ronit’s Notebook: qPCR with Desiccation + Elevated Temp. Samples (DNMT1)

Before the long weekend, I ran a qPCR assay with my desiccated + elevated temp. samples. There were 40 samples in total and I ran one plate with DNMT1 (DNA methyltransferase) as my gene target.

Plate Layout:

The plate is laid out with duplicates in adjacent wells and the order of samples is as follows:

D01, D02, D03, D04, D05, D06, D07, D08, D09, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, D20, T01, T02, T03, T04, T05, T06, T07, T08, T09, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20

There are 2 positive controls with gDNA in wells G9 and G10. There is a no-template control (negative control) in wells G11 and G12.

To run the qPCR assay, I created a mastermix with 20 µL of forward primer, 20 µL of reverse primer, 400 µL of 2x qPCR master mix, and 320 µL of DEPC-treated water. 19 µL of the mastermix was put into each well and a subsequent 1 µL of cDNA was put in for each sample (water for the negative controls and gDNA for the positive controls). Note that all sample cDNA used was a 1:5 dilution.
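As a quick sanity check on the recipe, the stated volumes make one 40-reaction batch at 19 µL per well, i.e. 10 µL of 2x mix, 0.5 µL of each primer, and 8 µL of water per reaction (so the 2x mix sits at 1x in the final 20 µL once template is added):

```python
# Master-mix recipe from this post, volumes in µL
mix_ul = {"forward primer": 20.0, "reverse primer": 20.0,
          "2x qPCR master mix": 400.0, "DEPC-treated water": 320.0}

total_ul = sum(mix_ul.values())        # total volume of master mix
per_well_ul = 19.0                     # master mix dispensed per well
n_reactions = total_ul / per_well_ul   # wells this batch covers
per_reaction = {k: v / n_reactions for k, v in mix_ul.items()}
```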

Grace’s Notebook: Cold, Infected Pool with RNeasy and DecaPod S1E13

Today I edited and published S1E13 of DecaPod. In this episode I interview Madi Shipley (MS, SAFS) who works with Bering Sea Tanner crabs. I also extracted RNA from 8 samples (all Cold, infected, Day 26) using Qiagen RNeasy and definitely have enough RNA for a pool!


S1E13 posted

RNeasy Extraction: 8 samples D26, cold, infected

Today I extracted RNA from 8 samples (each the second of the three that were taken per crab):

I used the RNeasy protocol, with a few modifications:

  • At steps 5-7, I centrifuged at 10,000 rcf
  • Added 2-ME to Buffer ATL (since 8 samples, did 3mL buffer, 30µL 2-ME)
  • Did not add RNA carrier
  • Eluted with 14µL water (provided in Kit)


Qubit results look pretty amazing! My bioanalyzer results from last time looked pretty good, so I think we can trust these results. EXCEPT for tube number 413-2, because some of the working solution did not make it into the Qubit tube.


The tubes (now with 13µL of material) are in Rack 5, Column 4, Row 3 in the -80.

NWGC wants 20ng/µL of RNA in 50µL sample (GitHub issue #184). It looks like we have more than enough!
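The NWGC spec works out to 1,000 ng of total RNA per pool; a quick way to check whether a candidate pool clears that bar (the concentrations and per-sample volume below are placeholders, not the actual Qubit numbers):

```python
required_ng = 20 * 50   # 20 ng/µL in a 50 µL sample = 1,000 ng total RNA

def pool_mass_ng(concs_ng_per_ul, vol_per_sample_ul):
    """Total ng contributed if vol_per_sample_ul is pooled from each
    extraction (concentrations are hypothetical examples)."""
    return sum(c * vol_per_sample_ul for c in concs_ng_per_ul)

enough = pool_mass_ng([55.0, 80.0, 120.0, 95.0], 5.0) >= required_ng
```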

When I return from the short holiday break this weekend, I’ll start picking out samples for the other pools and get feedback from Steven and Sam.

from Grace’s Lab Notebook

Sam’s Notebook: Reverse Transcription – Ronit’s C.gigas DNased RNA from 20181115

Quantified Ronit’s DNased RNA earlier today and proceeded with reverse transcription using 500ng of DNased RNA with oligo dT primers (Promega) and M-MLV reverse transcriptase (Promega) according to the manufacturer’s protocol.

RNA and oligo dTs were incubated at 70°C for 10mins in a PTC-200 thermal cycler (MJ Research) without a heated lid and immediately placed on ice.

A master mix of buffer, dNTPs, and M-MLV was distributed to each sample (10uL to each sample), mixed, and incubated at 42°C for 1hr, 3min at 95°C, and then held overnight at 4°C in the PTC-200 thermal cycler.

A 1:5 working dilution of each cDNA (5uL cDNA + 20uL PCR H2O) was made and will be used for all subsequent qPCRs.
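The two bits of arithmetic behind the protocol, volume of RNA for a 500 ng input and the 1:5 working dilution, look like this in Python (the 100 ng/µL concentration is a made-up example, not a measured value):

```python
def rna_volume_ul(target_ng, conc_ng_per_ul):
    """µL of DNased RNA needed to deliver target_ng into the RT reaction."""
    return target_ng / conc_ng_per_ul

vol = rna_volume_ul(500, 100)        # 500 ng input at a hypothetical 100 ng/µL

# 1:5 working dilution: 5 µL cDNA + 20 µL water = 25 µL total
dilution_factor = (5 + 20) / 5
```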

All samples were stored in Ronit’s -20°C box.

Reverse transcription calcs (Google Sheet):

Info was added to Ronit’s master sheet (Google Sheet):

from Sam’s Notebook

Yaamini’s Notebook: MEPS Revisions Part 2

Protein abundance and environmental data NMDS

Protein abundance

Previously, I conducted an NMDS using hellinger-transformed data and euclidean distances. I also conducted global Analysis of Similarities (ANOSIM) testing, and found a significant result when comparing regions (Puget Sound vs. Willapa Bay). After confirming this approach with Julian, he suggested I also conduct pairwise ANOSIM tests to understand where the differences are coming from. In this R script, I conducted pairwise ANOSIM tests between all different site combinations.

Table 1. Results of pairwise ANOSIM tests between sites for protein abundance data. R values are listed above the diagonal, and p values are listed below the diagonal. Values significant at the 0.05 level are marked by **, and values significant at the 0.10 level are marked by *.

     CI        FB           PG            SK            WB
CI   —         0.0196793    -0.07069971   -0.05830904   0.09259259
FB   0.331     —            -0.03125      0.09486607    0.2567829
PG   0.78      0.599        —             -0.006138393  0.07267442
SK   0.733     0.086        0.382         —             0.1540698
WB   0.16      0.035**      0.192         0.079*        —

Looks like the comparisons driving any regional differences are between Willapa Bay, Fidalgo Bay, and the Skokomish River Delta. Fidalgo Bay is the northern-most site, and Skokomish River Delta is in Hood Canal. It will be interesting to see how the environmental data compares.

Environmental data

Since my environmental data analysis is new, I created an R Markdown file. Based on my discussion with Julian, I first estimated means and variances for environmental conditions at each site. I originally was going to calculate minimums, maximums, means and variances, but he mentioned that minimum and maximum values will be autocorrelated with means and variances, so there is no need to include them. To calculate these variables, I used a for loop that iterated over dates as character strings and ignored missing values in my calculations. I (obviously) encountered errors doing this, which you can refer to here.
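The loop boils down to: for each date, drop the missing readings, then take the mean and sample variance of what's left. A stdlib Python sketch (the actual script is in R; each date needs at least two non-missing values, or statistics.variance raises an error):

```python
from statistics import mean, variance

def daily_stats(readings_by_date):
    """Mean and sample variance per date, skipping missing (None) values,
    mirroring na.rm-style handling in an R for loop over date strings."""
    stats = {}
    for date, values in readings_by_date.items():
        vals = [v for v in values if v is not None]
        stats[date] = (mean(vals), variance(vals))
    return stats
```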

Once I calculated all the daily values I needed, I log(x+1) transformed my data. My dataframe includes missing values, since Gower’s is able to ignore missing values. After transforming my data, I transposed my dataframe so my environmental variables at each site-habitat combination were objects and my descriptors were dates. Originally, I wanted to use my dates as objects and environmental variables as descriptors. This, however, would mean that my dates would be the points clustering. I’m more interested in knowing how my environmental variables cluster in relation to each other, so I figured it would be better to have those as objects. I can then use dates to understand where the variability comes from (and see if my later dates contributed to differences in variables more than earlier dates…thereby understanding temporal variation and allowing me to answer a certain reviewer comment).

With my transposed dataframe, I was ready to ordinate my data! However, I was unable to use metaMDS in the vegan package with missing data. In the past, I converted all my N/As to 0s to successfully use metaMDS. I don’t want to do this again, because then metaMDS will count those 0s as similarities, instead of discarding comparisons with missing values like it is supposed to. I posted my issue in the multivariate class Canvas page, so hopefully I get an answer. I can’t really move forward until then :/

Going forward

  1. Get environmental data NMDS to work
  2. Perform a constrained ordination
  3. Discuss all results with Julian
  4. Update manuscript with new analyses


from the responsible grad student

Sam’s Notebook: RNA Quantification – Ronit’s C.gigas DNased RNA from 20181115

After confirming Ronit’s DNased RNA was free of gDNA, I quantified the DNased RNA from 20181115 using the Roberts Lab Qubit 3.0 and the Qubit hsRNA Assay.

Used 1uL of each sample.