Yaamini’s Notebook: Remaining Analyses Part 11

Connecting the dots

Overwhelmed with the amount of data I had to work with, I did what any #ResponsibleGradStudent would do: talk to my adviser. Steven and I decided that the two best things to do would be to visualize diurnal fluctuations and regress peptide abundances against biomarkers. For the environmental variables, I’m not going to clip out any low tide exposure unless we think there’s something interesting to pursue and refine.

Environmental variables


Temperature is the variable that I'm sure could explain some differences in peptide abundance between sites.

R Script


Figure 1. Diurnal fluctuation of temperature at each site.


Figure 2. Differences in temperatures at each site.


I was unable to clip out low tide exposure data from the salinity record, but as I mentioned above, I’m not going to clip it out unless something looks interesting. Aside from higher salinity at Fidalgo Bay (which could be an artifact of low tide exposure), I don’t see anything worth pursuing.

R Script


Figure 3. Diurnal fluctuation of salinity at each site.


Figure 4. Differences in salinity at each site.


Willapa Bay had higher pH than Port Gamble Bay and Skokomish River Delta, but not the other two sites. Unlike the other environmental variables, not all of my sites were significantly different from each other. The significant differences are between Fidalgo Bay and Port Gamble Bay (p = 0.0004023), Skokomish River Delta (p = 0.0028254), and Willapa Bay (p = 0.0002764).

R Script


Figure 5. Diurnal fluctuation of pH at each site.


Figure 6. Differences in pH at each site.

Dissolved Oxygen

Willapa Bay had slightly lower dissolved oxygen than the other sites. However, this could be skewed by the fact that there are some extreme exposure values!

R Script


Figure 7. Diurnal fluctuation of dissolved oxygen at each site.


Figure 8. Differences in dissolved oxygen at each site.


To see if any of Alex’s biomarkers explained peptide abundance variation, I regressed each biomarker against each peptide. My many, many scatterplots can be found here.

I skimmed through all of the R-squared values from the regressions and didn’t see many over 0.5. Perhaps there’s a different method I should try.
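Rather than skimming, the R-squared screen could be automated. Here is a minimal sketch in Python (not the R used for the actual scripts) that regresses every biomarker against every peptide and keeps only pairs with R-squared above 0.5. The function names, the variable names, and all of the numbers are made up for illustration; none of them come from my data.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x.

    For a one-predictor regression this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return sxy ** 2 / (sxx * syy)


def screen_pairs(biomarkers, peptides, threshold=0.5):
    """Return (biomarker, peptide, R-squared) for pairs above the threshold."""
    hits = []
    for b_name, b_vals in biomarkers.items():
        for p_name, p_vals in peptides.items():
            r2 = r_squared(b_vals, p_vals)
            if r2 > threshold:
                hits.append((b_name, p_name, r2))
    # Strongest relationships first
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Hypothetical values, one entry per oyster
biomarkers = {
    "tissue_mass": [1.0, 2.0, 3.0, 4.0, 5.0],
    "percent_N": [2.0, 1.0, 4.0, 3.0, 5.0],
}
peptides = {
    "peptide_A": [2.1, 3.9, 6.2, 8.1, 9.8],
    "peptide_B": [5.0, 1.0, 4.0, 2.0, 3.0],
}

hits = screen_pairs(biomarkers, peptides)
for b_name, p_name, r2 in hits:
    print(f"{b_name} vs {p_name}: R-squared = {r2:.3f}")
```

A table like this would at least make it quick to see whether any pair is worth a closer look before switching methods.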

from yaaminiv.github.io http://ift.tt/2i4efDB

Yaamini’s Notebook: Gonad Histology Check-in

Quick journey into Manchester world

Grace has been super busy analyzing my gonad histology data (#trumie)! She hasn’t finished all gonad maturation state or sex classification yet, but here’s a quick peek into what she has so far.

Her classification spreadsheet can be found here. Within this file, she made pie charts to visualize sex ratios before and after pH exposure.



Figures 1-2. Preliminary pre and post-OA sex ratios.

In the same file, I made some pie charts to look at gonad maturation state between treatments. She only has maturation state information for four oysters before exposure, so I made graphs for the post-exposure oysters. When I sampled, I took ten individuals per treatment.



Figures 3-4. Preliminary gonad maturation states, separated by treatment.

For this week’s class assignment, Steven also wanted me to generate a figure for my larval survival data. I think the one I have is sufficient for now…?


Figure 5. Larval mortality, separated by family.

from yaaminiv.github.io http://ift.tt/2jwuGbO

Yaamini’s Notebook: Remaining Analyses Part 10

Following through on my plan

…kind of.

I started to look up tidal data so I can clip out any low tide and exposure values, as Micah suggested. However, I could not find any tidal information for the Skokomish River Delta site on that database. The website had me pick cities that were closest to the outplant sites. Here’s what I decided on.

Vaughn, Case Inlet

Port Gamble, Port Gamble Bay

Anacortes, Fidalgo Bay

Nahcotta, Willapa Bay

I’m also unclear as to what exposure we need to clip out. Are we taking out any data from all low tides, or just from complete instrument exposure? How do we know what the difference is? It seems like I need the depth data from Micah to proceed.

Right now I think it makes the most sense to just visualize the salinity data, without clipping out any funky values (it seems like Micah already did some of that anyways…?) and then see if it will be a candidate explanatory variable for my protein expression. I’ve also been rethinking my original idea for a regression analysis. I think the power of such an analysis would be compromised since I only have protein expression from the very end of the experiment, as opposed to multiple time points. Condensing such high resolution environmental data into one number (mean or median) to build a model off of seems unwise. I think I’m going to shoot an email off to Julian to see if he has any ideas on how to tackle this.

from yaaminiv.github.io http://ift.tt/2BuGsff

Laura’s Notebook: Finalized list of SRM data analysis stats

The Thanksgiving weekend provided time to ponder/reflect on the SRM stats that I’ve run thus far, what else needs to be done, and how to finish up. Received guidance from Dave-o. Note: next time, include a housekeeping protein in list of targets.

This is my “finalized” list of SRM & environmental stats to run. In the last few days I’ve completed much of this. In bold are those remaining. Tomorrow I’ll hopefully be done, and will post my scripts and results.

SRM Protein & Environmental Data Analysis Steps

Each Protein: (assume proteins are independent)

  1. Test for normality
  2. Lambda transformation
  3. Test for normality post transformation
  4. Assess outliers, remove if necessary
  5. N-way ANOVA by: a) location b) habitat c) site d) region
  6. Determine P-adjusted, correct for multiple comparisons (Bonferroni method, P/13)
  7. Post-hoc test to ID differences (is this really necessary?)
  8. Ultimate goal: which proteins are different between locations?
  9. Compare total abundance between sites (sum peptide abundance)
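The Bonferroni step in the list above (P/13) can be sketched as follows. This is a minimal Python illustration, not the script I ran: the alpha of 0.05, the protein names, and the p-values are all assumptions made up for the example.

```python
def bonferroni(p_values, n_tests, alpha=0.05):
    """Apply a Bonferroni correction for n_tests comparisons.

    Returns the per-test significance threshold (alpha / n_tests) and, for each
    entry, the adjusted p-value (capped at 1) and whether it passes.
    """
    threshold = alpha / n_tests
    results = {}
    for name, p in p_values.items():
        results[name] = {
            "p_adjusted": min(p * n_tests, 1.0),
            "significant": p < threshold,
        }
    return threshold, results


# Hypothetical raw ANOVA p-values for three of the 13 proteins
p_values = {"protein_1": 0.001, "protein_2": 0.02, "protein_3": 0.2}

threshold, results = bonferroni(p_values, n_tests=13)
print(f"per-test threshold: {threshold:.5f}")
for name, res in results.items():
    print(name, res)
```

Comparing the raw p-value to alpha/13 and comparing the adjusted p-value (p x 13) to alpha are the same test, so only one of the two needs to be reported.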

Each environmental variable:

  1. Download tidal chart data for each site
  2. Edit pH, DO & Salinity data:
    a. Remove data from exposed time points, as determined from tidal charts
    b. Identify and remove outliers from pH, DO & Salinity data
    c. Recombine outlier-scrubbed data with Temp, Tide data.
  3. Assess Normality of each env. variable (all time points)
    • Found to be non-normal (pH is kinda normal, but let’s assume not). Dataset is large (>6000 for each parameter), so did not determine lambda via tukeytransform function. Instead, used Kruskal-Wallis non-parametric analysis in lieu of ANOVA
  4. KW test for each env. variable by location, by region
  5. Dunn Test post-hoc test to ID differences
  6. Use Bonferroni correction for P-adjusted in tests
  7. Ultimate goal: which env. variables are different between locations?
    a. basically all of them.

Prep for regression model:

  1. Calculate summary statistics: mean, variance, sd, min, max, median, %>1 sd from mean, %>2 sd from mean
  2. Plot() all env. variables- are any linearly related, aka not independent? If so, need to include interaction parameter in regression model.
  3. Plot() protein peptides against each other to confirm linear correlation; equation should be ~1:1.
  4. If all correlated select 1 peptide to use in regression model; highest abundance is best.
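The summary statistics in step 1 of the list above could be computed along these lines. This is a rough Python sketch with made-up temperature values. Two assumptions are mine: it uses the sample (n - 1) variance and standard deviation, and it reads "%>1 sd from mean" as the percentage of points more than one standard deviation away from the mean in either direction.

```python
from statistics import mean, median, stdev, variance


def summarize(values):
    """Summary statistics for one environmental variable at one site."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)

    def pct_beyond(k):
        # Percentage of points more than k standard deviations from the mean
        return 100 * sum(abs(v - m) > k * sd for v in values) / len(values)

    return {
        "mean": m,
        "variance": variance(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "pct_gt_1sd": pct_beyond(1),
        "pct_gt_2sd": pct_beyond(2),
    }


# Hypothetical temperature readings with one extreme value
temps = [10.0, 11.0, 12.0, 13.0, 14.0, 30.0]
summary = summarize(temps)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```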

Run regression models for each representative peptide:

  1. Step-wise linear regression models with all env. variables; I would expect that only the variables that were found to be different via the ANOVA would significantly contribute to the model
  2. General linear model with variables ID’d in step-wise lm
  3. Figure out when to add a constant, and if I should do that in this scenario
  4. Run anova on best fit model, find P-value of the env. variables to determine confidence in the influence of each env. variable on proteins.
  5. Run model on the other peptides in the protein (not used as representative peptides); ID the R^2 and P values

from LabNotebook http://ift.tt/2Ag6i95

Grace’s Notebook: Juneau, November 28, 2017

Today Pam and I cleaned and packed up the lab!

We sent the coolers, pumps, and other things to the Kodiak Lab, and sent some of her things back to Seattle.

We power washed the tanks and scrubbed them down.

Pam has to send samples with ethanol, and they can only be sent through UPS. In Juneau, UPS is only open from 7pm-8pm.

Tomorrow Pam is giving a talk around 11am about this project. We have a few more things to clean and pack up, and then we’ll be done!

Yaamini’s Notebook: Remaining Analyses Part 9

My final NMDS

The first way I thought I could tie together environmental data with protein expression was an NMDS plot with eigenvectors based on environmental variables. I realized that wasn’t possible only after I made my NMDS by region in this script.


Figure 1. NMDS for protein expression by region (Puget Sound vs. Willapa Bay).

I actually ended up with a significant ANOSIM result when splitting up the regions! I got R = 0.2368 instead of an expected R of -0.002, with a p-value of 0.31.

Since the eigenvector thing isn’t going to work out, here’s my game plan to create figures with protein expression, environmental variable and biomarker data:

  • Clip out low tide/exposure recordings from salinity, pH, and dissolved oxygen data
  • Visualize diurnal fluctuations
    • Salinity
    • pH
    • DO
  • Boxplots
    • Salinity
    • pH
    • DO
  • Calculate maximum, minimum, mean, and median for variables that show differences in Willapa Bay. So far, these are temperature, delN, and tissue mass
  • Remake boxplots for significantly different peptides with secondary y-axes for variables found to be significant in the previous step
  • Add lines denoting calculated maximum, minimum, etc. on top of boxplots for protein expression

from yaaminiv.github.io http://ift.tt/2zBfrFl

Yaamini’s Notebook: Remaining Analyses Part 8

Visualizing environmental and biomarker data

I spent time recently making some figures and trying to figure out which environmental variables and biomarkers could explain my protein expression patterns.

Environmental variables:

Based on discussion from our second proteomics meeting, temperature looked like a candidate driver for protein expression differences in Willapa Bay. I visualized the diurnal fluctuations and added mean and median lines to a multipanel plot in this R script.


Figure 1. Diurnal temperature fluctuations within each site and habitat. The mean and median temperatures between habitats are not very different within sites, so it could be possible to pool or average the data somehow.

I also made a boxplot for temperatures at each site. The ANOVA was significant, and a post-hoc Tukey HSD test showed that all sites were significantly different from each other. I think this is because of the fluctuations between each site. The next step would be to do a “sliding window analysis,” where I run an ANOVA for 14 time points between sites, move over 7 time points, and then run the ANOVA again. Essentially, I would be able to isolate windows of time where there were significant temperature differences and where there weren’t.
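To make the sliding window idea concrete, here is a minimal Python sketch (not the R I would actually use) with a window of 14 time points and a step of 7. It only computes the one-way ANOVA F statistic for each window; turning F into a p-value needs the F distribution, which I've left out to keep the example dependency-free. The site names and temperature series are made up.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")  # no within-group variation to compare against
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def sliding_windows(n_points, width=14, step=7):
    """(start, end) index pairs for every full window of the given width."""
    return [(s, s + width) for s in range(0, n_points - width + 1, step)]


def sliding_anova(series_by_site, width=14, step=7):
    """Run the ANOVA on each window; returns (start, end, F) per window."""
    n_points = min(len(v) for v in series_by_site.values())
    results = []
    for start, end in sliding_windows(n_points, width, step):
        groups = [values[start:end] for values in series_by_site.values()]
        results.append((start, end, f_statistic(groups)))
    return results


# Hypothetical series: site_b runs 3 degrees warmer in the second half only
series_by_site = {
    "site_a": [10 + (i % 4) for i in range(28)],
    "site_b": [10 + (i % 4) + (3 if i >= 14 else 0) for i in range(28)],
}

results = sliding_anova(series_by_site)
for start, end, f_value in results:
    print(f"time points {start}-{end - 1}: F = {f_value:.2f}")
```

Because the windows overlap by half, each time point is tested twice, so the window-level results are not independent and would need some multiple-comparison correction.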


Figure 2. Temperature at each site. Willapa Bay had significantly different temperatures.

I also played around with the idea of a multipanel plot that puts the diurnal fluctuations and temperature boxplot side-by-side. I only made one of these since I don’t know what Steven would think of it. I like how it shows which temperatures were “extremes” and when those extremes were.


Figure 3. Multipanel plot with diurnal temperature fluctuations and quartiles for bare habitat at Case Inlet.

Micah sent over salinity data and low tide information, so I need to work with that to remove data we don’t trust from pH, DO and salinity. After that, I can see if any of those environmental variables explain higher protein abundance at Willapa Bay.


Alex sent over two different datasheets: data for all oysters he sampled and data for just the oysters I sampled. I decided to stick to the latter for now.

I generated boxplots for each biomarker in this R script. I then generated a table of ANOVA and Tukey HSD results. The annotated version can be found below.


Figure 4. ANOVA and post-hoc Tukey HSD results.

None of the results really jump out as being explanations for higher protein abundance in Willapa Bay. Tissue mass is significantly different between Willapa Bay-Fidalgo Bay (p = 5.37e-05) and just misses the 0.05 cutoff between Willapa Bay-Port Gamble Bay (p = 0.05394943). Percent N also differed between Willapa Bay-Fidalgo Bay (p = 0.08942135) and Willapa Bay-Port Gamble Bay (p = 0.09671321), though those differences would only count as significant at a 0.1 threshold. According to Alex’s notes, relatively higher percent N is a good thing, and Willapa Bay had higher percent N than Fidalgo Bay and Port Gamble Bay. I’ll run this by him on Friday when we meet.

My next step now is to tie together protein expression and these variables in one figure. To the (literal) drawing board!

from yaaminiv.github.io http://ift.tt/2zNMVo8

Yaamini’s Notebook: Environmental Data from Micah

More data, more problems…?

Micah sent over the salinity data. He said there are instrumental issues, and that we need to clip out any data from “low tide/exposure.” He uses this website for tidal data. He also misspoke, and there were no chlorophyll sensors in the 2016 deployment.

Table 1. Latitude and longitude for deployments.

Site  Habitat  Current Latitude  Current Longitude
CI    E        47.3584391        -122.7964495
CI    B        47.3579367        -122.7957627
FB    B        48.481691         -122.58353
FB    E        48.481342         -122.583529
PG    B        47.842676         -122.583832
PG    E        47.847983         -122.582919
SK    E        47.3543321        -123.1566232
SK    B        47.35523          -123.1572
WB    B        46.4944789        -124.0261356
WB    E        46.49508          -124.02652

He should have depth data for the following as well, but he didn’t have it at the moment. He said he will send more information over when he’s back in town on Wednesday.

from yaaminiv.github.io http://ift.tt/2zMQpaz

Grace’s Notebook: Juneau, November 27, 2017

*I will double check some specifics of information (denoted by * or __) with Pam and edit post accordingly.

Effects of temperature change and Hematodinium sp. infection (Bitter Crab Disease) on Tanner crab (Chionoecetes bairdi)

Final sampling day of Tanner Crab (Chionoecetes bairdi) blood 

Pam Jensen and I arrived at the Ted Stevens Marine Research Institute in Juneau at ~7:45am. We worked in the Annex (right image).

Experimental Set-up

There were three header tanks at different temperatures (4˚C – cold; 10˚C – warm; ~6˚C – ambient). The cold and ambient had coolers, and the warm had heaters. The water flows (2L/min) into the tanks.

There was a mass mortality in the warm tanks due to the temperature getting too high over a weekend. As a result, by today’s sampling day, there was only one crab in each of the three warm tanks.


Set-up of the lab

The header tanks with the coolers had brass floats. If the water went too high, the float would close the valve to the water source (much like a toilet) to prevent the water from overflowing and the temperature from becoming too cold. Additionally, both cooling tanks had pumps to circulate the water to homogenize the temperature.

The temperature was logged every 15 minutes using HOBO TidBits, with one TidBit in each tank (9 total).


Tanner Crab phlebotomy 

The crabs were corralled into one end of the tank, and a screen was placed to hold them back. Additionally, bricks were laid at the bottom because the crabs could crawl under the gaps in the screen. In between samples, we kept foam over the top of the tank, to keep the temperature from fluctuating too drastically, since the room was likely quite a bit warmer (even though we did not have the heat on) than the water in the tanks.


Crabs to be sampled.

Each crab has two tags with different numbers. There are two because sometimes a tag, or the entire leg with the tag, will come off. All the crabs are male and young in order to keep from having too many variables.


After the tag numbers were recorded with a tube number, a Q-tip with rubbing alcohol was rubbed on the area to clean it of other dinoflagellates. Then, a syringe was used to sample the blood at the base of the front claw.

A drop of blood is placed on a labeled slide to create a blood smear, and then 0.2 ml is placed in three tubes containing RNAlater (6 for the warm water crabs, because we only had three total).

The blood smears will be stained by _____ at _____. She will use the slides to determine the life stage of the Hematodinium.

Clean up

The blood smear slides were left to dry and then placed in a slide box. The HOBO TidBit temperature data were downloaded.

Pam and I began the process of taking down the lab. We had to sacrifice all the crabs even though they were collected from Stevens Passage in Juneau. Pam removed the tags and placed them in the dumpster.

We took apart all the plumbing and pumps, rinsed everything with freshwater, and drained the tanks.

Tomorrow we’ll clean, pack everything up, and ship things to the Kodiak Lab.

Other Info:

Pam said that while other types of infection can cause this mottled appearance in their legs, it is likely the result of Hematodinium infection because ~50% of Tanner Crabs are infected in the fall in the Juneau area.


Yaamini’s Notebook: Remaining Analyses Part 7

Column comparisons

Emma ran two of my samples to see if she could “replicate my poor technical replication.” Instead, she ended up with good technical replication. Each oyster’s technical replicates are closer to each other than to the other oyster’s. There still seems to be quite a distance between the two technical replicates in ordination space, but I guess that doesn’t matter (or could be due to the fact that she didn’t normalize the data).


Figure 1. Technical replication from Emma rerunning my samples.

She also provided us the Skyline output. I could use this to compare the area data with the area data from my run.

Emma suggested that I calculate predicted retention times by column. For the first column, I used PRTC retention times from oyster sample 4. My regression equation was 0.2734x + 11.52. I used this equation to predict retention times for my oyster peptides.

Figure 2. Predicted retention times for oyster peptides based on column 1 equation.

Using PRTC retention times from oyster sample 94, I got an equation of 0.2718x + 10.939 for column two.

Figure 3. Predicted retention times for oyster peptides based on column 2 equation.

There’s a slight difference in retention time from column 1 to 2, but not enough that I would have picked the wrong peak. So now I’m really stumped and have no idea why my technical replication wasn’t good.
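For reference, the size of the shift between columns can be read straight off the two regression equations. A quick Python sketch, reading the column 2 equation as 0.2718x + 10.939 and treating x as whatever predictor went into the PRTC regression (the x values below are arbitrary, chosen only to show how the shift changes):

```python
# (slope, intercept) from the PRTC regressions reported above
COLUMN_1 = (0.2734, 11.52)
COLUMN_2 = (0.2718, 10.939)


def predict_rt(x, slope, intercept):
    """Predicted retention time from a linear PRTC calibration."""
    return slope * x + intercept


def column_shift(x):
    """Predicted retention time on column 1 minus column 2 for the same x."""
    return predict_rt(x, *COLUMN_1) - predict_rt(x, *COLUMN_2)


# Arbitrary x values, not taken from my peptide list
for x in (0, 50, 100):
    rt1 = predict_rt(x, *COLUMN_1)
    rt2 = predict_rt(x, *COLUMN_2)
    print(f"x = {x}: column 1 = {rt1:.3f}, column 2 = {rt2:.3f}, "
          f"shift = {column_shift(x):.3f}")
```

The shift works out to 0.0016x + 0.581, so it grows slowly with x and is dominated by the difference in intercepts.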

Emma also suggested I find PRTC peptides where I’m confident in the concentrations and try and compare peak areas between column 1 and 2. Based on screenshots from this lab notebook entry, my PRTC peptides had relatively similar peak area magnitudes at the end of the first column and beginning of the second column (around samples 85-95).

So yeah, I’m stumped.

from yaaminiv.github.io http://ift.tt/2k6IygR