Yaamini’s Notebook: Environmental Data from Micah Part 2

The last bit of data!

(I think).

Micah sent over growth data for the oysters, as well as site rankings for outplant depth and eelgrass extent in the bay.

Outplant elevation, from deepest to shallowest: Case Inlet (never dry), Fidalgo Bay (never dry), Skokomish (exposed at low tide), Port Gamble Bay (exposed at low tide), Willapa Bay (frequently exposed at low tide)

Eelgrass extent in bay, from most to least: Fidalgo Bay (eelgrass dominant), Willapa Bay (eelgrass dominant), Port Gamble Bay (eelgrass common), Case Inlet (eelgrass common), Skokomish (eelgrass limited)

from yaaminiv.github.io http://ift.tt/2AA1065


Yaamini’s Notebook: Environmental Data Meeting Part 3

Yesterday’s meeting notes

In the third installment of our “what does all of this actually mean” meetings, Micah, Alex, Emma, Brent, Laura, Steven and I discussed the progress we’ve made integrating all of our data into one cohesive story.


  • Dissolved oxygen measurements
    • FB most eelgrass dominated, higher pH, could have daily supersaturation (DO > 12)
      • Need to do literature survey to verify measurements are “real”
    • Padilla Bay: DO ~ 19.3 max for sensors that never come out of the water
  • Should clip DO, pH and salinity data
    • Conservative one hour/one foot clipping
    • Use Union for SK tidal data
    • Just use bare for all sites
    • Correcting values to the right mean salinity from sensors can be difficult and can lead to discrepancies
    • Drops in salinity and pH at the end of the records could indicate sensor burial
  • Can examine brief window of environmental data one or two days before sampling
  • Number of low tides could be interesting
    • e.g., lots of drops in salinity at WB –> could the number of low tides affect protein expression?
  • Eelgrass extent as an explanatory variable
    • Global eelgrass effect could override any bare sites?
  • Biomarker data
    • Ignore fatty acid data for now since there’s a low sample size
    • Final height is a proxy for growth
    • FB grew the most, CI grew the least
    • Tissue mass highest in FB, then PG. WB, SK, and CI were similar
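
The "conservative one hour/one foot clipping" above isn't spelled out, so this Python sketch assumes one possible reading of it. Any DO, pH or salinity reading is dropped if it falls within one hour of a predicted tide that is less than one foot above the sensor. The sensor elevation, the tide predictions and the function names are all hypothetical. Real predictions would come from the nearest tide station (e.g. Union for SK).

```python
from datetime import datetime, timedelta

# ASSUMED interpretation of "one hour/one foot" clipping; adjust once the rule is confirmed.
TIME_BUFFER = timedelta(hours=1)
HEIGHT_BUFFER_FT = 1.0

def risky_times(tides, sensor_elevation_ft):
    """Tide prediction times when water is less than HEIGHT_BUFFER_FT above the sensor."""
    return [t for t, height_ft in tides if height_ft < sensor_elevation_ft + HEIGHT_BUFFER_FT]

def clip_readings(readings, tides, sensor_elevation_ft):
    """Keep (time, value) readings more than TIME_BUFFER away from every risky tide time."""
    risky = risky_times(tides, sensor_elevation_ft)
    return [(t, v) for t, v in readings if all(abs(t - r) > TIME_BUFFER for r in risky)]

# Made-up example: one low tide at 03:00, sensor at 0 ft.
tides = [(datetime(2016, 6, 1, 0), 3.0), (datetime(2016, 6, 1, 3), 0.5),
         (datetime(2016, 6, 1, 6), 4.0)]
readings = [(datetime(2016, 6, 1, h, m), 29.0)
            for h, m in [(0, 0), (1, 30), (2, 30), (3, 0), (4, 0), (5, 0)]]
clipped = clip_readings(readings, tides, sensor_elevation_ft=0.0)
```

Here `clipped` keeps the 00:00, 01:30 and 05:00 readings. The 04:00 reading is dropped because it sits exactly one hour from the low tide. The clipping can only be as fine as the spacing of the tide predictions.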

Next steps:

  • Figure out biomarker comparison table code
  • Scrub data
  • Make environmental variable table
    • Average
    • Median
    • Maximum
    • Minimum
    • Standard deviation/variance
    • Number of observations more than 1 SD or 2 SDs above/below the mean
    • Number of exposures/low tides
    • Days exposed
    • Total time exposed
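
A Python sketch of the per-site environmental variable table. It assumes each variable at each site is a list of evenly spaced readings, and that exposure has already been flagged per reading (for instance with a tide-based rule). "Above/below SD/2 SDs" is read here as more than one or two sample standard deviations from the mean in either direction, which is an assumption.

```python
import statistics
from datetime import datetime, timedelta

def summarize(values):
    """Summary statistics for one environmental variable at one site."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two readings
    return {
        "mean": mean,
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "sd": sd,
        "variance": statistics.variance(values),
        # counts of readings more than 1 or 2 SDs from the mean, either direction
        "n_beyond_1sd": sum(abs(v - mean) > sd for v in values),
        "n_beyond_2sd": sum(abs(v - mean) > 2 * sd for v in values),
    }

def exposure_summary(times, exposed, interval):
    """Exposure metrics from per-reading True/False flags.

    `times` are reading timestamps, `exposed` the matching flags, and
    `interval` the (assumed constant) spacing between readings.
    """
    # an exposure event starts wherever a True follows a False (or opens the record)
    n_events = sum(1 for i, e in enumerate(exposed) if e and (i == 0 or not exposed[i - 1]))
    days = {t.date() for t, e in zip(times, exposed) if e}
    return {
        "n_exposures": n_events,
        "days_exposed": len(days),
        "total_time_exposed": interval * sum(exposed),
    }

# Made-up hourly record spanning midnight
times = [datetime(2016, 6, 1, 22) + timedelta(hours=h) for h in range(6)]
flags = [False, True, True, False, True, False]
exposure = exposure_summary(times, flags, timedelta(hours=1))
```

In the example, the two separate runs of `True` count as two exposures. They touch two calendar dates and add up to three hours.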

from yaaminiv.github.io http://ift.tt/2jTd5Ls

Yaamini’s Notebook: Remaining Analyses Part 12

Some more regressions

I tried to write a for loop in this R script to make a table with each peptide vs. biomarker comparison, its R-squared value, and its slope… but I’m hardcore struggling with it. I wrote a for loop within a for loop to create all of the plots, but now I can’t write another for loop to take all of the information I’m generating and put it in a new dataframe. I’m going to keep trying though!
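
One way around the nested-loop problem is to avoid growing a table inside the loop at all. Instead, collect one small record per comparison in a list and turn the list into a table once at the end. In R, the analogous pattern is filling a list and calling `do.call(rbind, ...)` afterwards. This Python sketch assumes peptide abundance is the response and the biomarker the predictor, and that every list holds the same oysters in the same order. The names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope, intercept and R-squared for y ~ x (x must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def regression_table(peptides, biomarkers):
    """One row per peptide-biomarker pair; inputs map a name to a list of values."""
    rows = []  # accumulate plain records; build the table once, after the loops
    for pep_name, pep_values in peptides.items():
        for bio_name, bio_values in biomarkers.items():
            slope, _, r2 = fit_line(bio_values, pep_values)
            rows.append({"peptide": pep_name, "biomarker": bio_name,
                         "r_squared": r2, "slope": slope})
    return rows

# Made-up data for two peptides and one biomarker
peptides = {"pepA": [2, 4, 6, 8], "pepB": [1, 1, 2, 2]}
biomarkers = {"final_height": [1, 2, 3, 4]}
table = regression_table(peptides, biomarkers)
```

The list of records can then be handed to whatever table structure is convenient, such as a dataframe.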

After looking at my peptide vs. biomarker regressions, Steven suggested I make the same plots for each site. I used the R script linked above to do that. The plots can be found in these folders:

Case Inlet

Fidalgo Bay

Port Gamble Bay

Skokomish River Delta

Willapa Bay

Once I figure out how to get my triple for loop to work, I’ll make a table for this information too. Now I guess I’ll wait for our meeting with Micah and Alex to see what to do next.

from yaaminiv.github.io http://ift.tt/2ArUsbA

Yaamini’s Notebook: Remaining Analyses Part 11

Connecting the dots

Overwhelmed with the amount of data I had to work with, I did what any #ResponsibleGradStudent would do: talk to my adviser. Steven and I decided that the two best things to do would be to visualize diurnal fluctuations and regress peptide abundances against biomarkers. For the environmental variables, I’m not going to clip out any low tide exposure unless we think there’s something interesting to pursue and refine.
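
Before plotting, diurnal fluctuation can be summarized by collapsing each site's record onto the hour of day. This Python sketch assumes readings arrive as (site, timestamp, value) tuples, and the example values are made up. It illustrates the idea only and is not the R script linked in this post.

```python
from collections import defaultdict
from datetime import datetime

def diurnal_means(readings):
    """Mean value for each site at each hour of day (0-23), pooling across dates."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for site, timestamp, value in readings:
        sums[site][timestamp.hour] += value
        counts[site][timestamp.hour] += 1
    return {
        site: {hour: sums[site][hour] / counts[site][hour] for hour in sorted(sums[site])}
        for site in sorted(sums)
    }

readings = [
    ("FB", datetime(2016, 6, 1, 14), 15.0),
    ("FB", datetime(2016, 6, 2, 14), 17.0),
    ("FB", datetime(2016, 6, 1, 2), 11.0),
    ("WB", datetime(2016, 6, 1, 14), 20.0),
]
profile = diurnal_means(readings)
```

Plotting each site's hour-to-mean mapping as a line gives one diurnal curve per site.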

Environmental variables


Temperature

This is the variable that I’m most confident could explain some differences in peptide abundance between sites.

R Script


Figure 1. Diurnal fluctuation of temperature at each site.


Figure 2. Differences in temperatures at each site.


Salinity

I was unable to clip out low tide exposure data, but like I mentioned above, I’m not going to clip it out unless something looks interesting. Aside from higher salinity at Fidalgo Bay (which could be an artifact of low tide exposure), I don’t see anything worth pursuing.

R Script


Figure 3. Diurnal fluctuation of salinity at each site.


Figure 4. Differences in salinity at each site.


pH

Willapa Bay had higher pH than Port Gamble Bay and Skokomish River Delta, but not higher than the other two sites. Unlike the other environmental variables, not all of my sites were significantly different from each other. The significant differences are between Fidalgo Bay and each of Port Gamble Bay (p = 0.0004023), Skokomish River Delta (p = 0.0028254), and Willapa Bay (p = 0.0002764).

R Script


Figure 5. Diurnal fluctuation of pH at each site.


Figure 6. Differences in pH at each site.

Dissolved Oxygen

Willapa Bay had slightly lower dissolved oxygen than the other sites. However, this could be skewed by the fact that there are some extreme exposure values!

R Script


Figure 7. Diurnal fluctuation of dissolved oxygen at each site.


Figure 8. Differences in dissolved oxygen at each site.


Biomarkers

To see if any of Alex’s biomarkers explained peptide abundance variation, I regressed each biomarker against each peptide. My many, many scatterplots can be found here.

I skimmed through all of the R-squared values for the regressions and didn’t see many over 0.5. Perhaps there’s a different method I should try.

from yaaminiv.github.io http://ift.tt/2i4efDB

Yaamini’s Notebook: Gonad Histology Check-in

Quick journey into Manchester world

Grace has been super busy analyzing my gonad histology data (#trumie)! She hasn’t finished classifying sex and gonad maturation state for all of the oysters yet, but here’s a quick peek at what she has so far.

Her classification spreadsheet can be found here. Within this file, she made pie charts to visualize sex ratios before and after pH exposure.



Figures 1-2. Preliminary pre- and post-OA sex ratios.

In the same file, I made some pie charts to look at gonad maturation state between treatments. She only has maturation state information for four oysters from before exposure, so I only made graphs for the post-exposure oysters. When I sampled, I took ten individuals per treatment.



Figures 3-4. Preliminary gonad maturation states, separated by treatment.

For this week’s class assignment, Steven also wanted me to generate a figure for my larval survival data. I think the one I have is sufficient for now…?


Figure 5. Larval mortality, separated by family.

from yaaminiv.github.io http://ift.tt/2jwuGbO

Yaamini’s Notebook: Remaining Analyses Part 10

Following through on my plan

…kind of.

I started to look up tidal data so I could clip out any low tide and exposure values, as Micah suggested. However, I could not find any tidal information for the Skokomish River Delta site in that database. The website had me pick the city closest to each outplant site. Here’s what I decided on:

Vaughn, Case Inlet

Port Gamble, Port Gamble Bay

Anacortes, Fidalgo Bay

Nahcotta, Willapa Bay

I’m also unclear as to what exposure we need to clip out. Are we taking out any data from all low tides, or just from complete instrument exposure? How do we know what the difference is? It seems like I need the depth data from Micah to proceed.

Right now I think it makes the most sense to just visualize the salinity data, without clipping out any funky values (it seems like Micah already did some of that anyways…?) and then see if it will be a candidate explanatory variable for my protein expression. I’ve also been rethinking my original idea for a regression analysis. I think the power of such an analysis would be compromised since I only have protein expression from the very end of the experiment, as opposed to multiple time points. Condensing such high resolution environmental data into one number (mean or median) to build a model off of seems unwise. I think I’m going to shoot an email off to Julian to see if he has any ideas on how to tackle this.

from yaaminiv.github.io http://ift.tt/2BuGsff

Laura’s Notebook: Finalized list of SRM data analysis stats

The Thanksgiving weekend provided time to ponder/reflect on the SRM stats that I’ve run thus far, what else needs to be done, and how to finish up. Received guidance from Dave-o. Note: next time, include a housekeeping protein in the list of targets.

This is my “finalized” list of SRM & environmental stats to run. In the last few days I’ve completed much of this. In bold are those remaining. Tomorrow I’ll hopefully be done, and will post my scripts and results.

SRM Protein & Environmental Data Analysis Steps

Each Protein: (assume proteins are independent)

  1. Test for normality
  2. Lambda transformation
  3. Test for normality post transformation
  4. Assess outliers, remove if necessary
  5. N-way ANOVA by: a) location b) habitat c) site d) region
  6. Determine P-adjusted, correcting for multiple comparisons (Bonferroni method, P/13)
  7. Post-hoc test to ID differences (is this really necessary?)
  8. Ultimate goal: which proteins are different between locations?
  9. Compare total abundance between sites (sum peptide abundance)

Each environmental variable:

  1. Download tidal chart data for each site
  2. Edit pH, DO & Salinity data:
    a. Remove data from exposed time points, as determined from tidal charts
    b. Identify and remove outliers from pH, DO & Salinity data
    c. Recombine outlier-scrubbed data with Temp and Tide data.
  3. Assess Normality of each env. variable (all time points)
    • Found to be non-normal (pH is roughly normal, but let’s assume not). Dataset is large (>6000 for each parameter), so did not determine lambda via tukeytransform function. Instead, used Kruskal-Wallis non-parametric analysis in lieu of ANOVA.
  4. KW test for each env. variable by location, by region
  5. Dunn’s post-hoc test to ID differences
  6. Use Bonferroni correction for P-adjusted in tests
  7. Ultimate goal: which env. variables are different between locations?
    a. basically all of them.
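
For reference, this stdlib-only Python sketch computes the Kruskal-Wallis H statistic, with the usual tie correction. It also runs Dunn's pairwise z-tests with a Bonferroni adjustment. It returns H but not its p-value, which comes from a chi-squared distribution with k - 1 degrees of freedom (`scipy.stats.kruskal` reports both). Group names and data are hypothetical. This is a re-derivation for illustration, not the code used in the notebook.

```python
import itertools
import math
from collections import Counter

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H and Bonferroni-adjusted Dunn tests.

    `groups` maps a location to its observations. Returns (H, {(a, b): (z, p_adj)}).
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks = average_ranks(pooled)
    n = len(pooled)
    mean_rank, start = {}, 0
    for name in names:  # pooled keeps each group's values contiguous
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = 12 / (n * (n + 1)) * sum(
        len(groups[g]) * (mean_rank[g] - (n + 1) / 2) ** 2 for g in names)
    h /= 1 - ties / (n ** 3 - n)  # undefined if every value is tied
    pairs = list(itertools.combinations(names, 2))
    dunn = {}
    for a, b in pairs:
        var = (n * (n + 1) / 12 - ties / (12 * (n - 1))) * (1 / len(groups[a]) + 1 / len(groups[b]))
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(var)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
        dunn[(a, b)] = (z, min(1.0, p * len(pairs)))
    return h, dunn

h, dunn = kruskal_dunn({"CI": [1, 2, 3], "FB": [4, 5, 6]})
```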

Prep for regression model:

  1. Calculate summary statistics: mean, variance, sd, min, max, median, %>1 sd from mean, %>2 sd from mean
  2. Plot() all env. variables: are any linearly related, i.e., not independent? If so, need to include interaction parameter in regression model.
  3. Plot() protein peptides against each other to confirm linear correlation; equation should be ~1:1.
  4. If all are correlated, select 1 peptide to use in regression model; highest abundance is best.

Run regression models for each representative peptide:

  1. Step-wise linear regression models with all env. variables; I would expect that only the variables that were found to be different via the Kruskal-Wallis tests would significantly contribute to the model
  2. General linear model with variables ID’d in step-wise lm
  3. Figure out when to add a constant, and if I should do that in this scenario
  4. Run ANOVA on the best-fit model; find the P-value of each env. variable to determine confidence in its influence on proteins.
  5. Run model on the other peptides in the protein (not used as representative peptides); ID the R^2 and P values

from LabNotebook http://ift.tt/2Ag6i95