Yaamini’s Notebook: Remaining Analyses Part 11

Connecting the dots

Overwhelmed with the amount of data I had to work with, I did what any #ResponsibleGradStudent would do: talk to my adviser. Steven and I decided that the two best things to do would be to visualize diurnal fluctuations and regress peptide abundances against biomarkers. For the environmental variables, I’m not going to clip out any low tide exposure unless we think there’s something interesting to pursue and refine.

Environmental variables


Temperature

This is the variable that I really think could explain some differences in peptide abundance between sites.

R Script


Figure 1. Diurnal fluctuation of temperature at each site.


Figure 2. Differences in temperatures at each site.


Salinity

I was unable to clip out low tide exposure data, but as I mentioned above, I’m not going to clip it out unless something looks interesting. Aside from higher salinity at Fidalgo Bay (which could be an artifact of low tide exposure), I don’t see anything worth pursuing.

R Script


Figure 3. Diurnal fluctuation of salinity at each site.


Figure 4. Differences in salinity at each site.


pH

Willapa Bay had higher pH than Port Gamble Bay and Skokomish River Delta, but not higher than the other two sites. Unlike the other environmental variables, not all of my sites were significantly different from each other. The significant differences are between Fidalgo Bay and each of Port Gamble Bay (p = 0.0004023), Skokomish River Delta (p = 0.0028254), and Willapa Bay (p = 0.0002764).

R Script


Figure 5. Diurnal fluctuation of pH at each site.


Figure 6. Differences in pH at each site.

Dissolved Oxygen

Willapa Bay had slightly lower dissolved oxygen than the other sites. However, this could be skewed by the fact that there are some extreme exposure values!

R Script


Figure 7. Diurnal fluctuation of dissolved oxygen at each site.


Figure 8. Differences in dissolved oxygen at each site.


Biomarkers

To see if any of Alex’s biomarkers explained peptide abundance variation, I regressed each biomarker against each peptide. My many, many scatterplots can be found here.

I skimmed through all of the R-squared values from the regressions and didn’t see many over 0.5. Perhaps there’s a different method I should try.
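A minimal sketch of that screening step, written in Python rather than the R used in the notebook: for a regression with a single predictor, R-squared equals the squared Pearson correlation, so every biomarker-peptide pair can be scored without fitting full models. The dictionary layout (one value per oyster, in the same order for every variable) and the 0.5 cutoff are assumptions for illustration, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean


def r_squared(x, y):
    """R-squared of a simple least-squares fit of y on x (= squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable cannot explain (or be explained by) anything
    return sxy ** 2 / (sxx * syy)


def screen_pairs(biomarkers, peptides, cutoff=0.5):
    """Return (biomarker, peptide, R^2) for every pair at or above cutoff, best first."""
    hits = []
    for (b_name, b_vals), (p_name, p_vals) in product(biomarkers.items(), peptides.items()):
        r2 = r_squared(b_vals, p_vals)
        if r2 >= cutoff:
            hits.append((b_name, p_name, r2))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

R-squared only measures how well a straight line fits, so a low value does not rule out a curved relationship.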

from yaaminiv.github.io http://ift.tt/2i4efDB

Yaamini’s Notebook: Gonad Histology Check-in

Quick journey into Manchester world

Grace has been super busy analyzing my gonad histology data (#trumie)! She hasn’t finished classifying gonad maturation state or sex for all of the oysters yet, but here’s a quick peek into what she has so far.

Her classification spreadsheet can be found here. Within this file, she made pie charts to visualize sex ratios before and after pH exposure.



Figures 1-2. Preliminary pre- and post-OA sex ratios.

In the same file, I made some pie charts to compare gonad maturation state between treatments. She only has maturation state information for four oysters from before exposure, so I only made graphs for the post-exposure oysters. When I sampled, I took ten individuals per treatment.



Figures 3-4. Preliminary gonad maturation states, separated by treatment.

For this week’s class assignment, Steven also wanted me to generate a figure for my larval survival data. I think the one I have is sufficient for now…?


Figure 5. Larval mortality, separated by family.

from yaaminiv.github.io http://ift.tt/2jwuGbO

Yaamini’s Notebook: Remaining Analyses Part 10

Following through on my plan

…kind of.

I started to look up tidal data so I could clip out any low tide and exposure values, as Micah suggested. However, I could not find any tidal information for the Skokomish River Delta site on that database. The website had me pick the cities closest to the outplant sites. Here’s what I decided on.

Vaughn, Case Inlet

Port Gamble, Port Gamble Bay

Anacortes, Fidalgo Bay

Nahcotta, Willapa Bay

I’m also unclear as to what exposure we need to clip out. Are we taking out any data from all low tides, or just from complete instrument exposure? How do we know what the difference is? It seems like I need the depth data from Micah to proceed.
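Both clipping options can be written as a single threshold rule. This is a minimal sketch, and it assumes each reading has already been matched to a predicted tide height from the nearest station (Vaughn, Port Gamble, Anacortes, or Nahcotta). It also assumes the instrument's height on the same datum is known, which is what Micah's depth data would provide. `Reading`, `clip_exposure`, `sensor_height`, and `margin` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: str      # timestamp format is an assumption
    value: float        # salinity, pH, or DO reading
    tide_height: float  # predicted tide height (m) at the nearest station for this timestamp


def clip_exposure(readings, sensor_height, margin=0.0):
    """Keep readings taken while the predicted tide was above sensor_height + margin.

    sensor_height (hypothetical) must use the same datum as the tide predictions.
    margin = 0 drops only readings where the sensor was exposed; margin > 0 also
    drops readings from low tides when the sensor was barely covered.
    """
    return [r for r in readings if r.tide_height > sensor_height + margin]
```

With `margin=0.0` this removes only complete instrument exposure. A positive `margin` behaves more like "take out data from all low tides."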

Right now I think it makes the most sense to just visualize the salinity data without clipping out any funky values (it seems like Micah already did some of that anyways…?), and then see if it will be a candidate explanatory variable for my protein expression.

I’ve also been rethinking my original idea for a regression analysis. I think the power of such an analysis would be compromised since I only have protein expression from the very end of the experiment, as opposed to multiple time points. Condensing such high-resolution environmental data into one number (mean or median) to build a model off of seems unwise. I think I’m going to shoot an email off to Julian to see if he has any ideas on how to tackle this.

from yaaminiv.github.io http://ift.tt/2BuGsff

Laura’s Notebook: Finalized list of SRM data analysis stats

The Thanksgiving weekend provided time to ponder/reflect on the SRM stats that I’ve run thus far, what else needs to be done, and how to finish up. Received guidance from Dave-o. Note: next time, include a housekeeping protein in list of targets.

This is my “finalized” list of SRM & environmental stats to run. In the last few days I’ve completed much of this. In bold are those remaining. Tomorrow I’ll hopefully be done, and will post my scripts and results.

SRM Protein & Environmental Data Analysis Steps

Each Protein: (assume proteins are independent)

  1. Test for normality
  2. Lambda transformation
  3. Test for normality post transformation
  4. Assess outliers, remove if necessary
  5. N-way ANOVA by: a) location b) habitat c) site d) region
  6. Determine P-adjusted, correct for multiple comparisons (Bonferroni method, P/13)
  7. Post-hoc test to ID differences (is this really necessary?)
  8. Ultimate goal: which proteins are different between locations?
  9. Compare total abundance between sites (sum peptide abundance)
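Steps 2 and 6 can be sketched as follows. Tukey's ladder of powers raises the data to a power λ, using a log at λ = 0 and a negated power for λ < 0. R's rcompanion `transformTukey` chooses λ with a Shapiro-Wilk test by default. To stay dependency-free, this sketch swaps in a simpler criterion (minimizing absolute skewness), so treat it as an approximation. It assumes strictly positive, non-constant abundances. The Bonferroni threshold is just 0.05 divided by the 13 proteins.

```python
import math
from statistics import mean, pstdev


def tukey_ladder(x, lam):
    """Tukey's ladder of powers; assumes every value in x is > 0."""
    if lam > 0:
        return [v ** lam for v in x]
    if lam == 0:
        return [math.log(v) for v in x]
    return [-(v ** lam) for v in x]  # negate so the transform stays order-preserving


def skewness(x):
    """Population skewness: mean cubed deviation divided by sd cubed."""
    m, s = mean(x), pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)


def choose_lambda(x, lambdas=None):
    """Return the lambda whose transformed data are least skewed (stand-in criterion)."""
    if lambdas is None:
        lambdas = [i / 4 for i in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25
    return min(lambdas, key=lambda lam: abs(skewness(tukey_ladder(x, lam))))


# Step 6: Bonferroni-adjusted threshold when testing 13 proteins.
ALPHA = 0.05
N_PROTEINS = 13
BONFERRONI_ALPHA = ALPHA / N_PROTEINS  # about 0.0038
```

For abundances that grow by orders of magnitude, the log (λ = 0) usually wins. A normality test on the transformed values (step 3) is still worth running afterwards.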

Each environmental variable:

  1. Download tidal chart data for each site
  2. Edit pH, DO & Salinity data:
    a. Remove data from exposed time points, as determined from tidal charts
    b. Identify and remove outliers from pH, DO & Salinity data
    c. Recombine outlier-scrubbed data with Temp and Tide data.
  3. Assess Normality of each env. variable (all time points)
    • Found to be non-normal (pH is kinda, but let’s assume not). Dataset is large (>6000 observations for each parameter), so did not determine lambda via the transformTukey function. Instead, used the Kruskal-Wallis non-parametric test in lieu of ANOVA
  4. KW test for each env. variable by location, by region
  5. Dunn’s post-hoc test to ID differences
  6. Use Bonferroni correction for P-adjusted in tests
  7. Ultimate goal: which env. variables are different between locations?
    a. basically all of them.
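The Kruskal-Wallis test in step 4 ranks all observations together and asks whether the mean rank differs between locations. Below is a pure-Python sketch of the tie-corrected H statistic, assuming the readings have already been split into one list per location. The p-value comes from a chi-square distribution with k - 1 degrees of freedom for k groups, and SciPy's `scipy.stats.kruskal` computes both H and the p-value directly.

```python
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups (not all values tied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        rank_term += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over sets of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))
```

With more than 6000 readings per parameter, even small differences in mean rank produce a large H. That may be part of why "basically all of them" come out different.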

Prep for regression model:

  1. Calculate summary statistics: mean, variance, sd, min, max, median, %>1 sd from mean, %>2 sd from mean
  2. Plot() all env. variables: are any linearly related, aka not independent? If so, need to include interaction parameter in regression model.
  3. Plot() protein peptides against each other to confirm linear correlation; equation should be ~1:1.
  4. If all correlated select 1 peptide to use in regression model; highest abundance is best.
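Step 1 is straightforward to script. This is a minimal sketch that assumes "%>1 sd from mean" means the share of readings more than one standard deviation from the mean in either direction. It uses the sample (n - 1) variance and sd.

```python
from statistics import mean, median, stdev, variance


def summarize(values):
    """Summary statistics from step 1 for one environmental variable."""
    m = mean(values)
    sd = stdev(values)

    def pct_beyond(k):
        # Percent of observations more than k standard deviations from the mean (either side).
        return 100.0 * sum(abs(v - m) > k * sd for v in values) / len(values)

    return {
        "mean": m,
        "variance": variance(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "pct_gt_1sd": pct_beyond(1),
        "pct_gt_2sd": pct_beyond(2),
    }
```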

Run regression models for each representative peptide:

  1. Step-wise linear regression models with all env. variables; I would expect that only the variables that were found to be different via the ANOVA would significantly contribute to the model
  2. General linear model with variables ID’d in step-wise lm
  3. Figure out when to add a constant, and if I should do that in this scenario
  4. Run anova on best fit model, find P-value of the env. variables to determine confidence in the influence of each env. variable on proteins.
  5. Run model on the other peptides in the protein (not used as representative peptides); ID the R^2 and P values

from LabNotebook http://ift.tt/2Ag6i95

Grace’s Notebook: Juneau, November 28, 2017

Today Pam and I cleaned and packed up the lab!

We sent the coolers, pumps, and other things to the Kodiak Lab, and sent some of her things back to Seattle.

We power washed the tanks and scrubbed them down.

Pam has to send samples with ethanol, and they can only be sent through UPS. In Juneau, UPS is only open from 7pm-8pm.

Tomorrow Pam is giving a talk around 11am about this project. We have a few more things to clean and pack up, and then we’ll be done!

Yaamini’s Notebook: Remaining Analyses Part 9

My final NMDS

The first way I thought I could tie together environmental data with protein expression was an NMDS plot with eigenvectors based on environmental variables. I realized that wasn’t possible only after I made my NMDS by region in this script.


Figure 1. NMDS for protein expression by region (Puget Sound vs. Willapa Bay).

I actually ended up with a significant ANOSIM result when splitting up the regions! I got R = 0.2368 instead of an expected R of -0.002, with a p-value of 0.31.
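The ANOSIM R statistic compares ranked dissimilarities between and within groups: R = (r_B - r_W) / (M / 2). Here r_B and r_W are the mean ranks of between-group and within-group pairs, and M = n(n - 1)/2 is the number of pairs. R near 1 means samples are more similar within regions than between them, and R near 0 means no separation. The pure-Python sketch below adds a label-permutation p-value. It assumes a square dissimilarity matrix (for example, the one behind the NMDS) and one region label per sample, with at least two samples per region. The permutation count and seed are arbitrary choices.

```python
import random
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """ANOSIM R from a square dissimilarity matrix and one group label per sample."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim(dist, labels, permutations=999, seed=1):
    """Observed R, mean R under label permutation, and a permutation p-value."""
    observed = anosim_r(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(permutations):
        rng.shuffle(shuffled)
        null.append(anosim_r(dist, shuffled))
    p_value = (sum(r >= observed for r in null) + 1) / (permutations + 1)
    return observed, sum(null) / len(null), p_value
```

The mean of the permuted R values is the "expected R" under no regional difference. The p-value is the share of permutations at least as extreme as the observed R.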

Since the eigenvector thing isn’t going to work out, here’s my game plan to create figures with protein expression, environmental variable and biomarker data:

  • Clip out low tide/exposure recordings from salinity, pH, and dissolved oxygen data
  • Visualize diurnal fluctuations
    • Salinity
    • pH
    • DO
  • Boxplots
    • Salinity
    • pH
    • DO
  • Calculate maximum, minimum, mean, and median for variables that show differences in Willapa Bay. So far, these are temperature, delN, and tissue mass
  • Remake boxplots for significantly different peptides with secondary y-axes for variables found to be significant in the previous step
  • Add lines denoting calculated maximum, minimum, etc. on top of boxplots for protein expression

from yaaminiv.github.io http://ift.tt/2zBfrFl

Yaamini’s Notebook: Remaining Analyses Part 8

Visualizing environmental and biomarker data

I spent time recently making some figures and trying to figure out which environmental variables and biomarkers could explain my protein expression patterns.

Environmental variables:

Based on discussion from our second proteomics meeting, temperature looked like a candidate driver for protein expression differences in Willapa Bay. I visualized the diurnal fluctuations and added mean and median lines to a multipanel plot in this R script.


Figure 1. Diurnal temperature fluctuations within each site and habitat. The mean and median temperatures between habitats are not very different within sites, so it could be possible to pool or average the data somehow.
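The caption suggests that mean and median temperatures barely differ between habitats within a site. That can be checked numerically before deciding whether to pool. This is a minimal sketch, assuming the logger data are available as (site, habitat, temperature) records; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def habitat_summaries(records):
    """records: iterable of (site, habitat, temperature) -> {(site, habitat): (mean, median)}."""
    groups = defaultdict(list)
    for site, habitat, temp in records:
        groups[(site, habitat)].append(temp)
    return {key: (mean(vals), median(vals)) for key, vals in groups.items()}


def habitat_spread(summaries):
    """Per site, the largest between-habitat difference in mean and in median."""
    by_site = defaultdict(list)
    for (site, _habitat), stats in summaries.items():
        by_site[site].append(stats)
    return {
        site: (max(m for m, _ in stats) - min(m for m, _ in stats),
               max(md for _, md in stats) - min(md for _, md in stats))
        for site, stats in by_site.items()
    }
```

Small spreads relative to the daily temperature range would support pooling habitats within a site.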

I also made a boxplot for temperatures at each site. The ANOVA was significant, and a post-hoc Tukey HSD test showed that all sites were significantly different from each other. I think this is because of the fluctuations at each site. The next step would be to do a “sliding window analysis,” where I run an ANOVA on 14 time points between sites, move over 7 time points, and then run the ANOVA again. Essentially, I would be able to isolate windows of time where there were significant temperature differences and where there weren’t.
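The sliding window idea fits in a few lines. Each site's series is cut into windows of 14 time points that advance by 7, and a one-way ANOVA F statistic is computed for each window. This sketch assumes the sites' temperature series are aligned on the same time points, and the function names are hypothetical. Turning F into a p-value needs the F distribution (`scipy.stats.f_oneway` returns both). With many windows, those p-values would also need a multiple-comparison correction.

```python
from statistics import mean


def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(pooled)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def sliding_window_f(series_by_site, window=14, step=7):
    """Return (start_index, F) for each window of `window` aligned time points, moving by `step`."""
    series = list(series_by_site.values())
    length = min(len(s) for s in series)  # only use time points every site has
    results = []
    for start in range(0, length - window + 1, step):
        chunks = [s[start:start + window] for s in series]
        results.append((start, one_way_f(*chunks)))
    return results
```

Because the windows overlap by 7 points, neighboring results are not independent. The output is best read as a map of when sites diverge, not as separate tests.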


Figure 2. Temperature at each site. Willapa Bay had significantly different temperatures.

I also played around with the idea of a multipanel plot that puts the diurnal fluctuations and temperature boxplot side-by-side. I only made one of these since I don’t know what Steven would think of it. I like how it shows which temperatures were “extremes” and when those extremes were.


Figure 3. Multipanel plot with diurnal temperature fluctuations and quartiles for bare habitat at Case Inlet.

Micah sent over salinity data and low tide information, so I need to work with that to remove data we don’t trust from pH, DO and salinity. After that, I can see if any of those environmental variables explain higher protein abundance at Willapa Bay.


Alex sent over two different datasheets: data for all oysters he sampled and data for just the oysters I sampled. I decided to stick to the latter for now.

I generated boxplots for each biomarker in this R script. I then generated a table of ANOVA and Tukey HSD results. The annotated version can be found below.


Figure 4. ANOVA and post-hoc Tukey HSD results.

None of the results really jump out as being explanations for higher protein abundance in Willapa Bay. Tissue mass is significantly different between Willapa Bay-Fidalgo Bay (p = 5.37e-05) and between Willapa Bay-Port Gamble Bay (p = 0.05394943). Percent N was also significantly different between Willapa Bay-Fidalgo Bay (p = 0.08942135) and Willapa Bay-Port Gamble Bay (p = 0.09671321). According to Alex’s notes, relatively higher percent N is a good thing, and Willapa Bay had higher percent N than Fidalgo Bay and Port Gamble Bay. I’ll run this by him on Friday when we meet.

My next step now is to tie together protein expression and these variables in one figure. To the (literal) drawing board!

from yaaminiv.github.io http://ift.tt/2zNMVo8