Yaamini’s Notebook: MEPS Revisions Part 3

Final unconstrained and constrained ordinations

I finally figured out all of my N/A problems and completed my unconstrained and constrained ordinations!

Protein abundance

I didn’t have any problems with my protein abundance NMDS, and I had already obtained pairwise ANOSIM R statistics and p-values. In this R script, I visualized my ordinations. You can find one version with sample numbers here, and another with site and habitat distinctions here. Because I exported my files as PDFs and don’t feel like changing the code and re-exporting them, I’m also including screenshots of the ordinations.

Figures 1 and 2. Protein abundance ordinations with confidence ellipses.

Environmental data

My original problem with the environmental data was that my data frame contained N/As, so I couldn’t use method = "gower" with the metaMDS function in the vegan package. It turns out that metaMDS relies on vegdist, which doesn’t handle N/As. Instead, I used the daisy function in the cluster package to calculate a Gower dissimilarity (distance) matrix, then passed that matrix directly into metaMDS with method = "euclidean". It worked! My code can be found here. I then conducted an Analysis of Similarity (ANOSIM) to assess the significance of my results.
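Gower’s coefficient tolerates missing values because each variable is scaled by its range, and for each pair of samples the per-variable distances are averaged over only the variables observed in both samples. As I understand it, that is what daisy(..., metric = "gower") does for numeric columns. The Python sketch below illustrates the idea on numeric data with NaN standing in for N/A. The function name and example data are hypothetical, and it skips the factor, ordinal, and binary variable types daisy also supports.

```python
import numpy as np


def gower_dissimilarity(X):
    """Gower dissimilarity between rows of a numeric matrix that may contain NaN.

    Each column is scaled by its range over the observed (non-NaN) values.
    For each pair of rows, the scaled absolute differences are averaged over
    only the columns observed in both rows; pairs sharing no observed column
    are left as NaN.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    ranges = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero distance
    observed = ~np.isnan(X)
    D = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            both = observed[i] & observed[j]
            if both.any():
                diffs = np.abs(X[i, both] - X[j, both]) / ranges[both]
                D[i, j] = D[j, i] = diffs.mean()
    return D


# Hypothetical example: 3 samples x 2 variables, one missing value
X = [[1.0, 10.0],
     [3.0, np.nan],
     [2.0, 20.0]]
D = gower_dissimilarity(X)
```

Here sample 2 is compared with the others on the first variable only, so its missing value doesn’t block the calculation. A precomputed matrix like D is what gets handed to the NMDS, which is why feeding the daisy output into metaMDS sidesteps vegdist.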

Table 1. One-way ANOSIM results for environmental data NMDS based on site (Case Inlet vs. Fidalgo Bay vs. Port Gamble Bay vs. Skokomish River Delta vs. Willapa Bay), habitat (bare vs. eelgrass), parameter (mean vs. variance), and environmental variable (pH vs. dissolved oxygen vs. salinity vs. temperature). Significant p-values are bolded.

One-way ANOSIM          R            p-value
Site                    -0.03428841  0.943
Habitat                 -0.02037461  0.982
Parameter               0.7378174    0.001
Environmental Variable  0.1796322    0.001

Only the parameter and environmental variable ANOSIMs were significant. My guess is that ordinating means and variances in the same space is skewing my results. To dig further, I conducted two-way ANOSIMs.
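The ANOSIM R statistic compares the mean rank of between-group dissimilarities (r_B) with the mean rank of within-group dissimilarities (r_W): R = (r_B - r_W) / (M / 2), where M is the number of sample pairs. R near 1 means groups are well separated, and R near 0 (or negative) means they aren’t. Here is a minimal Python sketch of the one-way test, written to mirror how I understand vegan’s anosim to work, including a permutation p-value that counts the observed statistic as one of the permutations. The function names are hypothetical, and this is an illustration rather than the code behind Tables 1 and 2.

```python
import numpy as np
from scipy.stats import rankdata


def anosim_r(D, groups):
    """One-way ANOSIM R from a square dissimilarity matrix D and group labels."""
    D = np.asarray(D, dtype=float)
    groups = np.asarray(groups)
    iu = np.triu_indices(len(groups), k=1)      # each sample pair once
    ranks = rankdata(D[iu])                     # tied dissimilarities get average ranks
    within = groups[iu[0]] == groups[iu[1]]     # True where both samples share a group
    M = len(ranks)
    # Assumes at least one within-group and one between-group pair exist
    return (ranks[~within].mean() - ranks[within].mean()) / (M / 2)


def anosim(D, groups, permutations=999, seed=0):
    """Observed R plus a permutation p-value from shuffling the group labels."""
    rng = np.random.default_rng(seed)
    r_obs = anosim_r(D, groups)
    perm = np.array([anosim_r(D, rng.permutation(np.asarray(groups)))
                     for _ in range(permutations)])
    p_value = (1 + np.sum(perm >= r_obs)) / (1 + permutations)
    return r_obs, p_value
```

Pairwise ANOSIMs amount to subsetting D and groups to two levels at a time before calling anosim. Two-way ANOSIMs use a different, stratified permutation scheme that this sketch doesn’t cover.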

Table 2. Two-way ANOSIM results for environmental data NMDS. Significant p-values are bolded.

Two-way ANOSIM                        R            p-value
Site and Habitat                      -0.09991229  1
Parameter and Site                    0.4778027    0.001
Parameter and Habitat                 0.4778027    0.001
Environmental Variable and Site       0.02098092   0.315
Environmental Variable and Habitat    0.1089352    0.007
Environmental Variable and Parameter  0.7802082    0.001

When accounting for parameter, both site and habitat were significant. When accounting for environmental variable, only habitat was significant. I also tested environmental variable and parameter together, though I already expected that combination to be significant. I need to double-check with Julian, but I believe conducting pairwise ANOSIMs based on the significant two-way ANOSIM results will help me understand what is driving them.

I also visualized my ordination here.

Figure 3. Environmental data NMDS. Means are outlined in solid lines, variances are outlined in dashed lines.

Constrained ordination

In this R Markdown file, I conducted a constrained ordination to examine how protein abundance relates to my environmental data. To do this, I first created an environmental data matrix with mean and variance values from the entire outplant, and matched those values to my objects (oyster sample IDs). This gave me the same objects in both my protein abundance and environmental data matrices. The gradient length indicated that the underlying response was linear, so I used a redundancy analysis (RDA). I specified na.action = na.exclude in my rda call so it could handle my missing values.
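An RDA is essentially a multivariate regression followed by a PCA of the fitted values. It assumes the linear response that the gradient-length check pointed to (a common rule of thumb treats a DCA first-axis length below about 3 SD units as linear, favoring RDA over CCA). The Python sketch below reproduces only the variance partition and the overall pseudo-F, assuming unscaled responses as in a default rda call. It doesn’t compute the ordination axes or permutation p-values. Dropping incomplete objects before fitting is my stand-in for na.exclude, and the function name is hypothetical.

```python
import numpy as np


def rda_partition(Y, X):
    """Variance partition and pseudo-F for an RDA of responses Y on predictors X.

    Y: objects x responses (e.g. protein abundances)
    X: objects x predictors (e.g. environmental descriptors)
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Drop objects with any missing value before fitting
    keep = ~(np.isnan(Y).any(axis=1) | np.isnan(X).any(axis=1))
    Y, X = Y[keep], X[keep]
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # multivariate least squares
    fitted = Xc @ coef
    total = (Yc ** 2).sum() / (n - 1)               # total inertia = summed variances
    constrained = (fitted ** 2).sum() / (n - 1)     # variance explained by X
    unconstrained = total - constrained
    q = int(np.linalg.matrix_rank(Xc))              # collinear predictors add no df
    df_resid = n - q - 1
    F = (constrained / q) / (unconstrained / df_resid)
    return {"total": total, "constrained": constrained,
            "unconstrained": unconstrained,
            "proportion": constrained / total, "F": F, "df": (q, df_resid)}


# Hypothetical example: one predictor, one response, plus one incomplete object
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [np.nan]])
y = np.array([[1.0], [3.0], [2.0], [5.0], [4.0], [100.0]])
res = rda_partition(y, x)
```

With a single response and a single predictor, the constrained proportion reduces to the squared correlation (here 0.64), which makes the partition easy to sanity-check.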

Table 3. Variance explained by constrained ordination. The RDA explains 29.21% of all variance.

Variance Partition  Inertia   Proportion
Total               0.015616  1
Constrained         0.004561  0.2921
Unconstrained       0.011055  0.7079

Table 4. ANOVA results for overall RDA analysis. With F(6,19) = 1.3064 and p-value = 0.195, the RDA is not significant.

Partition  Df  Variance   F       Pr(>F)
Model      6   0.0045607  1.3064  0.195
Residual   19  0.0110550
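Tables 3 and 4 are linked by simple arithmetic, worked through below as a check. This assumes vegan’s usual pseudo-F for an RDA: constrained variance per model degree of freedom (q) divided by residual variance per residual degree of freedom (n - q - 1).

```latex
\begin{aligned}
\text{Total} &= 0.004561 + 0.011055 = 0.015616 \\
\text{Proportion constrained} &= \frac{0.004561}{0.015616} \approx 0.2921 \\
F &= \frac{0.0045607 / q}{0.0110550 / (n - q - 1)}
   = \frac{0.0045607 / 6}{0.0110550 / 19}
   \approx \frac{0.00076012}{0.00058184} \approx 1.3064
\end{aligned}
```

Under the same assumption, 19 residual degrees of freedom with q = 6 means n - 7 = 19, so 26 objects were retained in the fit after missing values were excluded.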

Table 5. Significance of each RDA axis. No axis is significant.

Axis  Df  Variance   F       Pr(>F)
RDA1  1   0.0023293  4.0033  0.316
RDA2  1   0.0013380  2.2995  0.603
RDA3  1   0.0003888  0.6682  0.998
RDA4  1   0.0003065  0.5268  0.994
RDA5  1   0.0001395  0.2397  0.999
RDA6  1   0.0000586  0.1008  0.999

Table 6. Significance of each environmental descriptor in the RDA. Temperature mean and variance are marginally significant predictors.

Environmental Variable     Df  Variance   F       Pr(>F)
Temperature Mean           1   0.0013542  2.3275  0.065
Temperature Variance       1   0.0012584  2.1627  0.067
pH Mean                    1   0.0003063  0.5264  0.781
pH Variance                1   0.0007082  1.2171  0.301
Dissolved Oxygen Mean      1   0.0006024  1.0354  0.349
Dissolved Oxygen Variance  1   0.0003312  0.5692  0.725
Residual                   19

One weird thing I noticed is that my salinity variables don’t show up in Table 6 or in my ordinations below. That’s something I’ll have to figure out with Julian. Otherwise, it’s interesting that my RDA isn’t significant, which fits the loose interpretation of environment affecting protein abundance that we already had in the manuscript.
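One hypothesis worth checking, which is an assumption and not something these results confirm: if all eight descriptors (mean and variance for each of four variables) were in the model formula, a model with only 6 degrees of freedom is what you would see if two of them were aliased. Aliased here means they are exact linear combinations of the others, and rda drops such terms. This happens easily when the environmental matrix has only one distinct row per outplant location, because the number of independent centered predictors can’t exceed the number of distinct rows minus one. I believe vegan’s alias() method for fitted rda objects reports such terms. The Python sketch below performs the same kind of check by adding columns in order and flagging any that don’t increase the matrix rank. The function name and example data are hypothetical.

```python
import numpy as np


def aliased_columns(X, names):
    """Return names of predictor columns that are linear combinations of earlier ones.

    Columns are tested in the given order (like terms in a model formula),
    after dropping incomplete rows and centering each column.
    """
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]       # complete cases only, as in the fit
    Xc = X - X.mean(axis=0)               # centering turns constant columns into zeros
    kept, aliased = [], []
    for k, name in enumerate(names):
        trial = Xc[:, kept + [k]]
        if np.linalg.matrix_rank(trial) > len(kept):
            kept.append(k)                # column adds new information
        else:
            aliased.append(name)          # column is redundant given earlier ones
    return aliased


# Hypothetical example: c = a + b, and d is constant after complete-case filtering
X = np.array([[1.0, 2.0, 3.0, 5.0],
              [2.0, 1.0, 3.0, 5.0],
              [3.0, 4.0, 7.0, 5.0],
              [4.0, 3.0, 7.0, 5.0]])
flagged = aliased_columns(X, ["a", "b", "c", "d"])
```

Because columns are tested in order, whichever descriptors come last in the formula are the ones flagged. That would be consistent with the salinity terms disappearing if they were listed last.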

To visualize my constrained ordination, I plotted all proteins and environmental descriptors here, and only the significant proteins and environmental descriptors here.

Figures 4-5. RDA visualization including all proteins and descriptors (top) and only those that are significant (bottom).

Going forward

  1. Discuss all results with Julian
  2. Create a better ordination visual
  3. Update manuscript with new methods and results
  4. Tweak discussion

from the responsible grad student https://ift.tt/2ACXXui
via IFTTT

Kaitlyn’s notebook: Proteomics paper

Goals

Referenced from Shelly’s post

  • Compare ASCA proteins (high loadings) with hierarchical cluster (differentially clustered) proteins
    • make raw abundance line plots faceted by protein
    • examine if GO enrichment changes when ASCA and cluster proteins are combined
  • Determine how time is factored into the clustering
  • Determine whether the ASCA permutation test needs to be improved before the high-loading proteins can be considered highly influential
  • Redo BLAST of CHOYP proteins to 2018 Uniprot database
  • Begin identifying and locating data files that we need to deposit in public protein repository (i.e. ProteomeXchange and PeptideAtlas)
    • need to regenerate ‘table_blastout_gigatonpep-uniprot’
    • need fasta file of the peptides ID’d by mass spec
    • make a simplified supplementary table containing CHOYP IDs, UniProt Accessions, e.val, Protein names, Gene names
      • Can modify current datasheet to get this info as well

Completed tasks

Paper questions

Question: Does temperature influence the proteome of larval C. gigas, and if so, how?

Referenced from Shelly’s post

  • Do we need to explain we did 2 x 4 treatments, or just say we did 1 x 2 treatments?
  • Do we have survival data for other silos to compare to silos 2, 3, and 9?
    • Can we rule out silo 2 as an anomaly or should we include it?

Kaitlyn’s notebook: clustering without day 0

I redid the hierarchical clustering of the combined silo 3 and 9 datasheet without day 0.

                                   Include Day 0   Remove Day 0
Agglomerative coefficient          0.9964979       0.9976217
Cophenetic correlation             0.9477519       0.9630485
Clusters                           41              84
Differentially clustered proteins  33              213

31 of the 33 differentially clustered proteins remained differentially clustered whether day 0 was included or excluded. Removing day 0 causes more proteins to cluster separately (the agglomerative coefficient increases slightly).

  • i.e., removing day 0 causes more proteins to be identified as having different abundances.
    • All abundances are the same for both silos on day 0 since no treatment had been administered yet.
      • Removing day 0 means that only days following treatment are analyzed, which makes more sense since we are attempting to identify proteins that have different abundances during treatment only.
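The two quality statistics in the table have standard definitions. The cophenetic correlation is the Pearson correlation between the original pairwise distances and the heights at which each pair first joins the same cluster. The agglomerative coefficient, as reported by agnes in R’s cluster package, averages 1 - m(i) over all observations, where m(i) is the height at which observation i is first merged divided by the height of the final merge. Here is a minimal Python sketch using SciPy. Average linkage and Euclidean distance are assumptions, since the settings used aren’t stated here, and the function name and example data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def cluster_quality(data, method="average", metric="euclidean"):
    """Agglomerative coefficient and cophenetic correlation for a hierarchical clustering.

    data: observations x features (e.g. proteins x abundance per sampling day)
    """
    data = np.asarray(data, dtype=float)
    d = pdist(data, metric=metric)            # condensed pairwise distances
    Z = linkage(d, method=method)             # rows: [cluster_a, cluster_b, height, size]
    coph_corr, _ = cophenet(Z, d)             # correlation of merge heights vs. distances
    n = data.shape[0]
    first_merge = np.empty(n)
    for i in range(n):
        # First linkage step that involves original observation i
        row = np.where((Z[:, 0] == i) | (Z[:, 1] == i))[0][0]
        first_merge[i] = Z[row, 2]
    ac = np.mean(1 - first_merge / Z[-1, 2])  # final merge height normalizes m(i)
    return ac, coph_corr


# Hypothetical example: two tight pairs far apart
ac, coph = cluster_quality([[0.0], [1.0], [10.0], [11.0]])
```

In this toy example every observation merges at height 1 and the final merge is at height 10, so the agglomerative coefficient is 0.9. Dropping a sampling day changes the columns of data, which changes both the distances and the resulting statistics.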

 
