Shelly’s Notebook: Fri. Jan 11, 2019 Clustering technical replicates

I compared NMDS and PCA plots of technical replicate ADJNSAF values in this md file. Rmarkdown here

img img img img

Following Emma’s NMDS analysis, with log-transformed ADJNSAF values and Bray-Curtis distance. This looks the same as the plot above, which used Euclidean distance.

img

Overall, it seems like the PCAs are a little more informative than the NMDS. I’m wondering why we use NMDS for this analysis as opposed to PCA?
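One way to think about the PCA vs. NMDS question: PCA is a linear projection that preserves Euclidean structure, while NMDS only tries to preserve the rank order of whatever dissimilarity it is given (e.g. Bray-Curtis). Below is a minimal sketch running both on the same matrix. It rests on several assumptions: the matrix is simulated toy data (not the oyster seed data), the log(x + 1) transform and the hand-written `bray_curtis` helper are illustrative, and `MASS::isoMDS` stands in for whichever NMDS function the actual analysis used.

```r
# Toy abundance matrix: rows = samples, columns = proteins.
# Simulated placeholder values, NOT the oyster seed data.
set.seed(1)
abund <- matrix(rpois(8 * 30, lambda = 20), nrow = 8,
                dimnames = list(paste0("sample", 1:8), paste0("prot", 1:30)))
log_abund <- log(abund + 1)  # log(x + 1) keeps zeros finite (assumed transform)

# PCA: linear projection of the log-transformed values
pca <- prcomp(log_abund, center = TRUE, scale. = FALSE)
pca_scores <- pca$x[, 1:2]

# Hypothetical helper: Bray-Curtis dissimilarity between rows,
# sum(|x_i - x_j|) / sum(x_i + x_j)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(d)
}
bc <- bray_curtis(log_abund)

# NMDS: places samples in 2 dimensions so that the rank order of the
# Bray-Curtis dissimilarities is preserved as well as possible
nmds <- MASS::isoMDS(bc, k = 2, trace = FALSE)
nmds_scores <- nmds$points

# Side-by-side plots for a visual comparison
par(mfrow = c(1, 2))
plot(pca_scores, main = "PCA (log values)")
plot(nmds_scores, main = "NMDS (Bray-Curtis)", xlab = "NMDS1", ylab = "NMDS2")
```

If the two ordinations look alike, as noted above for the Euclidean and Bray-Curtis plots, the choice of distance is probably not driving the pattern. `nmds$stress` gives a rough idea of how well the 2-D NMDS layout represents the dissimilarities.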

Looking at the PCAs, it’s a little concerning that the technical replicates don’t cluster better, particularly for sample 23C day 9. It seems there may be some unreliably detected proteins between technical replicates (e.g. 0 in one replicate and > 200 in the other), so I will try to go through the data to weed those out if possible.
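A minimal sketch of how such unreliably detected proteins could be flagged, assuming two technical replicates per sample. The helper name `flag_discordant`, the 200 cutoff (taken from the example above), and the toy values are all illustrative assumptions rather than the filter actually used.

```r
# Hypothetical helper: flag proteins that are 0 in one technical replicate
# but above `min_value` in the other.
flag_discordant <- function(rep1, rep2, min_value = 200) {
  stopifnot(length(rep1) == length(rep2))
  (rep1 == 0 & rep2 > min_value) | (rep2 == 0 & rep1 > min_value)
}

# Toy values for five proteins in two technical replicates (not real data)
rep_a <- c(0, 150, 320, 0, 12)
rep_b <- c(250, 140, 0, 0, 15)

discordant <- flag_discordant(rep_a, rep_b)
which(discordant)  # indices of proteins to inspect or drop
```

Proteins that are 0 in both replicates are not flagged, since they are consistently undetected rather than discordant.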

It’s interesting in the PCA below that some of the 29C proteomes at earlier time points cluster with 23C proteomes of later time points. For example, green squares with blue triangles, blue squares with pink triangles, pink squares with gold triangles. img

Perhaps there is a developmental program that is sped up in the 29C proteomes? Will need to investigate further.

from shellytrigg

Shelly’s Notebook: Fri. Jan 11, 2019 Oyster Seed Proteomics

In trying to run NMDS analysis on technical replicate ADJNSAF data, I found discrepancies between the ADJNSAF values in Steven’s ABACUS_output021417NSAF.tsv and Sean’s Abacus_output.tsv. I compared Steven’s ABACUS_output021417.tsv file (from which he made ABACUS_output021417NSAF.tsv; see his jupyter notebook) with Sean’s Abacus_output.tsv and found no difference:

R code for comparing files

    install.packages("arsenal")
    library(arsenal)

    #Compare 02/14/2017 data with Sean's march 1 data
    data_SR <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
    data_SB <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
    compare(data_SR,data_SB)

    #Output:
    #Compare Object
    #Function Call:
    # = data_SR, y = data_SB)
    #Shared: 457 variables and 8443 observations.
    #Not shared: 0 variables and 0 observations.
    #Differences found in 0/456 variables compared.
    #0 variables compared have non-identical attributes.

    ###SHOWS NO DIFFERENCES BETWEEN FILES

confirmed by command line diff command

    #D-10-18-212-233:Desktop Shelly$ diff ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_outputMar1.tsv ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv
    #D-10-18-212-233:Desktop Shelly$

The values in Steven’s ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values

R code to determine what the values in ABACUS_output021417NSAF.tsv are

    data_SR_NSAF <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417NSAF.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
    data_SB_NUMSPECADJ <- data_SB[,c(1,grep("NUMSPECSADJ", colnames(data_SB)))]
    colnames(data_SB_NUMSPECADJ) <- gsub("NUMSPECSADJ","ADJNSAF", colnames(data_SB_NUMSPECADJ))
    compare(data_SR_NSAF,data_SB_NUMSPECADJ)

    #Output:
    #Compare Object
    #Function Call:
    # = data_SR_NSAF, y = data_SB_NUMSPECADJ)
    #Shared: 46 variables and 8443 observations.
    #Not shared: 0 variables and 0 observations.
    #Differences found in 0/45 variables compared.
    #0 variables compared have non-identical attributes.

    ###SHOWS NO DIFFERENCES BETWEEN FILES SO VALUES IN
    ###ABACUS_output021417NSAF.tsv ARE ACTUALLY
    ###NUMSPECADJ VALUES!!!!

Determined that the values in ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values.

Determined that the values in Kaitlyn’s file ABACUSdata_only.csv are in fact the averages of the technical replicate NUMSPECSADJ values.
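A check like the second one could be sketched as follows: average the technical replicate columns per sample and compare the result against the candidate table. The column naming scheme (`S1_A`, `S1_B`, ...), the table layouts, and all values are hypothetical placeholders, not the structure of the actual files.

```r
# Toy replicate-level table; hypothetical column names and values
rep_level <- data.frame(
  PROTID = c("p1", "p2", "p3"),
  S1_A = c(10, 0, 4), S1_B = c(12, 2, 4),
  S2_A = c(7, 5, 0),  S2_B = c(9, 5, 2)
)

# Group replicate columns by sample by stripping the assumed "_A"/"_B" suffix
value_cols <- setdiff(names(rep_level), "PROTID")
sample_id <- sub("_[AB]$", "", value_cols)

# Average the technical replicates within each sample
avg <- sapply(split(value_cols, sample_id),
              function(cols) rowMeans(rep_level[, cols, drop = FALSE]))

# Toy candidate table that is suspected to hold replicate averages
candidate <- data.frame(S1 = c(11, 1, 4), S2 = c(8, 5, 1))

# TRUE if the candidate matches the replicate averages (within tolerance)
is_average <- isTRUE(all.equal(unname(avg), unname(as.matrix(candidate))))
is_average
```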

Next steps:

  1. NMDS analysis
    • extract ADJNSAF values from ABACUS_output021417.tsv
    • Find appropriate data transformation/normalization if necessary
      • Emma log transformed her NSAF values before doing NMDS
    • try NMDS again
    • determine if replicates can be pooled
  2. Try downstream analyses with NUMSPECSTOT values from ABACUS_output021417.tsv
    • if it makes sense to sum NUMSPECSTOT values for replicates, try that and then try running stats on those values
  3. Determine what values make sense to use in Hierarchical clustering analysis and ASCA, then re-do those analyses
  4. Look more closely at development over time
    • Try a fold-change analysis with each developmental time point relative to day 0
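The first and last steps above could be sketched roughly as below: pull out the ADJNSAF columns with the same `grep` pattern used earlier, log-transform them, and compute fold change relative to day 0. The toy table, its column names (`day0_ADJNSAF`, etc.), and the +1 pseudocount are assumptions for illustration; the real file's column layout would need to be checked first.

```r
# Toy table loosely mimicking Abacus-style columns; names and values are placeholders
abacus <- data.frame(
  PROTID = c("p1", "p2", "p3"),
  day0_ADJNSAF = c(10, 4, 0),
  day3_ADJNSAF = c(20, 4, 3),
  day5_ADJNSAF = c(40, 1, 7),
  day0_NUMSPECSTOT = c(5, 2, 0)
)

# Step 1: keep the ID column plus every ADJNSAF column
adjnsaf <- abacus[, c(1, grep("ADJNSAF", colnames(abacus)))]

# Step 2: log2 transform with a pseudocount of 1 so zeros stay finite (assumption)
log_vals <- log2(as.matrix(adjnsaf[, -1]) + 1)
rownames(log_vals) <- adjnsaf$PROTID

# Step 3: log2 fold change of each time point relative to day 0.
# Subtracting a vector of length nrow() recycles down each column,
# so every column has the day 0 value of the same protein removed.
log2fc <- log_vals - log_vals[, "day0_ADJNSAF"]
log2fc
```

The pseudocount matters most for proteins with 0 at day 0 (like `p3` here), where a plain ratio would be undefined; a different pseudocount or a filter on such proteins may be more appropriate for the real data.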

from shellytrigg