Shelly’s Notebook: Fri. Jan 11, 2019 Oyster Seed Proteomics

In trying to run NMDS analysis on technical replicate ADJNSAF data, I found discrepencies between the ADJNSAF values in Steven’s ABACUS_output021417NSAF.tsv and Sean’s Abacus_output.tsv. I compared Steven’s ABACUS_output021417.tsv file (from which he made ABACUS_output021417NSAF.tsv, see his jupyter notebook https://github.com/sr320/nb-2017/blob/master/C_gigas/04-Exploring-Abacus-out.ipynb) with Sean’s Abacus_output.tsv and found no difference:

R code for comparing files

 install.packages("arsenal") library(arsenal) #Compare 02/14/2017 data with Sean's march 1 data data_SR <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE) data_SB <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE) compare(data_SR,data_SB) #Output: #Compare Object #Function Call: # compare.data.frame(x = data_SR, y = data_SB) #Shared: 457 variables and 8443 observations. #Not shared: 0 variables and 0 observations. #Differences found in 0/456 variables compared. #0 variables compared have non-identical attributes. ###SHOWS NO DIFFERENCES BETWEEN FILES  

confirmed by command line diff command

#D-10-18-212-233:Desktop Shelly$ diff ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_outputMar1.tsv ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv #D-10-18-212-233:Desktop Shelly$

The values in Steven’s ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values

R code to determine what the values in ABACUS_output021417NSAF.tsv are

 data_SR_NSAF <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417NSAF.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE) data_SB_NUMSPECADJ <- data_SB[,c(1,grep("NUMSPECSADJ", colnames(data_SB)))] colnames(data_SB_NUMSPECADJ) <- gsub("NUMSPECSADJ","ADJNSAF", colnames(data_SB_NUMSPECADJ)) compare(data_SR_NSAF,data_SB_NUMSPECADJ) #Output: #Compare Object #Function Call: # compare.data.frame(x = data_SR_NSAF, y = data_SB_NUMSPECADJ) #Shared: 46 variables and 8443 observations. #Not shared: 0 variables and 0 observations. #Differences found in 0/45 variables compared. #0 variables compared have non-identical attributes. ###SHOWS NO DIFFERENCES BETWEEN FILES SO VALUES IN ###ABACUS_output021417NSAF.tsv ARE ACTUALLY ###NUMSPECADJ VALUES!!!!  

Determined values in ABACUS_output021417NSAF.tsv are in fact NUMSPECADJ values

Determined values in Kaitlyn’s file ABACUSdata_only.csv are in fact the averages of technical replicate NUMSPECADJ values

Next steps:

  1. NMDS analysis
    • extract ADJNSAF values from ABACUS_output021417.tsv
    • Find appropriate data transformation/normalization if necessary
      • Emma log transformed her NSAF values before doing NMDS
    • try NMDS again
    • determine if replicates can be pooled
  2. Try downstream analyses with NUMSPECSTOT values from ABACUS_output021417.tsv
    • if it makes sense to sum NUMSPECSTOT values for replicates, try that and then try running stats on those values
  3. Determine what values make sense to use in Hierarchical clustering analysis and ASCA, then re-do those analyses
  4. Look more closely at development over time
    • Try a fold-change analysis with each developmental time point relative to day 0

from shellytrigg http://bit.ly/2TJFnZc
via IFTTT