Kaitlyn’s notebook: NSAF vs NUMSPEC and working draft methods

Shelly and I are working on putting together a paper focusing on proteomics for the 2016 oyster seed data. Here is the working draft!

We also found out that the data file I have been working with does not contain NSAF (normalized spectral abundance factor) values. Instead, it contains spectral peak values (NUMSPEC), which do not really correlate with protein abundance. All downstream analyses need to be done on NSAF values, but we are now sure that these are the correct values and that a redone file will export in the same format.

I went through Emma’s notebook today and found information on technical replicates, MS preparation, and experimental justification, to name a few things. Important and relevant posts are linked in the draft so they can be discussed and used to help work out the parts of the methods we are unsure about.

We are focusing on the methods section. I added information about hierarchical clustering and a fold change analysis. We are unsure whether the fold change analysis will stay in the paper. Previously we had done this with the NUMSPEC data, so to know whether it will stay in the methods section, I need to redo it with the new NSAF data. Here is what I’ve done so far:

First I needed to organize the .tsv file. It contains several other NSAF values and spectral values. I know from the above issue that adjusted NSAF values should be used, so I extracted all columns containing ‘ADJNSAF’. Next, I had to remove everything but the sample name from each column header, which was tricky since the sample names are not all the same length. Another tough aspect of rearranging this data sheet is that the sample names do not intuitively correspond to the silo, temperature, or day, as seen here. A benefit of taking the time to reformat the data is that it can now be put through the other scripts much more easily to generate a new clustering and ASCA heatmap.
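The column extraction described above could be sketched roughly like this in Python. This is only an illustration, not the script I actually used: the header layout (`1_ADJNSAF`, `12_NUMSPEC`), the `PROTEIN` identifier column, and the function name `extract_adjnsaf` are all assumptions made up for the example. The point is that removing the ‘ADJNSAF’ tag by pattern, rather than by character position, works even when sample names differ in length.

```python
import csv
import io
import re


def extract_adjnsaf(tsv_text, id_column="PROTEIN"):
    """Keep only the ADJNSAF columns and reduce each header to its sample name.

    The header layout and the id column name are assumptions for this sketch.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    id_idx = header.index(id_column)

    # Indices of every column whose header mentions ADJNSAF.
    keep = [i for i, name in enumerate(header) if "ADJNSAF" in name]

    # Remove the tag and any adjacent spaces/underscores, whatever the name length.
    samples = [re.sub(r"[\s_]*ADJNSAF[\s_]*", "", header[i]) for i in keep]

    # Map each protein to its ADJNSAF values, in the same order as `samples`.
    table = {row[id_idx]: [float(row[i]) for i in keep] for row in reader if row}
    return samples, table


# Tiny made-up table, just to show the shape of the input and output.
demo = (
    "PROTEIN\t1_NUMSPEC\t1_ADJNSAF\t12_NUMSPEC\t12_ADJNSAF\n"
    "protA\t5\t0.002\t9\t0.004\n"
    "protB\t0\t0.0\t3\t0.001\n"
)
samples, table = extract_adjnsaf(demo)
print(samples)
print(table["protA"])
```

If the real headers use a different separator or put the tag before the sample name, the regular expression would need to be adjusted to match.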

All silo 3 and silo 9 samples from day 0 (competent larvae) to day 13, listed as sample number, then silo and day:

  • 1- s0d0
  • 4- s3d3
  • 8- s9d3
  • 12- s3d5
  • 16- s9d5
  • 20- s3d7
  • 24- s9d7
  • 28- s3d9
  • 32- s9d9
  • 36- s3d11
  • 40- s9d11
  • 44- s3d13
  • 48- s9d13
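Since the sample numbers do not encode silo or day, one option is to keep the list above as a lookup table and parse the labels when needed. Here is a minimal Python sketch of that idea. The dictionary values come straight from the list, but the names `SAMPLE_LABELS`, `parse_label`, and `samples_by_day` are hypothetical, and I am assuming the labels always follow the `s<silo>d<day>` pattern (with `s0d0` standing in for the day 0 competent larvae).

```python
import re
from collections import defaultdict

# Sample number -> silo/day label, copied from the list above.
SAMPLE_LABELS = {
    1: "s0d0",
    4: "s3d3", 8: "s9d3",
    12: "s3d5", 16: "s9d5",
    20: "s3d7", 24: "s9d7",
    28: "s3d9", 32: "s9d9",
    36: "s3d11", 40: "s9d11",
    44: "s3d13", 48: "s9d13",
}


def parse_label(label):
    """Split a label such as 's3d11' into (silo, day) integers."""
    match = re.fullmatch(r"s(\d+)d(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return int(match.group(1)), int(match.group(2))


def samples_by_day(labels):
    """Group sample numbers as {day: {silo: sample_number}}."""
    by_day = defaultdict(dict)
    for number, label in labels.items():
        silo, day = parse_label(label)
        by_day[day][silo] = number
    return dict(by_day)


grouped = samples_by_day(SAMPLE_LABELS)
print(grouped[13])
```

Grouping by day this way makes it easy to pull out the silo 3 and silo 9 samples that belong to the same time point.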

Row means were taken, followed by a fold change analysis for each day. Originally we removed fold changes less than 2 (as written in the methods), but that leaves many NA values for proteins that then can’t be visualized. For this reason, I need to find a way to remove NA and infinite values without setting a cutoff value.
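One possible way to drop NA and infinite values without a fold change cutoff is to compute the ratio for every protein and keep only the finite results. The Python sketch below is an assumption-heavy illustration rather than the method we will necessarily use: it assumes the fold change is the silo 9 mean divided by the silo 3 mean for a single day, that each silo has one or more values per protein, and the function names and numbers are made up.

```python
import math
from statistics import fmean


def fold_change(numerator, denominator):
    """Plain ratio; NaN for 0/0 and infinity for x/0 instead of an error."""
    if denominator == 0:
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator


def finite_fold_changes(day_values):
    """day_values: {protein: (silo3_values, silo9_values)} for one day.

    Returns {protein: silo9 mean / silo3 mean}, keeping only finite ratios.
    No magnitude cutoff is applied, so ratios below 2 are retained.
    """
    kept = {}
    for protein, (silo3, silo9) in day_values.items():
        ratio = fold_change(fmean(silo9), fmean(silo3))
        if math.isfinite(ratio):  # drops both NaN and +/- infinity
            kept[protein] = ratio
    return kept


# Made-up values for one day: protB is absent in silo 3, protC in both.
demo_day = {
    "protA": ([0.002, 0.002], [0.004, 0.004]),
    "protB": ([0.0, 0.0], [0.003, 0.003]),
    "protC": ([0.0, 0.0], [0.0, 0.0]),
    "protD": ([0.004, 0.004], [0.001, 0.001]),
}
result = finite_fold_changes(demo_day)
print(sorted(result))
```

Note that this removes proteins detected in only one silo entirely, which may or may not be what we want. Adding a small pseudocount before dividing would be an alternative that keeps them.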