Kaitlyn’s Notebook: Basic Statistical ‘Tags’

I updated an Excel spreadsheet so that it includes multiple statistics that I thought might be useful for spotting patterns in expression. The file has multiple sheets: the combined data with a few tags, followed by silos 2, 3 and 9 with all tags. The tags, and why they may be helpful for seeing protein expression patterns, are listed below.

  1. Average- is this protein typically highly or lowly expressed?
  2. Standard Deviation- how much does expression on each day deviate from the average?
  3. Coefficient of Variation- the standard deviation divided by the average; a normalized measure of how dispersed the protein expression is
  4. Variance- less useful than (3), but another representation of the dispersion of protein expression
  5. Median- valuable when compared to the average protein abundance; a large gap between the two suggests that expression is not consistent across days
  6. Slope- linear regression to understand the overall trend of protein expression (decreasing vs. increasing)
  7. Kurtosis- understand if the protein has a sharp peak in protein expression
  8. Skewness- informs us if the protein is being expressed more in a certain half of the experiment
  9. Max- is the protein expressed a lot at any point in the experiment?
  10. Min- is there a time when the protein is not expressed?
  11. Range- the overall change in protein expression (does not inform us whether it is increasing or decreasing)
  12. 1st quartile- the expression value below which 25% of the measurements over the course of the experiment fall
  13. 3rd quartile- the expression value below which 75% of the measurements over the course of the experiment fall
  14. Sum- determines if the protein was highly abundant over the course of the experiment (relative to the sums of other proteins)
  15. Day0:Day15- a ratio of the day before treatment to the final day of the experiment; informs us if the protein significantly changed after treatment
  16. Day3:Day15- a ratio of the first measured day of treatment to the final day of treatment
  17. Average for Days 0-7- valuable when compared to the second average to see if there was a change in protein expression halfway through the larvae’s lives
  18. Average for Days 9-15- a complement to the above tag
  19. Range for Days 0-7- valuable when compared to the range for days 9-15; further elucidates changes in expression between the first half of the experiment and the second half
  20. Sum:Total Proteins Identified- what percentage of the total protein measured in the experiment is accounted for by this protein?
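
For anyone who wants to reproduce these tags outside of Excel, here is a minimal sketch in Python using only the standard library. It is not the spreadsheet's actual formulas. The function name `protein_tags`, the sampling days, the example abundance values, and the `split_day` cutoff are all hypothetical. Skewness and kurtosis are computed from uncorrected (population) moments, so they may differ slightly from Excel's SKEW and KURT, which apply a sample-size correction. The denominator for tag 20 is assumed to be the summed abundance of all proteins.

```python
import statistics as st


def protein_tags(days, abundance, split_day=8, total_abundance=None):
    """Compute the summary tags for one protein.

    days            -- sampling days (hypothetical schedule)
    abundance       -- one expression value per day, in the same order
    split_day       -- days below this are the "first half" (assumed cutoff)
    total_abundance -- summed abundance of all proteins (assumed denominator)
    """
    n = len(abundance)
    mean = st.mean(abundance)
    sd = st.stdev(abundance)  # sample standard deviation

    # Least-squares slope of abundance against day
    day_mean = st.mean(days)
    sxy = sum((d - day_mean) * (a - mean) for d, a in zip(days, abundance))
    sxx = sum((d - day_mean) ** 2 for d in days)

    # Uncorrected central moments for skewness and excess kurtosis
    m2 = sum((a - mean) ** 2 for a in abundance) / n
    m3 = sum((a - mean) ** 3 for a in abundance) / n
    m4 = sum((a - mean) ** 4 for a in abundance) / n

    q1, _, q3 = st.quantiles(abundance, n=4)
    first = [a for d, a in zip(days, abundance) if d < split_day]
    second = [a for d, a in zip(days, abundance) if d >= split_day]
    last = abundance[-1]
    total = sum(abundance)

    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / mean if mean else None,
        "variance": st.variance(abundance),
        "median": st.median(abundance),
        "slope": sxy / sxx,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else None,
        "skewness": m3 / m2 ** 1.5 if m2 else None,
        "max": max(abundance),
        "min": min(abundance),
        "range": max(abundance) - min(abundance),
        "q1": q1,
        "q3": q3,
        "sum": total,
        # Ratios are undefined when the final-day value is zero
        "first_to_last": abundance[0] / last if last else None,
        "second_to_last": abundance[1] / last if last else None,
        "avg_first_half": st.mean(first),
        "avg_second_half": st.mean(second),
        "range_first_half": max(first) - min(first),
        "fraction_of_total": total / total_abundance if total_abundance else None,
    }


# Made-up values for illustration only (not taken from the spreadsheet)
days = [0, 3, 5, 7, 9, 11, 13, 15]
abundance = [1, 7, 11, 15, 19, 23, 27, 31]
tags = protein_tags(days, abundance, total_abundance=1340)
for name, value in tags.items():
    print(f"{name}: {value}")
```

With the days ordered as above, `first_to_last` corresponds to Day0:Day15 and `second_to_last` to Day3:Day15. Running the function once per row of a sheet would give one dictionary of tags per protein.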

I’m not sure how else I should proceed with this data. I could potentially look at gene enrichment, but I believe that a significant portion of the proteins should be eliminated beforehand. Knowing which proteins to eliminate can be difficult because each ‘tag’ can highlight a different trait of a protein. Therefore, which proteins get eliminated will mostly depend on future interests for this data set.