# Kaitlyn’s Notebook: Basic Statistical ‘Tags’

I updated an Excel spreadsheet to include several statistics that I thought might be useful for spotting patterns in expression. The file has multiple sheets: combined data with a few tags, followed by silos 2, 3, and 9 with all tags. The tags, and why they may help reveal protein expression patterns, are listed below.

1. Average- is this protein typically highly or lowly expressed?
2. Standard Deviation- how much does expression on each day deviate from the average?
3. Coefficient of Variation- the standard deviation divided by the average; a normalized measure of how dispersed the protein expression is
4. Variance- less useful than (3), but another representation of the dispersion of protein expression
5. Median- valuable when compared to the average protein abundance; a large gap between the two suggests that expression is not consistent
6. Slope- linear regression of expression against day to understand the overall trend of protein expression (decreasing vs. increasing)
7. Kurtosis- understand if the protein has a sharp peak in protein expression
8. Skewness- informs us if the protein is being expressed more in a certain half of the experiment
9. Max- is the protein expressed a lot at any point in the experiment?
10. Min- is there a time when the protein is not expressed?
11. Range- the overall change in protein expression (does not inform us whether it is increasing or decreasing)
12. 1st quartile- what is the expression value below which 25% of the measurements fall over the course of the experiment?
13. 3rd quartile- what is the expression value below which 75% of the measurements fall over the course of the experiment?
14. Sum- determines if the protein was highly abundant over the course of the experiment (relative to the sums of other proteins)
15. Day0:Day15- a ratio of the day before treatment to the final day of the experiment; informs us if the protein significantly changed after treatment
16. Day3:Day15- a ratio of the first measured day of treatment to the final day of treatment
17. Average for Days 0-7- valuable when compared to the second average to see if there was a change in protein expression halfway through the larvae’s lives
18. Average for Days 9-15- a complement to the above tag
19. Range for Days 0-7- valuable when compared to the range for days 9-15; further elucidates changes in expression between the first half of the experiment and the second half
20. Sum:Total Proteins Identified- what fraction of the total protein abundance in the experiment is accounted for by this protein?
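
As a rough sketch of how the tags above could be computed outside of Excel, the Python below calculates them for a single protein using only the standard library. Several things here are assumptions rather than details from the spreadsheet: the function names `protein_tags` and `ratio`, the `total_abundance` parameter, the sampling days (0, 3, 5, 7, 9, 11, 13, 15), and the abundance values, which are invented for illustration. Skewness and kurtosis are the plain moment-based versions, so they will differ slightly from Excel's `SKEW` and `KURT`, which apply a small-sample correction.

```python
import statistics as st


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    return numerator / denominator if denominator else None


def protein_tags(days, abundance, total_abundance=None):
    """Compute summary tags for one protein's abundance time series (hypothetical helper)."""
    n = len(abundance)
    mean = st.mean(abundance)
    sd = st.stdev(abundance)  # sample standard deviation (divides by n - 1)
    q1, _, q3 = st.quantiles(abundance, n=4, method="inclusive")

    # Least-squares slope of abundance regressed on day.
    day_mean = st.mean(days)
    sxx = sum((d - day_mean) ** 2 for d in days)
    sxy = sum((d - day_mean) * (a - mean) for d, a in zip(days, abundance))

    # Central moments for skewness and excess kurtosis (no small-sample correction).
    m2 = sum((a - mean) ** 2 for a in abundance) / n
    m3 = sum((a - mean) ** 3 for a in abundance) / n
    m4 = sum((a - mean) ** 4 for a in abundance) / n

    # Assumes days 0, 3, and 15 are all present in the series.
    by_day = dict(zip(days, abundance))
    first_half = [a for d, a in by_day.items() if d <= 7]
    second_half = [a for d, a in by_day.items() if d >= 9]
    total = sum(abundance)

    return {
        "average": mean,
        "std_dev": sd,
        "coeff_of_variation": ratio(sd, mean),
        "variance": st.variance(abundance),
        "median": st.median(abundance),
        "slope": sxy / sxx,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else 0.0,
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "max": max(abundance),
        "min": min(abundance),
        "range": max(abundance) - min(abundance),
        "q1": q1,
        "q3": q3,
        "sum": total,
        "day0_to_day15": ratio(by_day[0], by_day[15]),
        "day3_to_day15": ratio(by_day[3], by_day[15]),
        "avg_days_0_7": st.mean(first_half),
        "avg_days_9_15": st.mean(second_half),
        "range_days_0_7": max(first_half) - min(first_half),
        "range_days_9_15": max(second_half) - min(second_half),
        "sum_to_total": ratio(total, total_abundance),
    }


days = [0, 3, 5, 7, 9, 11, 13, 15]          # assumed sampling days
abundance = [1, 7, 11, 15, 19, 23, 27, 31]  # invented values for illustration
tags = protein_tags(days, abundance, total_abundance=1000)
for name, value in tags.items():
    print(f"{name}: {value}")
```

The ratio tags return `None` instead of failing when the denominator is zero, which matters for proteins that are not detected on the final day.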

I’m not sure how else I should proceed with this data. I could potentially look at gene enrichment, but I believe that a significant portion of the proteins should be eliminated beforehand. Knowing which proteins to eliminate can be difficult because each ‘tag’ can highlight a different trait of a protein. Therefore, which proteins get eliminated will mostly depend on future interests for this data set.