Kaitlyn’s notebook: Unique proteins

 

Parsing out unique proteins from hierarchical clusters:

I switched the linkage method from ward.D2 (Ward 1963) to average because doing so raised the cophenetic correlation from 0.6299225 to 0.9433488. A value of at least 0.75 is typically considered acceptable. There are now 41 clusters in total (agglomerative coefficient = 0.9959262).
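For my own reference, here is a minimal sketch of what the cophenetic correlation measures. It is not the code used for the analysis above: it is a pure-Python toy with made-up points (`toy_points` and all function names are hypothetical) that runs average linkage by hand and then correlates the original dissimilarities with the heights at which each pair first merges.

```python
from itertools import combinations
from math import sqrt


def average_linkage_cophenetic(dist):
    """Cophenetic distances from naive average-linkage clustering.

    dist maps frozenset({i, j}) -> dissimilarity for every pair of items.
    """
    items = sorted({x for pair in dist for x in pair})
    clusters = [frozenset([x]) for x in items]
    coph = {}

    def between(a, b):
        # Average linkage: mean of all item-to-item dissimilarities.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: between(*p))
        height = between(a, b)
        # Every pair split across a and b first joins at this height.
        for x in a:
            for y in b:
                coph[frozenset((x, y))] = height
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return coph


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cophenetic_correlation(dist):
    """Correlate original dissimilarities with cophenetic distances."""
    coph = average_linkage_cophenetic(dist)
    pairs = list(dist)
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Made-up 1-D "abundance" values, purely for illustration.
toy_points = {"p1": 0.0, "p2": 1.0, "p3": 5.0, "p4": 6.5}
toy_dist = {
    frozenset((a, b)): abs(toy_points[a] - toy_points[b])
    for a, b in combinations(toy_points, 2)
}
toy_coph = average_linkage_cophenetic(toy_dist)
r = cophenetic_correlation(toy_dist)
print(round(r, 3))
```

The closer the value is to 1, the better the tree preserves the original pairwise dissimilarities, which is why the jump past 0.75 matters to me.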

Most proteins are in cluster 1 (14381/14510 = 98.68%). However, because the cluster hierarchy represents the original object-by-object dissimilarity matrix so well, I am not so worried about this.

[Image: freq-table-3_9]

Next, I want to remove proteins that are in the same cluster. I expect most of cluster 1 to be removed; this should cut many proteins (and thus noise) from my data and highlight the proteins that cluster differently based on abundance. My attempts to solve this problem are in this issue.
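A rough sketch of the filtering idea, assuming each protein ends up with one cluster label per silo and that "in the same cluster" means the label is identical in both silos. The protein IDs, silo keys, and cluster numbers below are invented for illustration, and this is not necessarily how the linked issue solves it.

```python
# Hypothetical cluster assignments: protein ID -> {silo: cluster number}.
assignments = {
    "protein_A": {"silo3": 1, "silo9": 1},
    "protein_B": {"silo3": 1, "silo9": 7},
    "protein_C": {"silo3": 4, "silo9": 4},
    "protein_D": {"silo3": 12, "silo9": 1},
}


def differently_clustered(assignments):
    """Return proteins whose cluster label is not identical across silos."""
    return sorted(
        protein
        for protein, by_silo in assignments.items()
        if len(set(by_silo.values())) > 1
    )


unique_proteins = differently_clustered(assignments)
print(unique_proteins)
```

Proteins that sit in cluster 1 in both silos drop out, while a protein that is in cluster 1 in only one silo is kept.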

This spreadsheet contains a list of proteins that were uniquely abundant between silos.

Now I want to move on to gene enrichment of these proteins, but first I need to determine the most appropriate background for DAVID, or find out what background CompGO uses.

The importance of a genetic background. I’m going to use all of the proteins detected in silos 3 and 9, all of which were used for the cluster analysis. I chose not to include proteins that may be unique to silo 2, since it was not part of this analysis. However, it will be worthwhile in the future to apply the same methods to silos 2 and 3, since there were mortality differences even though the silos were at the same temperature. At that point I will use only the proteins detected in silos 2 and 3 and, depending on the results, I could redo the same method with all silos and treat silos 2 and 3 as replicates. The proteins would then be distinguished by temperature only rather than by group.
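As a small sketch of that background choice: take the union of the proteins detected in silos 3 and 9, leave silo 2 out, and check that every protein in the list to be tested is actually in the background. All names and IDs here are hypothetical placeholders, not my real data.

```python
def build_background(detected_by_silo, silos):
    """Union of the proteins detected in the chosen silos."""
    background = set()
    for silo in silos:
        background |= detected_by_silo[silo]
    return background


# Hypothetical detected-protein sets per silo.
detected_by_silo = {
    "silo2": {"protein_A", "protein_E"},
    "silo3": {"protein_A", "protein_B", "protein_C"},
    "silo9": {"protein_B", "protein_D"},
}

# Only silos 3 and 9 went into the cluster analysis, so only they
# contribute to the background; protein_E (silo 2 only) is excluded.
background = build_background(detected_by_silo, ["silo3", "silo9"])

# Hypothetical list of uniquely abundant proteins to test for enrichment.
foreground = {"protein_B", "protein_D"}

# Sanity check: anything in the test list that is missing from the
# background would make the enrichment comparison inconsistent.
missing = foreground - background
print(sorted(background), sorted(missing))
```

Redoing this later for silos 2 and 3 would only mean passing a different list of silo names.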


Quick note on small things I learned so I won’t forget down the road…

  • I was having some problems with git finding removed files and GitHub wanting to track them. This command cleans up untracked files in a user-friendly way:
git clean -i -fd
    • -i for interactive
      -f for force
      -d to also remove untracked directories
      • Note: Add -n or --dry-run to just check what it will do.
  • R uses 1-based indexing.

I also installed Shutter on Roadrunner through Ubuntu Software (the app store) so I could edit images for posting my issue.