Kaitlyn’s notebook: finalizing the methods and cluster heatmap

Shelly and I have developed nearly all of the methods. We still need to decide how we will either quantify the differences in the differentially abundant proteins we identify (e.g., pairwise comparisons) or describe them (e.g., gene enrichment). I am also working on identifying useful papers for the introduction and background on larval/seed mortality and temperature.
I really liked Emma’s presentation and take on the geoduck project. Since we have a very similar dataset, I decided to run the differentially identified proteins in DAVID again, because we have found out that the values I was previously working with were not NSAF values. My idea was that different proteins could have been identified, leading to different and possibly more significant results that may be worth digging into. I chose to run only the differentially clustered proteins as a quick test to see if this is a method we should pursue (granted, the drawback is that there are only 32 proteins in this list, of which 18 do not have accessions and only 10 mapped to a DAVID ID). Here are the results: Nothing is significant by a Benjamini-corrected p-value, but translation and organonitrogen biosynthesis were significant without the correction. Translation is broad but is an important process for growth. Organonitrogen biosynthesis is too broad for me to make any sort of guess.
Overall I would say these results were not really useful. Emma went through her data by cluster. I can look into that after the pairwise comparison decisions and analysis are completed.
Note that 18 of the 32 proteins identified by differential hierarchical clustering do not have associated UniProt accessions/annotations.
This is interesting because it suggests that the proteins with differential abundance between the two temperature treatments may be oyster-specific, or at the very least unidentified. If we decide to pursue gene enrichment, then we should BLAST the .fasta file against UniProt or another database again to be sure of this.
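To make the Benjamini-corrected p-value mentioned above more concrete, here is a minimal sketch of the adjustment, assuming DAVID's Benjamini column refers to the Benjamini-Hochberg procedure. The function name `benjamini_hochberg` and the p-values are hypothetical and are not the DAVID output.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four enrichment terms (illustration only)
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

In this made-up example, the terms with raw p-values of 0.03 and 0.04 pass a 0.05 cutoff before correction but not after, which is the same pattern as terms being flagged only by the uncorrected p-value.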

In lab meeting we decided to keep all proteins and not filter. I reran the clustering code to see if this substantially changed the heatmap. There are some changes, but the dendrogram still identifies three groups of proteins. I edited the new heatmap and added it to our document with a comment for interpretation. I also added the protein abundance plot so that protein abundance over time, shown as line plots, can be compared with the heatmap.

A pairwise comparison will help to better quantify the differences between the treatments after the ASCA and hierarchical cluster analysis identify differentially abundant proteins. I have a script and previously added fold change analysis to the methods. I think if we do fold change, we may need to consider comparing each day to the day before rather than comparing every day to day 0. However, I also want to look into other pairwise comparisons, such as a t-test.
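The two fold-change options and the t-test idea can be sketched as follows. This is only an illustration under stated assumptions: the helper names `log2_fold_changes` and `welch_t`, the use of log2, the choice of Welch's unequal-variance form of the t-test, and all numbers are hypothetical and are not taken from our script or data.

```python
import math
import statistics


def log2_fold_changes(series, reference="day0"):
    """log2 fold change at each time point against day 0 or the previous time point.

    Assumes every abundance value is greater than zero.
    """
    changes = []
    for t in range(1, len(series)):
        base = series[0] if reference == "day0" else series[t - 1]
        changes.append(math.log2(series[t] / base))
    return changes


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical abundance of one protein across four sampling days
abundance = [1.0, 2.0, 4.0, 4.0]
vs_day0 = log2_fold_changes(abundance, "day0")      # every day compared to day 0
vs_prev = log2_fold_changes(abundance, "previous")  # every day compared to the day before

# Hypothetical replicate values for one protein in the two treatments
t_stat = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])

print(vs_day0, vs_prev, round(t_stat, 3))
```

With these values the day 0 comparison keeps reporting a large change on the last day even though abundance has stopped changing, while the previous-day comparison drops to zero. Zero abundances would need separate handling before taking the ratio.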

I also looked into why my line plots look different from Emma’s line plots for the geoduck project. It appears this difference comes from the dissimilarity matrix. If I use Bray-Curtis to create my dissimilarity matrix, this is what my differentially clustered proteins look like. There are about 3800 proteins here.
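For reference, here is a minimal sketch of how a Bray-Curtis dissimilarity matrix is computed from abundance profiles. The helper names and the three profiles are hypothetical. This is not the clustering code I am uploading, which may compute the matrix differently.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Assumes the two profiles are not both entirely zero.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def dissimilarity_matrix(profiles):
    """Square matrix of pairwise Bray-Curtis dissimilarities between rows."""
    n = len(profiles)
    return [
        [0.0 if i == j else bray_curtis(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical profiles (rows = proteins, columns = time points)
profiles = [
    [1.0, 2.0, 3.0],  # increasing
    [2.0, 4.0, 6.0],  # same shape as the first row, twice the abundance
    [3.0, 2.0, 1.0],  # decreasing
]
matrix = dissimilarity_matrix(profiles)
for row in matrix:
    print([round(value, 3) for value in row])
```

In this toy example the first profile is equally dissimilar (1/3) to a scaled copy of itself and to its mirror image. Bray-Curtis therefore responds to differences in magnitude as well as shape, so the choice of dissimilarity measure can change which proteins end up grouped together.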

I will also upload my new cluster and heatmap code since they are highly edited now (the workflow is mostly the same).