Kaitlyn’s notebook: Bioanalyzer and Clustering

Bioanalyzer Results

I ran the broodstock DNA extractions on the Bioanalyzer today (10/10). Here are the results. For the first chip, the samples are:

1 - 10-3
2 - 3-T1
3 - UK-05
4 - 12-T6
5 - UK-06
6 - 8-T2
7 - 11-T4
8 - UK-02
9 - 7-T2
10 - UK-08
11 - 5-T3

The second chip is already labelled, and I reran UK-08 on it as well. Unfortunately, during preparation of the chip with the gel-dye matrix, the plunger clip released to a higher position. I think this disrupted the gel matrix in the chip and caused some errors in the gel and in the electropherogram.


Previously, I was using a Bray-Curtis dissimilarity matrix for the cluster analysis of each individual silo. However, Bray-Curtis is an asymmetrical coefficient: double zeros (a value of zero in both of the profiles being compared) are ignored and contribute nothing to the result. My data contain several double zeros (abundances weren’t detected for multiple days, or in multiple silos on a given day), and that is relevant information when examining the pattern of abundances. I chose a Euclidean distance matrix instead because it is the most commonly used symmetrical distance measure.

I also reconsidered the clustering method. Average linkage is still a good choice, but I think Ward’s method (1963) will be better since it is less sensitive to outliers. Ward’s method minimizes the within-cluster error sum of squares: at each step it merges the two clusters that give the smallest increase in the sum of squared distances. Of the methods I compared, average linkage gave the best cophenetic correlation (0.9433488). The same changes should be made to my previous clusters for each individual silo if we choose to use those plots. Note that this is hierarchical clustering, not k-means clustering.
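To make the double-zero point concrete, here is a minimal Python sketch of how the two measures treat profiles that share zeros. The abundance values are made up for illustration (they are not the silo data), and the function names are hypothetical helpers, not taken from any package.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y).

    Returns None when both profiles are all zeros, because the
    ratio is 0/0 and the measure is undefined in that case.
    """
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return None  # nothing but double zeros to compare
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def euclidean(x, y):
    """Euclidean distance: zero is treated like any other value."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical abundances over four days (illustrative numbers only)
protein_a = [0, 0, 0, 0]
protein_b = [0, 0, 0, 0]
protein_c = [0, 0, 3, 5]

bc_ab = bray_curtis(protein_a, protein_b)  # undefined: only double zeros
eu_ab = euclidean(protein_a, protein_b)    # identical profiles, distance 0
bc_ac = bray_curtis(protein_a, protein_c)  # no shared abundance
eu_ac = euclidean(protein_a, protein_c)

print("Bray-Curtis a vs b:", bc_ab)
print("Euclidean   a vs b:", eu_ab)
print("Bray-Curtis a vs c:", bc_ac)
print("Euclidean   a vs c:", eu_ac)
```

In this toy case, two proteins that were never detected cannot be compared at all with Bray-Curtis, while Euclidean distance treats their shared absence as a perfect match.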

The table can be found here, along with a frequency table, scree plot, dendrogram, and line plots, which are pasted below for convenience.

In the table, each protein has one row for Silo 3 and one row for Silo 9, so every protein appears twice. Cluster assignments are in a separate column. I want to determine which proteins were sorted into the same cluster in both silos. I could do this manually, but that would take a while. Next, I need code that removes a protein if its two entries have the same cluster.
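One possible way to do this filtering, sketched in plain Python. The column names (`protein`, `silo`, `cluster`) and the example rows are assumptions made for illustration; the real table may be laid out differently. The sketch also assumes every protein has exactly one row per silo.

```python
from collections import defaultdict

# Hypothetical rows standing in for the real table (made-up values)
rows = [
    {"protein": "P1", "silo": 3, "cluster": 1},
    {"protein": "P1", "silo": 9, "cluster": 1},
    {"protein": "P2", "silo": 3, "cluster": 2},
    {"protein": "P2", "silo": 9, "cluster": 4},
    {"protein": "P3", "silo": 3, "cluster": 3},
    {"protein": "P3", "silo": 9, "cluster": 3},
]

def split_by_cluster_agreement(rows):
    """Return (proteins with one shared cluster, rows of proteins whose clusters differ)."""
    # Collect the set of cluster labels seen for each protein
    clusters = defaultdict(set)
    for row in rows:
        clusters[row["protein"]].add(row["cluster"])

    # A single distinct label means both silo rows landed in the same cluster
    same = sorted(p for p, labels in clusters.items() if len(labels) == 1)

    # Keep only rows for proteins whose cluster differs between silos
    kept = [r for r in rows if len(clusters[r["protein"]]) > 1]
    return same, kept

same_cluster, differing_rows = split_by_cluster_agreement(rows)
print("Same cluster in both silos:", same_cluster)
for r in differing_rows:
    print(r["protein"], "silo", r["silo"], "cluster", r["cluster"])
```

Returning both pieces answers both questions at once: the first list names the proteins that clustered together across silos, and the remaining rows are the table with those proteins removed.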