After quantifying Ronit’s RNA earlier today, I DNased the samples using the Turbo DNA-free Kit (Ambion), according to the manufacturer’s standard protocol.
Used 1000ng of RNA in a 50uL reaction in a 0.5mL thin-walled snap cap tube. Samples were mixed by finger flicking and then incubated for 30mins @ 37°C in a PTC-200 thermal cycler (MJ Research), without a heated lid.
DNase was inactivated with 0.1 volumes of inactivation reagent (5uL). The reagent was then pelleted and the supe transferred to a new 1.7mL snap cap tube.
Samples were stored on ice in preparation for qPCR to test for residual gDNA.
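The volume arithmetic behind a reaction like this can be sketched as follows. This is a minimal Python sketch, not the linked calculation sheet: the function name, the 85 ng/uL example concentration, and the default buffer (5uL) and DNase (1uL) volumes are assumptions that should be checked against the kit manual.

```python
def dnase_reaction_volumes(rna_conc_ng_per_ul, rna_ng=1000.0, total_ul=50.0,
                           buffer_ul=5.0, dnase_ul=1.0):
    """Return the volume (uL) of each component for one DNase reaction.

    buffer_ul and dnase_ul are assumed defaults, not values from the notebook.
    """
    rna_ul = rna_ng / rna_conc_ng_per_ul
    water_ul = total_ul - rna_ul - buffer_ul - dnase_ul
    if water_ul < 0:
        raise ValueError("RNA is too dilute to fit the requested mass in this volume")
    return {
        "rna_ul": round(rna_ul, 2),
        "buffer_ul": buffer_ul,
        "dnase_ul": dnase_ul,
        "water_ul": round(water_ul, 2),
    }


# Hypothetical sample at 85 ng/uL
volumes = dnase_reaction_volumes(rna_conc_ng_per_ul=85.0)
print(volumes)
```

Raising an error when the water volume goes negative flags samples that are too dilute to reach 1000ng in a 50uL reaction.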
DNase calculations are here:
Samples will be permanently stored here (Google Sheet):
from Sam’s Notebook https://ift.tt/2J3c0Nl
Parsing out unique proteins from hierarchical clusters:
I switched the linkage method from ward.D2 (Ward 1963) to average because that raised the cophenetic correlation from 0.6299225 to 0.9433488. The typically accepted minimum is 0.75. There is now a total of 41 clusters (note the agglomerative coefficient = 0.9959262).
Most proteins are in cluster 1 (14381/14510 = 98.68%). However, because the cluster hierarchy represents the original object-by-object dissimilarity matrix so well, I am not so worried about this.
Now I want to remove proteins that are in the same cluster. I’m guessing most of cluster 1 will be removed, but this will help remove many proteins (and thus noise) in my data and highlight the proteins that cluster differently based on abundance. My attempts to solve this problem are in this issue.
This spreadsheet contains a list of proteins that were uniquely abundant between silos.
Now I want to move on to gene enrichment of these proteins, but I need to determine the most appropriate background for DAVID or what the background is for CompGO.
The importance of a genetic background. I’m going to use all of the proteins detected in silos 3 and 9, all of which were used for the cluster analysis. I chose not to use the proteins that may be unique to silo 2 since it was not included in this analysis. However, it will be worthwhile in the future to apply the same methods to silos 2 and 3, since there were mortality differences despite the silos being at the same temperature. At that point I will only use proteins detected in silos 2 and 3 and, depending on the results, I could redo the same method with all silos and use silos 2 and 3 as replicates. The proteins would then be distinguished by temperature only rather than by group.
Quick note on small things I learned so I won’t forget down the road…
- I was having some problems with git finding removed files and GitHub wanting to track them. This code fixes that in a user-friendly way:
git clean -i -fd
- -i for interactive
- -f for force
- -d to also remove untracked directories
- Note: Add -n or --dry-run to just check what it will do.
- R uses 1-based indexing.
I also installed Shutter on Roadrunner using the Ubuntu Software (App store) for image editing to post my issue.
Last Friday, Ronit quantified 1:10 dilutions of the RNA I isolated on 20181003 and the RNA he finished isolating on 20181011, but two of the samples (D11-C, T10-C) were still too concentrated.
I made 1:20 dilutions (1uL RNA in 19uL 0.1% DEPC-treated H2O) and quantified them using the Roberts Lab Qubit 3.0, with the RNA HS assay. Used 1uL of the diluted RNA.
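Back-calculating the undiluted concentration from a diluted reading is simple arithmetic. This is a minimal Python sketch: the function name and the 12.5 ng/uL reading are hypothetical, and it assumes the Qubit value is already the concentration of the diluted tube.

```python
def stock_concentration(reading_ng_per_ul, sample_ul=1.0, diluent_ul=19.0):
    """Scale a diluted reading back to the undiluted stock concentration."""
    dilution_factor = (sample_ul + diluent_ul) / sample_ul  # 1uL + 19uL gives 20
    return reading_ng_per_ul * dilution_factor


# Hypothetical reading of 12.5 ng/uL from a 1:20 dilution
print(stock_concentration(12.5))
```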
I ran the broodstock DNA extractions on the Bioanalyzer today (10/10). Here are the results. For the first chip, the samples are:
1 – 10-3
The second chip is already labelled, and I reran UK-08 on it as well. Unfortunately, during preparation of the chip with the gel dye matrix, the plunger clip released to a higher position. I think this disrupted the gel matrix in the chip and caused some errors on the gel matrix although the electropherogram.
Previously, I was attempting to use the Bray-Curtis distance matrix that I used for the cluster analysis of each individual silo. However, Bray-Curtis is an asymmetrical coefficient, so any double zero values are ignored. My data contain several double zeros (where abundances weren’t detected for multiple days or in multiple silos for a day), and that is relevant information when examining the pattern of abundances. I chose to use a Euclidean distance matrix instead because it is the most commonly used symmetrical distance.
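The double-zero behaviour can be illustrated with toy numbers. This is a minimal Python sketch with made-up abundance vectors, not the silo data: a shared absence adds nothing to Bray-Curtis, while a shared nonzero value lowers it. Euclidean distance treats both kinds of agreement the same way.

```python
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity for non-negative abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical abundance profiles for two samples
cases = {
    "base": ([1, 2], [2, 1]),
    "plus double zero": ([1, 2, 0], [2, 1, 0]),
    "plus shared value": ([1, 2, 5], [2, 1, 5]),
}

for label, (u, v) in cases.items():
    print(label, round(bray_curtis(u, v), 3), round(euclidean(u, v), 3))
```

In this toy case the Bray-Curtis value is identical with and without the double zero but drops when the shared value is 5, whereas the Euclidean distance is the same in all three rows.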
I also changed the method I was using. Average is still a good choice, but I think Ward’s method (1963) will be better since it is less sensitive to outliers. At each step it merges the pair of clusters that gives the smallest increase in the within-cluster error sum of squares. Average gave the best cophenetic correlation (0.9433488). This should be done on my previous clusters for each individual silo if we choose to utilize those plots. Note that this is hierarchical clustering, not k-means clustering.
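How a cophenetic correlation is obtained for a given linkage method can be sketched in plain Python. This is a naive illustration, not the R workflow used for the analysis: the function names and the six points are hypothetical, and the Ward merge height uses one common scaling, the square root of twice the increase in error sum of squares.

```python
from itertools import combinations
from math import dist, sqrt


def centroid(points, members):
    """Mean coordinate of the points whose indices are in members."""
    return [sum(points[i][k] for i in members) / len(members)
            for k in range(len(points[0]))]


def merge_height(points, a, b, method):
    """Height at which clusters a and b (tuples of indices) would merge."""
    if method == "average":  # mean of all between-cluster pairwise distances
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))
    if method == "ward":  # increases with the rise in error sum of squares
        scale = sqrt(2 * len(a) * len(b) / (len(a) + len(b)))
        return scale * dist(centroid(points, a), centroid(points, b))
    raise ValueError(f"unknown method: {method}")


def cophenetic_correlation(points, method):
    """Cluster agglomeratively, then correlate original and cophenetic distances."""
    clusters = [(i,) for i in range(len(points))]
    coph = {}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: merge_height(points, ab[0], ab[1], method))
        height = merge_height(points, a, b, method)
        for i in a:
            for j in b:
                coph[tuple(sorted((i, j)))] = height  # height where i and j first join
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    pairs = list(combinations(range(len(points)), 2))
    x = [dist(points[i], points[j]) for i, j in pairs]
    y = [coph[p] for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return cov / sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))


# Hypothetical 2D points standing in for abundance profiles
toy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (30, 0)]
for method in ("average", "ward"):
    print(method, round(cophenetic_correlation(toy, method), 4))
```

The search over all cluster pairs is slow for large inputs, so this only shows the calculation. A dataset of roughly 14,000 proteins needs an optimized clustering implementation.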
The table can be found here (along with a frequency table, scree plot, dendrogram and line plots, which are pasted below for convenience).
In the table, each protein has one row for Silo 3 and one row for Silo 9, so every protein appears twice. Cluster assignments are in a separate column. I want to determine which proteins were sorted into the same cluster for both silos. Obviously, I could do this manually, but that would take a while. Next, I need code that removes a protein if both of its rows have the same cluster.
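The filtering step described above can be sketched like this. This is a minimal Python sketch rather than R, and the protein names, the (protein, silo, cluster) row layout, and the function name are hypothetical stand-ins for the real table.

```python
from collections import defaultdict

# Hypothetical rows: (protein, silo, cluster)
rows = [
    ("protein_A", 3, 1), ("protein_A", 9, 1),
    ("protein_B", 3, 1), ("protein_B", 9, 4),
    ("protein_C", 3, 2), ("protein_C", 9, 7),
]


def proteins_with_different_clusters(rows):
    """Keep only proteins whose cluster assignment differs between silos."""
    clusters_by_protein = defaultdict(dict)
    for protein, silo, cluster in rows:
        clusters_by_protein[protein][silo] = cluster
    return {
        protein: by_silo
        for protein, by_silo in clusters_by_protein.items()
        if len(set(by_silo.values())) > 1  # more than one distinct cluster
    }


unique = proteins_with_different_clusters(rows)
print(unique)
```

Grouping by protein first means the row order in the table does not matter, and the same function would still work if a third silo were added later.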
October 15, 2018 at 11:00AM
via iOS Photos https://ift.tt/2CKA4UR