Kaitlyn’s notebook: MetaboAnalyst round 2

First, loading values in ASCA…

The goal: after reformatting the data and running the ASCA in R, I want to extract the loading values (the entries of the eigenvectors, not the eigenvalues) to see which proteins have the largest values in magnitude, since those proteins are the most influential in my dataset. I will then plot these proteins’ abundances over time.

Shelly and I worked out that the 1 represents PC1 and that the v values in svd are the loadings, based on the MetStaT GitHub code, which shows what happens behind the scenes when using the MetStaT package. So we can examine the loading values by running loadings <- as.data.frame(ASCA$`1`$svd$v[,1]) (the backticks are needed because a bare 1 is not a valid name after $ in R). These were the lines on GitHub that showed which table holds the loading values:


pr.object <- asca[[ee]]$svd

pcs[tuple[1]] # 1st element of tuple contains index of PC1 within PCs

plot(1:dim(pr.object$v)[1],pr.object$v[,pc1],type="h",col=color.to….


However, the proteins are not labelled in the loading values: loading-values-ASCA

There is the correct number of loadings, 7988, and I assume they are in the same order as the protein columns in the input data. Because I can’t be sure, I’m going to revisit MetaboAnalyst, where I believe I can get a loading values table that includes the protein names. If the results look promising there, I will try to add the corresponding row names to the above data frame and compare it to the MetaboAnalyst results.
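Attaching the protein names could look roughly like the sketch below. It assumes the matrix given to ASCA had proteins as named columns and that the decomposition keeps that column order; the toy matrix, the protein names, and the use of base R's svd() in place of the svd element of the MetStaT object are all stand-ins for illustration.

```r
# Toy stand-in for the real data: 6 samples (rows) x 5 proteins (columns).
# Names and values are made up for illustration.
set.seed(1)
X <- matrix(rnorm(30), nrow = 6,
            dimnames = list(paste0("S", 1:6), paste0("prot", 1:5)))

# Column-centre the data and decompose it, as a PCA would.
sv <- svd(scale(X, center = TRUE, scale = FALSE))

# Rows of v follow the column (protein) order of X, so the column names
# can be attached directly to the PC1 loadings.
loadings <- data.frame(protein = colnames(X), PC1 = sv$v[, 1])

# Rank proteins by the magnitude of their PC1 loading (sign is arbitrary).
loadings <- loadings[order(abs(loadings$PC1), decreasing = TRUE), ]
head(loadings)
```

With the real object, the same idea would be to take the column names of the ASCA input matrix and bind them to the loading vector, then check a few proteins against the MetaboAnalyst table.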

Now, MetaboAnalyst:

Originally I was getting errors when running my data through MetaboAnalyst. I had my datasheet organized with my samples in columns, in the same order MetaboAnalyst shows. However, it seems that the software actually needs the order to be ‘Sample’, followed by ‘Temperature’, and finally ‘Time’ in my case: it was reading ‘Time’ as the grouping of the data and complaining that each group needs three replicates. Essentially, make sure that the second row is the group that contains at least three replicates.

I reorganized my data for the ASCA by modifying the code Shelly created. This time the samples are in rows, with ‘Temperature’ as the first factor so that MetaboAnalyst understands that there are replicates for this factor. I also made sure to select the Time-Series/Two-factor module rather than the Statistical Analysis module.

Options and Results:

1) Data processing information: data-processing-info

2) I chose the standard deviation for data filtering.

3) Normalization –> Data scaling –> Mean centering:

normalization.png

4) Analysis paths: analyses-pathes

Multivariate: ASCA (the time points are not in order because they are sorted as text, by first character, rather than numerically):
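The ordering issue is easy to reproduce in R. The time labels below are hypothetical, and whether zero-padded labels would also fix the order inside MetaboAnalyst is an assumption I have not tested.

```r
# Hypothetical time labels; the real ones are not shown in this post.
time <- c("0", "24", "2", "10", "5")

# Sorting as text compares character by character, so "10" lands before "2".
sort(time)

# Option 1: a factor whose levels are set in numeric order.
time_f <- factor(time, levels = sort(unique(as.numeric(time))))
levels(time_f)

# Option 2: zero-pad the labels so that text order matches numeric order.
time_padded <- sprintf("%02d", as.integer(time))
sort(time_padded)
```

Option 2 is the one that can be applied to the uploaded datasheet itself, since it changes the labels rather than relying on R's factor levels.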

asca-factor

asca-interactions

These results are not significant, which implies that the dataset needs to be narrowed down, possibly by the loadings (running only the proteins with high loading values) or by eliminating similarly abundant proteins via k-means clustering.
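As a rough sketch of the k-means idea, base R's kmeans() can group proteins with similar abundance profiles. The toy matrix, the choice of 3 clusters, and the per-protein scaling are all assumptions for illustration, not settings I have used on the real data.

```r
set.seed(3)

# Toy matrix: 20 proteins (rows) x 6 time points (columns); values are made up.
abund <- matrix(rnorm(120), nrow = 20,
                dimnames = list(paste0("prot", 1:20), paste0("t", 1:6)))

# Scale each protein's profile so clustering reflects its shape over time,
# not its overall abundance.
profiles <- t(scale(t(abund)))

# Cluster the proteins; nstart repeats the random initialisation.
km <- kmeans(profiles, centers = 3, nstart = 10)

# Number of proteins per cluster, and the members of cluster 1.
table(km$cluster)
names(km$cluster)[km$cluster == 1]
```

Clusters of proteins that barely change could then be dropped before rerunning the ASCA.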

permutations

 


Terminology:

SPE is the squared prediction error, which measures the squared distance between the value predicted by the model and the true value (i.e. it measures how well the model fits).

Leverage measures the influence of each observation on a principal component. Score plots will identify observations with high leverage, i.e. observations that tend to pull the PCA towards them.

Outliers are proteins that do not follow the general trend of the data, but don’t necessarily have a high leverage on the PCA. A protein with high leverage is not necessarily an outlier, because it can pull the PCA towards it while still following the trend of the data.
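To make the two terms concrete, here is a minimal sketch of how a per-protein leverage and SPE can be computed from a PCA-style decomposition. It follows the usual definitions (leverage as the sum of squared loadings over the retained components, SPE as the sum of squared residuals); the toy data and the choice of 2 components are assumptions, and MetaboAnalyst's exact calculation may differ.

```r
set.seed(2)

# Toy data: 8 samples (rows) x 6 proteins (columns); values are made up.
X  <- matrix(rnorm(8 * 6), nrow = 8,
             dimnames = list(NULL, paste0("prot", 1:6)))
Xc <- scale(X, center = TRUE, scale = FALSE)

k  <- 2                              # number of components kept (arbitrary)
sv <- svd(Xc)
P  <- sv$v[, 1:k, drop = FALSE]      # loadings: proteins x k
scores <- Xc %*% P                   # scores: samples x k

# Leverage: how much each protein contributes to the retained components.
leverage <- rowSums(P^2)

# SPE: squared residual of each protein, i.e. what the k components miss.
E   <- Xc - scores %*% t(P)
spe <- colSums(E^2)

data.frame(protein = colnames(X), leverage, spe)
```

A protein with high leverage and low SPE shapes the model and fits it well; a protein with high SPE is poorly described by the model, which is the sense of "outlier" used here.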


As you can see, the plots don’t convey much information because of the amount of data, but you can download the associated tables of outliers and significant features.

 

 

temp-features

Leverage threshold was 0.9 and alpha threshold was 0.05.

temp-highest-leverage

Highest leverage value for outliers in consideration of temperature.

There were no significant features for time or interactions, only for temperature.

ANOVA2 (two-way ANOVA) gave no significant results and thus did not plot anything.

MEBA sounds like something I would be interested in, but it gives the error: “Please make sure data are balanced for time series analysis, In particular, for each time points, all experiments must exist and cannot be missing.”
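One plausible cause is a Temperature/Time combination with no samples, or with fewer than the others. A quick way to check is to cross-tabulate the two factors; the sample sheet below is hypothetical (the factor names follow this post, the values are made up).

```r
# Hypothetical sample sheet: one row per sample.
meta <- data.frame(
  Temperature = c(23, 23, 23, 29, 29),
  Time        = c(0, 2, 5, 0, 2)
)

# Count samples in every Temperature x Time cell.
counts <- table(meta$Temperature, meta$Time)
counts

# A balanced design has the same non-zero count in every cell.
balanced <- all(counts == counts[1]) && all(counts > 0)
balanced

# List the combinations that have no samples at all.
cells <- as.data.frame(counts)
missing_cells <- cells[cells$Freq == 0, ]
missing_cells
```

In this toy sheet the 29 degree group has no sample at time 5, so the design is unbalanced; running the same table on the real sample sheet should show whether that is what MEBA is complaining about.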