Kaitlyn’s notebook: 20190419 Pt. Whitney geoducks

Broodstock

Tank 6:

Tank 6

Tank 5:

20190419_090053.jpg

Tank 4:

20190419_090105-e1556035508146.jpg

Juveniles overwintered in heath stacks:

20190419_093604

Some signs of buildup/infestation:

  • The first geoduck is alive.
  • The second image contains two shells and a dead geoduck.

20190419_09375020190419_093952.jpg

KCL Priming experiment:

Additionally, Sam worked on an experiment to determine the optimum time for KCL egg priming. These are some important points about the experiment (more info is on Slack):

  • 6 x 10 experiment
  • Screened fertilized eggs (~1500) and took sample 1 hour post-fertilization from screen (concentrated sample).
  • Added eggs into glass bottle (0.3 um filtered seawater)
  • Bottles stored in wine cooler

Sam screening bottles:

20190419_110109.jpg

Bottles were discarded because we did not see many D-hinge larvae:

  • Sam looked and believed he saw some evidence of fertilization (formation of polar bodies) in samples taken 1 hour post-fertilization and will looks through those samples instead.

 

Kaitlyn’s notebook: Geoduck heath tray setup

Elevated header tank:IMG_20190419_085330

Dripper and outflow pipe to heath trays:

IMG_20190419_090149.jpg

Heath stack 1:

  • Red are elevated CO2 lines, blue is ambient.
  • Nozzles on the line control flow into the tray.
  • Boards are covering the top trays.20190419_090944

Elbow pipes on heath trays:

IMG_20190419_091714

Accessing the water lines:

  • It is easy to pull on the water line system when pulling out individual trays. There is access to the back of the trays so that the water line can be removed so the tray can be pulled out.
  • However, this does make the trays stick out into the walkway. We zip-tied poles onto the edge of the stack in front of the elbow pipes to prevent people from directly hitting anything.
  • Also, the elevated water treatment has some bubbles that are generated in the line.

Line from algal header tank:

  • Sam added a thinner line (connected below the green zip tie) to improve flow.
  • The line runs to an algal header tank in the breezeway that is filled in the mornings and feeds everything in this room. The tank runs out in the evening.

20190419_091615

Juvenile geoduck in H1T1:

20190419_141734.jpg

For comparison, here are 5 week old industry geoduck:

20190419_100128.jpg

Interestingly, juveniles were crawling up the side in one industry tank:

20190419_100231.jpg

 

 

Kaitlyn’s notebook: C. virginica and C. gigas

I’ve been working on theseissues which means I have been analyzing the methylation dataset for different populations of eastern oysters, as well as continuing to work on the WSG publication.

CpG O/E Eastern oyster

cpg <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/ID_CpG_labelled_all.csv")

single q-q plot

data <- cpg$CL_1
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = "CL_1", font.main = 3)
lines(density(filt.data), lty=2, lwd=2)

faceted q-q plots

library(mixtools)

colname.list <- colnames(cpg)

jpeg(filename = "Documents/robertslab/work/faceted_cpg.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(10, 9)) # r x c plot

for(i in 2:length(colname.list)){
data <- cpg[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[[1]][i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
print(mean(cpg[[i]]))
}

dev.off()

pull out stress response genes

GOslim <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/Blastquery-GOslim.csv", header=FALSE)
GOslim$V1 <- gsub(".*?NC","NC", GOslim$V1) #remove any artifacts from excel

stress.GO.nolistnovalues <- GOslim[grep("stress", GOslim$V3), ]
write.csv(merge(stress.GOslim, cpg, by.x="V1", by.y="ID"), "Documents/robertslab/work/CpG/eastern-oyster/stress-annotated.csv")
#2733 stress IDs identified

BPspread <- aggregate(V3~V1,GOslim,FUN=paste)
GOspread <- aggregate(V2~V1,GOslim,FUN=paste)
GOBP.spread <- merge(BPspread, GOspread, by="V1")
class(GOBP.spread$V3)

stress.GOBP.list <- GOBP.spread[grep("stress", GOBP.spread$V3), ]

stress.cpg <- merge(stress.GOBP.list, cpg, by.x = "V1", by.y="ID")

scatter plots with mean, median and range of values in original cpg table

rownames(cpg) <- cpg$ID
cpg <- cpg[,-c(1)]

library(matrixStats)

stats <- data.frame(colMeans(cpg))
colnames(stats)[1] <- "mean"
stats$median <- colMedians(as.matrix(cpg))
#stats$range <- colRanges(as.matrix(cpg)) #values too small
#stats <- subset(stats, select=-c(range))
stats$skew <- colskewness(as.matrix(cpg))
stats$kurtosis <- colkurtosis(as.matrix(cpg))
stats$variance <- colVars(as.matrix(cpg))
stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

stats$ID <- rownames(stats)

aa <- ggplot(stats, aes(x=ID, y=mean)) + geom_point()
bb <- ggplot(stats, aes(x=ID, y=median)) + geom_point()
cc <- ggplot(stats, aes(x=ID, y=kurtosis)) + geom_point()
dd <- ggplot(stats, aes(x=ID, y=skew)) + geom_point()
ee <- ggplot(stats, aes(x=ID, y=variance)) + geom_point()
ff <- ggplot(stats, aes(x=ID, y=stdev)) + geom_point()

library(gridExtra)
jpeg("Documents/robertslab/work/CpG/eastern-oyster/basicstats.jpeg")
grid.arrange(aa,bb,cc,dd,ee,ff, nrow = 3)
dev.off()

stress response stats

rownames(stress.cpg) <- stress.cpg$V1
stress.cpg <- stress.cpg[,-c(1:3)]

stress.stats <- data.frame(colMeans(stress.cpg))
colnames(stress.stats)[1] <- "mean"
stress.stats$median <- colMedians(as.matrix(stress.cpg))
stress.stats$skew <- colskewness(as.matrix(stress.cpg))
stress.stats$kurtosis <- colkurtosis(as.matrix(stress.cpg))
stress.stats$variance <- colVars(as.matrix(stress.cpg))
stress.stats$stdev <- colVars(as.matrix(stress.cpg), std = TRUE)

stress.stats$ID <- rownames(stress.stats)

a <- ggplot(stress.stats, aes(x=ID, y=mean)) + geom_point()
b <- ggplot(stress.stats, aes(x=ID, y=median)) + geom_point()
c <- ggplot(stress.stats, aes(x=ID, y=kurtosis)) + geom_point()
d <- ggplot(stress.stats, aes(x=ID, y=skew)) + geom_point()
e <- ggplot(stress.stats, aes(x=ID, y=variance)) + geom_point()
f <- ggplot(stress.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/stress-basicstats.jpeg")
grid.arrange(a,b,c,d,e,f, nrow = 3)
dev.off()

stress response and all samples plotted together

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-v-stress.jpeg")
grid.arrange(aa,a,cc,c,dd,d,ee,e, nrow = 4)
dev.off()

table with mean of all samples (rather than mean of genes)

cpg <- data.frame(t(cpg))

all.stats <- data.frame(colMeans(cpg))
colnames(all.stats)[1] <- "mean"

all.stats$median <- colMedians(as.matrix(cpg))
all.stats$skew <- colskewness(as.matrix(cpg))
all.stats$kurtosis <- colkurtosis(as.matrix(cpg))
all.stats$variance <- colVars(as.matrix(cpg))
all.stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

write.table(all.stats, "Documents/robertslab/work/CpG/eastern-oyster/stats-allsamples-on-samples.txt")

all.stats$ID <- rownames(all.stats)

g <- ggplot(all.stats, aes(x=ID, y=mean)) + geom_point()
h <- ggplot(all.stats, aes(x=ID, y=median)) + geom_point()
i <- ggplot(all.stats, aes(x=ID, y=kurtosis)) + geom_point()
j <- ggplot(all.stats, aes(x=ID, y=skew)) + geom_point()
k <- ggplot(all.stats, aes(x=ID, y=variance)) + geom_point()
l <- ggplot(all.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-on_samples.jpeg")
grid.arrange(g,h,i,j,k,l, nrow = 3)
dev.off()

density, box and Swarm plots of stress reponse genes

library(ggbeeswarm)
theme_set(theme_light()) #sets a default ggplot theme

stress <- stress.cpg
rownames(stress) <- stress$V1
stress <- stress[,-c(1:3)]

stress$Gene <- rownames(stress)

library(tidyr)
test <- gather(stress, key = "Sample", value = "CpG", 1:(ncol(stress)-1))
test$pop <- test$Sample
test$pop <- gsub("_.*","",test$pop)

test <- data.frame(colMeans(stress[,c(1:90)]))
colnames(test) <- "CpG"
test$sample <- rownames(test)
test$pop <- test$sample
test$pop <- gsub("_.*","",test$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm_box.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/density.jpeg")
ggplot(test) + geom_density(aes(CpG, color= pop))
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/faceted-density.jpeg")
ggplot(test) + geom_density(aes(CpG)) + facet_wrap(~pop, scale = "free")
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/box.jpeg")
ggplot(test) + geom_boxplot(aes(pop, CpG))
dev.off()

plot provided GO ID lists

GOcpg <- merge(GOBP.spread, cpg, by.x= "V1", by.y= "ID")

#list1
list1 <- list("GO:0100044",
"GO:0042538",
"GO:0042539",
"GO:1900464",
"GO:1900069",
"GO:1900070",
"GO:0071477",
"GO:0071475",
"GO:1902074",
"GO:1902075",
"GO:0009651",
"GO:1901000",
"GO:1901002",
"GO:1901001",
"GO:1901200",
"GO:1901196",
"GO:1901199",
"GO:0071472")

list1df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]] #save GO IDs in each line as a list
for(j in 1:length(list)){
if(list[j] %in% list1){
list1df <- rbind(list1df, GOcpg[i,]) #bind a matching row to the empty df
}
}
}

rownames(list1df) <- list1df$Gene
list1df <- list1df[,-c(2:3)]

list1df <- data.frame(colMeans(list1df[,c(2:91)]))
colnames(list1df) <- "CpG"
list1df$sample <- rownames(list1df)
list1df$pop <- list1df$sample
list1df$pop <- gsub("_.*","",list1df$sample)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-swarmbox.jpeg")
ggplot(list1df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 2
list2 <- list("GO:0070415",
"GO:0070414",
"GO:0090441",
"GO:0090442",
"GO:0061408",
"GO:1990845",
"GO:0031990",
"GO:0043620",
"GO:0043555",
"GO:1990611",
"GO:1990497",
"GO:0034605",
"GO:0070417",
"GO:0006950",
"GO:0036003",
"GO:0043618",
"GO:0080134",
"GO:0080135",
"GO:0097201",
"GO:0033554",
"GO:0033555",
"GO:0009271")

list2df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
list2df <- rbind(list2df, GOcpg[i,])
}
}
}

list2df <- unique(list2df)
rownames(list2df) <- list2df$V1
list2df <- list2df[,-c(2:3)]

list2df <- data.frame(colMeans(list2df[,c(2:91)]))
colnames(list2df) <- "CpG"
list2df$sample <- rownames(list2df)
list2df$pop <- list2df$sample
list2df$pop <- gsub("_.*","",list2df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-swarmbox.jpeg")
ggplot(list2df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 3
list3 <- list("GO:0045087",
"GO:0061760",
"GO:0045088",
"GO:0002218",
"GO:0002227",
"GO:1905036",
"GO:1905035",
"GO:1905034",
"GO:0035420",
"GO:0045824",
"GO:0045089",
"GO:0002758",
"GO:0002766",
"GO:0090714",
"GO:0035419",
"GO:0035423",
"GO:0035422",
"GO:0035421",
"GO:0046735",
"GO:0046738",
"GO:0038075",
"GO:0039503",
"GO:0052170",
"GO:0052166",
"GO:0052167",
"GO:0052390",
"GO:0052305",
"GO:0052309",
"GO:0002220",
"GO:0038078",
"GO:0052157",
"GO:0052169",
"GO:0052033",
"GO:0052034",
"GO:0052382",
"GO:0052306",
"GO:0052308",
"GO:0052258",
"GO:0052257",
"GO:0052296",
"GO:0035663",
"GO:0035662",
"GO:0090644",
"GO:0035325",
"GO:0039699",
"GO:0046696",
"GO:0002224",
"GO:0039539",
"GO:0039538",
"GO:0039537",
"GO:0039560",
"GO:1990231",
"GO:0002755",
"GO:0002756",
"GO:0035669",
"GO:0035668",
"GO:0035667",
"GO:0035666",
"GO:0035661",
"GO:0035660",
"GO:0035665",
"GO:0035664")

list3df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
list3df <- rbind(list3df, GOcpg[i,])
}
}
}

list3df <- unique(list3df)
rownames(list3df) <- list3df$V1
list3df <- list3df[,-c(2:3)]

list3df <- data.frame(colMeans(list3df[,c(2:91)]))
colnames(list3df) <- "CpG"
list3df$sample <- rownames(list3df)
list3df$pop <- list3df$sample
list3df$pop <- gsub("_.*","",list3df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-swarmbox.jpeg")
ggplot(list3df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

qq plot on stress response genes by population

Gocpg

loop to get pop means per gene

pops <- unique(list1df$pop) #list of pop names
popmeans <- data.frame(matrix(0,19907,1)) #create empty matrix with dimensions b/c cbind sucks
for(i in (1:nrow(GOcpg))){ #for each row of GOcpg
for(j in 1:length(pops)){ # for item in the list (for each pop in list)
pattern <- paste0(pops[j],"_") # create a new variable and paste the pop item with an underscore to distinguish unique pops (ex CL and CLP)
popmeans[i,j] <- apply(GOcpg[i,grep(pattern,colnames(GOcpg))],1,mean) #fill in each new cell of matrix with the mean of indv populations for each gene
}
#print(popmeans[i,])
}
rownames(popmeans) <- rownames(GOcpg)
colnames(popmeans) <- pops

library(mixtools)

colname.list <- colnames(popmeans)

same plot

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-same.jpeg", width = 1000, height = 1000)

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
print(mean(popmeans[[i]]))
}

dev.off()

different plots

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}

dev.off()

Kruskel and Wilcox test

#anova <- gather(popmeans, key = "Pop", value = "CpG", 1:(ncol(popmeans)))
#anova$Pop <- as.factor(anova$Pop)

#kruskal.test(CpG ~ Pop, data = anova)

#pairwise.wilcox.test(anova$CpG, anova$Pop, p.adjust.method = "BH")

Mean plot

#library("ggpubr")
#ggline(popmeans, x = "Pop", y = "Cpg",
# add = c("mean_se", "jitter"),
#order = c("","",…),
#ylab = "CpG O/E", xlab = "Population")

Q-Q plots on provided lists

GO.popmeans <- merge(popmeans, GOBP.spread, by.x="row.names", by.y="V1")

list 1

l1df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list1){
l1df <- rbind(l1df, GO.popmeans[i,])
}
}
}

l1df <- unique(l1df)
rownames(l1df) <- l1df$Row.names
l1df <- l1df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l1df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 2

l2df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
l2df <- rbind(l2df, GO.popmeans[i,])
}
}
}

l2df <- unique(l2df)
rownames(l2df) <- l2df$Row.names
l2df <- l2df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l2df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("blue", "red"), xlab2 = "CpG O/E ")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 3

l3df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
l3df <- rbind(l3df, GO.popmeans[i,])
}
}
}

l3df <- unique(l3df)
rownames(l3df) <- l3df$Row.names
l3df <- l3df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l3df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

Kaitlyn’s notebook: dendrogram cutoffs and euclidean vs bray-curtis

Shelly did some amazing work creating a network using proteins identified by ASCA and p-values (calculated using a proportion test with spectral counts). The backround colors on the network represent the log fold change. We want to see if hierarchical clustering (HC) identified proteins with large fold changes and if that might be good to add into this network; We also want to look at the differences in proteins identified in HC and ASCA in the network.

Here I am comparing different cutoff values for HC (e.g. where I decide to cut the dendrogram) and how that affects the protein list and thus network visualization. One question we have always had is whether bray-curtis or euclidean dissimilarity matrices are appropriate. To answer this question, I looked at varying cutoff values and made multiple protein lists to insert into the network with both dissimilarity matrices.

#agglomerate coefficent (how likely a protein is to be placed in a new cluster)
#bray=0.9349286; euclidean=0.9969951

#cophenetic correlation (how well the dissimilarity matrix represents the original data)
#bray=0.7690317; euclidean=0.9460521 (both values reach 0.75 value)

Cutoff value 0.7 0.65 0.6 0.5 300 250 150
Clusters 9 17 23 54 25 35 79

Red lines represent cutoff values shown above in table.

Euclidean dendrogram:

Bray-curtis dendrogram:

Protein lists include the abundance value for the protein so each protein is duplicated- one for 23C and one for 29C. If you want just the list of unique proteins regardless of the silo, just subset one of the silos and that will include all of the uniquely clustered proteins.

The other files in the folders:

  • XXXfaceted-abund.jpeg
    • faceted abundance plots of all proteins
  • XXXfaceted-unq.jpeg
    • faceted abundance plots of only the unique proteins
  • XXXfreq.csv
    • frequency tables for the total number of proteins in each cluster
  • Scree plot
  • Dendrogram

Kaitlyn’s notebook: 20190123 Geoduck histology

We sampled geoduck at the end of January and took some histology samples. I believe only geoduck from tank 4 were strip spawned and we noticed that the treated tanks had poorly developed gonads. I examined the histology slides that we just received from sampling.

I reviewed how Grace staged her geoduck as well as the criteria in the paper Brent sent Shelly and I (Ropes 1968). I talked with Grace a little more about what she did and showed her some slides and we were both having trouble placing the geoduck at different stages. It was hard to decide whether they were producing less, reabsorbing, or just developing gonad more slowly than the other tanks.

I came up with some scoring criteria to get a better idea of how tanks and treatments compared and entered it into this spreadsheet.

Overall, it appears that the ambient tanks were further along in development than the treated tanks. Tank 4 was furthest in development compared to tank 1 which was dominated by early active geoduck.

All treatment group geoducks were early active:

Least developed male 4 with only spermatognium present in tiny acini-

Early active male 1 with dense connective tissue and very low amount of spermatids-

Early active male 018 with less connective tissue previously but is still early active (granted further along than 1) because of the number of acini with less than 30-40% spermatids; also the acini are smaller than seen below in the late active male –

Early active female 006 with dense connective tissue, low volume eggs, and most eggs are on the follicular wall

Early active female 034 with many elongated eggs on the wall of a small follicles however the eggs do have some good volume and roundness. This is a good example of difficulties staging-

While Ambient geoducks looked more like this:

Late active male 41 with visible spermatids covering at least 50% of large acini and all acini contain spermatids; connective tissue is less dense than 018-

Ripe female 045 with little connective tissue and high volume eggs; this female has the most eggs/follicle but it still isn’t a lot compared to a heavily ripe female-

Images are located on the FFAR-Geoduck drive under /Images/20190123-histo and on OWL. I also updated the histology-databank with the slide locations.

Ropes, J. W. 1968. Reproductive Cycle of the Surf Clam, Spisula Solidissima, in Offshore New Jersey. Biol Bull 135:349-365

Kaitlyn’s notebook: Extracting clusters from heatmap (and a Venn diagram)

Extracting Clusters

I’ve been working on extracting the clusters out from my heatmap and finally succeeded with this R script!

Cluster 1 

Cluster 2

23C

29C 23C 29C
Mean Protein Abundance 181.025 168.728 284.94 301.742

Important note:

  • distfun = function(x) as.dist(1 - cor(t(x), use = "pa"))
    • heatmap3 states that distfun=dist is default but that is incorrect and the above is the appropriate default;
    • gplots does have the correct distfun=dist default so you  must change it if heatmap3 gives a better dendrogram as in this case.

Question: Should I get error rates for the mean protein abundance of each cluster and silo?

I’m currently working on doing the same thing with the ASCA temperature influenced proteins (the ASCA table requires some different reformatting).

Venn Diagram

I had a lot of trouble trying to make a venn diagram in R (here is my attempt…). I did use the previous script to sort out a list of the proteins in each temperature, but ultimately I decided to refer back to Emma’s paper (2) and use Venny (2) the same way she did.

(1) Oliveros, J. C. Venny. An interactive tool for comparing lists with Venn’s diagrams.
https://bioinfogp.cnb.csic.es/tools/venny/index.html (accessed February 13, 2019).

(2) Timmins-Schiffman, E. B., Crandall, G. A., Vadopalas, B., Riffle, M. E., Nunn, B. L., Roberts, S. B. (2017). Integrating Discovery-driven Proteomics and Selected Reaction Monitoring To Develop a Noninvasive Assay for Geoduck Reproductive Maturation. Journal of Proteome Research, 16(9), 3298–3309.doi:10.1021/acs.jproteome.7b00288.

Kaitlyn’s notebook: oyster seed proteomics comparisons

Comparing ASCA and clustering identified proteins

I modified and reran the clustering code using the technical replicate averages. The code and other plots can be found here. I used the proteins ASCA identified as being affected by temperature.

This is the code I used to compare the analyses. A total of 25 proteins were common between clustering and ASCA analysis. ASCA identified 113 different proteins while clustering identified 8 other proteins. It would be interesting to see how manipulating the dendrogram height cutoff value during clustering affects the latter number (I used a cutoff of 250).

Here is a heatmap of the similar proteins (note the first column is Day 0 followed by 23C days 3-13 and finally 29C days 3-13):

Venn Diagram of 23C and 29C

I’m working on making a better venn diagram in R with just the silo 3 and 9 proteins. Here is a quick and dirty venn digram, but I’m trying to usethe VennDiagram package to improve it.

Note that all 0 values were changed to 0.100 in the technical replicate averages. I removed all values less than 0.600 as these would be undetected in the silo because of the change (there are 6 days of data since were are excluding day 15 because of the temperature malfunction).

This is obviously still a work in progress to create a better venn diagram.