Kaitlyn’s notebook: ANOVA on heath stack juveniles

I preformed an ANOVA on the lengths of the juveniles in the heath stacks at Pt.Whitney. I did a post-hoc test (Tukey HSD) to determine significance between he treatments.

length-treatment

length-treatment

homogenization

With blocking the anova gives a p=0.028 for treatment and p=0.245 for tray.

Here is the Tukey HSD p-values on the anova with blocking:

Tukey multiple comparisons of means
95% family-wise confidence level

AE-AA 0.54267725
EA-AA 0.25261761
EE-AA 0.72476397
EA-AE 0.91813748
EE-AE 0.10032636
EE-EA 0.03504769

H0_T-H0_B 0.9999666
H1_B-H0_B 0.9999930
H1_T-H0_B 0.9303406
H2_B-H0_B 0.9999448
H2_T-H0_B 0.9970459
H3_B-H0_B 0.9994060
H3_T-H0_B 0.9988241
H1_B-H0_T 0.9980676
H1_T-H0_T 0.8107845
H2_B-H0_T 0.9963141
H2_T-H0_T 0.9703042
H3_B-H0_T 0.9875827
H3_T-H0_T 0.9999998
H1_T-H1_B 0.9588824
H2_B-H1_B 1.0000000
H2_T-H1_B 0.9996848
H3_B-H1_B 0.9999842
H3_T-H1_B 0.9708530
H2_B-H1_T 0.9898375
H2_T-H1_T 0.9991863
H3_B-H1_T 0.9962770
H3_T-H1_T 0.4563048
H2_T-H2_B 0.9999849
H3_B-H2_B 0.9999999
H3_T-H2_B 0.9641553
H3_B-H2_T 0.9999998
H3_T-H2_T 0.8273796
H3_T-H3_B 0.9074722

Tank Treatment
H0_T EA
H0_B EE
H1_T AE
H1_B AA
H2_T EA
H2_B AA
H3_T AE
H3_B EE

Kaitlyn’s notebook: Quantifying geoduck histology

These are possible measurements that can be taken on the geoduck histology slides based on tools I’ve found:

Males:

  • pixel classification: acini vs connective tissue
  • average acinus size (measure widest portion of 25 acini/individual)

Females:

  • oocyte length
    • measure only round eggs with visible nucleus
  • average follicle size (measure widest potion of 25 follicles/individual)

Unfortunately I can’t use a pixel classifier for the females because the oocytes and connective tissue stain is very similar in color and there are significant white space between the tissue and in the nucleus. 

I have been using the Padilla-Gamiño micrscope (Nikon Eclipse Ni-U with the Nikon Elements BR [basic research] program). I have been exploring the program to find features useful to our histology slides. I can get the % area using the pixel classifier.

 

20190521_110608

Acini are deeply stained. Using the pixel classifier I can make these pixels red. The percent area for each phase (pixel stain) is shown to the right.

20190521_110629

Connective tissue is shown with the green pixels.

20190521_110634

A possible third phase in blue…

Alternatively, I can do just two phases-

20190521_130140

this is using the manual method (the bayes method has you pick sample pixels for each phase rather than showing a graph for the wavelengths).

However, the inability to remove the phase overlapping on the image makes it very difficult to find another spot to measure on the slide, and I surmise that at least 3-4 measurements throughout the slide should be taken to accurately determine the % area of the acini.

Other possible option are thresholding in Fiji (Fiji is just ImageJ [but updated]), color deconvolution in ImageJ (but I have not been able to find a way for the program to give me % area), or pixel classifier qith Qupath (a new addition to the program that still has soem kinks and I haven’t been able to use yet).

It is easy to take measurments and images with a scale burned onto the image so measuring oocytes, collicles and acini using either this program or ImageJ is easy! I am working on females first, and during any down time, i.e., someone else needs to use the scope, I am trying to learn how to sue Fiji, Qupath, or the Element BR program. 

Kaitlyn’s notebook: 2018 geoduck juveniles

Juveniles reared in heath stacks:

ID n Treatment Length Width
H0_T 7 E*A 10.78 7.36
H0_B 9 E*E 12.21 7.50
H1_T 15 E*A 9.71 6.75
H1_B 16 A*A 11.32 7.88
H2_T 14 E*A 9.77 6.84
H2_B 11 A*A 11.20 7.87
H3_T 14 E*A 11.35 7.65
H3_B 13 E*E 11.71 8.19

Checking on outplanted juveniles:

  • -0.76 low tide
    • We planted on very low point of the beach so it goes underwater very quickly. The ferries were very busy so I got out there alter than anticipated. I cut open a quarter of the tubes for thorough inspections. I checked another 50% without cutting open the tubes because the tide was coming in quickly, but it was difficult to see anything with the mesh shadows.
  • 3 zip-ties on each mesh tube were hard to cut off (return with more people!)
  • In the tubes:
    • 3-4 crabs (up to 3 inches big)
      • Many of the tubs had crabs the crabs were crawling in and out of
    • Cockles
      • Difficult to tell if holes are geoduck or cockles
        • Would recommend coming back with the intention of digging them up to be sure of geoduck survival
Tray Treatment Outplant Color # Holes Tube(s)
H0_T E*A Yellow 1 H12
H0_B E*E Grey 1 B1
H1_T E*A Pink 2 A1, B7
H1_B A*A Purple 1 A10
H2_T E*A Green 0
H2_B A*A Red 3 A3, A7, B11
H3_T E*A Dark blue 2 B12, D11
H3_B E*E Orange 1 B5

MVIMG_20190511_162746

MVIMG_20190511_162755.jpgMVIMG_20190511_163156

Kaitlyn’s notebook: Organizing geoduck methods and juvenile survival

I’m working on writing out the methods done with the broodstock (collection, water chemistry, experiment duration, and timing of sampling). Perhaps a timeline of the experiment would be better represented in a visual rather than a document (or at least condensing the current timelines/logs to one document)? There is a new, separate document now to keep track of the tasks needed to work toward sample analysis/manuscript development for the geoduck samples.

Additionally, I’m attempting to write up results on survival and histology. However, I think it would be best to get quantitative measurements for histology (e.g., oocyte size and % connective tissue). When I trained Eileen on water chemistry, we discussed the possibility of me using the Padilla-Gamiño microscope. I followed up with her and am waiting to hear back. Meanwhile, color deconvultion via ImageJ plugins or Qupath has allowed me to calculate the correct stain vectors to separate out the hematoxylin stained vs. eosin, but I don’t know how to or if I can convert that to percentage/area/pixels to compare those differences. I am continuing to look through literature to best solve this issue and give a more accurate measurement for any developmental differences between treatments shown in histology samples.

I’m also working on measurements of juvenile geoduck that were reared in the hatchery (as opposed to those outplanted in Sequim Bay) based on the images we took. Grace and I took counted survivors, removed dead geoduck and all sand, and attempted to remove any ciliate/algal growth on the juveniles. We hoped that sand removal would slow or prevent detritus buildup, however it looked like nearly all geoduck were affected when we visited the hatchery on May 2.

  • Some questions for Brent:Is it possible that the ciliates are setting on just the geoduck now rather than being dispersed between setting on the geoduck and the sand? In which case, should we add the sand back in?

Here is the survival data from April 26, 2019:

Order Tank Parental Exposure Alive Dead found
1 H0_T E*A 7 8
2 H0_B E*E 9 3
3 H1_T E*A 15 7
4 H1_B A*A 16 9
5 H2_T E*A 14 6
6 H2_B A*A 12 8
7 H3_T E*A 14 3
8 H3_B E*E 13 1

We also rearranged the heath trays because it appeared that there was uneven buildup on the bottom trays as opposed to the top trays:

This slideshow requires JavaScript.

The tides are at their lowest point right now so I reached out to Kelly to check on the outplanted geoduck as well. Shelly and I will go out to Pt. Whitney at the end of this week to setup a new pump so we can reduce feed levels for the setters and finish titrating poisoned water samples. Water chemistry may be taken again either that day, or early next week, and I am going to help Shelly extract DNA from sea lice infected salmon skin next week.

Kaitlyn’s notebook: 20190419 Pt. Whitney geoducks

Broodstock

Tank 6:

Tank 6

Tank 5:

20190419_090053.jpg

Tank 4:

20190419_090105-e1556035508146.jpg

Juveniles overwintered in heath stacks:

20190419_093604

Some signs of buildup/infestation:

  • The first geoduck is alive.
  • The second image contains two shells and a dead geoduck.

20190419_09375020190419_093952.jpg

KCL Priming experiment:

Additionally, Sam worked on an experiment to determine the optimum time for KCL egg priming. These are some important points about the experiment (more info is on Slack):

  • 6 x 10 experiment
  • Screened fertilized eggs (~1500) and took sample 1 hour post-fertilization from screen (concentrated sample).
  • Added eggs into glass bottle (0.3 um filtered seawater)
  • Bottles stored in wine cooler

Sam screening bottles:

20190419_110109.jpg

Bottles were discarded because we did not see many D-hinge larvae:

  • Sam looked and believed he saw some evidence of fertilization (formation of polar bodies) in samples taken 1 hour post-fertilization and will looks through those samples instead.

 

Kaitlyn’s notebook: Geoduck heath tray setup

Elevated header tank:IMG_20190419_085330

Dripper and outflow pipe to heath trays:

IMG_20190419_090149.jpg

Heath stack 1:

  • Red are elevated CO2 lines, blue is ambient.
  • Nozzles on the line control flow into the tray.
  • Boards are covering the top trays.20190419_090944

Elbow pipes on heath trays:

IMG_20190419_091714

Accessing the water lines:

  • It is easy to pull on the water line system when pulling out individual trays. There is access to the back of the trays so that the water line can be removed so the tray can be pulled out.
  • However, this does make the trays stick out into the walkway. We zip-tied poles onto the edge of the stack in front of the elbow pipes to prevent people from directly hitting anything.
  • Also, the elevated water treatment has some bubbles that are generated in the line.

Line from algal header tank:

  • Sam added a thinner line (connected below the green zip tie) to improve flow.
  • The line runs to an algal header tank in the breezeway that is filled in the mornings and feeds everything in this room. The tank runs out in the evening.

20190419_091615

Juvenile geoduck in H1T1:

20190419_141734.jpg

For comparison, here are 5 week old industry geoduck:

20190419_100128.jpg

Interestingly, juveniles were crawling up the side in one industry tank:

20190419_100231.jpg

 

 

Kaitlyn’s notebook: C. virginica and C. gigas

I’ve been working on theseissues which means I have been analyzing the methylation dataset for different populations of eastern oysters, as well as continuing to work on the WSG publication.

CpG O/E Eastern oyster

cpg <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/ID_CpG_labelled_all.csv")

single q-q plot

data <- cpg$CL_1
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = "CL_1", font.main = 3)
lines(density(filt.data), lty=2, lwd=2)

faceted q-q plots

library(mixtools)

colname.list <- colnames(cpg)

jpeg(filename = "Documents/robertslab/work/faceted_cpg.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(10, 9)) # r x c plot

for(i in 2:length(colname.list)){
data <- cpg[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[[1]][i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
print(mean(cpg[[i]]))
}

dev.off()

pull out stress response genes

GOslim <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/Blastquery-GOslim.csv", header=FALSE)
GOslim$V1 <- gsub(".*?NC","NC", GOslim$V1) #remove any artifacts from excel

stress.GO.nolistnovalues <- GOslim[grep("stress", GOslim$V3), ]
write.csv(merge(stress.GOslim, cpg, by.x="V1", by.y="ID"), "Documents/robertslab/work/CpG/eastern-oyster/stress-annotated.csv")
#2733 stress IDs identified

BPspread <- aggregate(V3~V1,GOslim,FUN=paste)
GOspread <- aggregate(V2~V1,GOslim,FUN=paste)
GOBP.spread <- merge(BPspread, GOspread, by="V1")
class(GOBP.spread$V3)

stress.GOBP.list <- GOBP.spread[grep("stress", GOBP.spread$V3), ]

stress.cpg <- merge(stress.GOBP.list, cpg, by.x = "V1", by.y="ID")

scatter plots with mean, median and range of values in original cpg table

rownames(cpg) <- cpg$ID
cpg <- cpg[,-c(1)]

library(matrixStats)

stats <- data.frame(colMeans(cpg))
colnames(stats)[1] <- "mean"
stats$median <- colMedians(as.matrix(cpg))
#stats$range <- colRanges(as.matrix(cpg)) #values too small
#stats <- subset(stats, select=-c(range))
stats$skew <- colskewness(as.matrix(cpg))
stats$kurtosis <- colkurtosis(as.matrix(cpg))
stats$variance <- colVars(as.matrix(cpg))
stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

stats$ID <- rownames(stats)

aa <- ggplot(stats, aes(x=ID, y=mean)) + geom_point()
bb <- ggplot(stats, aes(x=ID, y=median)) + geom_point()
cc <- ggplot(stats, aes(x=ID, y=kurtosis)) + geom_point()
dd <- ggplot(stats, aes(x=ID, y=skew)) + geom_point()
ee <- ggplot(stats, aes(x=ID, y=variance)) + geom_point()
ff <- ggplot(stats, aes(x=ID, y=stdev)) + geom_point()

library(gridExtra)
jpeg("Documents/robertslab/work/CpG/eastern-oyster/basicstats.jpeg")
grid.arrange(aa,bb,cc,dd,ee,ff, nrow = 3)
dev.off()

stress response stats

rownames(stress.cpg) <- stress.cpg$V1
stress.cpg <- stress.cpg[,-c(1:3)]

stress.stats <- data.frame(colMeans(stress.cpg))
colnames(stress.stats)[1] <- "mean"
stress.stats$median <- colMedians(as.matrix(stress.cpg))
stress.stats$skew <- colskewness(as.matrix(stress.cpg))
stress.stats$kurtosis <- colkurtosis(as.matrix(stress.cpg))
stress.stats$variance <- colVars(as.matrix(stress.cpg))
stress.stats$stdev <- colVars(as.matrix(stress.cpg), std = TRUE)

stress.stats$ID <- rownames(stress.stats)

a <- ggplot(stress.stats, aes(x=ID, y=mean)) + geom_point()
b <- ggplot(stress.stats, aes(x=ID, y=median)) + geom_point()
c <- ggplot(stress.stats, aes(x=ID, y=kurtosis)) + geom_point()
d <- ggplot(stress.stats, aes(x=ID, y=skew)) + geom_point()
e <- ggplot(stress.stats, aes(x=ID, y=variance)) + geom_point()
f <- ggplot(stress.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/stress-basicstats.jpeg")
grid.arrange(a,b,c,d,e,f, nrow = 3)
dev.off()

stress response and all samples plotted together

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-v-stress.jpeg")
grid.arrange(aa,a,cc,c,dd,d,ee,e, nrow = 4)
dev.off()

table with mean of all samples (rather than mean of genes)

cpg <- data.frame(t(cpg))

all.stats <- data.frame(colMeans(cpg))
colnames(all.stats)[1] <- "mean"

all.stats$median <- colMedians(as.matrix(cpg))
all.stats$skew <- colskewness(as.matrix(cpg))
all.stats$kurtosis <- colkurtosis(as.matrix(cpg))
all.stats$variance <- colVars(as.matrix(cpg))
all.stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

write.table(all.stats, "Documents/robertslab/work/CpG/eastern-oyster/stats-allsamples-on-samples.txt")

all.stats$ID <- rownames(all.stats)

g <- ggplot(all.stats, aes(x=ID, y=mean)) + geom_point()
h <- ggplot(all.stats, aes(x=ID, y=median)) + geom_point()
i <- ggplot(all.stats, aes(x=ID, y=kurtosis)) + geom_point()
j <- ggplot(all.stats, aes(x=ID, y=skew)) + geom_point()
k <- ggplot(all.stats, aes(x=ID, y=variance)) + geom_point()
l <- ggplot(all.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-on_samples.jpeg")
grid.arrange(g,h,i,j,k,l, nrow = 3)
dev.off()

density, box and Swarm plots of stress reponse genes

library(ggbeeswarm)
theme_set(theme_light()) #sets a default ggplot theme

stress <- stress.cpg
rownames(stress) <- stress$V1
stress <- stress[,-c(1:3)]

stress$Gene <- rownames(stress)

library(tidyr)
test <- gather(stress, key = "Sample", value = "CpG", 1:(ncol(stress)-1))
test$pop <- test$Sample
test$pop <- gsub("_.*","",test$pop)

test <- data.frame(colMeans(stress[,c(1:90)]))
colnames(test) <- "CpG"
test$sample <- rownames(test)
test$pop <- test$sample
test$pop <- gsub("_.*","",test$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm_box.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/density.jpeg")
ggplot(test) + geom_density(aes(CpG, color= pop))
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/faceted-density.jpeg")
ggplot(test) + geom_density(aes(CpG)) + facet_wrap(~pop, scale = "free")
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/box.jpeg")
ggplot(test) + geom_boxplot(aes(pop, CpG))
dev.off()

plot provided GO ID lists

GOcpg <- merge(GOBP.spread, cpg, by.x= "V1", by.y= "ID")

#list1
list1 <- list("GO:0100044",
"GO:0042538",
"GO:0042539",
"GO:1900464",
"GO:1900069",
"GO:1900070",
"GO:0071477",
"GO:0071475",
"GO:1902074",
"GO:1902075",
"GO:0009651",
"GO:1901000",
"GO:1901002",
"GO:1901001",
"GO:1901200",
"GO:1901196",
"GO:1901199",
"GO:0071472")

list1df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]] #save GO IDs in each line as a list
for(j in 1:length(list)){
if(list[j] %in% list1){
list1df <- rbind(list1df, GOcpg[i,]) #bind a matching row to the empty df
}
}
}

rownames(list1df) <- list1df$Gene
list1df <- list1df[,-c(2:3)]

list1df <- data.frame(colMeans(list1df[,c(2:91)]))
colnames(list1df) <- "CpG"
list1df$sample <- rownames(list1df)
list1df$pop <- list1df$sample
list1df$pop <- gsub("_.*","",list1df$sample)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-swarmbox.jpeg")
ggplot(list1df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 2
list2 <- list("GO:0070415",
"GO:0070414",
"GO:0090441",
"GO:0090442",
"GO:0061408",
"GO:1990845",
"GO:0031990",
"GO:0043620",
"GO:0043555",
"GO:1990611",
"GO:1990497",
"GO:0034605",
"GO:0070417",
"GO:0006950",
"GO:0036003",
"GO:0043618",
"GO:0080134",
"GO:0080135",
"GO:0097201",
"GO:0033554",
"GO:0033555",
"GO:0009271")

list2df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
list2df <- rbind(list2df, GOcpg[i,])
}
}
}

list2df <- unique(list2df)
rownames(list2df) <- list2df$V1
list2df <- list2df[,-c(2:3)]

list2df <- data.frame(colMeans(list2df[,c(2:91)]))
colnames(list2df) <- "CpG"
list2df$sample <- rownames(list2df)
list2df$pop <- list2df$sample
list2df$pop <- gsub("_.*","",list2df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-swarmbox.jpeg")
ggplot(list2df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 3
list3 <- list("GO:0045087",
"GO:0061760",
"GO:0045088",
"GO:0002218",
"GO:0002227",
"GO:1905036",
"GO:1905035",
"GO:1905034",
"GO:0035420",
"GO:0045824",
"GO:0045089",
"GO:0002758",
"GO:0002766",
"GO:0090714",
"GO:0035419",
"GO:0035423",
"GO:0035422",
"GO:0035421",
"GO:0046735",
"GO:0046738",
"GO:0038075",
"GO:0039503",
"GO:0052170",
"GO:0052166",
"GO:0052167",
"GO:0052390",
"GO:0052305",
"GO:0052309",
"GO:0002220",
"GO:0038078",
"GO:0052157",
"GO:0052169",
"GO:0052033",
"GO:0052034",
"GO:0052382",
"GO:0052306",
"GO:0052308",
"GO:0052258",
"GO:0052257",
"GO:0052296",
"GO:0035663",
"GO:0035662",
"GO:0090644",
"GO:0035325",
"GO:0039699",
"GO:0046696",
"GO:0002224",
"GO:0039539",
"GO:0039538",
"GO:0039537",
"GO:0039560",
"GO:1990231",
"GO:0002755",
"GO:0002756",
"GO:0035669",
"GO:0035668",
"GO:0035667",
"GO:0035666",
"GO:0035661",
"GO:0035660",
"GO:0035665",
"GO:0035664")

list3df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
list3df <- rbind(list3df, GOcpg[i,])
}
}
}

list3df <- unique(list3df)
rownames(list3df) <- list3df$V1
list3df <- list3df[,-c(2:3)]

list3df <- data.frame(colMeans(list3df[,c(2:91)]))
colnames(list3df) <- "CpG"
list3df$sample <- rownames(list3df)
list3df$pop <- list3df$sample
list3df$pop <- gsub("_.*","",list3df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-swarmbox.jpeg")
ggplot(list3df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

qq plot on stress response genes by population

Gocpg

loop to get pop means per gene

pops <- unique(list1df$pop) #list of pop names
popmeans <- data.frame(matrix(0,19907,1)) #create empty matrix with dimensions b/c cbind sucks
for(i in (1:nrow(GOcpg))){ #for each row of GOcpg
for(j in 1:length(pops)){ # for item in the list (for each pop in list)
pattern <- paste0(pops[j],"_") # create a new variable and paste the pop item with an underscore to distinguish unique pops (ex CL and CLP)
popmeans[i,j] <- apply(GOcpg[i,grep(pattern,colnames(GOcpg))],1,mean) #fill in each new cell of matrix with the mean of indv populations for each gene
}
#print(popmeans[i,])
}
rownames(popmeans) <- rownames(GOcpg)
colnames(popmeans) <- pops

library(mixtools)

colname.list <- colnames(popmeans)

same plot

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-same.jpeg", width = 1000, height = 1000)

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
print(mean(popmeans[[i]]))
}

dev.off()

different plots

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}

dev.off()

Kruskel and Wilcox test

#anova <- gather(popmeans, key = "Pop", value = "CpG", 1:(ncol(popmeans)))
#anova$Pop <- as.factor(anova$Pop)

#kruskal.test(CpG ~ Pop, data = anova)

#pairwise.wilcox.test(anova$CpG, anova$Pop, p.adjust.method = "BH")

Mean plot

#library("ggpubr")
#ggline(popmeans, x = "Pop", y = "Cpg",
# add = c("mean_se", "jitter"),
#order = c("","",…),
#ylab = "CpG O/E", xlab = "Population")

Q-Q plots on provided lists

GO.popmeans <- merge(popmeans, GOBP.spread, by.x="row.names", by.y="V1")

list 1

l1df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list1){
l1df <- rbind(l1df, GO.popmeans[i,])
}
}
}

l1df <- unique(l1df)
rownames(l1df) <- l1df$Row.names
l1df <- l1df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l1df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 2

l2df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
l2df <- rbind(l2df, GO.popmeans[i,])
}
}
}

l2df <- unique(l2df)
rownames(l2df) <- l2df$Row.names
l2df <- l2df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l2df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("blue", "red"), xlab2 = "CpG O/E ")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 3

l3df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
l3df <- rbind(l3df, GO.popmeans[i,])
}
}
}

l3df <- unique(l3df)
rownames(l3df) <- l3df$Row.names
l3df <- l3df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l3df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()