Sam’s Notebook: Transposable Element Mapping – Crassostrea gigas Genome v9 Using RepeatMasker 4.07 on Roadrunner

Per this GitHub issue, I’m IDing transposable elements (TEs) in the Crassostrea gigas genome. Even though the C.gigas genome should be fully annotated, Steven wants a comparable set of analyses to compare to the Crassostrea virginica TE mapping we previously performed.

I used the Crassostrea gigas genome we have linked on our GitHub Genomic Resources wiki:

Analysis was performed in the following Jupyter Notebok (GitHub):

[code] #!/bin/bash ## Job Name...

#!/bin/bash
## Job Name
#SBATCH --job-name=oly-mbd
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=00-100:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/2019/0327



# Directories and programs
wd=$(pwd)
bismark_dir="/gscratch/srlab/programs/Bismark-0.21.0"
bowtie2_dir="/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/"
samtools="/gscratch/srlab/programs/samtools-1.9/samtools"
reads_dir="/gscratch/srlab/sr320/data/olurida-bs/"



source /gscratch/srlab/programs/scripts/paths.sh


find ${reads_dir}*_s456_trimmed.fq.gz \
| xargs basename -s _s456_trimmed.fq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome /gscratch/srlab/sr320/data/olurida-genomes/v081 \
-p 14 \
--non_directional \
${reads_dir}/{}_s456_trimmed.fq.gz 




${bismark_dir}/deduplicate_bismark \
--bam -p \
*.bam


${bismark_dir}/bismark_methylation_extractor \
--bedGraph --counts --scaffolds \
--multicore 14 \
*deduplicated.bam



# Bismark processing report

${bismark_dir}/bismark2report

#Bismark summary report

${bismark_dir}/bismark2summary



# Sort files for methylkit and IGV

find *deduplicated.bam | \
xargs basename -s .bam | \
xargs -I{} ${samtools} \
sort --threads 28 {}.bam \
-o {}.sorted.bam

# Index sorted files for IGV
# The "-@ 16" below specifies number of CPU threads to use.

find *.sorted.bam | \
xargs basename -s .sorted.bam | \
xargs -I{} ${samtools} \
index -@ 28 {}.sorted.bam

#bismark, #sbatch

Sam’s Notebook: Metagenomics – Taxonomic Diversity from Geoduck Water with BLASTn and Krona Plots

Continuing on getting the metagenomics sequencing project written up as a manuscript, Steven asked me to provide an overview of the taxonomic makeup of our metagenome assembly in this GitHub Issue. Earlier today, I ran analysis using BLASTp data. I initiated additional analysis using the MetaGeneMark nucleotide data to run BLASTn on Mox.

SBATCH script:

  #!/bin/bash ## Job Name #SBATCH --job-name=blastn_metagenomics ## Allocation Definition #SBATCH --account=coenv #SBATCH --partition=coenv ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=25-00:00:00 ## Memory per node #SBATCH --mem=120G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190325_blastn_metagenomics_geoduck_metagenemark # Load Python Mox module for Python module availability module load intel-python3_2017 # Document programs in PATH (primarily for program version ID) date >> system_path.log echo "" >> system_path.log echo "System PATH for $SLURM_JOB_ID" >> system_path.log echo "" >> system_path.log printf "%0.s-" {1..10} >> system_path.log echo ${PATH} | tr : \\n >> system_path.log wd="$(pwd)" # Paths to input/output files blastn_out="${wd}/blastn.outfmt6" blastdb_dir="/gscratch/srlab/blastdbs/ncbi-nr-nt-v5" blast_db="${blastdb_dir}/nt" fasta="/gscratch/srlab/sam/data/metagenomics/P_generosa/assemblies/20190103-mgm-nucleotides.fa" # Paths to programs blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin" blastn="${blast_dir}/blastn" export BLASTDB=${blastdb_dir} # Run blastx on Trinity fasta ${blastn} \ -query ${fasta} \ -db ${blast_db} \ -max_target_seqs 1 \ -outfmt "6 std staxids" \ -evalue 1e-10 \ -num_threads 28 \ > ${blastn_out}  

After the BLASTn, I followed that up by making a Krona plot using the taxonomic info pulled via BLASTn. This was run locally on my computer (swoose).

Krona plot script:

  #!/bin/env bash # Bash script for creating Krona plot of metagenomics taxonomies from BLAST outputs. # BLAST output format is expected to be: "6 std staxids" # Input/output files blast_out="blastn.outfmt6" krona_out="krona_megahit_MGM_blastn.html" krona_tax_list="krona_tax.lst" krona_stderr="krona_stderr.txt" # Krona standard error capture krona_stdout="krona_stdout.txt" # Krona standard out capture # Programs krona="/home/sam/programs/KronaTools-2.7/bin/ktImportTaxonomy" ## Extract NCBI taxon IDs from BLAST output ### Uses awk associative array to pull out unique entries with highest bitscore and unqique NCBI tax IDs ### BLAST output is sorted by bitscore for multiple matches for a single query ### Pipe unique entries to second awk command ### Set ';' and tab as field delimiters. Prevents issues with taxon ID column having multiple entries for multi-strain matching ### Prints a tab-delimited ouptut file containing the query ID and the taxon ID awk -F'[;\t]' '!seen[$1,$13]++' ${blast_out} \ | awk '{print $1 "\t" $13}' \ > ${krona_tax_list} ## Create Krona plot, specifying output filename ${krona} \ -o ${krona_out} \ ${krona_tax_list} \ 1> ${krona_stdout} \ 2> ${krona_stderr} 

Sam’s Notebook: Sample Submission – Lotterhos C.virginica Mantle MBD DNA to ZymoResearch for BSseq

Crassostrea virginica samples that were subjected to MBD enrichment (completed 20190319) were shipped, on dry ice, to ZymoResearch today for BSseq. They will perform bisulfite conversion, library prep, and sequencing. Sequencing output is to be ~50M paired-end reads (at least 100bp or 151bp) per library.

The master sample sheet for this project is here (Google Sheet):

FedEx tracking:

from Sam’s Notebook https://ift.tt/2FBU3pa
via IFTTT

Sam’s Notebook: Data Management – SRA Submission Panopea generosa Day 5 Larvae RNAseq

In preparation for some manuscripts, Steven asked that I get some geoduck RNAseq data upload to the NCBI sequencing read archive (SRA) in this GitHub Issue. Specifically, he needed the data corresponding to day 5 (pos-fertilization) larve RNAseq data that was prepped/run by Illumina on the NovaSeq. Original sample submission notebook entry is here.

List of FastQ files uploaded:

Kaitlyn’s notebook: C. virginica and C. gigas

I’ve been working on theseissues which means I have been analyzing the methylation dataset for different populations of eastern oysters, as well as continuing to work on the WSG publication.

CpG O/E Eastern oyster

cpg <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/ID_CpG_labelled_all.csv")

single q-q plot

data <- cpg$CL_1
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = "CL_1", font.main = 3)
lines(density(filt.data), lty=2, lwd=2)

faceted q-q plots

library(mixtools)

colname.list <- colnames(cpg)

jpeg(filename = "Documents/robertslab/work/faceted_cpg.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(10, 9)) # r x c plot

for(i in 2:length(colname.list)){
data <- cpg[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[[1]][i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
print(mean(cpg[[i]]))
}

dev.off()

pull out stress response genes

GOslim <- read.csv("Documents/robertslab/work/CpG/eastern-oyster/Blastquery-GOslim.csv", header=FALSE)
GOslim$V1 <- gsub(".*?NC","NC", GOslim$V1) #remove any artifacts from excel

stress.GO.nolistnovalues <- GOslim[grep("stress", GOslim$V3), ]
write.csv(merge(stress.GOslim, cpg, by.x="V1", by.y="ID"), "Documents/robertslab/work/CpG/eastern-oyster/stress-annotated.csv")
#2733 stress IDs identified

BPspread <- aggregate(V3~V1,GOslim,FUN=paste)
GOspread <- aggregate(V2~V1,GOslim,FUN=paste)
GOBP.spread <- merge(BPspread, GOspread, by="V1")
class(GOBP.spread$V3)

stress.GOBP.list <- GOBP.spread[grep("stress", GOBP.spread$V3), ]

stress.cpg <- merge(stress.GOBP.list, cpg, by.x = "V1", by.y="ID")

scatter plots with mean, median and range of values in original cpg table

rownames(cpg) <- cpg$ID
cpg <- cpg[,-c(1)]

library(matrixStats)

stats <- data.frame(colMeans(cpg))
colnames(stats)[1] <- "mean"
stats$median <- colMedians(as.matrix(cpg))
#stats$range <- colRanges(as.matrix(cpg)) #values too small
#stats <- subset(stats, select=-c(range))
stats$skew <- colskewness(as.matrix(cpg))
stats$kurtosis <- colkurtosis(as.matrix(cpg))
stats$variance <- colVars(as.matrix(cpg))
stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

stats$ID <- rownames(stats)

aa <- ggplot(stats, aes(x=ID, y=mean)) + geom_point()
bb <- ggplot(stats, aes(x=ID, y=median)) + geom_point()
cc <- ggplot(stats, aes(x=ID, y=kurtosis)) + geom_point()
dd <- ggplot(stats, aes(x=ID, y=skew)) + geom_point()
ee <- ggplot(stats, aes(x=ID, y=variance)) + geom_point()
ff <- ggplot(stats, aes(x=ID, y=stdev)) + geom_point()

library(gridExtra)
jpeg("Documents/robertslab/work/CpG/eastern-oyster/basicstats.jpeg")
grid.arrange(aa,bb,cc,dd,ee,ff, nrow = 3)
dev.off()

stress response stats

rownames(stress.cpg) <- stress.cpg$V1
stress.cpg <- stress.cpg[,-c(1:3)]

stress.stats <- data.frame(colMeans(stress.cpg))
colnames(stress.stats)[1] <- "mean"
stress.stats$median <- colMedians(as.matrix(stress.cpg))
stress.stats$skew <- colskewness(as.matrix(stress.cpg))
stress.stats$kurtosis <- colkurtosis(as.matrix(stress.cpg))
stress.stats$variance <- colVars(as.matrix(stress.cpg))
stress.stats$stdev <- colVars(as.matrix(stress.cpg), std = TRUE)

stress.stats$ID <- rownames(stress.stats)

a <- ggplot(stress.stats, aes(x=ID, y=mean)) + geom_point()
b <- ggplot(stress.stats, aes(x=ID, y=median)) + geom_point()
c <- ggplot(stress.stats, aes(x=ID, y=kurtosis)) + geom_point()
d <- ggplot(stress.stats, aes(x=ID, y=skew)) + geom_point()
e <- ggplot(stress.stats, aes(x=ID, y=variance)) + geom_point()
f <- ggplot(stress.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/stress-basicstats.jpeg")
grid.arrange(a,b,c,d,e,f, nrow = 3)
dev.off()

stress response and all samples plotted together

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-v-stress.jpeg")
grid.arrange(aa,a,cc,c,dd,d,ee,e, nrow = 4)
dev.off()

table with mean of all samples (rather than mean of genes)

cpg <- data.frame(t(cpg))

all.stats <- data.frame(colMeans(cpg))
colnames(all.stats)[1] <- "mean"

all.stats$median <- colMedians(as.matrix(cpg))
all.stats$skew <- colskewness(as.matrix(cpg))
all.stats$kurtosis <- colkurtosis(as.matrix(cpg))
all.stats$variance <- colVars(as.matrix(cpg))
all.stats$stdev <- colVars(as.matrix(cpg), std = TRUE)

write.table(all.stats, "Documents/robertslab/work/CpG/eastern-oyster/stats-allsamples-on-samples.txt")

all.stats$ID <- rownames(all.stats)

g <- ggplot(all.stats, aes(x=ID, y=mean)) + geom_point()
h <- ggplot(all.stats, aes(x=ID, y=median)) + geom_point()
i <- ggplot(all.stats, aes(x=ID, y=kurtosis)) + geom_point()
j <- ggplot(all.stats, aes(x=ID, y=skew)) + geom_point()
k <- ggplot(all.stats, aes(x=ID, y=variance)) + geom_point()
l <- ggplot(all.stats, aes(x=ID, y=stdev)) + geom_point()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/all-on_samples.jpeg")
grid.arrange(g,h,i,j,k,l, nrow = 3)
dev.off()

density, box and Swarm plots of stress reponse genes

library(ggbeeswarm)
theme_set(theme_light()) #sets a default ggplot theme

stress <- stress.cpg
rownames(stress) <- stress$V1
stress <- stress[,-c(1:3)]

stress$Gene <- rownames(stress)

library(tidyr)
test <- gather(stress, key = "Sample", value = "CpG", 1:(ncol(stress)-1))
test$pop <- test$Sample
test$pop <- gsub("_.*","",test$pop)

test <- data.frame(colMeans(stress[,c(1:90)]))
colnames(test) <- "CpG"
test$sample <- rownames(test)
test$pop <- test$sample
test$pop <- gsub("_.*","",test$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm_box.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/swarm.jpeg")
ggplot(test, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3)
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/density.jpeg")
ggplot(test) + geom_density(aes(CpG, color= pop))
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/faceted-density.jpeg")
ggplot(test) + geom_density(aes(CpG)) + facet_wrap(~pop, scale = "free")
dev.off()

jpeg("Documents/robertslab/work/CpG/eastern-oyster/box.jpeg")
ggplot(test) + geom_boxplot(aes(pop, CpG))
dev.off()

plot provided GO ID lists

GOcpg <- merge(GOBP.spread, cpg, by.x= "V1", by.y= "ID")

#list1
list1 <- list("GO:0100044",
"GO:0042538",
"GO:0042539",
"GO:1900464",
"GO:1900069",
"GO:1900070",
"GO:0071477",
"GO:0071475",
"GO:1902074",
"GO:1902075",
"GO:0009651",
"GO:1901000",
"GO:1901002",
"GO:1901001",
"GO:1901200",
"GO:1901196",
"GO:1901199",
"GO:0071472")

list1df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]] #save GO IDs in each line as a list
for(j in 1:length(list)){
if(list[j] %in% list1){
list1df <- rbind(list1df, GOcpg[i,]) #bind a matching row to the empty df
}
}
}

rownames(list1df) <- list1df$Gene
list1df <- list1df[,-c(2:3)]

list1df <- data.frame(colMeans(list1df[,c(2:91)]))
colnames(list1df) <- "CpG"
list1df$sample <- rownames(list1df)
list1df$pop <- list1df$sample
list1df$pop <- gsub("_.*","",list1df$sample)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-swarmbox.jpeg")
ggplot(list1df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 2
list2 <- list("GO:0070415",
"GO:0070414",
"GO:0090441",
"GO:0090442",
"GO:0061408",
"GO:1990845",
"GO:0031990",
"GO:0043620",
"GO:0043555",
"GO:1990611",
"GO:1990497",
"GO:0034605",
"GO:0070417",
"GO:0006950",
"GO:0036003",
"GO:0043618",
"GO:0080134",
"GO:0080135",
"GO:0097201",
"GO:0033554",
"GO:0033555",
"GO:0009271")

list2df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
list2df <- rbind(list2df, GOcpg[i,])
}
}
}

list2df <- unique(list2df)
rownames(list2df) <- list2df$V1
list2df <- list2df[,-c(2:3)]

list2df <- data.frame(colMeans(list2df[,c(2:91)]))
colnames(list2df) <- "CpG"
list2df$sample <- rownames(list2df)
list2df$pop <- list2df$sample
list2df$pop <- gsub("_.*","",list2df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-swarmbox.jpeg")
ggplot(list2df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

#list 3
list3 <- list("GO:0045087",
"GO:0061760",
"GO:0045088",
"GO:0002218",
"GO:0002227",
"GO:1905036",
"GO:1905035",
"GO:1905034",
"GO:0035420",
"GO:0045824",
"GO:0045089",
"GO:0002758",
"GO:0002766",
"GO:0090714",
"GO:0035419",
"GO:0035423",
"GO:0035422",
"GO:0035421",
"GO:0046735",
"GO:0046738",
"GO:0038075",
"GO:0039503",
"GO:0052170",
"GO:0052166",
"GO:0052167",
"GO:0052390",
"GO:0052305",
"GO:0052309",
"GO:0002220",
"GO:0038078",
"GO:0052157",
"GO:0052169",
"GO:0052033",
"GO:0052034",
"GO:0052382",
"GO:0052306",
"GO:0052308",
"GO:0052258",
"GO:0052257",
"GO:0052296",
"GO:0035663",
"GO:0035662",
"GO:0090644",
"GO:0035325",
"GO:0039699",
"GO:0046696",
"GO:0002224",
"GO:0039539",
"GO:0039538",
"GO:0039537",
"GO:0039560",
"GO:1990231",
"GO:0002755",
"GO:0002756",
"GO:0035669",
"GO:0035668",
"GO:0035667",
"GO:0035666",
"GO:0035661",
"GO:0035660",
"GO:0035665",
"GO:0035664")

list3df <- data.frame()
for (i in 1:nrow(GOcpg)){
list <- GOcpg$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
list3df <- rbind(list3df, GOcpg[i,])
}
}
}

list3df <- unique(list3df)
rownames(list3df) <- list3df$V1
list3df <- list3df[,-c(2:3)]

list3df <- data.frame(colMeans(list3df[,c(2:91)]))
colnames(list3df) <- "CpG"
list3df$sample <- rownames(list3df)
list3df$pop <- list3df$sample
list3df$pop <- gsub("_.*","",list3df$pop)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-swarmbox.jpeg")
ggplot(list3df, aes(pop,CpG, col = pop)) + geom_beeswarm(size = 3, cex = 3) + geom_boxplot(aes(pop, CpG), width = 0.5, alpha = 0.1)
dev.off()

qq plot on stress response genes by population

Gocpg

loop to get pop means per gene

pops <- unique(list1df$pop) #list of pop names
popmeans <- data.frame(matrix(0,19907,1)) #create empty matrix with dimensions b/c cbind sucks
for(i in (1:nrow(GOcpg))){ #for each row of GOcpg
for(j in 1:length(pops)){ # for item in the list (for each pop in list)
pattern <- paste0(pops[j],"_") # create a new variable and paste the pop item with an underscore to distinguish unique pops (ex CL and CLP)
popmeans[i,j] <- apply(GOcpg[i,grep(pattern,colnames(GOcpg))],1,mean) #fill in each new cell of matrix with the mean of indv populations for each gene
}
#print(popmeans[i,])
}
rownames(popmeans) <- rownames(GOcpg)
colnames(popmeans) <- pops

library(mixtools)

colname.list <- colnames(popmeans)

same plot

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-same.jpeg", width = 1000, height = 1000)

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
print(mean(popmeans[[i]]))
}

dev.off()

different plots

jpeg(filename = "Documents/robertslab/work/CpG/eastern-oyster/popmeans-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- popmeans[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = " ", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}

dev.off()

Kruskel and Wilcox test

#anova <- gather(popmeans, key = "Pop", value = "CpG", 1:(ncol(popmeans)))
#anova$Pop <- as.factor(anova$Pop)

#kruskal.test(CpG ~ Pop, data = anova)

#pairwise.wilcox.test(anova$CpG, anova$Pop, p.adjust.method = "BH")

Mean plot

#library("ggpubr")
#ggline(popmeans, x = "Pop", y = "Cpg",
# add = c("mean_se", "jitter"),
#order = c("","",…),
#ylab = "CpG O/E", xlab = "Population")

Q-Q plots on provided lists

GO.popmeans <- merge(popmeans, GOBP.spread, by.x="row.names", by.y="V1")

list 1

l1df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list1){
l1df <- rbind(l1df, GO.popmeans[i,])
}
}
}

l1df <- unique(l1df)
rownames(l1df) <- l1df$Row.names
l1df <- l1df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l1df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list1-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l1df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 2

l2df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list2){
l2df <- rbind(l2df, GO.popmeans[i,])
}
}
}

l2df <- unique(l2df)
rownames(l2df) <- l2df$Row.names
l2df <- l2df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l2df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-faceted.jpeg", width = 1000, height = 1000)

a <- par(mfrow = c(4, 4)) # r x c plot

for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list2-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l2df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("blue", "red"), xlab2 = "CpG O/E ")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

list 3

l3df <- data.frame()
for (i in 1:nrow(GO.popmeans)){
list <- GO.popmeans$V2[[i]]
for(j in 1:length(list)){
if(list[j] %in% list3){
l3df <- rbind(l3df, GO.popmeans[i,])
}
}
}

l3df <- unique(l3df)
rownames(l3df) <- l3df$Row.names
l3df <- l3df[,-c(1,17:18)]

#faceted plots
colname.list <- colnames(l3df)

jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-faceted.jpeg", width = 1000, height = 1000)
a <- par(mfrow = c(4, 4)) # r x c plot
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2 = "CpG O/E", main2 = colname.list[i], font.main = 3)
lines(density(filt.data), lty=2, lwd=2)
}
dev.off()

#same plot
jpeg("Documents/robertslab/work/CpG/eastern-oyster/list3-qq-same.jpeg", width = 1000, height = 1000)
for(i in 1:length(colname.list)){
data <- l3df[[i]]
filt.data = 0.001 & data <= 1.5] #Cutting off high and low values
set.seed(101)
mixmdl <- normalmixEM(filt.data, k=2)
plot(mixmdl, which = 2, col2 = c("red", "blue"), xlab2=" CpG O/E")
lines(density(filt.data), lty=2, lwd=2)
par(new = TRUE)
}
dev.off()

Grace’s Notebook: 2015 DIA Oysterseed Paper Progress

A little bit of progress has been made with the DIA proteomics paper. I am aiming to have the methods section of the paper solidly done by our meeting this Wednesday at 2 pm.

Created a paper repo

https://github.com/grace-ac/paper-pacific.oyster-larvae

A lot of work needs to be done to organize both the paper’s repo and the project’s repo.

Work in progress.

Methods developing

Got some input from Steven and Emma on the methods for the paper. I’m working on paring down the methods that Rhonda wrote, as well as adding the methods of what I’ve done. Still adding to it because I’m currently working towards getting the GO and GOslim terms from the Skyline output file so that I can compare the proteins expressed between the two temperature regimes.

Started using Galaxy

I’m using Galaxy right now to run a BLASTp with the newest version of the uniprot-swprot database and the proteome I used in the DIA analysis.

From the BLAST output file, I’ll join that with the GO terms (probably in R or Jupyter… unless that’s also something you can do in Galaxy. I’ll play around with it once the BLASTp is complete). Then, I’ll create some simple Venn diagrams and other basic plots and stats to show what proteins were expressed in general, as well as to compare the two temperatures. Aiming to have this done by the Wednesday meeting as well.

from Grace’s Lab Notebook https://ift.tt/2JEqMOo
via IFTTT