Kaitlyn’s notebook: Gene enrichment

DAVID could map 14 of the 28 IDs.

Focal adhesion was enriched in the KEGG_PATHWAY category with a p-value of 5.4E-2 and a Benjamini-adjusted p-value of 6.1E-1, so it is not significant after multiple-testing correction.
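As far as I know, the Benjamini value DAVID reports is a Benjamini-Hochberg adjusted p-value. Below is a minimal Python sketch of that adjustment, to show why a raw p-value near 0.05 can end up far larger after correction. The function name `bh_adjust` and the example p-values are made up for illustration; they are not DAVID output.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four annotation terms.
example_pvalues = [0.01, 0.04, 0.03, 0.5]
print(bh_adjust(example_pvalues))
```

The more terms are tested at once, the larger the multiplier `m / rank` becomes for the top-ranked terms, which is how a modest raw p-value can be pushed well above any usual cutoff.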

UNIPROT-ACCESSION  GENE NAME                                Related Genes  Species
A8TX70             collagen type VI alpha 5 chain (COL6A5)  RG             Homo sapiens
P21333             filamin A (FLNA)                         RG             Homo sapiens

I downloaded:

  • BP_DIRECT
  • BP_FAT
  • BP_ALL
  • UP_KEYWORDS
  • the functional clustering chart for BP_FAT, BP_ALL, and BP_DIRECT

BP_DIRECT had the fewest enriched processes. They all fit in one screenshot, unlike the other charts, which can only be viewed accurately after downloading them:

BP-direct

BP_DIRECT contains the annotations taken directly from the source (which I believe would be UniProt), without any parent terms included.

The number of enriched processes has increased quite a bit since I added the zero-abundance proteins to even out the protein lists between silos after cluster analysis. A heat map of the processes doesn’t seem like it would visualize the data correctly or easily, so I’m going to look at what other visualization tools exist downstream of gene enrichment analysis and whether any are feasible to try with my data.

Kaitlyn’s notebook: Uniprot codes for enrichment

https://d.pr/n/yqFrFu

[sr320@mox1 jobs]$ cat 1024_1200.sh 
#!/bin/bash
## Job Name
#SBATCH --job-name=oakl
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=00-100:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/1024




source /gscratch/srlab/programs/scripts/paths.sh


find /gscratch/srlab/sr320/data/oakl/*_1.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /gscratch/srlab/programs/Bismark-0.19.0/bismark \
--path_to_bowtie /gscratch/srlab/programs/bowtie2-2.1.0 \
--score_min L,0,-1.2 \
-genome /gscratch/srlab/sr320/data/Cvirg-genome \
-p 28 \
-1 /gscratch/srlab/sr320/data/oakl/{}_s1_R1_val_1.fq.gz \
-2 /gscratch/srlab/sr320/data/oakl/{}_s1_R2_val_2.fq.gz \

/gscratch/srlab/programs/Bismark-0.19.0/deduplicate_bismark \
--bam -p \
/gscratch/srlab/sr320/analyses/1024/*.bam


/gscratch/srlab/programs/Bismark-0.19.0/bismark_methylation_extractor \
--bedGraph --counts --scaffolds \
--multicore 14 \
/gscratch/srlab/sr320/analyses/1024/*deduplicated.bam



# Bismark processing report

/gscratch/srlab/programs/Bismark-0.19.0/bismark2report

#Bismark summary report

/gscratch/srlab/programs/Bismark-0.19.0/bismark2summary



# Sort files for methylkit and IGV

find /gscratch/srlab/sr320/analyses/1024/*deduplicated.bam | \
xargs basename -s .bam | \
xargs -I{} /gscratch/srlab/programs/samtools-1.9/samtools \
sort --threads 28 /gscratch/srlab/sr320/analyses/1024/{}.bam \
-o /gscratch/srlab/sr320/analyses/1024/{}.sorted.bam

# Index sorted files for IGV
# The "-@ 28" below specifies the number of CPU threads to use.

find /gscratch/srlab/sr320/analyses/1024/*.sorted.bam | \
xargs basename -s .sorted.bam | \
xargs -I{} /gscratch/srlab/programs/samtools-1.9/samtools \
index -@ 28 /gscratch/srlab/sr320/analyses/1024/{}.sorted.bam

#bismark, #sbatch

Kaitlyn’s notebook: adding in undetected proteins and heatmaps

Goals

My current cluster analysis eliminates proteins that were never detected, because I combined the data sets from silo 3 and silo 9, and each of those contained only the proteins detected in its own silo. This means that when the cluster analysis is finished, I have a different number of proteins for each silo. I want to end up with the same number of proteins per silo after clustering, so I will need to edit the original raw data containing all of the silos rather than working from the separate silo data sets.
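The idea of evening out the protein lists can be sketched in Python as follows. This is only an illustration under assumptions: the silo names, protein IDs, single abundance value per protein, and the function name `fill_undetected` are hypothetical, and the real data hold abundances over several time points.

```python
def fill_undetected(abundance_by_silo, fill_value=0.0):
    """Give every silo the same protein list, using fill_value where a protein was not detected."""
    # Union of all proteins seen in any silo.
    all_proteins = set()
    for detected in abundance_by_silo.values():
        all_proteins.update(detected)
    # Rebuild each silo's table over the full protein list.
    return {
        silo: {protein: detected.get(protein, fill_value) for protein in sorted(all_proteins)}
        for silo, detected in abundance_by_silo.items()
    }


# Hypothetical abundances: P1 was only detected in silo 3, P3 only in silo 9.
example = {
    "silo3": {"P1": 2.0, "P2": 1.0},
    "silo9": {"P2": 3.0, "P3": 4.0},
}
filled = fill_undetected(example)
print(filled["silo3"])
print(filled["silo9"])
```

After filling, both silos list the same three proteins, so downstream clustering sees tables of equal size.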

After I do this, I will rerun the cluster analysis to get a new list of ‘unique’ proteins (unique proteins are defined as those that were in separate cluster groups based on temperature, i.e., silo). This final unique-proteins dataframe will be used for gene enrichment and to create heatmaps. One possible heatmap would show parent terms, based on the abundance of the proteins whose genes are annotated to each parent term.
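The definition of a ‘unique’ protein can be written down as a small Python sketch. It assumes the cluster labels for both silos come from one joint clustering run, so that the label numbers are comparable; the protein IDs, labels, and function name are hypothetical.

```python
def proteins_with_different_clusters(clusters_a, clusters_b):
    """Return proteins present in both mappings whose cluster label differs."""
    shared = clusters_a.keys() & clusters_b.keys()
    return sorted(p for p in shared if clusters_a[p] != clusters_b[p])


# Hypothetical cluster assignments (protein -> cluster number) per silo.
silo3_clusters = {"P1": 1, "P2": 2, "P3": 3}
silo9_clusters = {"P1": 1, "P2": 4, "P3": 1}
print(proteins_with_different_clusters(silo3_clusters, silo9_clusters))
```

Only proteins present in both tables are compared, which is one reason the zero-abundance proteins need to be in the data before clustering.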

Heatmaps

I got my computer back so I made sure that all my files are up to date on all systems. I heavily modified my cluster code to create a more accurate unique-proteins dataframe since previously it had redundant and incorrect columns mixed in with correct columns. Scales for heatmaps are normalized abundance values.

heatmap-allclus

All proteins from hierarchical clustering analysis with both proteins and time clustered.

heatmap-dayclus

All proteins from hierarchical clustering analysis with time clustered.

heatmap-protclus

All proteins from hierarchical clustering analysis with proteins clustered.

heatmap-silo3

Protein abundance over time with proteins clustered based on a Euclidean distance matrix. Proteins were chosen because their cluster assignments differed between Silo 3 and Silo 9 when hierarchical clustering was performed with all proteins from Silo 3 and Silo 9.

heatmap-silo9

Protein abundance over time with proteins clustered based on a Euclidean distance matrix. Proteins were chosen because their cluster assignments differed between Silo 3 and Silo 9 when hierarchical clustering was performed with all proteins from Silo 3 and Silo 9.
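For reference, a Euclidean distance matrix between protein abundance profiles can be sketched in plain Python like this. The profiles and the function name are hypothetical, and `math.dist` needs Python 3.8 or newer. This only shows what the matrix contains; it is not the clustering code used for the heatmaps.

```python
import math


def euclidean_distance_matrix(profiles):
    """Return a nested dict of pairwise Euclidean distances between profiles."""
    names = list(profiles)
    return {
        a: {b: math.dist(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Hypothetical normalized abundances at two time points per protein.
example_profiles = {
    "P1": [0.0, 0.0],
    "P2": [3.0, 4.0],
    "P3": [0.0, 1.0],
}
distances = euclidean_distance_matrix(example_profiles)
print(distances["P1"]["P2"])
```

Proteins with similar abundance over time get small distances, and hierarchical clustering then merges the closest profiles first.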

MetaboAnalyst Heatmap

heatmap2_0-unqprot.png

Note that in this heatmap, the data has been:

  • filtered
    • mSet<-ReplaceMin(mSet)
  • normalized (mean centered)
    • mSet<-Normalization(mSet, "NULL", "NULL", "MeanCenter", ratio=FALSE, ratioNum=20)  
    • mSet<-PlotNormSummary(mSet, "norm_0_", "png", 72, width=NA)
    • mSet<-PlotSampleNormSummary(mSet, "snorm_0_", "png", 72, width=NA)
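To make the mean-centering step concrete, here is a minimal Python sketch. It assumes that centering is done per feature (each protein's mean across samples is subtracted from its values), which is my understanding of the "MeanCenter" option. The table layout, protein IDs, and function name are hypothetical, and this is not MetaboAnalyst's own code.

```python
from statistics import mean


def mean_center(table):
    """Subtract each feature's mean from its values (table: feature -> list of values)."""
    centered = {}
    for feature, values in table.items():
        feature_mean = mean(values)
        centered[feature] = [v - feature_mean for v in values]
    return centered


# Hypothetical abundances for two proteins across three samples.
example_table = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [10.0, 10.0, 40.0],
}
print(mean_center(example_table))
```

After centering, every protein has a mean of zero, so the heatmap colors show deviation from each protein's own average rather than absolute abundance.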

 

 

If needed later: displaying two heatmaps side by side.

Ronit’s Notebook: Candidate Genes for qPCR

I’m interested in doing a hypoosmotic stress exposure after finishing up the qPCR for the desiccation + elevated temp. samples. However, if we’re going to compare gene expression between desiccation and hypoosmotic stress samples, it’s important that some genes linked to hypoosmotic stress are examined in the desiccation samples as well to provide a basis for comparison. Here are a few classes of genes that I think might be relevant to examine:

  • Ion and amino acid channels 
    • LTrpC-8: mediates permeation for cations such as sodium, potassium, calcium
    • KCTD1: cysteine-rich protein, binds to KV channels
  • Immune response
    • CARM1: Transfer of methyl groups to histone 3 for chromatin remodeling
    • H2AV: One of the 5 main histone proteins involved in the structure of chromatin
  • Apoptosis genes 
  • Calcium binding genes      

Laura’s Notebook: O. lurida fastq trim testing

Today I downloaded RNASeq data – four fastq files – from Olympia oyster pooled gonad. The gonad was from Fidalgo Bay and Oyster Bay oysters following a 2017 low pH exposure. I unzipped the files, then tested a couple methods of trimming and plotting quality scores for trimmed/untrimmed files.

Jupyter notebook to download/trim files: Inspecting fastq files.ipynb

RMarkdown notebook to run a program to extract and plot quality scores against bp for trimmed/untrimmed files: RNASeq-screening.md
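As an illustration of what plotting quality scores against bp involves, the sketch below computes the mean Phred quality at each read position. It assumes standard four-line FASTQ records and Phred+33 encoding; the function name and the tiny example reads are made up, and this is not the program used in the notebooks above.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Return the mean Phred quality at each base position across all reads."""
    totals = []
    counts = []
    for i, line in enumerate(fastq_lines):
        # The quality string is the fourth line of every four-line record.
        if i % 4 != 3:
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two made-up reads; "I" encodes quality 40 and "5" encodes quality 20.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "5I"]
print(mean_quality_by_position(example))
```

Running the same function on the trimmed and untrimmed files and plotting the two lists against position gives a quick before-and-after comparison.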

from The Shell Game https://ift.tt/2CYU8Ty
via IFTTT

Grace’s Notebook: Worked more on R script for adding Qubit data; Started using Mox

Today I worked more on my R script for adding new Qubit files. Everything works until the files are actually joined: after joining, there are extra columns with the suffixes “.x” and “.y”. I think it has something to do with the fact that some columns are factors, some are characters, and some are numeric. I also started using Mox today, but I am unsure how to upload the .fastq files from the C. bairdi transcriptome data. I am waiting to hear back on that in a GitHub issue.

R Script

Script here

Issue with the final joining.
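One common cause of “.x” and “.y” columns is that both tables contain a column with the same name that is not one of the join keys, so the join keeps both copies and renames them. The plain-Python sketch below mimics that behavior to show the mechanism. The column names `sample` and `conc` are hypothetical, and whether this is what happens in the Qubit script is an assumption.

```python
def inner_join(left_rows, right_rows, keys, suffixes=(".x", ".y")):
    """Join two lists of dicts on `keys`, suffixing non-key columns found in both."""
    left_cols = set().union(*(row.keys() for row in left_rows))
    right_cols = set().union(*(row.keys() for row in right_rows))
    clashing = (left_cols & right_cols) - set(keys)
    joined = []
    for left in left_rows:
        for right in right_rows:
            if all(left[k] == right[k] for k in keys):
                row = {k: left[k] for k in keys}
                for col, value in left.items():
                    if col not in keys:
                        row[col + suffixes[0] if col in clashing else col] = value
                for col, value in right.items():
                    if col not in keys:
                        row[col + suffixes[1] if col in clashing else col] = value
                joined.append(row)
    return joined


# Hypothetical Qubit tables that share a non-key column called "conc".
old_qubit = [{"sample": "A1", "conc": 12.5}]
new_qubit = [{"sample": "A1", "conc": 13.1}]
print(inner_join(old_qubit, new_qubit, keys=["sample"]))
```

If the new Qubit files hold additional samples with the same columns, appending rows rather than joining may be what is wanted, but that depends on how the files are laid out.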

Mox

Steven showed me some examples of how to run Trinity on Mox.

## Job Name
#SBATCH --job-name=trinity
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-100:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/0624b

source /gscratch/srlab/programs/scripts/paths.sh

/gscratch/srlab/programs/trinity/Trinity \
--seqType fq \
--max_memory 100G \
--left /gscratch/srlab/sr320/data/geoduck-RNA-seq/NR012_S1_L001_R1_001.fastq,\
/gscratch/srlab/sr320/data/geoduck-RNA-seq/NR012_S1_L002_R1_001.fastq \
--right /gscratch/srlab/sr320/data/geoduck-RNA-seq/NR012_S1_L001_R2_001.fastq,\
/gscratch/srlab/sr320/data/geoduck-RNA-seq/NR012_S1_L002_R2_001.fastq \
--trimmomatic \
--CPU 28

hyak_mox Wiki

To upload files:
File transfers

ssh: connect to host 205.175.107.122 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(605) [Receiver=3.0.9]
[graceac9@mox1 ~]$

from Grace’s Lab Notebook https://ift.tt/2OGAEtf
via IFTTT