Grace’s Notebook: Re-run Trinity with correct files

Today I re-ran Trinity because the files I had originally downloaded to my computer from nightingales were too small (truncated) and were not .fastq.gz. Sam showed me how to use rsync to copy files from nightingales to Mox. I am also working on running BLAST on the Trinity.fasta from the too-small assembly, so that I’ll have a pipeline set up once my real assembled transcriptome is ready.


GitHub Issue #452

When I downloaded the files from /nightingales/C_bairdi/, my computer wouldn’t download the whole file and it would show up like this in my downloads:

It’s unclear exactly how this happened, but it was probably something in my settings.

I learned that I should stick with rsync to move files from Owl to Mox, and vice versa.
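Truncated or uncompressed downloads like these can be caught before an assembly is submitted. Below is a minimal sketch, assuming bash, `gzip`, and `wc` are available. The function name `check_fastq_gz` and the 1 MiB default threshold (overridable with `MIN_BYTES`) are hypothetical choices for illustration, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag FASTQ files that are not valid gzip archives
# (including truncated ones) or are suspiciously small.
# The 1 MiB default threshold is an arbitrary, illustrative assumption.
check_fastq_gz() {
  local min_bytes=${MIN_BYTES:-1048576}
  local status=0
  local f size
  for f in "$@"; do
    # Byte count; strip the padding some wc implementations add
    size=$(wc -c < "$f" | tr -d ' ')
    if ! gzip -t "$f" 2>/dev/null; then
      # gzip -t fails on plain-text files and on truncated archives
      echo "NOT_GZIP  $f"
      status=1
    elif [ "$size" -lt "$min_bytes" ]; then
      echo "TOO_SMALL $f ($size bytes)"
      status=1
    else
      echo "OK        $f"
    fi
  done
  return "$status"
}
```

For example, `check_fastq_gz /gscratch/srlab/graceac9/data/*.fastq.gz` would print one status line per file and return non-zero if any file looks wrong (the path is an assumption).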

I updated the file names in my and re-sent the job on Mox.
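Below is a minimal sketch of what a corrected Trinity job script could look like, written from the command line with a heredoc instead of nano. Only the `#SBATCH` job/account/partition/nodes lines mirror the header shown in the first-submission post. Everything else is an assumption: the script and job names, walltime, memory, the `.fastq.gz` input names (the earlier `.fastq` names plus `.gz`), and having `Trinity` on `PATH`. `--trimmomatic` is included because the earlier output folder contains `trimmomatic.ok` and `.qtrim` files. A fresh `--output` directory is used so that checkpoint files (`*.ok`) from the 64 KB run are not picked up.

```shell
# Hypothetical script name; the real script name is not shown in the post.
script=Cbairdi_trinity_rerun.sh

cat > "$script" <<'EOF'
#!/bin/bash
## Job Name
#SBATCH --job-name=Cbairdi_trinity_rerun
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL

# Assumed input location (data/ exists under graceac9 on Mox)
data=/gscratch/srlab/graceac9/data

# Fresh output directory so checkpoints from the truncated-input run are not reused
Trinity \
  --seqType fq \
  --max_memory 100G \
  --CPU 28 \
  --trimmomatic \
  --output /gscratch/srlab/graceac9/analyses/1024/trinity_out_dir_02 \
  --left  "${data}/304428_S1_L001_R1_001.fastq.gz,${data}/304428_S1_L002_R1_001.fastq.gz" \
  --right "${data}/304428_S1_L001_R2_001.fastq.gz,${data}/304428_S1_L002_R2_001.fastq.gz"
EOF
```

The script would then be submitted the same way as before, e.g. `sbatch -p srlab -A srlab Cbairdi_trinity_rerun.sh`.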

BLAST practice with Trinity.fasta

Here is the output from the weekend’s Trinity assembly, which finished far too quickly because it used the truncated 64 KB .fastq files:

[graceac9@mox1 ~]$ cd /gscratch/srlab/graceac9/analyses/1024/trinity_out_dir/
[graceac9@mox1 trinity_out_dir]$ ls
304428_S1_L001_R1_001.fastq.P.qtrim.gz    inchworm.kmer_count
304428_S1_L001_R1_001.fastq.PwU.qtrim.fq  insilico_read_normalization
304428_S1_L001_R1_001.fastq.U.qtrim.gz    jellyfish.kmers.fa
304428_S1_L001_R2_001.fastq.P.qtrim.gz    jellyfish.kmers.fa.histo
304428_S1_L001_R2_001.fastq.PwU.qtrim.fq  left.fa.ok
304428_S1_L001_R2_001.fastq.U.qtrim.gz    partitioned_reads.files.list
304428_S1_L002_R1_001.fastq.P.qtrim.gz    partitioned_reads.files.list.ok
304428_S1_L002_R1_001.fastq.PwU.qtrim.fq  pipeliner.32691.cmds
304428_S1_L002_R1_001.fastq.U.qtrim.gz    read_partitions
304428_S1_L002_R2_001.fastq.P.qtrim.gz    recursive_trinity.cmds
304428_S1_L002_R2_001.fastq.PwU.qtrim.fq  recursive_trinity.cmds.completed
304428_S1_L002_R2_001.fastq.U.qtrim.gz    recursive_trinity.cmds.ok
both.fa                                   right.fa.ok
both.fa.ok                                scaffolding_entries.sam
both.fa.read_count                        trimmomatic.ok
chrysalis                                 Trinity.fasta
inchworm.K25.L25.DS.fa                    Trinity.timing
inchworm.K25.L25.DS.fa.finished

I used rsync to transfer the Trinity.fasta from Mox to my owl folder.
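A quick size check on a transferred Trinity.fasta makes a too-small assembly obvious before any BLAST time is spent. Below is a minimal sketch in awk. `fasta_stats` is a hypothetical helper name, not part of Trinity. Trinity also ships `util/TrinityStats.pl` for fuller assembly statistics.

```shell
# Hypothetical helper: count sequences and total/mean length in a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }                          # header line: one more sequence
    { gsub(/[ \t\r]/, ""); len += length($0) }  # sequence line: add its residues
    END {
      mean = (n > 0) ? len / n : 0
      printf "sequences=%d total_bp=%d mean_bp=%.1f\n", n, len, mean
    }
  ' "$1"
}
```

Usage: `fasta_stats Trinity.fasta` prints a single summary line.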


Then, I started running this BLAST notebook: 20181025-blast-Cbairdi_swiss-prot.ipynb

I also have a .sh script for BLAST that I’m working on. I will run both the notebook and the script with the Trinity.fasta (query.fa).
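Once either version produces tabular output, a common next step is to keep only the top hit per contig. Below is a minimal sketch, assuming the results were written with `-outfmt 6`, whose default 12 columns put `qseqid` in column 1 and `bitscore` in column 12. The function name `best_hits` is hypothetical.

```shell
# Hypothetical helper: keep the highest-bitscore hit for each query
# from BLAST tabular (-outfmt 6) output.
best_hits() {
  # Sort by query ID, then by bitscore (column 12) from highest to lowest
  sort -t "$(printf '\t')" -k1,1 -k12,12gr "$1" \
  | awk -F'\t' '!seen[$1]++'   # print only the first line seen per query
}
```

Usage: `best_hits blastx_results.tab > best_hits.tab`. Both file names are placeholders.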

from Grace’s Lab Notebook

Laura’s Notebook: Denovo Oly gonad transcriptome assembly (attempt 1)

Today I got comfortable using the Mox (Hyak) supercomputer, created my directories, and queued a transcriptome assembly using Trinity.

For details, please see my notebook:

from The Shell Game

Kaitlyn’s notebook: reviewing material for possible next steps or gene enrichment visualization

Now that I have a list of terms that show some significance, I want to figure out how to visualize the data in an informative way. I’m doing some research on gene enrichment visualization tools. Here are some that I’ve come across:

  • REVIGO (REVIGO-results, REVIGO-treemap)
    • The only two processes that weren’t significant were:
      • negative regulation of biological process
      • regulation of anatomical structure size
    • I also saved the results in the table REVIGO produces.
  • WebGivi (webgivi-test)
    • This is a quick example of a visualization that WebGivi does. It might be interesting to reorganize the data so that proteins are drawn to parent terms (rather than GO IDs being drawn to GO terms).
  • InterMineR (GitHub and Bioconductor)
    • An R package for both enrichment and visualization.
  • PANTHER and GOrilla seem to be meant for model organisms; otherwise you need protein sequences to analyze against their databases.
  • This website has a list of gene enrichment tools that I want to go through. I’ve already come across and mentioned some of them, but there are quite a few on there.
  • Some good info on gene enrichment interpretation and presentation.

Additionally, I’m reviewing literature on how gene enrichment has been visualized before, and if there are other methods that might be suitable for my data set.

Yaamini’s Notebook: DML Analysis Part 15

Preliminary bedtools analysis

In this R Markdown file, I wrote code to create BED files for the DMLs and DMRs I identified.

My next step was to characterize the location of DMLs and DMRs. I did this by revising code in an existing Jupyter notebook. I counted the number of DMLs and DMRs, as well as the number of overlaps between each of those features and exons, introns, and mRNA coding regions. I exported .csv files with the intersections of these features to this folder. One thing I noticed was that there were equal numbers of hypermethylated and hypomethylated DMLs. These DMLs were primarily in mRNA coding regions, and more were in exons than in introns. However, there were significantly more hypomethylated DMRs than hypermethylated DMRs, and more DMRs were in introns than in exons. This could hint at a role in alternative splicing.
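The core of this overlap counting is what `bedtools intersect -u -a DMLs.bed -b exons.bed | wc -l` reports: the number of DMLs overlapping at least one exon. Below is a minimal, dependency-free sketch of the same idea in awk, assuming plain tab-separated BED files. It is not the code from the Jupyter notebook, and the function and file names are hypothetical. It compares every query against every feature, which is fine for a few thousand DMLs but slow for genome-wide feature sets, where bedtools is the better tool.

```shell
# Hypothetical helper: count intervals in BED file A ($1) that overlap at
# least one interval in BED file B ($2). BED is 0-based, half-open, so
# [s1,e1) and [s2,e2) overlap when s1 < e2 && s2 < e1.
count_overlaps() {
  awk -F'\t' '
    FILENAME == ARGV[1] {            # first file read: features (B)
      n++; chr[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; next
    }
    {                                # second file read: queries (A)
      s = $2 + 0; e = $3 + 0
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && s < stop[i] && start[i] < e) { hits++; break }
    }
    END { print hits + 0 }
  ' "$2" "$1"
}
```

Usage: `count_overlaps DMLs.bed exons.bed` prints a single count.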

For my presentation to the Lotterhos lab, I also cherry-picked some genes that were differentially methylated. Cilia and flagella genes, fatty acid desaturation genes, and multidrug resistance protein genes were hypermethylated, while heat shock protein and MAP kinase genes were hypomethylated.

One thing that came up while Steven and I were looking at my results is that Steven is getting far fewer DMLs and DMRs than I am. We quickly walked through our methylKit code and found no discrepancies. This is something we will need to sort out!

Going forward

  1. I will update the C. virginica gonad methylation paper with reproducible methods and preliminary results. Steven will use this information to reproduce my work and hopefully figure out why we’re getting different results.
  2. I need to figure out how to do a proper gene enrichment and flanking analysis.
  3. Get scripts ready for sample analysis for when my Mox job finishes running.


from the responsible grad student

Grace’s Notebook: Sent First Trinity Assembly Job to Mox!

Today, with Steven’s guidance, I sent my first job to Mox! I’m not sure it will work, but I set it up to send me an email notification when the job finishes. Below I provide the link to the .sh script that runs Trinity, as well as the command-line code I used to submit it on Mox. I’m also starting to build a BLAST pipeline, and I provide the link to what I’m working on. Right now it’s in a Jupyter notebook, but Steven said that BLAST can also be run on Mox, so I’ll work on turning it into a .sh script and create GitHub issues so Steven and Sam can check it.

Trinity on Mox

Here’s what I put in the commandline to make it happen:

/gscratch/srlab/graceac9/data
[graceac9@mox1 data]$ ls
304428_S1_L001_R1_001.fastq  304428_S1_L002_R1_001.fastq
304428_S1_L001_R2_001.fastq  304428_S1_L002_R2_001.fastq
[graceac9@mox1 data]$ cd ../
[graceac9@mox1 graceac9]$ ls
analyses  blastdb  data  jobs
[graceac9@mox1 graceac9]$ cd jobs
[graceac9@mox1 jobs]$ pwd
/gscratch/srlab/graceac9/jobs
[graceac9@mox1 jobs]$ touch
[graceac9@mox1 jobs]$ ls
[graceac9@mox1 jobs]$ nano
[graceac9@mox1 jobs]$ head
#!/bin/bash
## Job Name
#SBATCH --job-name=20181024_Cbairdi_trinity_01
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
[graceac9@mox1 jobs]$ sbatch -p srlab -A ^C
[graceac9@mox1 jobs]$ sbatch -p srlab -A srlab
Submitted batch job 401714
[graceac9@mox1 jobs]$ squeue | grep srlab
401714 srlab 20181024 graceac9 PD 0:00 1 (QOSGrpCpuLimit)
401715 srlab BismarkA strigg PD 0:00 1 (QOSGrpCpuLimit)
401682 srlab 20181012 yaaminiv PD 0:00 1 (QOSGrpCpuLimit)
359332 srlab Ae_canu jldimond R 13-00:30:22 1 n2211
394229 srlab BismarkA strigg R 21:40:22 1 n2203
[graceac9@mox1 jobs]$

GitHub Issue #452 outlines some uncertainties Sam has on whether the script will work, but I’ll see what happens when the job finishes (if it finishes).

Creating BLAST Pipeline

Currently, I have this .ipynb notebook that is based on other BLAST projects I’ve done: 0181025-blast-Cbairdi_swiss-prot.ipynb

In class, Steven said that BLAST can be run on Mox as well, so I’m thinking I can make the BLAST script into a .sh. Perhaps something like this…

I don’t think either of those scripts would work in their current states, obviously. I’m going to work through them a little more and then make an Issue in GitHub to see what needs to be changed/added for either or both to work.
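Below is a minimal sketch of a BLAST job script for Mox, written with a heredoc and reusing the same `#SBATCH` header layout as the Trinity job. Several parts are assumptions: the script and job names, walltime, memory, output path, having `blastx` on `PATH`, and a Swiss-Prot protein database already built with `makeblastdb` under `blastdb/` as `uniprot_sprot`. The query path matches where Trinity.fasta was written in the first run. The `blastx` options used (`-query`, `-db`, `-out`, `-evalue`, `-outfmt`, `-num_threads`) are standard BLAST+ options.

```shell
# Hypothetical script name
script=Cbairdi_blastx_sprot.sh

cat > "$script" <<'EOF'
#!/bin/bash
## Job Name
#SBATCH --job-name=Cbairdi_blastx_sprot
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=1-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL

base=/gscratch/srlab/graceac9

# Translated nucleotide query (Trinity contigs) against a protein database;
# the database name "uniprot_sprot" is an assumption.
blastx \
  -query "${base}/analyses/1024/trinity_out_dir/Trinity.fasta" \
  -db "${base}/blastdb/uniprot_sprot" \
  -out "${base}/analyses/1024/Cbairdi_blastx_sprot.tab" \
  -evalue 1E-05 \
  -outfmt 6 \
  -num_threads 28
EOF
```

Submission would mirror the Trinity job: `sbatch -p srlab -A srlab Cbairdi_blastx_sprot.sh`.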

from Grace’s Lab Notebook

Laura’s Notebook: Survival data from Olympia oyster feeding/temp trial

Remember when I ran the Olympia oyster broodstock overwintering project, in which I held oysters at 2 temperatures (7°C, 10°C) and 2 feeding regimes (low, high) for 3 months? No? Check out these notebook entries: Experimental design post, and Broodstock fecundity post

I grew 12 “families” per treatment separately in mini culture tanks (~800 mL), with 3 replicates per family, for a total of 144 silos. I stocked ~800 larvae per tank (according to my error checking, I likely stocked 824 ± 54 larvae, a ~3% error). I collected larvae from March 31st to April 19th. At day 50 for each collection group, I counted the number of live post-set.
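The silo count and stocking error above work out as follows, with 4 treatments = 2 temperatures × 2 feeding levels:

```latex
\[ 12\ \text{families} \times 4\ \text{treatments} \times 3\ \text{replicates} = 144\ \text{silos} \]
\[ \frac{824 - 800}{800} = 0.03 \approx 3\% \]
```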

NOTE: I saved all of these oysters in their respective groups. Depending on their survival over the summer without tending, I have up to 48 separate families of Olympia oysters from a minimum of 16 males/females (16 separate spawning buckets). Seems like someone should do something with them!

The following are a series of % survival plots, color-coded by treatment. They show the data in two ways: by treatment alone, then by treatment over time. I noticed that some of the larval groups released later in the experiment had higher survival.

COLD-HIGH = Cold winter temp (7C), high food concentration (cell/mL TBD)

COLD-LOW = Cold temp (7C), low food

WARM-HIGH = Warm temp (10C), high food

WARM-LOW = Warm temp (10C), low food

I ran binomial generalized linear models comparing treatments and time, and found no significant differences between temperatures or between feeding levels. There was a marginal trend (p < 0.1) towards higher survival over time, but it was not significant. Interesting that the winter feeding level had no effect!

Ronit’s Notebook: RNA Extraction for C. gigas Desiccation + Elevated Temperature Samples (Round 2)

I decided to run another RNA isolation today with 8 more samples: D03, D04, D13, D14, T03, T04, T13, T14

Protocol was as follows:

  1. 500 µL of RNAzol RT was added to a clean tube.
  2. Tissue samples were removed and a small section was cut out for RNA extraction.
  3. Tissue portions were placed in the tube and an additional 500 µL of RNAzol RT was added to bring the volume up to 1mL.
  4. The samples were vortexed vigorously for 10 seconds.
  5. Samples were incubated at room temperature for 5 minutes.
  6. 400 µL of DEPC-water was added to the samples.
  7. Samples were centrifuged for 15 minutes at 12,000 g.
  8. 750 µL of the supernatant was transferred to a new, clean tube and an equal volume of isopropanol was added to the sample.
  9. The samples were vortexed vigorously for 10 seconds.
  10. Samples were incubated at room temperature for 5 minutes.
  11. Samples were centrifuged for 15 minutes at 12,000 g.

Due to time constraints, I stored the RNA pellet, still suspended in isopropanol, in the -80 freezer. I will finish the extraction and quantify the RNA using the Qubit later.

Kaitlyn’s notebook: Gene enrichment

Using DAVID, 14 of the 28 IDs could be mapped.

Focal adhesion was enriched in the KEGG-PATHWAY category with a p-value of 5.4E-2 (Benjamini-adjusted p-value 6.1E-1). The proteins from my list in this pathway were:

A8TX70 collagen type VI alpha 5 chain(COL6A5) RG Homo sapiens
P21333 filamin A(FLNA) RG Homo sapiens

I downloaded:

  • BP_FAT
  • BP_ALL
  • the functional clustering chart for BP_FAT, BP_ALL, and BP_DIRECT

BP_DIRECT had the fewest enriched processes (they all fit in one screenshot, unlike the others, which could only be viewed properly after downloading):


BP_DIRECT contains the annotations taken directly from the source (which I believe would be UniProt), without any parent terms included.

The number of enriched processes has increased quite a bit since I added the zero-abundance proteins to even out the protein list between silos after cluster analysis. A heat map of the processes doesn’t seem like it would visualize the data correctly or easily. I’m going to look at what other visualization tools exist downstream of gene enrichment analysis, and whether any are feasible to try with my data.

Kaitlyn’s notebook: Uniprot codes for enrichment


[sr320@mox1 jobs]$ cat 
## Job Name
#SBATCH --job-name=oakl
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=00-100:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/1024

source /gscratch/srlab/programs/scripts/

find /gscratch/srlab/sr320/data/oakl/*_1.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /gscratch/srlab/programs/Bismark-0.19.0/bismark \
--path_to_bowtie /gscratch/srlab/programs/bowtie2-2.1.0 \
--score_min L,0,-1.2 \
-genome /gscratch/srlab/sr320/data/Cvirg-genome \
-p 28 \
-1 /gscratch/srlab/sr320/data/oakl/{}_s1_R1_val_1.fq.gz \
-2 /gscratch/srlab/sr320/data/oakl/{}_s1_R2_val_2.fq.gz \

/gscratch/srlab/programs/Bismark-0.19.0/deduplicate_bismark \
--bam -p \

/gscratch/srlab/programs/Bismark-0.19.0/bismark_methylation_extractor \
--bedGraph --counts --scaffolds \
--multicore 14 \

# Bismark processing report


#Bismark summary report


# Sort files for methylkit and IGV

find /gscratch/srlab/sr320/analyses/1024/*deduplicated.bam | \
xargs basename -s .bam | \
xargs -I{} /gscratch/srlab/programs/samtools-1.9/samtools \
sort --threads 28 /gscratch/srlab/sr320/analyses/1024/{}.bam \
-o /gscratch/srlab/sr320/analyses/1024/{}.sorted.bam

# Index sorted files for IGV
# The "-@ 28" below specifies the number of CPU threads to use.

find /gscratch/srlab/sr320/analyses/1024/*.sorted.bam | \
xargs basename -s .sorted.bam | \
xargs -I{} /gscratch/srlab/programs/samtools-1.9/samtools \
index -@ 28 /gscratch/srlab/sr320/analyses/1024/{}.sorted.bam

#bismark, #sbatch
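The `find | xargs basename -s ... | xargs -I{}` idiom in the Bismark step derives one sample prefix per R1 file and substitutes it into both read paths. Below is a dry-run sketch that shows what it expands to, using empty stand-in files and `printf` in place of `bismark`. The sample names `sampleA` and `sampleB` are made up for illustration.

```shell
# Empty stand-in files following the oakl read naming pattern (names are made up)
demo=$(mktemp -d)
touch "$demo"/sampleA_s1_R1_val_1.fq.gz "$demo"/sampleA_s1_R2_val_2.fq.gz \
      "$demo"/sampleB_s1_R1_val_1.fq.gz "$demo"/sampleB_s1_R2_val_2.fq.gz

# Same idiom as the Bismark step: *_1.fq.gz matches only the R1 files,
# basename -s strips the directory and suffix to leave the sample prefix,
# and xargs -I{} substitutes that prefix into both read paths.
find "$demo"/*_1.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz \
| xargs -I{} printf 'bismark -1 %s -2 %s\n' "$demo/{}_s1_R1_val_1.fq.gz" "$demo/{}_s1_R2_val_2.fq.gz"
```

Each printed line corresponds to one `bismark` call the real script would make.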