Yaamini’s Notebook: DML Analysis Part 5

Blasting for Uniprot codes

While I was gone, my blastx finished running!


Now that I have these Uniprot codes, I can theoretically get Entrez Gene IDs. I also emailed Mike Riffle, who made the gene enrichment program for Emma’s geoduck paper and Rhonda’s data. Hopefully, he can make one for C. virginica that I can use instead of DAVID and topGO. In the meantime, Steven suggested I work on the gonad methylation paper, so that’s what I’ll do.


from the responsible grad student https://ift.tt/2mDeGXJ

Sam’s Notebook: RNA Cleanup – Tanner Crab RNA Pools


Grace had previously pooled a set of crab RNA in preparation for RNAseq. Yesterday, we/she concentrated the samples and then quantified them. Unfortunately, Qubit results were not good (concentrations were far below the expected 20ng/uL) and the NanoDrop1000 results yielded awful looking curves.

In an attempt to figure out what was wrong, I decided to use the RNeasy Plus Mini Kit (Qiagen) on the three pools. I did this due to the poor spec curves seen in the NanoDrop1000 measurements. Additionally, all of the RNA pools had undissolved/insoluble bits floating around in them. My thinking was that excess contaminants/salts could be interfering with the Qubit assay. Removing these could/should enlighten us as to what the issue might be.

Followed the manufacturer’s protocol for the RNeasy MinElute Cleanup Kit (as the RNeasy Plus Mini Kit uses the same reagents/columns for RNA purification) for samples with volumes <100uL.

Samples were quantified on the RobertsLab NanoDrop1000 (ThermoFisher) and the Qubit 3.0 (ThermoFisher) using the RNA high sensitivity (HS) Kit. Used 1uL of each sample.


Qubit (Google Sheet): 20180719_qubit_RNA_crab_pools



The NanoDrop did not detect any RNA in the samples.

The Qubit did not detect any RNA in Crab Pool 1. The other two samples had similar concentrations (~7ng/uL). This would mean a total of ~84ng of RNA was present in each of those two samples.

All pools were expected to have well over 1000ng of RNA.
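
The arithmetic behind these numbers can be sketched as a quick check. The 12uL volume is an assumption inferred from the figures above (84ng divided by 7ng/uL), not a value stated in this post, and the function name is made up for illustration.

```shell
#!/bin/sh
# total_ng CONC_NG_PER_UL VOLUME_UL -> total RNA mass in ng
total_ng() {
  awk -v c="$1" -v v="$2" 'BEGIN { printf "%.0f\n", c * v }'
}

# Assumed inputs: ~7 ng/uL from the Qubit, 12 uL inferred from 84 / 7.
pool_total=$(total_ng 7 12)
expected_min=1000   # "well over 1000ng" was expected per pool

echo "Total RNA: ${pool_total} ng (expected more than ${expected_min} ng)"
awk -v t="$pool_total" -v e="$expected_min" \
  'BEGIN { printf "Fraction of expected minimum: %.1f%%\n", 100 * t / e }'
```

Under those assumptions, each of the two measurable pools holds under a tenth of the RNA that was expected.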

Will have to think about what should be done, but I would lean towards attempting to run some “test” samples through the RNeasy Cleanup kit to see if that would help get us more accurate Qubit readings? I don’t know, though…

from Sam’s Notebook https://ift.tt/2L9SJhH

Roberto’s Notebook: Gene mapping

Testing the program tophat-2.0.13, the data were mapped to the genome (downloaded from: http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_ensembl_tracks/Crassostrea_gigas.GCA_000297895.1.22.dna_sm.genome.fa.gz). But there was a problem with the GTF file. Steven looked at the tophat page, which suggests a faster program (hisat2-2.1.0) instead of tophat.

The hisat program has been downloaded (to /usr/local/bioinformatics/). The supporting protocol (Pertea, M., Kim, D., Pertea, G. M., Leek, J. T., & Salzberg, S. L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature protocols, 11(9), 1650.) indicates that a GTF file is needed; it has been downloaded as described in the Looking back at Tophat post (Crassostrea_gigas.GCA_000297895.1.24.gtf). Using this file, it was necessary to extract the splice sites and exons before building the index from the genome.
For the moment, the “creating a HISAT2 index” step is running; afterwards, the reads will be mapped again.
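
A minimal sketch of the index preparation described above, following the command pattern in the Pertea et al. protocol. The index base name is hypothetical, the genome is assumed to have been decompressed first, and the function only prints the commands so the sketch runs without HISAT2 installed.

```shell
#!/bin/sh
# File names taken from the post; the genome is assumed to be decompressed.
gtf="Crassostrea_gigas.GCA_000297895.1.24.gtf"
genome_fa="Crassostrea_gigas.GCA_000297895.1.22.dna_sm.genome.fa"
index_base="cgigas_tran"   # hypothetical index name

# Print the three index-preparation commands without running them.
hisat2_index_plan() {
  echo "hisat2_extract_splice_sites.py ${gtf} > ${index_base}.ss"
  echo "hisat2_extract_exons.py ${gtf} > ${index_base}.exon"
  echo "hisat2-build --ss ${index_base}.ss --exon ${index_base}.exon ${genome_fa} ${index_base}"
}

hisat2_index_plan
# To execute for real (requires HISAT2 on the PATH): hisat2_index_plan | sh
```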

Grace’s Notebook: Crab Pools and Skyline Update

Today Sam speed vac-ed the pools and put them on ice. We mixed the samples and ran Qubit. The readings were way too low. Sam got out the Nanodrop and we ran them on that… and the readings were bad. Since I am out of town the rest of this week and half of next, Sam is going to try to fix some things, detailed below. In terms of Skyline, Emma has been in contact with Nick today and others from the Skyline crew. She is working on figuring out what the issue is with my super high error rates. At the end of today’s post, I also outline my goals for the rest of July and for the summer.

Crab Sample Pools (GitHub issue #311)

So when Sam got in around 7am, he put the crab pool sample tubes back in the speed vac on medium. After four hours, he collected them and put them on ice. (Combined with yesterday, the samples, which started at 270ul, got down to 26-34ul after 6 hours in the speed vac on medium heat.) I then vortexed them to try to mix the precipitate in the samples. Sam pipetted them to mix and then spun them down. I then ran Qubit on the pool tubes, making sure to avoid the pellets on the bottom (so as to not misleadingly inflate the results). The results were bad: way too low.

The UW Core facility needs the pool sample tubes to have at least 20ng/ul of RNA in a total volume of 50ul.
Potential reasons for the low readings are that the original samples’ Qubit readings were inaccurate or that the RNA has degraded. Since the Qubit dye only binds to non-degraded RNA, a Nanodrop could potentially tell us whether there is any RNA in the samples.

We ran the samples on Nanodrop. The results were not good. There was noise and no peak at 260 nm (wavelength). Here is the output file screenshot from those results. pool_master had to be run a second time because the first time didn’t work.

Next steps on Crab sample pools

Since I’m going to be out of town until next Thursday and we want to get these pools to the core facility to be sequenced ASAP (the turnaround time will be 3-4 months), Sam will thaw all the samples that were used to make the pools (listed on the pool tabs in this spreadsheet: 20180702-crab-sampling-file.csv), vortex them to mix, and then re-run the Qubit on all of them to see if my original Qubit readings were inaccurate. If they were, he will re-work the pooling volumes for each sample so that we hit at least the minimum of 20ng/ul RNA in a 50ul sample.

Skyline DIA

Emma and I talked a bit today in person and she said that she’ll work with and talk to the people from Skyline about my issue. She asked for the raw files from the 2015 Oysterseed experiment (owl link: here) and I think she’s going through it again with them. Hopefully by the end of next week I’ll have some real results I can work with to get this 2015 oysterseed DIA paper done and out by the end of summer. Steven made a good point that once school starts up late September, balancing all of these things will get harder, so I’d much rather get this done before then.

July goals

  • Send pools to core facility (3-4 month turnaround time)
  • Have data to work with for 2015Oysterseed project
  • Crab project master spreadsheet done

Summer goals

  • Have a good start on a Trinity and RNASeq analysis workflow for when I do get the crab data back
  • Work on and finish the 2015 Oysterseed paper (hopefully a chapter in my thesis as well)

from Grace’s Lab Notebook https://ift.tt/2NZtVGN

Looking back at Tophat

RNA-Seq: Tophat via iPlant

GTF available at

Though likely an updated version @ http://metazoa.ensembl.org/Crassostrea_gigas/Info/Index

Laura’s Notebook: July 18 2018, O. angasi conditioning

Question: does temp/food conditioning method used on O. lurida work on Ostrea angasi?

No standard operating procedure is in place to get O. angasi to reproduce in the hatchery. This makes for difficult production. There is interest to see whether a highly controlled temperature/feeding regime can successfully increase their gonad condition, which could allow for induced, synchronized spawning. If I’m lucky some of them will spawn during the conditioning phase.

Oysters arrive

On June 20th adult oysters (age ~3-5 years) from Port Stephens and Merimbula (northern and southern extent of New South Wales) arrived into the hatchery. They were acclimated to hatchery conditions in the same tank for 5 days. 25 oysters from each group were peeled off for the heat shock experiment, 20 of which were sampled for time=0 gonad condition (sampling #1).

Experimental design

Three temperatures are being tested: 18C (control), 21C, and 24C, with 4 tanks per temperature (2 per oyster source). Each tank holds 12 oysters and has a separate programmable heater. Beginning June 26, temperatures were slowly raised to the treatment temps. The tanks are 200L with static filtered seawater (filtered to 1um), aerated, with water changed every other day. Oysters are suspended in mesh bags on the side of the tanks and fed ad libitum a ~50%-50% mix of diatom (C. muelleri) and flagellate (Tiso or Pav). Food rations are being recorded and have been ~10-20L per tank per day, from algae at 1-2M cells/mL.

Daily maintenance

On water change days oysters are removed from the tank, rinsed with fresh water, checked for mortality, and held for ~1-2 hours out of the tank during cleaning. This exposure is deliberate, as I have noticed that O. lurida frequently spawned after cleanings. I am using 2 tanks per oyster bag, so that I can fill new tanks 1-2 days prior to the water change, allow the water to warm to room temperature, and maintain a consistent temperature for the oysters. The old water is drained over an 80um-100um screen to check for released larvae or eggs.

Sampling 2: conditioning for 2 weeks at temp

On July 16th, 3 weeks after beginning the conditioning trial and 2 full weeks minimum of treatment temp, I sampled half the oysters from all tanks (n=6 per tank, n=12 per source, n=24 per temperature). Whole weight after shucking/draining, and shell weight were collected to estimate condition index. Tissues sampled included:

  • -80 & RNAlater: gill, mantle, gonad
  • fixed for histology: gonad
  • -80 only: gut/gonad complex

Oysters were also imaged prior to sampling and checked for brooded larvae/embryos. 1 oyster was brooding embryos – Merimbula 24C (tank 1).

Sampling #3 will occur on July 30th – 2 weeks after sampling #2 and 4 full weeks at temp.

from The Shell Game https://ift.tt/2uFmPhT

Grace’s Notebook: Speed Vac Pt. 1 + Testing Advanced Peak Picking in Skyline

CRAB RNA POOLS: Today I started using the Speed Vac on the three pools with Sam. They were running on low temperature from 10:30am to 1:15pm. Not much liquid had evaporated. From 1:15pm-3:15pm they were run on medium temperature. Still not enough liquid has evaporated, so Sam will put them in the Speed Vac when he gets to FTR tomorrow morning. SKYLINE: I tested out the tutorial that the people from Skyline suggested I try out. Not sure how it turned out honestly, but Emma said she can take a look at it with me tomorrow.

Speed Vac

Started at 10:30am on low temperature (room temperature)


Checked at 1:15pm. Still too much volume. Increased temp to medium


Checked on it at 3:15pm. Still too much volume

Sam offered to put it back in the Speed Vac at medium temperature tomorrow morning when he gets in to FTR. It will hopefully be done by lab meeting. Because there is some precipitate in the tubes (precipitate could be salts from the RNA isolation process or un-dissolved RNA), we will mix and Qubit the pooled tubes to get an accurate RNA concentration reading and then adjust the volume for the pools if need be. The CORE facility needs “RNA normalized to a minimum of 20ng/uL with a total volume of 50uL.” (GitHub Issue #184)

2015 Oysterseed Project: Skyline trouble-shooting

My error rates from Step 5b of the DIA Protocol were around or above 50% every time I tried it. Emma prompted me to contact Skyline Support about my issue.

I wrote on their support page and they responded that I might try their Advanced Peak Picking in Skyline. I tried out their tutorial today with their example files. They don’t have very detailed instructions in the DIA section.

Emma offered to go through it with me either tomorrow after Sam’s birthday lunch or next week when I come back from my CA trip.

from Grace’s Lab Notebook https://ift.tt/2JC5zja

Grace’s Notebook: Making Pools for RNASeq, and working on an ultimate MASTER crab spreadsheet

This post is a summation of what I did this week. I made the 3 pools for RNASeq (speed vac will happen Tuesday when both Sam and I are in), and am starting to create an ultimate master crab spreadsheet with ALL the data that we have on this project.

Sample Pools for RNASeq

Made 3 pools as detailed in July 2nd’s post.

Pools 1 and 2 were made by thawing all the tubes on ice, vortexing each tube for 5s, and pipetting 15uL from each tube into a 1.7ml snap cap tube. These two pools have equal volume from each sample, but not equal RNA from each sample contributing to the pool.

The third pool, “MasterPool” as it is preliminarily called, was made by calculating how many ul of sample should be put into the pool such that each sample contributed 200ng of RNA to the pool. The pool was created in the same way as pools 1 and 2, except there were different volumes from each of the ten samples.

Pool #1

Pool #2

MasterPool

You may notice that the tube numbers are different from my post from July 2nd. That is because on that day, I accidentally grabbed the tube numbers from the RNAday12 column instead of the RNAday26 column. Pam brought this to my attention and I revised and reviewed numerous times. Also, I rounded the pipette volumes to the nearest 0.1 ul.
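
The MasterPool volumes described above come down to dividing the target mass by each sample's concentration and rounding to 0.1 ul. A minimal sketch follows; the sample IDs and concentrations are hypothetical placeholders, not the real Qubit readings.

```shell
#!/bin/sh
# volume_for_mass TARGET_NG CONC_NG_PER_UL -> ul to pipette, rounded to 0.1 ul
volume_for_mass() {
  awk -v m="$1" -v c="$2" 'BEGIN { printf "%.1f\n", m / c }'
}

target_ng=200   # each sample contributes 200ng to the MasterPool

# Hypothetical sample IDs and concentrations (ng/ul), for illustration only.
printf '%s\n' "sampleA 40" "sampleB 25" "sampleC 80" |
while read -r id conc; do
  echo "${id}: $(volume_for_mass "$target_ng" "$conc") ul"
done
```

With ten samples at 200ng each, the pool would hold 2000ng in total if the input concentrations are accurate.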

ULTIMATE master crab spreadsheet

Work in progress master csv: 20180713-crab-master-true.csv
I added some R code to the Rproj script to begin the process of creating a crab master spreadsheet with alllll the data we have. It is getting weird because there are a lot of repeat columns, so I will pull out the extra repeats. Also, I need to figure out how to add FRP codes to the crab data in 20180125-Crab-Collection-DATA_DNA-plates.xlsx. There are only tag numbers listed. I’m thinking this can be done in R by cross-referencing other spreadsheets where the tag numbers and FRPs are listed.
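
The cross-referencing idea is a key-based join on tag number. It is sketched here in shell rather than R, with made-up tag numbers, FRP codes, and measurements; it assumes simple CSV files with no quoted commas.

```shell
#!/bin/sh
# Hypothetical inputs: tag -> FRP lookup, and plate data keyed by tag number.
workdir=$(mktemp -d)
printf '%s\n' "101,FRP-A" "102,FRP-B" "103,FRP-C" > "$workdir/tag_to_frp.csv"
printf '%s\n' "103,12.5" "101,9.8" > "$workdir/plate_data.csv"

# join needs both inputs sorted on the key (column 1 here).
sort -t, -k1,1 "$workdir/tag_to_frp.csv" > "$workdir/frp.sorted"
sort -t, -k1,1 "$workdir/plate_data.csv" > "$workdir/plate.sorted"

# Output columns: tag, measurement, FRP code
merged=$(join -t, "$workdir/plate.sorted" "$workdir/frp.sorted")
echo "$merged"
rm -r "$workdir"
```

Tags that appear in only one file are dropped by default, so comparing row counts before and after the join is a quick way to spot unmatched tags.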

from Grace’s Lab Notebook https://ift.tt/2Jl7w3h

Hi everybody!

I am glad to be part of this group.

I am working on Trinity (trinityrna-2.2.0), specifically on abundance estimation of sequences using the RSEM package (before running differential expression analyses). I had two files called RSEM.isoforms.results and RSEM.genes.results. Both are tables with values for length, effective length, expected count, TPM (transcripts per million) and FPKM (fragments per kilobase of transcript per million mapped reads). The TPM and FPKM values are used by the script abundance_estimates_to_matrix.pl to build the matrices for differential expression, but there is a problem generating them from RSEM.genes.results. The relevant part of the script:

Screen Shot 2018-07-12 at 2.55.36 PM

At lines 242 and 249, the script writes “NA” for absent gene IDs, which causes an error when creating the matrix used for differential expression analyses, because the downstream script only recognizes numeric values.

Screen Shot 2018-07-12 at 2.30.17 PM

Screen Shot 2018-07-11 at 2.02.48 PM

Rewriting the script to change NA to 0 makes it possible to create the matrices, but the differential expression analyses then gave different results than with RSEM.isoforms.results.

Screen Shot 2018-07-12 at 2.35.33 PM

I can’t see the sense of this part. I mean, why should it be NA instead of 0 if numeric values are needed? And does this affect the differences between using genes.results (where I found 10 differentially expressed genes) and isoforms.results (where I found 384 differentially expressed transcripts)? Now I am trying to get the annotation for those 10 genes and compare their ontology with the isoform annotation.
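
One way to gauge how much the NA-to-0 substitution could matter is to count the NA cells before replacing them. The sketch below does this on a small made-up tab-delimited matrix; it is a post-processing illustration, not the Trinity script itself.

```shell
#!/bin/sh
# Hypothetical matrix: first column is the gene ID, the rest are values.
matrix=$(printf 'gene\ts1\ts2\ng1\t5.0\tNA\ng2\tNA\tNA\ng3\t1.2\t3.4\n')

# Count NA cells in the value columns (header row skipped).
na_count=$(printf '%s\n' "$matrix" |
  awk -F'\t' 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "NA") n++ } END { print n + 0 }')
echo "NA cells: $na_count"

# Replace NA with 0 in value columns only; IDs in column 1 are left alone.
zeroed=$(printf '%s\n' "$matrix" |
  awk -F'\t' -v OFS='\t' 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "NA") $i = 0 } { print }')
printf '%s\n' "$zeroed"
```

A zero says "measured and absent" while NA says "not measured", so a large NA count is a hint that the two encodings could lead to different downstream results.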

Sam’s Notebook: Mox – Olympia oyster genome annotation progress (using Maker 2.31.10)


TL;DR – It appears to be continuing where it left off!

I decided to spend some time to figure out what was actually happening, as it’s clear that the annotation process is going to need some additional time to run and may span an additional monthly maintenance shutdown.

This is great, because, otherwise, this will take an eternity to actually complete (particularly because we’d have to move the job to run on one of our lab’s computers – which pale in comparison to the specs of our Mox nodes).

However, it’s a bit shocking that this is taking this long, even on a Mox node!

I started annotating the Olympia oyster genome on 20180529. Since then, the job has been interrupted twice by monthly Mox maintenance (which happens on the 2nd Tuesday of each month). Additionally, when this happens, the SLURM output file is overwritten, making it difficult to assess whether or not Maker continues where it left off or if it’s starting over from scratch.

Anyway, here’s how I deduced that the program is continuing where it left off.

  1. I figured out that it produces a generic feature format (GFF) file for each contig.
  2. Decided to search for the first contig GFF and look at its last modified date. This would tell me if it was newly generated (i.e. on the date that the job was restarted after the maintenance shutdown) or if it was old. Additionally, if there was more than one of these files, then I’d also know that Maker was just starting at the beginning and writing data to a different location.


    This shows:

    1. Only one copy of Contig0.gff exists.
    2. Last modified date is 20180530.
  3. Check the slurm output file for info.


    This reveals this important piece of info:

    MAKER WARNING: The file 20180529_oly_annotation_01.maker.output/20180529_oly_annotation_01_datastore/AC/68/Contig215522//theVoid.Contig215522/0/Contig215522.0.all.rb.out
    did not finish on the last run

All of these taken together lead me to confidently conclude that Maker is not restarting from the beginning and is, indeed, continuing where it left off. WHEW!
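
The file check from step 2 can be sketched as below. The directory tree is a stand-in created on the fly so the example runs anywhere, and its layout is hypothetical; on Mox the search root would be the Maker output directory.

```shell
#!/bin/sh
# Stand-in directory tree; the layout is hypothetical.
root=$(mktemp -d)
mkdir -p "$root/datastore/AA/00/Contig0"
touch "$root/datastore/AA/00/Contig0/Contig0.gff"

# List every copy of the first contig's GFF with its last-modified date.
find "$root" -name "Contig0.gff" -exec ls -l {} \;

# More than one copy would suggest Maker restarted and wrote elsewhere.
copies=$(find "$root" -name "Contig0.gff" | wc -l | tr -d ' ')
echo "Copies found: $copies"
rm -r "$root"
```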

Just for kicks, I also ran a count of GFF files to see where this stands so far:


Wow! 622,010 GFFs!!!
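
A count like this could be turned into a rough progress estimate by comparing it with the number of contigs in the genome FastA, assuming one GFF per contig as noted above. The sketch uses a tiny stand-in genome and datastore; the paths are hypothetical.

```shell
#!/bin/sh
root=$(mktemp -d)
# Stand-in genome with three contigs, and GFFs for two of them.
printf '>Contig0\nACGT\n>Contig1\nGGCC\n>Contig2\nTTAA\n' > "$root/genome.fasta"
mkdir -p "$root/datastore"
touch "$root/datastore/Contig0.gff" "$root/datastore/Contig1.gff"

gff_count=$(find "$root/datastore" -type f -name "*.gff" | wc -l | tr -d ' ')
contig_count=$(grep -c '^>' "$root/genome.fasta")
echo "GFFs: $gff_count of $contig_count contigs"
awk -v g="$gff_count" -v c="$contig_count" \
  'BEGIN { printf "%.1f%% of contigs have a GFF\n", 100 * g / c }'
rm -r "$root"
```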

Finally, for posterity, here’s the SLURM script I used to submit this job, back in May! I’ll have all of the corresponding genome files, proteome files, transcriptome files, etc. on one of our servers once the job completes.

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=20180529_oly_maker_genome_annotation
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=30-00:00:00
    ## Memory per node
    #SBATCH --mem=500G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/srlab/sam/outputs/20180529_oly_maker_genome_annotation

    ## Establish variables for more readable code

    ### Path to Maker executable
    maker=/gscratch/srlab/programs/maker-2.31.10/bin/maker

    ### Path to Olympia oyster genome FastA file
    oly_genome=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/jelly.out.fasta

    ### Path to Olympia oyster transcriptome FastA file
    oly_transcriptome=/gscratch/srlab/sam/data/O_lurida/oly_transcriptome_assemblies/Olurida_transcriptome_v3.fasta

    ### Path to Crassostrea gigas NCBI protein FastA
    gigas_proteome=/gscratch/srlab/sam/data/C_gigas/gigas_ncbi_protein/GCA_000297895.1_oyster_v9_protein.faa

    ### Path to Crassostrea virginica NCBI protein FastA
    virginica_proteome=/gscratch/srlab/sam/data/C_virginica/virginica_ncbi_protein/GCF_002022765.2_C_virginica-3.0_protein.faa

    ## Create Maker control files needed for running Maker
    $maker -CTL

    ## Store path to options control file
    maker_opts_file=./maker_opts.ctl

    ## Create combined proteome FastA file
    touch gigas_virginica_ncbi_proteomes.fasta
    cat "$gigas_proteome" >> gigas_virginica_ncbi_proteomes.fasta
    cat "$virginica_proteome" >> gigas_virginica_ncbi_proteomes.fasta

    ## Edit options file
    ### Set paths to O.lurida genome and transcriptome.
    ### Set path to combined C. gigas and C.virginica proteomes.
    ## The use of the % symbol sets the delimiter sed uses for arguments.
    ## Normally, the delimiter that most examples use is a slash "/".
    ## But, we need to expand the variables into a full path with slashes, which screws up sed.
    ## Thus, the use of % symbol instead (it could be any character that is NOT present in the expanded variable; doesn't have to be "%").
    sed -i "/^genome=/ s% %$oly_genome %" "$maker_opts_file"
    sed -i "/^est=/ s% %$oly_transcriptome %" "$maker_opts_file"
    ### NOTE: gigas_virginica_ncbi_proteomes is not assigned anywhere above; as written it expands to an empty string.
    sed -i "/^protein=/ s% %$gigas_virginica_ncbi_proteomes %" "$maker_opts_file"

    ## Run Maker
    ### Set basename of files and specify number of CPUs to use
    $maker \
    -base 20180529_oly_annotation_01 \
    -cpus 24
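
The sed delimiter trick used in the script can be tried in isolation. This is a small sketch on a stand-in control-file line with a hypothetical path; it writes to standard output instead of editing in place. It also shows a `${var:?}` guard, which is an addition for illustration and not part of the original script.

```shell
#!/bin/sh
# Stand-in for a line from maker_opts.ctl and a hypothetical path.
ctl_line='genome= #genome sequence (fasta file)'
genome_path='/data/example/genome.fasta'

# With "/" as the delimiter, the slashes in the path would end the pattern early,
# so "%" is used instead. Only the first space on the matching line is replaced.
edited=$(printf '%s\n' "$ctl_line" | sed "/^genome=/ s% %$genome_path %")
echo "$edited"

# ${var:?} stops with an error if the variable is unset or empty, which guards
# against silently substituting nothing. Run in a subshell so the demo continues.
( : "${example_unset_path:?is not set}" ) 2>/dev/null || guard_result="caught unset variable"
echo "$guard_result"
```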

from Sam’s Notebook https://ift.tt/2KMfc45