Hi everybody!

I am glad to be part of this group.

I am working with Trinity (trinityrna-2.2.0), specifically on abundance estimation of sequences using the RSEM package (before running differential expression analyses). I have two files, RSEM.isoforms.results and RSEM.genes.results. Both are matrices with values for length, effective length, expected count, TPM (transcripts per million), and FPKM (fragments per kilobase million). The TPM and FPKM values are used by the script abundance_estimate_to_matrix.pl to build the matrices for differential expression, but there is a problem generating them from RSEM.genes.results. The relevant part of the script:

Screen Shot 2018-07-12 at 2.55.36 PM

At lines 242 and 249, the script writes “NA” for absent gene IDs, and this causes an error when creating the matrix used for differential expression analyses, because the downstream code only accepts numeric values.

Screen Shot 2018-07-12 at 2.30.17 PM

Screen Shot 2018-07-11 at 2.02.48 PM

Rewriting the script to replace NA with 0 allows the matrices to be created, but the differential expression analyses then give different results than those based on RSEM.isoforms.results.

Screen Shot 2018-07-12 at 2.35.33 PM

I can’t see the reasoning behind this part. Why should it be NA instead of 0 if numeric values are required? And does this explain the difference between genes.results (where I found 10 differentially expressed genes) and isoforms.results (where I found 384 differentially expressed transcripts)? For now, I am retrieving the annotation for those 10 genes to compare their ontology with the isoform annotations.
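For anyone hitting the same error, a blunt workaround is to zero out the NA fields in the generated matrix before the differential expression step. Here is a minimal sketch with a made-up file name (this is a workaround, not a claim about what the Trinity script should do):

```shell
# Tiny stand-in for a counts matrix containing an NA (file name is made up).
printf 'GENE1\tNA\t5\nGENE2\t3\t7\n' > demo.counts.txt

# Replace whole-field NA values with 0 (uses GNU sed word boundaries
# so gene IDs containing "NA" are left untouched).
sed 's/\bNA\b/0/g' demo.counts.txt > demo.counts.fixed.txt

cat demo.counts.fixed.txt
```

Whether 0 is actually the right substitute for a missing gene is exactly the open question above, so treat this as a way to get the matrix built, not as a statistical answer.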

Sam’s Notebook: Mox – Olympia oyster genome annotation progress (using Maker 2.31.10)


TL;DR – It appears to be continuing where it left off!

I decided to spend some time to figure out what was actually happening, as it’s clear that the annotation process is going to need some additional time to run and may span an additional monthly maintenance shutdown.

This is great, because, otherwise, this will take an eternity to actually complete (particularly because we’d have to move the job to run on one of our lab’s computers – which pale in comparison to the specs of our Mox nodes).

However, it’s a bit shocking that this is taking this long, even on a Mox node!

I started annotating the Olympia oyster genome on 20180529. Since then, the job has been interrupted twice by monthly Mox maintenance (which happens on the 2nd Tuesday of each month). Additionally, when this happens, the SLURM output file is overwritten, making it difficult to assess whether or not Maker continues where it left off or if it’s starting over from scratch.

Anyway, here’s how I deduced that the program is continuing where it left off.

  1. I figured out that it produces a generic feature format (GFF) file for each contig.
  2. Decided to search for the first contig GFF and look at its last modified date. This would tell me if it was newly generated (i.e. on the date that the job was restarted after the maintenance shutdown) or if it was old. Additionally, if there were more than one of these files, then I’d also know that Maker was just starting at the beginning and writing data to a different location.


    This shows:

    1. Only one copy of Contig0.gff exists.
    2. Last modified date is 20180530.
  3. Check the slurm output file for info.


    This reveals this important piece of info:

    MAKER WARNING: The file 20180529_oly_annotation_01.maker.output/20180529_oly_annotation_01_datastore/AC/68/Contig215522//theVoid.Contig215522/0/Contig215522.0.all.rb.out
    did not finish on the last run

All of these taken together lead me to confidently conclude that Maker is not restarting from the beginning and is, indeed, continuing where it left off. WHEW!
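For reference, the Contig0.gff check in the steps above can be sketched with find (the output directory name is taken from this run’s MAKER warning; the exact commands used aren’t shown in the post):

```shell
# Run from the directory containing the Maker output.
# (The guard lets this sketch run even where the directory doesn't exist.)
if [ -d 20180529_oly_annotation_01.maker.output ]; then
    # List every copy of Contig0.gff with its last-modified timestamp.
    find 20180529_oly_annotation_01.maker.output -name "Contig0.gff" -exec ls -l {} \;
    # Count the copies: exactly one means Maker is reusing its old output tree.
    find 20180529_oly_annotation_01.maker.output -name "Contig0.gff" | wc -l
fi
```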

Just for kicks, I also ran a count of GFF files to see where this stands so far:


Wow! 622,010 GFFs!!!
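That tally can be reproduced with a one-liner along these lines (the exact command used isn’t shown in the post, so treat this as an approximation):

```shell
# Count all GFF files generated so far in the Maker output tree.
if [ -d 20180529_oly_annotation_01.maker.output ]; then
    find 20180529_oly_annotation_01.maker.output -name "*.gff" | wc -l
fi
```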

Finally, for posterity, here’s the SLURM script I used to submit this job, back in May! I’ll have all of the corresponding genome files, proteome files, transcriptome files, etc. on one of our servers once the job completes.

#!/bin/bash
## Job Name
#SBATCH --job-name=20180529_oly_maker_genome_annotation
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
## Turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sam/outputs/20180529_oly_maker_genome_annotation

## Establish variables for more readable code
### Path to Maker executable
maker=/gscratch/srlab/programs/maker-2.31.10/bin/maker
### Path to Olympia oyster genome FastA file
oly_genome=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/jelly.out.fasta
### Path to Olympia oyster transcriptome FastA file
oly_transcriptome=/gscratch/srlab/sam/data/O_lurida/oly_transcriptome_assemblies/Olurida_transcriptome_v3.fasta
### Path to Crassostrea gigas NCBI protein FastA
gigas_proteome=/gscratch/srlab/sam/data/C_gigas/gigas_ncbi_protein/GCA_000297895.1_oyster_v9_protein.faa
### Path to Crassostrea virginica NCBI protein FastA
virginica_proteome=/gscratch/srlab/sam/data/C_virginica/virginica_ncbi_protein/GCF_002022765.2_C_virginica-3.0_protein.faa

## Create Maker control files needed for running Maker
$maker -CTL

## Store path to options control file
maker_opts_file=./maker_opts.ctl

## Create combined proteome FastA file
touch gigas_virginica_ncbi_proteomes.fasta
cat "$gigas_proteome" >> gigas_virginica_ncbi_proteomes.fasta
cat "$virginica_proteome" >> gigas_virginica_ncbi_proteomes.fasta

## Edit options file
### Set paths to O. lurida genome and transcriptome.
### Set path to combined C. gigas and C. virginica proteomes.
## The use of the % symbol sets the delimiter sed uses for arguments.
## Normally, the delimiter that most examples use is a slash "/".
## But, we need to expand the variables into a full path with slashes, which screws up sed.
## Thus, the use of the % symbol instead (it could be any character that is NOT present in the expanded variable; doesn't have to be "%").
sed -i "/^genome=/ s% %$oly_genome %" "$maker_opts_file"
sed -i "/^est=/ s% %$oly_transcriptome %" "$maker_opts_file"
sed -i "/^protein=/ s% %$gigas_virginica_ncbi_proteomes %" "$maker_opts_file"

## Run Maker
### Set basename of files and specify number of CPUs to use
$maker \
-base 20180529_oly_annotation_01 \
-cpus 24
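The alternate-delimiter trick described in the script’s comments can be demonstrated in isolation (the demo file and path below are made up, not from the actual job):

```shell
# A replacement string containing slashes would clash with sed's usual
# "/" delimiter, so use "%" instead. Demo file and path are hypothetical.
printf 'genome=\n' > demo_opts.ctl
genome_path=/gscratch/example/genome.fasta

sed -i "s%^genome=%genome=$genome_path%" demo_opts.ctl

cat demo_opts.ctl   # genome=/gscratch/example/genome.fasta
```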

from Sam’s Notebook https://ift.tt/2KMfc45

Sam’s Notebook: Mox – Password-less SSH!


The high performance computing (HPC) cluster (called Mox) at Univ. of Washington (UW) frustratingly requires a password when SSH-ing, even when SSH keys are in use. I have a lengthy, unintelligible password for my UW account, so having to type it in any time I want to initiate a new SSH session on Mox is a painful process.

Today, I finally got fed up with how much time I was wasting (granted, it’s minor in the grand scheme of my day) just logging in to Mox, so I spent some time figuring out how to automate password entry for a new SSH session with Mox.

I tried to handle this using the program sshpass, but I couldn’t get it to read my password from a file – it would just hang in limbo after executing the command.

In the end, I came across a bash script that does this perfectly. Steps to implement this on Ubuntu 16.04 LTS:

  1. Install expect:
    sudo apt install expect
  2. Create the following script (taken from this [StackExchange solution](https://ift.tt/2LaGayC)):
     #!/usr/bin/expect
     spawn ssh mox
     expect "Password:"
     send "PASSWORD\r"
     interact


    • I have an ~/.ssh/config file that allows me to use “mox” as an alias for my full SSH command
    • Replace PASSWORD with your own UW password.
  3. Change access to script (set read, write, execute for user only):
    chmod u=rwx,go-rwx
  4. Run script from home directory (saved in home directory):

Boom! No having to track down password, copy, and paste!
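For context, the “mox” alias mentioned above comes from an ~/.ssh/config entry along these lines (the hostname and username shown are placeholders/assumptions, not taken from the post):

```
Host mox
    HostName mox.hyak.uw.edu
    User your_uw_netid
```

With an entry like that in place, `ssh mox` expands to the full connection command, which is what the expect script spawns.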

from Sam’s Notebook https://ift.tt/2L49Tw0

Sam’s Notebook: Ubuntu – Fix “No Video Signal” Issue on Emu/Roadrunner


Both Apple Xserves (Emu/Roadrunner) running Ubuntu (16.04 LTS) experienced the same issue – the monitor would indicate “No Video Signal”, go dark, and not respond to keyboard/mouse input. However, you could still SSH into both machines w/o issue.

Although having these machines be “headless” (i.e. with no display) is usually fine, it’s not ideal for a couple of reasons:

  1. Difficult to use for other lab members who aren’t as familiar with SSH – specifically if they would want to use a Jupyter Notebook remotely (this would require setting up a tunnel to their own computer).
  2. Can’t use Remmina Remote Desktop until a user has physically logged in from the Ubuntu login screen at least once, in order to launch Remmina.

The second aspect was the major impetus for finally dealing with this. Accessing these computers via remote desktop makes it much easier to manage long-running Jupyter Notebooks than relying on an SSH tunnel, since the tunnel limits my access to the Jupyter Notebook to the computer where the tunnel is set up.

Well, this led me down a horrible rabbit hole of Linux stuff that I won’t get fully in to (particularly, since I didn’t understand most of it and can’t remember all the crazy stuff I read/tried).

However, here’s the gist:

  1. Needed to edit /etc/default/grub
  2. After editing, needed to update grub config file: sudo update-grub

Despite the fact that both machines are (or, should be) identical, I did not get the same results. The edits I made to the /etc/default/grub file on Emu worked immediately. The edits were:

  1. Add nomodeset to the following line (shown here already edited; this seemed to be the most common suggestion for fixing the “No Video Signal” issue):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

  2. Comment out this line (it was triggering an error/warning about writing the config file when running the update-grub command):


For some reason, Roadrunner did not take kindly to those changes, and the problem took a long time to resolve. It ended with changing permissions on ~/.Xauthority back to their original settings (they were altered when I ran some command – sudo startx or something) to get out of a login loop.
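In isolation, the nomodeset edit amounts to something like the following, demonstrated here on a throwaway copy (on the real system the target is /etc/default/grub, edited with sudo and followed by sudo update-grub):

```shell
# Throwaway stand-in for /etc/default/grub, for illustration only.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > grub.demo

# Append nomodeset to the default kernel command line.
sed -i 's/"quiet splash"/"quiet splash nomodeset"/' grub.demo

cat grub.demo   # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
```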

Regardless, both are fixed, both can be used when physically sitting at the computer, and both can be accessed remotely using Remmina!

from Sam’s Notebook https://ift.tt/2KTjtBT

[code]-> Done reading data waiting...

-> Done reading data waiting for calculations to finish
	-> Done waiting for threads
	-> Output filenames:
	-> Fri Jul  6 23:37:52 2018
	-> Arguments and parameters for all analysis are located in .arg file
	-> Total number of sites analyzed: 786914862
	-> Number of sites retained after filetering: 696 
	[ALL done] cpu-time used =  240010.84 sec
	[ALL done] walltime used =  128473.00 sec

Grace’s Notebook: Notes from Crab Meeting

Today we had our 4th crab meeting and discussed our short-term sequencing plan for three libraries (1: day 9 uninfected; 2: day 9 infected; 3: a “masterpool” of the remaining 10 treatments). We also discussed our plan going forward with qPCR and creating libraries later; hopefully we will see some cool things with the three chosen libraries and the qPCR.

I have the meeting recorded; I just have to edit and publish it as DecaPod S1E10.

We are going to go ahead with the three libraries proposed in my previous notebook post. Sam is going to check around at the UW CORE facilities available to us. We have to use a UW facility due to budget restrictions (Pam would have to re-negotiate if we wanted to use something else, like Genewiz).

I (with Sam’s assistance and insight) will create the pooled samples such that we will have a tube for each library (3). We will then use the Speed Vac to evaporate off some liquid in order to get a specific concentration (TBD- depends on what the CORE facility requires).

Sequencing takes some time, so while we are waiting for the results to come back, I will compile databases of genomic resources. Namely, I will find FASTA files for the species most closely related to Chionoecetes bairdi and Hematodinium spp. and create databases. I will also practice using Trinity and BLAST with some geoduck data we already have, so that once the RNAseq data from our crabs comes back, I’ll already have a good idea of how to execute the bioinformatics pipeline.
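As a sketch of that database-building step, creating a nucleotide BLAST database from a downloaded FASTA file might look like this (file names are made up, and NCBI BLAST+ must be installed):

```shell
# Stand-in FASTA for a related species (hypothetical file name).
printf '>seq1\nACGTACGTACGTACGT\n' > related_species.fasta

# Build a nucleotide BLAST database from it (requires NCBI BLAST+).
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb -in related_species.fasta -dbtype nucl -out related_species_db
else
    echo "makeblastdb not found -- install NCBI BLAST+ first"
fi
```

The resulting related_species_db files could then be queried with blastn once the crab transcripts are assembled.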

Once we analyze the data and pick out some genes, we will make primers and use qPCR on individuals. If we see anything that we’d like to look at more closely, we can create new libraries of individuals or pools. I also may extract more RNA since I currently have only extracted RNA from <~50% of the surviving crabs (113 crabs survived the experiment and I have extracted RNA from 51 of them).

I will be learning a lot this summer and I am really excited! Reading and practicing occasionally doesn’t stick with me as much as actually doing things, so practicing with real data will be very helpful. Pam is also interested in learning more, so working with her and potentially teaching her what I learn will further enrich my understanding. Looking forward to it!

from Grace’s Lab Notebook https://ift.tt/2u7HsTz

[code]#!/bin/bash ## Job Name #SBATCH...

## Job Name
#SBATCH --job-name=angsd-05
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-100:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/0705

source /gscratch/srlab/programs/scripts/paths.sh

/gscratch/srlab/sr320/programs/angsd/angsd \
-bam /gscratch/srlab/sr320/data/cw/all_bam.bamlist \
-out Genotypes_parentage \
-GL 1 \
-doMaf 1 \
-doMajorMinor 1 \
-minMaf 0.25 \
-SNP_pval 1e-6 \
-minInd 525 \
-minQ 20 \
-P 28 \
-doGeno 2 \
-doPost 1 \
-postCutoff 0.95 \
-doCounts 1 \
-geno_minDepth 7