RNA Extraction Wrap-Up for Desiccation + Elevated Temp. Samples (11/8)

On 11/8, I finished up the RNA extraction for the final 16 samples of the desiccation + elevated temp. exposure (D05, D06, D07, D08, D15, D16, D17, D18, T05, T06, T07, T08, T15, T17, T18). I also ran the Qubit assay for 24 samples (D03, D04, D05, D06, D07, D08, D13, D14, D15, D16, D17, D18, T03, T04, T05, T06, T07, T08, T13, T14, T15, T16, T17, T18). Four samples (D06, T05, T15, T17) had RNA concentrations above the limit of quantification, so I will have to dilute those samples and re-run the Qubit assay. The protocols for both the RNA extraction wrap-up and the Qubit assay are described below:

RNA Extraction Wrap-Up: 

  1. Stored RNA pellets (suspended in ethanol) were removed from the -80 °C freezer and left to thaw for around 10 minutes.
  2. Supernatant was removed from all samples and 400 μL of 75% ethanol was subsequently added to each sample.
  3. Each sample was then centrifuged for 5 minutes at 1200 g.
  4. Supernatant was once again removed from each sample and each sample was then microcentrifuged for approximately 10 seconds so that any residual ethanol could be removed.
  5. 50 μL of DEPC water was added to each sample and samples were then vortexed to dissolve the RNA pellet. One sample’s (D11) RNA pellet did not fully dissolve, so an additional 50 μL of DEPC water was added and the pellet was manually broken up by vigorous pipetting.

RNA Quantification (Qubit)

  1. 3980 μL of Qubit buffer and 20 μL of Qubit dye were added to a tube to create the mastermix, a 1:200 dilution of dye (199:1 ratio of buffer to dye).
  2. 198 μL of the mastermix and 2 μL of RNA sample were added to each of 16 Qubit tubes (1 tube for each sample).
  3. 2 standard tubes were also set up: 190 μL of mastermix and 10 μL of Qubit standard #1 or #2 were added to each tube.
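
The volumes in the steps above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming 198 μL of mastermix per sample tube and 190 μL per standard tube as listed; the function name `working_solution_needed` is hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check that the prepared Qubit mastermix covers all tubes.
# Per-tube volumes (198 uL per sample, 190 uL per standard) come from the steps above;
# the function name is hypothetical.
working_solution_needed() {
  local n_samples=$1 n_standards=$2
  echo $(( n_samples * 198 + n_standards * 190 ))
}

prepared=$(( 3980 + 20 ))               # uL of buffer + uL of dye
needed=$(working_solution_needed 16 2)  # 16 sample tubes + 2 standard tubes
echo "prepared=${prepared} uL, needed=${needed} uL, spare=$(( prepared - needed )) uL"
```

For 16 sample tubes and 2 standards this comes to 3548 μL, which fits within the 4000 μL prepared. Calling the same function with 24 samples gives 5132 μL, so a run of that size would need a larger batch.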


Grace’s Notebook: RNeasy Test Bioanalyzer Results

Today I ran the two samples from Day 26 that I extracted using the RNeasy Kit on the Bioanalyzer. The results look pretty good and we can now make a library and see how it goes.




They don’t have the same band pattern as Sam’s samples, possibly due to degradation, but they still look good.

These samples were from Oct. 31st.

from Grace’s Lab Notebook https://ift.tt/2FweCoW


#SBATCH --workdir=/gscratch/srlab/sr320/analyses/1105

/gscratch/srlab/programs/ncbi-blast-2.6.0+/bin/blastn \
-task blastn \
-query /gscratch/srlab/sr320/data/oly/Trinity.fasta \
-db /gscratch/srlab/sr320/data/oly/Olurida_v081 \
-out /gscratch/srlab/sr320/analyses/1105/ks-trinity-v081.tab \
-evalue 1e-20 \
-outfmt 6 \
-num_threads 28

Mapping quant-seq to genome

[sr320@mox2 jobs]$ cat 1117_1500.sh 
## Job Name
#SBATCH --job-name=ls-bow
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-100:00:00
## Memory per node
#SBATCH --mem=500G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/sr320/analyses/1117/

source /gscratch/srlab/programs/scripts/paths.sh

find /gscratch/srlab/sr320/data/ls-tag/*.gz | xargs basename -s L006_R1_001.fastq.gz | xargs -I{} bowtie2 \
-x /gscratch/srlab/sr320/data/oly/Olurida_v081.bowtie-index \
-U /gscratch/srlab/sr320/data/ls-tag/{}L006_R1_001.fastq.gz \
-p 28 \
-q \
-S /gscratch/srlab/sr320/analyses/1117/{}_01_bowtie2.sam
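
The prefix-stripping step in the script above can be checked without running bowtie2. This is a dry-run sketch only: the file names and shortened paths are made up, and `echo` stands in for the real bowtie2 call so the generated commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Dry run of the sample-prefix logic, using made-up file names (hypothetical).
suffix="L006_R1_001.fastq.gz"
prefixes=$(printf '%s\n' \
  "/data/ls-tag/sampleA_S1_${suffix}" \
  "/data/ls-tag/sampleB_S2_${suffix}" \
  | xargs basename -s "${suffix}")
echo "${prefixes}"

# Print (rather than run) the bowtie2 command each prefix would produce.
commands=$(echo "${prefixes}" | xargs -I{} echo \
  bowtie2 -x Olurida_v081.bowtie-index -U "{}${suffix}" -p 28 -q -S "{}_01_bowtie2.sam")
echo "${commands}"
```

Because `xargs -I{}` launches one command per input line, the files are aligned one after another, each run using the 28 threads given by `-p`.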


Yaamini’s Notebook: DML Analysis Part 18

Flanking analysis for mRNA coding regions

In this notebook, I used bedtools (flankBed and closestBed) to conduct a flanking analysis. I initially thought that each method would give me the same result, but I now think that these analyses provide me with two slightly different results.

I will use flankBed to add 1000 bp regions to each mRNA coding region. I can then intersect flanks with various genomic feature files. This gives me a broad idea of which DML or DMR could potentially regulate expression of an mRNA coding region no more than 1000 bp away. I will use closestBed to find the closest non-overlapping DML or DMR to each mRNA coding region. This zooms in a little closer to the mRNA coding region, looking for DML or DMR in a nearby promoter sequence that could regulate it. All of the output I generated can be found in this folder.


First, I used the following code to generate 1000 bp flanks up- and downstream of an mRNA coding region.

 # {bedtoolsDirectory}flankBed: path to flankBed
 # -i: path to mRNA GFF
 # -g: path to a list of start and stop positions for each chromosome. I pulled the data off of NCBI. The file can be found [here](https://github.com/fish546-2018/yaamini-virginica/blob/master/analyses/2018-11-01-DML-and-DMR-Analysis/2018-11-14-Flanking-Analysis/2018-11-14-bedtools-Chromosome-Length.txt)
 # -b: length of flank to add
 # >: redirect output to a new file
 ! {bedtoolsDirectory}flankBed \
 -i {mRNAList} \
 -g 2018-11-14-Flanking-Analysis/2018-11-14-bedtools-Chromosome-Length.txt \
 -b 1000 \
 > 2018-11-14-Flanking-Analysis/2018-11-14-mRNA-1000bp-Flanks.bed

Here’s where I ran into some confusion. For the longest time, I thought my output was a repeat of my mRNA GFF, but with altered start and stop positions such that they incorporate the 1000 bp up- and downstream flanks. I posted this issue to find a way to manipulate files and isolate up- and downstream flanks in different BED files. When I used the code Sam suggested, I ended up with some funky results. Upon closer inspection of the two files…


…I FOUND THAT bedtools flank ALREADY SEPARATES THE FLANKS OUT FOR YOU. I swear I think I spent at least a few weeks operating under this wrong assumption. At least I’m now more proficient in some bash wizardry.

Although I did get just the flanks in a new document, I still wanted to separate up- and downstream flanks into separate files. This way, I could easily figure out where an overlapping region was with respect to the mRNA coding region in question. To do this, I used the following command:

 # If the row number is odd, redirect the row to the upstream flank file.
 # If the row number is even, redirect it to the downstream flank file.
 # The last argument is the path to the input file of flanking regions.
 awk '{ if (NR%2) print > "2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Upstream-Flanks.bed";
 else print > "2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Downstream-Flanks.bed" }' \
 2018-11-14-Flanking-Analysis/2018-11-14-mRNA-1000bp-Flanks.bed
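
The odd/even split can be tried on a toy file first. This is a minimal sketch with invented records that mimic two features with two flanks each; it assumes every feature contributes exactly two consecutive rows, lower-coordinate flank first. As far as I know, without the `-s` option flankBed defines flanks relative to reference coordinates rather than strand, so "upstream" here means the lower-coordinate side.

```shell
#!/usr/bin/env bash
# Toy demonstration of the odd/even split, run in a temporary directory.
# The records below are invented; they mimic two features with two flanks each.
workdir=$(mktemp -d)
printf '%s\n' \
  "chr1 1000 2000 rnaA" \
  "chr1 5000 6000 rnaA" \
  "chr1 7000 8000 rnaB" \
  "chr1 9500 10500 rnaB" \
  | tr ' ' '\t' > "${workdir}/flanks.bed"

# Odd rows go to the upstream file, even rows to the downstream file.
awk -v up="${workdir}/upstream.bed" -v down="${workdir}/downstream.bed" \
  '{ if (NR % 2) print > up; else print > down }' "${workdir}/flanks.bed"

n_up=$(wc -l < "${workdir}/upstream.bed" | tr -d ' ')
n_down=$(wc -l < "${workdir}/downstream.bed" | tr -d ' ')
echo "upstream=${n_up} downstream=${n_down}"
```

If the two counts ever differ on real data, some feature did not yield a pair of flanks and the row-parity assumption no longer holds.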

Once I separated up- and downstream flanks, I used intersectBed to find overlaps between the flanks and DML, DMR, and CG motifs. The CG motifs serve as a “background” for the other two. Upstream of mRNA coding regions, there were 95 overlaps for DML and 12 for DMR. Downstream of mRNA coding regions, there were 124 DML overlaps and 18 DMR overlaps.


I used the following code for closestBed to identify the closest non-overlapping genomic feature to an mRNA coding region:

 # {bedtoolsDirectory}closestBed: path to program
 # -io: ignore overlapping features
 # -a: path to mRNA GFF
 # -b: DMLList, DMRList, or CG motifs
 # -t all: in the case of a tie, report all matches
 # -D ref: report distance to A in an extra column. Use negative distances to report upstream features with respect to the reference genome. B features with a lower (start, stop) are upstream
 # >: redirect output to a new file
 ! {bedtoolsDirectory}closestBed \
 -io \
 -a {mRNAList} \
 -b {} \
 -t all \
 -D ref \
 > 2018-11-14-Flanking-Analysis/2018-11-14-mRNA-Closest-NoOverlap-DMLs.txt

While I got an output file each time I ran closestBed, I also got the following error each run:

 Error: Sorted input specified, but the file C_virginica-3.0_Gnomon_mRNA.gff3 has the following out of order record
 NC_035780.1 Gnomon mRNA 2413594 2416601 . - . ID=rna199;Parent=gene122;Dbxref=GeneID:111129373,Genbank:XM_022475729.1;Name=XM_022475729.1;gbkey=mRNA;gene=LOC111129373;model_evidence=Supporting evidence includes similarity to: 2 Proteins;product=mucin-2-like;transcript_id=XM_022475729.1

I still got an output file, so I kept going…? May have to change that.
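
One possible fix is to sort the inputs before running closestBed, since it expects records ordered by chromosome and then start position. The sketch below uses invented GFF-like records and file names; only the `sort` invocation is the point. For GFF the start coordinate is column 4 (for BED files it would be column 2), and both the `-a` and `-b` files would need the same ordering.

```shell
#!/usr/bin/env bash
# Toy GFF-like records (invented) that are out of order by start position.
workdir=$(mktemp -d)
printf 'NC_2\tGnomon\tmRNA\t500\t900\nNC_1\tGnomon\tmRNA\t2413594\t2416601\nNC_1\tGnomon\tmRNA\t100\t400\n' \
  > "${workdir}/unsorted.gff3"

# Sort by chromosome (column 1), then numerically by start (column 4 in GFF).
sort -t $'\t' -k1,1 -k4,4n "${workdir}/unsorted.gff3" > "${workdir}/sorted.gff3"
cat "${workdir}/sorted.gff3"
```

The sorted file could then be passed to closestBed in place of the original GFF.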

Going forward

  1. Update flanking analysis methods
  2. Update flanking analysis results
  3. Perform a gene enrichment for DML, DMR, and flanking analysis output


from the responsible grad student https://ift.tt/2QM94Ym

Shelly’s Notebook:

FFAR meeting

  • Can’t untangle effects from high/low pH or static/fluctuating, so include additional fluctuating low pH treatment with new geoduck
    • Hollie has a code for this
  • Why is the oxygen low on Nov 10? What is the saturation of water at 9-10C and salinity @ 29 PSU ?
  • continuous feeding:
    • bringing peristaltic pump
    • there are also programmable pumps

To-do on Friday Nov 16

  1. water chemistry
    • discrete measurements
      • pH tris curve
    • titrator
      • pH cal
      • CRM
      • water samples
  2. hemolymph samples from all animals
    • supplies needed:
      • liquid nitrogen carrier
      • at least 150 colored, LABELLED microfuge tubes (need to order more to replace these)
      • needles
      • syringes
      • centrifuge (can we set the freezer to 4C and put it in there?)
      • styrofoam containers for carrying samples on ice to centrifuge (ice packs in freezer in garage)
      • p1000 and tips for transferring lymph
  3. labels for animals (can do this while taking hemolymph)
    • maybe just mark with paint pen for now?
  4. automate data download
    • enable remote access to router
    • test data download from dry lab building
  5. calibrate temp probes
  6. Set up tanks 5 and 6 with gas
    • need to order 4 more pumps
    • do we have probes for these?
  7. Think about respirometry
    • Presens is expensive, and preferred to stay in the dry lab
    • carting water (for containers and water bath) is lots of labor
    • need a solution

Need to order:

Yaamini’s Notebook: MEPS Revisions Part 1

Revisiting NMDS for protein abundance data

According to Github it’s been exactly a year since I last touched my SRM data :0 But now I’m back and remaking NMDS plots! For the MEPS revision, reviewers have asked me to complete an analysis of the environmental data, and include the environmental data when looking at differential protein abundance. To do this, I’m going to conduct a constrained ordination analysis, in which I examine my protein abundance data, while constraining it for environmental variables in each site and habitat. Julian suggested I revisit my protein abundance data before I proceed with the constrained analysis, so here I am.

In this R Script (side note: can you believe I ever used plain R and not R Markdown?! I cannot wait to convert this script into an R Markdown file) I took my protein abundance data and first transformed it using a Hellinger transformation. The idea here is to control for any rare peptides or high instances of 0s by downweighting them in the analysis. Then, I calculated a Euclidean distance matrix from the transformed data. Finally, I ordinated the data with an NMDS. I performed either a one- or two-way ANOSIM to assess the significance of groupings.
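
To make the transformation concrete: a Hellinger transformation replaces each abundance with the square root of its share of the row total. The sketch below is not the R script mentioned above; it is a small awk illustration on invented numbers, assuming each row is one sample and each column one peptide. The function name `hellinger` is hypothetical.

```shell
#!/usr/bin/env bash
# Hellinger transformation sketch: each value becomes sqrt(value / row total).
# The two rows of abundances are invented for illustration.
hellinger() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    for (i = 1; i <= NF; i++) {
      value = (total > 0) ? sqrt($i / total) : 0
      printf "%.4f%s", value, (i < NF ? " " : "\n")
    }
  }'
}

transformed=$(printf '1 3\n0 4\n' | hellinger)
echo "${transformed}"
```

Zeros stay at zero and large values are compressed by the square root, which is why Euclidean distances on the transformed data are less dominated by a few abundant peptides.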

Site and Habitat

For this ANOSIM, the R statistic was 0.088, meaning that within-group and between-group similarities were nearly equal (p = 0.073). There is insufficient evidence to reject the null hypothesis that there are no differences among site and habitat groupings.

Habitat only


Figure 1. Ordination of protein abundance data looking solely at habitat type. Confidence ellipses are used to demonstrate overlaps or segregation of data.

Once again, within- and between-group similarities between bare and eelgrass habitats were similar (R = 0.044, p = 0.122).

Site only


Figures 2-3. Ordination of protein abundance data looking solely at site designation. Oyster sample IDs for each site are either bounded by a polygon (top), or confidence ellipses are used to demonstrate overlapping nature of data (bottom). Both the polygon and confidence ellipse for Willapa Bay are slightly segregated from the other four sites.

Within- and between-group similarities between all five sites were similar (R = 0.064, p = 0.065). However, when I repeated the ANOSIM with region designation (Puget Sound vs. Willapa Bay), there was mild evidence for regional differences (R = 0.226, p = 0.053).


I correlated the NMDS scores with the original data matrix to obtain loadings, then plotted only those that were significant at the 0.001 level. Turns out that’s a lot of peptides.


Figure 4. Peptide loadings significant at the 0.001 level.

I’ll need to revise the significance criteria to see if I can plot fewer loadings up there.

Going forward

  1. Complete ordination analysis of environmental data
  2. Meet with Julian to discuss unconstrained ordination results
  3. Plan for constrained ordination approach


from the responsible grad student https://ift.tt/2qNxJjZ

Grace’s Notebook: Submitted GSS poster, and my goals for Th and F

Today at 4:25pm, I submitted my poster to be printed for GSS, which is tomorrow… Steven helped a lot with picking out what information would be interesting to include. My goals for Thursday and Friday include both the Crab Project and the 2015 Oysterseed Project.

GSS Poster

Google slides link to poster: here


I made the pie chart in Excel really quick with this file: Blastquery-GOslim-sep.csv, which is the output file (with columns tab-delimited using R) from this Python notebook: 11052018-C_bairdi-blastn.ipynb.

To make a poster, you can use Google Slides and set the dimensions to 48 in w x 36 in h (File > Page Setup > Custom > adjust dimensions).

Then, you export the slide as a PDF, and send it to UW Creative Commons.

Goals for the rest of the week:

2015 Oysterseed:

  • mprophet model in Skyline (notes from Emma)

Crab Project:

  • Make extraction plan for other libraries (get input from Sam and Steven)
  • Extractions
  • R script for adding new Qubit data

from Grace’s Lab Notebook https://ift.tt/2FmrUEd

Shelly’s Notebook: Wed. Nov 14, 2018 Meeting with Kaitlyn about Oyster seed Proteomics paper

Proteomics manuscript next steps:

  1. compare ASCA proteins with uniquely clustered proteins (Kaitlyn will look into this)
    • what do their abundances look like?
      • make raw abundance line plots facetted by protein
    • does the GO enrichment change when ASCA and cluster proteins are combined?
  2. does hierarchical clustering change when day 0 is removed? (Kaitlyn will try this)
  3. begin identifying and locating data files that we need to deposit in public protein repository (i.e. ProteomeXchange and PeptideAtlas) (I will start this)
    • see what Emma’s paper (and supplementary materials) included: Raw data can be accessed via ProteomeXchange under identifier PXD004921. Raw data can be accessed in the PeptideAtlas under accession PASS00943 and PASS00942. Code used to perform enrichment analysis is available in a corresponding GitHub repository (https://github.com/yeastrc/compgo-geoduck-public) as is the underlying code for the end user web-interface (https://ift.tt/2PscZgH geoduck/pages/goAnalysisForm.jsp).
    • add file names and location, links to a googlesheet
    • confirm with Emma what files need to be made available
    • confirm with Rhonda where files are if we can’t find them
    • do we include a fasta file of the peptides ID’d by mass spec?
  4. add updated NMDS with only silo 3 and 9 to manuscript draft (Kaitlyn will do this)
  5. For supplementary table of all proteins detected:
    • re-do the BLAST of CHOYP protein sequences to 2018 UniProt database (for reproducibility and to use up-to-date info)
    • ask Steven about files related to SQLShare from April 26, 2017
      • need to regenerate ‘table_blastout_gigatonpep-uniprot’
      • need fasta file of the peptides ID’d by mass spec
      • need 2018 Uniprot DB – make a simplified supplementary table containing CHOYP IDs, UniProt Accessions, e.val, Protein names, Gene names
  6. Low priority: do we need to explain we did 2 x 4 treatments, or just say we did 1 x 2 treatments?
    • do we have survival data for other silos to compare to silo 2, 3, and 9?
    • Can we rule out silo 2 as an anomaly or should we include it?
  7. Low priority: re-focus the intro

Kaitlyn’s notebook: Silo 3 and 9 NMDS with color


I can change the colors, symbols, or legends as well.