Grace’s Notebook: April 26, 2018, RNA isolation and Data org

RNA Isolation

Today I processed these samples: img

As you can see, each one has a comment saying “contaminated”. At the end of the isolation process, when I was putting 50µL of 0.1% DEPC-treated water in each tube, I noticed by the fourth sample that the bottle of water had a cloudy substance floating in it. I must have accidentally discarded the supernatant from the final alcohol wash into the bottle containing the water instead of the designated waste bottle.

The samples from the first sampling date for each of the three crabs were likely contaminated by the water, so the remaining tubes are not usable either: they only cover the second and third sampling dates for the crabs that I selected.

Huge bummer, but now I know that I should always put the lid on the bottle, even if it is just for a second, so I don’t make that mistake again. When you’re doing a protocol like this that takes a long time and involves a lot of repetitive motion, you have to be super diligent.

I am going to be working from home tomorrow – organizing data sheets and editing and publishing the podcast – but since this weekend looks rainy, I’m going to come in to lab both days for a few hours at a time to get some more isolations done before the next crab meeting on Thursday!

Here is the set that I have chosen to replace the set that I messed up today: img

Luckily there are still a lot more samples to choose from!

Data organization

From Steven: img

This spits out a file that joins the info from the two spreadsheets based on their shared “tube_number” column. This was mostly practice; tomorrow and over the weekend I’ll work more on my actual spreadsheets in the GitHub crab project repo.
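Steven's snippet only survives here as an image. To show the same idea, here is a minimal Python sketch of an inner join on a shared key. It assumes both spreadsheets are exported as CSV files with a `tube_number` column. This is not Steven's code; the function names, column names, and values in the usage example are hypothetical.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text (e.g. an exported sheet) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left_rows, right_rows, key="tube_number"):
    """Inner-join two lists of row dicts on a shared key column.

    Only rows whose key appears in both tables are kept. If the right table
    repeats a key, the last occurrence wins; same-named columns take the
    right-hand value.
    """
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined


# Hypothetical example data, not real samples.
isolations = read_csv_text("tube_number,crab\n1,A\n2,B\n3,C\n")
qubit = read_csv_text("tube_number,rna_ng_per_ul\n1,25.1\n2,31.0\n")
for row in join_on_key(isolations, qubit):
    print(row)
```

Rows whose tube number is missing from either sheet are dropped, because that is what an inner join does. Keeping them would call for a left join instead.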

from Grace’s Lab Notebook

Yaamini’s Notebook: Gonad Methylation Analysis

Time to analyze the C. virginica data

Now that my two papers do not require my constant attention, I can start analyzing the MBDSeq data from the C. virginica project. The goal is to see if experimental ocean acidification drove differential gonad methylation in adult oysters. This lab notebook entry will outline my plan and link to important information I’ll need down the road.

Sam received the FASTQ files and saved them here. The sample IDs follow numerical order, and the libraries are non-directional.

Here’s how I will process these samples:

  1. FastQC: I previously used FastQC with some O. lurida transcriptome data, so I can follow the general steps in this Jupyter notebook.
  2. Bismark: The purpose of Bismark is to align my sample files to the C. virginica genome, then extract data from methylated regions. I will first test my Bismark pipeline on a subset of one data file. Once I know it works, I will run all my samples.
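Grabbing a subset for the test run can be as simple as copying the first N reads. Below is a minimal Python sketch that assumes standard 4-line FASTQ records, optionally gzipped. The function name and the 10,000-read default are arbitrary choices, not part of the plan above. The first N reads are not a random sample, but that is usually fine for checking that a pipeline runs end to end.

```python
import gzip
from itertools import islice


def subset_fastq(in_path, out_path, n_reads=10_000):
    """Copy the first n_reads 4-line records of a FASTQ file to out_path.

    Reads .gz input transparently; writes uncompressed output.
    Returns the number of records written.
    """
    opener = gzip.open if in_path.endswith(".gz") else open
    written = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        while written < n_reads:
            record = list(islice(src, 4))
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            if not record[0].startswith("@"):
                raise ValueError(f"record {written + 1} does not start with '@'")
            dst.writelines(record)
            written += 1
    return written
```

For paired files, taking the same N from each file keeps mates in sync, as long as both files list reads in the same order.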

Now that I know what I’m doing, I should probably do it…


from the responsible grad student

Yaamini’s Notebook: Averaging Total Alkalinity

I have water chemistry data!

Sam ran samples from the beginning, middle, and end of my Manchester adult pH exposure on the titrator. He then calculated total alkalinity for these samples. I saved the information in this .csv file. More information is available in his notebook post.

For my paper, Steven suggested I make a table with average total alkalinity values for each sampling period for both control and experimental treatments. I used this R script to average total alkalinity values and calculate standard errors, and exported the results to this .csv file. It’s something I could have just done in Excel, but this way I have functional for loops in case I need to run more water samples and perform these calculations on a larger dataset.
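The averaging above was done in R with for loops. For illustration, here is the same grouping logic sketched in Python. It assumes each record is a (sampling period, treatment, TA) tuple. It also assumes the standard error is the sample standard deviation divided by √n, which may differ from the script's definition. The record layout and the example values are hypothetical.

```python
import math
from collections import defaultdict


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")  # SE is undefined for a single value
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def summarize_ta(records):
    """Group (period, treatment, ta) records and return {(period, treatment): (mean, se)}."""
    groups = defaultdict(list)
    for period, treatment, ta in records:
        groups[(period, treatment)].append(ta)
    return {key: mean_and_se(vals) for key, vals in sorted(groups.items())}


# Hypothetical records, not real measurements.
example = [
    ("start", "control", 2300.0),
    ("start", "control", 2310.0),
    ("start", "exposed", 2400.0),
    ("start", "exposed", 2420.0),
]
for key, (m, se) in summarize_ta(example).items():
    print(key, round(m, 1), round(se, 1))
```

Because the grouping is keyed on (period, treatment), extra sampling periods or water samples only add records. The code itself does not change.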


from the responsible grad student

Sam’s Notebook: DNA Isolation & Quantification – Metagenomics Water Filters


After discussing the preliminary DNA isolation attempt with Steven & Emma, we decided to proceed with DNA isolations on the remaining 0.22μm filters.

Isolated DNA from the following five filters:


DNA was isolated with the DNeasy Blood & Tissue Kit (Qiagen), following a modified version of the Gram-Positive Bacteria protocol:

  • filters were unfolded and unceremoniously stuffed into 1.7mL snap cap tubes
  • did not perform enzymatic lysis step
  • filters were incubated with 400μL of Buffer AL and 50μL of Proteinase K (both are double the volumes listed in the kit and are necessary to fully coat the filter in a 1.7mL snap cap tube)
  • 56°C incubations were performed overnight
  • 400μL of 100% ethanol was added to each after the 56°C incubation
  • samples were eluted in 50μL of Buffer AE
  • all spins were performed at 20,000g

Samples were quantified with the Roberts Lab Qubit 3.0 and the Qubit 1x dsDNA HS Assay Kit.

Used 5μL of each sample for measurement (see Results for update).


Raw data (Google Sheet): 20180426_qubit_metagenomics_filters

Sample                      Concentration (ng/μL)   Initial volume (μL)   Yield (ng)
Filter #10 pH 7.1 5/15/17   0.296                   50                    14.65
Filter #7 pH 8.2 5/15/17    8.44                    50                    422
Filter #7 pH 8.2 5/19/17    2.52                    50                    126
Filter #10 pH 7.1 5/22/17   2.0                     50                    100
Filter #10 pH 7.1 5/26/17   11.9                    50                    595
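Yield in the table is concentration multiplied by the elution volume. A small check that recomputes it for each row can catch transcription slips. This is a Python sketch; the 1% tolerance is an arbitrary choice.

```python
import math


def qubit_yield_ng(conc_ng_per_ul, volume_ul):
    """Total yield in ng: concentration (ng/μL) x volume (μL)."""
    return conc_ng_per_ul * volume_ul


def flag_yield_mismatches(rows, rel_tol=0.01):
    """Return names of rows whose recorded yield differs from conc x volume by more than rel_tol."""
    return [
        name
        for name, conc, volume, recorded in rows
        if not math.isclose(qubit_yield_ng(conc, volume), recorded, rel_tol=rel_tol)
    ]


# Two rows copied from the table above: (sample, ng/μL, μL, recorded ng).
rows = [
    ("Filter #7 pH 8.2 5/15/17", 8.44, 50, 422),
    ("Filter #10 pH 7.1 5/26/17", 11.9, 50, 595),
]
print(flag_yield_mismatches(rows))  # []
```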

Samples were stored in Sam gDNA Box #2, positions G8 – H3 (FTR 213, #27, small -20°C freezer).

from Sam’s Notebook

Kaitlyn’s Notebook: New table with annotations and Kmeans run time…

I merged the UniProt-annotated table with each silo table containing the quantitative and qualitative tags I previously made: new table.

I want to note that this table includes proteins that have zero abundance in a given silo. I chose to include them for now since they are easily removable. I was thinking that some Revigo plots with the 0-abundance proteins might reveal some differences between the silos… There are ~1,000 to 1,500 proteins not expressed in each table (out of about 8,400 proteins).
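Since the zero-abundance proteins are easy to remove, here is a minimal Python sketch that splits a silo's rows into expressed and unexpressed proteins. The `abundance` column name and the example rows are hypothetical. The real tables may name and scale this column differently.

```python
def split_by_expression(rows, abundance_key="abundance"):
    """Split protein rows into (expressed, unexpressed) by abundance > 0."""
    expressed, unexpressed = [], []
    for row in rows:
        if float(row[abundance_key]) > 0:
            expressed.append(row)
        else:
            unexpressed.append(row)
    return expressed, unexpressed


# Hypothetical rows; the real silo tables have about 8,400 proteins each.
silo = [
    {"protein": "P1", "abundance": "12.5"},
    {"protein": "P2", "abundance": "0"},
    {"protein": "P3", "abundance": "3"},
]
expressed, unexpressed = split_by_expression(silo)
print(len(expressed), "expressed;", len(unexpressed), "not expressed")
```

Keeping both lists, rather than discarding the zeros, leaves the unexpressed set available for the Revigo comparison.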

I’m making a new scree plot, since my last scree plots weren’t made with the right code (nhclus.scree(x, max.k=#)). However, it hasn’t been successful yet because of how long it takes: I’ve let it run for over 4 hours with no results. I’m trying one more time and plan to let it run overnight, but if it takes that long it may not be feasible, since I need to do it 3 times and then run kmeans, which takes a few hours itself…
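The note above uses R's `nhclus.scree()`. The Python sketch below is not that function; it only illustrates the general k-means scree (elbow) idea. For each k from 1 to `max_k`, it clusters the data and records the total within-cluster sum of squares (WSS). To keep the sketch short, it uses plain Lloyd's k-means with a deterministic start (the first k points); real k-means runs normally use several random starts. The sketch also shows why scree plots get slow: a full clustering is run for every k, so the work grows roughly with `max_k` squared.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, max_iter=100):
    """Total within-cluster sum of squares after Lloyd's k-means.

    Deterministic start: the first k points are the initial centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dims)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        converged = new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def scree(points, max_k):
    """WSS for k = 1..max_k; plotting these against k gives the scree curve."""
    return [kmeans_wss(points, k) for k in range(1, max_k + 1)]


# Toy 2-D data with two obvious groups (hypothetical).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(scree(points, 4))  # [201.0, 1.0, 0.5, 0.0]
```

The sharp drop from k = 1 to k = 2, followed by a flat tail, is the "elbow" a scree plot is meant to reveal.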

Also, this came up at lab meeting: changing the max.print option in R.

Grace’s Notebook: April 25, 2018, Data Organization

Index – Match

Today Pam showed me again how to do Index Match in Excel. She originally showed me on her PC, and my format on a Mac is a little different, but not much. Here’s what you do on a Mac:

Have the two files open that you want to move and match data between. In the file and cell that you want the data MOVED TO, enter this formula:


=INDEX('[Qubit-consolidated-copy.xlsx]Sheet1'!$E:$E, MATCH($F:$F, '[Qubit-consolidated-copy.xlsx]Sheet1'!$P:$P, 0))

Breaking it down:

'[Qubit-consolidated-copy.xlsx]Sheet1'!$E:$E –> the information from the other spreadsheet that you want to transfer

MATCH($F:$F, –> the info in the current sheet you want to match with info in the other sheet (I did tube numbers)

'[Qubit-consolidated-copy.xlsx]Sheet1'!$P:$P, 0 –> the info from the other sheet that contains the things you want to match with the current sheet (tube numbers); the final 0 tells MATCH to look for an exact match

So what this all did for me was pull the Qubit RNA concentration data from the Qubit datasheet into my RNA Isolation spreadsheet, matching each concentration to the correct sample via the tube numbers.
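The formula's logic is easy to mirror outside Excel. With a final argument of 0, MATCH finds the position of the first exact match, and INDEX returns the value at that position in another column. In this minimal Python sketch, `None` stands in for Excel's #N/A, and the tube numbers and concentrations are made up.

```python
def index_match(lookup_value, lookup_column, return_column):
    """Mimic =INDEX(return_column, MATCH(lookup_value, lookup_column, 0)).

    Returns the return_column value at the position of the first exact match,
    or None where Excel would show #N/A.
    """
    for position, candidate in enumerate(lookup_column):
        if candidate == lookup_value:
            return return_column[position]
    return None


# Hypothetical Qubit sheet: tube numbers (column P) and concentrations (column E).
qubit_tubes = [101, 102, 103]
qubit_conc = [25.1, 0.0, 31.0]

# Hypothetical isolation sheet: tube numbers in column F.
my_tubes = [103, 101, 104]
filled = [index_match(t, qubit_tubes, qubit_conc) for t in my_tubes]
print(filled)  # [31.0, 25.1, None]
```

Because only the first match is returned, a tube that was run twice (like a duplicated Qubit reading) silently picks up the earlier value.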

Qubit note:

I did sample 469 on the Qubit TWICE, instead of doing 496… so I will Qubit 496 real quick, and then have all of my samples in the subset with Qubit results.

Then I will organize the spreadsheet and pick new samples to replace the ones that have Qubit results of “0” (Out of range).

Replacement samples for RNA isolation (10 crabs; 30 samples; 4 batches)

Pam will be able to watch how it’s done tomorrow.

from Grace’s Lab Notebook

Grace’s Notebook: April 24, 2018, RNA Isolation Update and Upcoming Goals

RNA Isolation

Today I did the last set of 9 (I mistakenly thought I had done this a few weeks ago, but I hadn’t): img

I am now trying to organize my data sheets so that I can have a more streamlined and readable format for my data. I have run the Qubit on all the samples I have isolated so far.

This is my OWL notebook folder with ALL of my Qubit data sheets: here. It’s awful and a mess.

Once I have things organized and a clear idea of which sets had Qubit readings of “Out of Range”, I’ll go back and pick new samples to replace those until I have a good subset that ALL have Qubit readings of at LEAST 20 ng/μL in a total volume of 50 μL.
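The selection rule above is easy to encode. This Python sketch assumes each sample's Qubit reading is either a number in ng/μL or a text value such as “Out of Range”; the sample IDs are hypothetical. At 20 ng/μL in 50 μL, a passing sample holds at least 1,000 ng of RNA.

```python
MIN_CONC_NG_PER_UL = 20.0   # minimum Qubit reading to keep a sample
ELUTION_VOLUME_UL = 50.0    # total volume per sample


def passes_qubit(reading):
    """True if a reading is numeric and at least MIN_CONC_NG_PER_UL."""
    try:
        conc = float(reading)
    except (TypeError, ValueError):
        return False  # "Out of Range", blank, or other non-numeric text
    return conc >= MIN_CONC_NG_PER_UL


def samples_to_replace(readings):
    """Return the sorted sample IDs whose reading fails the threshold."""
    return sorted(sample for sample, reading in readings.items() if not passes_qubit(reading))


# Hypothetical readings keyed by sample ID.
readings = {"S1": 25.3, "S2": "Out of Range", "S3": 0, "S4": 19.9, "S5": "20"}
print(samples_to_replace(readings))  # ['S2', 'S3', 'S4']
print(MIN_CONC_NG_PER_UL * ELUTION_VOLUME_UL)  # 1000.0
```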

This week:

  • Organize data sheets and pick new samples to replace the “Out of range” sample sets
  • Isolate RNA for the new replacement sets (show Pam how it works when she has the time)
  • Publish DecaPod S1Ep6 (Pam answering some of my and others’ questions about the project – already recorded, just have to edit)

from Grace’s Lab Notebook

Sam’s Notebook: Total Alkalinity Calculations – Yaamini’s Ocean Chemistry Samples


I ran a subset of Yaamini’s ocean chemistry samples on our T5 Excellence titrator (Mettler Toledo) at the beginning of April. The subset consisted of samples taken from the beginning, middle, and end of the experiment, to assess whether total alkalinity (TA) varied across the experiment. If there was little variation, there’d likely be no need to run all of the samples; if there were temporal differences, all samples should be processed.

Data analysis was performed in the following R Project:

The R Project above was initially copied from the R Project for our titrator on GitHub:

Three separate, data-file-specific versions of the TA_calculations.R script were created and run:

Salinity values (PSU) were collected from the following spreadsheet (Google Sheet) and manually entered in each of the R scripts:

Specifically, the TA calculations were performed using the seacarb library, with the at() function.

sample_names TA_values
H1 A 2/20/17 2390.88423
H2 A 2/20/17 2393.39207
T1 A 2/20/17 2367.78791
T2 A 2/20/17 2319.39360
T3 A 2/20/17 2309.88602
T4 A 2/20/17 2287.72108
T5 A 2/20/17 2336.14773
T6 A 2/20/17 2298.36327
H1 A 3/20/17 2870.73309
H2 A 3/20/17 2760.49972
T1 A 3/20/17 2930.29308
T2 A 3/20/17 2925.95472
T3 A 3/20/17 2896.55123
T4 A 3/20/17 2769.72514
T5 A 3/20/17 2743.33934
T6 A 3/20/17 2727.94064
H1 A 4/4/17 2770.20971
H2 A 4/4/17 2656.27437
T1 A 4/4/17 2801.77913
T2 A 4/4/17 2822.51611
T3 A 4/4/17 2800.87387
T4 A 4/4/17 2584.60933
T5 A 4/4/17 2645.37017
T6 A 4/4/17 2604.22677

Well, it certainly looks like there’s some variation across the experiment. It’s likely that all remaining samples will need to be processed. Will pass along data to Yaamini for her to evaluate.
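One quick way to put a number on that variation is to average the eight TA values from each sampling date in the table above. In this Python sketch, the values are copied from the table. Grouping by date alone, ignoring the H/T sample labels, is a simplification for illustration.

```python
from statistics import mean

# TA values from the table above, in row order (H1, H2, T1-T6) for each date.
TA_BY_DATE = {
    "2/20/17": [2390.88423, 2393.39207, 2367.78791, 2319.39360,
                2309.88602, 2287.72108, 2336.14773, 2298.36327],
    "3/20/17": [2870.73309, 2760.49972, 2930.29308, 2925.95472,
                2896.55123, 2769.72514, 2743.33934, 2727.94064],
    "4/4/17": [2770.20971, 2656.27437, 2801.77913, 2822.51611,
               2800.87387, 2584.60933, 2645.37017, 2604.22677],
}


def summarize_by_date(ta_by_date):
    """Mean, min, and max TA for each sampling date."""
    return {
        date: {"mean": mean(values), "min": min(values), "max": max(values)}
        for date, values in ta_by_date.items()
    }


summary = summarize_by_date(TA_BY_DATE)
date_means = [s["mean"] for s in summary.values()]
spread_of_means = max(date_means) - min(date_means)
for date, s in summary.items():
    print(f"{date}: mean {s['mean']:.1f} (range {s['min']:.1f} to {s['max']:.1f})")
print(f"Difference between highest and lowest date means: {spread_of_means:.1f}")
```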

from Sam’s Notebook

Sam’s Notebook: Assembly – SparseAssembler (k 111) on Geoduck Sequence Data


I’m continuing to look for the best kmer setting for SparseAssembler. The last attempt failed because the kmer size was too large (k 131, which happens to be above SparseAssembler’s maximum kmer size of 127), so I re-ran SparseAssembler with an arbitrarily selected kmer size below 131 (k 111).

The job was run on our Mox HPC node.


Output folder:

Slurm output file:

This failed with the following error message:

Error! K-mer size too large!

Well, this is disappointing. Not entirely sure why this is the case, as it’s below the max kmer setting for SparseAssembler. However, I’m not terribly surprised, as this happened previously (only using NovaSeq data) with a kmer setting of 117.

I’ve posted an issue on the kmergenie GitHub page; we’ll see what happens.

from Sam’s Notebook

Sam’s Notebook: Assembly – SparseAssembler (k 131) on Geoduck Sequence Data


After some runs with kmergenie, I’ve decided to try re-running SparseAssembler using a kmer setting of 131.

The job was run on our Mox HPC node.


Output folder:

Slurm output file:

This failed with the following error message:

Error! K-mer size too large!

Looking into this, it’s because the maximum kmer size for SparseAssembler is 127! Doh!

It’d be nice if the program checked that setting first, before processing all the data files…
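An up-front check like this can live in a wrapper script. Below is a hypothetical Python guard that rejects an out-of-range kmer size before any data are read. The 127 limit is the one noted in these posts, not a value checked against SparseAssembler's source. The guard would have caught the k 131 run, but not the k 111 failure, whose cause is still unclear.

```python
SPARSEASSEMBLER_MAX_K = 127  # limit as noted in this notebook; not verified against the program


def check_kmer(k, max_k=SPARSEASSEMBLER_MAX_K):
    """Raise ValueError for an invalid kmer size; otherwise return k unchanged."""
    if not isinstance(k, int) or k < 1:
        raise ValueError(f"kmer size must be a positive integer, got {k!r}")
    if k > max_k:
        raise ValueError(f"kmer size {k} exceeds the maximum of {max_k}; not starting the assembly")
    return k


# Hypothetical wrapper usage: validate before building the job command.
for k in (111, 131):
    try:
        check_kmer(k)
        print(f"k {k}: OK to submit")
    except ValueError as err:
        print(f"k {k}: {err}")
```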

A bit disappointing, but I’ll give this a go with a lower kmer setting and see how it goes.

from Sam’s Notebook