Grace’s Notebook: Crab RNA-Seq thoughts; Merging columns in R

Yesterday I shared some preliminary thoughts (in a GitHub issue, detailed below) on what we can do for our first Crab RNA-Seq library. Today I learned – through a whole mess of comments on a GitHub Issue, because my stuff isn’t organized well – how to merge columns that hold the same data in R, so that you don’t end up with duplicates (example: “Test_Date.x” and “Test_Date.y”). Steven mentioned I should go over file naming with him, which sounds good to me because I think I’m spending too much of my day renaming and replacing files as I update them. Sam also mentioned that my R Project is difficult to work with because it includes files that belong to two different repositories. So I’ll work on cleaning that up to make future collaboration and help easier.

Crab RNA-Seq plan thoughts

GitHub Issue #346

For a library using UW CORE, we need 1000 ng of RNA in a 50 ul sample.

Gave Sam 40 samples from Day 26, from the following groups:

  • uninfected, cold (11)
  • infected, cold (10)
  • uninfected, ambient (9)
  • infected, ambient (10)

(I gave an uninfected cold sample instead of an uninfected ambient one, so it is not 10 per treatment.)

Of those 40, 15 had quantifiable RNA (Qubit RNA HS):

  • uninfected, cold: 5/11
  • infected, cold: 5/10
  • uninfected, ambient: 3/9
  • infected, ambient: 2/10

Below is a screenshot of just those 15 samples along with their volume (he resuspended them in 10 ul H2O) and their total sample yield (total volume × original sample concentration): 20180815-samps-from-sam

Total RNA (ng) for each treatment group: 20180815-total-rna-per-trtmnt

Some preliminary thoughts of mine so far:
From these 15, we could:

  • combine all 15 samples (1560.8 ng RNA)
  • combine the cold treatment together (1200.8 ng RNA)

Combining the infected cold and infected ambient samples comes very close to 1000 ng (954 ng RNA).
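To make the comparison against the UW CORE requirement explicit, here is a small R sketch that uses only the totals quoted above. The variable and pool names are made up for illustration and are not from the actual project scripts.

```r
# RNA required by UW CORE for one library (from the notes above)
required_ng <- 1000
required_vol_ul <- 50

# Minimum concentration the pooled sample has to reach (ng/ul)
required_conc <- required_ng / required_vol_ul

# Pooled totals quoted above (ng RNA); names are illustrative only
pools <- c(all_15_samples = 1560.8,
           cold_only = 1200.8,
           infected_cold_plus_ambient = 954)

# For each pooling option: does it reach 1000 ng, and if not, how far off is it?
summary_tbl <- data.frame(
  pool = names(pools),
  total_ng = unname(pools),
  meets_requirement = unname(pools >= required_ng),
  shortfall_ng = unname(pmax(required_ng - pools, 0))
)
print(summary_tbl)
```

With these numbers, only the infected pool falls short, by 46 ng.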

We can also try isolating RNA with the Qiagen Kit from the second tubes (samples were taken in triplicate), which will hopefully give us more than enough.

Merging column data in R

GitHub Issue #349

The problem:

I added the Qubit RNA HS results from the 40 samples Sam processed. I did this with a “left_join” on tube_number against 20180813-all-hemo-with-Qubit.csv.

However, it added repeat columns (I deleted the ones that aren’t important):
[screenshot: 2018-08-16 at 11:55:58 am]

Is there a way in R to merge the information in “Test_Date.x” with “Test_Date.y”, and “Original_sample_conc_ng.ul.x” with “Original_sample_conc_ng.ul.y”, so that there aren’t repeats? I’ll have to keep adding more Qubit information as isolations continue.

The solution:

merge(x = hemoqub, y = qSam, by = c("tube_number", "Test_Date", "Original.sample.conc."), all.x = TRUE)
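A related case that may come up as more Qubit results are added: if the shared column is filled in one table but empty (NA) in the other, the rows will not match on that column. Below is a minimal base-R sketch of one alternative, joining on tube_number only and then folding the “.x” and “.y” columns back into one. The toy data frames are invented stand-ins for hemoqub and qSam, not the real data.

```r
# Hypothetical toy data standing in for hemoqub and qSam
hemoqub <- data.frame(
  tube_number = c(1, 2, 3),
  Test_Date = c("2018-08-01", NA, NA),
  stringsAsFactors = FALSE
)
qSam <- data.frame(
  tube_number = c(2, 3),
  Test_Date = c("2018-08-15", "2018-08-15"),
  stringsAsFactors = FALSE
)

# Joining on tube_number alone produces Test_Date.x and Test_Date.y
joined <- merge(x = hemoqub, y = qSam, by = "tube_number", all.x = TRUE)

# Keep the .x value where it exists, otherwise take the .y value
joined$Test_Date <- ifelse(is.na(joined$Test_Date.x),
                           joined$Test_Date.y,
                           joined$Test_Date.x)

# Drop the suffixed columns so only one Test_Date column remains
joined$Test_Date.x <- NULL
joined$Test_Date.y <- NULL

print(joined)
```

The same pattern would apply to the concentration column. Which table should win when both have a value is a choice; here the left-hand table (.x) takes priority.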

Notes from Sam and Steven

From @kubu4
Heads up, this is a bit difficult to work with because your script is traversing multiple repos (e.g. ../project-crab/data ; I don't have a project-crab folder on my computer, so I can't get to those files using your script).

Personally, I think you should do all of this work in a single repo - that would eliminate these issues. However, if you want to work with two repos, you should make some changes:

Use the download.file() function to download data files from other repos.

Add information to repo files that explains that this repo depends on the other, and provide a link and/or cloning instructions for that other repo.
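As a rough sketch of the download.file() suggestion: the helper names, the “some-user” account, the branch, and the file path below are all placeholders (assumptions for illustration), so they would need to be swapped for the real repo details.

```r
# Build the raw-file URL for a file in another GitHub repo.
# "user", "branch" and "path" values used below are placeholders.
raw_github_url <- function(user, repo, path, branch = "master") {
  paste("https://raw.githubusercontent.com", user, repo, branch, path, sep = "/")
}

# Download the file only if a local copy is not already present,
# so collaborators do not need the other repo cloned next to this one.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

# Hypothetical example: a CSV in the project-crab repo
url <- raw_github_url("some-user", "project-crab", "data/example.csv")

# Uncomment once the URL points at a real file:
# hemo <- read.csv(fetch_if_missing(url, "data/example.csv"))
```

Putting the download at the top of the script means anyone running it gets the same input files without needing a ../project-crab folder on their computer.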

From @sr320
We should talk about file naming also- let’s go over this tomorrow

Things I need to do:

  • Learn better method of naming files and updating them
  • Make collaboration in R projects more user-friendly by either putting everything in one repo, or being very clear about downloading files from other repos
  • Make sure everything is ready for collaboration BEFORE I make a GitHub issue asking for help

from Grace’s Lab Notebook

Sam’s Notebook: DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets


Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

Genome used:

Kaitlyn’s notebook: take down day 2

Yesterday was a long day at Pt. Whitney as we settled the geoducks in new heath trays. In the morning, Sam and I finished tidying up wires and cleaned banjos. We also released excess pressure in the CO2 canisters. Sam showed me where all the HOBO sensors were and we offloaded data to the shuttle and placed the HOBO that used to be in the downweller into the heath trays where the geoducks now are (so there are 2 total in the heath trays).

Brent and Steven arrived and had a meeting with Kurt to finalize the next steps for the geoduck. After the meeting it was determined that they will be planted on the low tide Sept. 7 and, in the meantime, they will live in 4 heath trays that are divided with fiberglass to keep the treatment groups separate.

Matt cut fiberglass and glued in the dividers. We consolidated some of their geoduck so we could have all the geoduck in one stack. We used coarser sand so it wouldn’t go through the screens, which meant we had to sieve and rinse the new sand and distribute it evenly into the trays. The screens were between 350 and 450 um, with the exception of one that was 800 um. 450 ml of sand was added to each half tray. Next we screened the geoduck so that no old sand would enter the new setup. They screened well and were not too sticky.

Once we finished screening and loading the trays into their stack, we finished draining the conicals and broke down all the pipes for storage in the dry lab. Here are the geoduck in their new home!