Sam’s Notebook: Assembly Stats – Geoduck Hi-C Final Assembly Comparison


We received the final geoduck genome assembly data from Phase Genomics, in which they updated the assembly by performing some manual curation:

Additional files with supplementary assembly data are in the following directory:

The raw sequencing data and two earlier assemblies were received on 20180421.

All assembly data (both old and new) from Phase Genomics was downloaded in full from the Google Drive link provided by them and stored here on Owl:

Ran Quast to compare all three assemblies provided (command run on Swoose):

    /home/sam/software/quast-4.5/ \
    -t 24 \
    --labels 20180403_pga,20180421_pga,20180810_geo_manual \
    "/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-03 11:05:41.596285/PGA_assembly.fasta" \
    "/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-21 18:09:04.514704/PGA_assembly.fasta" \
    /mnt/owl/Athaliana/20180822_phase_genomics_geoduck_Results/geoduck_manual/geoduck_manual_scaffolds.fasta

Grace’s Notebook: Speed Vac New Pool of 15 Samples

Today I speed vac-ed the new pool of 15 samples for a little over three hours. There is still more than 50ul in the tube, so I’ll put it back in the speed vac as soon as I am in tomorrow morning.

The pooled sample is 150ul (15 samples, each 10ul).
Sam and I went over to the Speed Vac in FSH. At 10:50am, it was started on medium heat.

At 2:10pm, there was still too much liquid. As soon as I am in tomorrow morning, I’ll put it back in the speed vac.

Put the sample in (cap open) a slot that has a little bit of paper towel stuffed in the bottom. Close lid. Turn on the machine. Turn the heat to medium. After it runs for a bit, turn on the vacuum (turn yellow valve marked “Vac” toward the machine). Then open up the vacuum by turning the blue dial on top of the machine to “open” (to the right).

To open it later, close the vacuum (blue dial) and then wait until it stops spinning.

from Grace’s Lab Notebook

Kaitlyn’s notebook: Oly larval measurements

Today I worked on measuring larvae using ImageJ on Emu since my laptop is broken. I organized my Excel file so that I could copy and paste the measurements, and I saved the ROI so that the lines can be looked at and edited at a later date. The file is currently living here, and I will continually update it. ROI files are not online yet since Emu does not have a .git connection to any of my repos.

Yaamini’s Notebook: Outreach with Botanical Gardens Summer Camp

Teaching students about cryptobenthic reef fish!

For today’s SEAS outreach event, Marta and I spent time with seven students in the UW Botanical Gardens Summer Camp. Marta has a lesson about cryptobenthic reef fishes — small, difficult to spot fish that live in different coral reef microhabitats. These fishes live in coral heads, coral rubble, and sandy habitats.


Figure 1. A cryptobenthic reef fish!

In this activity, students had the opportunity to step into Marta’s shoes (or rather, her dive boots) and simulate her scientific process. Students first had a chance to make observations about cryptobenthic reef fish and the microhabitats they live in. Because they were pretending to dive underwater, they weren’t able to talk to each other during their observations!


Figure 2. Marta’s coral reef habitat replica!

Once they made observations, they had a chance to talk about their findings and make hypotheses about fish abundance and species richness in the different microhabitats. Then they had a chance to count the fishes in the different microhabitats and create graphs to depict their findings.


Figure 3. Marta explaining her Ph.D. project and the day’s activity.

Overall, I think they learned a lot about coral reef habitats! The rest of their day included looking at the fish collection and the Seattle Aquarium. So fake fish –> dead fish –> live fish. Quite the progression!

The activity involves a lot of small group work and discussion. To improve the experience, I think it would help to have some larger group interactions. We discussed a lot of jargon by talking at the group, but it could have been more interesting to have a trivia game, or more back-and-forth, when they learned about those topics. It also took much longer to set up than we expected, so we need to remember that for the future!


from the responsible grad student

Kaitlyn’s notebook: Uploading histology photos

Today and yesterday I worked on removing photos from my phone that I took of Laura’s histology and renaming the files with their appropriate sample numbers. This required using the orientation from the cassette and matching it to the block. It looks like the cassettes were mirrored on the slides but stayed consistent throughout all slides.

Once all the samples were matched to the appropriate tissue on the slide, I began renaming files according to the histology data management plan. File labels are DATE-SAMPLE-MAGNIFICATION (e.g., 20180730-270-4X). I created a new folder on Owl for O. lurida in /web/hesperornis to upload the photos.
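The labeling convention can be sketched as a pair of small helpers in base R. The function names `make_label` and `is_valid_label` are hypothetical, and the pattern is an assumption based on the single example above (an 8-digit date, a sample ID that may itself contain a hyphen, and a magnification such as 4X); extra tags such as "hair" are not covered.

```r
# Hypothetical helpers illustrating the DATE-SAMPLE-MAGNIFICATION convention.
# Assumes an 8-digit date (YYYYMMDD), a sample ID, and a magnification like "4X".
make_label <- function(date, sample, magnification) {
  paste(date, sample, magnification, sep = "-")
}

# Check that a label follows the assumed pattern.
# The sample part may contain letters, digits, and hyphens (e.g. "355-353A").
is_valid_label <- function(label) {
  grepl("^[0-9]{8}-[0-9A-Za-z-]+-[0-9]+X$", label)
}

label <- make_label("20180730", "270", "4X")
print(label)
print(is_valid_label(label))
print(is_valid_label("270-4X"))  # no date, so this does not match
```

A check like this could be run over a folder listing before uploading, to catch files that were missed during renaming.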

I realized that the sample IDs were not going to be helpful if someone wanted to find the slide and re-image or analyze it. Therefore, I created a new folder containing the date and slide labels (e.g., 20180730-OLY82-107). This will allow someone to find the images based on the slides, or help someone know which slides contain these samples. I’m not sure if this will be a convention we will want to keep in the Roberts lab, but I thought that adding the slide number to the file label would make the name too long. One more benefit of the new folder is that the “maps” I created, which show which samples are on each slide, can be kept separate from other slide images/experiments. However, they could just as easily be parsed out based on the date.

There was one problem labeling the histology slides. Slide OLY 102 had 7 tissue samples on it; however, the cassette shows that only 6 tissues were placed. Samples 355 and 353 are indistinguishable, so they are labeled “..355-353A..”, “..355-353B..”, and “..355-353C..”. Additionally, some slides had some hair caught in them. I imaged OLY 96, sample 317 and added “hair” into the label. All images are viewable here

I also downloaded ImageJ onto Emu yesterday. I tried to add it to Ostrich, but the srlab user is not an admin, which is required to install it.

Grace’s Notebook: New Hemolymph sample and Qubit file; New RNA-seq short-term plan

Today Sam, Steven, and I met and decided to pool the 15 samples that had quantifiable RNA (Qubit RNA HS results) from the 40 samples that Sam processed using the Qiagen RNeasy Mini Plus Kit into one sample to be sequenced. They also both helped me learn more about data management and about creating a better file that contains the crab hemolymph sampling data and the Qubit results data.

Pool Plan

Today I combined these tubes from the 40 that Sam processed into one tube (-80 rack 2, Column 3, row 4).

Sam sent me the information on sending this off to get sequenced. I will read it this evening and make a plan to do it tomorrow.

Data organization and management

I am now going to keep all the data files (relatively untouched and unmanipulated) in project-crab/data, work in R on scripts kept in project-crab/scripts, and keep the new joined files in project-crab/analyses.

Today, Steven and Sam helped me clean up the hemolymph sampling data file so that it is more accurate and contains just the information we really care about. The “sample_table.csv” now contains many extra rows. Steven created new tube numbers for Day 26 samples, because those were taken in triplicate (except for the three crabs in the warm treatment, for which there were 6 samples taken). So now, there are tube numbers like so: 401_1, 401_2, 401_3, etc. This is more accurate, so we can keep better track of which samples have been touched and how many Day 26 samples we have left.

We also joined the improved “sample_table.csv” with qubit results, for which I have sample volume (ul) and total yield (ng) columns. Steven made sure to add the _1 to the Day 26 tube numbers. The new sample_qubit_table.csv has rows for every tube we have in the freezer, as well as the important information from the qubit results (Test_Date, tube_number, Original_sample_conc_ng.ul, sample_vol_ul, total_yield_ng).
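The join described above can be sketched in base R. The two data frames are made-up stand-ins for sample_table.csv and the Qubit results (the values are not real data), and `sample_day` is used only as an example column. Tube numbers are kept as character strings because IDs like 401_1 cannot be stored as numbers.

```r
# Hypothetical stand-ins for sample_table.csv and the Qubit results;
# all values below are invented for illustration.
sample_table <- data.frame(
  tube_number = c("401_1", "401_2", "401_3", "402_1"),
  sample_day = 26,
  stringsAsFactors = FALSE
)
qubit <- data.frame(
  tube_number = c("401_1", "402_1"),
  Original_sample_conc_ng.ul = c(12.5, 8.0),
  sample_vol_ul = c(10, 10),
  stringsAsFactors = FALSE
)

# Total yield (ng) = concentration (ng/ul) * volume (ul)
qubit$total_yield_ng <- qubit$Original_sample_conc_ng.ul * qubit$sample_vol_ul

# Keep a row for every tube; tubes without a Qubit reading get NA
sample_qubit <- merge(sample_table, qubit, by = "tube_number", all.x = TRUE)
print(sample_qubit)

# Tubes with no Qubit result yet
untested <- sample_qubit$tube_number[is.na(sample_qubit$total_yield_ng)]
print(untested)
```

Listing the tubes with no Qubit row is one way to start tracking down tubes that were processed but never quantified.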

Some tubes are not yet accounted for: I have processed them but haven’t run them on the Qubit, so they currently look as if they are still untouched in the freezer. I will go back through my notebook posts and find those tube numbers. I would be very surprised if this affected more than a dozen tubes (4 crabs, three samples per crab).

Also, it would be nice to add a column that would include the method of RNA isolation. I used RNAzol RT protocol, whereas the ones that Sam has processed were done with Qiagen RNeasy Mini Plus Kit. We are currently waiting on the lyophilizer to be fixed, after which we will try a new method using Tri-Reagent.

Next steps

– Send pool to be sequenced

– Once lyophilizer is fixed, give Sam the samples I’ve picked out for him to try the Tri-Reagent

– Work through the data sheets more to account for all tubes

from Grace’s Lab Notebook

Grace’s Notebook: Streamlining Processing of Adding Qubit data to Master file

Today I played around with GitHub and R Projects. I’ve figured out a way to make adding Qubit results to my hemolymph and Qubit results master file easier. It isn’t perfect, and there are likely some things I don’t know about that could make it easier still, but I’ve detailed the new method below. I also noticed today that the tube_number column in what I did yesterday got all messed up with the merge in R. Detailed below.

New R Project and Script

Sam mentioned yesterday that it was difficult to help me because my R script and project was traversing two GitHub repositories.

So, today I worked on figuring out a way to make it easier not only for myself, but for future collaborations and help.

I made a new R Project and R script in the project-crab GitHub repository.

The script is much more straightforward and easier to use because the csv’s are already in this repository.

    library(tidyverse)

    # Script for creating joined and merged data files from project-crab.
    # Ultimate goal is to create a true master file of all crab-related data files in this repo.

    # Join all hemolymph sampling data with Qubit RNA HS results data into one file.

    # read in all-crabs-hemo.csv
    hemo <- read.csv("data/20180522-all-crabs-hemo.csv")

    # read in Qubit-results.csv
    qub <- read.csv("data/20180817-Qubit-results.csv")

    # Convert "tube_number" in hemo.csv to numeric (not factor).
    # Go through as.character() first: as.numeric() on a factor returns level codes.
    hemo$tube_number <- as.numeric(as.character(hemo$tube_number))

    # test that as.numeric conversion worked
    is.numeric(hemo$tube_number)

    # Join files based on column "tube_number"
    master <- left_join(hemo, qub, by = "tube_number")

    # Look at columns in master file
    colnames(master)

    # Select columns that are useful and relevant
    master2 <- master %>%
      select(year.FRP, FRP, infection_status, Uniq_ID, sample_day, maturity,
             Holding_Tank, Low_Tag, High_Tag, sample, treatment_tank, where,
             tube_number, Comments, death, comments, CW, SC, CH, Test_Date,
             Original_sample_conc_ng.ul, total_sample_vol_ul, total_yield_ng) %>%
      arrange(FRP)

    # Write updated csv to project-crab data repo.
    write.csv(master2, "20180817-hemo-Qubit.csv")

    # Move new csv to data directory in repo project-crab and update

With this current process there are some things that I need to do outside of R. I add the new Qubit data to the 20180817-Qubit-results.csv file and add in the “total_sample_vol_ul” and the “total_yield_ng” values. Then, at the end of the script, I have to move the new csv to the data directory within the project-crab repository and update the
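The manual move at the end could probably be avoided by writing the file straight into the data directory. A minimal sketch in base R, assuming a made-up stand-in for `master2`; `tempdir()` is used only so the example runs anywhere, and in the repo the target would be the existing data folder.

```r
# Stand-in for master2 from the script above (invented values).
master2 <- data.frame(tube_number = c(1, 2), total_yield_ng = c(125, 80))

# Write straight into a data directory so no manual move is needed.
# tempdir() is only for this sketch; in the repo this would be "data".
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(data_dir, "20180817-hemo-Qubit.csv")

# row.names = FALSE keeps R's row numbers out of the file
write.csv(master2, out_file, row.names = FALSE)
print(file.exists(out_file))
```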

Merge in R messed up tube_number columns

This all lives in this repository: crab-sample-selection
R Script
R Project

```
# Try to merge columns that are repeats: "Test_Date.x" with "Test_Date.y" and
# "Original_sample_conc_ng.ul.x" with "Original_sample_conc_ng.ul.y".

qSam <- read.csv("20180813-results_from-Sam.csv")
hemoqub <- read.csv("../project-crab/data/20180813-all-hemo-with-Qubit.csv")

updated <- merge(x = hemoqub, y = qSam, by = c("tube_number", "Test_Date", "Original_sample_conc_ng.ul"), all.x = TRUE)

write.csv(updated, "20180816-all-hemo-with-Qubit.csv")
```

This resulted in the new file’s columns getting messed up: in 20180816-all-hemo-with-Qubit.csv, the tube numbers are now in a column marked “X”…
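One common source of a column named “X” is that `write.csv()` writes row names by default, and `read.csv()` then names that unlabeled first column “X”. Whether that is what happened here is an assumption, since the files themselves are not shown. A minimal sketch with invented values:

```r
# Invented two-row table standing in for the merged data.
df <- data.frame(
  tube_number = c(101, 102),
  Test_Date = c("20180813", "20180813"),
  stringsAsFactors = FALSE
)

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

write.csv(df, with_rownames)                        # default: row names are written
write.csv(df, without_rownames, row.names = FALSE)  # row names are omitted

cols_with <- names(read.csv(with_rownames))
cols_without <- names(read.csv(without_rownames))
print(cols_with)     # includes an extra "X" column
print(cols_without)  # only the original columns
```

If a file is written and re-read several times, each round trip without `row.names = FALSE` adds another such column.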

from Grace’s Lab Notebook

Grace’s Notebook: Crab RNA-Seq thoughts; Merging columns in R

Yesterday I shared some preliminary thoughts (in a GitHub issue and detailed below) on what we can do for our first Crab RNA-Seq library. Today I learned – through a whole mess of comments on a GitHub Issue because my stuff isn’t organized well – how to merge data in columns that are the same in R, so that you don’t have duplicates (example: “Test_Date.x” and “Test_Date.y”). Steven mentioned I should go over with him how to name files… which sounds good to me because I think I’m taking too much time out of my day renaming and replacing files as I update them. Sam also mentioned that my R Project is difficult to work with because it includes files that belong to two different repositories. So I’ll work on cleaning that up and making it easier for future collaboration and help.

Crab RNA-Seq plan thoughts

GitHub Issue #346

For a library using UW CORE, need 1000 ng RNA in 50ul sample.

Gave Sam 40 samples from Day 26, from each of the following groups:

  • uninfected, cold (11)
  • infected, cold (10)
  • uninfected, ambient (9)
  • infected, ambient (10)

(One uninfected cold sample was given in place of an uninfected ambient one, so there are not exactly 10 per treatment.)

Of those 40, 15 had quantifiable RNA (Qubit RNA HS):

  • uninfected, cold: 5/11
  • infected, cold: 5/10
  • uninfected, ambient: 3/9
  • infected, ambient: 2/10

Below is a screenshot of just those 15 samples along with their volume (he resuspended them in 10ul H2O) and their total sample yield (total volume * original sample concentration): 20180815-samps-from-sam

Total RNA (ng) for each treatment group: 20180815-total-rna-per-trtmnt

Some preliminary thoughts of mine so far:
From these 15, we could:

  • combine all 15 samples (1560.8 ng RNA)
  • combine the cold treatment together (1200.8 ng RNA)

Combining the infected cold and infected ambient is very close to 1000ng (954 ng RNA).

We can try isolating RNA using the Qiagen Kit from the second tubes taken in triplicate and then hopefully have more than enough
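The pooling options above can be checked against the 1000 ng requirement with a few lines of base R. The totals are the ones quoted in this post; the vector names are labels chosen for this sketch.

```r
# Totals quoted above (ng RNA); names are labels chosen for this sketch.
pool_totals_ng <- c(
  all_15 = 1560.8,
  cold_only = 1200.8,
  infected_only = 954
)
required_ng <- 1000  # UW CORE requirement stated above

# Which pooling options reach the required amount?
meets_requirement <- pool_totals_ng >= required_ng
print(meets_requirement)

# Shortfall (ng) for options below the requirement
shortfall_ng <- required_ng - pool_totals_ng[!meets_requirement]
print(shortfall_ng)

# Ambient-treatment total implied by the first two totals
ambient_ng <- pool_totals_ng[["all_15"]] - pool_totals_ng[["cold_only"]]
print(ambient_ng)
```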

Merging column data in R

GitHub Issue #349

The problem:

I added the Qubit RNA HS results from the 40 samples Sam processed. I did this by “left_join” by the tube_number with the 20180813-all-hemo-with-Qubit.csv.

However, it added repeat columns (I deleted the ones that aren’t important):
screen shot 2018-08-16 at 11 55 58 am

Is there a way in R to merge information in “Test_Date.x” with “Test_Date.y” and “Original_sample_conc_ng.ul.x” with “Original_sample_conc_ng.ul.y” so that there aren’t repeats? I’ll have to keep adding more Qubit information as isolations continue.

The solution:

merge(x = hemoqub, y = qSam, by = c("tube_number", "Test_Date", "Original.sample.conc."), all.x = TRUE)
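One thing to keep in mind is that merging on all three columns only pairs rows whose three values are identical. For comparison, here is a sketch of another way to remove the .x/.y repeats: join on tube_number alone, then fill each shared column from whichever side has a value. This is not the approach from the issue thread; the data frames are invented stand-ins for `hemoqub` and `qSam`.

```r
# Invented stand-ins: tube 1 already has a Qubit result, tubes 2 and 3 do not.
hemoqub <- data.frame(
  tube_number = c(1, 2, 3),
  Test_Date = c("20180801", NA, NA),
  Original_sample_conc_ng.ul = c(12.0, NA, NA),
  stringsAsFactors = FALSE
)
qSam <- data.frame(
  tube_number = c(2, 3),
  Test_Date = c("20180813", "20180813"),
  Original_sample_conc_ng.ul = c(30.5, 44.1),
  stringsAsFactors = FALSE
)

# Joining on tube_number alone produces .x/.y pairs for the shared columns
joined <- merge(hemoqub, qSam, by = "tube_number", all.x = TRUE)

# Take the existing value first, falling back to the new results when it is NA
for (col in c("Test_Date", "Original_sample_conc_ng.ul")) {
  x <- joined[[paste0(col, ".x")]]
  y <- joined[[paste0(col, ".y")]]
  joined[[col]] <- ifelse(is.na(x), y, x)
  joined[[paste0(col, ".x")]] <- NULL
  joined[[paste0(col, ".y")]] <- NULL
}
print(joined)
```

The same fill-in step could be repeated each time a new batch of Qubit results is added.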

Notes from Sam and Steven

From @kubu4
Heads up, this is a bit difficult to work with because your script is traversing multiple repos (e.g. ../project-crab/data ; I don't have a project-crab folder on my computer, so I can't get to those files using your script).

Personally, I think you should do all of this work in a single repo - that would eliminate these issues. However, if you want to work with two repos, you should make some changes:

Use the download.file() function to download data files from other repos.

Add information to repo files that explain that this repo depends on the other and provide link and/or cloning instructions for that other repo.
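A minimal sketch of the `download.file()` suggestion in base R. The helper name `fetch_data` is hypothetical, and the URL in the usage comment is a placeholder, not a real address.

```r
# Download a data file from another repo unless a local copy already exists,
# then read it. `fetch_data` is a hypothetical helper name.
fetch_data <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  read.csv(destfile)
}

# Usage (placeholder URL; replace <user> and the branch with the real ones):
# hemoqub <- fetch_data(
#   "https://raw.githubusercontent.com/<user>/project-crab/master/data/20180813-all-hemo-with-Qubit.csv",
#   "20180813-all-hemo-with-Qubit.csv"
# )
```

With this pattern a collaborator can run the script without cloning the second repo, as long as the file is publicly reachable.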

From @sr320
We should talk about file naming also- let’s go over this tomorrow

Things I need to do:

  • Learn better method of naming files and updating them
  • Make collaboration in R projects more user-friendly by either putting everything in one repo, or being very clear about downloading files from other repos
  • Make sure everything is ready for collaboration BEFORE I make a GitHub issue asking for help

from Grace’s Lab Notebook

Sam’s Notebook: DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets


Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

Genome used:

Kaitlyn’s notebook: take down day 2

Yesterday was a long day at Pt. Whitney as we settled the geoducks in new heath trays. In the morning, Sam and I finished tidying up wires and cleaned banjos. We also released excess pressure in the CO2 canisters. Sam showed me where all the HOBO sensors were and we offloaded data to the shuttle and placed the HOBO that used to be in the downweller into the heath trays where the geoducks now are (so there are 2 total in the heath trays).

Brent and Steven arrived and had a meeting with Kurt to finalize the next steps for the geoduck. After the meeting it was determined that they will be planted on the low tide Sept. 7 and, in the meantime, they will live in 4 heath trays that are divided with fiberglass to keep the treatment groups separate.

Matt cut fiberglass and glued in the dividers. We consolidated some of their geoduck so we could have all the geoduck in one stack. We used coarser sand so it wouldn’t go through the screens, which meant we had to sieve and rinse the new sand and evenly distribute it into the trays. The screens were between 350-450 um with the exception of one that was 800 um. 450ml of sand was added to each 1/2 tray. Next we screened the geoduck so that no old sand would enter the new setup. They screened well and were not too sticky.

Once we finished screening and loading the trays in their stack we finished draining the conicals and broke down all the pipes for storage in the dry lab. Here are the geoduck in their new home!