Sam’s Notebook: Library Decisions – C.bairdi RNAs for Library Pools

I isolated a bunch of tanner crab RNA on 20190430 and Steven asked me to try to figure out some options for RNAseq libray pools.

Using Grace’s table join of my Qubit data (hemo_qub-for-libs.csv), I created a pivot table to quickly get an idea of how things looked (note: I added a column with total RNA yield for each sample – 10uL * concentration):

C.bairdi pivot table screen cap

Assuming a minimum of 1000ng is required by the UW’s Northwest Genomics Center (that was the requirement last time we had RNAseq performed by them; have emailed to confirm it’s still required), here are the libraries I think we can/should make.

Possible libraries, by “Day”:

  • Day 9
    • Infected vs Uninfected
  • Day 12
    • Infected vs Uninfected
    • Ambient vs Cold vs Warm
  • Day 26
    • Infected vs Uninfected
    • Ambient vs Cold

The “Day 9” samples are named poorly, as these are actually the samples prior to temperature exposures. Thus, the “trtmnt_tank” column (i.e. temperature treatment) in the table above doesn’t really apply. With that in mind, Day 9 samples can only be compared as Infected/Uninfected.

So, by default, those should be two libraries.

Then, we have the Day 12 and Day 26 samples. We can either do Infected/Uninfected OR Ambient vs. Cold, but can’t do both, as there isn’t sufficient RNA…

Admittedly, I’m not sure which aspect of this project was given more weight in the grant proposal:

  • response to disease over time
  • response to temperature over time

The answer to that question will guide library pooling.

With that said, if we do the following, we’ll have a bit of both pathogen response and temperature response:

  • Day 9 Infected/Uninfected
  • Day 12 Ambient/Cold
  • Day 26 Ambient/Cold

from Sam’s Notebook

Shelly’s Notebook: Tues. May 14, 2019 Salmon-sea lice salinity x temp methylomes

Extracted gDNA from salmon skin samples

These samples are from Christian and are from animals that were infected with sea lice for the duration of the sea lice life cycle in two different temperatures (8C and 16C) and two different salinities (26psu and 32 psu). Sample info here

Kaitlyn helped with the extraction.

Followed the EZNA Mollusc DNA kit protocol

  • weighed out 40-50mg tissue and added to microfuge tube
    • remaining ethanol-fixed tissue is in a box on the bottom shelf of the 4C in 213
  • froze samples in liquid N2 and used disposable microtube pestle to pulverize sample
    • this was a little challenging as the pestle didn’t really seems to break up the tissue, just squish it. Also used pipette tip to smash tissue further.
    • samples sat on ice after being pulverized until we got through all the samples
  • incubated samlpes in 305uL ML1 buffer and 25uL Proteinase K solution for 1 hour at 60C
    • not all the sample was solubilized but a lot of it was
  • did the optional column equilibration step
  • eluted in 50uL elution buffer for 5 min at RT before spinning for 2 min at 10,000xg

DNA concentration

  • Qubit dsDNA BR kit for 20 samples + 2 standards
    • Ran Standard initially read 99.7ng/uL
      • read it again after every 5 samples and it consistently read 102ng/uL
    • sample concentrations are here: 20190408_SalmonSamples
      • lowest yield was 61ng/uL * 50uL = 3050ug so we have plenty of DNA
  • DNA are in a box in the -20C in 213 in the bottom drawer

Next steps

  • Digest 1ug of each sample with MspI and Taq-a-1 (o/n)
  • Zymo pico methyl library prep kit on salmon samples and sea lice samples

from shellytrigg

Yaamini’s Notebook: DML Analysis Part 33

Continued troubleshooting

When I asked about statistical tests for characterizing overlaps in this issue, Mac pointed out that some overlap categories for total CpGs were much lower than those for all 5x CpGs. Since total CpGs are the background for all 5x CpGs, Steven suggested I dig into this further. We went back and forth in the issue comments, but I decided to clean some stuff up and make it easier to follow.

Creating new genome feature tracks

In this Jupyter notebook, I downloaded the C. virginica genome file. I used grep to separate out individual tracks for genes, mRNA, and exons. Using subtractBed, I found areas of no overlap between mRNA and exons, and called those introns. I restricted subtractBed using -s to enforce strandedness. Although pre-made tracks are available here, it’s good practice to create these tracks myself so I know how they are made. I counted 38,929 genes, which is similar to the 39,493 genes listed in the NCBI Annotation Report. I counted 60,201 mRNA and 731,279 exons, same as the pre-made tracks. The major difference comes from how introns were counted. The track I generated has 688,167 entries, while the pre-made intron track has 319,262 entries. I’m not entirely sure why this is the case, but figured looking at the track in IGV could help. I used awk to generate BEDfiles for each genome feature track to visualze overlaps more easily in IGV. In this IGV file, I looked at CG motifs, as well as the gene, mRNA, exon, and intron tracks I generated.

By comparing the intron track I generated to the pre-made intron track, it looked like there were some regions that were being properly subtracted, and others that were not.

Screen Shot 2019-05-13 at 11 24 01 PM

Figure 1. Screenshot from IGV. From top to bottom, the gene, mRNA, exon, and intron tracks I generated are displayed. The bottom-most track was a pre-made intron designation.

At one point, I thought I had a brain blast and decided to use subtractBed with the gene track and exon track instead of the mRNA and exon tracks. Using this method, I counted 307,088 possible introns, but there were still discrepancies between the pre-made track and my track:

Screen Shot 2019-05-14 at 12 01 13 AM

Figure 2. Discrepancy between bottom three intron tracks suggests I don’t have the right code for intron track generation.

Reading the subtractBed help file, I couldn’t figure out how to solve this issue. I decided to proceed figuring out the code for the next steps while I wait for answers to my questions in this issue.

The last thing I did was look at how many CG motifs overlapped with the genome tracks I created — the source of my struggles. I first compared the number of lines in the CG motif file with a rough count of all CGs in the C. virginica genome. To get my rough estimate, I used the following code:

 fgrep -o -i CG {fullGenome} | wc -l # -o: print only matching phrases, -i: ignore case  

The 14,458,703 in the CG motif file is roughly similar to 14,277,725 in the full genome, so Sam and Steven felt comfortable with me using the CG motif file. I found 7,914,842 overlaps with genes, 7,507,167 with mRNA, and 2,330,546 with exons. For reference, there were 2,323,389 CG-exon overlaps from the pre-made track. Once I sort through my issue with the intron track, I can characterize overlaps.

Difference between gene and mRNA tracks

This screenshot shows some extra genic regions that are not included as mRNA, but are included as exons

Screen Shot 2019-05-13 at 11 59 40 PM

Figure 3. Area where gene track has extra information that is included in exons but not in the mRNA track.

My assumption is that coding sequences (CDS) are involved, but I would not expect to have to incorporate CDS into intron generation. This is something to continue investigating.

Going forward

  1. Finalize the intron track
  2. Understand the difference between the gene and mRNA tracks
  3. Characterize location differences between hypermethylated and hypomethylated DML
  4. Describe (somehow) genes with DML in them
  5. Figure out what’s going on with the gene background
  6. Figure out what’s going on with DMR
  7. Work through gene-level analysis
  8. Update paper repository
  9. Update methods and results
  10. Start writing the discussion

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student

Shelly’s Notebook: May 9, 2019 Geoduck juveniles Pt. Whitney

cleaning juvenile heath stack screens

  • Per Matt’s direction, we cleaned the juvenile heath stack screens to prevent algae build up, especially since we know they are getting overfed. This was pretty nerve racking as Matt has never done this on geoduck this young before. But we tried to be gentle, using Matt’s sprayer on the lowest setting possible
  • The screens from H2 (right side heath stack with precious juveniles) had some algae build-up: uc?export=view&id=1EWxYhlLWADNLRor0NAvVmPkyW3yVgUqm
  • I sprayed H2-T2 first


video of spraying here

  • I returned the H2-T2 tray back to the stack and refilled it. It looked much cleaner uc?export=view&id=1F_YB1yfsjgjSZRPFmHnhpm4nCBOws4pi
  • I took a total of 3 x 5 mL grabs from different areas in the tray with the turkey baster to check the animals after cleaning. I only saw one animal in the third grab, but it was pretty lively all things considered. uc?export=view&id=1Q6CdUZ-Yed71rKzpPodI16XYGX3bg7z6
    • here’s a video of the juvi in action
    • I thought this might have been another juvenile, but it might just be sand. It wasn’t moving at all. uc?export=view&id=1LVR3ywRFDL0XjKde2z-jHrU9iln3mBvF
    • there were lots of swimming plankton in the water, copapods, something that looked like zoea
  • after seeing signs of life, Kaitlyn and I continued to spray the rest of the screens and trays.
  • we re-ordered the trays in the stacks when we put them back. uc?export=view&id=1DD_Kz5Km8C9ZuEivhsF8zAx4eiQ8PZM4

Sam’s experiment: low pH vs ambient

water chem

  • we took water chem samples around 2:30pm before we started cleaning.
  • kaitlyn poisoned them and ran the rest of the samples from the previous weeks that we were backlogged on.
  • T7 had no flow at all; we found out by taking pieces apart one by one that there was a clog in the main header tank line.
    • we blew it out with the hose to clear it and got flow back

cleaning screens

  • Low pH screens seemed more fouled than ambient screens:
    H1T1 before cleaning uc?export=view&id=1y1-yT7ncyPczZ9UIT4nlw4Y55TSXo7TTH1T1 after cleaning uc?export=view&id=1MpBfk3tzke8CUtOnhtOu3WE_7ihLRD7YH1T3 before cleaning uc?export=view&id=1n-kUSxqTlLA_0MNNLjB0U0BGBHYHgFc1H1T3 after cleaning uc?export=view&id=1_7p8ooem5NLRkEotsMqQCNVVF44tn1WS
    • T5 and T7 were not as fouled indicating maybe more food is going to the top trays.
    • We did not reorder these trays.
    • lots of juveniles were visible in all trays
      • After spraying and lifting the H1T1 screen from the holding tray, we notice about 100 animals in the holding tray. And we saw the same for H1T3, but not for any other screens. I suspect there may be a gap somewhere between the mesh and the screen frame.

Optimizing algae feed

  • Algal food has frequently been unevenly distributed with the setup with a T connecter to split the algae into the two header tanks set up on 04/29/2019.
  • I wasn’t able to add an additional head onto the pump because we didn’t have long enough screws
    • would need two 10” long 4mm or 1/8” screws, rods/wingnuts, or screw extenders
  • I made 2 6” pieces of the skinnier masterflex tubing and ran them both in the same pump head
    • This worked for even delievery over the few hours we monitored it. Matt will keep an eye on it.
    • Because this isn’t the best solution, I’m exchanging the pump head I previously ordered for a two channel pumphead that can accomodate two tubes and even smaller tubing. They are sending free samples of even skinnier tubing: 06402-13 and 14 that will be able to limit the flow rate even further so we aren’t overfeeding the juveniles.
  • Kaitlyn and I ran 2 lines of ~50 feet of 1/4” tubing out to the algae tank and into the pump.
  • We re-fixed the tubing above the header tanks

from shellytrigg

Grace’s Notebook: Closer to having some pooled libraries!

Sam extracted RNA using the Quick DNA/RNA Microprep Plus Kit (ZymoResearch) (post). I don’t think there is currently enough RNA in the libraries to reach the requiremets for NWGC… details in post.

Today I quickly joined his qubit result file with the hemolymph sampling data (hemo_table.csv) in R.

The resulting file: hemo_qub-for-libs.csv

The goal is to have 12 libraries accounting for each infection status and temperature treatment. Each pooled library needs to have at least 1000 ng of RNA in a sample of ~50ul. (GitHub Issue #184)

Pooled libraries to create:

Library Number Sample Day Infection status Temp trtmnt
1 9 0 NA
2 9 1 NA
3 12 0 Amb
4 12 1 Amb
5 12 0 Cold
6 12 1 Cold
7 12 0 Warm
8 12 1 Warm
9 26 0 Amb
10 26 1 Amb
11 26 0 Cold
12 26 1 Cold

*Note: the temperature treatments didn’t start until after the Day 9 sampling day.

As of now, I don’t think the pooled samples would have enough RNA to meet the requirements… for example, the samples that fall under Library 1 add up to 79.9 ng/ul.

I may need to check this again when I’m not sick.

from Grace’s Lab Notebook

Shelly’s Notebook: May 8, 2019 Geoduck broodstock hemolymph DMR calling

Running methylkit:

I had planned to run MethylKit on Emu’s RStudio web interface to find DMRs in the WGBS hemolymph. In reviewing a previous MethylKit R script, I realized I need sorted bam files, which my initial bismark mox script didn’t include. So I submitted a new mox job to sort my bams. Once the job finishes (after Mox maintenance), I will copy the sorted bams over to gannet and then on to Emu so I can run the Rstudio web interface (a little painful, but getting there!).

Running methylpy

So I thought it might be easier to install methylpy on the server and do DMR calling with this python program instead of with MethylKit in R. Turns out installing this on Mox was too challenging as I kept getting permission denied messages. So I tried installing it on Ostrich which turned out to also be very challenging. Finally, I got it installed and functioning after correctly installing GSL. I am currently running methylpy DMRfind on Ostrich.

from shellytrigg

Kaitlyn’s notebook: Organizing geoduck methods and juvenile survival

I’m working on writing out the methods done with the broodstock (collection, water chemistry, experiment duration, and timing of sampling). Perhaps a timeline of the experiment would be better represented in a visual rather than a document (or at least condensing the current timelines/logs to one document)? There is a new, separate document now to keep track of the tasks needed to work toward sample analysis/manuscript development for the geoduck samples.

Additionally, I’m attempting to write up results on survival and histology. However, I think it would be best to get quantitative measurements for histology (e.g., oocyte size and % connective tissue). When I trained Eileen on water chemistry, we discussed the possibility of me using the Padilla-Gamiño microscope. I followed up with her and am waiting to hear back. Meanwhile, color deconvultion via ImageJ plugins or Qupath has allowed me to calculate the correct stain vectors to separate out the hematoxylin stained vs. eosin, but I don’t know how to or if I can convert that to percentage/area/pixels to compare those differences. I am continuing to look through literature to best solve this issue and give a more accurate measurement for any developmental differences between treatments shown in histology samples.

I’m also working on measurements of juvenile geoduck that were reared in the hatchery (as opposed to those outplanted in Sequim Bay) based on the images we took. Grace and I took counted survivors, removed dead geoduck and all sand, and attempted to remove any ciliate/algal growth on the juveniles. We hoped that sand removal would slow or prevent detritus buildup, however it looked like nearly all geoduck were affected when we visited the hatchery on May 2.

  • Some questions for Brent:Is it possible that the ciliates are setting on just the geoduck now rather than being dispersed between setting on the geoduck and the sand? In which case, should we add the sand back in?

Here is the survival data from April 26, 2019:

Order Tank Parental Exposure Alive Dead found
1 H0_T E*A 7 8
2 H0_B E*E 9 3
3 H1_T E*A 15 7
4 H1_B A*A 16 9
5 H2_T E*A 14 6
6 H2_B A*A 12 8
7 H3_T E*A 14 3
8 H3_B E*E 13 1

We also rearranged the heath trays because it appeared that there was uneven buildup on the bottom trays as opposed to the top trays:

This slideshow requires JavaScript.

The tides are at their lowest point right now so I reached out to Kelly to check on the outplanted geoduck as well. Shelly and I will go out to Pt. Whitney at the end of this week to setup a new pump so we can reduce feed levels for the setters and finish titrating poisoned water samples. Water chemistry may be taken again either that day, or early next week, and I am going to help Shelly extract DNA from sea lice infected salmon skin next week.

Shelly’s Notebook: Apr 15 – May 3, 2019 Geoduck Broodstock hemolymph WGBS

Bismark alignments

I ran this job on mox to align reads to Pgenerosa_v071.

  • Raw fastqs live here and FastQC generated files are here
  • Sam ran FastQC here
  • Sam mentioned Genewiz recommends “Trimming 10 bases from the beginning of both R1 and R2 following adapter trimming eliminates the majority of Adaptase tails.”


  • Instead of running trimmomatic or following Genewiz’s recommendations, I tried just doing a crude trim removing the first 6 characters (which seemed to be lower quality)from each read before running the alignments. see mox job for code

running bismark aligment and methylation extractor

running cytosine coverage

  • because I didn’t include the genome-wide cytosine coverage option when I initially ran bismark, I ran coverage2cytosine after with this script on mox

Calculating coverage

I adapted Sam’s new coverage script to work on my data in this jupyter notebook: 20190503_Pgnr_comparison.ipynb.

It worked! And generated this plot:

Coverage of genome-wide cytosines is surprising good. Tank 2 got about 66M reads and Tank 3 got about 50M reads. The genome is about 2.4gb. If the genome was evenly fragmented at 150bp and got even coverage, you would need 16M reads to cover the genome 1x. So it’s possible the the depth these libraries got were able to to cover most cytosines.

Next steps:

Run MethylKit

  • As a first pass, run untrimmed alignments through methylkit to see what’s different

Trim properly and rerun alignments and coverage analysis

I should probably try the alignments again with the recommended trimming and see what happens

Check unmapped reads

I don’t think that Bismark outputs unmapped reads unless you specify it in your initial code:

But when I run alignments again, I’ll specify unmapped reads be output .

from shellytrigg

Shelly’s Notebook: May 2, 2019 Geoduck larvae at Pt. Whitney

Check on heath stack setup

  • food still dicey, sometimes the food is unevenly distributed.
    • ordered a new pump head today and they are sending some sample tubing to see if it will solve the problem (see below for more details).
    • animals looked good but some algal floculation

Algal counts

  • Matt recommended setters should get a concentration of 100K/mL, but not more than that.
  • Inflow algal counts direct from the feeding tube were:
    • H2 (ambient): 4.73 x 10^6 cells/mL
    • H1 (ambient/low pH): 4.8 x 10^6 cells/mL
  • Outflow algal counts direct from the feeding tube were:
    • H2 (ambient): 5.3 x 10^5 cells/mL
    • H1 (ambient): 3.8 x 10^5 cells/mL
  • Flow rate of the feeding tube is 10mL/13 seconds = 0.77mL/second = 46.2 mL/minute = 66,528 mL / day = 35,260 M cells/day
  • FAO manual recommends 0.4mg dry algae per mg spat per week – 10^6 Iso cells = 0.02mg, so 20M cells/week or 2.857M cells/day
  • Setter weight ~= 0.07mm^3 = 0.07mg? – H2 has ~ 52000 animals = 3.640 mg spat – H1 ambient has ~ 60,000 animals = 4.2mg spat


  • Ambient should be getting ~ 22.4M cells/day (2.857M cells/day * 7.84 mg total spat)
  • They are getting about 1.5X as much


Water chem

Kaitlyn and Eileen ran TA on H1 water samples. TA data here and discrete chem data here

  • Still need to run poisoned samples from H1T5-T8 from 4/19 and from 5/2. We can do this next week.

from shellytrigg

Roberto’s Noreboock: FastQC of libraries with additional WGBS.

The results seems like before, with low Qval at the last positions.
Example of library 0501_R1_001_fastq.gz
/home/roberto/Pictures/Screenshot from 2019-05-03 14-31-08.png

Using this data, Trimming is running with conditions:
java -jar /home/Apps/trimmomatic/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 /home/Data_MeOH/Seq_data/0501_R1_001.fastq.gz /home/Data_MeOH/Seq_data/0501_R2_001.fastq.gz 0501_R1_001_paired.fastq.gz 0501_R1_001_unpaired.fastq.gz 0501_R2_001_paired.fastq.gz 0501_R2_001_unpaired.fastq.gz ILLUMINACLIP:TrueSeq3-PE.fa:2:30:10 LEADING:30 TRAILING:30 SLIDINGWINDOW:4:20

I did not consider the min-length option. Last time, considering MINLENGTH:100 was not good. It reduces from 58 million of reads to 35 million…

fastQC before trimming (this library has not the Additional 25% of Seqs)
/home/roberto/Pictures/Screenshot from 2019-05-03 14-37-56.png

After trimming, the result was:
/home/roberto/Pictures/Screenshot from 2019-05-03 14-40-04.png

For this reason, the MINLENGTH is not used.