Yaamini’s Notebook: Gonad Methylation Analysis Part 18

Preparing for discoveries

Before I can understand where differentially methylated loci (DML) are located within the C. virginica genome, I first need to identify DMLs! I used this R script to identify DMLs that were at least 50% different between control and treatment (high pCO2) samples (.csv here). Looking at the PCA was interesting, as the clustering was not as tight as I expected.


Figure 1. Principal Components Analysis of methylated regions in samples.

O2-5 (ambient conditions) are more closely clustered than any of the oysters from treatment conditions. It’s possible that there are organismal differences in methylation responses, or that we just didn’t have a large enough sample size to deal with this variation.

In the last part of the script, I saved my DML information as a BED file. I mimicked Steven’s code to do this. BED files have chromosome ID, start, and stop positions that I can use to pare down information about my DMLs. I can then compare the DML location with other important genomic features using bedtools. The intersect tool seems especially useful. I know we covered bedtools in the 2016 Bioinformatics class, so I’ll review those notes!
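As a reminder to myself of what an intersect operation actually computes, here is a minimal pure-Python sketch. It is not how bedtools is implemented, and the chromosome names and coordinates are hypothetical. It only assumes the BED convention of 0-based, half-open intervals.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(dmls: List[Interval], features: List[Interval]) -> List[Interval]:
    """Return each DML that overlaps at least one feature, in input order."""
    return [d for d in dmls if any(overlaps(d, f) for f in features)]


# Hypothetical coordinates, for illustration only.
dmls = [
    Interval("chr1", 100, 101),
    Interval("chr1", 500, 501),
    Interval("chr2", 100, 101),
]
genes = [Interval("chr1", 50, 200), Interval("chr2", 300, 400)]

hits = intersect(dmls, genes)
for hit in hits:
    # Print in BED-like tab-separated form: chrom, start, end.
    print(f"{hit.chrom}\t{hit.start}\t{hit.end}")
```

The nested loop is quadratic, which is fine for a toy example but is exactly why a dedicated tool is the better choice for genome-scale files.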


from the responsible grad student https://ift.tt/2Jjw3to

Yaamini’s Notebook: Gonad Methylation Analysis Part 17

Note to self: Always double check things

  • I forgot to change the code in my subset and full sample notebooks so that bismark_methylation_extractor ran on the files I produced instead of those in the dignore folder. I switched the code and everything still works!
  • I thought I double checked which bismark_methylation_extractor outputs needed to be in the .gitignore, but I left out several *deduplicated.txt files that were well over 100 MB. My mistake took me three days and one GitHub issue to figure out. Whoops. Now I know how to use GitHub from the command line, add things to my .gitignore, and effectively undo commits to make GitHub Desktop happy.

I finished up the methylation extraction, HTML report, and summary report steps! I then started methylKit on the full samples to ensure reproducibility. When I’m finished, I’ll create BED files and start to understand where differentially methylated loci are located and what the gene functions are.


from the responsible grad student https://ift.tt/2L8F6e1

Sam’s Notebook: Transposable Element Mapping – Crassostrea virginica NCBI Genome Assembly using RepeatMasker 4.07


Genome used: NCBI GCA_002022765.4_C_virginica-3.0

I ran RepeatMasker (v4.07) with RepBase-20170127 and RMBlast 2.6.0 with species set to Crassostrea virginica.

All commands were documented in a Jupyter Notebook (GitHub):

Sam’s Notebook: Transposable Element Mapping – Olympia Oyster Genome Assembly using RepeatMasker 4.07


Steven wanted transposable elements (TEs) in the Olympia oyster genome identified.

After some minor struggles, I was able to get RepeatMasker installed on both of our Apple Xserves (emu & roadrunner; running Ubuntu 16.04LTS).

Genome used: pbjelly_sjw_01

I ran RepeatMasker (v4.07) with RepBase-20170127 and RMBlast 2.6.0 four times:

  1. Default settings (i.e. no species selected – will use human repeats).
  2. Species = Crassostrea gigas (Pacific oyster)
  3. Species = Crassostrea virginica (Eastern oyster)
  4. Species = Ostrea lurida (Olympia oyster)

The idea was to get a sense of how the analyses would differ with species specifications. However, it’s likely that the only species setting that will make any difference will be Run #2 (Crassostrea gigas).

The reason I say this is that RepeatMasker has a built-in tool to query which species are available in the RepBase database (e.g.):

 RepeatMasker-4.0.7/util/queryRepeatDatabase.pl -species "crassostrea virginica" -stat 

Here’s a very brief overview of what that yields:

  • Crassostrea gigas: 792 Crassostrea gigas specific repeats
  • Crassostrea virginica: 4 Crassostrea virginica specific repeats
  • Ostrea lurida: 0 Ostrea lurida specific repeats

All runs were performed on roadrunner.

All commands were documented in a Jupyter Notebook (GitHub):

NOTE: RepeatMasker writes the desired output files (*.out, *.cat.gz, and *.gff) to the same directory that the genome is located in! If you conduct multiple runs with the same genome in the same directory, it will overwrite those files, as they are named using the genome assembly filename.
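One way to sidestep the overwriting problem might be to give each run its own output directory. The sketch below only builds the command lines and does not execute anything. It assumes RepeatMasker's documented -dir, -gff, and -species options; the genome filename and directory names are hypothetical, and I have not checked this against the runs above.

```python
import shlex
from pathlib import Path
from typing import Dict, List, Optional


def repeatmasker_command(genome: Path, out_dir: Path, species: Optional[str]) -> List[str]:
    """Build (but do not run) a RepeatMasker command with its own output directory."""
    cmd = ["RepeatMasker", "-dir", str(out_dir), "-gff"]
    if species is not None:
        cmd += ["-species", species]
    cmd.append(str(genome))
    return cmd


genome = Path("genome.fasta")          # hypothetical filename
base_dir = Path("repeatmasker_runs")   # hypothetical output root

# One entry per run; None means "no species specified" (default settings).
runs: Dict[str, Optional[str]] = {
    "default": None,
    "cgigas": "crassostrea gigas",
    "cvirginica": "crassostrea virginica",
    "olurida": "ostrea lurida",
}

commands = {
    name: repeatmasker_command(genome, base_dir / name, species)
    for name, species in runs.items()
}

for name, cmd in commands.items():
    # shlex.join quotes the species names so the line can be pasted into a shell.
    print(f"{name}: {shlex.join(cmd)}")
```

Because every run writes to a different directory, the output files can keep the genome assembly filename without colliding.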

Grace’s Notebook: May 23, 2018, RNA isolations for warm day 12 and master data file organization

RNA isolation

Because only three crabs exposed to the warm temperature treatment survived the experiment, we decided during our meeting on Tuesday (2018-05-22) that I would isolate RNA from warm treatment crabs (infected and uninfected) that made it to the temperature treatment stage (day 12). So today I started that process and isolated RNA from 8 samples (4 crabs, all negative for Hematodinium based on Pam’s qPCR results).

Hemolymph collection data of samples I processed today:

RNA HS Qubit results from those samples (tube number ties back to FRP in the hemolymph data sheet):

Tomorrow I will isolate RNA from two more crabs that are warm temperature treatment and uninfected (6 total uninfected) and 6 crabs that are warm treatment infected (using qPCR data from Pam).

Data organization

Working on figuring out a way to use R to organize the data into a master file such that each row is an individual crab (as designated by the unique FRP ID number), with columns for each piece of important data we have associated with that crab (hemolymph tube numbers from the sample dates, RNA isolation and Qubit data, morphology data, qPCR results). I am enjoying the puzzle-solving aspect of this process, but it can be overwhelming sometimes because it is a lot of data spread across many different sheets and workbooks. Our master file will include ALL crabs, including those that died after the second sampling date.
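The one-row-per-crab idea can be sketched in a few lines. This is a Python illustration rather than the R code I will actually write, and every column name, FRP ID, and value below is made up. The only assumption is that each sheet has an FRP column to join on.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, str]


def build_master(tables: Iterable[List[Row]], key: str = "FRP") -> Dict[str, Row]:
    """Merge several tables into one record per crab, keyed by its FRP ID."""
    master: Dict[str, Row] = defaultdict(dict)
    for table in tables:
        for row in table:
            crab_id = row[key]
            record = master[crab_id]  # created on first sight, so no crab is dropped
            for column, value in row.items():
                if column != key:
                    record[column] = value
    return dict(master)


# Hypothetical sheets; real data would be read from the exported workbooks.
hemolymph = [
    {"FRP": "6101", "tube_day12": "301"},
    {"FRP": "6102", "tube_day12": "302"},
]
qubit = [{"FRP": "6101", "rna_ng_per_ul": "12.4"}]
qpcr = [{"FRP": "6102", "infected": "no"}]

master = build_master([hemolymph, qubit, qpcr])
for crab_id, record in sorted(master.items()):
    print(crab_id, record)
```

Crabs that are missing from a sheet simply lack those columns, which is what keeps crabs that died early in the master file.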

Thursday and Friday goals:

  • finish isolating RNA
  • data organization and master file creation
  • work on and finish a pooling scheme for RNA sequencing using the master file

from Grace’s Lab Notebook https://ift.tt/2kinVv8

Sam’s Notebook: DNA Received – Sea lice DNA from Cris Gallardo-Escarate at Universidad de Concepción


Received Caligus tape DNA – two samples:

  • Female 1 .
  • Female 2 .

Stored in slots H4 and H5 in “Sam’s gDNA Box #2” in the FTR 213 -20°C freezer.

Google Sheet: Sam’s gDNA Box #2


from Sam’s Notebook https://ift.tt/2LnZTv3

Sam’s Notebook: Software Installation – RepeatMasker v4.0.7 on Emu/Roadrunner Continued


After yesterday’s difficulties getting RMblast to compile, I deleted the folder and went through the build process again.

This time it worked, but it did not put rmblastn in the specified location (/home/shared/rmblast).

This fact took me a fair amount of time to figure out. Finally, after a couple of different re-builds, I ran find to see if rmblastn existed somewhere I wasn’t looking:


Additionally, I couldn’t find the location of the various BLAST executables. Some internet sleuthing led me to the NCBI page on installing BLAST+ from source, which indicates that the executables are stored in:


How intuitive! /s

In order to improve readability and usability of the /home/shared/ directory, I renamed the /home/shared/rmblast directory to reflect the BLAST version and created a symbolic link in that directory to the rmblastn executable:

Symbolic link to RMBLAST
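The search-then-link approach described above can be sketched generically. This is not the exact set of commands I ran. It works in a temporary directory with a hypothetical layout, and the nested build path is invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files(root: Path, name: str) -> List[Path]:
    """Recursively collect regular files called `name` under `root`."""
    return sorted(p for p in root.rglob(name) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical layout: the executable ends up deep inside the build tree.
    build_bin = root / "build" / "nested" / "bin"
    build_bin.mkdir(parents=True)
    (build_bin / "rmblastn").write_text("#!/bin/sh\n")

    # Step 1: locate the executable wherever the build actually put it.
    found = find_files(root, "rmblastn")

    # Step 2: expose it at a predictable top-level path via a symbolic link.
    link = root / "rmblastn"
    link.symlink_to(found[0])

    link_is_symlink = link.is_symlink()
    link_target_ok = link.resolve() == found[0].resolve()
    print(f"{link.name} -> {found[0].relative_to(root)}")
```

The link keeps the top-level directory tidy while leaving the build tree untouched.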


Initiate RepeatMasker configuration

Sam’s Notebook: Software Installation – RepeatMasker v4.0.7 on Emu/Roadrunner


Steven asked that I re-run some Olympia oyster transposable elements analysis using RepeatMasker and a newer version of our Olympia oyster genome assembly.

Installed the software on both of the Apple Xserves (Emu and Roadrunner) running Ubuntu 16.04.

Followed the instructions outlined here:

Starting with the prerequisites:

1. Download and install RMBlast

  • NCBI Blast 2.6.0 source
  • isb 2.6.0 patch

Unfortunately, the make command continually failed:

 cd /home/shared/ncbi-blast-2.6.0+-src/c++
 make


While trying to troubleshoot this issue, continued with the other prerequisites:

2. Downloaded Tandem Repeat Finder v.4.09

  • Saved file (trf409.linux64) to /home/shared/bin. NOTE: /home/shared/bin is part of the system PATH. See the /etc/environment file.
  • Changed permissions to be executable: sudo chmod 775 trf409.linux64

3. Downloaded RepBase RepeatMasker Edition 20170127 (NOTE: This requires registration in order to obtain a username/password to download the file).

Installed RepeatMasker:

4. Downloaded RepeatMasker 4.0.7

  • Saved to /home/shared/RepeatMasker-4.0.7

5. Installed RepBase RepeatMasker Edition 20170127 in /home/shared/RepeatMasker-4.0.7/Libraries

Currently re-building RMBlast and it takes forever… Will report back when I have it running.

from Sam’s Notebook https://ift.tt/2IFRP79

Yaamini’s Notebook: Gonad Methylation Analysis Part 16

A new enchilada

(P.S. I think I really want enchiladas now)

After fixing the bismark_methylation_extractor issue, Steven suggested I duplicate my notebook and rerun the analysis on a subset of the data. I created this notebook and started rerunning bismark to align the sequences to the prepared genome. I then deduplicated, sorted, and indexed the .bam files, and extracted methylation calls successfully! I also completed the HTML and Summary Report steps. All outputs from this notebook can be found in this folder.

The next step is to duplicate the notebook I created today, remove the -u argument, and run the commands on the full dataset. I created this notebook and started to run the alignment. I’ll check on it in a few days!


from the responsible grad student https://ift.tt/2LoahTC

Sam’s Notebook: TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data (directional)


Earlier this week, I ran TrimGalore!, but due to a copy/paste mistake I incorrectly set the trimming to --non-directional, so I re-ran it with the correct settings.

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Run TrimGalore! with --paired and --rrbs settings.
  2. Run FastQC and MultiQC on trimmed files.
  3. Copy all data to owl (see Results below for link).
  4. Confirm data integrity via MD5 checksums.
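The integrity check in the last step boils down to comparing checksums of the original and the copy. Here is a minimal sketch using Python's hashlib. The filenames are hypothetical and the files are tiny stand-ins created in a temporary directory, not the real FASTQ data.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True when both files have the same MD5 checksum."""
    return md5sum(original) == md5sum(copy)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical filenames standing in for a trimmed file and its copy.
    source = Path(tmp) / "reads_trimmed.fq.gz"
    copy = Path(tmp) / "reads_trimmed_copy.fq.gz"
    source.write_bytes(b"hello")
    copy.write_bytes(b"hello")

    source_md5 = md5sum(source)
    intact = copies_match(source, copy)

print(source_md5, "OK" if intact else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for large sequencing files.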

Jupyter Notebook: