WGBS Analysis Part 10

Analyzing the full dataset

At the end of last week, I got my samples back from ZymoResearch! These are WGBS samples of C. gigas female gonad tissue from my Manchester experiment. The purpose of sequencing these samples is to determine if the low pH exposure altered the methylome. If we find something interesting from these samples, we may sequence some larval samples, or look into gonad RNA.

Obtaining sequences

The first thing I needed to do was download the data. ZymoResearch provided two URLs: one for the sample fastq files, and another for the sample checksums. Each URL pointed to a text file. The fastq text file contained links that could be passed to curl to download the individual files. To get the data, I needed a straightforward way to pass multiple URLs to a curl or wget command.

I found this link with information on how to do just that! Once I downloaded the text file, I could use xargs to pass the individual URLs to curl:

```
curl https://ift.tt/3rccANF > download_fastq.txt #Download file provided by ZymoResearch
xargs -n 1 curl -O < download_fastq.txt #Download fastq files
```
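Before pulling many large files, it can be worth previewing exactly what `xargs` will run. Below is a minimal sketch, assuming `download_fastq.txt` lists the URLs separated by whitespace (as the `xargs -n 1` call above implies). `preview_downloads` is a hypothetical helper name, not something from ZymoResearch or the lab's scripts:

```bash
#!/bin/bash
# Hypothetical helper: print the curl command that `xargs -n 1 curl -O`
# would run for each URL in a list file, without downloading anything.
preview_downloads() {
  # -n 1 hands xargs one URL per command; echo prints the command instead of running it
  xargs -n 1 echo curl -O < "$1"
}

# Usage (file name from the commands above):
# preview_downloads download_fastq.txt
#
# Optional variant (not used in this entry): run up to 4 downloads at once.
# -P is supported by both GNU and BSD xargs.
# xargs -n 1 -P 4 curl -O < download_fastq.txt
```

Dropping `echo` from the helper gives back the real download command. The parallel variant is only a possible speed-up and is untested against ZymoResearch's server.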

 Then, I verified checksums: 

```
curl https://ift.tt/39EDuI3 > zr3616_MD5.txt #Download MD5 checksums from ZymoResearch
md5sum -c zr3616_MD5.txt #Check each downloaded file against the original checksums (md5sum -c reads the file names from zr3616_MD5.txt itself). All files passed
```
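`md5sum -c` prints an `OK` line for every file, which is a lot to scan for a full sequencing run. Below is a minimal sketch that prints only failures, assuming the ZymoResearch list uses standard `md5sum` format (the successful check above suggests it does) and that it is run from the directory holding the downloads. `verify_md5` is a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical helper: re-check files against a checksum list in md5sum format
# ("<md5>  <filename>"). File names are resolved relative to the current directory.
# --quiet (GNU coreutils) suppresses the per-file "OK" lines, so only failures print;
# md5sum exits non-zero if any listed file is missing or does not match.
verify_md5() {
  if md5sum -c --quiet "$1"; then
    echo "All files in $1 passed"
  else
    echo "At least one file in $1 failed; see messages above" >&2
    return 1
  fi
}

# Usage: verify_md5 zr3616_MD5.txt
```

Because the helper returns a non-zero status on failure, it can also gate later steps, e.g. `verify_md5 zr3616_MD5.txt && rsync ...`.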

Once I had the data, I needed to move it to `owl` and add metadata to the [`Nightingales Google Sheet`](https://b.link/nightingales). I used `mv` to move samples from my Desktop to `owl` (mounted on my computer), but I couldn't write to the directory. I posted [this issue](https://github.com/RobertsLab/resources/issues/1085) to get write access to both the `owl` folder and the Google Sheet. As an aside, Sam reminded me that I should use `rsync` to transfer the files:

```
rsync --archive --progress --verbose zr3616_* /Volumes/web-1/nightingales/C_gigas/ #Transfer to nightingales
```

While the files transferred over many hours, I updated the Google Sheet with sample metadata.

Assessing raw data quality

When I received the data from ZymoResearch, the PDF report implied that adapters had been trimmed from the data. Confused, I posted [this discussion](https://github.com/RobertsLab/resources/discussions/1082) to determine if I needed to modify my trimming protocol. Sam informed me that the samples were likely not trimmed, and that I would see this if I ran [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Once the files were on Nightingales, I needed to transfer them to `mox` and run `FastQC` there. I used `rsync` to transfer the files:

```
rsync --archive --verbose --progress yaamini@*gz /gscratch/scrubbed/yaaminiv/Manchester/data/ #Transfer to mox
```

Then, I set up [a `fastqc` script](https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/code/01-fastqc.sh). To streamline the repository, I moved my preliminary analyses to the [pooled subset](https://github.com/RobertsLab/project-gigas-oa-meth/blob/master/pooled-subset) subdirectory. This way, those scripts remain linked to the repository without cluttering it.

Running `fastqc` seemed simple. I located the program on `mox` in the `srlab` programs directory. The command to run the program is `fastqc` followed by the filenames, so I specified the directory with my data and included code for [`MultiQC`](https://multiqc.info/) report compilation. When I ran this script, it obviously didn't work!

Figuring Sam must have run `fastqc` recently, I found [this lab notebook entry](https://robertslab.github.io/sams-notebook/2020/11/10/FastQC-MultiQC-C.gigas-Ploidy-WGBS-Raw-Sequence-Data-from-Ronits-Project-on-Mox.html) with a `fastqc` script for `mox`. I wasn't sure why Sam needed to work with the files in an array instead of calling `fastqc` directly, but I figured it had to do with the Java error I encountered when I tried to do just that. I copied the code into my script and modified the variables so they pointed to the correct directories and samples. With his code, `fastqc` will generate sequence quality reports, `multiqc` will compile them, and checksums will be generated. I probably don't need the checksum information, since `rsync` verifies checksums when moving files and I have the original checksums from ZymoResearch. In any case, it couldn't hurt. Once I finished modifying the script, I transferred it to `mox` and ran it:

```
rsync --archive --verbose --progress yaamini@ . #Transfer script to user directory
sbatch 01-fastqc.sh #Run script
```
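One common reason to collect input files into a bash array is that each filename stays a separate, correctly quoted argument, and the file count can be checked before anything runs. Below is a minimal sketch of that pattern. It is not a copy of the linked script. `run_fastqc_batch` and the `FASTQC`/`MULTIQC` override variables are hypothetical, and the `.fq.gz` suffix is assumed from the file names above:

```bash
#!/bin/bash
# Hypothetical helper: run FastQC on every *.fq.gz file in a directory,
# compile the reports with MultiQC, and record MD5 checksums.
# FASTQC and MULTIQC default to programs on $PATH; on mox they would point at the
# installed binaries (e.g. in the srlab programs directory; exact paths not shown).
run_fastqc_batch() {
  local datadir="$1" outdir="$2" threads="${3:-2}"
  local fastqc="${FASTQC:-fastqc}" multiqc="${MULTIQC:-multiqc}"

  # Collect matching files into an array; each element is one filename.
  local files=("$datadir"/*.fq.gz)
  # If nothing matched, bash leaves the literal pattern, which does not exist.
  if [ ! -e "${files[0]}" ]; then
    echo "No .fq.gz files found in $datadir" >&2
    return 1
  fi
  echo "Found ${#files[@]} .fq.gz files in $datadir" >&2

  mkdir -p "$outdir" || return 1
  # One FastQC call over the whole array; --threads lets it process several files at once.
  "$fastqc" --threads "$threads" --outdir "$outdir" "${files[@]}" || return 1
  # Compile the individual FastQC reports into a single MultiQC report.
  "$multiqc" "$outdir" --outdir "$outdir" || return 1
  # Record checksums for the same set of files.
  md5sum "${files[@]}" > "$outdir/checksums.md5"
}

# Example call inside a script submitted with sbatch (output directory is hypothetical):
# run_fastqc_batch /gscratch/scrubbed/yaaminiv/Manchester/data /gscratch/scrubbed/yaaminiv/Manchester/output/fastqc 8
```

Setting `FASTQC=echo` is a quick way to dry-run the function and see the exact `fastqc` command before submitting a job.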

Once the initial quality assessment is done, I’ll run trimgalore to remove adapter sequences!

Going forward

  1. Trim samples, checking specifically for reasons to trim multiple times
  2. Start bismark
  3. Identify DML
  4. Determine if RNA should be extracted
  5. Determine if larval DNA/RNA should be extracted


from the responsible grad student https://ift.tt/3td5bj3

Starting GO-MWU

Getting GO-MWU Input

Alright, I finally progressed to the next stage: running GO-MWU! To run GO-MWU, I needed two files. First, a two-column CSV table with gene IDs and a measure of significance (I used unadjusted p-values). Second, a two-column tab-separated file with gene IDs and GO terms.

Obtaining the first was fairly straightforward; my R script can be found here. In that script, I pulled all genes, including those with an unadjusted p-value of NA. That might be relevant later; I may want to redo this with only non-NA values.

Obtaining the second was also straightforward, but took a long, long time. I had previously created a newline-separated file of accession IDs for each of my 4 comparisons (reminder: elevated vs. ambient, elevated vs. low, ambient vs. low, day 0 vs. day 17 ambient). I plugged that into a script from Sam, which got the…
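Both GO-MWU inputs can be tidied with a little `awk`. Below is a minimal sketch under stated assumptions. It assumes the significance CSV has a header row and stores missing values as an unquoted `NA` (R's `write.csv` default). It also assumes the GO annotations start in a long format with one tab-separated gene-GO pair per line. That intermediate file is hypothetical, not the documented output of Sam's script. As I understand the GO-MWU documentation, the annotation table should have one line per gene with multiple GO terms separated by semicolons. The function names are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: keep the header plus rows whose second column is not NA.
drop_na_pvalues() {
  awk -F',' 'NR == 1 || $2 != "NA"' "$1"
}

# Hypothetical helper: collapse one gene-GO pair per line (tab-separated)
# into one line per gene, with GO terms joined by semicolons.
collapse_go_terms() {
  awk -F'\t' -v OFS='\t' '
    $2 == "" { next }                 # skip genes without a GO term
    seen[$1, $2]++ { next }           # skip duplicate gene-GO pairs
    !($1 in terms) {                  # first GO term for this gene
      order[++n] = $1; terms[$1] = $2; next
    }
    { terms[$1] = terms[$1] ";" $2 }  # append further GO terms
    END { for (i = 1; i <= n; i++) print order[i], terms[order[i]] }
  ' "$1"
}

# Usage (file names are hypothetical):
# drop_na_pvalues gene_pvals.csv > gene_pvals_noNA.csv
# collapse_go_terms gene_go_long.tab > gene_go_gomwu.tab
```

Genes are printed in the order they first appear, so the output lines up with the input file when checking by eye.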

from Aidan F. Coyle https://ift.tt/36BB630

February 2021 Goals


With the amount of work I need to get done this short month, I’m definitely on the meow meow train. But I think at the end of this month, I’ll have a much clearer understanding of how much work is left for me before I’m solely in an analysis and writing phase, and when may be a reasonable time to defend!

January Goals Recap

Hawaii Gigas Methylation:

Gigas Gonad Methylation:

  • I didn’t get any of my data until last week, so I haven’t touched it yet!

Virginica Labwork:

  • Obtained WGBS and RNA-Seq quotes from Zymo
  • Started getting PO numbers for each quote
  • Was not able to send samples for sequencing since I don’t have both PO numbers
  • Did not determine if there are additional samples that should be extracted and sequenced

My plate was pretty full with bioinformatic analyses and quarantining after getting back from California, so I didn’t make any progress on ATAC-Seq Labwork or Gigas Labwork!

February goals

Hawaii Gigas Methylation:

  • Evaluate bismark output using the coral methylation workflow
  • Complete preliminary assessment of DML with methylKit
  • Try identifying DML with DSS
  • Compare methylKit and DSS DML output and determine which approach is suitable
  • Determine genomic location of DML
  • Identify significantly enriched GO terms associated with DML
  • Identify methylation islands and non-methylated regions
  • Locate an ATAC-Seq dataset that could be used to practice integrating chromatin information with methylation
  • Start drafting manuscript

Gigas Gonad Methylation:

  • Trim samples
  • Align samples with bismark and evaluate output
  • Identify DML using either methylKit or DSS
  • Determine genomic location of DML
  • Identify significantly enriched GO terms associated with DML
  • Identify methylation islands and non-methylated regions
  • Decide if it’s worth extracting gonad RNA for integrated RNA-Seq and methylation analyses
  • Start drafting manuscript

Virginica Labwork:

  • Send samples for sequencing!

ATAC-Seq Labwork:

  • Purchase reagents and identify samples to test cell dissociation protocols
  • Ensure protocol is easy to follow and is accessible for the lab
  • Dissociate and cryopreserve some cells


  • Continue work on ocean acidification and reproduction review in order to submit the manuscript this quarter
  • Complete review for Molecular Ecology
  • Watch SICB talks and prepare for live discussion session


from the responsible grad student https://ift.tt/2MLZaJ8