WGBS Analysis Part 14

Reviewing bismark output

In the midst of my existential dread and ennui, I never reviewed my bismark output! I had aligned the samples to the genome, then run MultiQC to get summary statistics. I saved bismark output here, and have HTML reports in this repo.

I then looked at the MultiQC report. For all samples, the alignment rate was around 62-63%, which is consistent with the Hawaii samples! About 25% of the reads for each sample were duplicates that were removed from the alignments. This is higher than in the Hawaii samples, but makes sense considering that I was using low-quality DNA from gonad in histology blocks. I noticed that samples 5, 7, and 8 had lower percent duplication than the rest of the samples, which is concerning since those are all from the ambient treatment.

There aren’t any big red flags that I can see in the report, so I’ll move forward with methylKit while also aligning to the new C. gigas genome.

Going forward

  1. Align to new C. gigas genome
  2. Identify DML with methylKit
  3. Identify SNPs in WGBS data
  4. Write methods
  5. Write results
  6. Identify DML
  7. Determine if RNA should be extracted
  8. Determine if larval DNA/RNA should be extracted


from the responsible grad student https://ift.tt/3sGpXGy

Hawaii Gigas Methylation Analysis Part 6

Aligning to the new C. gigas genome

There’s a new C. gigas genome (the Roslin genome)! Steven mentioned Mac may have been using the new genome, so I posted this issue to get a link to it. Although Mac never used the Roslin genome, Steven thought it would be a good idea to align to it. It has linkage groups, not just scaffolds, which is better for mapping and for interpreting my output. It has 10 linkage groups, which matches the 10 C. virginica chromosomes! My goal is to align the samples to this version of the genome instead, and move forward with these alignments in the main workflow. I’m contemplating keeping the previously-generated alignments alongside the new ones just to compare.

Adding the mitochondrial genome

The first thing I needed to do was add the mitochondrial genome sequence to the Roslin genome. I couldn’t figure out how to download the FASTA (NCBI why is your website so confusing). I opened up nano and copied and pasted the sequence into a new file, then saved it as a .fa and hoped for the best. Steven had the Roslin genome in one of his directories, so I copied it over to my /gscratch/scrubbed/yaamini/Haws/data/Cg-roslin folder. I appended the mitochondrial sequence to the Roslin genome using cat, and saved it as a new .fa file. I looked at the file with less and the format looked like a FASTA file, so I decided to proceed with bismark.
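Since "hoped for the best" is doing a lot of work in that paragraph, here is a minimal Python sketch of the same append step with a format check added. It is not what I actually ran (I used cat and less). The function names and any file names are hypothetical. It only assumes that both inputs are plain-text FASTA files.

```python
from pathlib import Path


def append_fasta(genome_path, mito_path, out_path):
    """Concatenate two FASTA files into out_path.

    A newline is added when a file does not end with one, so the next
    header cannot get glued onto the last sequence line (an easy mistake
    when a sequence was pasted into an editor by hand).
    """
    with open(out_path, "w") as out:
        for path in (genome_path, mito_path):
            text = Path(path).read_text()
            if not text.startswith(">"):
                raise ValueError(f"{path} does not start with a FASTA header")
            out.write(text if text.endswith("\n") else text + "\n")


def fasta_summary(path):
    """Return {record_name: sequence_length} for every record in a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the record name
                name = line[1:].split()[0]
                if name in lengths:
                    raise ValueError(f"duplicate record name: {name}")
                lengths[name] = 0
            else:
                if name is None:
                    raise ValueError("sequence found before any header")
                lengths[name] += len(line)
    return lengths
```

Calling fasta_summary on the combined file should list every linkage group plus one extra record for the mitochondrial sequence, which is a quicker sanity check than scrolling through the file with less.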

bismark script

I created this script to prepare a bisulfite genome and align samples. I referred to this old Jupyter notebook for the genome preparation arguments. I also used rsync to move the FASTQ files from gannet to mox so I could run the script. I submitted the script to mox and it’s in the queue to run!
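For reference, the two steps in that script (genome preparation, then alignment) can be sketched as below. This is not the script I submitted to mox: the function name is hypothetical, paired-end reads are an assumption, and none of the extra arguments from the old Jupyter notebook are reproduced here. The sketch only builds the command lines, using bismark_genome_preparation with a genome folder and bismark with --genome, -1, -2, and -o.

```python
import shlex


def bismark_commands(genome_dir, read1, read2, out_dir):
    """Build (but do not run) the genome preparation and alignment commands.

    Assumes paired-end FASTQ files; genome_dir is the folder that holds
    the combined .fa file.
    """
    prepare = ["bismark_genome_preparation", "--verbose", genome_dir]
    align = [
        "bismark",
        "--genome", genome_dir,
        "-1", read1,
        "-2", read2,
        "-o", out_dir,
    ]
    return [prepare, align]


if __name__ == "__main__":
    # Placeholder paths, for illustration only
    for cmd in bismark_commands("genome_dir", "sample_R1.fq.gz",
                                "sample_R2.fq.gz", "bismark_out"):
        print(shlex.join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) inside a job script, but printing the commands first makes it easy to check paths before anything sits in the queue.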

Going forward

  1. Try BS-SNPer and EpiDiverse for SNP extraction from WGBS data
  2. Obtain preliminary methylation assessment from methylKit
  3. Test-run DSS and ramwas
  4. Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
  5. Transfer scripts used to a nextflow workflow
  6. Update methods
  7. Update results


from the responsible grad student https://ift.tt/3qeks0e