In the midst of my existential dread and ennui, I never reviewed my Bismark output! I aligned samples to the genome, then ran MultiQC to get summary statistics. I saved the Bismark output here, and the HTML reports are in this repo.
I then looked at the MultiQC report. For all samples, alignment was around 62-63%, which is consistent with the Hawaii samples! About 25% of the reads in each sample were duplicates and were removed from the alignments. That is higher than for the Hawaii samples, but it makes sense given that I was working with low-quality DNA from gonad tissue in histology blocks. Samples 5, 7, and 8 had lower percent duplication than the rest, which is concerning since all three are from the ambient treatment.
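Rather than eyeballing the duplication column, a minimal Python sketch like the one below could flag samples whose percent duplication falls well below the group median, then tally the flagged samples by treatment. It assumes the per-sample values have already been copied out of the MultiQC report into a dictionary; the function names, the 0.8 cutoff fraction, and the input format are hypothetical and are not part of Bismark or MultiQC.

```python
from collections import Counter
from statistics import median


def flag_low_duplication(dup_pct: dict[str, float], fraction: float = 0.8) -> list[str]:
    """Return sample IDs whose percent duplication is below `fraction` x the median.

    `dup_pct` maps a sample ID to its percent duplicated reads
    (hypothetical input, e.g. transcribed from the MultiQC report).
    """
    if not dup_pct:
        return []
    cutoff = fraction * median(dup_pct.values())
    # Sort so the output order does not depend on dictionary insertion order
    return sorted(sample for sample, pct in dup_pct.items() if pct < cutoff)


def flagged_by_treatment(flagged: list[str], treatment: dict[str, str]) -> Counter:
    """Count how many flagged samples fall in each treatment group."""
    return Counter(treatment[sample] for sample in flagged)
```

Passing the flagged list and a sample-to-treatment map to flagged_by_treatment would show at a glance whether the low-duplication samples cluster in one treatment, as samples 5, 7, and 8 do here.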
There aren’t any big red flags that I can see in the report, so I’ll move forward with methylKit while also aligning to the new C. gigas genome.
- Align to new C. gigas genome
- Identify DML with
- Identify SNPs in WGBS data
- Write methods
- Write results
- Identify DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted
from the responsible grad student https://ift.tt/3sGpXGy
Aligning to the new C. gigas genome
There’s a new C. gigas genome! Steven mentioned that Mac may have been using the new genome, so I posted this issue to get a link to it. Although Mac never used the Roslin genome, Steven thought it would be a good idea to align to it. It is assembled into linkage groups, not just scaffolds, which is better for mapping and for interpreting my output. Its 10 linkage groups parallel the 10 C. virginica chromosomes! My goal is to align the samples to this version of the genome instead and carry these alignments forward in the main workflow. I’m considering using the previously generated alignments as well as the new ones, just to compare.
Adding the mitochondrial genome
The first thing I needed to do was add the mitochondrial genome sequence to the Roslin genome. I couldn’t figure out how to download the FASTA (NCBI, why is your website so confusing?), so I opened nano, copied and pasted the sequence into a new file, saved it as a .fa, and hoped for the best. Steven had the Roslin genome in one of his directories, so I copied it over to my /gscratch/scrubbed/yaamini/Haws/data/Cg-roslin folder. I appended the mitochondrial sequence to the Roslin genome using cat and saved the result as a new .fa file. When I looked at the file with less, the format looked like a FASTA file, so I decided to proceed with alignment.
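The cat-and-less check above could also be scripted so the result is validated automatically. This is a minimal Python sketch with hypothetical function names and file paths: it copies the Roslin FASTA and then the hand-made mitochondrial FASTA into a new file, adds a newline between them if the first file lacks one (plain cat would not, which could glue the mitochondrial header onto the last sequence line), and returns the record headers so you can confirm the mitochondrial contig made it in.

```python
import os
import shutil
from pathlib import Path

# IUPAC nucleotide codes (either case) plus the gap character
VALID_CHARS = set("ACGTUNRYSWKMBDHVacgtunryswkmbdhv-")


def fasta_headers(path: Path) -> list[str]:
    """Return the record headers in `path`, raising ValueError if it is not FASTA-shaped."""
    headers: list[str] = []
    with path.open() as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                headers.append(line[1:])
            elif not headers:
                raise ValueError(f"{path}:{lineno}: sequence before the first header")
            elif not set(line) <= VALID_CHARS:
                raise ValueError(f"{path}:{lineno}: unexpected characters in sequence")
    if not headers:
        raise ValueError(f"{path}: no FASTA records found")
    return headers


def _ends_with_newline(path: Path) -> bool:
    """Check the last byte of a file without reading the whole thing."""
    with path.open("rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return True  # empty file: nothing to separate
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"


def append_fasta(genome: Path, extra: Path, out: Path) -> list[str]:
    """Concatenate `genome` then `extra` into `out` and return the headers of `out`."""
    # Validate both inputs before writing anything
    fasta_headers(genome)
    fasta_headers(extra)
    with out.open("wb") as dest:
        for src in (genome, extra):
            with src.open("rb") as fh:
                shutil.copyfileobj(fh, dest)
            # Unlike plain `cat`, insert a newline if this file lacks a trailing one
            if not _ends_with_newline(src):
                dest.write(b"\n")
    return fasta_headers(out)
```

The last header returned should be the mitochondrial one; if it is missing or merged into a sequence line, the concatenation went wrong.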
I created this script to prepare a bisulfite genome and align samples. I referred to this old Jupyter notebook for the genome preparation arguments. I also used rsync to move the FASTQ files from mox so I could run the script. I submitted the script to mox and it’s in the queue to run!
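The two Bismark steps that the script runs can be sketched as command lists. This is a hypothetical Python helper, not the script mentioned above: the _R1/_R2 file-naming convention, the directory names, and the function names are assumptions, and only basic Bismark options (--genome, -1, -2, -o) appear; the actual genome-preparation arguments came from the old Jupyter notebook. Because bismark_genome_preparation indexes every FASTA file in the genome folder it is given, the sketch assumes that folder holds only the merged Roslin + mitochondrial .fa.

```python
import re
from pathlib import Path

# Hypothetical naming convention: <sample>_R1<rest> and <sample>_R2<rest>,
# ending in .fastq, .fq, .fastq.gz, or .fq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])(?P<rest>.*\.f(?:ast)?q(?:\.gz)?)$")


def pair_fastqs(names: list[str]) -> dict[str, tuple[str, str]]:
    """Group FASTQ file names into (R1, R2) pairs keyed by sample ID."""
    reads: dict[str, dict[str, str]] = {}
    for name in names:
        match = PAIR_RE.match(name)
        if match is None:
            continue  # skip files that do not look like paired FASTQ reads
        reads.setdefault(match["sample"], {})[match["read"]] = name
    incomplete = sorted(s for s, r in reads.items() if set(r) != {"1", "2"})
    if incomplete:
        raise ValueError(f"missing mate file for: {', '.join(incomplete)}")
    return {s: (r["1"], r["2"]) for s, r in sorted(reads.items())}


def bismark_commands(
    genome_dir: Path, pairs: dict[str, tuple[str, str]], out_dir: Path
) -> list[list[str]]:
    """Build one genome-preparation command, then one paired-end alignment per sample."""
    commands = [["bismark_genome_preparation", str(genome_dir)]]
    for r1, r2 in pairs.values():
        commands.append(
            ["bismark", "--genome", str(genome_dir), "-1", r1, "-2", r2, "-o", str(out_dir)]
        )
    return commands
```

Passing each returned list to shlex.join from the standard library gives a shell-quoted line that could be pasted into a SLURM script; building argument lists rather than strings sidesteps quoting problems with paths.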
- Try BS-SNPer and EpiDiverse for SNP extraction from WGBS data
- Obtain preliminary methylation assessment from
- Test-run DSS and ramwas
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Transfer scripts used to a nextflow workflow
- Update methods
- Update results
from the responsible grad student https://ift.tt/3qeks0e