Hawaii Gigas Methylation Analysis Part 12

Identifying SNPs

Now that samtools is installed and I merged BAM files, I’m ready to identify SNPs with BS-Snper! I’m going to identify SNPs from the merged BAM file and from individual BAM files. Steven suggested that once I have SNP vcf files, I could use bedtools intersect to figure out where the SNPs overlap with the CG track and DML I’ll identify later, so I won’t need to filter SNPs once I identify them.

Merged SNPs

First, I decided to identify SNPs from the merged BAM file I created with samtools merge. Based on Mac’s suggestion, I ran BS-Snper with the default parameters. However, I set --mincov 5 instead of using the default coverage of 10 to be consistent with bismark and methylKit parameters.

#Defaults: --minhetfreq 0.1 --minhomfreq 0.85 --minquali 15 --maxcover 1000 --minread2 2 --errorate 0.02 --mapvalue 20 #Saving output to gannet !perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \ --fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \ --input /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/merged-sorted-deduplicated.bam \ --output /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/SNP-candidates.txt \ --methcg /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/CpG-meth-info.tab \ --methchg /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/CHG-meth-info.tab \ --methchh /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/CHH-meth-info.tab \ --mincover 5 \ > /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/SNP-results.vcf 2> merged.ERR.log 

I wrote the output to this gannet folder instead of trying to deal with large files my Github repository.

Individual SNPs

Since I ran the script successfully, I moved onto SNP identification from individual files! I modified a loop from Mac to identify SNPs in my files. First, I changed my working directory to the one with BAM files. Within the same code chunk, I needed to create variables for the file names, then use the modified file names within the BS-Snper loop:

%%bash FILES=$(ls *deduplicated.sorted.bam) echo ${FILES} for file in ${FILES} do NAME=$(echo ${file} | awk -F "." '{print $1}') echo ${NAME} perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \ --fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \ --input ${NAME}.deduplicated.sorted.bam \ --output ${NAME}.SNP-candidates.txt \ --methcg ${NAME}.CpG-meth-info.tab \ --methchg ${NAME}.CHG-meth-info.tab \ --methchh ${NAME}.CHH-meth-info.tab \ --mincover 5 \ > ${NAME}.SNP-results.vcf 2> ${NAME}.ERR.log done 

Initially I specified --fa with a FASTA variable, but I kept running into an error! I then remembered that within Jupyter notebooks I had issues specifying variables within loops, so I used the absolute file path instead. Once I had all the files, I moved them to the BS-Snper directory on gannet.

# Move files to BS-Snper directory !mv *SNP-candidates.txt *meth-info.tab *SNP-results.vcf *ERR.log ../BS-Snper/ 

Katherine suggested I merge with vcf-merge, but I don’t think I need to do that if I could run a loop with bedtools intersect to determine if any individual SNPs overlap with DML. Now that I identified SNPs for this dataset, I can do something similar with the Manchester data! I’m also going to try EpiDiverse/snp since it would give me a relatedness matrix in addition to SNP variants.

Going forward

  1. Try EpiDiverse/snp for SNP extraction from WGBS data
  2. Run methylKit script on mox
  3. Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
  4. Test-run DSS and ramwas
  5. Transfer scripts used to a nextflow workflow
  6. Update methods
  7. Update results

Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2PCHWjo