FastQC-MultiQC – C.gigas Ploidy WGBS Raw Sequence Data from Ronit’s Project on Mox

Earlier today, we received the C.gigas ploidy WGBS data from the samples we submitted to ZymoResearch on 20200820.

As part of our usual workflow, I ran FastQC, followed by MultiQC, on the raw reads on Mox.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=20201110_cgig_fastqc_ronit-ploidy-wgbs
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20201110_cgig_fastqc_ronit-ploidy-wgbs

### FastQC assessment of raw sequencing from Ronit's ploidy WGBS.

###################################################################################
# These variables need to be set by user

# FastQC output directory
output_dir=$(pwd)

# Set number of CPUs to use
threads=28

# Input/output files
checksums=fastq_checksums.md5
fastq_list=fastq_list.txt
raw_reads_dir=/gscratch/srlab/sam/data/C_gigas/wgbs/

# Paths to programs
fastqc=/gscratch/srlab/programs/fastqc_v0.11.9/fastqc
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc

# Programs associative array
declare -A programs_array
programs_array=(
[fastqc]="${fastqc}" \
[multiqc]="${multiqc}"
)

###################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Sync raw FastQ files to working directory
rsync --archive --verbose \
"${raw_reads_dir}"zr3534*.fq.gz .

# Populate array with FastQ files
fastq_array=(*.fq.gz)

# Pass array contents to new variable
fastqc_list=$(echo "${fastq_array[*]}")

# Run FastQC
# NOTE: Do NOT quote ${fastqc_list}
${programs_array[fastqc]} \
--threads ${threads} \
--outdir ${output_dir} \
${fastqc_list}

# Create list of fastq files used in analysis
echo "${fastqc_list}" | tr " " "\n" >> ${fastq_list}

# Generate checksums for reference
while read -r line
do
  # Generate MD5 checksums for each input FastQ file
  echo "Generating MD5 checksum for ${line}."
  md5sum "${line}" >> "${checksums}"
  echo "Completed: MD5 checksum for ${line}."
  echo ""

  # Remove fastq files from working directory
  echo "Removing ${line} from directory"
  rm "${line}"
  echo "Removed ${line} from directory"
  echo ""
done < ${fastq_list}

# Run MultiQC
${programs_array[multiqc]} .

# Capture program options
for program in "${!programs_array[@]}"
do
  {
  echo "Program options for ${program}: "
  echo ""
  # Handle samtools help menus
  if [[ "${program}" == "samtools_index" ]] \
  || [[ "${program}" == "samtools_sort" ]] \
  || [[ "${program}" == "samtools_view" ]]
  then
    ${programs_array[$program]}
  fi
  ${programs_array[$program]} -h
  echo ""
  echo ""
  echo "
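
The note in the script about leaving ${fastqc_list} unquoted comes down to shell word splitting: FastQC needs each FastQ file as its own argument. Here is a minimal sketch of the difference, using a hypothetical count_args helper and made-up file names (not the actual zr3534 files). Expanding the array directly with "${fastq_array[@]}" is shown as an alternative that would also cope with file names containing spaces.

```shell
#!/bin/bash
# Hypothetical helper for illustration: prints how many arguments it received.
count_args() { echo "$#"; }

# Made-up file names, standing in for the real FastQ files.
fastq_array=(sample_R1.fq.gz sample_R2.fq.gz)

# Same flattening step as in the script: join array elements with spaces.
fastqc_list=$(echo "${fastq_array[*]}")

# Quoted: the whole list is passed as ONE argument.
quoted_count=$(count_args "${fastqc_list}")

# Unquoted: the shell splits on whitespace, giving one argument per file.
unquoted_count=$(count_args ${fastqc_list})

# Alternative: expand the array directly, one argument per element.
array_count=$(count_args "${fastq_array[@]}")

echo "quoted=${quoted_count} unquoted=${unquoted_count} array=${array_count}"
```

With the quoted form, FastQC would be handed a single, non-existent file name made of every path joined by spaces, which is why the script leaves the variable unquoted.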

Data Received – C.gigas Ploidy WGBS from Ronit’s Project via ZymoResearch

We received the data from our whole genome bisulfite sequencing (WGBS) submission to ZymoResearch on 20200820 for Ronit’s C.gigas diploid/triploid desiccation/heat stress project.

Samples were sequenced as 150bp paired-end reads on the Illumina NovaSeq.

Files have been added to the C.gigas folder in nightingales on Owl (Synology server).

I’ve updated the nightingales Google Sheet database as well.

Next up:

  • Run FastQC
  • Submit to NCBI sequence read archive (SRA).

SeqID      Library_Name  Tissue    Ploidy    Desiccation  Heat_Stress
zr3534_1   D11-C         ctenidia  diploid   yes          no
zr3534_2   D12-C         ctenidia  diploid   yes          no
zr3534_3   D13-C         ctenidia  diploid   yes          no
zr3534_4   D19-C         ctenidia  diploid   yes          yes
zr3534_5   D20-C         ctenidia  diploid   yes          yes
zr3534_6   T11-C         ctenidia  triploid  yes          no
zr3534_7   T12-C         ctenidia  triploid  yes          no
zr3534_8   T13-C         ctenidia  triploid  yes          no
zr3534_9   T19-C         ctenidia  triploid  yes          yes
zr3534_10  T20-C         ctenidia  triploid  yes          yes
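
As a quick sanity check on the experimental design, the sample sheet above can be tallied by ploidy and heat-stress treatment. This is a minimal sketch, assuming the table is available as whitespace-delimited text (embedded here with a here-document). The variable names sample_sheet and summary are illustrative only.

```shell
#!/bin/bash
# Sample sheet copied from the table above (whitespace-delimited).
sample_sheet=$(cat <<'EOF'
SeqID Library_Name Tissue Ploidy Desiccation Heat_Stress
zr3534_1 D11-C ctenidia diploid yes no
zr3534_2 D12-C ctenidia diploid yes no
zr3534_3 D13-C ctenidia diploid yes no
zr3534_4 D19-C ctenidia diploid yes yes
zr3534_5 D20-C ctenidia diploid yes yes
zr3534_6 T11-C ctenidia triploid yes no
zr3534_7 T12-C ctenidia triploid yes no
zr3534_8 T13-C ctenidia triploid yes no
zr3534_9 T19-C ctenidia triploid yes yes
zr3534_10 T20-C ctenidia triploid yes yes
EOF
)

# Count samples per ploidy / heat-stress combination, skipping the header row.
# Column 4 is Ploidy and column 6 is Heat_Stress.
summary=$(echo "${sample_sheet}" \
  | awk 'NR > 1 { counts[$4 " heat_stress=" $6]++ }
         END { for (group in counts) print group, counts[group] }' \
  | sort)

echo "${summary}"
# Prints:
# diploid heat_stress=no 3
# diploid heat_stress=yes 2
# triploid heat_stress=no 3
# triploid heat_stress=yes 2
```

The tally shows the same layout for both ploidies: three desiccation-only samples and two desiccation plus heat stress samples each.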

from Sam’s Notebook https://ift.tt/3f4nT5T