Read Mapping – C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox

After extracting FastQ reads using seqtk on 20201013 from the various taxa I had been interested in, the next thing needed doing was mapping reads to the cbai_genome_v1.01 “genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.

Minimap2 was run on Mox.

SBATCH script (GitHub):

#!/bin/bash ## Job Name #SBATCH --job-name=20201014__cbai_minimap_nanopore-megan6-taxa-reads ## Allocation Definition #SBATCH --account=srlab #SBATCH --partition=srlab ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=15-00:00:00 ## Memory per node #SBATCH --mem=120G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite ## Specify the working directory for this job #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20201014_cbai_minimap_nanopore-megan6-taxa-reads ################################################################################### # These variables need to be set by user ## Assign Variables # CPU threads to use threads=27 # Genome FastA path genome_fasta=/gscratch/srlab/sam/data/C_bairdi/genomes/cbai_genome_v1.01.fasta # Paths to programs minimap2="/gscratch/srlab/programs/minimap2-2.17_x64-linux/minimap2" samtools="/gscratch/srlab/programs/samtools-1.10/samtools" # Programs array declare -A programs_array programs_array=( [minimap2]="${minimap2}" \ [samtools_sort]="${samtools} sort" \ [samtools_view]="${samtools} view" ) ################################################################################### # Exit script if any command fails set -e # Load Python Mox module for Python module availability module load intel-python3_2017 # Capture date timestamp=$(date +%Y%m%d) # Loop through each FastQ for fastq in *.fq do # Parse out sample name sample=$(echo "${fastq}" | awk -F"_" '{print $2}') # Caputure taxa taxa=$(echo "${fastq}" | awk -F"_" '{print $3}') # Capture filename prefix prefix="${timestamp}_${sample}_${taxa}" # Run Minimap2 with Oxford NanoPore Technologies (ONT) option # Using SAM output format (-a option) ${programs_array[minimap2]} \ -ax map-ont \ ${genome_fasta} \ ${fastq} \ | ${programs_array[samtools_sort]} --threads ${threads} \ -O sam \ > "${prefix}".sorted.sam # Capture FastA checksums for verification () echo "Generating checksum for ${fastq}" md5sum "${fastq}" > fastq_checksums.md5 echo "Finished generating checksum for ${fastq}" echo "" done # Document programs in PATH (primarily for program version ID) { date echo "" echo "System PATH for $SLURM_JOB_ID" echo "" printf "%0.s-" {1..10} echo "${PATH}" | tr : \\n } >> system_path.log # Capture program options ## Note: Trinity util/support scripts don't have options/help menus for program in "${!programs_array[@]}" do { echo "Program options for ${program}: " echo "" ${programs_array[$program]} --help echo "" echo "" echo "