Genome Annotation – P.generosa v1.0 Assembly Using BLASTn for BlobToolKit on Mox

To keep working towards analyzing our Panopea generosa (Pacific geoduck) genome assembly (v1.0) with BlobToolKit (per this GitHub Issue), I’ve decided to run each step of the pipeline manually, since I continue to have problems with the automated pipeline. To that end, I ran BLASTn on Mox, following the BlobToolKit “Getting Started” guide.
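The BLASTn run in the script below writes a tab-delimited table using `-outfmt "6 qseqid staxids bitscore std"`, which should give 15 columns per hit: query ID, subject taxonomy ID(s), bit score, and then the 12 standard columns. As a rough sketch of how that output might be sanity-checked before handing it to BlobToolKit, the two helper functions below are hypothetical (they are not part of the script that was run, and their names are my own). They assume only standard `awk` and `sort`.

```shell
#!/bin/bash
# Hypothetical helpers (NOT part of the original SBATCH script) for checking a
# BLAST table written with: -outfmt "6 qseqid staxids bitscore std"

# Succeeds (exit 0) only if every row has exactly 15 tab-delimited fields:
# qseqid, staxids, bitscore, plus the 12 "std" columns.
check_field_count() {
  local blast_out="$1"
  awk -F'\t' 'NF != 15 { bad++ } END { exit (bad > 0 ? 1 : 0) }' "${blast_out}"
}

# Prints "<hit count><TAB><staxids>" for each distinct value in column 2,
# sorted by descending hit count, then by staxids.
count_hits_per_taxid() {
  local blast_out="$1"
  awk -F'\t' '{ n[$2]++ } END { for (t in n) print n[t] "\t" t }' "${blast_out}" \
    | sort -k1,1nr -k2,2
}
```

Once the job has finished, these could be called as `check_field_count Panopea-generosa-v1.0_blobtools2_blast.out` and `count_hits_per_taxid Panopea-generosa-v1.0_blobtools2_blast.out | head`. Note that the `staxids` column can hold several semicolon-separated taxonomy IDs for a single hit; this sketch treats each such combination as its own group.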

SBATCH script (GitHub):


#!/bin/bash
## Job Name
#SBATCH --job-name=20210415_pgen_blastn-nt_Panopea-generosa-v1.0
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20210415_pgen_blastn-nt_Panopea-generosa-v1.0

### BLASTn of P.generosa genome assembly Panopea-generosa-v1.0.fa
### against NCBI nt database.

### In preparation for use in BlobTools2

###################################################################################
# These variables need to be set by user

# Set number of CPUs to use
threads=40

# Input/output files
fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Panopea-generosa-v1.0.fa"
blast_db="/gscratch/srlab/blastdbs/20210401_ncbi_nt/nt"

# Programs
blastn="/gscratch/srlab/programs/ncbi-blast-2.10.1+/bin/blastn"

# Programs associative array
declare -A programs_array
programs_array=(
[blastn]="${blastn}"
)

###################################################################################

# Exit script if any command fails
set -e

# Run BLASTn with custom format/settings for use in blobtools2
${programs_array[blastn]} \
-db ${blast_db} \
-query ${fasta} \
-outfmt "6 qseqid staxids bitscore std" \
-max_target_seqs 10 \
-max_hsps 1 \
-evalue 1e-25 \
-num_threads ${threads} \
-out Panopea-generosa-v1.0_blobtools2_blast.out

###################################################################################

# Capture program options
echo "Logging program options..."
for program in "${!programs_array[@]}"
do
  {
  echo "Program options for ${program}: "
  echo ""
  # Handle samtools help menus
  if [[ "${program}" == "samtools_index" ]] \
  || [[ "${program}" == "samtools_sort" ]] \
  || [[ "${program}" == "samtools_view" ]]
  then
    ${programs_array[$program]}

  # Handle DIAMOND BLAST menu
  elif [[ "${program}" == "diamond" ]]; then
    ${programs_array[$program]} help

  # Handle NCBI BLASTx menu
  elif [[ "${program}" == "blastx" ]] \
  || [[ "${program}" == "blastn" ]]; then
    ${programs_array[$program]} -help
  fi
  ${programs_array[$program]} -h
  echo ""
  echo ""
  echo "