Sam’s Notebook: Annotation – Olurida_v081 MAKER Proteins InterProScan5 on Mox

Continuing annotation of the Olympia oyster genome. I determined initial gene models using MAKER with two rounds of SNAP and then performed protein-level annotations using BLASTp. Next, I’m going to run InterProScan5 (IPS5) to help functionally characterize the O. lurida proteins ID’d by MAKER. Once this is complete, I’ll use MAKER to incorporate the IPS5 and BLASTp results into a much neater (i.e. human-readable) annotated genome!

The IPS5 analysis was performed on Mox with the following SBATCH script:

  #!/bin/bash
  ## Job Name
  #SBATCH --job-name=interproscan
  ## Allocation Definition
  #SBATCH --account=srlab
  #SBATCH --partition=srlab
  ## Resources
  ## Nodes
  #SBATCH --nodes=2
  ## Walltime (days-hours:minutes:seconds format)
  #SBATCH --time=15-00:00:00
  ## Memory per node
  #SBATCH --mem=120G
  ##turn on e-mail notification
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=samwhite@uw.edu
  ## Specify the working directory for this job
  #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190107_oly_maker_interproscan

  # Load Python Mox module for Python module availability
  module load intel-python3_2017

  # Load Open MPI module for parallel, multi-node processing
  module load icc_19-ompi_3.1.2

  # SegFault fix?
  export THREADS_DAEMON_MODEL=1

  # Document programs in PATH (primarily for program version ID)
  date >> system_path.log
  echo "" >> system_path.log
  echo "System PATH for $SLURM_JOB_ID" >> system_path.log
  echo "" >> system_path.log
  printf "%0.s-" {1..10} >> system_path.log
  echo ${PATH} | tr : \\n >> system_path.log

  # Variables
  interproscan=/gscratch/srlab/programs/interproscan-5.31-70.0/interproscan.sh
  maker_prot_fasta=/gscratch/scrubbed/samwhite/outputs/20181127_oly_maker_genome_annotation/Olurida_v081.all.maker.proteins.fasta

  # Run InterProScan 5
  ## disable-precalc since this requires external database access (which Mox does not allow)
  ${interproscan} \
  --input ${maker_prot_fasta} \
  --goterms \
  --disable-precalc

Sam’s Notebook: VCF Splitting – C.virginica VCF Using BCFtools

Steven asked that I split up a Crassostrea virginica VCF file:

Overview of process:

  1. Downloaded file.
  2. Gunzipped file.
  3. Sorted and bgzipped file.
  4. Indexed sorted file with tabix.
  5. Filled AN/AC values with the bcftools fill-AN-AC plugin.
  6. Split the sorted/filled VCF into individual VCF files with BCFtools.
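
The steps above can be sketched roughly as the shell functions below. This is an illustration only, not the notebook's actual commands: the function names `prepare_vcf` and `split_by_sample`, all file names, the use of `bcftools sort` for the sort/bgzip step, and the choice to split by sample are assumptions on my part. The download step is omitted because the source location isn't given here.

```shell
#!/bin/bash
# Sketch only: function names, file names, and the per-sample split are
# assumptions, not taken from the notebook.
set -eu

# Steps 2-5: gunzip, sort + bgzip, index, and fill AN/AC values.
# Usage: prepare_vcf <input.vcf.gz> <output_prefix>
prepare_vcf() {
  local in_gz=$1 prefix=$2

  # Step 2: decompress the downloaded file
  gunzip -c "$in_gz" > "${prefix}.vcf"

  # Step 3: sort and write bgzip-compressed output (-Oz)
  bcftools sort -Oz -o "${prefix}.sorted.vcf.gz" "${prefix}.vcf"

  # Step 4: index the sorted file
  tabix -p vcf "${prefix}.sorted.vcf.gz"

  # Step 5: recompute the AN/AC INFO fields with the fill-AN-AC plugin
  bcftools +fill-AN-AC "${prefix}.sorted.vcf.gz" -Oz -o "${prefix}.filled.vcf.gz"
}

# Step 6 (assumed to mean one VCF per sample).
# Usage: split_by_sample <filled.vcf.gz> <output_dir>
split_by_sample() {
  local vcf=$1 out_dir=$2 samples
  mkdir -p "$out_dir"

  # List the sample names recorded in the VCF header
  samples=$(bcftools query -l "$vcf")

  # Write one bgzipped VCF per sample
  printf '%s\n' "$samples" | while IFS= read -r sample; do
    bcftools view -s "$sample" -Oz -o "${out_dir}/${sample}.vcf.gz" "$vcf"
  done
}
```

With hypothetical file names, this would be called as `prepare_vcf input.vcf.gz cvir` followed by `split_by_sample cvir.filled.vcf.gz split_vcfs`. If the split was meant to be by chromosome or region rather than by sample, the last function would need a different `bcftools view` invocation.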

The entire process is documented in the Jupyter Notebook linked below.

Jupyter Notebook (GitHub):