Shelly’s Notebook: Fri. Aug. 16, 2019 Geoduck genome paper BS analysis

methylKit analysis of Steven’s alignments

  1. Copy data to emu:
      cd ~/GitHub/Shelly_Pgenerosa/analyses/JuviPgen_ALSL2lowd145/Get_DMRs_for_SR_v074_alignments/
      mkdir dedup_bams
      cd dedup_bams/
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-205_S26_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-206_S27_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-214_S30_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-215_S31_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-220_S32_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-221_S33_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-226_S34_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
      wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004/EPI-227_S35_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam
  2. Create a new methylKit R project and perform the analysis
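The eight wget commands in step 1 differ only in the sample stem, so the download list can also be generated in a loop. This is a minimal sketch, assuming the same gannet base URL and file-name suffix as the commands above; `bam_urls.txt` is a hypothetical file name. It only writes the URL list, so nothing is downloaded when it runs:

```shell
#!/usr/bin/env bash
# Sketch: write the 8 BAM URLs from step 1 to a list file instead of 8 wget calls.
# Assumes the same gannet base URL and file-name suffix as the commands above.

base_url="https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807-004"
suffix="_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.sorted.bam"

# Sample stems (EPI ID + S number) copied from the wget commands in step 1
stems=(EPI-205_S26 EPI-206_S27 EPI-214_S30 EPI-215_S31
       EPI-220_S32 EPI-221_S33 EPI-226_S34 EPI-227_S35)

# bam_urls.txt is a hypothetical file name; one URL per line
for stem in "${stems[@]}"; do
  printf '%s/%s%s\n' "${base_url}" "${stem}" "${suffix}"
done > bam_urls.txt
```

Running `wget -i bam_urls.txt` from inside `dedup_bams/` would then fetch all eight files in one call.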

IGV analysis

  • Yaamini gave me a quick tutorial about how she used IGV to visualize differential methylation (thanks Yaamini!)
  • Jupyter notebook for preparing the files to load into IGV: 20190816_Pgnrv074_DMRs_in_IGV.ipynb
    • Summary of analysis: I ran bedtools intersect between the hypo- and hyper-DMRs and the coverage files (filtered for positions with 3x coverage).
    • IGV session here: 20190816_Pgnrv074_DMRs.xml
      • I loaded both the percent methylation and the number of C’s at each position overlapping the DMRs for these samples:
        • 205 (amb.-low)
        • 206 (amb.-low)
        • 214 (Super.low-low)
        • 215 (Super.low-low)
        • 220 (Super.low-low)
        • 221 (Super.low-low)
        • 226 (amb.-low)
        • 227 (amb.-low)
      • I loaded the CDS, mRNA, and gene tracks for the v074 genome.
      • I loaded the DMRs (as regions).
      • Here’s an example of what the differential methylation looks like for scaffold 9, fully zoomed out:
        • [Screenshot: Screen Shot 2019-08-16 at 8.40.59 PM.png]
        • The first 8 tracks are the number of C’s at each position: the first 4 are the ambient-low group and the next 4 are the Super.low-low group.
        • The bottom 8 tracks are the percent methylation at each position, ordered the same way as the first 8 tracks.
      • Here is one DMR zoomed in:
        • [Screenshot: Screen Shot 2019-08-16 at 8.39.11 PM.png]
          • It looks like the ambient-low group is hypermethylated in this DMR compared to the Super.low-low group (see the first 4 of the bottom 8 tracks). HOWEVER, there doesn’t seem to be any coverage of this DMR in the Super.low-low group, whereas each of the ambient-low samples has 4x coverage of it. So this seems like a false positive: the DMR was called because of a lack of coverage in one group. Yaamini encountered the same issue in her IGV differential methylation analysis of the C.virginica data.
          • To weed out instances like this, I need to look into:
            • some kind of normalization before comparing reads?
            • stricter methylkit parameters for calling DMRs
            • the literature on how people do DMR analysis to avoid these false positives
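The file preparation summarized above (keeping positions with 3x coverage, then intersecting them with the DMRs) and the missing-coverage problem seen in the zoomed-in DMR can both be sketched with awk and bedtools. This is a minimal sketch, not the code in the Jupyter notebook: it assumes Bismark-style `.cov` files (chromosome, start, end, % methylation, count methylated, count unmethylated), reads "3x coverage" as at least 3 reads, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch (assumptions): Bismark-style .cov input with 6 columns:
# chr, start, end, percent methylation, count methylated, count unmethylated.

# Keep positions whose total coverage (methylated + unmethylated) is >= min_cov.
filter_cov() {
  local cov_file="$1" min_cov="${2:-3}"
  awk -v min="${min_cov}" '($5 + $6) >= min' "${cov_file}"
}

# Report each covered position that overlaps at least one DMR (-u = report once).
# Caveat: Bismark .cov rows use 1-based positions with start == end, while
# bedtools reads BED as 0-based, half-open; converting start to start - 1
# first may be safer.
dmr_positions() {
  local filtered_cov="$1" dmr_bed="$2"
  bedtools intersect -a "${filtered_cov}" -b "${dmr_bed}" -u
}

# Count covered positions per DMR in one sample (-c = number of hits in B).
# A DMR with a count of 0 in every sample of one group was likely called
# from missing coverage rather than a real methylation difference.
covered_per_dmr() {
  local dmr_bed="$1" filtered_cov="$2"
  bedtools intersect -a "${dmr_bed}" -b "${filtered_cov}" -c
}
```

Running `covered_per_dmr` once per sample and pasting the count columns side by side would flag DMRs like the one above, where all four Super.low-low samples have zero covered positions.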

from shellytrigg https://ift.tt/2YXANxB
via IFTTT

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs Pgenerosa_v070 with MUMmer Promer on Mox

As part of our ongoing effort to improve the geoduck genome annotation, I’m trying to figure out why Scaffold 1 of our assembly doesn’t have any annotations. To that end, I’ve decided to perform a series of genome comparisons, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein-level comparisons), which is designed for exactly this type of comparison.

Basically, promer translates both the query genome assembly (Pgenerosa_v074 in this case) and the reference genome in all six reading frames, aligns them at the amino-acid level, and reports where the sequences are similar or different. This should provide further insight into what is happening in Pgenerosa_v074 Scaffold 1 when it is compared with other genome assemblies, including those of related (or more distant) species.
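To make "all six reading frames" concrete: a DNA sequence can be read in three frames on the forward strand (starting at offsets 0, 1 and 2) and three on its reverse complement. The toy sketch below only splits a made-up sequence into codons for each frame; it does not translate or align anything, which promer does internally, and the frame labels are just one common convention:

```shell
#!/usr/bin/env bash
# Toy illustration of the six reading frames of a DNA sequence (no translation).

# Reverse complement: reverse the string, then swap A<->T and C<->G.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Print the complete codons of a sequence starting at a given offset (0, 1 or 2).
codons_in_frame() {
  local seq="$1" offset="$2" i out=""
  for (( i = offset; i + 3 <= ${#seq}; i += 3 )); do
    out+="${seq:i:3} "
  done
  printf '%s\n' "${out% }"
}

# Frames +1..+3 come from the forward strand, -1..-3 from the reverse complement.
six_frames() {
  local fwd="$1" rc offset
  rc="$(revcomp "${fwd}")"
  for offset in 0 1 2; do
    printf '+%d: %s\n' "$((offset + 1))" "$(codons_in_frame "${fwd}" "${offset}")"
  done
  for offset in 0 1 2; do
    printf -- '-%d: %s\n' "$((offset + 1))" "$(codons_in_frame "${rc}" "${offset}")"
  done
}

six_frames "ATGGCTTAA"
# +1: ATG GCT TAA
# +2: TGG CTT
# +3: GGC TTA
# -1: TTA AGC CAT
# -2: TAA GCC
# -3: AAG CCA
```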

I’ve previously run nucmer (for nucleotide comparisons), and I thought it would be a good idea to also compare these sequences at the protein level. Because the genetic code is redundant (synonymous codon changes alter the DNA but not the protein), sequences are usually more conserved at the protein level than at the nucleotide level, which increases the chance of finding alignments across these different species.
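A small illustration of why a protein-level comparison can find matches that a nucleotide comparison misses: synonymous codons change the DNA without changing the encoded amino acids. The sketch uses only the handful of standard-genetic-code codons it needs (a partial table, not a general translator) and two made-up sequences; it needs bash 4+ for the associative array:

```shell
#!/usr/bin/env bash
# Partial standard genetic code: only the codons used in this toy example.
declare -A codon_table=(
  [ATG]=M [GCT]=A [GCC]=A [AAA]=K [AAG]=K [CTG]=L [TTA]=L [TAA]='*'
)

# Translate a sequence in frame +1; codons missing from the table become "X".
translate() {
  local seq="$1" i codon protein=""
  for (( i = 0; i + 3 <= ${#seq}; i += 3 )); do
    codon="${seq:i:3}"
    protein+="${codon_table[$codon]:-X}"
  done
  printf '%s\n' "${protein}"
}

# Count positions at which two equal-length strings differ.
mismatches() {
  local a="$1" b="$2" i n=0
  for (( i = 0; i < ${#a}; i++ )); do
    [[ "${a:i:1}" != "${b:i:1}" ]] && n=$((n + 1))
  done
  printf '%d\n' "${n}"
}

seq1="ATGGCTAAACTGTAA"
seq2="ATGGCCAAGTTATAA"
echo "nucleotide mismatches: $(mismatches "${seq1}" "${seq2}")"
echo "protein 1: $(translate "${seq1}")"
echo "protein 2: $(translate "${seq2}")"
# nucleotide mismatches: 4
# protein 1: MAKL*
# protein 2: MAKL*
```

Four of fifteen nucleotides differ, yet both sequences encode the same peptide, which is the kind of similarity promer can pick up and nucmer may not.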

INPUT FILES:

Query:

Reference genome (Panopea generosa – Pacific geoduck):

This was run using MUMmer v3.23 on Mox using the SBATCH script below.

NOTE: The previous nucleotide comparison was performed with MUMmer4 nucmer; however, MUMmer4 promer threw an error every time I tried to run it (even after attempting to patch the promer.pl script), which is why MUMmer 3.23 was used here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=promer_pgen074_vs_pgen070
    ## Allocation Definition
    #SBATCH --account=coenv
    #SBATCH --partition=coenv
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ## Turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_pgen-v074_pgen-v070

    # Exit if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Load Open MPI module for parallel, multi-node processing
    module load icc_19-ompi_3.1.2

    # SegFault fix?
    export THREADS_DAEMON_MODEL=1

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    ### Set variables

    # Filename prefix
    prefix="pgen-v074_pgen-v074"
    pga1_coords="PGA_scaffold1.coords.txt"

    # Program paths
    promer="/gscratch/srlab/programs/MUMmer3.23/promer"
    show_coords="/gscratch/srlab/programs/MUMmer3.23/show-coords"

    # P.generosa Pgenerosa_v070 FastA
    pgen_v070_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v070.fa"

    # P.generosa Pgenerosa_v074 FastA
    pgen_v074_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"

    ### Run MUMmer (promer)
    # Compares pgen_v074 (query) to pgen_v070 (reference)
    "${promer}" \
      -p "${prefix}" \
      "${pgen_v070_fasta}" \
      "${pgen_v074_fasta}"

    # Parse promer delta output into a more user-friendly format
    # -b useful for synteny - merges overlapping alignments
    # -c show percent coverage info
    # -q option sorts by query
    "${show_coords}" \
      -b \
      -c \
      -q \
      "${prefix}".delta \
      > "${prefix}".coords.txt

    # Parse out PGA_scaffold1__77_contigs__length_89643857
    # (keep the 5-line show-coords header, then append every line for this scaffold)
    head -n 5 "${prefix}".coords.txt > "${pga1_coords}"
    grep "PGA_scaffold1__77_contigs__length_89643857" "${prefix}".coords.txt >> "${pga1_coords}"
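Once `PGA_scaffold1.coords.txt` exists, a quick way to see which reference sequences Scaffold 1 aligns to is to tally its alignment rows per reference ID. This is a minimal sketch, assuming the default (non-tab-delimited) show-coords layout, in which the first 5 lines are a header and the last two whitespace-separated fields of each alignment row are the reference and query IDs; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Tally alignment rows per reference sequence for one query in a show-coords file.
# Assumes: 5 header lines; last two fields of each row = reference ID, query ID.
hits_per_reference() {
  local coords_file="$1" query_id="$2"
  awk -v q="${query_id}" '
    NR > 5 && $NF == q { n[$(NF - 1)]++ }
    END { for (ref in n) print n[ref] "\t" ref }
  ' "${coords_file}" | sort -k1,1nr
}
```

For example, `hits_per_reference PGA_scaffold1.coords.txt PGA_scaffold1__77_contigs__length_89643857` would list each reference sequence with its number of Scaffold 1 alignment rows, most frequent first.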

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs S.glomerata NCBI with MUMmer Promer on Mox

As part of our ongoing effort to improve the geoduck genome annotation, I’m trying to figure out why Scaffold 1 of our assembly doesn’t have any annotations. To that end, I’ve decided to perform a series of genome comparisons, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein-level comparisons), which is designed for exactly this type of comparison.

Basically, promer translates both the query genome assembly (Pgenerosa_v074 in this case) and the reference genome in all six reading frames, aligns them at the amino-acid level, and reports where the sequences are similar or different. This should provide further insight into what is happening in Pgenerosa_v074 Scaffold 1 when it is compared with other genome assemblies, including those of related (or more distant) species.

I’ve previously run nucmer (for nucleotide comparisons), and I thought it would be a good idea to also compare these sequences at the protein level. Because the genetic code is redundant (synonymous codon changes alter the DNA but not the protein), sequences are usually more conserved at the protein level than at the nucleotide level, which increases the chance of finding alignments across these different species.

INPUT FILES:

Query:

Reference genome (Saccostrea glomerata – Sydney rock oyster):

This was run using MUMmer v3.23 on Mox using the SBATCH script below.

NOTE: The previous nucleotide comparison was performed with MUMmer4 nucmer; however, MUMmer4 promer threw an error every time I tried to run it (even after attempting to patch the promer.pl script), which is why MUMmer 3.23 was used here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=promer_pgen074_vs_sglo-ncbi
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ## Turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_pgen-v074_sglo-ncbi

    # Exit if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Load Open MPI module for parallel, multi-node processing
    module load icc_19-ompi_3.1.2

    # SegFault fix?
    export THREADS_DAEMON_MODEL=1

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    ### Set variables

    # Filename prefix
    prefix="pgen-v074_sglo-ncbi"
    pga1_coords="PGA_scaffold1.coords.txt"

    # Program paths
    promer="/gscratch/srlab/programs/MUMmer3.23/promer"
    show_coords="/gscratch/srlab/programs/MUMmer3.23/show-coords"

    # S.glomerata NCBI FastA
    sglo_fasta="/gscratch/srlab/sam/data/S_glomerata/genomes/GCA_003671525.1_Sgl1.0_genomic.fna"

    # P.generosa Pgenerosa_v074 FastA
    pgen_v074_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"

    ### Run MUMmer (promer)
    # Compares pgen_v074 (query) to sglo-ncbi (reference)
    "${promer}" \
      -p "${prefix}" \
      "${sglo_fasta}" \
      "${pgen_v074_fasta}"

    # Parse promer delta output into a more user-friendly format
    # -b useful for synteny - merges overlapping alignments
    # -c show percent coverage info
    # -q option sorts by query
    "${show_coords}" \
      -b \
      -c \
      -q \
      "${prefix}".delta \
      > "${prefix}".coords.txt

    # Parse out PGA_scaffold1__77_contigs__length_89643857
    # (keep the 5-line show-coords header, then append every line for this scaffold)
    head -n 5 "${prefix}".coords.txt > "${pga1_coords}"
    grep "PGA_scaffold1__77_contigs__length_89643857" "${prefix}".coords.txt >> "${pga1_coords}"
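To put a number on how much of Scaffold 1 is covered by promer alignments against a given reference, the query intervals for Scaffold 1 can be merged, summed, and divided by the scaffold length (89,643,857 bp, as encoded in the scaffold name). This is a minimal sketch, assuming the default show-coords row layout (`[S1] [E1] | [S2] [E2] | ...`), so that the query start and end are the 4th and 5th whitespace-separated fields and the query ID is the last field; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Fraction of a query scaffold covered by alignments in a show-coords file.
# Assumes default show-coords rows (S1 E1 | S2 E2 | ... with the query ID last)
# and a scaffold name ending in "_length_<bp>", as in the Pgenerosa_v074 names.
query_coverage_fraction() {
  local coords_file="$1" query_id="$2"
  local scaffold_len="${query_id##*_length_}"
  awk -v q="${query_id}" 'NR > 5 && $NF == q {
      s = $4; e = $5
      if (s > e) { t = s; s = e; e = t }   # reverse-frame hits have S2 > E2
      print s "\t" e
    }' "${coords_file}" |
    sort -k1,1n -k2,2n |
    awk -v len="${scaffold_len}" '
      # Merge overlapping or adjacent intervals and sum the covered bases
      NR == 1 { cs = $1; ce = $2; next }
      $1 <= ce + 1 { if ($2 > ce) ce = $2; next }
      { total += ce - cs + 1; cs = $1; ce = $2 }
      END {
        if (len + 0 <= 0) { print "NA"; exit 1 }   # length not parsed from name
        if (NR > 0) total += ce - cs + 1
        printf "%.6f\n", total / len
      }'
}
```

For example, `query_coverage_fraction PGA_scaffold1.coords.txt PGA_scaffold1__77_contigs__length_89643857` prints the covered fraction of Scaffold 1 for that comparison.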

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs M.yessoensis NCBI with MUMmer Promer on Mox

As part of our ongoing effort to improve the geoduck genome annotation, I’m trying to figure out why Scaffold 1 of our assembly doesn’t have any annotations. To that end, I’ve decided to perform a series of genome comparisons, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein-level comparisons), which is designed for exactly this type of comparison.

Basically, promer translates both the query genome assembly (Pgenerosa_v074 in this case) and the reference genome in all six reading frames, aligns them at the amino-acid level, and reports where the sequences are similar or different. This should provide further insight into what is happening in Pgenerosa_v074 Scaffold 1 when it is compared with other genome assemblies, including those of related (or more distant) species.

I’ve previously run nucmer (for nucleotide comparisons), and I thought it would be a good idea to also compare these sequences at the protein level. Because the genetic code is redundant (synonymous codon changes alter the DNA but not the protein), sequences are usually more conserved at the protein level than at the nucleotide level, which increases the chance of finding alignments across these different species.

INPUT FILES:

Query:

Reference genome (Mizuhopecten yessoensis – Yesso scallop):

This was run using MUMmer v3.23 on Mox using the SBATCH script below.

NOTE: The previous nucleotide comparison was performed with MUMmer4 nucmer; however, MUMmer4 promer threw an error every time I tried to run it (even after attempting to patch the promer.pl script), which is why MUMmer 3.23 was used here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=promer_pgen074_vs_myes-ncbi
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ## Turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_pgen-v074_myes-ncbi

    # Exit if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Load Open MPI module for parallel, multi-node processing
    module load icc_19-ompi_3.1.2

    # SegFault fix?
    export THREADS_DAEMON_MODEL=1

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    ### Set variables

    # Filename prefix
    prefix="pgen-v074_myes-ncbi"
    pga1_coords="PGA_scaffold1.coords.txt"

    # Program paths
    promer="/gscratch/srlab/programs/MUMmer3.23/promer"
    show_coords="/gscratch/srlab/programs/MUMmer3.23/show-coords"

    # M.yessoensis NCBI FastA
    myes_fasta="/gscratch/srlab/sam/data/M_yessoensis/genomes/GCA_002113885.2_ASM211388v2_genomic.fna"

    # P.generosa Pgenerosa_v074 FastA
    pgen_v074_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"

    ### Run MUMmer (promer)
    # Compares pgen_v074 (query) to myes-ncbi (reference)
    "${promer}" \
      -p "${prefix}" \
      "${myes_fasta}" \
      "${pgen_v074_fasta}"

    # Parse promer delta output into a more user-friendly format
    # -b useful for synteny - merges overlapping alignments
    # -c show percent coverage info
    # -q option sorts by query
    "${show_coords}" \
      -b \
      -c \
      -q \
      "${prefix}".delta \
      > "${prefix}".coords.txt

    # Parse out PGA_scaffold1__77_contigs__length_89643857
    # (keep the 5-line show-coords header, then append every line for this scaffold)
    head -n 5 "${prefix}".coords.txt > "${pga1_coords}"
    grep "PGA_scaffold1__77_contigs__length_89643857" "${prefix}".coords.txt >> "${pga1_coords}"
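A related check is whether Scaffold 1 is unusual at all, i.e. which Pgenerosa_v074 sequences have no promer alignments against the reference. This is a minimal sketch, assuming the sequence ID is the first word after `>` in each FASTA header and, as in the default show-coords layout, that the query ID is the last field of each alignment row after a 5-line header; the function name is hypothetical, and it should be run on the full `${prefix}.coords.txt`, not the Scaffold 1 subset:

```shell
#!/usr/bin/env bash
# List query FASTA sequences that never appear as the query in a show-coords file.
queries_without_hits() {
  local query_fasta="$1" coords_file="$2"
  LC_ALL=C comm -23 \
    <(awk '/^>/ { sub(/^>/, "", $1); print $1 }' "${query_fasta}" | LC_ALL=C sort -u) \
    <(awk 'NR > 5 && NF > 0 { print $NF }' "${coords_file}" | LC_ALL=C sort -u)
  # comm -23 keeps IDs found only in the FASTA list (column 1)
}
```

For example, `queries_without_hits Pgenerosa_v074.fa pgen-v074_myes-ncbi.coords.txt` would print the query scaffolds with no alignments to the M. yessoensis assembly.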

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs H.sapiens NCBI with MUMmer Promer on Mox

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs C.virginica NCBI with MUMmer Promer on Mox

As part of our ongoing effort to improve the geoduck genome annotation, I’m trying to figure out why Scaffold 1 of our assembly doesn’t have any annotations. To that end, I’ve decided to perform a series of genome comparisons, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein-level comparisons), which is designed for exactly this type of comparison.

Basically, promer translates both the query genome assembly (Pgenerosa_v074 in this case) and the reference genome in all six reading frames, aligns them at the amino-acid level, and reports where the sequences are similar or different. This should provide further insight into what is happening in Pgenerosa_v074 Scaffold 1 when it is compared with other genome assemblies, including those of related (or more distant) species.

I’ve previously run nucmer (for nucleotide comparisons), and I thought it would be a good idea to also compare these sequences at the protein level. Because the genetic code is redundant (synonymous codon changes alter the DNA but not the protein), sequences are usually more conserved at the protein level than at the nucleotide level, which increases the chance of finding alignments across these different species.

INPUT FILES:

Query:

Reference genome (Crassostrea virginica – Eastern oyster):

This was run using MUMmer v3.23 on Mox using the SBATCH script below.

NOTE: The previous nucleotide comparison was performed with MUMmer4 nucmer; however, MUMmer4 promer threw an error every time I tried to run it (even after attempting to patch the promer.pl script), which is why MUMmer 3.23 was used here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=promer_pgen074_cvir-ncbi
    ## Allocation Definition
    #SBATCH --account=coenv
    #SBATCH --partition=coenv
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ## Turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_pgen-v074_cvir-ncbi

    # Exit if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Load Open MPI module for parallel, multi-node processing
    module load icc_19-ompi_3.1.2

    # SegFault fix?
    export THREADS_DAEMON_MODEL=1

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    ### Set variables

    # Filename prefix
    prefix="pgen-v074_cvir-ncbi"
    pga1_coords="PGA_scaffold1.coords.txt"

    # Program paths
    promer="/gscratch/srlab/programs/MUMmer3.23/promer"
    show_coords="/gscratch/srlab/programs/MUMmer3.23/show-coords"

    # C.virginica NCBI FastA
    cvir_fasta="/gscratch/srlab/sam/data/C_virginica/genomes/GCF_002022765.2_C_virginica-3.0/GCF_002022765.2_C_virginica-3.0_genomic.fa"

    # P.generosa Pgenerosa_v074 FastA
    pgen_v074_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"

    ### Run MUMmer (promer)
    # Compares pgen_v074 (query) to cvir-ncbi (reference)
    "${promer}" \
      -p "${prefix}" \
      "${cvir_fasta}" \
      "${pgen_v074_fasta}"

    # Parse promer delta output into a more user-friendly format
    # -b useful for synteny - merges overlapping alignments
    # -c show percent coverage info
    # -q option sorts by query
    "${show_coords}" \
      -b \
      -c \
      -q \
      "${prefix}".delta \
      > "${prefix}".coords.txt

    # Parse out PGA_scaffold1__77_contigs__length_89643857
    # (keep the 5-line show-coords header, then append every line for this scaffold)
    head -n 5 "${prefix}".coords.txt > "${pga1_coords}"
    grep "PGA_scaffold1__77_contigs__length_89643857" "${prefix}".coords.txt >> "${pga1_coords}"
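Because each run writes its Scaffold 1 subset to a file with the same name (`PGA_scaffold1.coords.txt`) in its own output directory, the number of Scaffold 1 alignment rows can be compared across all the reference genomes in one pass. This is a minimal sketch, assuming the output directories sit side by side under one parent directory and that each `PGA_scaffold1.coords.txt` starts with the 5 header lines copied by `head -n 5`; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Count Scaffold 1 alignment rows (total lines minus the 5 header lines)
# in each comparison's PGA_scaffold1.coords.txt, one line per output directory.
scaffold1_rows_per_run() {
  local outputs_dir="$1" f rows
  for f in "${outputs_dir}"/*/PGA_scaffold1.coords.txt; do
    [[ -e "${f}" ]] || continue   # skip if the glob matched nothing
    rows=$(( $(wc -l < "${f}") - 5 ))
    printf '%s\t%d\n' "$(basename "$(dirname "${f}")")" "${rows}"
  done
}
```

For example, `scaffold1_rows_per_run /gscratch/scrubbed/samwhite/outputs` would print one line per output directory (e.g. `20190813_pgen_mummer_promer_pgen-v074_cvir-ncbi`) with its Scaffold 1 row count.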

Sam’s Notebook: Genome Comparison – Pgenerosa_v074 vs C.gigas NCBI with MUMmer Promer on Mox

As part of our ongoing effort to improve the geoduck genome annotation, I’m trying to figure out why Scaffold 1 of our assembly doesn’t have any annotations. To that end, I’ve decided to perform a series of genome comparisons, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein-level comparisons), which is designed for exactly this type of comparison.

Basically, promer translates both the query genome assembly (Pgenerosa_v074 in this case) and the reference genome in all six reading frames, aligns them at the amino-acid level, and reports where the sequences are similar or different. This should provide further insight into what is happening in Pgenerosa_v074 Scaffold 1 when it is compared with other genome assemblies, including those of related (or more distant) species.

I’ve previously run nucmer (for nucleotide comparisons), and I thought it would be a good idea to also compare these sequences at the protein level. Because the genetic code is redundant (synonymous codon changes alter the DNA but not the protein), sequences are usually more conserved at the protein level than at the nucleotide level, which increases the chance of finding alignments across these different species.

INPUT FILES:

Query:

Reference genome (Crassostrea gigas – Pacific oyster):

This was run using MUMmer v3.23 on Mox using the SBATCH script below.

NOTE: The previous nucleotide comparison was performed with MUMmer4 nucmer; however, MUMmer4 promer threw an error every time I tried to run it (even after attempting to patch the promer.pl script), which is why MUMmer 3.23 was used here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=promer_pgen074_vs_cgig-ncbi
    ## Allocation Definition
    #SBATCH --account=coenv
    #SBATCH --partition=coenv
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ## Turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_pgen-v074_cgig-ncbi

    # Exit if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Load Open MPI module for parallel, multi-node processing
    module load icc_19-ompi_3.1.2

    # SegFault fix?
    export THREADS_DAEMON_MODEL=1

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    ### Set variables

    # Filename prefix
    prefix="pgen-v074_cgig-ncbi"
    pga1_coords="PGA_scaffold1.coords.txt"

    # Program paths
    promer="/gscratch/srlab/programs/MUMmer3.23/promer"
    show_coords="/gscratch/srlab/programs/MUMmer3.23/show-coords"

    # C.gigas NCBI FastA
    cgig_fasta="/gscratch/srlab/sam/data/C_gigas/genomes/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa"

    # P.generosa Pgenerosa_v074 FastA
    pgen_v074_fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"

    ### Run MUMmer (promer)
    # Compares pgen_v074 (query) to cgig-ncbi (reference)
    "${promer}" \
      -p "${prefix}" \
      "${cgig_fasta}" \
      "${pgen_v074_fasta}"

    # Parse promer delta output into a more user-friendly format
    # -b useful for synteny - merges overlapping alignments
    # -c show percent coverage info
    # -q option sorts by query
    "${show_coords}" \
      -b \
      -c \
      -q \
      "${prefix}".delta \
      > "${prefix}".coords.txt

    # Parse out PGA_scaffold1__77_contigs__length_89643857
    # (keep the 5-line show-coords header, then append every line for this scaffold)
    head -n 5 "${prefix}".coords.txt > "${pga1_coords}"
    grep "PGA_scaffold1__77_contigs__length_89643857" "${prefix}".coords.txt >> "${pga1_coords}"
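The five SBATCH scripts in these posts differ only in job name, account/partition, output directory, prefix, and reference FASTA, so they could be generated from one template. This is a minimal sketch, assuming the same Mox paths and MUMmer 3.23 install as above; `make_promer_job` is a hypothetical helper, the module, e-mail, and PATH-logging lines are omitted for brevity, and the generated script would still be submitted with `sbatch`:

```shell
#!/usr/bin/env bash
# Write an SBATCH script for one promer comparison (Pgenerosa_v074 as the query)
# and print its file name. Values are expanded when the script is generated.
make_promer_job() {
  local tag="$1" account="$2" ref_fasta="$3"
  local prefix="pgen-v074_${tag}"
  local outdir="/gscratch/scrubbed/samwhite/outputs/20190813_pgen_mummer_promer_${prefix}"
  local mummer="/gscratch/srlab/programs/MUMmer3.23"
  local query="/gscratch/srlab/sam/data/P_generosa/genomes/Pgenerosa_v074.fa"
  local scaffold="PGA_scaffold1__77_contigs__length_89643857"

  cat > "promer_${tag}.sh" <<EOF
#!/bin/bash
#SBATCH --job-name=promer_pgen074_vs_${tag}
#SBATCH --account=${account}
#SBATCH --partition=${account}
#SBATCH --nodes=1
#SBATCH --time=5-00:00:00
#SBATCH --mem=120G
#SBATCH --chdir=${outdir}
set -e
"${mummer}/promer" -p "${prefix}" "${ref_fasta}" "${query}"
"${mummer}/show-coords" -b -c -q "${prefix}.delta" > "${prefix}.coords.txt"
head -n 5 "${prefix}.coords.txt" > PGA_scaffold1.coords.txt
grep "${scaffold}" "${prefix}.coords.txt" >> PGA_scaffold1.coords.txt
EOF
  printf '%s\n' "promer_${tag}.sh"
}
```

For example, `make_promer_job cgig-ncbi coenv /gscratch/srlab/sam/data/C_gigas/genomes/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa` writes `promer_cgig-ncbi.sh`, which could then be submitted with `sbatch promer_cgig-ncbi.sh`.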