Sam’s Notebook:Read Mapping – Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2


We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.

So, we wanted to get a quick idea of how well our geoduck assemblies are by performing some quick alignments using Bowtie2.

Used the following assemblies as references:

  • sn_ph_01 : SuperNova assembly of 10x Genomics data
  • sparse_03 : SparseAssembler assembly of BGI and Illumina project data
  • pga_02 : Hi-C assembly of Phase Genomics data

The analysis is documented in a Jupyter Notebook.

Jupyter Notebook (GitHub):

NOTE: Due to large amount of stdout from first genome index command, the notebook does not render well on GitHub. I recommend downloading and opening notebook on a locally install version of Jupyter.

Here’s a brief overview of the process:

  1. Generate Bowtie2 indexes for each of the genome assemblies.
  2. Map 1,000,000 reads from the following Illumina NovaSeq FastQ files: