Sam’s Notebook: Metagenomics – Taxonomic Diversity from Geoduck Water with BLASTp and Krona plots

We’re working on writing up the metagenomics sequencing project as a manuscript, and Steven asked me to provide an overview of the taxonomic makeup of our metagenome assembly in this GitHub Issue.

I previously assembled all of the sequencing data into a single assembly (i.e. did not assemble by experimental treatment):

Subsequently, I ran some gene prediction software to refine the assembly into a more conservative representation, in hopes of getting a more realistic view of biologically relevant DNA (i.e. analyzing sequenced DNA that actually has putative functions, as opposed to random eDNA that may have been floating around in the water):

For getting taxonomic info, I took the MetaGeneMark proteins FastA file and ran BLASTp against the NCBI SwissProt database (v5) to get taxonomic IDs. See this Jupyter Notebook (GitHub):
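As a rough illustration of the "BLASTp hits to taxonomic IDs" step, here is a minimal Python sketch that keeps the lowest e-value hit for each query and records its taxonomic ID. It is not the code from the notebook. It assumes a tab-delimited BLAST output with four columns in the order qseqid, sseqid, evalue, staxids; that column layout, the function name `best_hit_taxids`, and the example rows are all hypothetical.

```python
import csv
import io


def best_hit_taxids(handle):
    """Map each query ID to the taxonomic ID of its lowest e-value hit.

    Assumes (hypothetically) tab-delimited rows of:
    qseqid, sseqid, evalue, staxids
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        qseqid, _sseqid, evalue, staxids = row[:4]
        evalue = float(evalue)
        # A subject can carry several taxonomic IDs separated by ';'.
        # Keeping only the first one is a simplification.
        taxid = staxids.split(";")[0]
        if qseqid not in best or evalue < best[qseqid][0]:
            best[qseqid] = (evalue, taxid)
    return {query: taxid for query, (_evalue, taxid) in best.items()}


# Made-up rows for illustration only (not real results).
example = io.StringIO(
    "gene_1\tsubject_A\t1e-50\t562\n"
    "gene_1\tsubject_B\t1e-10\t9606\n"
    "gene_2\tsubject_C\t0.0\t6573;6574\n"
)
hits = best_hit_taxids(example)
```

Keeping a single best hit per predicted protein is one possible choice; limiting BLAST itself to one target sequence per query would be another way to get a similar result.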

I then used Krona to plot the data interactively, according to NCBI taxonomic ID abundance (see Results below).
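To make "taxonomic ID abundance" concrete, here is a small Python sketch that tallies how many times each taxonomic ID appears and writes the tallies as tab-delimited text. It is only an assumption about how such a table might be built, not the method used for the plots; the function names and the example IDs are hypothetical, and the column order Krona expects would need to be checked against its own documentation.

```python
from collections import Counter


def taxid_abundance(taxids):
    """Count occurrences of each taxonomic ID, most abundant first."""
    return Counter(taxids).most_common()


def to_tab_delimited(counts):
    """Render (taxid, count) pairs as 'taxid<TAB>count' lines."""
    return "".join(f"{taxid}\t{count}\n" for taxid, count in counts)


# Made-up taxonomic IDs for illustration only (not real results).
example_taxids = ["562", "562", "562", "9606", "6573", "6573"]
counts = taxid_abundance(example_taxids)
table = to_tab_delimited(counts)
```

A table like this summarizes one row per taxonomic ID, which is the kind of abundance input a taxonomy-based plot is built from.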