Ran BUSCO on our completed annotation of the [P.generosa v071 genome (GFF)](https://ift.tt/2GYMI4Jgeoduck_maker_genome_annotation/Pgenerosa_v071_genome_snap02.all.renamed.putative_function.domain_added.gff) (subset of sequences >10kbp). See this notebook entry for genome annotation info. BUSCO provides a nice metric of how “complete” a genome assembly (or transcriptome) is. Additionally, BUSCO is tied in with Augustus for gene prediction and generates _ab initio_ gene models. That said, since I just want to evaluate the completeness of this particular genome assembly, I’ll be using the annotated genome generated through two rounds of SNAP gene prediction. Otherwise, I’d use the initial MAKER annotations to generate an Augustus gene model that could be used in conjunction with the SNAP models (I’ll likely do this at a later date).
Firstly, I needed a FastA as input for BUSCO, so I extracted the FastA from the GFF with the following script:
    #!/bin/env bash
    # Script to extract FastA sequences from GFF3 (specifically, those produced by MAKER)
    # User needs to set GFF path and desired output file name
    #
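For reference, here is a minimal sketch of one way to do this kind of extraction. It is not necessarily what the script above does. It assumes the GFF3 carries its sequences after a `##FASTA` directive line (which the GFF3 format allows). The function name `extract_fasta_from_gff` and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: pull the embedded FastA section out of a GFF3 file.
# Assumption: sequences follow a "##FASTA" directive line in the GFF3.
# The function name and its arguments are hypothetical, for illustration.

extract_fasta_from_gff() {
  local gff="$1"   # path to input GFF3
  local out="$2"   # path to output FastA

  # Print every line that comes after the first "##FASTA" directive;
  # the directive line itself is not printed.
  awk 'found { print } /^##FASTA/ { found = 1 }' "${gff}" > "${out}"
}
```

Usage would look something like `extract_fasta_from_gff input.gff output.fa`, followed by a quick `grep -c '^>' output.fa` to check that the number of sequences is what you expect.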