Steven asked me to subset Pgenerosa_v070.fa (2.1GB) in this GitHub Issue #705. In that issue, it was determined that a significant portion of the sequencing data assembled by Phase Genomics clustered in "scaffolds" 1 – 18. As such, Steven asked for a subset containing just those 18 scaffolds.
This was done using the samtools faidx program.
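For readers who want a feel for what that step looks like, here is a minimal sketch in Python that builds and runs a `samtools faidx` call for a list of sequence names. It is an illustration only, not the command from the notebook: the scaffold names (`scaffold1` through `scaffold18`), the output filename, and the helper functions are all hypothetical, and the real FASTA headers may well be different. It assumes samtools is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_faidx_command(fasta, scaffold_names):
    """Return the argument list for `samtools faidx <fasta> <name> [<name> ...]`."""
    if not scaffold_names:
        raise ValueError("at least one scaffold name is required")
    return ["samtools", "faidx", str(fasta), *scaffold_names]


def subset_fasta(fasta, scaffold_names, out_fasta):
    """Write the requested sequences to out_fasta (needs samtools on PATH)."""
    cmd = build_faidx_command(fasta, scaffold_names)
    with open(out_fasta, "w") as handle:
        # samtools faidx prints the extracted records to stdout,
        # so stdout is redirected into the output FASTA.
        subprocess.run(cmd, stdout=handle, check=True)
    return Path(out_fasta)


# Hypothetical names: check the actual headers in the FASTA before using these.
names = [f"scaffold{i}" for i in range(1, 19)]
command = build_faidx_command("Pgenerosa_v070.fa", names)
print(" ".join(command))

# To actually run it (output filename is a placeholder):
# subset_fasta("Pgenerosa_v070.fa", names, "subset_18_scaffolds.fa")
```

samtools faidx creates a `.fai` index next to the FASTA if one does not already exist, then uses it to pull out only the named sequences, which avoids reading the full 2.1GB file into memory.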
The process is documented in the following Jupyter Notebook (GitHub):