Sam’s Notebook:Assembly – Geoduck Hi-C Assembly Subsetting

0000-0002-2747-368X

Steven asked me to create a couple of subsets of our Phase Genomics Hi-C geoduck genome assembly:

  • Contigs ≥10kbp
  • Contigs ≥30kbp

I used the faidx command from pyfaidx, giving each subset a minimum contig length and a large upper bound (100Mbp):

 faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta
 faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta

Results:

Output folder: 20180512_geoduck_fasta_subsets/

10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta

30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta
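
One way to sanity-check subsets like these is to confirm that the shortest contig in each output file meets the intended cutoff. The sketch below is not part of the original workflow; the function names fasta_lengths and min_length are hypothetical helpers written for illustration, and it assumes a POSIX shell with awk and plain (uncompressed) FastA input.

```shell
# Hypothetical helper: print "name<TAB>length" for every record in a FastA file.
# The name is the header text up to the first whitespace, without the ">".
fasta_lengths() {
  awk '
    /^>/ {
      if (seen) print name "\t" len   # flush the previous record
      name = substr($1, 2)
      len = 0
      seen = 1
      next
    }
    { len += length($0) }             # sequence may span several lines
    END { if (seen) print name "\t" len }
  ' "$1"
}

# Hypothetical helper: print the length of the shortest record in a FastA file.
min_length() {
  fasta_lengths "$1" | awk -F'\t' '
    NR == 1 || ($2 + 0) < min { min = $2 + 0 }
    END { if (NR > 0) print min }
  '
}
```

Running `min_length PGA_assembly_10k_plus.fasta` should then report a value no smaller than 10000, and the 30k file a value no smaller than 30000, if the size filter behaved as intended.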

from Sam’s Notebook https://ift.tt/2IhZSHg