Sam’s Notebook: Data Wrangling – Renaming, Splitting, and Feature Counts of Updated Pgenerosa_v074 GenSAS Merged GFF

In the final GFF from our GenSAS Pgenerosa_v074.a4 annotation, we noticed that no repeat motifs/sequences were identified on Scaffold 01. The remaining scaffolds all had repeat motifs present, so something seemed amiss (see this GitHub Issue for more info).

I ended up contacting GenSAS and it turned out there was a bug on their end that led to this issue:

Taein Lee Nov 26, 2019, 7:27 PM (8 days ago) to me, jhumann

Hi Sam,

Thank you so much for your report. There was a bug and it has been fixed. Your gff3 files has been re-generated.

-Taein

From: gensas-admin on behalf of sam white
Sent: Tuesday, November 26, 2019 3:45 PM
To: gensas-admin; jhumann; taein_lee
Subject: [Website feedback] Merged GFF missing repeats on only one chromosome

Sam ( sent a message using the contact form at


I generated a merged GFF after I “published” my annotation. I included RepeatModeler features in the merged GFF.

My genome has 18 chromosomes. All of them except one chromosome (name: PGA_scaffold1__77_contigs__length_89643857) has the expected repeats annotations present.

I looked at the individual RepeatMasker and RepeatModeler jobs, and both of those GFFs identified repeats on PGA_scaffold1__77_contigs__length_89643857.

Would you happen to have any ideas on why PGA_scaffold1__77_contigs__length_89643857 isn’t showing any repeat features in the merged GFF?

This is for my project Pgenerosa_v074.

Thanks for any insight!


So, now that I have the updated, final GFF, I want to re-run the GFF splitting into separate feature files, as well as counts and sequence length stats for all features (including repeats).
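As a rough illustration of what the splitting and counting involves, here is a minimal Python sketch. It is not the code from the notebook: the function names, the example feature types (`gene`, `repeat_region`) and the example coordinates are all made up for illustration. It only assumes standard 9-column, tab-delimited GFF3, where column 3 is the feature type and columns 4 and 5 are the 1-based, inclusive start and end.

```python
from collections import defaultdict


def group_by_feature(gff_lines):
    """Group GFF3 records by feature type (column 3)."""
    groups = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip anything that is not a 9-column feature record.
        if len(fields) != 9:
            continue
        groups[fields[2]].append(fields)
    return dict(groups)


def feature_stats(groups):
    """Count features and summarize their lengths, per feature type."""
    stats = {}
    for feature, records in groups.items():
        # GFF3 coordinates are 1-based and inclusive, hence the +1.
        lengths = [int(r[4]) - int(r[3]) + 1 for r in records]
        stats[feature] = {
            "count": len(lengths),
            "total_bp": sum(lengths),
            "mean_bp": sum(lengths) / len(lengths),
        }
    return stats


# Hypothetical records; only the scaffold name comes from the post.
scaffold = "PGA_scaffold1__77_contigs__length_89643857"
example_lines = [
    "##gff-version 3",
    f"{scaffold}\tGenSAS\tgene\t100\t599\t.\t+\t.\tID=gene1",
    f"{scaffold}\tGenSAS\trepeat_region\t700\t749\t.\t+\t.\tID=rep1",
    f"{scaffold}\tGenSAS\trepeat_region\t900\t999\t.\t-\t.\tID=rep2",
]

stats = feature_stats(group_by_feature(example_lines))
for feature, summary in sorted(stats.items()):
    print(feature, summary)
```

To get one file per feature type, each group from `group_by_feature()` could be written back out as tab-joined lines; checking that the scaffold 1 group now contains repeat features would confirm the GenSAS fix.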

Everything is documented in this Jupyter Notebook (GitHub):

Sam’s Notebook: PCR – Crassostrea gigas and sikamea Mantle gDNA from Marinelli Shellfish Company

I ran this PCR a couple of times before and, embarrassingly, I had ordered/used the wrong primers.

Well, I ordered the correct universal cytochrome oxidase primers and used those!

SR ID   Primer Name   Sequence
1739    HC02198       taaacttcagggtgaccaaaaaatca
1738    LCO1490       ggtcaacaaatcataaagatattgg

Primers and cycling parameters were taken from this publication:

Universal cytochrome oxidase primers were from this paper:

This is a multiplex PCR: the HC02198 and LCO1490 primers should amplify any Crassostrea spp. DNA (i.e. a positive control; 697 bp), while the other two primers will amplify either C. gigas (Cgi269r; 269 bp) or C. sikamea (Csi546r; 546 bp).
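To make the expected banding pattern explicit, here is a small Python sketch of how gel bands could be turned into a species call. The band sizes are the ones listed above; the function name, the size tolerance, and the decision rules (for example, refusing to call a lane without the positive control band) are my own assumptions, not part of the published protocol.

```python
# Expected amplicon sizes (bp) from the multiplex described above.
CONTROL_BP = 697   # HC02198 + LCO1490, any Crassostrea spp.
GIGAS_BP = 269     # Cgi269r
SIKAMEA_BP = 546   # Csi546r


def call_species(band_sizes_bp, tolerance_bp=25):
    """Return a species call from estimated band sizes in one lane.

    tolerance_bp is a hypothetical allowance for how precisely a band
    can be sized against the ladder; adjust to taste.
    """
    def has_band(target_bp):
        return any(abs(b - target_bp) <= tolerance_bp for b in band_sizes_bp)

    # Assumed rule: without the positive control band, make no call.
    if not has_band(CONTROL_BP):
        return "no call (positive control band missing)"

    gigas = has_band(GIGAS_BP)
    sikamea = has_band(SIKAMEA_BP)
    if gigas and sikamea:
        return "both species-specific bands present"
    if gigas:
        return "C. gigas"
    if sikamea:
        return "C. sikamea"
    return "Crassostrea spp. (no species-specific band)"


print(call_species([700, 270]))
print(call_species([690, 550]))
print(call_species([270]))
```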

Master mix calcs:

Component             Single Rxn Vol. (uL)   Num. Rxns   Total Volumes (uL)
2x Apex Master Mix    12.5                   18          225
HC02198 (100uM)       0.15                   18          2.7
LCO1490 (100uM)       0.15                   18          2.7
COCgi269r (100uM)     0.1                    18          1.8
COCsi546r (100uM)     0.1                    18          1.8
H2O                   8                      18          144
                      25                                 Add 21uL to each PCR tube
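The master mix arithmetic can be double-checked with a few lines of Python. This is just a sketch; the variable and function names are mine. The per-reaction component volumes add up to 21 uL, which matches the "Add 21uL to each PCR tube" note. The remaining 4 uL needed to reach the 25 uL listed is presumably template, but that is an assumption, since the table doesn't say so.

```python
# Per-reaction volumes (uL) copied from the table above.
single_rxn_ul = {
    "2x Apex Master Mix": 12.5,
    "HC02198 (100uM)": 0.15,
    "LCO1490 (100uM)": 0.15,
    "COCgi269r (100uM)": 0.1,
    "COCsi546r (100uM)": 0.1,
    "H2O": 8,
}
num_rxns = 18


def master_mix(volumes_ul, n_rxns):
    """Scale each per-reaction volume by the number of reactions."""
    # Round to 2 decimals to hide floating-point noise (e.g. 0.1 * 18).
    return {name: round(vol * n_rxns, 2) for name, vol in volumes_ul.items()}


totals = master_mix(single_rxn_ul, num_rxns)
per_tube_ul = round(sum(single_rxn_ul.values()), 2)

for name, vol in totals.items():
    print(f"{name}: {vol} uL")
print(f"Master mix per tube: {per_tube_ul} uL")
```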

Cycling params:

95°C for 10 mins

30 cycles of:

  • 95°C 1 min
  • 51°C 1 min
  • 72°C 1 min

72°C 10 mins

PCR reactions were run on a 1.5% agarose, 1x low TAE gel with ethidium bromide.

Used the GeneRuler DNA Ladder Mix (ThermoFisher) for all gels:

GeneRuler DNA Ladder Mix