I previously performed this analysis using a different version of our Ostrea lurida genome assembly. Steven asked that I repeat the analysis with a modified version of the genome assembly (Olurida_v081), which contains only contigs >1000bp in length.
Genome used: Olurida_v081
I ran RepeatMasker (v4.0.7) with RepBase-20170127 and RMBlast 2.6.0 four times:
- Default settings (i.e. no species selected; RepeatMasker defaults to the human repeat library).
- Species = Crassostrea gigas (Pacific oyster)
- Species = Crassostrea virginica (Eastern oyster)
- Species = Ostrea lurida (Olympia oyster)
The idea was to get a sense of how the analyses would differ with species specifications. However, it’s likely that the only species setting that will make any difference is the second run (Crassostrea gigas).
I say this because RepeatMasker has a built-in tool to query which repeats are available for a given species in the RepBase database, e.g.:
RepeatMasker-4.0.7/util/queryRepeatDatabase.pl -species "crassostrea virginica" -stat
Here’s a very brief overview of what that yields:
- Crassostrea gigas: 792 Crassostrea gigas specific repeats
- Crassostrea virginica: 4 Crassostrea virginica specific repeats
- Ostrea lurida: 0 Ostrea lurida specific repeats
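To repeat that check for all three species in one go, something like the following loop should work. It is only a sketch: the path to queryRepeatDatabase.pl is copied from the command above and may differ on your system, and the `query_species` helper is a name made up for illustration. If the script isn't found, the loop just says so and moves on.

```shell
# Path copied from the command above; adjust for your install.
query_tool="RepeatMasker-4.0.7/util/queryRepeatDatabase.pl"

# Hypothetical helper: print RepBase repeat statistics for one species name.
query_species() {
  echo "== $1 =="
  if [ -x "$query_tool" ]; then
    "$query_tool" -species "$1" -stat
  else
    echo "queryRepeatDatabase.pl not found at $query_tool; skipping"
  fi
}

# Same three species used in the RepeatMasker runs.
for species in "crassostrea gigas" "crassostrea virginica" "ostrea lurida"; do
  query_species "$species"
done
```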
All runs were performed on roadrunner.
All commands were documented in a Jupyter Notebook (GitHub):
NOTE: RepeatMasker writes the desired output files (*.out, *.cat.gz, and *.gff) to the same directory that the genome is located in! If you conduct multiple runs with the same genome in the same directory, each run will overwrite the previous run's files, as they are named using the genome assembly filename.
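One way to sidestep the overwriting is RepeatMasker's `-dir` option, which sends the output to a directory of your choosing. The sketch below is not a copy of the commands in the notebook. The genome filename (Olurida_v081.fa), the output directory names, and the `run_repeatmasker` helper are all assumptions for illustration. It only prints the commands by default; run it with `RUN=""` set in the environment to actually execute them.

```shell
genome="Olurida_v081.fa"   # assumed FastA filename for the assembly
RUN="${RUN-echo}"          # dry run by default; set RUN="" to execute

# Hypothetical helper: one RepeatMasker run into its own output directory.
# $1 = output directory, $2 = species name (empty string = default library)
run_repeatmasker() {
  out_dir="$1"
  species="$2"
  $RUN mkdir -p "$out_dir"
  if [ -n "$species" ]; then
    $RUN RepeatMasker -species "$species" -gff -dir "$out_dir" "$genome"
  else
    $RUN RepeatMasker -gff -dir "$out_dir" "$genome"
  fi
}

# Four runs, four separate directories, so nothing gets overwritten.
run_repeatmasker "rm_default" ""
run_repeatmasker "rm_cgigas" "crassostrea gigas"
run_repeatmasker "rm_cvirginica" "crassostrea virginica"
run_repeatmasker "rm_olurida" "ostrea lurida"
```

Because each run gets its own directory, the identically named *.out, *.cat.gz, and *.gff files can sit side by side for comparison.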