Roberto’s Notebook: Problem with Stringtie IDs.

I used StringTie to estimate transcript abundance for libraries from 2 thermal-resistant (TR) and 2 thermal-susceptible (TS) oyster families exposed to an oscillatory thermal challenge for 30 days. In the merge step, the program assigned its own gene IDs (different from the C. gigas gene IDs) in the output file (/Volumes/toaser/roberto/Hisat_results/stringtie_results/stringtie_merged.gtf), although the file still carries the reference ID (CGI_10000005, for example). As Steven suggested, by searching the stringtie_merged.gtf file I found the CGI gene IDs that were missing from the gene expression table, where only 4379 of the 60643 total expressed genes had CGI IDs.
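The lookup described above can also be scripted instead of searched by hand. The following is a minimal sketch in Python, not the method used for this post. It assumes the merged GTF is tab-separated with the usual nine columns and that transcript records matching a reference gene carry a ref_gene_id attribute. The function name and the example records are made up for illustration: the first record reuses IDs from a transcript entry reported in this post (with a placeholder strand), and the second is fully hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def map_stringtie_to_ref(gtf_lines):
    """Return {StringTie gene_id: set of ref_gene_id} from GTF transcript records."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript records are inspected here
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        ref_id = attrs.get("ref_gene_id")
        if gene_id and ref_id:
            mapping[gene_id].add(ref_id)
    return dict(mapping)


# Illustrative records only. The strand "." is a placeholder, and the
# second record (scaffoldX, MSTRG.1) is hypothetical.
example_lines = [
    'scaffold1862\tStringTie\ttranscript\t618219\t640790\t1000\t.\t.\t'
    'gene_id "MSTRG.21417"; transcript_id "MSTRG.21417.4"; ref_gene_id "CGI_10021920";\n',
    'scaffoldX\tStringTie\ttranscript\t1\t500\t1000\t.\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1";\n',
]

mapping = map_stringtie_to_ref(example_lines)
for gene_id, ref_ids in sorted(mapping.items()):
    print(gene_id, ",".join(sorted(ref_ids)))
```

To run it on the real file, the list of example records could be replaced by an open file handle, for instance with open("stringtie_merged.gtf") as fh: mapping = map_stringtie_to_ref(fh). A set is kept per StringTie ID because one merged locus may overlap more than one reference gene.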
As a test, the StringTie ID “MSTRG.21417” corresponds to the DNMT1 gene, CGI_10021920 (https://www.uniprot.org/uniprot/K1QQH9). It is differentially expressed between the TR families (samples Os13, Os14, Os15, Os16, Os17 and Os18) and the TS families (samples Os1, Os2, Os3, Os10, Os11 and Os12) at day 30.
DNMT1 gene expression day 30

Expression by phenotype appears to show isoform preferences:
DNMT1 phenotype expression day 30

This is the case for isoform #2 (counting from top to bottom in the figure), located at:
scaffold1862: transcript_position: 618219-640790 / gene_id “MSTRG.21417”; transcript_id “MSTRG.21417.4”; ref_gene_id “CGI_10021920”. This isoform, with 34 exons, is more abundant in TR. This could suggest… 🙂