Bairdi Immune Genes Lit Search

Bad news: in the time since my last notebook post, I caught the novel coronavirus known as COVID-19. Perhaps you’ve heard of it – it definitely isn’t a barrel of fun. Good news: I have since recovered from the novel coronavirus, and have been rockin’ it all week! This week was spent on examining immune genes in Chionoecetes. So earlier, I created a list of genes with the GO term associated with “immune response” (that’s TK GO TERM) for each of our three transcriptomes. Again, that’s unfiltered (cbai_v2.0), Chionoecetes-only (cbai_v4.0), and Hematodinium-only (hemat_v1.6). To better understand what’s going on with the immune system of these Tanner crab, I assigned myself two goals: 1: Better understand the pathways of the crustacean immune system more broadly 2: Examine the specific genes expressed in the crab (that’s the immune genes observed in cbai_v4.0), and search for the importance of those genes in similar…

from Aidan F. Coyle https://ift.tt/316RisZ
via IFTTT

Chatting with Pam Jensen

Background: Longtime readers of Aidan’s lab notebook are no doubt familiar with Pam Jensen, the recently-retired Hematodinium expert. Formerly with NOAA, she worked with Grace to make the 2017 C. bairdi / Hematodinium transcriptome project (referred to later as the 2017 NPRB study). The vast majority of my research so far has been analyzing data collected in this project. Pam retired shortly before I joined the lab, and upon her retirement, shipped around 30,000 genetic samples to the Roberts lab. These samples are partially described in our post from this August. Most come from Alaskan snow and Tanner crab from either the Eastern Bering Sea (EBS) or Southeast surveys. However, there’s a decent smattering of samples from other species and locations. These samples were nearly all collected from 2005-2019. To get a better idea of the importance and content of the samples, and to gain additional background on the host/parasite…

from Aidan F. Coyle https://ift.tt/2ZBUwnS
via IFTTT

WGBS Analysis Part 36

Miscellaneous enrichment investigations

I have two outstanding things I want to look at before I close the book on my enrichment analysis results.

Unannotated GOterms and potential nesting

When I was creating my annotation lists, I noticed that not all GOterms from topGO were present in the genome annotation I generated. I want to see if these terms were nested inside of other GOterms that were enriched, or if they were parent terms of other terms present in my annotation.

My biological process GOterms without matching annotations were GO:0001539 (cilium or flagellum-dependent cell motility), GO:0003341 (cilium movement), GO:0060285 (cilium dependent cell motility), GO:0060294 (cilium movement involved in cell motility), GO:0097722 (sperm motility), and GO:0001700 (embryonic development via the syncytial blastoderm). The terms GO:0002165 (instar larval or pupal development), GO:0009791 (post-embryonic development), and GO:0007391 (dorsal closure) had matching annotations for some transcripts but not others. I looked for parent and child terms on QuickGO.

Several motility terms were related to each other, including ones with and without annotations:

Screen Shot 2021-10-25 at 2 51 58 PM

I encountered a similar situation looking at two GOterm lineages for developmental terms:

Screen Shot 2021-10-25 at 3 21 45 PM

Most of the cellular component GOterms were missing matching annotations, and were all related to each other:

Unknown

I had one cellular component term with some annotations, and it was not related to the other terms:

Unknown-2

I’m not sure why some of the topGO GOterms show up in the annotation and others don’t, but my guess is that topGO may consider the entire term “family tree” when performing enrichment, in a way that differs from the annotation.
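The "family tree" idea can be illustrated with a toy example. If an enrichment tool counts a gene toward every ancestor of the terms it is annotated with, then a parent term can be enriched even though no transcript carries that exact term in the annotation table. The sketch below uses placeholder term names (TERM_A to TERM_D) and made-up parent links. These are not real GO relationships, and the helper `get_ancestors` is hypothetical. It only shows how to check whether an unannotated enriched term is an ancestor of an annotated one.

```r
# Hypothetical child -> parent links (placeholders, NOT real GO relationships)
parents <- list(
  TERM_D = c("TERM_B"),
  TERM_C = c("TERM_B"),
  TERM_B = c("TERM_A"),
  TERM_A = character(0)
)

# Walk up the parent links and collect every ancestor of a term
get_ancestors <- function(term, parents) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

annotated <- c("TERM_D")                      # terms present in the (toy) annotation
enriched  <- c("TERM_A", "TERM_B", "TERM_D")  # terms reported as enriched (toy)

# Enriched terms that never appear in the annotation table
missing <- setdiff(enriched, annotated)

# Are the missing terms ancestors of something that IS annotated?
annotated_ancestors <- unique(unlist(lapply(annotated, get_ancestors, parents = parents)))
explained <- missing %in% annotated_ancestors
names(explained) <- missing
print(explained)
```

With real data, the parent links would need to come from an actual ontology source (for example, the lineages shown on QuickGO) rather than a hand-written list.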

Enrichment of hyper- vs. hypomethylated DML

I wanted to see if the genes with enriched GOterms contained more hyper- or hypomethylated DML. While we found a pretty even split of hyper- and hypomethylated DML, any differential enrichment of these DML may give us additional insight into the function of methylation. To do this, I took my dataframe with enriched GOterms and used it as a transcript index to filter my master DML list:

# Isolate transcript column; join with DML list; remove all rows that aren't BP GOterms; select gene ID/chr/start/end/meth.diff columns
sigRes.allBPMethDiff <- unique(
  sigRes.allBPProduct %>%
    dplyr::select(., transcript) %>%
    left_join(., allDMLGOtermsFiltered, by = "transcript") %>%
    filter(., GOcat == "P") %>%
    dplyr::select(., geneID, chr, start, end, meth.diff)
)

Based on my filtering, I identified 16 unique DML in genes with enriched BP GOterms. Of these 16 DML, 13 were hypermethylated and 3 were hypomethylated. When I did the same thing with my CC GOterms, six of the seven unique DML were hypermethylated. There is definitely an enrichment bias for genes with hypermethylated DML.
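One rough way to gauge that imbalance is an exact binomial test on the counts reported above. This is a minimal sketch: the null probability of 0.5 is an assumption based on the "pretty even split" across all DML, not a measured proportion, and the test treats each DML as independent, which may not hold if several DML fall in the same gene. With so few DML, it is a sanity check rather than a replacement for the planned randomization test.

```r
# Counts reported above: hypermethylated DML out of all unique DML
# in genes with enriched GOterms
bp_hyper <- 13; bp_total <- 16  # biological process
cc_hyper <- 6;  cc_total <- 7   # cellular component

# ASSUMPTION: under the null, a DML is equally likely to be hyper- or hypomethylated
bp_test <- binom.test(bp_hyper, bp_total, p = 0.5)
cc_test <- binom.test(cc_hyper, cc_total, p = 0.5)

# Two-sided p-values for each GO category
print(bp_test$p.value)
print(cc_test$p.value)
```

If the true genome-wide proportion of hypermethylated DML is known, it could be substituted for `p = 0.5` to make the null more realistic.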

Going forward

  1. Update methods
  2. Update results
  3. Revise discussion
  4. Revise introduction
  5. Identify journals for submission
  6. Format for submission
  7. Submit preprint to bioRxiv
  8. Submit paper for publication
  9. Report mc.cores issue to methylKit
  10. Perform randomization test
  11. Update mox handbook with R information
  12. Determine if larval DNA/RNA should be extracted

from the responsible grad student https://ift.tt/2ZjNfZE
via IFTTT

WGBS Analysis Part 35

Gene librarian for genes with DML associated with enriched GOterms

Last week I finally matched my list of enriched GOterms with gene IDs and methylation difference information. This week, I spent a lot of time playing gene librarian and understanding the functions of each of these genes in the gonad or during embryogenesis, probable responses to low pH stress, and overlaps between other studies.

Methods

To separate my manual annotations and searches from tables I produced in R, I created this Excel spreadsheet. I searched for each of the gene products in publicly available DML lists from the four other studies of ocean acidification and methylation in oysters: Downey-Wall et al. 2020, Venkataraman et al. 2020, Chandra Rajan et al. 2021, and Lim et al. 2020. I also looked at the C. gigas-specific reproduction and gene expression papers: Dheilly et al. 2012 and Broquard et al. 2021. An important note on these papers: they use the oyster_v9 genome, while I used the Roslin genome. Finally, I searched Google Scholar for papers examining these genes in marine invertebrates, preferably bivalves. For these searches, I looked for low pH responses and/or egg-specific studies. I went through this process for the biological process and cellular component lists separately, but there was some overlap between these gene lists.

Biological processes: General functions and interesting notes

Overall, genes are mainly involved in developmental progression and regulation.

  • Dynein heavy chain 5, axonemal: Associated with sperm (flagellar and ciliary) motility. Dynein heavy chain 1, 8, 12, cytoplasmic dynein 1 heavy chain 1-like, dynein heavy chain 1, and cytoplasmic dynein 2 heavy chain 1-like contained DML in C. virginica gonad (Venkataraman et al. 2020). Higher expression of dynein proteins over the course of spermatogenesis in C. gigas (Dheilly et al. 2012). Dynein functions are not strictly limited to sperm motility. Differential abundance of dynein proteins was found in shotgun proteomic characterization of C. gigas ctenidia (Timmins-Schiffman et al. 2014), and dynein can help move materials for calcification in response to low pH stress in mantle tissue (De Wit et al. 2018). Unfertilized sea urchin eggs contain several dynein proteins (Pratt 1980, Porter 1988). With the new genome, it’s possible that this annotation may be a cytoplasmic homologue, as we did not sequence any male DNA. Cytoplasmic dynein would be involved in organelle transport.
  • Unconventional myosin-VI: Motor protein. Decreased expression during gametogenesis, with higher expression in immature gonad (Dheilly et al. 2012). The paramyosin homologue is highly expressed in females over males, with declining expression as gametes mature (Broquard et al. 2021). Unconventional myosin-XVIIIa-like contained a DML in C. virginica gonad (Venkataraman et al. 2020). Unconventional myosins were also found to be important for embryonic development in urchins, specifically to regulate the onset of blastulation and gastrulation stages (Sirotkin 2000). Changes in methylation of this gene could impact progression of gametogenesis or embryogenesis.
  • Serine/threonine-protein kinase 36: Involved in signaling pathways. Higher expression of serine/threonine-protein kinases in mature female gonads (Dheilly et al. 2012), especially during sex determination processes (Broquard et al. 2021). Examination of the C. gigas kinome found several serine/threonine-protein kinases in eggs and embryos, with some gene expression changes in response to abiotic stress (Epelboin et al. 2016). Also found to be related to oocyte maturation in the king scallop (Pauletto et al. 2017). Several serine/threonine-protein kinases contained DML in C. virginica (Venkataraman et al. 2020), and one was differentially methylated in C. hongkongensis larvae. Methylation of this gene could promote homeostasis by regulating gonad development processes.
  • Helicase domino: Exchanges phosphorylated histone for acetylated form, with chromatin remodeling impacting gene expression. Potentially involved in oogenesis. Outlier SNP loci were found in this gene in O. lurida, potentially related to immune or stress response (Silliman 2019). Potential methylation regulation of chromatin structure.
  • Protein neuralized (neuralized-like protein): Involved in cell fate decisions. Part of the ubiquitin ligase family and broadly involved in protein ubiquitination processes (Hu 2005, Jiang 2011). Changes in protein abundance in Strongylocentrotus purpuratus egg and embryo related to early cleavage and initiation of gastrulation. Increased expression of this gene in Artemia sinica can suppress cell division and macromolecule synthesis (Jiang 2011), so methylation may be regulating gonad development.
  • Cytoplasmic aconitate hydratase (aconitase): Catalyzes part of the TCA cycle. Found in C. virginica eggs and trochophores, with enzyme activity increasing as development progressed (Black 1962). Associated with energy metabolism during gametogenesis in Pecten maximus (Boonmee 2016), and similarly associated with energy production in C. gigas during spermatogenesis (Kingtong et al. 2013). Higher expression in S. purpuratus populations more exposed to low pH conditions after one day of low pH exposure (embryos), but lower expression after seven days of low pH exposure (larvae) (Evans et al. 2017). Methylation may “prime” embryos for low pH exposure and dictate energy metabolism.
  • Serine-protein kinase ATM: Cell cycle checkpoint kinase, regulates downstream proteins, involved in cellular response to DNA damage. Gene contained DML in C. virginica gonad (Venkataraman et al. 2020). Associated with female gamete generation and development in Sinonovacula constricta. Mobilized during embryogenesis and during environmental stress, regulating cell-cycle progression (Epelboin et al. 2016). Involved in cellular stress response when C. gigas was exposed to low pH and As (Moreira et al. 2018).
  • Cation-independent mannose-6-phosphate receptor: Transports lysosomal enzymes to the lysosome. Potential involvement in D. polymorpha contaminant (Leprêtre et al. 2020) and Marsupenaeus japonicus bacteria immune responses.

Cellular components

  • Lots of similar genes to BP! Dynein heavy chain 5, unconventional myosin, and helicase domino had enriched CC GOterms. Helicase domino was associated with histone and protein acetyltransferase complexes, and dynein and myosin were associated with microtubule complexes and the cytoskeleton.
  • Cytoplasmic dynein 2 light intermediate chain 1-like: Forms a motor protein complex to transport items along microtubules in cilia and flagella. Similar findings to dynein heavy chain 5 (see above).
  • Kinesin-like protein KIF23: Helps with organelle transport and required for cytokinesis. Increased expression over the course of spermatogenesis, and higher expression in mature male and female gonads (Dheilly et al. 2012). Differential expression between mature gonad and stripped oocytes (Dheilly et al. 2012). Kinesin-like protein KIF12 and KIF15 contained DML in C. virginica gonad (Venkataraman et al. 2020). Changes in kinesin-like protein gene expression have been seen in larval S. purpuratus exposed to low pH (Padilla-Gamiño et al. 2013) and Pocillopora acuta exposed to heat stress (Poquita-Du et al. 2020), and these changes could be associated with reduced calcification. Also associated with immune response in C. gigas (Lorgeril et al. 2011).

Going forward

  1. Look into unannotated GOterms and potential nesting
  2. Determine if there’s a bias towards hyper- or hypomethylated genic DML with enriched functions
  3. Report mc.cores issue to methylKit
  4. Perform randomization test
  5. Update mox handbook with R information
  6. Determine if larval DNA/RNA should be extracted

from the responsible grad student https://ift.tt/3CesOvL
via IFTTT

WGBS Analysis Part 34

Digging into enrichment results

I’m returning to the topGO analysis I did previously. One of Steven’s main concerns after looking at the enrichment results was that several enriched GOterms appeared to be related to sperm motility. Since I only used female samples, having terms related to sperm didn’t make a lot of sense. Additionally, I wrote and discussed my results in the context of significantly enriched GOterms, but didn’t tie back to the genes. So I’m going back to this R Markdown file to look at the enrichment results, annotations, and related genes.

Revising annotations

The first thing I noticed when I looked at my results spreadsheet was that I never actually added gene product annotations. To do this, I imported the list of gene products and transcript IDs I generated in this Jupyter Notebook. Then, I used merge to combine both spreadsheets and saved the file here.

When I used simplifyEnrichment to visualize the enrichment results, I created a dataframe that clustered my enriched GOterms by semantic similarity. I wanted to append the cluster information to my spreadsheet so I could use that to understand the different gene functions. Cluster 1 is related to cilia and motility, while clusters 2-5 are related to development. It’s possible that clusters 2-5 are really a supercluster instead of four clusters with one enriched GOterm each. Looking at my dataframe after merging the product information with the clusters, I saw I was missing information for cluster 4. It was at this point that I realized I should have used left_join when combining dataframes (now and upstream). It’s likely that the GOterms related to the fourth cluster didn’t have any transcript annotations from the annotation table I generated, so those rows were being dropped with merge.
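The dropped cluster is the expected behavior of an inner join. A minimal sketch with made-up data shows the difference: base R's `merge()` keeps only keys present in both tables by default, while `all.x = TRUE` keeps every row of the first table and fills gaps with `NA` (the same rows `dplyr::left_join()` would keep). The GO IDs and product names below are placeholders, not values from my dataset.

```r
# Hypothetical toy data: five enriched GOterms with cluster assignments
clusters <- data.frame(
  GOID = c("GO:A", "GO:B", "GO:C", "GO:D", "GO:E"),  # placeholder IDs
  cluster = c(1, 2, 3, 4, 5)
)

# Hypothetical annotation table with no entry for the cluster 4 term
annotations <- data.frame(
  GOID = c("GO:A", "GO:B", "GO:C", "GO:E"),
  product = c("product_1", "product_2", "product_3", "product_5")
)

# Default merge is an inner join: the unannotated cluster 4 term disappears
inner <- merge(clusters, annotations, by = "GOID")

# all.x = TRUE keeps every row of `clusters`; missing products become NA
left <- merge(clusters, annotations, by = "GOID", all.x = TRUE)

print(inner)
print(left)
```

Checking `nrow()` before and after a join is a cheap way to catch this kind of silent row loss.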

I went back to the top of my script and proceeded to use left_join instead of merge. A couple of significant GO IDs had topGO annotations, but were not present in my manual blastx annotations. I thought it would be good to keep the GOterms associated with these GO IDs only present in the topGO database. I also wanted to pull the transcript IDs associated with these enriched GOterms. I used genesInTerm to extract the genes of interest, converted it into a dataframe, then extracted the transcript IDs so they were on separate rows:

# Extract genes associated with each GOterm from topGO's annotations
sigRes.allBPGenes <- genesInTerm(allGOdataBP, whichGO = allRes.allBP$GO.ID[1:10])
# Collapse all transcript IDs into one column and convert into a dataframe. GO IDs will be the rownames
sigRes.allBPGenes <- as.data.frame(vapply(sigRes.allBPGenes, paste, collapse = ",", character(1L)))
colnames(sigRes.allBPGenes) <- c("transcript")
# Convert row names to a separate column
sigRes.allBPGenes$GOID <- rownames(sigRes.allBPGenes)
# Convert row names to numeric values
rownames(sigRes.allBPGenes) <- 1:nrow(sigRes.allBPGenes)
# Separate transcripts so each transcript is on a separate row
sigRes.allBPGenes <- separate_rows(sigRes.allBPGenes, transcript, sep = ",")
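For reference, the same reshaping (a named list of transcript vectors turned into one row per GO ID and transcript pair) can be sketched in base R with `stack()`. The list below is a made-up stand-in for the kind of named list `genesInTerm` returns. The transcript names are hypothetical.

```r
# Hypothetical stand-in for genesInTerm output: GO ID -> transcript IDs
genes_by_term <- list(
  "GO:0003341" = c("transcript_A", "transcript_B"),
  "GO:0007391" = c("transcript_C")
)

# stack() turns a named list of vectors into a long dataframe
# with one row per value and the list name in a second column
long <- stack(genes_by_term)
colnames(long) <- c("transcript", "GOID")

# stack() stores the names as a factor; convert to character for joining
long$GOID <- as.character(long$GOID)

print(long)
```

This skips the collapse-then-split step, though the end result should have the same two columns as the approach above.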

I used left_join to match this information with topGO enrichment scores and GO IDs. As expected, every GO ID had at least one corresponding transcript. Then, I used left_join based on GO ID and transcript to add my manual annotations and methylKit information. At this step, I realized that the GO IDs not present in my manual annotations did not get any corresponding methylKit information, even though there was a transcript ID. I tried for a while to think of a way to complete the annotation process, but got stuck and figured if those cells had N/A values, then I would know those were the GOterms only present in the topGO database. Finally, I appended the product and cluster information to all rows and saved the output here. I repeated this process with the CC GOterms and found an overlap between the genes in each list! I didn’t look at cluster information, since there was only one cluster determined by simplifyEnrichment. I saved the spreadsheet with product information here.

Going forward

  1. Look into unannotated GOterms and potential nesting
  2. Understand function of genes with enriched GOterms
  3. Determine if there’s a bias towards hyper- or hypomethylated genic DML with enriched functions
  4. Report mc.cores issue to methylKit
  5. Perform randomization test
  6. Update mox handbook with R information
  7. Determine if larval DNA/RNA should be extracted

from the responsible grad student https://ift.tt/3ACn11q
via IFTTT

Nematostella Care

Protocol for care during non-experimental time

For the next few weeks, I’ll be helping take care of Nematostella vectensis anemones in our lab. These anemones are clonal lines from individuals collected from various marshes in Massachusetts and North Carolina. By familiarizing myself with the animals a bit more, I’ll have a better idea of what capacity we have for future experiments.

Preparing to feed

  1. Grab the brine shrimp eggs from the freezer and head down to the Nematostella room
  2. Disconnect the air line for the live shrimp and clip to the aerator. Let the shrimp settle for a few minutes such that the unhatched eggs remain on the top.
  3. Obtain a large glass dish for live brine shrimp, a filter, and a small glass container to hold the filter. Place the filter in the small glass container.
  4. Place the aerator on a shelf and carefully drain only the live shrimp into the glass container. Pour the live shrimp into the filter.
  5. Rinse the live shrimp in the filter with half-strength seawater, then fill the container with half-strength seawater.

Feeding

To feed the anemones, squeeze 3-7 drops of shrimp into the various holding containers. The number of drops depends on the size of the container. It’s best to place shrimp near anemones that have longer extended tentacles. Since the anemones are not being experimented with currently, they do not need to be fed an exact amount. The anemones can always be checked a few minutes after they are fed to see if there are still live shrimp remaining, or to add extra shrimp.

from the responsible grad student https://ift.tt/3mygqjg
via IFTTT

October 2021 Goals

gibbs

After a month-long vacation, I’m in Woods Hole and I’m now a Postdoctoral Scholar :open_mouth: So as much as I’d like to be like Gibbs and leap over my responsibilities like a flying loaf of bread, it’s time to get back to it and make the most of the next 18 months.

August Goals Recap

Gigas Gonad Methylation:

  • Addressed reading committee edits!

Hawaii Gigas Methylation:

  • Addressed reading committee edits! I was pretty low on time (and energy) so I didn’t do more than the minimum requirements

Dissertation:

  • Formatted published chapters using Huskydown in this GitHub repository. There were several things that were a bit annoying to figure out since I needed to use LaTeX for certain things like tables, but now I have a pretty nice template for individual papers.
  • Submitted my dissertation and it was approved!

Other:

  • Cleaned out office and reorganized -80ºC freezer
  • Put a plan together for finishing remaining projects

October Goals

Gigas Gonad Methylation:

  • Revisit genes with DML and association with enrichment
  • Update methods and results
  • Rework discussion to include more direct comparisons to other oyster methylation papers

Hawaii Gigas Methylation:

  • Clean up any remaining comments
  • Send manuscript to Maria to get methods information
  • Email Rajan and ask for input
  • Review DSS script and determine if I should go back to methylKit for better interpretation
  • Extract SNPs with EpiDiverse and create a relatedness matrix
  • Look at methylation islands and non-methylated regions
  • Examine overlaps between DML and other epigenomic datasets from Sascha

Coral Transcriptomics:

  • Review Maggie’s extraction protocol and previous work
  • Determine if metatranscriptomics, metabolomics, and/or methylation analysis are viable options

Other:

  • Complete onboarding
  • Think about goals and potential projects for postdoc
  • Outline a project for NSF PRFB and contact letter writers
  • Identify additional papers for ocean acidification and reproduction review
  • Complete review for Scientific Data

from the responsible grad student https://ift.tt/2YmMi2B
via IFTTT

Dissertation!

I learned how to use R Markdown and LaTeX together to format my final dissertation! Repo can be found here: https://github.com/yaaminiv/dissertation

Things that were annoying to figure out: table formatting, figure formatting, making things horizontal vs. vertical.