Sam’s Notebook: Primer Design – Geoduck Vitellogenin using Primer3

In preparation for designing primers for developing a geoduck vitellogenin qPCR assay, I annotated a de novo geoduck transcriptome assembly last week. Next up, identify vitellogenin genes, design primers, confirm their specificity, and order them!

All of this was done in a Jupyter Notebook on my computer (Swoose).

Jupyter notebook (GitHub):

Annotated transcriptome FastA (271MB):

Although everything is explained pretty well in the Jupyter Notebook, here’s the brief rundown of the process:

  1. Download FastA file.
  2. Split into individual FastA files for each sequence (used pyfaidx v0.5.5.2).
  3. Identify sequences (in original FastA file, not individual files) annotated as vitellogenin.
  4. Design primers on best vitellogenin match (based on TransDecoder score and BLASTp e-values) using Primer3.
  5. Confirm primer specificity using the EMBOSS (v6.6.0) primersearch tool to check all individual sequence files for possible matches.
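
Steps 2 and 3 can be sketched roughly as below. This is a minimal, standard-library stand-in and not the notebook's actual code (which used pyfaidx). The function names, the example headers, and the assumption that the annotation text appears in the FastA header line are all hypothetical.

```python
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FastA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_individual_files(records, out_dir):
    """Write one FastA file per record (step 2).

    Assumes the first word of each header is a filesystem-safe sequence ID.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, seq in records:
        path = out / f"{header.split()[0]}.fasta"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


def find_annotated(records, keyword):
    """Return {sequence ID: sequence} for headers mentioning keyword (step 3)."""
    keyword = keyword.lower()
    return {h.split()[0]: s for h, s in records if keyword in h.lower()}


# Made-up records, only to show the shape of the input.
example = """>contig_1 hypothetical annotation: Vitellogenin-6
ATGGCCAAGT
TTGACC
>contig_2 hypothetical annotation: actin
ATGTGTGACG
"""

records = list(parse_fasta(example))
hits = find_annotated(records, "vitellogenin")
print(hits)
```

Searching the original file rather than the split files, as in step 3, keeps the annotation search to a single pass over one file.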

Yaamini’s Notebook: MEPS Revisions Part 4

Using simper and new figures

I spent most (yes……most…………RIP) of the past two days revising my ordinations and creating ordination figures. Sidenote: I also ordinated means and variances separately as per Julian’s suggestion.

simper

During my meeting with Julian on Friday, he suggested I use simper to identify loadings that contribute to any significant ANOSIM results. The code is as follows:

 # Same call I used for ANOSIM, but the argument is "group" instead of "grouping"
 simper(dataUsedInANOSIM, group = groupUsedInANOSIM)

Since FB-WB and SK-WB were the significant contrasts in my protein abundance data, I decided to use those contrasts consistently for my protein abundance NMDS, mean data NMDS, and variance data NMDS.

Here are the protein abundance simper results…

screen shot 2018-11-30 at 1 01 59 pm

screen shot 2018-11-30 at 1 02 19 pm

…and my thought process for selecting loadings to display:

56539232733__1a692e89-7bc2-400c-bfd5-9ef81f773aff

In essence, I identified the first 10 peptides in the simper output for the FB-WB and SK-WB contrasts. I then selected the peptides that were common between the two.
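
The selection logic described above amounts to intersecting two ranked lists. Here is a minimal Python sketch of that idea (the actual analysis was done in R). The peptide names are placeholders, and the function name is hypothetical.

```python
def shared_top_hits(ranked_a, ranked_b, n=10):
    """Return items in the top n of both ranked lists, in ranked_a's order."""
    top_b = set(ranked_b[:n])
    return [item for item in ranked_a[:n] if item in top_b]


# Placeholder rankings standing in for the simper output of each contrast.
fb_wb = ["pep_03", "pep_11", "pep_07", "pep_01"]
sk_wb = ["pep_07", "pep_02", "pep_03", "pep_09"]

shared = shared_top_hits(fb_wb, sk_wb, n=3)
print(shared)
```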

For the mean and variance simper output, I again looked at the FB-WB and SK-WB contrasts. I took the first 5 dates in the simper output for each environmental variable that was significant in an ANOSIM (dissolved oxygen and temperature for means; pH, dissolved oxygen, salinity, and temperature for variance). The later outplant dates were the ones differentiating FB-WB, while the earlier dates were more important for distinguishing SK-WB.

56538954001__4cf036a2-3d26-48bb-9913-17fff8acdeb5

56538870793__a80d0866-d225-44f7-bbc3-75892a6ce1b2

In this script, I cobbled together a multipanel plot with all of my ordination plots! It still needs a little work for the inner margins, not cutting things off, and legend display (maybe under each ordination set…?), but I was too tired after 3 hours of plot manipulation to do anything else shrugs. The pdf can be found here, and a screenshot is below:

screen shot 2018-12-02 at 8 13 06 am

Figure 1. Multipanel ordination plot.

Displaying environmental data

I wanted a better plot for my environmental data, so I made one over Thanksgiving! Note how it uses the same colors for each site as the multipanel plot 😉 The code can be found here.

envdata.

Figure 2. Environmental variables by site and habitat.

Protein abundance heatmap

As per reviewer suggestion, I changed the colors on my protein abundance heatmap (code here). In InDesign, I changed which peptides and proteins were marked as significantly different. I know there’s a way I could do this in R, but it was 8 p.m. and I was in SAFS on a Saturday so…shrugs.

One important thing to note with the heatmap is that there are different proteins and peptides marked as significantly different! Most of the proteins are the same as what was in my original analysis, but some have been removed. Protein disulfide isomerase 1 and 2 are now identified as significantly different. This is something I need to alter in my results and discussion.

heatmap

Going forward

  1. Update manuscript with new methods, results, and figures
  2. Tweak discussion
  3. Address remaining reviewer comments
  4. Send to co-authors!


from the responsible grad student https://ift.tt/2Pf7BZb
via IFTTT

Add legend to outside of multipanel plot

Google led me to a solution from Katie Lotterhos! I found it extremely useful.

http://dr-k-lo.blogspot.com/2014/03/the-simplest-way-to-plot-legend-outside.html

Grace’s Notebook: Make Taxonomy Pie Charts in R

Today I figured out how to make pie charts in R, and I did so with the taxonomy output from the BLAST with the C. bairdi transcriptome and the nt taxonomy database. I mostly worked on my repo for Fish546, and it’s still a work in progress. Also, update on crab extractions – Bioanalyzer isn’t reading chips….

Pie charts in R

taxa_breakdown.R

img

This was tricky because, as I note in the script, the pie chart was made based on how many proteins came from animals whose common name included the word “crab”. When I took a closer look at the data, I noticed that (in particular) some Chionoecetes sp. didn’t have a common name associated with them, and instead just had their genus and species repeated… So I had to add a “Chionoecetes”-specific label to the grep. I can’t be sure the same thing didn’t happen for other crab species whose scientific names I don’t know and whose common names are listed as repeats of the scientific name…
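
The matching step can be illustrated with a short Python sketch (the real script is in R). The names below are made-up examples, and the pattern is an assumption about what the grep looks like, not a copy of it.

```python
import re
from collections import Counter

# Match either the word "crab" or the genus name, ignoring case.
CRAB_PATTERN = re.compile(r"crab|chionoecetes", re.IGNORECASE)


def classify(common_names):
    """Count names as 'crab' or 'other' based on the pattern above."""
    return Counter(
        "crab" if CRAB_PATTERN.search(name) else "other"
        for name in common_names
    )


# Made-up common-name column; one entry repeats the scientific name.
names = [
    "snow crab",
    "Chionoecetes opilio",
    "Atlantic salmon",
    "Blue Crab",
    "house mouse",
]

counts = classify(names)
print(counts)
```

A plain substring match like this would also count names such as “crab-eating macaque” as crabs, so it may be worth checking the matched names by eye.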

I’m currently working on creating a pie chart that focuses in on the composition of the crab proteins based on the unique species… it’s proving to be a bit difficult, so I may make an issue related to figuring this out if I can’t figure it out when I revisit it later. taxa-crab-breakdown.R

Bioanalyzer – sidenote

The Bioanalyzer isn’t able to read the RNA chips…

But Steven suggested that I work with the eight samples I have extracted and run on the Qubit (post: 11/21/18) and try to get them “library ready”… meaning that I need to make sure that the samples are pure and the quantification is accurate (Bioanalyzer and NanoDrop), and I need to create the pool such that each sample contributes an equal amount of RNA to the pool.
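
For the equal-contribution pool, the volume to take from each sample is the target mass divided by that sample's concentration. A minimal sketch follows. The sample names, concentrations, and the 100 ng target are made up for illustration.

```python
def pool_volumes(concentrations_ng_per_ul, ng_per_sample):
    """Return the volume (µL) of each sample that contains ng_per_sample of RNA."""
    volumes = {}
    for sample, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = ng_per_sample / conc
    return volumes


# Hypothetical Qubit readings in ng/µL.
qubit = {"crab_A": 20.0, "crab_B": 50.0, "crab_C": 12.5}

volumes = pool_volumes(qubit, ng_per_sample=100.0)
for sample, vol in volumes.items():
    print(f"{sample}: {vol:.1f} µL")
```

The least concentrated sample sets the practical limit: the target mass cannot exceed what is available in that sample's remaining volume.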

from Grace’s Lab Notebook https://ift.tt/2G0xcWe
via IFTTT

Yaamini’s Notebook: MEPS Revisions Part 3

Final unconstrained and constrained ordinations

I finally figured out all of my N/A problems and completed my unconstrained and constrained ordinations!

Protein abundance

I didn’t have any problems with my protein abundance NMDS and previously obtained pairwise ANOSIM R-statistic and p-values. In this R script, I visualized my ordinations. You can find one version with sample numbers here, and another with site and habitat distinctions here. Because I exported my files as pdfs and don’t feel like changing the code and re-exporting files, I’m also including screenshots of the ordinations.

screen shot 2018-11-29 at 10 58 33 am

screen shot 2018-11-29 at 10 58 49 am

Figures 1 and 2. Protein abundance ordinations with confidence ellipses.

Environmental data

My original problem with the environmental data was that I had N/As in my dataframe, but was unable to use method = "gower" with the metaMDS function in the package vegan. Turns out vegan uses vegdist, which doesn’t handle N/As. Instead, I used the daisy function in the cluster library to calculate a Gower’s dissimilarity (distance) matrix. I then inputted that matrix directly into metaMDS with method = "euclidean". It worked! My code can be found here. I then conducted an Analysis of Similarity (ANOSIM) to assess the significance of my results.
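
To show why a Gower-type dissimilarity can tolerate missing values, here is a small pure-Python sketch for numeric variables. It is not the daisy implementation. It assumes each variable is scaled by its observed range, that a variable is skipped for a pair when either value is missing, and (my own choice) that zero-range variables are skipped as well.

```python
import math


def gower_matrix(rows):
    """Pairwise Gower dissimilarity for numeric data; None marks a missing value."""
    n_vars = len(rows[0])
    # Range of each variable, computed from observed values only.
    ranges = []
    for k in range(n_vars):
        observed = [r[k] for r in rows if r[k] is not None]
        ranges.append(max(observed) - min(observed) if observed else 0.0)

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total, used = 0.0, 0
            for k in range(n_vars):
                a, b = rows[i][k], rows[j][k]
                if a is None or b is None or ranges[k] == 0:
                    continue  # this variable contributes nothing for this pair
                total += abs(a - b) / ranges[k]
                used += 1
            # Average over the variables that could be compared.
            d = total / used if used else math.nan
            dist[i][j] = dist[j][i] = d
    return dist


# Made-up rows: (pH, temperature), with one missing temperature.
rows = [[8.0, 10.0], [7.5, None], [7.0, 14.0]]
dist = gower_matrix(rows)
for row in dist:
    print(row)
```

Because each pair is averaged only over the variables both objects have, a missing value shrinks the comparison instead of making the whole distance undefined.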

Table 1. One-way ANOSIM results for environmental data NMDS based on site (Case Inlet vs. Fidalgo Bay vs. Port Gamble Bay vs. Skokomish River Delta vs. Willapa Bay), habitat (bare vs. eelgrass), parameter (mean vs. variance), and environmental variable (pH vs. dissolved oxygen vs. salinity vs. temperature). Significant p-values are bolded.

One-way ANOSIM            R            p-value
Site                      -0.03428841  0.943
Habitat                   -0.02037461  0.982
Parameter                 0.7378174    0.001
Environmental Variable    0.1796322    0.001

Only the parameter and environmental variable ANOSIMs were significant. My guess is that because I ordinated means and variances in the same space, this is skewing my results. I decided to conduct two-way ANOSIMs to go further.

Table 2. Two-way ANOSIM results for environmental data NMDS. Significant p-values are bolded.

Two-way ANOSIM                          R            p-value
Site and Habitat                        -0.09991229  1
Parameter and Site                      0.4778027    0.001
Parameter and Habitat                   0.4778027    0.001
Environmental Variable and Site         0.02098092   0.315
Environmental Variable and Habitat      0.1089352    0.007
Environmental Variable and Parameter    0.7802082    0.001

When accounting for parameter, site and habitat were significant. When accounting for environmental variable, only habitat was significant. I included environmental variable and parameter, but I already figured that would be significant. I need to double check with Julian, but I believe conducting pairwise ANOSIMs based on the significant two-way ANOSIM results will help me understand what is driving the significant results.

I also visualized my ordination here.

screen shot 2018-11-29 at 11 08 28 am

Figure 3. Environmental data NMDS. Means are outlined in solid lines, variances are outlined in dashed lines.

Constrained ordination

In this R Markdown file, I conducted a constrained ordination to look at relationships in protein abundance based on my environmental data. To do this, I first created an environmental data matrix with mean and variance values from the entire outplant, and matched them with my objects, oyster sample IDs. This meant I had the same objects in both my protein abundance and environmental data matrices. I calculated the gradient length and found that the underlying model was linear, so I used an RDA. I specified na.action = na.exclude in my rda function so it could handle my missing values.

Table 3. Variance explained by constrained ordination. The RDA explains 29.21% of all variance.

Variance Partition    Inertia     Proportion
Total                 0.015616    1
Constrained           0.004561    0.2921
Unconstrained         0.011055    0.7079

Table 4. ANOVA results for overall RDA analysis. With F(6,19) = 1.3064 and p-value = 0.195, the RDA is not significant.

Partition    Df    Variance     F         Pr(>F)
Model        6     0.0045607    1.3064    0.195
Residual     19    0.0110550
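
As a quick arithmetic check, the proportion in Table 3 and the F statistic in Table 4 can be recomputed from the reported inertia and variance values. This is only a sanity check on the numbers above, not a rerun of the RDA. The variable names are mine.

```python
# Values copied from Tables 3 and 4.
total_inertia = 0.015616
constrained_inertia = 0.004561

model_df, model_var = 6, 0.0045607
resid_df, resid_var = 19, 0.0110550

# Proportion of variance explained by the constraints.
proportion_constrained = constrained_inertia / total_inertia

# F = (model variance / model df) / (residual variance / residual df)
f_stat = (model_var / model_df) / (resid_var / resid_df)

print(round(proportion_constrained, 4))
print(round(f_stat, 4))
```

Both values round to the figures reported in the tables (0.2921 and 1.3064).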

Table 5. Significance of each RDA axis. No axis is significant.

Axis    Df    Variance     F         Pr(>F)
RDA1    1     0.0023293    4.0033    0.316
RDA2    1     0.0013380    2.2995    0.603
RDA3    1     0.0003888    0.6682    0.998
RDA4    1     0.0003065    0.5268    0.994
RDA5    1     0.0001395    0.2397    0.999
RDA6    1     0.0000586    0.1008    0.999

Table 6. Significance of each environmental descriptor in the RDA. Temperature mean and variance are marginally significant predictors.

Environmental Variable       Df    Variance     F         Pr(>F)
Temperature Mean             1     0.0013542    2.3275    0.065
Temperature Variance         1     0.0012584    2.1627    0.067
pH Mean                      1     0.0003063    0.5264    0.781
pH Variance                  1     0.0007082    1.2171    0.301
Dissolved Oxygen Mean        1     0.0006024    1.0354    0.349
Dissolved Oxygen Variance    1     0.0003312    0.5692    0.725
Residual                     19

One weird thing I noticed is that my salinity variables do not show up in Table 6 or my ordinations below. That’s something I’ll have to figure out with Julian. Other than that, it’s interesting to note that my RDA really isn’t significant, which fits the loose interpretation we had earlier of environment affecting protein abundance in the manuscript.

To visualize my constrained ordination, I included all proteins and environmental descriptors here, and only significant proteins and environmental descriptors here.

screen shot 2018-11-29 at 11 06 57 am

screen shot 2018-11-29 at 11 07 06 am

Figures 4-5. RDA visualization including all proteins and descriptors (top) and only those that are significant (bottom).

Going forward

  1. Discuss all results with Julian
  2. Create a better ordination visual
  3. Update manuscript with new methods and results
  4. Tweak discussion


from the responsible grad student https://ift.tt/2ACXXui
via IFTTT

Kaitlyn’s notebook: Proteomics paper

Goals

Referenced from Shelly’s post

  • Compare ASCA proteins (high loadings) with hierarchical cluster (differentially clustered) proteins
    • make raw abundance line plots facetted by protein
    • examine if GO enrichment changes when ASCA and cluster proteins are combined
  • Determine how time is factored into the cluster
  • Determine if the permutation test with ASCA tests needs to be improved for the high loadings proteins to be considered highly influential
  • Redo BLAST of CHOYP proteins to 2018 Uniprot database
  • Begin identifying and locating data files that we need to deposit in public protein repository (i.e. ProteomeXchange and PeptideAtlas)
    • need to regenerate ‘table_blastout_gigatonpep-uniprot’
    • need fasta file of the peptides ID’d by mass spec
    • make a simplified supplementary table containing CHOYP IDs, UniProt Accessions, e.val, Protein names, Gene names
      • Can modify current datasheet to get this info as well

Completed tasks

Paper questions

Question: Does temperature influence the proteome of larval C. gigas, and if so, how?

Referenced from Shelly’s post

  • Do we need to explain we did 2 x 4 treatments, or just say we did 1 x 2 treatments?
  • Do we have survival data for other silos to compare to silo 2, 3, and 9?
    • Can we rule out silo 2 as an anomaly or should we include it?