Sam’s Notebook: Primer Design – Geoduck Vitellogenin using Primer3

In preparation for designing primers for developing a geoduck vitellogenin qPCR assay, I annotated a de novo geoduck transcriptome assembly last week. Next up, identify vitellogenin genes, design primers, confirm their specificity, and order them!

All of this was done in a Jupyter Notebook on my computer (Swoose).

Jupyter notebook (GitHub):

Annoated transcriptome FastA (271MB):

Although everything is explained pretty well in the Jupyter Notebook, here’s the brief rundown of the process:

  1. Download FastA file.
  2. Split into individual FastA files for each sequence (used pyfaidx v0.5.5.2)
  3. Identify sequences (in original FastA file, not individual files) annotated as vitellogenin.
  4. Design primers on best vitellogenin match (based on TransDecoder score and BLASTp e-values) using Primer3.
  5. Confirm primer specificity using EMBOSS(v6.6.0) primersearch tool to check all individual sequence files for possible matches.

Yaamini’s Notebook: MEPS Revisions Part 3

Final unconstrained and constrained ordinations

I finally figured out all of my N/A problems and completed my unconstrained and constrained ordinations!

Protein abundance

I didn’t have any problems with my protein abundance NMDS and previously obtained pairwise ANOSIM R-statistic and p-values. In this R script, I visualized my ordinations. You can find one version with sample numbers here, and another with site and habitat distinctions here. Because I exported my files as pdfs and don’t feel like changing the code and re-exporting files, I’m also including screenshots of the ordinations.

screen shot 2018-11-29 at 10 58 33 am

screen shot 2018-11-29 at 10 58 49 am

Figures 1 and 2. Protein abundance ordinations with confidence ellipses.

Environmental data

My original problem with the environmental data was that I had N/As in my dataframe, but was unable to use method = "gower" with the metaMDS function in the package vegan. Turns out vegan uses vegdist, which doesn’t handle N/As. Instead, I used the daisy function in the cluster library to calculate a Gower’s dissimilarity (distance) matrix. I then inputted that matrix directly into metaMDS with method = "euclidean". It worked! My code can be found here. I then conducted an Analysis of Similarity (ANOSIM) to assess the significance of my results.

Table 1. One-way ANOSIM results for environmental data NMDS based on site (Case Inlet vs. Fidalgo Bay vs. Port Gamble Bay vs. Skokomish River Delta vs. Willapa Bay), habitat (bare vs. eelgrass), parameter (mean vs. variance), and environmental variable (pH vs. dissolved oxygen vs. salinity vs. temperature). Significant p-values are bolded.

One-way ANOSIM R p-value
Site -0.03428841 0.943
Habitat -0.02037461 0.982
Parameter 0.7378174 0.001
Environmental Variable 0.1796322 0.001

Only the parameter and environmental variable ANOSIMs were significant. My guess is that because I ordinated means and variances in the same space, this is skewing my results. I decided to conduct two-way ANOSIMs to go further.

Table 2. Two-way ANOSIM results for environmental data NMDS. Significant p-values are bolded.

Two-way ANOSIM R p-value
Site and Habitat -0.09991229 1
Parameter and Site 0.4778027 0.001
Parameter and Habitat 0.4778027 0.001
Environmental Variable and Site 0.02098092 0.315
Environmental Variable and Habitat 0.1089352 0.007
Environmental Variable and Parameter 0.7802082 0.001

When accounting for parameter, site and habitat were significant. When accounting for environmental variable, only habitat was significant. I included environmental variable and parameter, but I already figured that would be significant. I need to double check with Julian, but I belive conducting pairwise ANOSIMs based on the significant two-way ANOSIM results will help me understand what is driving the significant results.

I also visualized my ordination here.

screen shot 2018-11-29 at 11 08 28 am

Figure 3. Environmental data NMDS. Means are outlined in solid lines, variances are outlined in dashed lines.

Constrained ordination

In this R Markdown file, I conducted a constrained ordination to look at relationships in protein abundance based on my environmental data. To do this, I first created an environmental data matrix with mean and variance values from the entire outplant, and matched them with my objects, oyster sample IDs. This meant I had the same objects in both my protein abundance and environmental data matrices. I calculated the gradient length and found that the underlying model was linear, so I used an RDA. I specified na.action = na.exclude in my rda function so it could handle my missing values.

Table 3. Variance explained by constrained ordination. The RDA explains 29.21% of all variance.

Variance Partition Inertia Proportion
Total 0.015616 1
Constrained 0.004561 0.2921
Unconstrained 0.011055 0.7079

Table 4. ANOVA results for overall RDA analysis. With F(6,19) = 1.3064 and p-value = 0.195, the RDA is not significant.

Partition Df Variance F Pr(>F)
Model 6 0.0045607 1.3064 0.195
Residual 19 0.0110550

Table 5. Significance of each RDA axis. No axis is significant.

Axis Df Variance F Pr(>F)
RDA1 1 0.0023293 4.0033 0.316
RDA2 1 0.0013380 2.2995 0.603
RDA3 1 0.0003888 0.6682 0.998
RDA4 1 0.0003065 0.5268 0.994
RDA5 1 0.0001395 0.2397 0.999
RDA6 1 0.0000586 0.1008 0.999

Table 6. Significance of each environmental descriptor in the RDA. Temperature mean and variance are marginally significant predictors.

Environmental Variable Df Variance F Pr(>F)
Temperature Mean 1 0.0013542 2.3275 0.065
Temperature Variance 1 0.0012584 2.1627 0.067
pH Mean 1 0.0003063 0.5264 0.781
pH Variance 1 0.0007082 1.2171 0.301
Dissolved Oxygen Mean 1 0.0006024 1.0354 0.349
Dissolved Oxygen Variance 1 0.0003312 0.5692 0.725
Residual 19

One weird thing I noticed is that my salinity variables do not show up in Table 6 or my ordinations below. That’s something I’ll have to figure out with Julian. Other than that, it’s interesting to note that my RDA really isn’t significant, which fits the loose interpretation we had earlier of environment affecting protein abundance in the manuscript.

To visualized my contrained ordination, I included all proteins and environmental descriptors here, and only significant proteins and environmental descriptors here.

screen shot 2018-11-29 at 11 06 57 am

screen shot 2018-11-29 at 11 07 06 am

Figures 4-5. RDA visualization including all proteins and descriptors (top) and only those that are significant (bottom).

Going forward

  1. Discuss all results with Julian
  2. Create a better ordination visual
  3. Update manuscript with new methods and results
  4. Tweak discussion

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student https://ift.tt/2ACXXui
via IFTTT

Kaitlyn’s notebook: Proteomics paper

Goals

Referenced from Shelly’s post

  • Compare ASCA proteins (high loadings) with hierarchical cluster (differentially clustered) proteins
    • make raw abundance line plots facetted by protein
    • examine if GO enrichment changes when ASCA and cluster proteins are combined
  • Determine how time is factored into the cluster
  • Determine if the permutation test with ASCA tests needs to be improved for the high loadings proteins to be considered highly influential
  • Redo BLAST of CHOYP proteins to 2018 Uniprot database
  • Begin identifying and locating data files that we need to deposit in public protein repository (i.e. ProteomeXchange and PeptideAtlas)
    • need to regenerate ‘table_blastout_gigatonpep-uniprot’
    • need fasta file of the peptides ID’d by mass spec
    • make a simplified supplementary table containing CHOYP IDs, UniProt Accessions, e.val, Protein names, Gene names
      • Can modify current datasheet to get this info as well

Completed tasks

Paper questions

Question: Does temperature influence the proteome of larval C. gigas, and if so, how?

Referenced from Shelly’s post

  • Do we need to explain we did 2 x 4 treatments, or just say we did 1 x 2 treatments?
  • Do we have survival data for other silos to compare to silo 2, 3, and 9?
    • Can we rule out silo 2 as an anomaly or should we include it?

Kaitlyn’s notebook: clustering without day 0

I redid the hierarchical clustering of the combined silo 3 and 9 datasheet without day 0.

Include Day 0 Remove Day 0
Agglomerate coefficient 0.9964979 0.9976217
Cophenetic correlation 0.9477519 0.9630485
Clusters 41 84
Diferentially clustered proteins 33 213

31 “diferentially clustered proteins” remained differentially clustered whether day 0 was included or excluded. So, removing day 0 causes more proteins to cluster separately (the agglomerate coefficient is slightly increased).

  • i.e., removing day 0 causes more proteins to be identified as having different abundances.
    • All abundances are the same for both silos on day 0 since no treatment had been administered yet.
      • Removing day 0 means that only days following treatment are analyzed which makes more sense since we are attempting to identify proteins that have different abundances during treatment only.

 

This slideshow requires JavaScript.

This slideshow requires JavaScript.

Grace’s Notebook: Start new DIA analysis using Walnut, EncyclopeDIA, and Bri-line

Today I tried doing the mprophet model in Skyline Daily. It wasn’t super great and I got stuck, but Emma reminded me that a lot of people who do DIA analysis prefer a new method using Walnut, EncyclopeDIA, and Bri-line. I started this new protocol today and am currently letting Walnut make the new chromatogram library, which will take a while. I’ll move on to the next phase for this tomorrow when the library is done.

mprophet

mprophet is part of the Advanced Peak Picking Tutorial. I finished the model, but didn’t go into checking the error rates because Emma reminded me that the Walnut, EncyclopeDIA, “Bri-line” method is what others have been preferring.

20181126-2015-Cgseed.sky.zip is too large to add to OWL… but it’s on Woodpecker (FTR 209)

New method in a nutshell

  1. Convert .raw to .mzML using MSConvert (Check)
  2. Walnut –> make .elib file using the .mzML files made in Step 1. “Save Chromatogram Library” (in progress – will be done tomorrow)
  3. EncyclopeDIA –> Search wide-window data with library from Step 2.
  4. Bri-line –> ELIB browser: look at peaks; RAW file browser: look at RAW files

This method is pretty straight forward and seemingly easy to follow – will report back on that in tomorrow’s notebook post.

Goals for the week:

Tuesday

  1. Finish new DIA analysis method

Wednesday

  1. Extraction plan for Crab samples – get input from Steven and Sam
  2. Start extractions (?)

Thursday
I have a million classes (3), so I won’t have time to fit in extractions, but:

  1. Work on R script for adding new Qubit data

Friday

  1. Extractions
  2. Anything from my previous goals that I couldn’t finish

from Grace’s Lab Notebook https://ift.tt/2RcspSV
via IFTTT

Ronit’s Notebook: qPCR with Desiccation + Elevated Temp. Samples (DNMT1)

Before the long weekend, I ran a qPCR assay with my desiccated + elevated temp. samples. There were 40 samples in total and I ran one plate with DNMT1 (DNA methyltransferase) as my gene target.

Plate Layout:

The plate is laid out with duplicates in adjacent wells and the order of samples is as follows:

D01, D02, D03, D04, D05, D06, D07, D08, D09, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, D20, T01, T02, T03, T04, T05, T06, T07, T08, T09, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20

There are 2 positive controls with gDNA in wells G9 and G10. There is a no-template control (negative control) with no template in wells G11 and G12.

To run the qPCR assay, I created a mastermix with 20 µL of forward primer, 20 µL of reverse primer, 400 µL of 2x qPCR master mix, and 320 µL of DEPC-treated water. 19 µL of the mastermix was put into each well and a subsequent 1 µL of cDNA was put in for each sample (water for the negative controls and gDNA for the positive controls). Note that all sample cDNA used was a 1:5 dilution.

Grace’s Notebook: Cold, Infected Pool with RNeasy and DecaPod S1E13

Today I edited and published S1E13 of DecaPod. In this episode I interview Madi Shipley (MS, SAFS) who works with Bering Sea Tanner crabs. I also extracted RNA from 8 samples (all Cold, infected, Day 26) using Qiagen RNeasy and definitely have enough RNA for a pool!

DecaPod

S1E13 posted

RNeasy Extraction: 8 samples D26, cold, infected

Today I extracted RNA from 8 samples (all the second out of the three that were taken per crab):
img

I used the RNeasy protocol, with a few modifications:

  • At steps 5-7, I centrifuged at 10,000 rcf
  • Added 2-ME to Buffer ATL (since 8 samples, did 3mL buffer, 30µL 2-ME)
  • Did not add RNA carrier
  • Eluted with 14µL water (provided in Kit)

img

Qubit results look pretty amazing! My bioanalyzer results from last time looked pretty good, so I think we can trust these results. EXCEPT for tube number 413-2 because some of the working solution did not make it into the qubit tube.

img

The tubes (now with 13µL of material) are in Rack 5, Column 4, Row 3 in the -80.

NWGC wants 20ng/µL of RNA in 50µL sample (GitHub issue #184). It looks like we have more than enough!

When I return from the short holiday break this weekend, I’ll start picking out samples for the other pools and get feedback from Steven and Sam.

from Grace’s Lab Notebook https://ift.tt/2FAjDg2
via IFTTT