Triple-nested for loops are not fun
As I mentioned in a previous post, Steven suggested I create a table with information about the peptide vs. biomarker regressions I did. I was trying to do that using three for loops nested within each other.
It did not go well.
When I was talking with Sam today, he suggested I break the nested for loops. Instead, he thought I should save all of the regression information into a new table, and then reformat that information into the table I wanted to save. That was the right idea! The code can be found in this R Script, or below.
The table I produced can be found here.
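Sam's suggestion (collect every fit as one row of a long table first, then reshape that into the table you actually want) can be sketched in Python. The original work is an R script; the nested `data` layout, the function names, and the toy numbers below are hypothetical. `itertools.product` replaces the nested loops over site, peptide, and biomarker with a single loop.

```python
from itertools import product


def fit_line(x, y):
    """Least-squares slope and R-squared for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


def regression_table(data, peptides, biomarkers):
    """Step 1: one flat loop over every (site, peptide, biomarker) combination.

    data[site]["peptides"][name] and data[site]["biomarkers"][name] are
    equal-length lists with one value per oyster (a hypothetical layout).
    Each fit becomes one row of a long table.
    """
    rows = []
    for site, pep, bio in product(data, peptides, biomarkers):
        slope, r2 = fit_line(data[site]["peptides"][pep],
                             data[site]["biomarkers"][bio])
        rows.append({"site": site, "peptide": pep, "biomarker": bio,
                     "slope": slope, "r_squared": r2})
    return rows


def pivot(rows, value):
    """Step 2: reshape the long table to (site, peptide) rows x biomarker columns."""
    wide = {}
    for row in rows:
        wide.setdefault((row["site"], row["peptide"]), {})[row["biomarker"]] = row[value]
    return wide


# Toy numbers for illustration only, not real measurements.
toy = {
    "SiteA": {"peptides": {"pep1": [1.0, 2.0, 3.0, 4.0]},
              "biomarkers": {"height": [3.0, 5.0, 7.0, 9.0]}},
}
long_rows = regression_table(toy, ["pep1"], ["height"])
for key, cols in pivot(long_rows, "r_squared").items():
    print(key, cols)
```

Because every fit lands in the same long table, adding another dimension (say, a per-site breakdown) only adds an argument to `product` and a key to the pivot, rather than another level of nesting.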
My next steps are to (finally) quality control my data, remake pH, DO and salinity plots, make a table of important environmental variables, and do something with the growth data. Oh, and write I guess.
…can you tell I’m a bit tired of this yet?
from yaaminiv.github.io http://ift.tt/2Aj54qd
Time flies, especially when I get various waves of data that need analyzing, and also Thanksgiving, tests/papers from class, etc… Lots of tasks will carry over from November, BUT headway has still been made.
Olys at dock
- Clean, measure broodstock (after hatchery meeting)
- Finish Results section
- Correlation btwn various parameters, protein abundance
- Try to use Structural Equation Model instead of linear regression model
- QA %>1SD & %>2SD calculations
- Get more methods info from Micah, Emma
- Focus on DISCUSSION
- Identify target journals
- Determine figures for paper. Candidates:
- Box/violin plots of protein abundances by subbasin, including 3 peptides separately. Like this, but with 4 panes for each protein:
- NMDS plot (like what I already have) with overlaying circles by subbasin
- 3-pane plot of pH, T, and DO (continuous)? Would be messy with all 8 traces from each site.
- Correlation matrices for: variables identified in model (DO var, pH >1sd %, and growth)
- Remaining things that Emma said I should do:
- Linear Response Plot, as per Emma: Peak area on the y, amount of peptide (moles) on the x. Don’t know absolute quantity of experimental peptides, could make the x-axis relative quantity or something. Can generate plots like these in MSstats.
- DIA error rate
- SRM dilution curve
- Idea for paper, but would take a while (probably a few days): screen all proteins in DIA Skyline data for a “menu” of targetable, detectable proteins (based on visual check of chromatograms). This could be published alongside the paper as a very usable resource.
- More lit research
- Growth related to protein expression – but why were only some proteins differentially expressed?
- Determine if the abundance difference is “biologically relevant”: either find good references on meaningful SRM abundance differences (haven’t found any yet) OR find a citable % difference.
Prep for Oly genetics hatchery vs. wild analysis
- Re-read Fischer et al. 2012
- Get most current data set from Crystal
- ID best program to use
- Finish NDSEG, get letters
- AA travel awards (?)
- Submit MS paperwork to office & proposal online (pending approval from Rick, Jackie)
- Learn how to tag notebook posts
- Create README.md for proteomics data on Owl
- Determine which Oly samples to sequence from last spring/winter
- Meet with STATS resource to figure out how to analyze Oly larval survival – parse out survival data to 224 µm, to juvenile, by time for reps
- Record podcast pilot with Megan
- Find local vendor that sells dry suits, try on to determine size, then purchase gear
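For the %>1SD and %>2SD QA, one way to double-check the numbers is to recompute them independently. Below is a minimal Python sketch, assuming "%>1SD" means the share of observations more than one standard deviation above (or below) the series mean; the function name and toy readings are hypothetical.

```python
import statistics


def percent_beyond_sd(values, k=1):
    """Percent of observations above mean + k*SD and below mean - k*SD.

    Uses the sample standard deviation (statistics.stdev); whether the
    original calculation used sample or population SD is an assumption.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    n = len(values)
    above = sum(v > mean + k * sd for v in values)
    below = sum(v < mean - k * sd for v in values)
    return 100 * above / n, 100 * below / n


# Toy readings for illustration only, not real sensor data.
toy = [0.0, 0.0, 0.0, 0.0, 10.0]
for k in (1, 2):
    above, below = percent_beyond_sd(toy, k)
    print(f">{k} SD: {above:.1f}% above, {below:.1f}% below")
```

Reporting "above" and "below" separately matters for variables like DO or pH, where excursions in one direction (supersaturation vs. hypoxia) mean different things biologically.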
Accomplishments from last month
- Checked out Oly histology data from last year’s experiment
- Circulated MS proposal for approval from committee
- Submitted NSA abstracts, travel grant app
- Sampled experimental Olys for histology (time=0), got animals into treatments
- Received collection permit @ Mud Bay, made contact with homeowners for access, collected first batch
- Gave Oly shells to Heater @ Wood’s lab for Polydora inspection
- Got the CoEnv travel grant for AA
- Proteomics stuff:
- Finalized SRM tech. rep filtering process
- Tracked down tidal charts from each Geoduck Proteomics outplant location; used them to process/filter environmental data from Micah
- Generated summary statistics on environmental data
- Re-ran proteomics analyses on the peptide-level, where transitions within each peptide were summed. Lambda-transformed peptide abundance for normal distribution, then ran 2-way ANOVAs.
- Learned how to run stepwise linear regression models, ran them on diff. expressed proteins with environmental summary stats.
- Refined methods section of paper
- Made headway on results section of paper
- Drafted intro; will likely need to revise based on results
- Finished parasite class things
- Generated notebook for DIA data in Skyline
- R-script for merging DIA results with annotations, extracting SRM targets from this dataframe
- Generated peptide variability stats for paper
- Discarded unnecessary vials (the “just in case” vials, and autosampler vials prepped for mass spec). If I need to re-run samples, I’d prepare new autosampler vials from my digested peptides.
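The peptide-level step (sum transitions within each peptide, then pick a transformation lambda) could look like the Python sketch below. It assumes "lambda-transformed" means a Box-Cox transform with lambda chosen by maximum likelihood; the record layout, function names, and toy peak areas are hypothetical, and the 2-way ANOVA step is not shown. In Python, `scipy.stats.boxcox` returns the transformed data and the fitted lambda directly; this stdlib version does a coarse grid search on the same log-likelihood so the idea is visible.

```python
import math
from collections import defaultdict


def sum_transitions(records):
    """Sum transition-level peak areas into one abundance per (sample, peptide)."""
    totals = defaultdict(float)
    for sample, peptide, _transition, area in records:
        totals[(sample, peptide)] += area
    return dict(totals)


def box_cox(values, lam):
    """Box-Cox transform; values must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def box_cox_llf(values, lam):
    """Profile log-likelihood of lambda for the Box-Cox transform."""
    n = len(values)
    transformed = box_cox(values, lam)
    mean = sum(transformed) / n
    var = sum((t - mean) ** 2 for t in transformed) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Pick the lambda on a coarse grid (default -2.0 to 2.0) that maximizes the likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_llf(values, lam))


# Toy peak areas for illustration only, not real data.
records = [
    ("oyster1", "pepA", "y5", 120.0), ("oyster1", "pepA", "y6", 80.0),
    ("oyster2", "pepA", "y5", 300.0), ("oyster2", "pepA", "y6", 250.0),
    ("oyster3", "pepA", "y5", 45.0), ("oyster3", "pepA", "y6", 30.0),
]
peptide_totals = sum_transitions(records)
abundance = [peptide_totals[(s, "pepA")] for s in ("oyster1", "oyster2", "oyster3")]
lam = best_lambda(abundance)
print(lam, box_cox(abundance, lam))
```

With only a handful of samples per peptide, the fitted lambda can be unstable, so it is worth checking the transformed values (e.g. a Q-Q plot) before running the ANOVAs.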
from LabNotebook http://ift.tt/2C7PsXW
a.k.a. How to take a break from proteomics
Instead of poring over environmental data, I worked on an MBD-Sequencing protocol for the Crassostrea virginica gonad samples this week. Turns out I know absolutely nothing about different sequencing protocols, so I had to do a bunch of reading about bisulfite sequencing.
Here’s what I learned from a general literature review:
- Most papers focus on plant, mouse, or human epigenomes, none of which are invertebrates.
- According to Olova et al. 2017, “amplification-free and post-bisulfite procedures should become the gold standard for [Whole Genome Bisulfite Sequencing (WGBS)] library preparation”
- MBD-Seq is not the same as WGBS! It goes by lots of names, like MethylCap-Seq. According to this Illumina post, methyl-CpG-binding domain (MBD) proteins bind to methylated cytosines, and the bound DNA is then precipitated out using beads.
- MeDIP uses antibodies. Riviere et al. 2017 used this method to profile Pacific oyster methylomes at different methylation states, so it would probably work with C. virginica. However, bisulfite sequencing methods have been found to be more accurate, so we’re going to ignore this method.
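To make the bisulfite logic concrete, here is a toy Python sketch (not part of any protocol discussed here): unmethylated cytosines are converted and read as T, while methylated cytosines stay C, so comparing an aligned read to the reference at each C gives a methylation call. It assumes a single top-strand read that is already aligned and error-free; the function names and sequence are hypothetical. Real pipelines also handle both strands, alignment against a converted genome, and incomplete conversion.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment of the top strand.

    Unmethylated C is deaminated to U and read as T after sequencing;
    C at 0-based positions listed in `methylated` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Call each reference C as methylated (read shows C) or unmethylated (read shows T)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = "methylated"
            elif read_base == "T":
                calls[i] = "unmethylated"
    return calls


reference = "ACGTTCGACCA"
read = bisulfite_convert(reference, methylated={1, 5})  # both CpG cytosines methylated
print(read)
print(call_methylation(reference, read))
```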
After learning about the different sequencing methods, I reviewed the methods section from Gavery and Roberts 2013.
- Sheared DNA was incubated with MBD-Biotin Protein and Dynabeads following the MethylMiner instructions
- Enriched DNA was eluted from the beads and purified
- Illumina TruSeq adapters were used to prepare the DNA libraries
- Qiagen’s EpiTect Bisulfite Kit was used for bisulfite treatment
- Library preparation and sequencing were done on the Illumina HiSeq 2000
This protocol matches with Olova et al. 2017’s suggestion of post-bisulfite treatment and no PCR amplification. However, I don’t know much about the kits she used, and whether or not there are better alternatives. Kurdyukov and Bullock 2016 lay out information for MethylCap (Diagenode) and MethylMiner, but don’t really compare the efficacy of the two. I’m going to ping Sam and Mac to see if they have any opinions on the kits.
My overall takeaway is that Mac’s protocol should be sufficient for our work!
(Now I have November and December Dwights!)
So…I guess it’s the end of the year now. Welp. Cue the cramming of work into what’s left of the quarter!
November Goals Recap:
- Turned in my milestone paperwork!
- Successfully presented at GSS and WSN
- Half-completed my DNR Proteomics paper
- Revised methods and results
- Outlined discussion
- Made good progress on integrating many data sources for my analyses
- Finish. this. proteomics. paper.
Identify sequencing protocol for C. virginica gonad samples
- Prepare C. virginica samples for sequencing
- Add Disqus to my blog
The last bit of data!
Micah sent over growth data for the oysters, as well as site rankings for outplant depth and eelgrass extent in the bay.
Outplant elevation, from deepest to shallowest: Case Inlet (never dry), Fidalgo Bay (never dry), Skokomish (exposed at low tide), Port Gamble Bay (exposed at low tide), Willapa Bay (frequently exposed at low tide)
Eelgrass extent in bay, from most to least: Fidalgo Bay (eelgrass dominant), Willapa Bay (eelgrass dominant), Port Gamble Bay (eelgrass common), Case Inlet (eelgrass common), Skokomish (eelgrass limited)
Yesterday’s meeting notes
In the third installment of our “what does all of this actually mean” meetings, Micah, Alex, Emma, Brent, Laura, Steven and I discussed the progress we’ve made integrating all of our data into one cohesive story.
- Dissolved oxygen measurements
- FB most eelgrass dominated, higher pH, could have daily supersaturation (DO > 12)
- Need to do literature survey to verify measurements are “real”
- Padilla Bay: DO ~ 19.3 max for sensors that never come out of the water
- Should clip DO, pH and salinity data
- Conservative one hour/one foot clipping
- Use Union for SK tidal data
- Just use bare for all sites
- Correcting sensor values to the right mean salinity can be difficult and lead to discrepancies
- End drops in salinity and pH could be burials
- Can examine brief window of environmental data one or two days before sampling
- Number of low tides could be interesting
- ex. Lots of drops in salinity at WB –> could number of low tides affect protein expression?
- Eelgrass extent as an explanatory variable
- Global eelgrass effect could override any bare sites?
- Biomarker data
- Ignore fatty acid data for now since there’s a low sample size
- Final height is a proxy for growth
- FB grew the most, CI grew the least
- Tissue mass highest in FB, then PG. WB, SK, and CI were similar
- Figure out biomarker comparison table code
- Scrub data
- Make environmental variable table
- Standard deviation/variance
- Number of observations above/below SD/2 SDs
- Number of exposures/low tides
- Days exposed
- Total time exposed
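The "scrub data" and exposure items above could be sketched like this in Python. This is one reading of the "conservative one hour/one foot" rule, not necessarily the rule the group settled on: it assumes the rule means dropping any DO, pH, or salinity reading within one hour of a tide-chart height at or below one foot. Tide and sensor series are hypothetical lists of (hours since deployment, value) pairs, the function names are made up, and "exposed" reuses the same one-foot threshold, which ignores each site's actual outplant elevation.

```python
def exposed_flags(tide, threshold_ft=1.0):
    """True for each tide observation at or below the threshold height (ft)."""
    return [height <= threshold_ft for _, height in tide]


def clip_readings(readings, tide, threshold_ft=1.0, buffer_h=1.0):
    """Drop sensor readings within buffer_h hours of any low-tide observation."""
    flags = exposed_flags(tide, threshold_ft)
    low_times = [t for (t, _), low in zip(tide, flags) if low]
    return [(t, v) for t, v in readings
            if all(abs(t - lt) > buffer_h for lt in low_times)]


def exposure_summary(tide, step_h, threshold_ft=1.0):
    """Count exposure events, approximate hours exposed, and distinct days exposed.

    Assumes tide observations are evenly spaced step_h hours apart.
    """
    flags = exposed_flags(tide, threshold_ft)
    # An event starts wherever the tide drops below the threshold.
    events = sum(1 for i, low in enumerate(flags)
                 if low and (i == 0 or not flags[i - 1]))
    days = {int(t // 24) for (t, _), low in zip(tide, flags) if low}
    return {"exposures": events,
            "hours_exposed": sum(flags) * step_h,
            "days_exposed": len(days)}


# Toy hourly tide heights and sensor readings, not real data.
tide = [(0, 3.0), (1, 0.5), (2, 0.8), (3, 2.5), (4, 4.0)]
readings = [(0, 7.9), (1.5, 8.1), (3.5, 8.0), (4, 8.2)]
print(clip_readings(readings, tide))
print(exposure_summary(tide, step_h=1))
```

Keeping the threshold and buffer as parameters makes it easy to re-run the clipping if the group settles on a different rule, or on site-specific thresholds based on outplant elevation.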
Some more regressions
I tried to write a for loop in this R script to make a table with each peptide vs. biomarker comparison, R-squared value, and slope… but I’m hardcore struggling with it. I wrote a for loop within a for loop to create all of the plots, but now I can’t write another for loop that takes all of the information I’m generating and puts it in a new dataframe. I’m going to keep trying though!
After looking at my peptide vs. biomarker regressions, Steven suggested I make the same plots for each site. I used the R script linked above to do that. The plots can be found in these folders:
Port Gamble Bay
Skokomish River Delta
Once I figure out how to get my triple for loop to work, I’ll make a table for this information too. Now I guess I’ll wait for our meeting with Micah and Alex to see what to do next.