Kaitlyn’s Notebook: Gene enrichment of unique proteins

I grouped proteins that had 0 abundance on day 1 by the number of days they were abundant and ran each group through CompGO for biological processes at a p-value cutoff of 0.1. To see if there was a difference in the number of proteins expressed in each group, I made a small table. All values were similar. Highlighted values had p-values of at least 0.1.

I think it’s interesting that all silos showed enrichment among proteins that were abundant for only 1 day. All silos had peptidyl-tyrosine dephosphorylation, or “the removal of phosphoric residues from peptidyl-O-phospho-tyrosine to form peptidyl-tyrosine”, at p-values greater than 1E-1.

Silo 2- 1 day of protein abundance:

Silo 3- 1 day of protein abundance:

Silo 9- 1 day of protein abundance:

Other enriched processes are:

Silo 2- cellular response to retinoic acid (6 days),

Silo 3- intracellular protein transport (4 days) and maturation of SSU-rRNA from tricistronic rRNA transcript (5 days),

and Silo 9- negative regulation of endopeptidase activity (7 days). These can be viewed below in respective order.

Silo 2- 6 days of protein abundance:

Silo 3- 4 and 5 days of protein abundance respectively:

Silo 9- 7 days of protein abundance:


Kaitlyn’s Notebook: Unique Expression

I have continued working with Rhonda’s data and did some gene enrichment analysis on any proteins that had abundance on any day of the experiment after 0 abundance on day 1. I used Animal Genome for GO terms and produced a graph based on those GO terms.

I also made a graph of GO terms for proteins that had 0 abundance on any day of the experiment after some abundance on day 1.
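The two groups described above can be sketched in a few lines of Python. This is only an illustration of the grouping rule, not the analysis that was actually run: the function name `classify`, the protein names and the abundance values are all hypothetical, and it assumes each protein is stored as a list of abundances with day 1 first.

```python
def classify(abundances):
    """Label one protein from its per-day abundances (first value = day 1)."""
    first, later = abundances[0], abundances[1:]
    # "Appeared": nothing on day 1, then measurable on at least one later day.
    if first == 0 and any(value > 0 for value in later):
        return "appeared"
    # "Disappeared": measurable on day 1, then 0 on at least one later day.
    if first > 0 and any(value == 0 for value in later):
        return "disappeared"
    return "other"


# Hypothetical proteins and abundance values, for illustration only.
example = {
    "protein_A": [0, 0, 3.2, 0, 1.1, 0],
    "protein_B": [5.0, 2.1, 0, 0, 0, 0],
    "protein_C": [4.0, 4.4, 3.9, 4.1, 4.2, 4.0],
}
labels = {name: classify(values) for name, values in example.items()}
print(labels)
```

Proteins labelled "other" are the ones that never cross zero in either direction, so they would not be in either graph.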

I also thought it would be worthwhile to examine which processes seemed to change overall. Therefore, I combined the data and produced the following graph:

Although “biological process” isn’t very descriptive, for many proteins it was the only GO term available, as can be seen in the screenshots and charts here. Furthermore, many of the proteins I identified were not enriched, which is why I chose to analyze gene enrichment for proteins that appeared or disappeared at any point in the experiment.

However, I have now produced Excel sheets that identify proteins that were expressed on only 1, 2, 3, 4 or all 5 days after no abundance on the first day. In other words, we can now look at proteins based on the number of days they appeared after 0 abundance. This is also separated by silo, as the earlier analysis was.

You can see that the original data was converted to binary (presence/absence) values using R, and then, based on the sum of those columns, we can identify proteins that were abundant for 1, 2, 3, 4 or all 5 days. I included the original data so that we could tell whether any of those proteins were abundant at very high levels, such as the protein in the second photo that was expressed on only 1 day but at an abundance of almost 40. I also included some annotations that we can look through as well.
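The conversion was done in R, but the same idea can be sketched in Python for anyone reading along in Jupyter. This is a minimal sketch under stated assumptions, not the actual script: the function names, protein names and abundance values are hypothetical, and each protein is assumed to be a list of abundances with day 1 first.

```python
from collections import defaultdict


def days_abundant(abundances):
    """Turn abundances into 0/1 values and return how many days were above 0."""
    return sum(1 if value > 0 else 0 for value in abundances)


def group_by_days(table):
    """Group proteins with 0 abundance on day 1 by their number of abundant days.

    `table` maps a protein ID to its list of abundances, day 1 first.
    """
    groups = defaultdict(list)
    for protein, values in table.items():
        if values[0] != 0:
            continue  # abundant on day 1, so not part of this analysis
        count = days_abundant(values[1:])
        if count > 0:
            groups[count].append(protein)
    return dict(groups)


# Hypothetical proteins and abundance values, for illustration only.
table = {
    "protein_A": [0, 0, 39.5, 0, 0, 0],
    "protein_B": [0, 1.2, 2.0, 0, 3.1, 0.4],
    "protein_C": [2.0, 0, 0, 1.0, 0, 0],
    "protein_D": [0, 0, 0, 0, 0, 0],
}
groups = group_by_days(table)
print(groups)
```

Keeping the original values next to the 0/1 columns is what makes it possible to spot a protein like `protein_A`, which shows up on a single day but at a high abundance.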

Silo 2- unique expression based on days abundance

Here are the links for the other silos:

Silo 3- unique expression based on days abundance

Silo 9- unique expression based on days abundance

Kaitlyn’s Notebook: Unique Proteins that Appeared

I parsed out proteins from Rhonda’s data that had 0 abundance on day 1 but later had some measurable abundance for at least 1 day in the experiment. I ran the list of proteins I identified for each silo through CompGO. Many of the proteins did not have associated GO terms, which was disappointing since some of those proteins had very distinctive abundance patterns in the experiment.

I recorded this in a Jupyter notebook entry.

Kaitlyn’s Notebook: Differential Protein Expression

I’ve been trying to identify differentially expressed proteins in Rhonda’s oyster data, but it has been difficult to find a way to compare the different treatments or days. Initially I took the differences in abundance between proteins. However, this did not consider the majority of proteins in the ABACUS data, which means I could be missing important enriched genes. Furthermore, finding the differences between two days was simple, but the same approach was impossible when comparing all 13 days with each treatment.

Working with Sean, I tried a PCA plot and k-means clustering, but neither worked effectively. Yaamini is using MSstats to analyze her Skyline data. I am not sure if ABACUS data can be analyzed using MSstats, but I know MSstats can analyze DDA (data-dependent acquisition) data, so I am going to look into it further.

I helped Yaamini label some tubes for her and Laura’s DNR a little today as well!

Kaitlyn’s Notebook: Silo 2 Protein Expression

I want to start by saying I wasn’t sure why everyone hated DAVID, but I understand now guys! I used DAVID to make GO terms to enter into REVIGO to visualize changes in protein expression based on gene enrichment from days 11 and 13 in silo 2 based on Rhonda’s NMDS Plot:

I produced this plot with REVIGO based on proteins changed by more than 10, which is arbitrary but it seemed like a decent visual cutoff based on values in my table, plus this is preliminary analysis. I left the side bar on the plot since all points weren’t labelled and I figured it’s always easier to crop anyway!

I’m still trying to understand this plot, but it looks like there was overrepresentation of proteins that had to do with reproduction and muscle development. I made separate plots for proteins that increased in expression and proteins that decreased in expression, which can be found at the end of my Jupyter notebook here.
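The cutoff and the increased/decreased split can be sketched as follows. This is a hedged illustration only: the function name `split_by_change`, the protein names and the abundance values are hypothetical, and it assumes each day is stored as a mapping from protein ID to abundance. The cutoff of 10 is the arbitrary one mentioned above.

```python
def split_by_change(day_a, day_b, cutoff=10):
    """Return (increased, decreased) proteins whose change exceeds the cutoff.

    `day_a` and `day_b` map protein IDs to abundances on the earlier and
    later day. Only proteins present in both mappings are compared.
    """
    increased, decreased = {}, {}
    for protein in day_a.keys() & day_b.keys():
        change = day_b[protein] - day_a[protein]
        if change > cutoff:
            increased[protein] = change
        elif change < -cutoff:
            decreased[protein] = change
    return increased, decreased


# Hypothetical abundances for days 11 and 13, for illustration only.
day_11 = {"protein_1": 2.0, "protein_2": 30.0, "protein_3": 5.0}
day_13 = {"protein_1": 15.0, "protein_2": 12.0, "protein_3": 9.0}
increased, decreased = split_by_change(day_11, day_13)
print(increased, decreased)
```

Each of the two lists could then be run through DAVID and REVIGO separately, and changing `cutoff` shows how sensitive the lists are to that choice.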

I’m going to continue playing with this data and try to identify changes in protein expression on day 9 since it looks like differentiation between silos began to occur here.

Kaitlyn’s Notebook: Uniprot Annotations and SQL Join

After BLASTing the oyster data against a Uniprot database, I joined the results with Uniprot annotations from sr320 on SQL Share, using a left join on this condition:
sr.column1 = kr."Protein ID"
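For anyone who wants to try this kind of join without SQL Share, here is a minimal sketch using Python’s built-in `sqlite3` module. Only the join condition comes from the query above. The table contents, the extra column names (`evalue`, `annotation`) and the choice of `kr` as the left-hand table are assumptions made for illustration.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE kr ("Protein ID" TEXT, evalue REAL)')
conn.execute("CREATE TABLE sr (column1 TEXT, annotation TEXT)")
conn.executemany(
    "INSERT INTO kr VALUES (?, ?)",
    [("P001", 1e-30), ("P002", 1e-5)],
)
conn.executemany(
    "INSERT INTO sr VALUES (?, ?)",
    [("P001", "hypothetical annotation")],
)

# LEFT JOIN keeps every row of kr, even when sr has no matching annotation.
rows = conn.execute(
    'SELECT kr."Protein ID", sr.annotation '
    'FROM kr LEFT JOIN sr ON sr.column1 = kr."Protein ID" '
    'ORDER BY kr."Protein ID"'
).fetchall()
conn.close()
print(rows)
```

With a left join, proteins that have no annotation stay in the result with an empty annotation instead of being dropped, which makes it easy to see how much of the data is unannotated.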


Now I’ve condensed some of the information to have a table that is easier to quickly read:


There are a lot of possible directions to go from here. The goal is to identify proteins that are highly expressed or vary between treatments (23C vs 29C). Proteins that are highly expressed and/or do not vary between treatments could indicate essential functions for Pacific oysters, which I believe are not well understood. There are a lot of questions that could be answered, so right now I am just trying to form a more specific question to investigate using the data I now have.

Kaitlyn’s Notebook: BLAST on Jupyter

I went out with the #LabLadies and shucked my first oyster! I also really enjoyed looking around the Manchester facility and working with everyone! (Thanks for inviting me guys!)

I’ve also finally figured out how to work with GitHub and Jupyter. I’ve now successfully run a file through BLAST using Jupyter, although it was a practice file downloaded from the internet rather than the pacificoysterdata. I still have to figure out how to BLAST that file, or whether that is the correct file to BLAST. Anyway, I was having problems because I wasn’t specifying the entire path, but with a little help from Sam, I finally got it figured out! I also created my first repository, and although it looks pretty empty right now, I’ve moved directories in and out as well as individual files. I’m using the terminal to do this. I did download GitHub Desktop, but because I was already working in the terminal so much, the terminal made more sense to me. I understand a lot more of the terminology now, as well as how GitHub tracks file changes on your computer. It was pretty exciting getting it all to finally work for me!
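Since the path problem is an easy one to run into, here is a small sketch of one way to avoid it from a Jupyter cell: build the BLAST command with absolute paths before running it. The file and database paths below are hypothetical, and `blastp` is only an assumed choice of program. The sketch builds the command without running it, so it works even where BLAST is not installed.

```python
from pathlib import Path


def build_blast_command(query, database, output, program="blastp"):
    """Return a BLAST+ command as an argument list with absolute paths."""
    return [
        program,
        "-query", str(Path(query).expanduser().resolve()),
        "-db", str(Path(database).expanduser().resolve()),
        "-out", str(Path(output).expanduser().resolve()),
        "-outfmt", "6",  # tabular output
    ]


# Hypothetical relative paths, for illustration only.
command = build_blast_command(
    "data/practice.fasta",
    "blastdb/uniprot_sprot",
    "analyses/practice_blast.tab",
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` once BLAST and the database are actually in place.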

Finally, I’m working on an anemone project for my BIO463 (Advanced Physiology) class. I am going to manipulate salinity (hypoosmotic conditions) and temperature (increase) and then test tentacle flexion, tentacle retraction time, changes in symbiote presence and possible tentacle regeneration time of Aiptasia. I ordered them from Carolina Catalogs and they are surprisingly large (about 5cm)! Unfortunately, their symbiote presence will probably make it impossible to look at methylation patterns. However, I will learn a lot about anemones from this project, and hopefully I can do a separate project studying changes in global methylation patterns for the Roberts Lab.