Kaitlyn’s Notebook: Unique Proteins that Appeared

I parsed out proteins from Rhonda’s data that had 0 abundance on day 1 but had measurable abundance on at least one later day of the experiment. I ran the list of proteins I identified for each silo through CompGO. Many of the proteins did not have associated GO terms, which was disappointing since some of those proteins were uniquely abundant in the experiment.
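The filter described above can be sketched in a few lines of Python. This is only an illustration, not the code from my notebook: the function name, the protein IDs and the abundance values are all hypothetical, and it assumes each protein's values are already ordered by sampling day with day 1 first.

```python
def appeared_after_day_one(abundance_by_protein):
    """Return IDs of proteins with 0 abundance on day 1 and a non-zero value later."""
    appeared = []
    for protein_id, values in abundance_by_protein.items():
        first_day, later_days = values[0], values[1:]
        # Keep the protein only if it starts at zero and shows up on any later day.
        if first_day == 0 and any(value > 0 for value in later_days):
            appeared.append(protein_id)
    return appeared


# Hypothetical abundances, ordered by sampling day (day 1 first).
example = {
    "protein_A": [0, 0, 4.2, 1.1],   # absent on day 1, appears later
    "protein_B": [3.0, 2.5, 0, 0],   # present on day 1, so excluded
    "protein_C": [0, 0, 0, 0],       # never detected, so excluded
}
print(appeared_after_day_one(example))
```

Running the same function once per silo would give one protein list per silo, which is the shape of input I fed to CompGO.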

I recorded this in a Jupyter notebook entry.

Kaitlyn’s Notebook: Differential Protein Expression

I’ve been trying to identify differentially expressed proteins in Rhonda’s oyster data, but it has been difficult to find a way to compare the different treatments or days. Initially I took the differences between protein values; however, this did not consider the majority of proteins in the ABACUS data, which means I could be missing important enriched genes. Furthermore, finding the differences between two days was simple, but impossible when comparing all 13 days with each treatment.

Working with Sean, I tried a PCA plot and k-means clustering; however, neither worked effectively. Yaamini is using MSstats to analyze her Skyline data. I am not sure if ABACUS data can be analyzed using MSstats, but I know MSstats can analyze DDA data, so I am going to look into it further.

I helped Yaamini label some tubes for her and Laura’s DNR a little today as well!

Kaitlyn’s Notebook: Silo 2 Protein Expression

I want to start by saying I wasn’t sure why everyone hated DAVID, but I understand now guys! I used DAVID to make GO terms to enter into REVIGO to visualize changes in protein expression based on gene enrichment from days 11 and 13 in silo 2 based on Rhonda’s NMDS Plot:

I produced this plot with REVIGO based on proteins that changed by more than 10. The cutoff is arbitrary, but it seemed like a decent visual cutoff based on the values in my table, and this is a preliminary analysis. I left the side bar on the plot since not all points were labelled, and I figured it’s always easier to crop anyway!

I’m still trying to understand this plot, but it looks like there was overrepresentation of proteins that had to do with reproduction and muscle development. I made separate plots for proteins that increased in expression and proteins that decreased in expression, which can be found at the end of my Jupyter notebook here.
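For reference, the "changed by more than 10" cutoff and the split into increased and decreased proteins could look something like the sketch below. The function name, protein IDs and values are made up for illustration, and it assumes the two days are held as simple ID-to-value dictionaries.

```python
def split_by_change(day_a, day_b, cutoff=10):
    """Split proteins into those that rose or fell by more than `cutoff`."""
    increased, decreased = [], []
    # Only compare proteins that were measured on both days.
    for protein_id in day_a.keys() & day_b.keys():
        change = day_b[protein_id] - day_a[protein_id]
        if change > cutoff:
            increased.append(protein_id)
        elif change < -cutoff:
            decreased.append(protein_id)
    return sorted(increased), sorted(decreased)


# Hypothetical values for two sampling days in one silo.
day_11 = {"protein_A": 5.0, "protein_B": 40.0, "protein_C": 12.0}
day_13 = {"protein_A": 22.0, "protein_B": 18.0, "protein_C": 15.0}

up, down = split_by_change(day_11, day_13)
print("increased:", up)
print("decreased:", down)
```

Keeping the cutoff as a parameter makes it easy to try other values, since 10 was an arbitrary choice.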

I’m going to continue playing with this data and try to identify changes in protein expression on day 9 since it looks like differentiation between silos began to occur here.

Kaitlyn’s Notebook: Uniprot Annotations and SQL Join

After BLASTing the oyster data against a Uniprot database, I joined it with Uniprot annotations from sr320 on SQLShare.

SELECT * FROM
krmitch7."table_CSV-avg-oysterdata-only.csv" kr
LEFT JOIN
sr320."GIgaton-Uniprot-Join" sr
ON
sr.column1 = kr."Protein ID"
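To show what the left join does, here is a small plain-Python sketch of the same idea. It is not how SQLShare runs the query. The `left_join` helper, the `average` and `annotation` columns and the example rows are hypothetical; only the key names "Protein ID" and "column1" come from the query above.

```python
def left_join(left_rows, right_rows, left_key, right_key):
    """Keep every left row; attach matching right rows or None-filled columns."""
    # Index the right-hand rows by join key (one key may match several rows).
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[right_key], []).append(row)

    # Column names of the right table, used to pad unmatched left rows.
    right_columns = list(dict.fromkeys(col for row in right_rows for col in row))

    joined = []
    for row in left_rows:
        matches = right_index.get(row[left_key], [])
        if matches:
            joined.extend({**row, **match} for match in matches)
        else:
            # No annotation found: keep the protein, fill right columns with None.
            joined.append({**row, **dict.fromkeys(right_columns)})
    return joined


# Hypothetical rows standing in for the two tables.
proteins = [
    {"Protein ID": "P1", "average": 12.5},
    {"Protein ID": "P2", "average": 0.8},
]
annotations = [
    {"column1": "P1", "annotation": "example annotation"},
]

result = left_join(proteins, annotations, "Protein ID", "column1")
for row in result:
    print(row)
```

The point of a left join here is that proteins without a Uniprot match stay in the table with empty annotation columns instead of being dropped.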


Now I’ve condensed some of the information to have a table that is easier to quickly read:

oysterproteintable

There are a lot of possible directions to go from here. The goal is to identify proteins that are highly expressed or vary between treatments (23C vs 29C). Proteins that are highly expressed and/or do not vary between treatments could indicate essential functions for Pacific oysters, which I believe are not well understood. Right now I am trying to form a more specific question to investigate using the data I now have.

Kaitlyn’s Notebook: BLAST on Jupyter

I went out with the #LabLadies and shucked my first oyster! I also really enjoyed looking around the Manchester facility and working with everyone! (Thanks for inviting me guys!)

I’ve also finally figured out how to work GitHub and Jupyter. I’ve now successfully run a file through BLAST using Jupyter, although it was a practice file downloaded from the internet rather than the pacificoysterdata. I still have to figure out how to BLAST that file, or whether that is the correct file to BLAST. Anyway, I was having problems because I wasn’t specifying the entire path, but with a little help from Sam, I finally got it figured out! I also created my first repository, and although it looks pretty empty right now, I’ve moved directories in and out as well as individual files. I’m using the terminal to do this. I did download GitHub Desktop, but because I was already working in the terminal so much, staying there made more sense to me. I understand a lot more of the terminology now, as well as how GitHub tracks file changes on your computer. It was pretty exciting getting it all to finally work for me!

Finally, I’m working on an anemone project for my BIO463 (Advanced Physiology) class. I am going to manipulate salinity (hypoosmotic conditions) and temperature (increase) and then test tentacle flexion, tentacle retraction time, changes in symbiote presence and possible tentacle regeneration time of Aiptasia. I ordered them from Carolina Catalogs and they are surprisingly large (about 5cm)! Unfortunately, their symbiote presence will probably prevent me from looking at methylation patterns. However, I will learn a lot about anemones from this project, and hopefully I can do a separate project studying changes in global methylation patterns for the Roberts Lab.

Kaitlyn’s Notebook: Moving on to Jupyter

I’ve been working with Bash the last couple of days. I’m running Bash on Windows (which can run a Linux-based shell with the latest Windows update). There are definitely still some challenges. I found out you must change your home directory because of how Windows organizes files, but I am able to navigate my PC through Bash fairly effectively. I can now create directories and delete files; however, I am still confused about using pipes and filters, so I’m still working through those in Bash, as well as opening files through the terminal. Most functions transfer over well from Linux to Windows except opening files. There is a patch from Windows as well as a few workarounds, including running cbwin (another terminal), which runs through Bash. While it is a lot of work, running Linux through Windows seems like the better option once I figure out these bugs, because Linux-based commands permit more actions.

While continuing to manage Bash, I’ve also downloaded Anaconda3, which includes Python 3.6 and Jupyter. I’m familiarizing myself with Jupyter (which I open through the GUI system until I figure out Bash). Once I am more comfortable with Jupyter, I will move on to running BLAST so that I can run the oyster proteomics data through it and hopefully identify some proteins!

Kaitlyn’s Notebook: Oysters and Excel/Continuing Work

I was asked to use Excel to identify proteins that were consistently high or varied across samples in the Pacific oyster proteomic data.

First I had to figure out how to open a .tsv file from GitHub, which I had never done before. I saved the .tsv file by right clicking on the RAW link, then I followed these instructions, which were very straightforward.

Once I had the file open in Excel, I used the average(n…) and median(n…) functions on all rows. I then selected conditional formatting and chose a color gradient in order to better visualize protein values for each row. The average would show the proteins that had higher counts, while the median could provide insight into potential outliers.

I also wanted to provide a range for each row; however, I could not find a command for this. Instead I used the min(n…) and max(n…) functions in separate columns. I then created another column subtracting the minimum value from the maximum value in each row to give one value representing the range. This time I chose data bars for conditional formatting, mostly to mix it up from the previous selection.
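The same per-row summary (average, median, min, max and range) is easy to reproduce outside Excel. Below is a rough Python equivalent using only the standard library. The `row_summary` name and the example values are hypothetical, and it assumes each row is just a list of numbers for one protein.

```python
from statistics import mean, median


def row_summary(values):
    """Summarize one protein's values across samples."""
    low, high = min(values), max(values)
    return {
        "average": mean(values),
        "median": median(values),
        "min": low,
        "max": high,
        "range": high - low,  # max minus min, as in the extra Excel column
    }


# Hypothetical values for one protein across four samples.
summary = row_summary([2, 4, 4, 30])
print(summary)
```

In this made-up row the average (10) sits well above the median (4), which is the kind of gap that hints at an outlier.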

After I posted my progress on the file, there was discussion of possible error in the technical replicates. In an attempt to show where differences between the samples and their technical replicates (denoted by …#A) may be substantial, I calculated averages and medians for the samples and technical replicates. Next, I subtracted the replicate protein values from the original sample values. Then I assigned a new rule under conditional formatting to mark values with a difference greater than 10, which I chose arbitrarily but which can easily be changed.
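The replicate check can be sketched the same way. This is an illustration only: the function name, protein IDs and values are invented, the two columns are assumed to be ID-to-value dictionaries, and treating the cutoff as an absolute difference (so negative differences are flagged too) is my assumption here.

```python
def flag_replicate_differences(sample_values, replicate_values, cutoff=10):
    """Return {protein_id: sample - replicate} where |difference| > cutoff."""
    flagged = {}
    for protein_id, sample_value in sample_values.items():
        if protein_id not in replicate_values:
            continue  # no technical replicate value to compare against
        difference = sample_value - replicate_values[protein_id]
        if abs(difference) > cutoff:
            flagged[protein_id] = difference
    return flagged


# Hypothetical values for one sample and its technical replicate.
sample = {"protein_A": 50.0, "protein_B": 12.0, "protein_C": 3.0}
replicate = {"protein_A": 31.0, "protein_B": 14.0, "protein_C": 20.0}

print(flag_replicate_differences(sample, replicate))
```

As with the Excel rule, the cutoff of 10 is arbitrary and is passed in as a parameter so it can be changed.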

Rules in conditional formatting

It is still unclear why some replicates had significantly different values, but I will work on using BLAST to identify these proteins next.

I have also been trying to familiarize myself with Bash. Fortunately I am running a 64-bit version of Windows, which enables me to use Bash rather than Git. I enabled developer mode, which allowed me to run Linux-based programs including Bash. I am going to start working through the Bash tutorial for FISH546. I will also start looking into running BLAST with large files (to identify the Pacific oyster proteins) in addition to familiarizing myself with Jupyter.

This is all pretty new to me (GitHub and WordPress included), but I’m really enjoying learning more about bioinformatics and working with something new!