Kaitlyn’s Notebook: Oysters and Excel/Continuing Work

Using Excel, I was asked to identify proteins that were consistently high or that varied across samples in the Pacific oyster proteomic data.

First I had to figure out how to open a .tsv file from GitHub, which I had never done before. I saved the .tsv file by right-clicking the Raw link, then followed these instructions, which were very straightforward.

Once I had the file open in Excel, I used the average(n…) and median(n…) functions on all rows. I then selected conditional formatting and chose a color gradient to better visualize protein values for each row. The average would show which proteins had higher counts, while the median could provide insight into potential outliers.
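For anyone who would rather script this step than do it in a spreadsheet, here is a minimal sketch of the same per-row average and median in Python. The tiny table, the protein names, and the sample columns are all made up for illustration; they are not the real oyster data, and the function name `row_summaries` is just my own label.

```python
import csv
import io
from statistics import mean, median

# Hypothetical stand-in for the proteomics .tsv: one protein per row,
# one column per sample. Names and numbers are invented for illustration.
TSV_TEXT = (
    "protein\tS1\tS2\tS3\tS4\n"
    "protA\t10\t12\t11\t95\n"
    "protB\t3\t4\t5\t4\n"
)

def row_summaries(tsv_text):
    """Return {protein: {"average": ..., "median": ...}} for each data row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    summaries = {}
    for row in reader:
        values = [float(v) for v in row[1:]]  # everything after the protein name
        summaries[row[0]] = {"average": mean(values), "median": median(values)}
    return summaries

summaries = row_summaries(TSV_TEXT)
for protein, stats in summaries.items():
    print(protein, stats["average"], stats["median"])
```

In this toy table, protA has an average far above its median because a single sample (95) pulls the average up, which is the kind of row where comparing the two columns hints at an outlier.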

I also wanted to provide a range for each row; however, I could not find a command for this action. Instead, I used the min(n…) and max(n…) functions in separate columns, then created another column subtracting the minimum from the maximum in each row to give a single value representing the range. This time I chose data bars for conditional formatting, mostly to mix it up from the previous selection.
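The range calculation is simple enough to sketch in a few lines of Python. The values below are hypothetical, and `row_range` is a name I picked for the example, not a built-in.

```python
def row_range(values):
    """Range of one row: the maximum value minus the minimum value."""
    return max(values) - min(values)

# Hypothetical protein values across four samples (not real data).
example_rows = {
    "protA": [10.0, 12.0, 11.0, 95.0],
    "protB": [3.0, 4.0, 5.0, 4.0],
}

ranges = {protein: row_range(values) for protein, values in example_rows.items()}
for protein, value in ranges.items():
    print(protein, value)
```

A large range relative to the other rows plays the same role as a long data bar in the spreadsheet: it points to a protein whose values vary a lot across samples.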

After I posted my progress on the file, there was discussion of possible error in the technical replicates. In an attempt to show where differences between the samples and their technical replicates (denoted by …#A) may be substantial, I calculated averages and medians for the samples and technical replicates. Next, I subtracted the replicate protein values from the original sample values. Then I added a new rule under conditional formatting to mark values with a difference greater than 10, a cutoff I chose arbitrarily and which can easily be changed.
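The replicate comparison could be sketched like this in Python. The protein names and values are invented, and `flag_replicate_differences` is a hypothetical helper. One assumption to flag: the sketch compares the absolute difference to the cutoff, so a replicate that is much higher than its sample is caught as well as one that is much lower, which may not match exactly how the spreadsheet rule behaves.

```python
THRESHOLD = 10  # arbitrary cutoff, same value as the spreadsheet rule

def flag_replicate_differences(sample, replicate, threshold=THRESHOLD):
    """Return {protein: sample - replicate} where |difference| > threshold."""
    flagged = {}
    for protein, value in sample.items():
        diff = value - replicate[protein]
        # Assumption: use the absolute difference so both directions are flagged.
        if abs(diff) > threshold:
            flagged[protein] = diff
    return flagged

# Hypothetical values for one sample and its technical replicate (not real data).
sample = {"protA": 40.0, "protB": 5.0, "protC": 12.0}
replicate = {"protA": 25.0, "protB": 4.0, "protC": 30.0}

flagged = flag_replicate_differences(sample, replicate)
for protein, diff in flagged.items():
    print(protein, diff)
```

Because the cutoff is a parameter, trying a different value is just a matter of passing `threshold=` with another number.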

Rules in conditional formatting

It seems there are continued problems identifying why the replicates had such different values, but I will work on using BLAST to identify these proteins next.

I have also been trying to familiarize myself with bash. Fortunately I am running a 64-bit version of Windows, which enables me to use bash rather than Git. I enabled developer mode, which allowed me to run Linux-based programs including bash. I am going to start working through the bash tutorial for FISH546. I will also start looking into running BLAST with large files (to identify the Pacific oyster proteins) in addition to familiarizing myself with Jupyter.

This is all pretty new to me (GitHub and WordPress included), but I’m really enjoying learning more about bioinformatics and working with something new!