Data Wrangling – C. virginica Gonad RNAseq Transcript Counts Per Gene Per Sample Using Ballgown

As we continue to work on the analysis of the impacts of ocean acidification (OA) on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). This turned out to be quite a challenge. Ultimately, I wasn't able to solve it myself and turned to StackOverflow. I should've done that at the beginning, since I got a response (and a solution) less than five minutes after posting! Regardless, the data wrangling process (struggle?) was documented in the following GitHub Discussion:

  • Help with unwieldy table

The final data wrangling was performed using R and documented in this R Markdown file:


Output file (CSV):

Ultimately, the solution came down to this tiny bit of code (see the R Markdown file linked above for the full context):

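library(dplyr)

whole_tx_table %>%
  # Keep the gene name and the per-sample FPKM columns
  select(starts_with(c("gene_name", "FPKM"))) %>%
  group_by(gene_name) %>%
  # For each sample, count the transcripts per gene with FPKM > 0
  summarise(across(everything(), ~ sum(.x > 0)))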

That’s it!
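To make it clearer what the pipeline does, here is a minimal sketch that runs the same dplyr steps on a tiny made-up table. The table `toy_tx_table`, its sample names (`S1`, `S2`), and its values are hypothetical. The column layout (one row per transcript, a `gene_name` column, and one `FPKM.<sample>` column per sample alongside other per-sample columns such as coverage) is an assumption meant to mimic a Ballgown transcript table, not the actual `whole_tx_table`.

```r
library(dplyr)

# Hypothetical stand-in for whole_tx_table: one row per transcript.
# Column and sample names are assumptions for illustration only.
toy_tx_table <- tibble(
  t_name    = c("tx1", "tx2", "tx3", "tx4", "tx5"),
  gene_name = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  cov.S1    = c(10, 0, 3, 5, 0),
  FPKM.S1   = c(2.5, 0, 1.1, 4.0, 0),
  FPKM.S2   = c(0, 0, 0.7, 3.2, 1.9)
)

tx_counts <- toy_tx_table %>%
  # Drop t_name and cov.* so only gene_name and FPKM.* remain
  select(starts_with(c("gene_name", "FPKM"))) %>%
  group_by(gene_name) %>%
  # .x > 0 flags each transcript as expressed (TRUE) or not (FALSE);
  # summing the logicals counts the expressed transcripts per gene
  summarise(across(everything(), ~ sum(.x > 0)))

print(tx_counts)
```

The result has one row per gene and one count column per sample: geneA has 2 transcripts with FPKM > 0 in S1 and 1 in S2, while geneB has 1 in S1 and 2 in S2. Any FPKM above zero counts as "expressed" here. A stricter cutoff would only require changing the comparison inside `sum()`. If the FPKM columns could contain `NA`, adding `na.rm = TRUE` to `sum()` would keep those genes from returning `NA`.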

from Sam’s Notebook
