Sam’s Notebook: Data Analysis – C.virginica MBD Sequencing Coverage

A while ago, in this GitHub issue, Steven tasked me with answering some questions about the sequencing coverage we get from MBD-BSseq. The real goal was to get an idea of how much usable data we actually get when we do an MBD-BSseq project. Yaamini happened to have done an MBD-BSseq project relatively recently, and it’s one she’s actively analyzing, so we went with that data set.

Data was initially trimmed:

Subsequently, the data was concatenated, subset, and aligned using Bismark:

Today, I finally found the time to evaluate the alignment coverage of each of the Bismark sequence subsets. It was done in a Jupyter Notebook, solely with Bash/Python! I used this project as an excuse to dive into Python a bit more, instead of using R. For what I needed to accomplish, this approach just felt simpler than creating an R project and all of that.
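For a feel of what this kind of coverage evaluation can look like, here is a minimal Python sketch. It is not the code from the notebook. It assumes the input is in Bismark's tab-delimited coverage format (chromosome, start, end, percent methylation, methylated count, unmethylated count). The function name `coverage_summary`, the depth thresholds, and the example rows are all made up for illustration.

```python
import csv
import io


def coverage_summary(cov_lines, thresholds=(1, 5, 10)):
    """Count loci whose read depth meets each threshold.

    cov_lines: any iterable of tab-delimited lines in Bismark coverage
    format (an open file handle works). Depth at a locus is taken as
    methylated count + unmethylated count.
    """
    counts = {t: 0 for t in thresholds}
    total = 0
    for row in csv.reader(cov_lines, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        depth = int(row[4]) + int(row[5])
        total += 1
        for t in thresholds:
            if depth >= t:
                counts[t] += 1
    return total, counts


# Hypothetical example rows, not real data from this project.
example = io.StringIO(
    "scaffold_A\t100\t100\t50.0\t1\t1\n"
    "scaffold_A\t250\t250\t100.0\t6\t0\n"
    "scaffold_A\t300\t300\t25.0\t3\t9\n"
    "scaffold_B\t40\t40\t0.0\t0\t1\n"
)

total, counts = coverage_summary(example)
for threshold, n in counts.items():
    print(f">= {threshold}x: {n} of {total} loci")
```

Running the same function on the coverage file from each subset would give a quick way to compare how many loci reach a given depth as the number of reads increases.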