So it looks like we got the pecan workflow to work on a subset of isolation windows and samples. I also learned a few more things about `pecanpie` during the process.
1. `pecanpie` adds a `-Q` argument to the `percolator` command run on each sample. According to the `percolator` documentation, this argument does not exist and results in an error and a failed run. After removing it from the `percolator` job file, it works fine.
2. This is likely only an issue with toy/shortened data sets, but `percolator` is sensitive to the number of true positives available to populate its training data set. If it doesn't have enough to detect directionality in the trend, it throws an error asking you to raise the False Discovery Rate (FDR) above the stock 0.01 via a `-F` argument. This doesn't quite work on its own, because there are two flags that change the FDR: the `-F` flag mentioned in the error message, and a `-t` flag that I found via a GitHub repo issue and that the error message does not mention.
3. This is a big one for quality of life. The `--pecanMemoryRequest` argument from `pecanpie` is apparently for a single instance of pecan, so if you allow 12 instances to run, the actual memory requested is `number of instances * --pecanMemoryRequest`. I found this out the hard way when I thought I had allowed RoadRunner to use 30 GB of the 48 GB of memory available for the pecan runs. I instead let it use 30 GB * 12 instances = 360 GB of memory. Oops.
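
The memory arithmetic from the last point is easy to sanity-check before submitting jobs. Here is a minimal sketch in Python, assuming the request is expressed in gigabytes and applies to each pecan instance as described above. The function names are hypothetical helpers for illustration and are not part of `pecanpie`.

```python
def total_memory_request_gb(per_instance_gb: float, instances: int) -> float:
    """Total memory requested when every instance asks for per_instance_gb."""
    return per_instance_gb * instances


def per_instance_budget_gb(total_budget_gb: float, instances: int) -> float:
    """Largest per-instance request that keeps the total within total_budget_gb."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return total_budget_gb / instances


if __name__ == "__main__":
    # What was actually requested: 30 GB per instance across 12 instances.
    print(total_memory_request_gb(30, 12))  # 360
    # Per-instance value that would have kept the whole run inside 30 GB.
    print(per_instance_budget_gb(30, 12))   # 2.5
```

So to stay within a 30 GB budget across 12 instances, the per-instance request would need to be about 2.5 GB, assuming the argument accepts a value that small.
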
I also met with Hollie today and got some plans worked out for near-term geoduck methylation analysis. I made a 10x-coverage whole-genome methylation file for all samples, and started working on a Circos plot for the genome. The Circos plot is going to take some tinkering, as these plots are generally built around loci found in a small number of chromosomes as opposed to loci found in 40,000 scaffolds. We also decided to look at the beta-binomial model implemented in RadMeth to compare to the results from MACAU.
All in all, plenty of things to work on over the next week or so!