Steven prepared a smaller geoduck proteome for me, edited to only include proteins related to stress response. This decreased the # of proteins from ~30,000 to 600, the idea being that Pecan will run faster and won’t max out its “temporary memory.” Here’s the synopsis of me preparing files and running Pecan:
Add PRTC sequence to the new database
Use Protein Digestion Simulator to execute in silico digestion
Use Galaxy online to remove excess columns; note that the output from PDS is a .txt file
Resulting edited file (note this is a .tabular format, which is fine)
Killing existing Pecan run
Copied input files into new folder entitled “Pecan-inputs6” – This is the 6th Pecan run attempted on this data.
Executing Pecan; note that I needed to edit the file paths (output and input)
Checking to see that there is, in fact, a Pecan job queue
Making the .job files in /percolator/ and /pecan2blib/ directories executable
This “Pecan6” run was started at ~3:30pm – I’m curious to see how quickly things go with this reduced background proteome!
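The column-trimming step could also be done locally instead of through Galaxy. Below is a minimal sketch using `cut`, assuming the Protein Digestion Simulator output is tab-delimited text. The function name `keep_columns`, the file names, and the column numbers in the usage comment are all hypothetical, since which columns were kept isn't stated here.

```shell
# Keep only selected tab-separated columns from a digest table.
# Args: <input file> <column list in cut syntax, e.g. "1,2" or "1-3">
keep_columns() {
    # cut splits on tabs by default, so no delimiter flag is needed
    cut -f "$2" "$1"
}

# Hypothetical usage (file names and columns are placeholders):
# keep_columns digested_proteome.txt 1,2 > digested_proteome.tabular
```

Redirecting to a `.tabular` file mirrors what Galaxy produces, but the extension itself has no effect on the content.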
from The Shell Game http://ift.tt/2ny4Y7j
Pecan has been a thorn in our side for the past month or so. Between the huge memory requirements and the lack of meaningful error messages, it’s been slow going for results. At lab meeting last week we kicked around the idea of running Pecan on an AWS instance because of the extreme memory requests of the program (Laura’s data wants ~92 GB, Yaamini’s ~140 GB) and the impending deadlines. So I spent a bit of time today figuring that out.
As it stands, I have it sort of working on a tiny (free) instance. Pecan, Percolator, pyMZML, and numpy are installed and working. The final step will be either installing and using Sun Grid Engine (what we use on Emu and Roadrunner), or ensuring that Open Grid Engine (what comes installed on AWS instances) is compatible with SGE and then setting that up.
SGE would be the obvious answer, but in someone’s infinite wisdom, they decided the configuration program `qmon` should be a nice, pretty GUI. All well and good until you have to set it up over SSH.
I’ll try and download a couple of files and someone’s reference proteome tonight, test both solutions tomorrow, and hopefully have something that we can transfer over to a large server pretty easily.
2/22-3/10: Pecan ran for nearly 3 weeks, and although it appeared to have been functioning correctly, Sean discovered that there was a problem: not enough memory to save all the feature files (there should be 80 per sample; 1 per isolation window). It would simply move on to the next sample, and thus I wasn’t getting all the peptides analyzed. Check out Sean’s notebook entry for more details.
3/9: Sean did a test run on Roadrunner with one data file and just 3 isolation windows using 3 of the 16 logical cores (we used 14 logical cores during the long run), and it worked fine, completing with a .blib file and everything in 10 hours!
Friday 3/10: We killed the long Pecan run (“Pecan3”) – good news is that all the blanks worked so I won’t need to re-run them – and restarted Pecan using 4 logical cores on Emu, using the 80 isolation windows, with only 4 data files queued. I selected the 4 data files so I would have results from two sites, one each with both eelgrass and no-eelgrass geoduck represented. The estimated time was ~13.5 days to complete 2 data files (aka 2 samples). This is assuming that only using 3 isolation windows and 3 logical cores = 10 hours per sample file, and that increasing the logical cores from 3 to 4 results in a linear decrease in time/isolation window. With these settings I should have had data from 1 sample file done in a week, which I’ll plan on using for the poster. This Pecan run was called “Pecan4.” Check out GitHub Issue 526.
Monday 3/13: Unfortunately we ran into the same memory issue as before, so we killed the Pecan4 run on the morning of Monday 3/13. I then changed the settings to request 10 GB of memory and to use only 3 of Emu’s logical cores, and restarted Pecan with the same inputs.
Here’s a breakdown of how I made the adjustments (with help from Sean):
Logged in as Sean to decrease the logical cores used:
- in Terminal typed `sudo qmon` to open the QMON Main Control GUI
- Selected the “Queue Controls” button, and then highlighted the main queue line and selected the “modify” button:
- In the “Slots” cell clicked the down arrow to decrease slots from 4 to 3, clicked “Ok” (note: you must use the arrows to change slot numbers; it didn’t work if you highlighted and changed the number manually).
Logged back in as srlab to kill Pecan:
- `qstat -f` to view in-progress and queued job numbers
- `qdel [job#]`, e.g. `qdel 601`, for each job. Now Pecan has stopped running.
- Navigated to the directory with my Pecan input files, ~/Documents/Laura/DNR_geoduck/Pecan-inputs2/
- Re-ran Pecan (bolded inputs modified):
pecanpie **-o ~/Documents/Laura/DNR_geoduck/Pecan5_output/** -s laurageo -n DNR_geoduck_SpLibrary **--pecanMemRequest 10** /home/srlab/Documents/Laura/DNR_geoduck/Pecan-inputs2/DNR_Geoduck_mzMLpath.txt /home/srlab/Documents/Laura/DNR_geoduck/Pecan-inputs2/DNR_Geoduck_DatabasePath.txt /home/srlab/Documents/Laura/DNR_geoduck/Pecan-inputs2/DNR_Geoduck_IsolationScheme.csv --fido --jointPercolator --isolationSchemeType BOARDER --overwrite
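Since only the output directory and the memory request changed between runs, the command can be assembled from variables so those edits happen in one place. This is only a sketch: the function name `build_pecan_cmd` and its argument order are hypothetical, the flags are exactly the ones in the command above, and it just prints the command (a dry run) rather than launching Pecan.

```shell
# Print the pecanpie command for a given run (dry run; nothing is launched).
# Args: <input directory> <output directory> <memory request in GB>
build_pecan_cmd() {
    in_dir="$1"
    out_dir="$2"
    mem_gb="$3"
    # echo joins its arguments with single spaces into one command line
    echo pecanpie \
        -o "$out_dir" \
        -s laurageo \
        -n DNR_geoduck_SpLibrary \
        --pecanMemRequest "$mem_gb" \
        "$in_dir/DNR_Geoduck_mzMLpath.txt" \
        "$in_dir/DNR_Geoduck_DatabasePath.txt" \
        "$in_dir/DNR_Geoduck_IsolationScheme.csv" \
        --fido --jointPercolator \
        --isolationSchemeType BOARDER \
        --overwrite
}

# Reproduce the command used for this run:
build_pecan_cmd \
    /home/srlab/Documents/Laura/DNR_geoduck/Pecan-inputs2 \
    /home/srlab/Documents/Laura/DNR_geoduck/Pecan5_output/ \
    10
```

Printing the command first makes it easy to eyeball the paths before pasting it into the terminal.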
Made the percolator .job and pecan2blib .job files executable (Sean found that Pecan isn’t doing this automatically, so need to do it manually):
- Navigate to the /percolator/ directory, and type `chmod +x [.job]` for all .job files.
- Navigate to the /pecan2blib/ directory, and type `chmod +x pecan2blib.job` (there’s only one .job file in this directory).
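Because this has to be repeated on every run, the two permission steps can be wrapped in one small helper. This is a sketch, assuming the `percolator/` and `pecan2blib/` directories sit directly inside the Pecan output directory; that layout and the function name `make_jobs_executable` are assumptions.

```shell
# Mark every .job file in percolator/ and pecan2blib/ as executable.
# Arg: <Pecan output directory>
make_jobs_executable() {
    for dir in "$1/percolator" "$1/pecan2blib"; do
        # find sidesteps errors from an unmatched *.job glob in an empty directory
        find "$dir" -maxdepth 1 -type f -name '*.job' -exec chmod +x {} +
    done
}

# Hypothetical usage:
# make_jobs_executable ~/Documents/Laura/DNR_geoduck/Pecan5_output
```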
Checked out the job queue by typing `qstat -f` in any terminal window:
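To pull just the job numbers out of that listing (for example, to hand them to `qdel` when killing a run), a small filter like the one below might help. It is a sketch that assumes each job line in the `qstat` output begins, after optional spaces, with the numeric job ID; the function name `qstat_job_ids` is hypothetical.

```shell
# Print the numeric job IDs found in qstat output read from stdin.
qstat_job_ids() {
    # Keep only lines whose first field is entirely digits (the job number)
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Hypothetical usage on a machine with Grid Engine installed:
# qstat -f | qstat_job_ids                # list job numbers
# qstat -f | qstat_job_ids | xargs qdel   # delete every listed job
```

Running the listing form first is a sensible check before piping anything into `qdel`.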
How, you might ask, did Sean know that Pecan wasn’t correctly running all isolation windows? From Sean: “I looked at the number of .feature files (in the /Pecan4_output/pecan/ directory) compared to the run number it was on. When I looked yesterday it was processing run number 60, but there was only like 10 feature files.” Meaning, there should be the same # of feature files as the number of runs (aka isolation windows).
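Sean's check can be sketched as a quick script so it is easy to repeat during a long run. This assumes the feature files end in `.feature` and sit directly in the `pecan/` output directory, as described above; the function names and the relative path in the usage comment are hypothetical.

```shell
# Count the .feature files Pecan has written so far.
# Arg: <directory holding the .feature files>
count_feature_files() {
    find "$1" -maxdepth 1 -type f -name '*.feature' | wc -l | tr -d ' '
}

# Compare that count against the number of runs (isolation windows) completed.
# Args: <feature directory> <number of runs completed>
check_feature_files() {
    found=$(count_feature_files "$1")
    if [ "$found" -lt "$2" ]; then
        echo "WARNING: $found feature files for $2 completed runs"
        return 1
    fi
    echo "OK: $found feature files for $2 completed runs"
}

# Hypothetical usage:
# check_feature_files Pecan4_output/pecan 59
```

A warning here would flag the same symptom Sean spotted: Pecan moving on to later runs without saving feature files for earlier ones.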
from The Shell Game http://ift.tt/2mDftqw