BWA Aligner on Hyak w/ Cod data.

I have a new test for Hyak: running BWA on some cod data. I’m doing all of this in a terminal window, so I’ll copy the output here for posterity and also save it to a text file.

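First, I copied the input files over with wget: four data files and the reference genome, listed below, followed by an example download. These are fairly large files (the one in the example is about 14.8 GB).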

file 1

file 2

file 3

file 4

reference genome

wget http://de.cyverse.org/dl/d/EC35A828-1A13-4B61-9CE7-67939C4E648B/GGTCAGTT_6.1_trimmed.fastq
--2017-05-05 08:27:05--  http://de.cyverse.org/dl/d/EC35A828-1A13-4B61-9CE7-67939C4E648B/GGTCAGTT_6.1_trimmed.fastq
Resolving de.cyverse.org (de.cyverse.org)... 128.196.254.62
Connecting to de.cyverse.org (de.cyverse.org)|128.196.254.62|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://de.cyverse.org/dl/d/EC35A828-1A13-4B61-9CE7-67939C4E648B/GGTCAGTT_6.1_trimmed.fastq [following]
--2017-05-05 08:27:05--  https://de.cyverse.org/dl/d/EC35A828-1A13-4B61-9CE7-67939C4E648B/GGTCAGTT_6.1_trimmed.fastq
Connecting to de.cyverse.org (de.cyverse.org)|128.196.254.62|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘GGTCAGTT_6.1_trimmed.fastq’

    [            <=>                     ] 14,786,883,414 10.0MB/s 
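Since the same wget call is repeated for each file, the downloads can be scripted. The following is only a minimal sketch, not what was actually run: the function name download_all, the FETCH_CMD variable, and the urls.txt file (one URL per line) are all hypothetical.

```shell
# Hypothetical helper: fetch every URL listed (one per line) in a text file.
# FETCH_CMD is an assumed override hook; it defaults to plain wget.
download_all() {
    url_file="$1"
    fetch="${FETCH_CMD:-wget}"
    while IFS= read -r url; do
        # Skip blank lines and lines starting with '#'.
        [ -z "$url" ] && continue
        case "$url" in \#*) continue ;; esac
        # $fetch is left unquoted on purpose so extra options split into words.
        $fetch "$url" || return 1
    done < "$url_file"
}
```

Usage would be something like download_all urls.txt, with the five download links pasted into urls.txt beforehand.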

After everything was copied over, I unpacked the reference genome and built a BWA index from the supplied .fa file with bwa index. I did this on an interactive execute node because it wouldn’t take very long (about 12 minutes, per the log below).

[seanb80@n2049 cod]$ bwa index -p atl_cod_4_2017 -a bwtsw Gadus_morhua.gadMor1.dna.toplevel.fa > bwa_index.txt
[bwa_index] Pack FASTA... 7.67 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=1664229176, availableWord=129101124
[BWTIncConstructFromPacked] 10 iterations done. 99999992 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999992 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999992 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999992 characters processed.
[BWTIncConstructFromPacked] 50 iterations done. 499999992 characters processed.
[BWTIncConstructFromPacked] 60 iterations done. 599999992 characters processed.
[BWTIncConstructFromPacked] 70 iterations done. 699999992 characters processed.
[BWTIncConstructFromPacked] 80 iterations done. 799999992 characters processed.
[BWTIncConstructFromPacked] 90 iterations done. 899999992 characters processed.
[BWTIncConstructFromPacked] 100 iterations done. 999911096 characters processed.
[BWTIncConstructFromPacked] 110 iterations done. 1092851208 characters processed.
[BWTIncConstructFromPacked] 120 iterations done. 1175452616 characters processed.
[BWTIncConstructFromPacked] 130 iterations done. 1248864968 characters processed.
[BWTIncConstructFromPacked] 140 iterations done. 1314110040 characters processed.
[BWTIncConstructFromPacked] 150 iterations done. 1372096008 characters processed.
[BWTIncConstructFromPacked] 160 iterations done. 1423630072 characters processed.
[BWTIncConstructFromPacked] 170 iterations done. 1469429672 characters processed.
[BWTIncConstructFromPacked] 180 iterations done. 1510132424 characters processed.
[BWTIncConstructFromPacked] 190 iterations done. 1546305128 characters processed.
[BWTIncConstructFromPacked] 200 iterations done. 1578451496 characters processed.
[BWTIncConstructFromPacked] 210 iterations done. 1607019224 characters processed.
[BWTIncConstructFromPacked] 220 iterations done. 1632406280 characters processed.
[BWTIncConstructFromPacked] 230 iterations done. 1654966296 characters processed.
[bwa_index] 531.38 seconds elapse.
[bwa_index] Update BWT... 3.67 sec
[bwa_index] Pack forward-only FASTA... 4.85 sec
[bwa_index] Construct SA from BWT and Occ... 188.46 sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa index -p atl_cod_4_2017 -a bwtsw Gadus_morhua.gadMor1.dna.toplevel.fa
[main] Real time: 736.824 sec; CPU: 736.036 sec
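One thing worth noting about the command above: bwa writes its progress messages to stderr, so a plain > redirect does not capture them, which is why they still appeared in the terminal. A minimal sketch of one way to both watch the log and save it is below; the run_logged name is hypothetical and this is not what was run here.

```shell
# Hypothetical helper: run a command, show its output live, and save
# stdout and stderr together in a log file.
# Note: the return status is tee's, not the wrapped command's.
run_logged() {
    log_file="$1"
    shift
    "$@" 2>&1 | tee "$log_file"
}
```

For example: run_logged bwa_index.txt bwa index -p atl_cod_4_2017 -a bwtsw Gadus_morhua.gadMor1.dna.toplevel.fa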

Then I used Picard to create a sequence dictionary for the reference:

[seanb80@n2049 cod]$ java -jar picard.jar CreateSequenceDictionary REFERENCE=Gadus_morhua.gadMor1.dna.toplevel.fa OUTPUT=Gadus_morhua.gadMor1.dna.toplevel.dict
[Fri May 05 16:17:45 UTC 2017] picard.sam.CreateSequenceDictionary REFERENCE=Gadus_morhua.gadMor1.dna.toplevel.fa OUTPUT=Gadus_morhua.gadMor1.dna.toplevel.dict    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri May 05 16:17:45 UTC 2017] Executing as seanb80@n2049.hyak.local on Linux 3.10.0-327.36.3.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15; Picard version: 2.9.1-SNAPSHOT
[Fri May 05 16:17:55 UTC 2017] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.16 minutes.

The next step is the alignment itself, which should take considerably longer, so it will have to be submitted as a batch job via sbatch.
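As a rough idea of what that batch job might look like, here is a sketch that prints a Slurm script for a single-file alignment. It assumes bwa mem is the aligner mode used and that the reads are single-end. The make_align_job name, the job name, the thread count, and the time limit are all placeholders rather than tested Hyak settings, and any partition or account options the cluster requires are left out.

```shell
# Hypothetical generator: print a Slurm batch script that aligns one FASTQ
# file against an existing BWA index with bwa mem.
make_align_job() {
    index_prefix="$1"    # the -p value given to bwa index
    reads="$2"           # a single FASTQ file (single-end reads assumed)
    threads="${3:-8}"    # assumed thread count, not a measured optimum
    out_sam="${reads%.fastq}.sam"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=bwa_cod
#SBATCH --nodes=1
#SBATCH --cpus-per-task=${threads}
#SBATCH --time=24:00:00
#SBATCH --output=bwa_cod_%j.log

bwa mem -t ${threads} ${index_prefix} ${reads} > ${out_sam}
EOF
}
```

Usage would be along the lines of make_align_job atl_cod_4_2017 GGTCAGTT_6.1_trimmed.fastq 8 > align.sh, followed by sbatch align.sh.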
