Sam’s Notebook: Data Wrangling – FastA Splitting With faSplit

Steven posted an issue on GitHub regarding splitting a FastA file into multiple sequences. Specifically, he wanted a single, large FastA sequence (~89Mbp) split into smaller FastAs for BLASTing.

I downloaded the FastA he provided (https://d.pr/f/UlzHLR) and split the sequence into 2000bp chunks using the faSplit program (https://ift.tt/2Znl0nP):

    faSplit \
    size \
    20190731_faSplit_PGA-scaffold1_splits_2000bp/ \
    2000
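To make the chunking idea concrete, here is a minimal sketch of fixed-size splitting done with awk. It is not how faSplit is implemented, and faSplit's output naming and handling of the final short chunk may differ. The function name `split_fasta_by_size` and the `chunk_NNNNNN.fa` naming scheme are hypothetical, and the sketch assumes the input holds a single sequence small enough for awk to keep in memory.

```shell
# Sketch only: split one FastA record into fixed-size pieces with awk.
# Hypothetical helper; NOT the faSplit implementation.
split_fasta_by_size() {
    in_fa="$1"      # input FastA containing a single sequence
    chunk="$2"      # chunk length in bp (e.g. 2000)
    out_dir="$3"    # existing directory that receives the chunk files

    awk -v size="$chunk" -v dir="$out_dir" '
        /^>/ { next }                  # skip the header line
        { seq = seq $0 }               # join wrapped sequence lines
        END {
            total = length(seq)
            n = 0
            for (start = 1; start <= total; start += size) {
                n++
                file = sprintf("%s/chunk_%06d.fa", dir, n)
                # The last chunk may be shorter than the requested size
                printf ">chunk_%06d\n%s\n", n, substr(seq, start, size) > file
                close(file)
            }
            print n                    # report how many chunks were written
        }
    ' "$in_fa"
}
```

Calling `split_fasta_by_size input.fa 2000 out_dir` prints the number of chunk files it wrote, which is a quick way to sanity-check a split against the sequence length.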

Sam’s Notebook: Data Summary – P.generosa Transcriptome Assemblies Stats

In our continuing quest to wrangle the geoduck transcriptome assemblies we have, I was tasked with compiling assembly stats for our various assemblies. The table below provides an overview of some stats for each of our assemblies. Links within the table go to the notebook entries for the various methods from which the data was gathered. In general:

  • Genes/Isoforms stats come directly from the Trinity assembly stats output file.
  • transdecoder_pep is a count of headers in the Transdecoder FastA output file, transdecoder_pep.
  • CD-Hit is a count of headers in the CD-Hit-est FastA output file.

| Assembly | Genes | Isoforms | transdecoder_pep | CD-Hit |
|---|---|---|---|---|
| ctenidia | 216248 (https://ift.tt/2ZlNCh7) | 349773 | 72274 | 325783 |
| gonad | 151263 | 198748 | 31706 | 189378 |
| Juvenile (EPI 115) | 199765 | 320691 | 78149 | 297848 |
| Juvenile (EPI 116) | 268476 | 434877 | 99089 | 408498 |
| Juvenile (EPI 123) | 196131 | 303568 | 67398 | 284852 |
| Juvenile (EPI 124) | 255277 | 421670 | 93285 | 395527 |
| Larvae (EPI 99) | 249799 | 425165 | 77694 | 379210 |
| MEANS | 219566 | 350642 | 74228 | 325871 |
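
The header counts and the MEANS row can be reproduced with a couple of small helpers. This is a sketch, not the commands actually used for the table. The function names `count_fasta_headers` and `mean_of_counts` are made up for illustration, and the mean is rounded to the nearest integer.

```shell
# Count sequences in a FastA file by counting header lines (lines starting with ">").
count_fasta_headers() {
    grep -c '^>' "$1"
}

# Mean of whitespace-separated numbers read from stdin, rounded to the nearest integer.
mean_of_counts() {
    awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
         END { if (n > 0) printf "%.0f\n", sum / n }'
}

# Gene counts for the seven assemblies listed in the table above.
printf '%s\n' 216248 151263 199765 268476 196131 255277 249799 | mean_of_counts
```

The final command prints 219566, which matches the Genes value in the MEANS row.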

Sam’s Notebook: Transcriptome Compression – P.generosa Transcriptome Assemblies Using CD-Hit-est on Mox

In continued attempts to get a grasp on the geoduck transcriptome size, I decided to “compress” our various assemblies by clustering similar transcripts in each assembly into a single “representative” transcript, using CD-Hit-est. Settings used to run it were taken from the Trinity FAQ regarding “too many transcripts”.

A bash script was used to rsync files to Mox and then execute the SBATCH script.

Bash script (GitHub):

    #!/usr/bin/bash

    # Script to retrieve geoduck Trinity assemblies
    # Assemblies will be used in SBATCH script called at end of this script.
    # Script needs to be run within same directory as SBATCH script.

    # Exit if any command fails
    set -e

    # Set rsync remote paths
    gannet="gannet:/volume2/web/Atumefaciens"
    owl="owl:/volume1/web/Athaliana"

    # Create array of directories for storing Trinity assemblies
    assembly_dirs_array=(
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/20180827_assembly
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/ctenidia
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/gonad
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/heart
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI115
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI116
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI123
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI124
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/larvae/EPI99
    )

    # Array of Trinity assemblies remote paths for rsync-ing
    assemblies_array=(
    20180827_trinity_geoduck_RNAseq/Trinity.fasta
    20190409_trinity_pgen_ctenidia_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_gonad_RNAseq/trinity_out_dir/Trinity.fasta
    20190215_trinity_geoduck_heart_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_EPI115_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_EPI123_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_EPI124_RNAseq/trinity_out_dir/Trinity.fasta
    20190409_trinity_pgen_EPI99_RNAseq/trinity_out_dir/Trinity.fasta
    )

    # Retrieve FastA files via rsync
    for index in "${!assemblies_array[@]}"
    do
      # Remove everything from the first slash onward
      assembly="${assemblies_array[index]%%/*}"
      echo "Preparing to download ${assembly}..."

      # The 20180827 assembly lives on owl; all others live on gannet
      if [ "${assembly}" = "20180827_trinity_geoduck_RNAseq" ]; then
        echo "Now syncing ${assembly} to ${assembly_dirs_array[index]}"
        rsync \
        --archive \
        --progress \
        "${owl}/${assemblies_array[index]}" \
        "${assembly_dirs_array[index]}"
      else
        echo "Now syncing ${assembly} to ${assembly_dirs_array[index]}"
        rsync \
        --archive \
        --progress \
        "${gannet}/${assemblies_array[index]}" \
        "${assembly_dirs_array[index]}"
      fi
    done

    # Start SBATCH script to run CD-Hit on all transcriptome assemblies
    sbatch 20190729_cdhit-est_pgen_transcriptomes.sh

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=cdhit_pgen
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190729_cdhit-est_pgen_transcriptomes

    # This script is called by 20190729_cdhit_pgen_trinity_assemblies.sh.
    # That script uses rsync to transfer files to Mox via the login node.
    # This is required because Mox execute nodes don't have internet access.

    # Exit script if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    # Set CPU threads
    threads=27

    # Program paths
    cd_hit_est="/gscratch/srlab/programs/cd-hit-v4.8.1-2019-0228/cd-hit-est"

    # Create assembly paths array
    assembly_dirs_array=(
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/20180827_assembly
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/ctenidia
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/gonad
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/heart
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI115
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI116
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI123
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI124
    /gscratch/srlab/sam/data/P_generosa/transcriptomes/larvae/EPI99
    )

    # Run cd-hit-est on each assembly
    for index in "${!assembly_dirs_array[@]}"
    do
      # Store individual sample name by removing
      # everything up to and including the last slash in path
      sample_name=$(echo "${assembly_dirs_array[index]##*/}")

      # Run cd-hit-est
      "${cd_hit_est}" \
      -o "${sample_name}".cdhit \
      -c 0.98 \
      -i "${assembly_dirs_array[index]}"/Trinity.fasta \
      -p 1 \
      -d 0 \
      -b 3 \
      -T "${threads}" \
      -M 0
    done
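
Both scripts above lean on shell parameter expansion to pull a name out of a path: `%%/*` in the rsync script and `##*/` in the SBATCH script. The short sketch below shows what each does, using one path from each array. The variable names `remote_path` and `assembly_dir` exist only in this example.

```shell
# Remote path as listed in assemblies_array: keep only the top-level directory.
remote_path="20190409_trinity_pgen_gonad_RNAseq/trinity_out_dir/Trinity.fasta"
assembly="${remote_path%%/*}"       # strip the longest suffix matching "/*"
echo "${assembly}"                  # 20190409_trinity_pgen_gonad_RNAseq

# Local directory as listed in assembly_dirs_array: keep only the last path component.
assembly_dir="/gscratch/srlab/sam/data/P_generosa/transcriptomes/juvenile/EPI115"
sample_name="${assembly_dir##*/}"   # strip the longest prefix matching "*/"
echo "${sample_name}"               # EPI115
```

Because the expansion already yields a string, wrapping it in `$(echo ...)` is not needed. A plain assignment gives the same result.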

Shelly’s Notebook: Wed. Jul. 24, 2019 Pt. Whitney Juv. Geoduck resistance to stress plans

experimental plans

biological response measurement ideas

options for low pH treatment:

  1. piggy-back on Sam’s treatment: constant low pH 7.0 total scale
  2. variable diel low pH to simulate lagoon conditions

options for high temperature treatment:

  1. constant high temperature of 20C
  2. constant high temperature of 29C (this will likely kill them before 1 month)
  3. variable diel high temperature of 29C to simulate lagoon conditions

Space:

  • There is space available across from our current heath stacks, where the totes are, to put in another heath stack.
  • Matt should have space for 4 trays in some other heath stacks to house any leftover animals if needed.

NEXT STEPS:

  • begin heath stack set up directly across from heath stacks
    • test out heat control, flow, etc.

Equipment update

  • Thermometer for discrete temp measurements errored out. See Slack post. It may just be the probe, but I need to test this.
  • Apex status:
    • Controller not lighting up at all when plugged into everything. Need to call support and get help troubleshooting this
    • Power strip lights up fine
    • Aqua buses:
      • how many can be linked to 1 Apex controller?
    • Probes:
      • ideally need 12 pH + temp sets
    • brought Controller back to UW to troubleshoot
  • Have 20 total heath trays plumbed for downwelling
  • Can use 1 micron screen mesh tray inserts with large grain sand (Matt has extra).
    • NEED to check:
      • # of good quality inserts we have (ideally 10)
      • # of inserts that need dividers added
        • can cut dividers from black plexiglass upstairs in the hatchery
  • Controlled heating source:
    • NEED 2 heat rods that can be plugged into the apex
      • have these at UW, need to test
  • Flow:
    • if doing constant pH, NEED 4 pumps with adjustable flow if using same header conicals as Sam
    • if doing diel variable pH, use different conical and don’t need adjustable flow
      • extra pumps at hatchery (imagitarium secondary pumps used for circulation in broodstock experiment)
      • would need additional probe set
  • More equipment will free up when Sam is done by the end of August

Checked on animals

Size difference between H2T3 animals and H2T1 animals from cascade feeding.

In-flow hose was not going directly into the heath tray insert in H2T1.

Tried to break up algae and clear mesh to improve flow of food and shuffled trays around to new order:

  1. H2T1
  2. H2T5
  3. H2T7
  4. H2T3
  5. H2T6 (extras)
  6. H2T2 (low density)

*only swapped H2T3 with H2T1 because these trays showed the biggest size differences.

*Left H2T6 and H2T2 in same positions because they had a lot of die off and aren’t going to be part of the experiment.

Size data (photos) for all animals are here. Size ranges for all are about 2-6mm. I will do individual measurements with ImageJ.

I think the survival and size data from larval rearing through now are too convoluted to draw conclusions about parental exposure.

Prior to pH x temp experiment, I can size select to start with animals of similar sizes across the board.

from shellytrigg https://ift.tt/2ynMaPM
via IFTTT

Laura’s Notebook: Oly RNA isolation – larvae

July 23, 2019

Homogenized frozen larvae to prep for RNA isolation, and aliquoted larvae in ethanol for imaging/measuring at a later date. These larvae were collected during spring 2017, captured on screens upon release from mother, concentrated into microcentrifuge tubes, and placed immediately into a -80 freezer to preserve. Larvae in the same tube are likely to be full siblings or half siblings; the number of larvae collected on that day / from that collection bucket is a good indication of whether more than one female released brood that day (we expect ~200k larvae/female).

First step is homogenization, which is particularly necessary due to larvae having shells that need to be broken up. As with the ctenidia samples, I used mortar+pestle and liquid nitrogen. I pre-portioned 1 mL of RNAzol into microcentrifuge tubes, and transferred up to 100 mg homogenized larvae into chilled RNAzol. NOTE: I tried to get as much tissue as possible up to 100 mg, since the larval shells likely comprise a significant percentage of the mass.

In total I have 83 larval samples. I isolated RNA from 14 of these samples in spring 2018 for the test QuantSeq round. A few yielded too little RNA, 2 of which had sufficient larvae remaining for another attempt to isolate RNA. So, I homogenized a total of 70 samples into RNAzol in 10 batches (~4 days of work), with an additional 2 controls (just liquid nitrogen ground in the mortar+pestle).

July 24, 2019

Did a first round of RNA isolation (n=12, batch #1) to test protocol. Followed RNAzol protocol like with the ctenidia samples, with two exceptions: 1) after addition of DEPC-treated water (0.2 mL), centrifuged at 16,000 rcf (not 12,000); 2) added 100 uL DEPC-treated water in last step to dissolve RNA. Samples in this 1st batch: ABCDEFG

Also of note: after precipitating the DNA/proteins, the supernatant retained a black/gray color. Then, after adding isopropanol, the RNA pellet was predominantly black in 10 of the 12 samples (the 2 samples with small, white pellets were X and Y). Then, upon dissolving RNA in the DEPC-treated water and vortexing for 5 minutes, there was a black substance that settled into the bottom of the tubes. A black substance occurred before, in 2018 when I did a preliminary round of RNA isolation from the same study’s Oly larvae (see notebook post); this didn’t seem to interfere with the successful QuantSeq run.

I quantified RNA concentration in this first batch, using 1 uL of the RNA solution, pulled from the surface of the solution to avoid the black substance. Measured RNA in all samples, at concentrations between 30 and 90 ng/uL.

Image of a few tubes showing that the RNA pellet included black substance: IMG_8758

Image of RNA dissolved in water, showing a black substance settled at the bottom of the tubes. IMG_8757

July 26, 2019

Big day of RNA isolation – did 4 batches of 12, but processed 2 batches simultaneously, offset by about 10 minutes. This was feasible because there is down time during reactions/centrifuging.

Batches #2 and #3 (run simultaneously, offset) followed the same protocol as batch #1 (above), with one exception: added 75 uL of DEPC-treated water in final step to dissolve RNA.

Batch #2 samples: 402, 412, 421, 431, 432, 442, 452, 461, 462, 482, 532, 542

Batch #3 samples: 403, 404, 472, 473, 483, 484, 491, 522, 533, 552, 562, 571

Batches #4 and #5 (run simultaneously, offset), followed same protocol as batches 2 & 3, with one exception: did not vortex homogenate prior to transferring 500 uL to new tubes with the 200 uL DEPC-treated water. The resulting RNA dissolved in water had significantly less black substance, likely due to this adjustment.

I quantified RNA isolated from all samples, shown in the table below, and the spreadsheet is saved in the repo. A handful of samples resulted in relatively poor yield (452, 461, 462, 472, 533, 552), and several samples had a substantial amount of black substance mixed into the RNA solution (404, 431, 432, 442, 443(ish), 461, 462, 472, 486(ish), 521(ish), 522(ish), 552). Some samples fell into both categories: 461, 462, 472, 552. I will process some of the remaining homogenate (250 uL) to try to get a cleaner batch of RNA.

| Date larvae collected | Cohort | Treatment | TISSUE SAMPLE # | HOMOG. TUBE # | VOL RNAzol (mL) | MASS TISSUE (mg) | DATE HOMOG. | HOMOG. BATCH | RNA ISOLATION DATE | RNA ISOLATION BATCH | Total RNA volume remaining, uL (RNA + H2O) | [RNA] ng/uL | Amount of RNA (ng) | Volume needed for 500 ng RNA (uL) | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5/24/17 | Dabob Bay | 10 Ambient | 14-A | 401 | 1 | 100 | 19-Jul | 1 | 24-Jul | 1 | 99 | 52.0 | 5,148 | 9.62 | |
| 5/31/17 | Dabob Bay | 10 Ambient | 31-A | 402 | 1 | 10 | 20-Jul | 3 | 26-Jul | 2 | 74 | 140.0 | 10,360 | 3.57 | |
| 6/19/17 | Dabob Bay | 10 Ambient | 75-A | 403 | 1 | 40 | 22-Jul | 5 | 26-Jul | 3 | 74 | 148.0 | 10,952 | 3.38 | |
| 6/29/17 | Dabob Bay | 10 Ambient | 80-A | 404 | 1 | 110 | 22-Jul | 6 | 26-Jul | 3 | 74 | 95.2 | 7,045 | 5.25 | |
| 5/26/17 | Dabob Bay | 10 Low | 23-A | 411 | 1 | 10 | 19-Jul | 1 | 24-Jul | 1 | 99 | 57.2 | 5,663 | 8.74 | |
| 5/27/17 | Dabob Bay | 10 Low | 27-A | 412 | 1 | 10 | 20-Jul | 3 | 26-Jul | 2 | 74 | 60.8 | 4,499 | 8.22 | |
| 6/10/17 | Dabob Bay | 10 Low | 58-A | 413 | 1 | 80 | 22-Jul | 7 | 26-Jul | 4 | 74 | 91.4 | 6,764 | 5.47 | |
| 6/12/17 | Dabob Bay | 10 Low | 60-A | 414 | 1 | 20 | 23-Jul | 8 | 26-Jul | 4 | 74 | 136.0 | 10,064 | 3.68 | |
| 6/12/17 | Dabob Bay | 6 Ambient | 59-A | 421 | 1 | 10 | 20-Jul | 2 | 26-Jul | 2 | 74 | 43.0 | 3,182 | 11.63 | |
| 6/7/17 | Dabob Bay | 6 Low | 51-A | 431 | 1 | 20 | 20-Jul | 2 | 26-Jul | 2 | 74 | 43.4 | 3,212 | 11.52 | |
| 6/17/17 | Dabob Bay | 6 Low | 72-A | 432 | 1 | 50 | 22-Jul | 4 | 26-Jul | 2 | 74 | 47.6 | 3,522 | 10.50 | |
| 6/17/17 | Dabob Bay | 6 Low | 73-A | 433 | na | na | na | na | na | na | na | na | na | na | |
| 6/19/17 | Dabob Bay | 6 Low | 74-A | 434 | 1 | 60 | 23-Jul | 9 | | 5 | 74 | 67.8 | 5,017 | 7.37 | |
| 5/25/17 | Fidalgo Bay | 10 Ambient | 20-A | 441 | 1 | 70 | 19-Jul | 1 | 24-Jul | 1 | 99 | 46.0 | 4,554 | 10.87 | |
| 6/3/17 | Fidalgo Bay | 10 Ambient | 38-A | 442 | 1 | 80 | 20-Jul | 3 | 26-Jul | 2 | 74 | LOW | | | |
| 6/7/17 | Fidalgo Bay | 10 Ambient | 53-A | 443 | 1 | 60 | 22-Jul | 6 | 26-Jul | 4 | 74 | 58.4 | 4,322 | 8.56 | |
| 6/14/17 | Fidalgo Bay | 10 Ambient | 63-A | 444 | 1 | 80 | 22-Jul | 7 | 26-Jul | 4 | 74 | 91.2 | 6,749 | 5.48 | |
| 6/15/17 | Fidalgo Bay | 10 Ambient | 65-A | 445 | 1 | 40 | 23-Jul | 8 | 26-Jul | 5 | 74 | 132.0 | 9,768 | 3.79 | |
| 5/24/17 | Fidalgo Bay | 10 Low | 16-A | 451 | 1 | 70 | 19-Jul | 1 | 24-Jul | 1 | 99 | 68.4 | 6,772 | 7.31 | |
| 5/24/17 | Fidalgo Bay | 10 Low | 18-A | 452 | 1 | 80 | 20-Jul | 3 | 26-Jul | 2 | 74 | 37.8 | 2,797 | 13.23 | poor yield b/c black substance w/ RNA? |
| 6/3/17 | Fidalgo Bay | 10 Low | 36-A | 453 | 1 | 80 | 23-Jul | 9 | 27-Jul | 6 | 74 | 85.2 | 6,305 | 5.87 | |
| 5/26/17 | Fidalgo Bay | 6 Ambient | 22-A | 461 | 1 | 100 | 20-Jul | 2 | 26-Jul | 2 | 74 | 31.0 | 2,294 | 16.13 | poor yield b/c black substance w/ RNA? |
| 5/29/17 | Fidalgo Bay | 6 Ambient | 29-A | 462 | 1 | 60 | 22-Jul | 4 | 26-Jul | 2 | 74 | 29.8 | 2,205 | 16.78 | poor yield b/c black substance w/ RNA? |
| 5/25/17 | Fidalgo Bay | 6 Low | 19-A | 471 | 1 | 100 | 20-Jul | 2 | 24-Jul | 1 | 99 | 33.8 | 3,346 | 14.79 | |
| 5/26/17 | Fidalgo Bay | 6 Low | 21-A | 472 | 1 | 70 | 22-Jul | 4 | 26-Jul | 3 | 74 | 8.3 | 613 | 60.39 | poor yield b/c black substance w/ RNA? |
| 6/5/17 | Fidalgo Bay | 6 Low | 46-A | 473 | 1 | 50 | 22-Jul | 5 | 26-Jul | 3 | 74 | 120.0 | 8,880 | 4.17 | |
| 6/5/17 | Fidalgo Bay | 6 Low | 47-A | 474 | 1 | 50 | 22-Jul | 6 | 26-Jul | 4 | 74 | 81.0 | 5,994 | 6.17 | |
| 6/6/17 | Fidalgo Bay | 6 Low | 50-A | 475 | 1 | 80 | 22-Jul | 7 | 26-Jul | 4 | 74 | 110.0 | 8,140 | 4.55 | |
| 6/10/17 | Fidalgo Bay | 6 Low | 54-A | 476 | 1 | 80 | 23-Jul | 8 | 26-Jul | 5 | 74 | 43.4 | 3,212 | 11.52 | |
| 6/19/17 | Fidalgo Bay | 6 Low | 76-A | 477 | 1 | 40 | 23-Jul | 9 | 27-Jul | 6 | 74 | HIGH | | | |
| 5/20/17 | Oyster Bay C1 | 10 Ambient | 02-A, 02-B | 481 | 1 | 40 | 19-Jul | 1 | 24-Jul | 1 | 99 | 64.4 | 6,376 | 7.76 | |
| 5/20/17 | Oyster Bay C1 | 10 Ambient | 04-A, 04-B | 482 | 1 | 60 | 20-Jul | 3 | 26-Jul | 2 | 74 | 67.2 | 4,973 | 7.44 | |
| 5/21/17 | Oyster Bay C1 | 10 Ambient | 03-A, 03-B | 483 | 1 | 110 | 22-Jul | 4 | 26-Jul | 3 | 74 | 95.8 | 7,089 | 5.22 | |
| 5/23/17 | Oyster Bay C1 | 10 Ambient | 09-A | 484 | 1 | 40 | 22-Jul | 5 | 26-Jul | 3 | 74 | 66.2 | 4,899 | 7.55 | |
| 6/1/17 | Oyster Bay C1 | 10 Ambient | 34-A | 485 | 1 | 20 | 22-Jul | 6 | 26-Jul | 4 | 74 | 156.0 | 11,544 | 3.21 | |
| 6/3/17 | Oyster Bay C1 | 10 Ambient | 39-A | 486 | 1 | 90 | 22-Jul | 7 | 26-Jul | 4 | 74 | 41.0 | 3,034 | 12.20 | |
| 6/3/17 | Oyster Bay C1 | 10 Ambient | 40-A | 487 | 1 | 30 | 23-Jul | 8 | 26-Jul | 5 | 74 | 112.0 | 8,288 | 4.46 | |
| 6/4/17 | Oyster Bay C1 | 10 Ambient | 44-A | 488 | 1 | 70 | 23-Jul | 9 | 26-Jul | 5 | 74 | 57.2 | 4,233 | 8.74 | |
| 6/6/17 | Oyster Bay C1 | 10 Ambient | 49-A | 489 | 1 | 10 | 23-Jul | 9 | 27-Jul | 6 | 74 | 57.2 | 4,233 | 8.74 | |
| 6/14/17 | Oyster Bay C1 | 10 Ambient | 64-A | 490 | 1 | 70 | 22-Jul | 7 | 27-Jul | 6 | 74 | 58.6 | 4,336 | 8.53 | |
| 6/15/17 | Oyster Bay C1 | 10 Ambient | 66-A | 491 | 1 | 20 | 22-Jul | 5 | 26-Jul | 3 | 74 | 126.0 | 9,324 | 3.97 | |
| 7/6/17 | Oyster Bay C1 | 10 Ambient | 81-A | 492 | 1 | 70 | 22-Jul | 6 | 26-Jul | 5 | 74 | 122.0 | 9,028 | 4.10 | |
| 5/21/17 | Oyster Bay C1 | 10 Low | 06-A, 06-B | 501 | 1 | <10 | 19-Jul | 1 | na | na | na | na | na | na | |
| 5/23/17 | Oyster Bay C1 | 10 Low | 08-A | 502 | na | na | na | na | na | na | na | na | na | na | |
| 5/26/17 | Oyster Bay C1 | 10 Low | 24-A | 503 | na | na | na | na | na | na | na | na | na | na | |
| 5/27/17 | Oyster Bay C1 | 10 Low | 26-A | 504 | na | na | na | na | na | na | na | na | na | na | |
| 5/31/17 | Oyster Bay C1 | 10 Low | 32-A | 505 | na | na | na | na | na | na | na | na | na | na | |
| 6/14/17 | Oyster Bay C1 | 10 Low | 62-A | 506 | 1 | 80 | 20-Jul | 2 | 24-Jul | 1 | 99 | 63.8 | 6,316 | 7.84 | |
| 6/15/17 | Oyster Bay C1 | 10 Low | 67-A | 507 | na | na | na | na | na | na | na | na | na | na | |
| 6/24/17 | Oyster Bay C1 | 10 Low | 79-A | 508 | na | na | na | na | na | na | na | na | na | na | |
| 5/23/17 | Oyster Bay C1 | 6 Ambient | 10-A | 511 | na | na | na | na | na | na | na | na | na | na | |
| 6/3/17 | Oyster Bay C1 | 6 Ambient | 37-A | 512 | na | na | na | na | na | na | na | na | na | na | |
| 6/5/17 | Oyster Bay C1 | 6 Ambient | 45-A | 513 | | 30 | 23-Jul | 10 | 26-Jul | 5 | 74 | 156.0 | 11,544 | 3.21 | |
| 6/6/17 | Oyster Bay C1 | 6 Ambient | 48-A | 514 | na | na | na | na | na | na | na | na | na | na | |
| 6/15/17 | Oyster Bay C1 | 6 Ambient | 69-A | 515 | na | na | na | na | na | na | na | na | na | na | |
| 6/17/17 | Oyster Bay C1 | 6 Ambient | 71-A | 516 | na | na | na | na | na | na | na | na | na | na | |
| 6/19/17 | Oyster Bay C1 | 6 Ambient | 77-A | 517 | na | na | na | na | na | na | na | na | na | na | |
| 5/21/17 | Oyster Bay C1 | 6 Low | 01-A, 01-B- 01-C | 521 | 1 | 70 | 20-Jul | 2 | 24-Jul | 1 | 99 | 54.4 | 5,386 | 9.19 | |
| 5/22/17 | Oyster Bay C1 | 6 Low | 07-A | 522 | 1 | 20 | 22-Jul | 5 | 26-Jul | 3 | 74 | 60.8 | 4,499 | 8.22 | |
| 5/27/17 | Oyster Bay C1 | 6 Low | 25-A | 523 | 1 | 30 | 22-Jul | 6 | 26-Jul | 4 | 74 | 60.6 | 4,484 | 8.25 | |
| 5/27/17 | Oyster Bay C1 | 6 Low | 28-A | 524 | 1 | 80 | 22-Jul | 7 | 26-Jul | 4 | 74 | 80.8 | 5,979 | 6.19 | |
| 5/29/17 | Oyster Bay C1 | 6 Low | 30-A | 525 | 1 | 30 | 23-Jul | 8 | 26-Jul | 5 | 74 | 128.0 | 9,472 | 3.91 | |
| 5/31/17 | Oyster Bay C1 | 6 Low | 33-A | 526 | 1 | 30 | 23-Jul | 9 | 27-Jul | 6 | 74 | 65.2 | 4,825 | 7.67 | |
| 6/14/17 | Oyster Bay C1 | 6 Low | 61-A | 527 | 1 | 90 | 22-Jul | 6 | 27-Jul | 6 | 74 | 81.4 | 6,024 | 6.14 | |
| 6/15/17 | Oyster Bay C1 | 6 Low | 68-A | 528 | 1 | 30 | 23-Jul | 8 | 26-Jul | 5 | 74 | 162.0 | 11,988 | 3.09 | |
| 6/17/17 | Oyster Bay C1 | 6 Low | 70-A | 529 | 1 | 70 | 23-Jul | 9 | 26-Jul | 5 | 74 | 73.4 | 5,432 | 6.81 | |
| 5/24/17 | Oyster Bay C2 | 10 Ambient | 17-A | 531 | 1 | 60 | 19-Jul | 1 | 24-Jul | 1 | 99 | 88.2 | 8,732 | 5.67 | |
| 6/3/17 | Oyster Bay C2 | 10 Ambient | 42-A | 532 | 1 | 40 | 20-Jul | 3 | 26-Jul | 2 | 74 | 158.0 | 11,692 | 3.16 | |
| 6/10/17 | Oyster Bay C2 | 10 Ambient | 56-A | 533 | 1 | <10 | 22-Jul | 4 | 26-Jul | 3 | 74 | 7.4 | 548 | 67.57 | poor yield b/c very little tissue (likely) |
| 5/23/17 | Oyster Bay C2 | 10 Low | 12-A | 541 | 1 | 40 | 10-Jul | 1 | 24-Jul | 1 | 99 | 45.6 | 4,514 | 10.96 | |
| 5/24/17 | Oyster Bay C2 | 10 Low | 13-A | 542 | 1 | 30 | 20-Jul | 3 | 26-Jul | 2 | 74 | 82.0 | 6,068 | 6.10 | |
| 6/4/17 | Oyster Bay C2 | 10 Low | 43-A | 543 | 1 | 80 | 22-Jul | 7 | 27-Jul | 6 | 74 | 61.4 | 4,544 | 8.14 | |
| 6/1/17 | Oyster Bay C2 | 6 Ambient | 35-A | 551 | 1 | 30 | 20-Jul | 2 | 24-Jul | 1 | 99 | 86.0 | 8,514 | 5.81 | |
| 6/3/17 | Oyster Bay C2 | 6 Ambient | 41-A | 552 | 1 | 80 | 22-Jul | 5 | 26-Jul | 3 | 74 | 17.5 | 1,295 | 28.57 | poor yield b/c black substance w/ RNA? |
| 6/10/17 | Oyster Bay C2 | 6 Ambient | 55-A | 553 | 1 | 30 | 23-Jul | 8 | 26-Jul | 4 | 74 | 200.0 | 14,800 | 2.50 | |
| 6/20/17 | Oyster Bay C2 | 6 Ambient | 78-A | 554 | 1 | 30 | 23-Jul | 9 | 26-Jul | 5 | 74 | 156.0 | 11,544 | 3.21 | |
| 5/21/17 | Oyster Bay C2 | 6 Low | 05-A | 561 | 1 | 40 | 20-Jul | 2 | 24-Jul | 1 | 99 | 43.4 | 4,297 | 11.52 | |
| 5/23/17 | Oyster Bay C2 | 6 Low | 11-A | 562 | 1 | 90 | 22-Jul | 5 | 26-Jul | 3 | 74 | 106.0 | 7,844 | 4.72 | |
| 5/24/17 | Oyster Bay C2 | 6 Low | 15-A | 563 | 1 | 50 | 22-Jul | 6 | 26-Jul | 4 | 74 | 84.6 | 6,260 | 5.91 | |
| 6/7/17 | Oyster Bay C2 | 6 Low | 52-A | 564 | 1 | not recorded | 22-Jul | 7 | 27-Jul | 6 | 74 | HIGH | | | |
| 6/10/17 | Oyster Bay C2 | 6 Low | 57-A | 565 | 1 | 10 | 23-Jul | 8 | 27-Jul | 6 | 74 | 31.8 | 2,353 | 15.72 | |
| NA | RNA Control | RNA Control | | 571 | 1 | 10 | 22-Jul | 4 | 26-Jul | 3 | 74 | LOW | NA | NA | |
| NA | RNA Control | RNA Control | | 572 | 1 | 10 | 23-Jul | 10 | 26-Jul | 5 | 74 | LOW | NA | NA | |
| NA | RNA Control | RNA Control | | 574 | NA | NA | NA | NA | 27-Jul | 6 | 74 | LOW | NA | NA | |
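
The two derived columns in the table follow from the measured volume and concentration: amount of RNA (ng) is volume (uL) times concentration (ng/uL), and the volume needed for 500 ng is 500 divided by the concentration. Below is a minimal sketch of that arithmetic. The function names are hypothetical, and the spreadsheet itself may compute from unrounded concentrations, so small differences from the table are possible for some rows.

```shell
# Total RNA (ng) = volume (uL) x concentration (ng/uL)
rna_amount_ng() {
    awk -v vol="$1" -v conc="$2" 'BEGIN { printf "%.0f\n", vol * conc }'
}

# Volume (uL) that contains a target mass of RNA (default target: 500 ng).
# Prints NA when the concentration is not a positive number (e.g. LOW or HIGH readings).
volume_for_target_ng() {
    awk -v conc="$1" -v target="${2:-500}" 'BEGIN {
        if (conc + 0 > 0) printf "%.2f\n", target / conc
        else print "NA"
    }'
}

# Sample 401 from the table: 99 uL at 52.0 ng/uL
rna_amount_ng 99 52.0          # 5148
volume_for_target_ng 52.0      # 9.62
```

Both values agree with the row for sample 401 (5,148 ng and 9.62 uL).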

from The Shell Game https://ift.tt/2GCINc3

Shelly’s Notebook: Tues. Jul. 23, 2019 Salmon + sea lice methylomes and Oyster Proteomics

Oyster Proteomics

Salmon + sea lice methylomes

  • Still running TrimGalore! (will probably take around 30 hours to complete)
  • Prepared Bismark genomes on Mox:
    • Salmon genome prep
      • script location: /gscratch/srlab/strigg/jobs/BuildSalmo_BmrkGenome.sh
      • bismark genome location: /gscratch/srlab/strigg/data/Ssalar/GENOMES
    • Sea lice genome prep
      • script location: /gscratch/srlab/strigg/jobs/BuildCalig_BmrkGenome.sh
      • bismark genome location: /gscratch/srlab/strigg/data/Caligus/GENOMES
  • Determine bismark alignment settings to use

from shellytrigg https://ift.tt/2y8ReXS

Sam’s Notebook: Genome Annotation – Pgenerosa_v074 Hisat2 Transcript Isoform Index

Essentially, the steps below (which is what was done here) are needed to prepare files for use with Stringtie:

  1. Create GTF file (Gene Transfer Format; basically a stricter variant of GFF that is commonly used to describe transcripts) from input GFF file. Done with GFF utilities software (gffread).
  2. Identify splice sites and exons in newly-created GTF. Done with Hisat2 software.
  3. Create a Hisat2 reference index that utilizes the GTF. Done with Hisat2 software.

This was run on Mox.

The SBATCH script has a bunch of leftover extraneous steps that aren’t relevant to this step of the annotation process; specifically, the FastQ manipulation steps. They come from a copy/paste of a previous Hisat2 run that I neglected to edit out. I didn’t want to edit the script after I actually ran it, so I have left them in here.

SBATCH script (GitHub):

    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=oly_hisat2
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=25-00:00:00
    ## Memory per node
    #SBATCH --mem=120G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190723_hisat2-build_pgen_v074

    # Exit script if any command fails
    set -e

    # Load Python Mox module for Python module availability
    module load intel-python3_2017

    # Document programs in PATH (primarily for program version ID)
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo "${PATH}" | tr : \\n >> system_path.log

    threads=28
    genome_index_name="Pgenerosa_v074"

    # Paths to programs
    gffread="/gscratch/srlab/programs/gffread-0.11.4.Linux_x86_64/gffread"
    hisat2_dir="/gscratch/srlab/programs/hisat2-2.1.0"
    hisat2_build="${hisat2_dir}/hisat2-build"
    hisat2_exons="${hisat2_dir}/hisat2_extract_exons.py"
    hisat2_splice_sites="${hisat2_dir}/hisat2_extract_splice_sites.py"

    # Input/output files
    fastq_dir="/gscratch/scrubbed/samwhite/data/P_generosa/RNAseq"
    genome_dir="/gscratch/srlab/sam/data/P_generosa/genomes"
    genome_gff="${genome_dir}/Pgenerosa_v074_genome_snap02.all.renamed.putative_function.domain_added.gff"
    exons="hisat2_exons.tab"
    genome_fasta="${genome_dir}/Pgenerosa_v074.fa"
    splice_sites="hisat2_splice_sites.tab"
    transcripts_gtf="Pgenerosa_v074_genome_snap02.all.renamed.putative_function.domain_added.gtf"

    ## Initialize arrays
    fastq_array_R1=()
    fastq_array_R2=()
    names_array=()

    # Create array of fastq R1 files
    for fastq in "${fastq_dir}"/*R1*.gz
    do
      fastq_array_R1+=("${fastq}")
    done

    # Create array of fastq R2 files
    for fastq in "${fastq_dir}"/*R2*.gz
    do
      fastq_array_R2+=("${fastq}")
    done

    # Create array of sample names
    ## Uses parameter substitution to strip leading path from filename
    ## Uses awk to parse out sample name from filename
    for R1_fastq in "${fastq_dir}"/*R1*.gz
    do
      names_array+=($(echo "${R1_fastq#${fastq_dir}}" | awk -F"[_.]" '{print $1 "_" $5}'))
    done

    # Create list of fastq files used in analysis
    ## Uses parameter substitution to strip leading path from filename
    for fastq in "${fastq_dir}"/*.gz
    do
      echo "${fastq#${fastq_dir}}" >> fastq.list.txt
    done

    # Create transcripts GTF from genome GFF
    "${gffread}" -T \
    "${genome_gff}" \
    -o "${transcripts_gtf}"

    # Create Hisat2 exons tab file
    "${hisat2_exons}" \
    "${transcripts_gtf}" \
    > "${exons}"

    # Create Hisat2 splice sites tab file
    "${hisat2_splice_sites}" \
    "${transcripts_gtf}" \
    > "${splice_sites}"

    # Build Hisat2 reference index using splice sites and exons
    "${hisat2_build}" \
    "${genome_fasta}" \
    "${genome_index_name}" \
    --exon "${exons}" \
    --ss "${splice_sites}" \
    -p "${threads}" \
    2> hisat2_build.err

    # Copy Hisat2 index files to my data directory
    rsync -av "${genome_index_name}"*.ht2 "${genome_dir}"
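
After a build like the one above, a quick check that the index files landed in the data directory can save a failed alignment job later. The sketch below counts `.ht2` files for a given index name. The function name `count_hisat2_index_files` is hypothetical, and the expectation of eight numbered files per index is my assumption about typical hisat2-build output, not something verified against this run.

```shell
# Sketch: count Hisat2 index files (<name>.<N>.ht2) in a directory.
# Assumption: a complete index normally consists of 8 such files.
count_hisat2_index_files() {
    index_dir="$1"     # directory holding the index, e.g. the genome directory
    index_name="$2"    # index base name, e.g. Pgenerosa_v074
    count=0
    for f in "${index_dir}/${index_name}".*.ht2; do
        # Guard against the unmatched-glob case, where the pattern is returned literally
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, `count_hisat2_index_files "${genome_dir}" "${genome_index_name}"` could be compared against the expected count before submitting a downstream alignment job.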