Shelly’s Notebook: Fri. Jul. 19, 2019 Salmon sea lice methylomes

Received salmon sea lice methylome data!!

copy data to Gannet metacarcinus folder

  • I received this url from the sequencing core
  • I mounted Gannet via Finder -> Connect to Server
  • I installed Globus Connect Personal on Ostrich and gave the Globus app write permissions to my metacarcinus folder on Gannet
  • I navigated to my Globus account file manager section via the provided url and could see my sequencing data “UW_Trigg_190718.tar.gz” (148.22 GB).
  • I selected the “Transfer or Sync to…” option, entered the computer name “Ostrich” (which I entered when setting up the Globus Connect Personal app), entered the path “/Volumes/web/metacarcinus/Salmo_Calig/FASTQS/”, then clicked start. img
  • I navigated to the Activity section of Globus and could see the data being transferred (26 GB in 10 min) img
  • After a couple hours, I received an email notification and could see in the status section that the transfer had completed successfully img
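After a transfer this large, it can also be worth confirming file integrity on the receiving end. A minimal sketch (the file here is a small stand-in; in practice the file would be UW_Trigg_190718.tar.gz, and the expected checksum would ideally come from the sequencing core):

```shell
# Record an MD5 checksum for a file, then verify it.
# md5sum -c exits 0 and prints "OK" when the file matches its recorded sum.
printf 'demo data\n' > transfer_demo.txt
md5sum transfer_demo.txt > transfer_demo.txt.md5
md5sum -c transfer_demo.txt.md5
rm transfer_demo.txt transfer_demo.txt.md5
```

If the core publishes checksums alongside the download, comparing against those catches any corruption introduced during the Globus or rsync hops.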

copy data from Gannet to mox

ran this command:

 rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Volumes/web/metacarcinus/Salmo_Calig/FASTQS/UW_Trigg_190718.tar.gz /gscratch/srlab/strigg/data/Salmo_Calig/FASTQS/  

I decided to make two species folders rather than have them combined, and wanted to remove the data that was not Salmon or sea lice fastqs, so ran the following commands:

 tar -xvf UW_Trigg_190718.tar.gz
 cd UW_Trigg_190718_done/
 rm -r Reports/
 rm -r Stats/
 rm -r Undetermined_S0_L00*
 cd ../../../
 mkdir Caligus
 mv Salmo_Calig/ Ssalar
 cd Ssalar/
 cd FASTQS/
 cd ..
 cd Caligus/
 mkdir FASTQS
 mv ../Ssalar/FASTQS/UW_Trigg_190718_done/Sealice_F* .
 mv *.gz FASTQS/
 rm UW_Trigg_190718.tar.gz

copy data from Gannet to owl

run fastqc and trim

  • prepared scripts (Salmon: /gscratch/srlab/strigg/jobs/20190719_FASTQC_ADPTRIM_Ssalar.sh and lice: /gscratch/srlab/strigg/jobs/20190719_FASTQC_ADPTRIM_Caligus.sh) to do this on Mox, but they didn’t work because the multiqc and cutadapt programs could not be found as I had them specified. See github issue #712
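One quick way to troubleshoot this kind of "program not found" failure is to check how each tool resolves before submitting the job. A small sketch (the tool names come from this entry; where they are actually installed on Mox is not something this snippet assumes):

```shell
# `command -v` prints the resolved path of a program and exits non-zero
# if the program cannot be found on the current PATH.
for tool in fastqc cutadapt multiqc; do
  if command -v "$tool" > /dev/null; then
    echo "$tool -> $(command -v "$tool")"
  else
    echo "$tool NOT FOUND - use a full path or load the appropriate module"
  fi
done
```

Running this at the top of an SBATCH script (after any `module load` lines) documents exactly which binaries the job saw, which makes failures like this much easier to diagnose from the logs.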

from shellytrigg https://ift.tt/2M5nkMt
via IFTTT

Sam’s Notebook: Genome Annotation – O.lurida 20190709-v081 Transcript Isoform ID with Stringtie on Mox

Earlier today, I generated the necessary Hisat2 index, which incorporated splice sites and exons, for use with Stringtie to identify transcript isoforms in our 20190709-Olurida_v081 annotation. This annotation utilized tissue-specific transcriptome assemblies provided by Katherine Silliman.

I used all the trimmed FastQ files from the 20180827 Trinity transcriptome assembly, as this utilized all of our existing RNAseq data.

Command to pull trimmed files (Trimmomatic) out of the Trinity output folder that is a gzipped tarball:

 tar -ztvf trinity_out_dir.tar.gz \
 | grep -E "*P.qtrim.gz" \
 && tar -zxvf trinity_out_dir.tar.gz \
 -C /home/sam/Downloads/ \
 --wildcards "*P.qtrim.gz"

This was run locally on my computer (swoose) and then rsync’d to Mox.

NOTE: The “P” in the *.P.qtrim.gz represents trimmed reads that are properly paired, as determined by Trimmomatic/Trinity. See the fastq.list.txt for the list of FastQ files used as input. For more info on input FastQ files, refer to the Nightingales Google Sheet.

Here’s the quick rundown of how transcript isoform annotation with Stringtie runs:

  1. Use Hisat2 reference index with identified splice sites and exons (this was done yesterday).
  2. Use Hisat2 to create alignments from each pair of trimmed FastQ files. Need to use the --downstream-transcriptome-assembly option!!!
  3. Use Stringtie to create a GTF for each alignment.
  4. Use Stringtie to create a singular, merged GTF from all of the individual GTFs.
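The four steps above reduce to a handful of core commands; here is a condensed sketch (file and sample names are placeholders, not the actual run parameters; the full SBATCH script below has the real paths and options):

```shell
# 2. Align a pair of trimmed FastQs; --downstream-transcriptome-assembly
#    keeps alignment reporting Stringtie-friendly
hisat2 -x genome_index --downstream-transcriptome-assembly \
  -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample.sam

# Sort/convert for Stringtie
samtools sort -o sample.sorted.bam sample.sam

# 3. Per-sample GTF, guided by the reference annotation
stringtie sample.sorted.bam -G annotation.gff -o sample.gtf

# 4. Merge all per-sample GTFs (listed one per line) into one transcript set
stringtie --merge gtf_list.txt -G annotation.gff -o merged.gtf
```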

SBATCH script (GitHub):

 #!/bin/bash
 ## Job Name
 #SBATCH --job-name=oly_stringtie
 ## Allocation Definition
 #SBATCH --account=srlab
 #SBATCH --partition=srlab
 ## Resources
 ## Nodes
 #SBATCH --nodes=1
 ## Walltime (days-hours:minutes:seconds format)
 #SBATCH --time=25-00:00:00
 ## Memory per node
 #SBATCH --mem=500G
 ##turn on e-mail notification
 #SBATCH --mail-type=ALL
 #SBATCH --mail-user=samwhite@uw.edu
 ## Specify the working directory for this job
 #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190716_stringtie_20190709-olur-v081

 # Exit script if any command fails
 set -e

 # Load Python Mox module for Python module availability
 module load intel-python3_2017

 # Document programs in PATH (primarily for program version ID)
 date >> system_path.log
 echo "" >> system_path.log
 echo "System PATH for $SLURM_JOB_ID" >> system_path.log
 echo "" >> system_path.log
 printf "%0.s-" {1..10} >> system_path.log
 echo "${PATH}" | tr : \\n >> system_path.log

 threads=27
 genome_index_name="20190709-Olurida_v081"

 # Paths to programs
 hisat2_dir="/gscratch/srlab/programs/hisat2-2.1.0"
 hisat2="${hisat2_dir}/hisat2"
 samtools="/gscratch/srlab/programs/samtools-1.9/samtools"
 stringtie="/gscratch/srlab/programs/stringtie-1.3.6.Linux_x86_64/stringtie"

 # Input/output files
 genome_gff="/gscratch/srlab/sam/data/O_lurida/genomes/Olurida_v081/20190709-Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff"
 genome_index_dir="/gscratch/srlab/sam/data/O_lurida/genomes/Olurida_v081"
 fastq_dir="/gscratch/srlab/sam/data/O_lurida/RNAseq/"
 gtf_list="gtf_list.txt"

 ## Initialize arrays
 fastq_array_R1=()
 fastq_array_R2=()
 names_array=()

 # Copy Hisat2 genome index
 rsync -av "${genome_index_dir}"/${genome_index_name}*.ht2 .

 # Generate checksum of GFF file for backtracking to original
 # Original named: Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff
 # Created in 20190709 Olurida_v081 annotation - renamed to avoid filename clashes with previous annotations.
 md5sum "${genome_gff}" > genome_gff.md5

 # Create array of fastq R1 files
 for fastq in "${fastq_dir}"*R1*.gz
 do
   fastq_array_R1+=("${fastq}")
 done

 # Create array of fastq R2 files
 for fastq in "${fastq_dir}"*R2*.gz
 do
   fastq_array_R2+=("${fastq}")
 done

 # Create array of sample names
 ## Uses parameter substitution to strip leading path from filename
 ## Uses awk to parse out sample name from filename
 for R1_fastq in "${fastq_dir}"*R1*.gz
 do
   names_array+=($(echo "${R1_fastq#${fastq_dir}}" | awk -F"[_.]" '{print $1 "_" $5}'))
 done

 # Create list of fastq files used in analysis
 ## Uses parameter substitution to strip leading path from filename
 for fastq in "${fastq_dir}"*.gz
 do
   echo "${fastq#${fastq_dir}}" >> fastq.list.txt
 done

 # Hisat2 alignments
 for index in "${!fastq_array_R1[@]}"
 do
   sample_name=$(echo "${names_array[index]}")
   "${hisat2}" \
   -x "${genome_index_name}" \
   --downstream-transcriptome-assembly \
   -1 "${fastq_array_R1[index]}" \
   -2 "${fastq_array_R2[index]}" \
   -S "${sample_name}".sam \
   2> "${sample_name}"_hisat2.err

   # Sort SAM files, convert to BAM, and index
   "${samtools}" view \
   -@ "${threads}" \
   -Su "${sample_name}".sam \
   | "${samtools}" sort - \
   -@ "${threads}" \
   -o "${sample_name}".sorted.bam
   "${samtools}" index "${sample_name}".sorted.bam

   # Run stringtie on alignments
   "${stringtie}" "${sample_name}".sorted.bam \
   -p "${threads}" \
   -o "${sample_name}".gtf \
   -G "${genome_gff}" \
   -C "${sample_name}.cov_refs.gtf"

   # Add GTFs to list file
   echo "${sample_name}.gtf" >> "${gtf_list}"
 done

 # Create singular transcript file, using GTF list file
 "${stringtie}" --merge \
 "${gtf_list}" \
 -p "${threads}" \
 -G "${genome_gff}" \
 -o "${genome_index_name}".stringtie.gtf

 # Delete unnecessary index files
 rm "${genome_index_name}"*.ht2

 # Delete unneeded SAM files
 rm ./*.sam
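The awk-based sample-name parsing in the script above can be illustrated in isolation (the filename here is invented for demonstration; the actual FastQ names may split into different fields):

```shell
# Split a filename on underscores and dots, then keep fields 1 and 5.
# For "Tissue-1_S3_L001_R1_001.fastq.gz" the fields are:
#   1:Tissue-1  2:S3  3:L001  4:R1  5:001  6:fastq  7:gz
echo "Tissue-1_S3_L001_R1_001.fastq.gz" | awk -F"[_.]" '{print $1 "_" $5}'
# → Tissue-1_001
```

The `-F"[_.]"` sets the field separator to either character, so which fields carry the sample name depends entirely on the naming scheme; this is worth re-checking whenever the script is reused on differently named files.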

Sam’s Notebook: Genome Annotation – Pgenerosa_v070 and v074 Top 18 Scaffolds Feature Count Comparisons

After annotating Pgenerosa_v074 on 20190701, we noticed a large discrepancy in the number of transcripts that MAKER identified compared to Pgenerosa_v070. As a reminder, Pgenerosa_v074 is a subset of Pgenerosa_v070 containing only the 18 longest scaffolds. So, we decided to do a quick comparison of the annotations present in these 18 scaffolds between Pgenerosa_v070 and Pgenerosa_v074.

Briefly, I used grep to pull features located on the same 18 scaffolds in the Pgenerosa_v074 assembly out of the Pgenerosa_v070 annotated GFF from 20190228, and then counted the number of features in this newly subsetted GFF. It’s all documented in the Jupyter Notebook below.

Jupyter Notebook (GitHub):


#!/bin/bash
## Job Name - can be changed
#SBATCH --job-name=bs-geo
## Allocation Definition - confirm correctness
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes (often you will only use 1)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=100G
## email notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/sr320/0719/
# Exit script if a command fails
# set -e

##########################
# This is a script written to assess bisulfite sequencing reads
# using Bismark. The user needs to supply the following:
# 1. A single directory location containing BSseq reads.
# 2. BSseq reads need to be gzipped FastQ and end with .fq.gz
# 3. A bisulfite-converted genome, produced with Bowtie2.
# 4. Indicate if deduplication should be performed (whole genome BSseq or reduced genome sequencing)
#
# Set these values below



### USER NEEDS TO SET VARIABLES FOR THE FOLLOWING:
# Set --workdir= path in SBATCH header above.
#
# Full path to directory with sequencing reads
reads_dir="/gscratch/srlab/strigg/data/Pgenr/FASTQS"

# Full path to bisulftie-converted genome directory
genome_dir="/gscratch/srlab/sr320/data/geoduck/v074"

# Enter y (for yes) or n (for no) between the quotes.
# Yes - Whole genome bisulfite sequencing, MBD.
# No - Reduced genome bisulfite sequencing (e.g. RRBS)
deduplicate="n"

# Run Bismark on desired number of reads/pairs subset
# The default value is 0, which will run Bismark on all reads/pairs
subset="-u 0"

####################################################
# DO NOT EDIT BELOW THIS LINE
####################################################




# Evaluate user-edited variables to make sure they have been filled
[ -z ${deduplicate} ] \
&& { echo "The deduplicate variable is not defined. Please edit the SBATCH script and add y or n to deduplicate variable."; exit 1; }

[ -z ${genome_dir} ] \
&& { echo "The bisulfite genome directory path has not been set. Please edit the SBATCH script."; exit 1; }

[ -z ${reads_dir} ] \
&& { echo "The reads directory path has not been set. Please edit the SBATCH script."; exit 1; }



# Directories and programs
wd=$(pwd)
bismark_dir="/gscratch/srlab/programs/Bismark-0.21.0_dev"
bowtie2_dir="/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/"
samtools="/gscratch/srlab/programs/samtools-1.9/samtools"
threads="28"
reads_list="input_fastqs.txt"

## Concatenated FastQ Files
R1=""
R2=""

# Initialize arrays
R1_array=()
R2_array=()

# Create list of input FastQ files for easier confirmation.
for fastq in ${reads_dir}/*.fq.gz
do
  echo ${fastq##*/} >> ${reads_list}
done

# Check for paired-end
# Capture grep output
# >0 means single-end reads
# set +e/set -e prevents error >0 from exiting script
set +e
grep "_R2_" ${reads_list}
paired=$?
set -e

# Confirm even number of FastQ files
num_files=$(wc -l < ${reads_list})
fastq_even_odd=$(echo $(( ${num_files} % 2 )) )


## Save FastQ files to arrays
R1_array=(${reads_dir}/*_R1_*.fq.gz)
## Send comma-delimited list of R1 FastQ to variable
R1=$(echo ${R1_array[@]} | tr " " ",")

# Evaluate if paired-end FastQs
# Run Bismark as paired-end/single-end based on evaluation
if [[ ${paired} -eq 0 ]]; then
  # Evaluate if FastQs have corresponding partner (i.e. R1 and R2 files)
  # Evaluated on even/odd number of files.
  if [[ ${fastq_even_odd} -ne 0 ]]; then
    { echo "Missing at least one FastQ pair from paired-end FastQ set."; \
      echo "Please verify input FastQs all have an R1 and corresponding R2 file.";
      exit 1; \
    }
  fi
  ## Save FastQ files to arrays
  R2_array=(${reads_dir}/*_R2_*.fq.gz)
  ## Send comma-delimited list of R2 FastQ to variable
  R2=$(echo ${R2_array[@]} | tr " " ",")
  # Run bismark using bisulftie-converted genome
  # Generates a set of BAM files as outputs
  # Records stderr to a file for easy viewing of Bismark summary info
  ${bismark_dir}/bismark \
  --path_to_bowtie2 ${bowtie2_dir} \
  --genome ${genome_dir} \
  --samtools_path=${samtools} \
  --non_directional \
  --score_min L,0,-0.6 \
  ${subset} \
  -p ${threads} \
  -1 ${R1} \
  -2 ${R2} \
  2> bismark_summary.txt
else
  # Run Bismark single-end
  ${bismark_dir}/bismark \
  --path_to_bowtie2 ${bowtie2_dir} \
  --genome ${genome_dir} \
  --samtools_path=${samtools} \
  --non_directional \
  ${subset} \
  -p ${threads} \
  ${R1} \
  2> bismark_summary.txt
fi



# Determine if deduplication is necessary
# Then, determine if paired-end or single-end
if [ ${deduplicate} == "y"  ]; then
  # Sort Bismark BAM files by read names instead of chromosomes
  find *.bam \
  | xargs basename -s .bam \
  | xargs -I bam_basename \
  ${samtools} sort \
  --threads ${threads} \
  -n bam_basename.bam \
  -o bam_basename.sorted.bam
  if [ ${paired} -eq 0 ]; then
    # Deduplication
    find *sorted.bam \
    | xargs basename -s .bam \
    | xargs -I bam_basename \
    ${bismark_dir}/deduplicate_bismark \
    --paired \
    --samtools_path=${samtools} \
    bam_basename.bam
  else
    find *sorted.bam \
    | xargs basename -s .bam \
    | xargs -I bam_basename \
    ${bismark_dir}/deduplicate_bismark \
    --single \
    --samtools_path=${samtools} \
    bam_basename.bam
  fi
  # Methylation extraction
  # Extracts methylation info from deduplicated BAM files produced by Bismark
  # Options to create a bedgraph file, a cytosine coverage report, counts, remove spaces from names
  # and to use the "scaffolds" setting.
  ${bismark_dir}/bismark_methylation_extractor \
  --bedGraph \
  --cytosine_report \
  --genome_folder ${genome_dir} \
  --gzip \
  --counts \
  --scaffolds \
  --remove_spaces \
  --multicore ${threads} \
  --buffer_size 75% \
  --samtools_path=${samtools} \
  *deduplicated.bam
  # Sort deduplicated BAM files
  find *deduplicated.bam \
  | xargs basename -s .bam \
  | xargs -I bam_basename \
  ${samtools} sort \
  --threads ${threads} \
  bam_basename.bam \
  -o bam_basename.sorted.bam
  # Index sorted files for IGV
  # The "-@ ${threads}" below specifies number of CPU threads to use.
  find *deduplicated.sorted.bam \
  | xargs -I sorted_bam \
  ${samtools} index \
  -@ ${threads} \
  sorted_bam
else
  # Methylation extraction
  # Extracts methylation info from BAM files produced by Bismark
  # Options to create a bedgraph file, a cytosine coverage report, counts, remove spaces from names
  # and to use the "scaffolds" setting.
  ${bismark_dir}/bismark_methylation_extractor \
  --bedGraph \
  --cytosine_report \
  --genome_folder ${genome_dir} \
  --gzip \
  --counts \
  --scaffolds \
  --remove_spaces \
  --multicore ${threads} \
  --buffer_size 75% \
  --samtools_path=${samtools} \
  *.bam

  # Sort BAM files
  find *.bam \
  | xargs basename -s .bam \
  | xargs -I bam_basename \
  ${samtools} sort \
  --threads ${threads} \
  bam_basename.bam \
  -o bam_basename.sorted.bam
  # Index sorted files for IGV
  # The "-@ ${threads}" below specifies number of CPU threads to use.
  find *sorted.bam \
  | xargs -I sorted_bam \
  ${samtools} index \
  -@ ${threads} \
  sorted_bam
fi


# Bismark processing report
# Generates HTML reports from previously created files
${bismark_dir}/bismark2report

#Bismark summary report
# Generates HTML summary reports from previously created files
${bismark_dir}/bismark2summary
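Two small techniques from the script above are worth seeing in isolation: the array-to-comma-list construction Bismark expects for its `-1`/`-2` arguments, and the grep-exit-status check used for paired-end detection. A self-contained sketch (the filenames are invented stand-ins):

```shell
# Build an array of (hypothetical) R1 files and join with commas,
# as Bismark expects for its -1/-2 arguments.
R1_array=("s1_R1_001.fq.gz" "s2_R1_001.fq.gz")
R1=$(echo ${R1_array[@]} | tr " " ",")
echo "$R1"   # → s1_R1_001.fq.gz,s2_R1_001.fq.gz

# Paired-end check: grep's exit status is 0 (success) if any
# filename in the list contains "_R2_"
printf 's1_R1_001.fq.gz\ns1_R2_001.fq.gz\n' | grep -q "_R2_" && echo "paired-end"
```

Note the unquoted `${R1_array[@]}` relies on bash word splitting so `tr` sees the names space-separated; this works here because FastQ names from sequencing cores generally contain no spaces.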

#bismark, #sbatch

Laura’s Notebook: Oly OA RNA isolation – Ctenidia

July 12, 2019

Homogenized the rest of the Oly ctenidia tissues, as described in the previous post. For most samples I achieved at minimum 30 mg tissue. The scale was not very sensitive, so I’m estimating an error of +/- 20 mg.

Tested the RNAzol isolation process on 6 samples, one from each cohort x treatment. Followed the RNAzol® RT RNA Isolation Reagent protocol for Total RNA isolation, using half of my homogenate (500 uL).

RNAzol Steps:

  • Labeled 1.7 mL tubes with sample numbers, 2 tubes per sample
  • Added 200 uL DEPC-treated water to one set of tubes
  • Transferred 500 uL of homogenate to tubes with water, returned rest of homogenate (500 uL) to -20 freezer.
  • Vortexed homogenate + water vigorously 15 seconds
  • Held mixture for 15 minutes at room temperature
  • Centrifuged at 12,000 g (aka rcf) for 15 minutes. NOTE: for this first batch of 6 samples, I accidentally centrifuged at 1,200 rcf for 15 minutes, and didn’t catch the error until later.
  • Transferred supernatant to new tubes (used 2nd set of labeled tubes). DNA/proteins/polysaccharides remain in the bottom of the tubes. I added the label “DNA + Pr.” to these tubes, and saved them in the -20 freezer.
  • Added 0.75 mL isopropanol to precipitate RNA
  • Vortexed vigorously for 15 seconds. Held for 10 minutes at room temperature
  • Centrifuged at 12,000 rcf for 10 minutes
  • Discarded supernatant. A white RNA pellet was visible in samples (which is good)
  • Added 500 uL 75% ethanol, which I prepared using the 200 proof ethanol and DEPC-treated water.
  • Mixed by hand, made sure the pellet was in the ethanol, then centrifuged at 8,000 rcf for 3 minutes.
  • Discarded ethanol supernatant, repeated previous step, then discarded ethanol again.
  • Added 50 uL DEPC-treated water, vortexed by hand for 10 seconds.
  • Stored RNA dissolved in water in the -20 freezer for now.

July 15, 2019

Another batch of RNA isolation today. I ran 24 samples at once, which I won’t do again, as it takes too long per step. Otherwise, things went well and there were no changes to the above protocol. One note: sample 295 had no visible RNA pellet, so I don’t expect I’ll find any RNA when I quantify. This was probably due to the small size of the initial tissue piece, and loss during mortar/pestle homogenization.

July 16, 2019

Finished isolating ctenidia RNA today. Ran 2 batches of 12 samples, which allowed me to finish more quickly, and ultimately didn’t take much longer than doing 24 in one batch. Good to know for the future. Again, no changes to protocol, but some notes:

  • The last batch accidentally sat for 25 minutes during the initial homogenate + water step (should have been 15 minutes) – timer malfunction :/
  • Dissolved RNA pellet in 150 uL DEPC treated water, not 50 uL. See notes below.

RNA Quantification, round 1

Quantified 18 samples, 3 from each treatment x cohort. Used 5 uL (of 50 uL), and the Qubit RNA assay kit (high sensitivity). All except 1 sample (#302) read “TOO HIGH”. I chatted with Sam, and based on the size of my RNA pellets he suggested dissolving RNA in 150 uL water, not 50 uL, and using 1 uL when quantifying.

Based on the above, I added 100 uL of DEPC-treated water to all samples that I had previously processed. NOTE: this means that the 18 samples that I already quantified now have a total volume of ~145 uL, while the rest will have 150 uL, with a few exceptions:

  • I did not add more water to sample 302, as it quantified in the first round.
  • Samples 295, X and Y: only added 50 uL water, as no pellets were visible.

RNA Quantification, round 2

Quantified all ctenidia samples (except for #302) using Qubit HS RNA assay. Prepared the 53 samples, plus the two standards. Thawed samples, then transferred 1 uL into Qubit working solution (prepared using 60 uL reagent + 11,940 uL buffer). Good news is that I have enough RNA in all samples – see table in repo, also pasted below – need to dilute a few samples and re-quantify before proceeding to the QuantSeq library prep.
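The working-solution and 500 ng volume arithmetic above is easy to sanity-check; a small sketch using the numbers from this entry (the Qubit working solution is a 1:200 dilution of reagent in buffer per the kit protocol):

```shell
# Qubit working solution: 60 uL reagent + 11,940 uL buffer = 12,000 uL total,
# i.e. 12,000 / 200 = 60 reactions' worth (53 samples + 2 standards, with spare)
awk 'BEGIN { printf "%d reactions\n", (60 + 11940) / 200 }'   # → 60 reactions

# Volume needed for 500 ng at a given concentration (ng/uL),
# e.g. a 200 ng/uL sample from the table:
awk 'BEGIN { printf "%.2f uL\n", 500 / 200 }'                 # → 2.50 uL
```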

| Cohort | pCO2 | HOMOGENATE TUBE # | TUBE COLOR | TISSUE TYPE | TISSUE SAMPLE # | VOL RNAzol | MASS TISSUE (mg) | DATE HOMOGENIZED | Homogenization Batch | DATE RNA ISOLATED | RNA Isolation Batch | Total volume, uL (RNA + H2O) | [RNA] ng/uL | Amount of RNA (ng) | Volume needed for 500 ng RNA | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dabob Bay | High | 291 | PURPLE | CTENIDIA | HL10-10 | 1 mL | 80-100 | 7/10/19 | 1 | 7/12/19 | 1 | 144 | 200 | 28,800 | 2.50 | |
| Dabob Bay | High | 292 | PURPLE | CTENIDIA | HL10-11 | 1 mL | 10-30 | 7/10/19 | 1 | 7/15/19 | 2 | 144 | 47.8 | 6,883 | 10.46 | |
| Dabob Bay | High | 293 | PURPLE | CTENIDIA | HL10-12 | 1 mL | 10-30 | 7/10/19 | 1 | 7/15/19 | 2 | 144 | 47.2 | 6,797 | 10.59 | |
| Dabob Bay | High | 294 | PURPLE | CTENIDIA | HL6-10 | 1 mL | 80-100 | 7/10/19 | 4 | 7/15/19 | 2 | 149 | 108 | 16,092 | 4.63 | |
| Dabob Bay | High | 295 | PURPLE | CTENIDIA | HL6-11 | 1 mL | 10-30 | 7/10/19 | 4 | 7/15/19 | 2 | 49 | 49.6 | 2,430 | 10.08 | |
| Dabob Bay | High | 296 | PURPLE | CTENIDIA | HL6-12 | 1 mL | 60-80 | 7/12/19 | 5 | 7/16/19 | 3 | 149 | 198 | 29,502 | 2.53 | |
| Dabob Bay | High | 297 | PURPLE | CTENIDIA | HL6-13 | 1 mL | < 10 | 7/12/19 | 5 | 7/16/19 | 3 | 149 | 4.2 | 626 | 119.05 | |
| Dabob Bay | High | 298 | PURPLE | CTENIDIA | HL6-14 | 1 mL | 80-100 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 76.4 | 11,384 | 6.54 | Black substance in homogenate – maybe from tube lid? |
| Dabob Bay | High | 299 | PURPLE | CTENIDIA | HL6-15 | 1 mL | 10-30 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 56.8 | 8,463 | 8.80 | |
| Dabob Bay | Ambient | 301 | RED | CTENIDIA | HL10-19 | 1 mL | 10-30 | 7/10/19 | 1 | 7/12/19 | 1 | 144 | 82.6 | 11,894 | 6.05 | |
| Dabob Bay | Ambient | 302 | RED | CTENIDIA | HL10-20 | 1 mL | 10-30 | 7/10/19 | 1 | 7/15/19 | 2 | 45 | 36.8 | 1,656 | 13.59 | |
| Dabob Bay | Ambient | 306 | RED | CTENIDIA | HL10-21 | 1 mL | 50-70 | 7/10/19 | 2 | 7/15/19 | 3 | 144 | 200 | 28,800 | 2.50 | |
| Dabob Bay | Ambient | 304 | RED | CTENIDIA | HL6-19 | 1 mL | 50-70 | 7/10/19 | 1 | 7/15/19 | 2 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Dabob Bay | Ambient | 305 | RED | CTENIDIA | HL6-20 | 1 mL | 20-40 | 7/10/19 | 1 | 7/15/19 | 2 | 149 | 82 | 12,218 | 6.10 | |
| Dabob Bay | Ambient | 303 | RED | CTENIDIA | HL6-21 | 1 mL | 80-100 | 7/12/19 | 1 | 7/16/19 | 2 | 149 | 120 | 17,880 | 4.17 | |
| Dabob Bay | Ambient | 307 | RED | CTENIDIA | HL6-16 | 1 mL | 70-100 | 7/12/19 | 4 | 7/16/19 | 3 | 149 | 95.4 | 14,215 | 5.24 | |
| Dabob Bay | Ambient | 308 | RED | CTENIDIA | HL6-17 | 1 mL | 70-90 | 7/12/19 | 4 | 7/16/19 | 4 | 149 | 63.6 | 9,476 | 7.86 | |
| Dabob Bay | Ambient | 309 | RED | CTENIDIA | HL6-18 | 1 mL | 80-100 | 7/12/19 | 6 | 7/16/19 | 4 | 149 | 200 | 29,800 | 2.50 | |
| Oyster Bay | High | 311 | ORANGE | CTENIDIA | SN6-16 | 1 mL | 80-100 | 7/10/19 | 2 | 7/12/19 | 1 | 144 | 174 | 25,056 | 2.87 | |
| Oyster Bay | High | 312 | ORANGE | CTENIDIA | SN6-17 | 1 mL | 80-100 | 7/10/19 | 2 | 7/15/19 | 2 | 144 | 97.4 | 14,026 | 5.13 | |
| Oyster Bay | High | 313 | ORANGE | CTENIDIA | SN6-18 | 1 mL | Not recorded | 7/10/19 | 3 | 7/15/19 | 2 | 144 | 76.2 | 10,973 | 6.56 | |
| Oyster Bay | High | 314 | ORANGE | CTENIDIA | SN6-19 | 1 mL | Not recorded | 7/10/19 | 3 | 7/15/19 | 2 | 149 | 49.6 | 7,390 | 10.08 | |
| Oyster Bay | High | 315 | ORANGE | CTENIDIA | SN6-20 | 1 mL | 70-100 | 7/10/19 | 4 | 7/15/19 | 2 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Oyster Bay | High | 316 | ORANGE | CTENIDIA | SN6-21 | 1 mL | 100 + | 7/12/19 | 4 | 7/16/19 | 3 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Oyster Bay | High | 317 | ORANGE | CTENIDIA | SN6-22 | 1 mL | 70-90 | 7/12/19 | 5 | 7/16/19 | 3 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Oyster Bay | High | 318 | ORANGE | CTENIDIA | SN6-23 | 1 mL | 60-80 | 7/12/19 | 6 | 7/16/19 | 4 | 149 | 182 | 27,118 | 2.75 | |
| Oyster Bay | High | 319 | ORANGE | CTENIDIA | SN6-24 | 1 mL | 20-40 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 61.8 | 9,208 | 8.09 | |
| Oyster Bay | Ambient | 321 | YELLOW | CTENIDIA | SN6-25 | 1 mL | 60-80 | 7/10/19 | 2 | 7/12/19 | 1 | 144 | 188 | 27,072 | 2.66 | |
| Oyster Bay | Ambient | 322 | YELLOW | CTENIDIA | SN6-26 | 1 mL | 20-40 | 7/10/19 | 2 | 7/15/19 | 2 | 144 | 63.4 | 9,130 | 7.89 | |
| Oyster Bay | Ambient | 323 | YELLOW | CTENIDIA | SN6-27 | 1 mL | 20-40 | 7/10/19 | 3 | 7/15/19 | 2 | 144 | 110 | 15,840 | 4.55 | |
| Oyster Bay | Ambient | 324 | YELLOW | CTENIDIA | SN6-28 | 1 mL | 40-60 | 7/10/19 | 3 | 7/15/19 | 2 | 149 | 164 | 24,436 | 3.05 | |
| Oyster Bay | Ambient | 325 | YELLOW | CTENIDIA | SN6-29 | 1 mL | 70-100 | 7/10/19 | 5 | 7/15/19 | 2 | 149 | 200 | 29,800 | 2.50 | |
| Oyster Bay | Ambient | 326 | YELLOW | CTENIDIA | SN6-30 | 1 mL | 70-100 | 7/12/19 | 5 | 7/16/19 | 3 | 149 | 168 | 25,032 | 2.98 | |
| Oyster Bay | Ambient | 327 | YELLOW | CTENIDIA | SN6-31 | 1 mL | 40-60 | 7/12/19 | 6 | 7/16/19 | 3 | 149 | 94.6 | 14,095 | 5.29 | |
| Oyster Bay | Ambient | 328 | YELLOW | CTENIDIA | SN6-32 | 1 mL | 50-70 | 7/12/19 | 6 | 7/16/19 | 4 | 149 | 176 | 26,224 | 2.84 | |
| Oyster Bay | Ambient | 329 | YELLOW | CTENIDIA | SN6-33 | 1 mL | 70-100 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 150 | 22,350 | 3.33 | |
| Fidalgo Bay | High | 331 | GREEN | CTENIDIA | NF6-16 | 1 mL | 40-60 | 7/10/19 | 2 | 7/12/19 | 1 | 144 | 56.4 | 8,122 | 8.87 | |
| Fidalgo Bay | High | 332 | GREEN | CTENIDIA | NF6-17 | 1 mL | 10-40 | 7/10/19 | 2 | 7/15/19 | 2 | 144 | 63.4 | 9,130 | 7.89 | |
| Fidalgo Bay | High | 333 | GREEN | CTENIDIA | NF6-18 | 1 mL | 40-60 | 7/10/19 | 3 | 7/15/19 | 2 | 144 | 82.8 | 11,923 | 6.04 | |
| Fidalgo Bay | High | 334 | GREEN | CTENIDIA | NF6-19 | 1 mL | 30-50 | 7/10/19 | 3 | 7/15/19 | 2 | 149 | 66.2 | 9,864 | 7.55 | |
| Fidalgo Bay | High | 335 | GREEN | CTENIDIA | NF6-20 | 1 mL | 90-110 | 7/10/19 | 4 | 7/15/19 | 2 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Fidalgo Bay | High | 336 | GREEN | CTENIDIA | NF6-21 | 1 mL | 90-110 | 7/12/19 | 4 | 7/16/19 | 3 | 149 | 130 | 19,370 | 3.85 | |
| Fidalgo Bay | High | 337 | GREEN | CTENIDIA | NF6-22 | 1 mL | 100 + | 7/12/19 | 6 | 7/16/19 | 3 | 149 | 198 | 29,502 | 2.53 | |
| Fidalgo Bay | High | 338 | GREEN | CTENIDIA | NF6-23 | 1 mL | 20-40 | 7/12/19 | 6 | 7/16/19 | 4 | 149 | 83.8 | 12,486 | 5.97 | |
| Fidalgo Bay | High | 339 | GREEN | CTENIDIA | NF6-24 | 1 mL | 10-30 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 78.6 | 11,711 | 6.36 | |
| Fidalgo Bay | Ambient | 341 | BLUE | CTENIDIA | NF6-25 | 1 mL | 20-50 | 7/10/19 | 2 | 7/12/19 | 1 | 144 | 92.4 | 13,306 | 5.41 | |
| Fidalgo Bay | Ambient | 342 | BLUE | CTENIDIA | NF6-26 | 1 mL | 80-100 | 7/10/19 | 3 | 7/15/19 | 2 | 144 | 178 | 25,632 | 2.81 | |
| Fidalgo Bay | Ambient | 343 | BLUE | CTENIDIA | NF6-27 | 1 mL | 50-70 | 7/10/19 | 3 | 7/15/19 | 2 | 144 | 146 | 21,024 | 3.42 | |
| Fidalgo Bay | Ambient | 344 | BLUE | CTENIDIA | NF6-28 | 1 mL | 40-70 | 7/10/19 | 5 | 7/15/19 | 2 | 149 | 28.4 | 4,232 | 17.61 | |
| Fidalgo Bay | Ambient | 345 | BLUE | CTENIDIA | NF6-29 | 1 mL | 70-100 | 7/10/19 | 5 | 7/15/19 | 2 | 149 | TOO HIGH | >28,000 | #VALUE! | |
| Fidalgo Bay | Ambient | 346 | BLUE | CTENIDIA | NF6-30 | 1 mL | 20-40 | 7/12/19 | 5 | 7/16/19 | 3 | 149 | 74 | 11,026 | 6.76 | |
| Fidalgo Bay | Ambient | 347 | BLUE | CTENIDIA | NF6-31 | 1 mL | 10-30 | 7/12/19 | 6 | 7/16/19 | 3 | 149 | 89.8 | 13,380 | 5.57 | |
| Fidalgo Bay | Ambient | 348 | BLUE | CTENIDIA | NF6-32 | 1 mL | 50-70 | 7/12/19 | 6 | 7/16/19 | 4 | 149 | 51.4 | 7,659 | 9.73 | |
| Fidalgo Bay | Ambient | 349 | BLUE | CTENIDIA | NF6-33 | 1 mL | 10-40 | 7/12/19 | 7 | 7/16/19 | 4 | 149 | 55.4 | 8,255 | 9.03 | |

from The Shell Game https://ift.tt/2Gj7aeJ
via IFTTT

Sam’s Notebook: Genome Annotation – O.lurida 20190709-v081 Hisat2 Transcript Isoform Index

Last week I re-annotated our Olurida_v081 genome using tissue-specific transcriptomes. The MAKER annotations don’t yield transcript isoforms, so this is the first part of the process in identifying/annotating different isoforms within the transcriptome.

Essentially, the steps below (which is what was done here) are needed to prepare files for use with Stringtie:

  1. Create GTF file (basically a GFF specifically for use with transcripts – thus the “T” in GTF) from input GFF file. Done with GFF utilities software.
  2. Identify splice sites and exons in newly-created GTF. Done with Hisat2 software.
  3. Create a Hisat2 reference index that utilizes the GTF. Done with Hisat2 software.

This was run on Mox.
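The three steps above reduce to a handful of commands; here is a condensed sketch (file names are placeholders; the SBATCH script below has the actual paths):

```shell
# 1. Convert the GFF annotation to a transcripts GTF (gffread -T)
gffread -T annotation.gff -o transcripts.gtf

# 2. Extract splice sites and exons from the GTF with Hisat2's helpers
hisat2_extract_splice_sites.py transcripts.gtf > splice_sites.tab
hisat2_extract_exons.py transcripts.gtf > exons.tab

# 3. Build a Hisat2 index that incorporates both
hisat2-build genome.fa genome_index \
  --ss splice_sites.tab \
  --exon exons.tab
```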

The SBATCH script has a number of leftover steps that aren’t relevant to this part of the annotation process; specifically, the FastQ manipulation steps. They were copied from a previous Hisat2 run and I neglected to edit them out. Since I didn’t want to alter the script after actually running it, I’ve left them in here.

SBATCH script (GitHub):

 #!/bin/bash
 ## Job Name
 #SBATCH --job-name=oly_hisat2
 ## Allocation Definition
 #SBATCH --account=srlab
 #SBATCH --partition=srlab
 ## Resources
 ## Nodes
 #SBATCH --nodes=1
 ## Walltime (days-hours:minutes:seconds format)
 #SBATCH --time=25-00:00:00
 ## Memory per node
 #SBATCH --mem=500G
 ##turn on e-mail notification
 #SBATCH --mail-type=ALL
 #SBATCH --mail-user=samwhite@uw.edu
 ## Specify the working directory for this job
 #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190716_hisat2-build_20190709-olur_v081

 # Exit script if any command fails
 set -e

 # Load Python Mox module for Python module availability
 module load intel-python3_2017

 # Document programs in PATH (primarily for program version ID)
 date >> system_path.log
 echo "" >> system_path.log
 echo "System PATH for $SLURM_JOB_ID" >> system_path.log
 echo "" >> system_path.log
 printf "%0.s-" {1..10} >> system_path.log
 echo "${PATH}" | tr : \\n >> system_path.log

 threads=27
 genome_index_name="20190709-Olurida_v081"

 # Paths to programs
 gffread="/gscratch/srlab/programs/gffread-0.11.4.Linux_x86_64/gffread"
 hisat2_dir="/gscratch/srlab/programs/hisat2-2.1.0"
 hisat2_build="${hisat2_dir}/hisat2-build"
 hisat2_exons="${hisat2_dir}/hisat2_extract_exons.py"
 hisat2_splice_sites="${hisat2_dir}/hisat2_extract_splice_sites.py"

 # Input/output files
 fastq_dir="/gscratch/srlab/sam/data/O_lurida/RNAseq/"
 genome_dir="/gscratch/srlab/sam/data/O_lurida/genomes/Olurida_v081"
 genome_gff="${genome_dir}/20190709-Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff"
 exons="hisat2_exons.tab"
 genome_fasta="${genome_dir}/Olurida_v081.fa"
 splice_sites="hisat2_splice_sites.tab"
 transcripts_gtf="20190709-Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gtf"

 ## Initialize arrays
 fastq_array_R1=()
 fastq_array_R2=()

 # Create array of fastq R1 files
 for fastq in "${fastq_dir}"/*R1*.gz
 do
   fastq_array_R1+=("${fastq}")
 done

 # Create array of fastq R2 files
 for fastq in "${fastq_dir}"/*R2*.gz
 do
   fastq_array_R2+=("${fastq}")
 done

 # Create array of sample names
 ## Uses parameter substitution to strip leading path from filename
 ## Uses awk to parse out sample name from filename
 for R1_fastq in "${fastq_dir}"/*R1*.gz
 do
   names_array+=($(echo "${R1_fastq#${fastq_dir}}" | awk -F"[_.]" '{print $1 "_" $5}'))
 done

 # Create list of fastq files used in analysis
 ## Uses parameter substitution to strip leading path from filename
 for fastq in "${fastq_dir}"*.gz
 do
   echo "${fastq#${fastq_dir}}" >> fastq.list.txt
 done

 # Create transcripts GTF from genome GFF
 "${gffread}" -T \
 "${genome_gff}" \
 -o "${transcripts_gtf}"

 # Create Hisat2 exons tab file
 "${hisat2_exons}" \
 "${transcripts_gtf}" \
 > "${exons}"

 # Create Hisat2 splice sites tab file
 "${hisat2_splice_sites}" \
 "${transcripts_gtf}" \
 > "${splice_sites}"

 # Build Hisat2 reference index using splice sites and exons
 "${hisat2_build}" \
 "${genome_fasta}" \
 "${genome_index_name}" \
 --exon "${exons}" \
 --ss "${splice_sites}" \
 -p "${threads}" \
 2> hisat2_build.err

 # Copy Hisat2 index files to my data directory
 rsync -av "${genome_index_name}"*.ht2 "${genome_dir}"

Creating a plot with a gapped axis

I used the plotrix package to add an axis break to a grouped bar plot.

R Studio notes: https://rstudio-pubs-static.s3.amazonaws.com/235467_5abd31ab564a43c9ae0f18cdd07eebe7.html

Artificially grouping bars to create a grouped bar plot with an axis break: https://stackoverflow.com/questions/24202245/grouped-barplot-with-cut-y-axis

My final code: https://github.com/eimd-2019/NIX-project/blob/master/scripts/Population-versus-Year%20Graph.R

Shelly’s Notebook: Fri. Jul. 12, 2019 Salmon-Sea lice Zymo Pico Methyl prep QC and sequencing

Zymo Pico Methyl kit prep QC

Bioanalyzer

I used the Chip Priming station, Agilent chip vortex, and Bioanalyzer 2100 in the Seeb lab.

PROCEDURE:

  • I ran the Bioanalyzer 2100 high sensitivity DNA assay on all the samples. To do this, I followed the quick start guide.
    • I first rinsed the electrodes with an electrode cleaning chip following protocol on page 118 of the maintenance manual
    • I did not adjust the base plate of the chip priming station since it was already set at C and it did not appear that anyone had done that before.
    • I wasted my first chip because I did not prime it correctly:
      • I got the error message listed on page 73 of the troubleshooting manual, so I followed the directions on page 139 to perform the seal test in the priming station and realized the lock hadn’t latched with an audible click the first time!
      • This was for the better anyway, because when I vortexed this chip at 2400 rpm as the quick guide specifies, I thought I saw a little liquid spraying, so I turned it down to 2200 rpm. NOTE FOR FUTURE: 2200 rpm is enough
    • It took about 6 minutes for the software to start showing the traces from the ladder, then each subsequent sample took about 3 minutes. There were a total of 11 samples (samples 1-11) on one chip.
    • Right before running the next chip with samples 12-20
      • rinsed the electrodes again with the same cleaning chip following page 118 of the maintenance manual
      • the software froze and would only reopen in demo mode so I restarted the laptop, reopened the program, and it finally behaved. This took about 10 minutes, and the manual says to use the chip within 5 minutes. But it still ran fine judging by the ladder and how the samples looked similar to the first run.

RESULTS:

img

img

CONCLUSIONS:

  • Similarity between lanes 20 and 22 confirms that the sample 20 and 21 labels did get swapped, as previously suspected. So sample #20 is Sea lice Female 1.
  • Gel images are generally in the size range of the example in the Zymo Pico Methyl-Seq manual, so that’s encouraging.

img

Qubit

I measured concentrations with Qubit HS DNA assay and recorded concentrations here under the ‘sequencing’ tab.

CONCLUSIONS:

  • The yields ranged from 25-134ng, which suggests the library prep worked, considering I started with only 20-50ng of DNA and went through all those steps.
  • Bioanalyzer and Qubit concentrations generally agree, but in the past I’ve relied on Qubit and gel to determine the amount of DNA in the library, so I used those to calculate the nanomolar concentrations.

Pooling samples

  • I calculated nanomolar concentrations using the calculator under the tools menu in the application EnzymeX
    • you can enter the average size (bp) of your library and 10nM (the calculator says pmol, but the numbers are the same for nM if you read the weight as ng/uL instead of ug). This is the protocol followed by the Ecker lab.
    • Example: for a 240bp library, a concentration of 1.584ng/uL is 10nM. img
  • I entered these concentrations into this spreadsheet under the column ‘Conc_for_10nM(based on Qubit and gel avg)’.
  • I calculated dilutions volumes in the same spreadsheet under columns ‘vol_dna_for_10nM’ and ‘vol_water_for_10nM’ and prepared these dilutions to get every sample at 10nM.
  • Because I wanted each salmon sample to get 3.5% of reads and each sea lice sample to get 15% of reads, I made a 30uL pool by adding 1.05uL of each salmon sample and 4.5uL of each sea lice sample.
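The 10nM conversion and the pool volumes above can be sanity-checked with a short Python sketch (the helper name is mine, and it assumes the standard ~660 g/mol per base pair average for double-stranded DNA, which is what calculators like EnzymeX use):

```python
# Convert a target molarity to a mass concentration for a dsDNA library.
# Assumes the standard average of ~660 g/mol per base pair (my assumption;
# this is the usual figure for dsDNA molarity calculators).

def ngul_for_molarity(avg_bp, target_nM):
    """ng/uL corresponding to target_nM for a library of average size avg_bp."""
    mw = avg_bp * 660            # g/mol for the average fragment
    return target_nM * mw / 1e6  # nmol/L * g/mol -> ng/uL

# Example from the post: a 240bp library at 10nM
print(ngul_for_molarity(240, 10))  # 1.584 ng/uL

# Pooling: 30uL pool, 3.5% of reads per salmon sample, 15% per sea lice sample
pool_uL = 30
print(round(0.035 * pool_uL, 2))   # 1.05 uL of each salmon sample
print(round(0.15 * pool_uL, 2))    # 4.5 uL of each sea lice sample
```

With 20 salmon samples and 2 sea lice samples, 20 × 3.5% + 2 × 15% = 100% of the pool, which is why those two volumes sum cleanly to 30uL.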

Submitting samples to NW Genomics Core

  • I submitted a signed quote for a NovaSeq SP 300 cycle flowcell (to get 1.6B 150bp PE reads) and a filled-out library submission form
  • I dropped off my 30uL sample pool with Dolores at NWGC (ground floor of Genome Sciences).
    • She said the kit is coming next Tuesday and she will try to start it next week if the machine is available
    • She is going to load my library at 270pM
    • It only takes 1.5 days to run so she’s hoping to send the data by the end of the week after next.
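Given the pooling fractions, the expected yield per sample can be roughly estimated (this is my back-of-envelope estimate, assuming pool fractions translate directly into read fractions on the flowcell):

```python
# Rough expected read counts per sample, assuming the pooling fractions
# translate directly into read fractions on the flowcell (my assumption).
total_reads = 1.6e9  # NovaSeq SP 300-cycle flowcell, ~1.6B 150bp PE reads

salmon_reads = 0.035 * total_reads  # each salmon sample: 3.5% of the pool
lice_reads = 0.15 * total_reads     # each sea lice sample: 15% of the pool

print(f"salmon: {salmon_reads:.1e} reads each")    # salmon: 5.6e+07 reads each
print(f"sea lice: {lice_reads:.1e} reads each")    # sea lice: 2.4e+08 reads each
```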

FINGERS CROSSED!!!

from shellytrigg https://ift.tt/2XNynfB
via IFTTT

Shelly’s Notebook: Thur. Jul. 11, 2019 Salmon-Sea lice Zymo Pico Methyl prep

Zymo Pico Methyl kit prep

  • aliquoted sea lice DNA + water into the 48-well plate (wells 21 and 22) using the same quantities as posted yesterday

Following Zymo pico methyl kit protocol section 1:

  • Added 130uL of lightning conversion reagent to all wells and incubated at 98C 8 min, 54C 1 hour, hold @4C (program ZYM1 in STRIGG folder on PCR machine in 209 with heated lid)
  • followed the rest of section 1 as it’s written, except I used heated elution buffer (55C) and used a set of clean collection tubes to elute the samples in

For section 2:

  • On ice, I prepared the priming reaction mix (enough for 23 reactions, for 22 samples + 1 extra reaction for loss from pipetting) as follows:
    • 46uL PrepAmp Buffer (5x)
    • 23uL PrepAmp Primer (40uM)
    • I added 3uL to each well of the second half of the 48-well PCR plate
    • I added 7uL of purified Bisulfite-converted DNA from each sample to each well *** I accidentally added sample 21 DNA to well 20 which already had sample 20 DNA in it! So I discarded all contents from these wells
  • On ice, I prepared in 500uL tube the PrepAmp mix (enough for 22 reactions since I did not end up needing the extra reaction in the priming reaction mix):
    • 22uL PrepAmp Buffer (5x)
    • 82.5uL PrepAmp Pre-mix
    • 6.6uL PrepAmp Polymerase
  • I ran the thermocycler program specified in section 2 (program ZYM2 in STRIGG folder on PCR machine in 209 without heated lid)
    • after the first denaturation step, I added 5.05 uL of PrepAmp mix to all wells (except wells 20 and 21 which had no samples)
  • While ZYM2 was running, I prepared aliquoted samples 20 and 21 again:
    • For sample 20, I combined the size-selected gDNA digests from 7/9 (5uL of 4ng/uL = 20ng) and 7/10 (4.5uL of 3ng/ul = 13.5ng) + 10.5uL H2O for a total of 33.5ng to be on par with the other samples
    • For sample 21 (sea lice female #1): I combined 0.75uL + 19.25uL H2O (same quantities as previously)
  • after the second denaturation step, I added 0.3uL of PrepAmp polymerase to all wells (except wells 20 and 21 which had no samples) *** this was quite the challenge. I tried using my P2, but I couldn’t get consistent quantities. I mainly eyeballed the volume using a P10 after first determining what 0.5uL looked like.
  • While ZYM2 continued running, I prepared bisulfite conversion reactions for samples 20 and 21 in 200uL PCR tubes:
    • 130uL of lightning conversion reagent
    • 20uL DNA
    • I ran ZYM1 on these tubes on block A of the PCR machine
  • After ZYM2 finished, I purified the PCR products following section 3, except I used heated elution buffer (55C), spun for 1 min during the wash and elution steps instead of 30 sec.
  • I transferred the purified PCR products to a new 48-well plate *** I only had ~9uL of each sample, not the 11.5uL the manual says I should have, so I added 2.5uL H2O to each well for a total sample volume of 11.5uL. I kept this plate on ice while waiting for samples 20 and 21 to catch up
  • After ZYM1 finished for samples 20 and 21, I followed the section 1 column clean up, prepared the priming reactions in 200uL PCR tubes, and ran ZYM2 on block A of the PCR machine. After the first denaturation step, I added the remaining PrepAmp mix (that I had used for the other samples and kept on ice), 5.05uL per PCR tube. During the 2nd denaturation step, I added 0.3uL PrepAmp polymerase to each tube.
  • After ZYM2 finished for samples 20 and 21, I followed the section 3 column clean up *** except I likely added sample 21 to the column labeled 20 (got distracted by Frida Taub). So the labels on these samples may be switched, but I’m keeping them as they are until I can confirm by the bioanalyzer (they should look different since one sample is undigested genomic DNA and the other is size-selected digested genomic DNA). I transferred the purified PCR products to the plate (with 2.5uL H2O added to get a final sample volume of 11.5uL)
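The master mix arithmetic above can be sketched in Python (the helper is hypothetical; per-reaction volumes are inferred from the totals in the post, e.g. 46uL PrepAmp Buffer / 23 reactions = 2uL per reaction, and the per-well PrepAmp add of 5.05uL is the sum of its three components):

```python
# Scale per-reaction reagent volumes up to a master mix for n reactions.
# Per-reaction volumes below are inferred from the totals in the post.

def master_mix(per_rxn, n_reactions):
    """Total uL of each reagent needed for n_reactions."""
    return {reagent: round(vol * n_reactions, 2)
            for reagent, vol in per_rxn.items()}

# Priming reaction mix: 22 samples + 1 extra for pipetting loss = 23
print(master_mix({"PrepAmp Buffer (5x)": 2.0,
                  "PrepAmp Primer (40uM)": 1.0}, 23))
# -> 46.0uL buffer, 23.0uL primer; 3uL total added per well

# PrepAmp mix: only 22 reactions needed
prepamp = {"PrepAmp Buffer (5x)": 1.0,
           "PrepAmp Pre-mix": 3.75,
           "PrepAmp Polymerase": 0.3}
print(master_mix(prepamp, 22))
# -> 22.0, 82.5, and 6.6uL; per-well add is sum(prepamp.values()) = 5.05uL
```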

For section 4:

  • On ice, I prepared in a 500uL PCR tube amplification mix for 22 reactions as follows:
    • 275uL of LibraryAmp Master Mix(2x)
    • 22uL of LibraryAmp primers (10uM)
    • I added 13.5uL of amplification mix to each well containing 11.5uL of sample
  • I ran the thermocycler program specified in section 4 of the manual except I did a total 8 cycles since I had between 10-50ng starting DNA (specified in the appendix). This is the ZYM4 program on the PCR machine in room 209.
  • When ZYM4 finished, I followed the section 3 column clean up, except I used 1.7mL tubes as collection tubes for samples 11-22. *** This ended up being a mistake because in step 2 of section 3 you don’t discard the wash buffer flow-through, so there was residual wash buffer/ethanol in my elutions of these samples and I had to repeat the column clean up for them. I originally eluted in 14uL heated elution buffer (55C) to get about 12uL of product; however, for samples 11-22 I ended up with ~20uL. Luckily, I still had the columns and was able to use the same ones. In the end, I got 12uL of each sample from eluting with 14uL.

For section 5:

  • To the second half of the 48-well plate, I added:
    • 12.5uL LibraryAmp Master Mix (2x)
    • 1uL Index primers (10uM)
      • I used all 6 that were provided with the kit and all 16 that we ordered through IDT. Samples 1-22 correspond to Index 1-22
    • 12uL of sample
  • I ran the thermocycler program specified in section 5 of the manual. This is the ZYM5 program on the PCR machine in room 209.
  • After ZYM5 finished, I followed the section 3 column clean up and had to break into the DNA clean-up and concentrator kit for more columns and buffer, but there are still plenty extra. I again did 1 minute spin time for wash and elution steps, and eluted in 14 uL of prewarmed elution buffer (55C).
  • I labeled samples ZMP Lib. 1-22 and stored in 209 -20C bottom drawer in box labeled “gDNA Salmon Skin sea lice infection (from Christian) S.Trigg”

All in all: I started around 9:30am and finished around 9:00pm. Without the mistakes/repeated steps, this would have likely taken 8 hours. Either way, awesome that it’s all done and hoping for a good QC tomorrow.

from shellytrigg https://ift.tt/2jGn36V
via IFTTT

Shelly’s Notebook: Wed. Jul. 10, 2019 Salmon-Sea lice Zymo Pico Methyl prep + RRBS digest repeat

Zymo Pico Methyl kit prep

  • set up programs on PCR machine in 209 (called Zym1-4 under STRIGG folder)
  • Into a 48-well plate, I aliquoted out 20-50ng DNA and up to 20uL of nanopure H2O according to this sheet
  • I added 50ng of sea lice DNA:
    • Sea lice Female 1 = 67.2ng/uL (qubit HS, 7/10)
      • added 0.75uL + 19.25uL nanopure H2O
    • Sea lice Female 2 = 17ng/ul (qubit HS, 7/10)
      • added 3uL + 17uL nanopure H2O
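The aliquot volumes above follow from target mass divided by measured concentration, with water to a fixed final volume; a minimal sketch (the helper name is mine):

```python
# Volume of stock DNA for a target mass, plus water to a final volume.
def dilution(target_ng, stock_ngul, final_uL=20):
    """Return (uL of DNA stock, uL of water) for target_ng at final_uL."""
    dna_uL = target_ng / stock_ngul
    return round(dna_uL, 2), round(final_uL - dna_uL, 2)

# Sea lice Female 1: 67.2 ng/uL stock, 50 ng target
print(dilution(50, 67.2))  # (0.74, 19.26) -- rounded to 0.75uL DNA in practice
# Sea lice Female 2: 17 ng/uL stock
print(dilution(50, 17))    # (2.94, 17.06) -- rounded to 3uL DNA in practice
```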

Following Zymo pico methyl kit protocol section 1:

  • Added 130uL of lightning conversion reagent to all wells and incubated at 98C 8 min, 54C 1 hour, hold @4C
  • I combined all samples + 600uL of ***DNA binding buffer into Zymo DNA concentrator columns, spun, discarded supernatant
  • added 200uL desulphonation buffer to all columns and incubated at RT 20 min.

*** I realized during the incubation that instead of DNA binding buffer, this was supposed to be M-binding buffer!!! So I called Zymo tech support and was advised not to proceed with the kit because the yield could be poor in quantity and size and Zymo couldn’t guarantee the libraries would be representative. DNA binding buffer is not optimized for single-stranded DNA (which the DNA is after the conversion reagent). The DNA binding buffer can be used at a 6-7:1 ratio with ssDNA, but that is not what I did. If I had kept the flow-through, I could have attempted to re-bind it to the column with the M-binding buffer.

SO, I took Zymo’s advice and ordered more lightning conversion reagent to attempt the preps tomorrow. But I needed to repeat the digests to have enough DNA to start the preps with. BUMMER! But better that than crappy data.

Repeating RRBS

  • Followed digest and size selection plan exactly as outlined here with the following exceptions:
    • I prepared reactions in a 48-well PCR plate
      • I first prepared the MSPI master mix and added it to all wells on ice. Then added DNA and water if needed
    • I created a program on the PCR machine in 209 for the incubations (called “RRBS” under the folder STRIGG).
    • MSPI digest incubated @37C for 1 hour
    • I paused the program to add 10uL of TAQ-a1 master mix/well
    • Then resumed the program to incubate at 65C for 30 min.
  • I measured DNA concentrations (S2 read 9.98ng/uL at the beginning and then 9.88 at the end)
    • DNA concentrations are here
  • I aliquoted 22-50ng salmon DNA into a 48-well plate following the volumes listed here and froze the plate at -20C

Plan for tomorrow

  • aliquot sea lice DNA into the plate
  • attempt all of Zymo pico methyl kit prep

from shellytrigg https://ift.tt/2Y73S8I
via IFTTT