Differential Gene Expression – P.generosa DGE Between Tissues Using Nextflow NF-Core RNAseq Pipeline on Mox

Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq, which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.

For this pipeline run, I made some modifications to the genome GFF input file used. First, I attempted to create a “gene_biotype” description for the pipeline to use to get some visualizations of read assignments to different components of the genome. I did that in the following fashion:

# Copies the 3-line header to a new GFF
awk 'NR < 4 {print $0}' Panopea-generosa-v1.0.gff > Panopea-generosa-v1.0_biotype.gff
# Appends ";gene_biotype=<feature type>" (taken from the feature field, $3) to each annotation line
awk 'NR > 3 {print $0";gene_biotype="$3}' Panopea-generosa-v1.0.gff >> Panopea-generosa-v1.0_biotype.gff
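To illustrate what the append step does, here it is run on a single made-up GFF entry (not from the real genome):

```shell
# Hypothetical single GFF entry (tab-delimited); values are invented for illustration
printf 'Scaffold_01\tGenSAS\tmRNA\t2\t125\t.\t+\t.\tID=PGEN_.00g000010.m01\n' > biotype_demo.gff
# Same append operation as above: tacks ";gene_biotype=<feature>" onto the line
awk '{print $0";gene_biotype="$3}' biotype_demo.gff
# the attributes column now ends with ";gene_biotype=mRNA"
```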

Then, I modified it further to convert the tRNA strand field to + instead of ., in order to avoid RSEM errors regarding strand info, and removed RNAmmer features to avoid additional RSEM strand errors.

# Converts strand field ($7) from `.` to `+`
# Operates only on tRNA entries; FS/OFS set to tab to preserve GFF formatting
awk 'BEGIN{FS=OFS="\t"} $2 == "GenSAS_5d82b316cd298-trnascan" {$7="+"}1' Panopea-generosa-v1.0.a4_biotype.gff > Panopea-generosa-v1.0.a4_biotype-trna_strand_converted.gff
# Prints all lines which are not rRNA
awk '$2 != "RNAmmer-1.2"' Panopea-generosa-v1.0.a4_biotype-trna_strand_converted.gff > Panopea-generosa-v1.0.a4_biotype-trna_strand_converted-no_RNAmmer.gff
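A quick sanity check that the two commands above leave no `.` strands behind (run here on a tiny, made-up GFF):

```shell
# Two fake entries: a trnascan tRNA with '.' strand and an RNAmmer rRNA
printf 'S1\tGenSAS_5d82b316cd298-trnascan\ttRNA\t10\t80\t.\t.\t.\tID=t1\nS1\tRNAmmer-1.2\trRNA\t5\t9\t.\t.\t.\tID=r1\n' > strand_demo.gff
# Apply both filters, then count any remaining '.'-strand entries (expect 0)
awk 'BEGIN{FS=OFS="\t"} $2 == "GenSAS_5d82b316cd298-trnascan" {$7="+"}1' strand_demo.gff \
| awk -F'\t' '$2 != "RNAmmer-1.2"' \
| awk -F'\t' '$7 == "."' \
| wc -l
```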

Then, this was all run on Mox.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=20220323-pgen-nextflow_rnaseq-tissues
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=17-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20220323-pgen-nextflow_rnaseq-tissues

# Script to run Nextflow NF Core RNAseq pipeline for RNAseq analysis of P.generosa, per this GitHub Issue:
# https://github.com/RobertsLab/resources/issues/1423

# See variable assignments below for input files used: genome, GFF, transcriptome
# List of input FastQs will be generated during run in: sample_sheet-"${SLURM_JOB_ID}".csv
# Custom config file for maximum memory and CPU thread settings

# Outputs explanations are here: https://nf-co.re/rnaseq/3.6/output
# Input parameter explanations are here: https://nf-co.re/rnaseq/3.6/parameters

###################################################################################

# These variables need to be set by user

## Assign Variables

## PROGRAMS ##
# NF Core RNAseq workflow directory
nf_core_rnaseq="/gscratch/srlab/programs/nf-core-rnaseq-3.6/workflow"

# NF Core RNAseq custom config file
nf_core_rnaseq_config=/gscratch/srlab/programs/nf-core-rnaseq-3.6/configs/conf/base-srlab_500GB_node.config

## FILES AND DIRECTORIES ##
# Working directory
wd=$(pwd)

# RNAseq FastQs directory
reads_dir=/gscratch/srlab/sam/data/P_generosa/RNAseq

# Genome FastA
genome_fasta=/gscratch/srlab/sam/data/P_generosa/genomes/Panopea-generosa-v1.0.fa

# Genome GFF3
# This was manually modified by me to add gene_biotype to end of each entry.
# Improves NF-Core RNAseq pipeline analysis/visualization to have this info present.
genome_gff=/gscratch/srlab/sam/data/P_generosa/genomes/Panopea-generosa-v1.0.a4_biotype-trna_strand_converted-no_RNAmmer.gff

## INITIALIZE ARRAYS ##
# Leave empty!!
R1_array=()
R2_array=()
R1_uncompressed_array=()
R2_uncompressed_array=()

###################################################################################

# Exit script if a command fails
set -e

# Load Anaconda
# Unknown why this is needed, but Anaconda will not run if this line is not included.
. "/gscratch/srlab/programs/anaconda3/etc/profile.d/conda.sh"

# Activate NF-core conda environment
conda activate nf-core_env

# Load Singularity Mox module for NF Core/Nextflow
module load singularity


# NF Core RNAseq sample sheet header
sample_sheet_header="sample,fastq_1,fastq_2,strandedness"
printf "%s\n" "${sample_sheet_header}" >> sample_sheet-"${SLURM_JOB_ID}".csv

# Create array of original uncompressed fastq R1 files
# Set filename pattern
R1_uncompressed_array=("${reads_dir}"/*_1.fastq)

# Create array of original uncompressed fastq R2 files
# Set filename pattern
R2_uncompressed_array=("${reads_dir}"/*_2.fastq)

# Check array size to confirm it has all expected samples
# Exit if mismatch
if [[ "${#R1_uncompressed_array[@]}" != "${#R2_uncompressed_array[@]}" ]]
then
  echo ""
  echo "Uncompressed array sizes don't match."
  echo "Confirm all expected FastQs are present in ${reads_dir}"
  echo ""

  exit 1
fi

# Create list of original uncompressed fastq files
## Uses parameter substitution to strip leading path from filename
for fastq in "${!R1_uncompressed_array[@]}"
do
  # Strip leading path
  no_path="${R1_uncompressed_array[${fastq}]##*/}"

  # Grab SRA name
  sra=$(echo "${no_path}" | awk -F "_" '{print $1}')

  # Only gzip matching FastQs
  # Only generate MD5 checksums for matching FastQs
  if [[ "${sra}" == "SRR12218868" ]] \
    || [[ "${sra}" == "SRR12218869" ]] \
    || [[ "${sra}" == "SRR12226692" ]] \
    || [[ "${sra}" == "SRR12218870" ]] \
    || [[ "${sra}" == "SRR12226693" ]] \
    || [[ "${sra}" == "SRR12207404" ]] \
    || [[ "${sra}" == "SRR12207405" ]] \
    || [[ "${sra}" == "SRR12227930" ]] \
    || [[ "${sra}" == "SRR12207406" ]] \
    || [[ "${sra}" == "SRR12207407" ]] \
    || [[ "${sra}" == "SRR12227931" ]] \
    || [[ "${sra}" == "SRR12212519" ]] \
    || [[ "${sra}" == "SRR12227929" ]] \
    || [[ "${sra}" == "SRR8788211" ]]
  then
    echo ""
    echo "Generating MD5 checksums for ${R1_uncompressed_array[${fastq}]} and ${R2_uncompressed_array[${fastq}]}."
    echo ""
    # Generate MD5 checksums of uncompressed FastQs
    {
      md5sum "${R1_uncompressed_array[${fastq}]}"
      md5sum "${R2_uncompressed_array[${fastq}]}"
    } >> uncompressed_fastqs-"${SLURM_JOB_ID}".md5

    # Gzip FastQs; NF Core RNAseq requires gzipped FastQs as inputs
    echo "Compressing FastQ files."
    if [ ! -f "${R1_uncompressed_array[${fastq}]}.gz" ]
    then 
      gzip --keep "${R1_uncompressed_array[${fastq}]}"
      gzip --keep "${R2_uncompressed_array[${fastq}]}"
    else 
      echo "${R1_uncompressed_array[${fastq}]}.gz already exists. Skipping."
    fi
    echo ""



  fi
done


# Create array of fastq R1 files
# Set filename pattern
R1_array=("${reads_dir}"/*_1.fastq.gz)

# Create array of fastq R2 files
# Set filename pattern
R2_array=("${reads_dir}"/*_2.fastq.gz)

# Check array sizes to confirm they are same size
# Exit if mismatch
if [[ "${#R1_array[@]}" != "${#R2_array[@]}" ]]
  then
    echo ""
    echo "Read1 and Read2 compressed FastQ array sizes don't match."
    echo "Confirm all expected compressed FastQs are present in ${reads_dir}"
    echo ""

    exit 1
fi

# Create list of fastq files used in analysis
## Uses parameter substitution to strip leading path from filename
for fastq in "${!R1_array[@]}"
do
  echo ""
  echo "Generating MD5 checksums for ${R1_array[${fastq}]} and ${R2_array[${fastq}]}."
  echo ""
  # Generate MD5 checksums for compressed FastQs used in NF Core RNAseq analysis
  {
    md5sum "${R1_array[${fastq}]}"
    md5sum "${R2_array[${fastq}]}"
  } >> input_fastqs-"${SLURM_JOB_ID}".md5

  # Strip leading path
  no_path="${R1_array[${fastq}]##*/}"

  # Grab SRA name
  sra=$(echo "${no_path}" | awk -F "_" '{print $1}')

  # Set tissue type
  if [[ "${sra}" == "SRR12218868" ]]
  then
    tissue="heart"

    # Add to NF Core RNAseq sample sheet
    printf "%s,%s,%s,%s\n" "${tissue}" "${R1_array[${fastq}]}" "${R2_array[${fastq}]}" "reverse" \
    >> sample_sheet-"${SLURM_JOB_ID}".csv

  elif [[ "${sra}" == "SRR12218869" ]] \
    || [[ "${sra}" == "SRR12226692" ]]

  then
    tissue="gonad"

    # Add to NF Core RNAseq sample sheet
    printf "%s,%s,%s,%s\n" "${tissue}" "${R1_array[${fastq}]}" "${R2_array[${fastq}]}" "reverse" \
    >> sample_sheet-"${SLURM_JOB_ID}".csv

  elif [[ "${sra}" == "SRR12218870" ]] \
    || [[ "${sra}" == "SRR12226693" ]]
  then
    tissue="ctenidia"

    # Add to NF Core RNAseq sample sheet
    printf "%s,%s,%s,%s\n" "${tissue}" "${R1_array[${fastq}]}" "${R2_array[${fastq}]}" "reverse" \
    >> sample_sheet-"${SLURM_JOB_ID}".csv

  elif [[ "${sra}" == "SRR12207404" ]] \
    || [[ "${sra}" == "SRR12207405" ]] \
    || [[ "${sra}" == "SRR12227930" ]] \
    || [[ "${sra}" == "SRR12207406" ]] \
    || [[ "${sra}" == "SRR12207407" ]] \
    || [[ "${sra}" == "SRR12227931" ]]
  then
    tissue="juvenile"

    # Add to NF Core RNAseq sample sheet
    printf "%s,%s,%s,%s\n" "${tissue}" "${R1_array[${fastq}]}" "${R2_array[${fastq}]}" "reverse" \
    >> sample_sheet-"${SLURM_JOB_ID}".csv

  elif [[ "${sra}" == "SRR12212519" ]] \
    || [[ "${sra}" == "SRR12227929" ]] \
    || [[ "${sra}" == "SRR8788211" ]]
  then
    tissue="larvae"

    # Add to NF Core RNAseq sample sheet
    printf "%s,%s,%s,%s\n" "${tissue}" "${R1_array[${fastq}]}" "${R2_array[${fastq}]}" "reverse" \
    >> sample_sheet-"${SLURM_JOB_ID}".csv
  fi

done

echo "Beginning NF-Core RNAseq pipeline..."
echo ""
# Run NF Core RNAseq workflow
nextflow run ${nf_core_rnaseq} \
-profile singularity \
-c ${nf_core_rnaseq_config} \
--input sample_sheet-"${SLURM_JOB_ID}".csv \
--outdir ${wd} \
--multiqc_title "20220321-pgen-nextflow_rnaseq-tissues-${SLURM_JOB_ID}" \
--fasta ${genome_fasta} \
--gff ${genome_gff} \
--save_reference \
--gtf_extra_attributes gene_name \
--gtf_group_features gene_id \
--featurecounts_group_type gene_biotype \
--featurecounts_feature_type exon \
--trim_nextseq 20 \
--save_trimmed \
--aligner star_salmon \
--pseudo_aligner salmon \
--min_mapped_reads 5 \
--save_align_intermeds \
--rseqc_modules bam_stat,\
inner_distance,\
infer_experiment,\
junction_annotation,\
junction_saturation,\
read_distribution,\
read_duplication

##############################################################
# Copy config file for later reference, if needed
cp "${nf_core_rnaseq_config}" .

# Document programs in PATH (primarily for program version ID)
{
  date
  echo ""
  echo "System PATH for $SLURM_JOB_ID"
  echo ""
  printf "%0.s-" {1..10}
  echo "${PATH}" | tr : \\n
} >> system_path.log

echo "Finished logging system PATH"

RESULTS

Runtime was surprisingly fast, at just a bit over 2.3 days…

Screencap of NF-Core RNAseq runtime on Mox showing runtime of 2 days, 9hrs and some change

There is a ton of data here to unpack, so I’ll just link to some of the most useful files.

Refer to the NF-Core/RNAseq “Output docs” for all the things generated, as well as a thorough explanation of the MultiQC Report:

Also, the NF-Core/RNAseq pipeline provides a nice progress report to show you which options are running/completed. This screenshot is from after the pipeline finished successfully:

Screencap of NF-Core/RNAseq pipeline upon completion. Shows percentages and checkboxes to indicate each process completion. Also provides a list of samples passing STAR mapping threshold and pipeline runtime.

Output folder:

from Sam’s Notebook https://ift.tt/7YAu6j0
via IFTTT

#ifttt, #sams-notebook

Nextflow – Trials and Tribulations of Installing and Using NF-Core RNAseq

INSTALLATION

For some reason, I struggled to get things installed correctly. Installing Nextflow was straightforward and didn’t have any issues. The first bump in the road came from the [installation directions](https://ift.tt/YwEzc3j) for NF-Core RNAseq. The instructions say to “Download the pipeline and test it…”. Well, that doesn’t indicate how/where to download. To add to the confusion (it will be evident how in a bit), the previous step says to “only use Conda as a last resort”. Okay, so how do I get NF-Core RNAseq? I downloaded the latest release from the GitHub Release page. After doing that, I couldn’t figure out how to get a basic command (e.g. nf-core download) to run. I eventually discovered that I likely needed to install nf-core Tools. However, doing that meant I would have to use either Conda or Docker. Docker seemed overly complicated (I didn’t want to re-learn container/image usage within the confines of a SLURM environment like Mox – this whole thing was supposed to be quick and easy!) and using Conda was strongly discouraged for NF-Core RNAseq. For “simplicity,” I went the Conda route. This is where everything went awry…

The pipeline is designed to have internet access to update and download newer config files automatically. Knowing that a running SLURM job wouldn’t have internet access, I followed the instructions for downloading a pipeline for offline use. Running the nf-core download command initiated an interactive mode to specify things, like which version to download, whether or not to provide a directory for $NXF_SINGULARITY_CACHEDIR, and whether or not to compress the Singularity container that would be downloaded. Setting $NXF_SINGULARITY_CACHEDIR seemed to cause issues. Additionally, the Singularity download was surprisingly fast.

As it all turns out, I think the issue was that I had both downloaded NF-Core RNAseq from the GitHub release and installed it via Conda. Essentially, Conda creates the same directory structure in the same location as the install from the GitHub release. And this, I think, is what broke everything.

To fix the installation, I uninstalled everything related to NF-Core RNAseq, but left the Nextflow installation in place. Then, I installed nf-core Tools (and, subsequently, the RNAseq package) using only the Conda installation. Once I did that (and activated the nf-core/rnaseq conda environment) and ran nf-core download, I noticed that the Singularity download took significantly longer than in previous attempts, which led me to think that things were finally working.

RUNNING

After finally getting things installed properly, I encountered a number of problems just trying to get the pipeline to run successfully. A handful of small issues led to a lot of troubleshooting:

Error:

Transcript names not matching GFF.

Solution:

If supplying a transcriptome, the names in the FastA description lines have to match those in the GFF. Since we de novo assembled our transcriptome using Trinity, it has Trinity IDs, which do not correspond to the genome assembly/annotation. So, I dropped the transcriptome option for the run.
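To illustrate the mismatch (with made-up, single-entry files; all names here are hypothetical), comparing transcriptome FastA IDs against GFF mRNA IDs would look something like this:

```shell
# Fake Trinity-style transcriptome and a fake genome GFF mRNA entry
printf '>TRINITY_DN1_c0_g1_i1\nATGC\n' > transcriptome.fa
printf 'S1\tGenSAS\tmRNA\t1\t99\t.\t+\t.\tID=PGEN_.00g000010.m01\n' > genome_demo.gff

# Collect the IDs from each file
grep '^>' transcriptome.fa | sed 's/^>//;s/ .*//' | sort -u > fasta_ids.txt
awk -F'\t' '$3 == "mRNA"' genome_demo.gff | grep -o 'ID=[^;]*' | sed 's/ID=//' | sort -u > gff_ids.txt

# IDs present in the transcriptome but missing from the GFF (any output = mismatch)
comm -23 fasta_ids.txt gff_ids.txt
```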

Error:

Unable to parse config file: '/gscratch/srlab/programs/nf-core-rnaseq-3.6/workflow/nextflow.config'

  Cannot compare java.lang.Boolean with value 'true' and java.lang.Integer with value '0'

Solution:

The --trim_nextseq option actually requires an integer. The documentation didn’t indicate this at the time, but may now, as I brought it to the developers’ attention via their Slack group and the discussion indicated they would add this info to the website.

Error:

Unable to parse config file: '/gscratch/srlab/programs/nf-core-rnaseq-3.6/workflow/nextflow.config'

Solution:

When using a custom config file, one cannot use the check_max() function. I had just been modifying the RAM, CPU, and runtime values in the base.config file, which utilizes check_max(). After a bunch of sleuthing on the NF-Core RNAseq GitHub and Slack groups, I finally discovered this solution. Here’s what the custom, functional config looks like:

  • /gscratch/srlab/programs/nf-core-rnaseq-3.6/configs/conf/base-srlab_500GB_node.config
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    nf-core/rnaseq Nextflow base config file modified by Sam White on 20220321 for use
    on Roberts Lab 500GB Mox node.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    A 'blank slate' config file, appropriate for general use on most high performance
    compute environments. Assumes that all software is installed and available on
    the PATH. Runs in `local` mode - all jobs will be run on the logged in environment.
----------------------------------------------------------------------------------------
*/

process {

    cpus   = 28
    memory = 500.GB
    time   = 30.d

    errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
    maxRetries    = 1
    maxErrors     = '-1'

    // Process-specific resource requirements
    withLabel:process_low {
        cpus   = 28
        memory = 500.GB
        time   = 30.d
    }
    withLabel:process_medium {
        cpus   = 28
        memory = 500.GB
        time   = 30.d
    }
    withLabel:process_high {
        cpus   = 28
        memory = 500.GB
        time   = 30.d
    }
    withLabel:process_long {
        time   = 30.d
    }
    withLabel:process_high_memory {
        memory = 500.GB
    }
    withLabel:error_ignore {
        errorStrategy = 'ignore'
    }
    withLabel:error_retry {
        errorStrategy = 'retry'
        maxRetries    = 2
    }
    withName:CUSTOM_DUMPSOFTWAREVERSIONS {
        cache = false
    }
}

Error:

Error Message: Strand is neither '+' nor '-'!

ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/Panopea-generosa-v1.0.fa)'
Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/Panopea-generosa-v1.0.fa)` terminated with an error exit status (255)
Command executed:
  rsem-prepare-reference \
      --gtf Panopea-generosa-v1.0_genes.gtf \
      --num-threads 28 \
       \
      rsem/Panopea-generosa-v1.0.fa \
      rsem/genome
  
  cp rsem/genome.transcripts.fa .
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA":
      rsem: $(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
      star: $(STAR --version | sed -e "s/STAR_//g")
  END_VERSIONS
Command exit status:
  255
Command output:
  rsem-extract-reference-transcripts rsem/genome 0 Panopea-generosa-v1.0_genes.gtf None 0 rsem/Panopea-generosa-v1.0.fa
  "rsem-extract-reference-transcripts rsem/genome 0 Panopea-generosa-v1.0_genes.gtf None 0 rsem/Panopea-generosa-v1.0.fa" failed! Plase check if you provide correct parameters/options for the pipeline!
Command error:
  The GTF file might be corrupted!
  Stop at line : Scaffold_02    GenSAS_5d82b316cd298-trnascan   transcript      27467   27541   38.1    .       .       transcript_id "21513.GS22252506.PGEN_.tRNA00000001"; gene_id "21513.GS22252506.PGEN_.tRNA00000001"; Name "Thr"; anti_codon "CGT"; gene_no "1376";

Solution:

As it turns out, this wasn’t an issue with the pipeline; it’s really an issue with how RSEM handles GTF strand parsing. Even though the GFF spec indicates the strand column can be one of +, -, ., or ?, RSEM only parses for + or -. And, as it turns out, our genome GFF has some . entries for strand info. Looking through our “merged” GenSAS GFF, there are two sets of annotations that only have . for strand info (GenSAS_5d82b316cd298-trnascan & RNAmmer-1.2). So, a decision needed to be made: should we convert these sets’ strands to an “artificial” value (e.g. set all of them to +), or eliminate them from the input GFF? I ended up converting the GenSAS_5d82b316cd298-trnascan strands to + and eliminating RNAmmer-1.2 from the final input GFF.
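Tracking down which annotation sources carry `.` strands is a quick awk tally; here’s the idea on a tiny made-up GFF (on the real genome GFF, this is the kind of check that flags the trnascan and RNAmmer-1.2 sets):

```shell
# Fake three-entry GFF: two '.'-strand sources plus one normal mRNA
printf 'S1\tGenSAS_5d82b316cd298-trnascan\ttRNA\t1\t75\t.\t.\t.\tID=t1\nS1\tRNAmmer-1.2\trRNA\t5\t9\t.\t.\t.\tID=r1\nS1\tGenSAS\tmRNA\t2\t125\t.\t+\t.\tID=m1\n' > strand_tally_demo.gff
# Count '.'-strand entries per annotation source (column 2)
awk -F'\t' '$7 == "." {print $2}' strand_tally_demo.gff | sort | uniq -c
```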

Overall, it was a bit of an arduous process, but it’s all running now… Will update if I encounter any more hurdles.

from Sam’s Notebook https://ift.tt/d8t17hW
via IFTTT

#ifttt, #sams-notebook

Data Wrangling – P.generosa Genomic Feature FastA Creation

Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.

This was all documented in a Jupyter Notebook:

GitHub:

NB Viewer:


RESULTS

Output folder:


CDS FastA description lines look like this:

  • >PGEN_.00g000010.m01.CDS01|PGEN_.00g000010.m01|PGEN_.00g000010::Scaffold_01:2-125

Explanation for CDS:

  • PGEN_.00g000010.m01.CDS01: Unique sequence ID.
  • PGEN_.00g000010.m01: “Parent” ID. Corresponds to unique mRNA ID.
  • PGEN_.00g000010: “Parent” ID. Corresponds to unique gene ID.
  • Scaffold_01: Originating scaffold.
  • 2-125: Sequence coordinates from scaffold mentioned above.
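Since the description line packs multiple IDs into the BED name column, here’s a rough sketch (with a made-up, single-entry GFF; the attribute layout is assumed) of how that column can be built with awk before handing the BED to `bedtools getfasta -name`:

```shell
# One hypothetical CDS entry (tab-delimited GFF3); not from the real genome
printf 'Scaffold_01\tGenSAS\tCDS\t2\t125\t.\t+\t0\tID=PGEN_.00g000010.m01.CDS01;Parent=PGEN_.00g000010.m01\n' > cds.gff

# Build a 4-column BED whose name column packs CDS ID | mRNA ID | gene ID
awk -F'\t' '$3 == "CDS" {
  match($9, /ID=[^;]*/);     id     = substr($9, RSTART+3, RLENGTH-3)
  match($9, /Parent=[^;]*/); parent = substr($9, RSTART+7, RLENGTH-7)
  gene = parent; sub(/\.m[0-9]+$/, "", gene)   # gene ID = mRNA ID minus ".mNN"
  # BED is 0-based, half-open, so start = GFF start - 1
  print $1"\t"$4-1"\t"$5"\t"id"|"parent"|"gene
}' cds.gff > cds.bed

cat cds.bed
# bedtools getfasta -name appends "::chrom:start-end" itself, yielding
# description lines like the CDS example shown above (bedtools not run here)
```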

mRNA FastA description lines look like this:

  • >PGEN_.00g000030.m01|PGEN_.00g000030::Scaffold_01:49248-52578

Explanation for mRNA:

  • PGEN_.00g000030.m01: Unique sequence ID.
  • PGEN_.00g000030: “Parent” ID. Corresponds to unique gene ID.
  • Scaffold_01: Originating scaffold.
  • 49248-52578: Sequence coordinates from scaffold mentioned above.

from Sam’s Notebook https://ift.tt/8IljR1e
via IFTTT

#ifttt, #sams-notebook

Fri. Feb. 18, 2022

NOPP-gigas-ploidy-temp analysis – Part 1 –
Project name: NOPP-gigas-ploidy-temp Funding source: National Oceanographic Partnership Program Species: Crassostrea gigas variable: ploidy, elevated seawater temperature, desiccation previous notebook entry next notebook entry Background We sent 72 samples for RNA sequencing (TagSeq) to the Genomic Sequencing and Analysis Facility at the University of Texas at Austin. Here is a…

from Matthew N. George, Ph.D. https://ift.tt/CqRAV3Z
via IFTTT

#ifttt, #matthew-n-george, #ph-d

Fri. Mar. 19, 2022

Honeywell UDA2182 controller setup –
Background Pictures of a working setup for the Honeywell UDA2182 dual analyzer that is controlling to a pH set point after a durafet probe and CO2/solenoids are installed. CO2 is injected into a header tank using a ozone injector. Pictures UDA2182 home screen Relay – connected to solenoid that is…

from Matthew N. George, Ph.D. https://ift.tt/rLdGEl5
via IFTTT

#ifttt, #matthew-n-george, #ph-d

Thu. Mar. 17, 2022

Oyster family transfer –
Background Picked up four C. gigas families (2n/3n within each family) from Pacific Hybreed in Manchester. Parents came from Hood Canal, Baywater. I dropped them off in heath trays at Jamestown S’Klallam’s hatchery at Pt. Whitney. Pictures are provided below. Pictures Health trays w/ labels Tray 1 – F5 (2n)…

from Matthew N. George, Ph.D. https://ift.tt/z8VD3wS
via IFTTT

#ifttt, #matthew-n-george, #ph-d

DNA Extraction Results

Last time, I went over my plan to extract DNA from 4 random samples from SE AK Tanner crab, collected in 2005. This year was chosen because it’s the earliest samples we have – therefore, if they’ve got good DNA, all should have good DNA. Plan was to pellet the hemolymph and then treat as a tissue sample. Samples were randomly chosen and numbered 1-4. A Qiagen DNeasy Blood & Tissue kit was used, followed by running them on the Qubit. Important notes on use of Qiagen kit: Incubated for ~40 min with Protease K (until all was lysed) In the final elution step, used only 50 uL rather than 200 uL, as Shanelle mentioned that she had success with previous Tanner crab samples following that method Samples are currently stored in -20 (plate labeled with my name and date, samples labeled 1, 2, and 4, as 3 was discarded)…

from Aidan F. Coyle https://ift.tt/rTxSfpy
via IFTTT

Extracting DNA from Archived Samples

Well, it’s been quite a while! The good news: it’s because I finally finished my analysis and have been focusing practically entirely on the writeup + future grant applications (which I, maybe wrongly, didn’t feel like was worthy of lab notebook writeups). Anyway, I’m starting to figure out the feasibility of a large-scale Chionoecetes and Hematodinium sequencing project, using Pam’s old samples to track genetic changes over space and time. It seems like it could be extremely neat, and I’ll be writing things up in more detail later. Of course, in order for any population genetics work to succeed, we need to successfully extract DNA from Pam’s samples! This might pose a challenge – they’re hemolymph samples stored in ethanol in capped deep-well plates, and many have been around since 2005. Some could be dried out or just unviable. So…Task 1 is Extract DNA from old samples! I took 4…

from Aidan F. Coyle https://ift.tt/fKJrbyM
via IFTTT

Sun. Mar. 13, 2022

Monday Meeting Notes –
9am – FISH 441 meeting JupyterHub Link to course site Publish course to get syllabus out there Update all dates to spring Outline activities by week 11am – Journal club 1. 2. 3. 2pm – OSU/WA meeting Submit WRAC pre-proposal. Here is the RFP. Due April 15th. Submit USDA-NIFA proposal….

from Matthew N. George, Ph.D. https://ift.tt/DozRtjc
via IFTTT

#ifttt, #matthew-n-george, #ph-d