Kaitlyn’s notebook: Geoduck RNA extraction

RNA was isolated from geoduck samples with a Quick-DNA/RNA Microprep Plus Kit (Zymo Research) according to the manufacturer's protocol.

For hemolymph, 150 µl of sample was combined with 600 µl of lysis buffer for the prep, except for sample 27, which had only 120 µl of sample and so received 480 µl of lysis buffer (the same 4:1 buffer-to-sample ratio). All centrifugation steps were done at 16,000 rpm.

The on-column DNase step was performed, and samples were eluted in 15 µl.

Samples were quantified with the Qubit RNA High Sensitivity (HS) Assay according to the manufacturer's protocol: 1 µl of sample and 199 µl of working solution in each assay tube.
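Each assay tube therefore holds 200 µl total. Since the working solution is a 1:200 dilution of reagent in buffer, and the two standards each take 190 µl of working solution (standards use 10 µl of standard per the kit protocol), the batch volume is easy to script. A minimal bash sketch; the 10% pipetting overhead is my assumption, not from the notebook:

n_samples=21   # number of samples in the table below
total_ul=$(( (n_samples * 199 + 2 * 190) * 110 / 100 ))
echo "Prepare ~${total_ul} ul of working solution (1:200 reagent:buffer)"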

The samples are stored in the -80°C freezer at position 3,3,2, in a box labelled "RNA isolations; geoduck 12/17".

Sample         RNA (ng/µl)
Star 11/15     47.2
Chewy 11/15    92.6
Star 11/21     170
Chewy 11/21    180
1              12.5
2              33.8
40             low
62             4.2
58             low
47             low
53             low
50             low
19             4.6
66             4.2
22             low
29             low
32             low
59             100
63             low
27             134
64             low
Standard 1     38.07
Standard 2     395.24

Samples are located in:

  • Box 8,1,1 (2014): Geo-27, 31, 29
  • Box 8,1,3 (2015): Geo-50, 58, 62, 19, 64, 40, 59, 47, 53, 63, 22, 66.
[Photo: 20191217_082151]

Box 3,1,1 is on the left and Box 8,1,3 is on the right, separated by the Star and Chewy samples. Samples 1 and 2, along with the Star and Chewy samples, are in box 5,3,1.

Sample Geo-30 H did not exist; there was a Geo-30, but it looked like gonad, so I did not extract from it. [Photo: 20191217_082036.jpg]

Notes:

Sample 53 was dark green.


#!/bin/bash
## Job Name
#SBATCH --job-name=re-duck
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/sr320/1217/



# Directories and programs
bismark_dir="/gscratch/srlab/programs/Bismark-0.21.0"
bowtie2_dir="/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/"
samtools="/gscratch/srlab/programs/samtools-1.9/samtools"
reads_dir="/gscratch/srlab/strigg/data/Pgenr/FASTQS/"
genome_folder="/gscratch/srlab/sr320/data/geoduck/v01/"

source /gscratch/srlab/programs/scripts/paths.sh



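# Build the bisulfite-converted genome indexes with bowtie2
# (this only needs to be run once per genome)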
${bismark_dir}/bismark_genome_preparation \
--verbose \
--parallel 28 \
--path_to_aligner ${bowtie2_dir} \
${genome_folder}


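# Align trimmed paired-end reads. basename strips the R1 suffix, so {} holds
# the per-sample prefix shared by each R1/R2 FASTQ pair.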
find ${reads_dir}*_R1_001_val_1.fq.gz \
| xargs basename -s _R1_001_val_1.fq.gz | xargs -I{} ${bismark_dir}/bismark \
--path_to_bowtie ${bowtie2_dir} \
-genome ${genome_folder} \
-p 4 \
-score_min L,0,-0.6 \
-1 ${reads_dir}{}_R1_001_val_1.fq.gz \
-2 ${reads_dir}{}_R2_001_val_2.fq.gz




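# Remove duplicate alignments (likely PCR duplicates) from the paired-end BAMs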
find *.bam | \
xargs basename -s .bam | \
xargs -I{} ${bismark_dir}/deduplicate_bismark \
--bam \
--paired \
{}.bam



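# Extract per-cytosine methylation calls from the deduplicated BAMs and write
# bedGraph/coverage output; --scaffolds avoids opening one output file handle
# per chromosome, which helps with highly fragmented assemblies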
${bismark_dir}/bismark_methylation_extractor \
--bedGraph --counts --scaffolds \
--multicore 14 \
--buffer_size 75% \
*deduplicated.bam



# Bismark processing report

${bismark_dir}/bismark2report

#Bismark summary report

${bismark_dir}/bismark2summary



# Sort files for methylkit and IGV

find *deduplicated.bam | \
xargs basename -s .bam | \
xargs -I{} ${samtools} \
sort --threads 28 {}.bam \
-o {}.sorted.bam

# Index sorted files for IGV
# The "-@ 16" below specifies number of CPU threads to use.

find *.sorted.bam | \
xargs basename -s .sorted.bam | \
xargs -I{} ${samtools} \
index -@ 28 {}.sorted.bam





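# Generate genome-wide cytosine reports from the coverage files; --merge_CpG
# combines the calls from both strands of each CpG into a single position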
find *deduplicated.bismark.cov.gz \
| xargs basename -s _R1_001_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz \
| xargs -I{} ${bismark_dir}/coverage2cytosine \
--genome_folder ${genome_folder} \
-o {} \
--merge_CpG \
--zero_based \
{}_R1_001_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz

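A script like this gets queued on Mox with sbatch and can be watched with squeue (the script filename here is hypothetical):

sbatch 1217_bismark_geoduck.sh
squeue -p srlab -u sr320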



#!/bin/bash
## Job Name
#SBATCH --job-name=el_01
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=3-12:00:00
## Memory per node
#SBATCH --mem=100G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=sr320@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/sr320/1117c/


# Eleni 20191107
# The purpose of this script is to align FASTQ files to a genome and output the
# alignments as BAM files whose mapping quality is greater than 30.
# To run this script, place it in the same folder as the files you want to align
# and run ./bowtie2_cluster.sh in the terminal.

source /gscratch/srlab/programs/scripts/paths.sh



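# Align each FASTQ to the GCA_900700415 bowtie2 index, writing one SAM per input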
find /gscratch/scrubbed/sr320/eleni/*.fq | xargs basename -s .fq | xargs -I{} bowtie2 \
-x /gscratch/scrubbed/sr320/eleni/GCA_900700415 \
-U /gscratch/scrubbed/sr320/eleni/{}.fq \
-p 28 \
-S /gscratch/scrubbed/sr320/1117c/{}.sam



# Filter alignments to MAPQ >= 30 and convert SAM to BAM
find /gscratch/scrubbed/sr320/1117c/*.sam | \
xargs basename -s .sam | \
xargs -I{} /gscratch/srlab/programs/samtools-1.9/samtools \
view -b -q 30 /gscratch/scrubbed/sr320/1117c/{}.sam -o /gscratch/scrubbed/sr320/1117c/{}.bam


#for file in $files
#do
    #echo ${file} # print the filename to terminal screen
    #bowtie2 -q -x GCA_900700415 -U ${file}.fq|samtools view -b -q 30 > ${file}.bam #conduct the alignment and output the file
#done




#Explanation of terms:
#bowtie2 -q -x <bt2-idx> -U <r> -S <sam>
#-q query input files are in fastq format
#-x <bt2-idx> Indexed "reference genome" filename prefix (minus trailing .X.bt2).
#-U <r> Files with unpaired reads.

# By default, bowtie2 writes the alignment output to the terminal (stdout).
# Also, bowtie2 does not write BAM files directly, but SAM output can be converted
# to BAM on the fly by piping bowtie2's output to samtools view.
# samtools options:
# -b output BAM
# -q <integer> discards reads whose mapping quality is below this number


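# A cleaned-up, runnable version of the commented loop above (a sketch;
# assumes $files holds the FASTQ basenames and the GCA_900700415 index
# files are in the working directory):
#
# for file in $files
# do
#     echo ${file}  # print the filename to the terminal
#     bowtie2 -q -x GCA_900700415 -U ${file}.fq -p 28 | \
#         samtools view -b -q 30 - > ${file}.bam
# done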



Sam’s Notebook: Data Wrangling – Olurida_v081 UTR GFFs and Intergenic, Intron BED files

After a meeting last week, we realized we needed to update the paper-oly-mbdbs-gen GitHub repo with the most current versions of feature files we had.

As part of that, we needed a new intron GFF file generated. I also realized that the output from the [MAKER annotation from 20190709](https://ift.tt/2qXVABc) actually has 3’/5’ UTR features, so I decided to separate those out and create separate GFFs for them as well.

The process was performed in the following Jupyter Notebook (GitHub):

One thing to note about that Jupyter Notebook: the complementBed command threw an error related to sorting. Two things about this:

  1. I don’t see an issue with the sorting.
  2. It seems to have run just fine anyway and generated the expected output.
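For reference, complementBed (bedtools complement) expects the feature file to be sorted by chromosome and then start, with chromosomes in the same order as the genome file given to -g. A minimal sketch of that pattern, with hypothetical file names rather than the ones in the notebook:

sort -k1,1 -k2,2n Olurida_v081_genes.bed > Olurida_v081_genes.sorted.bed
sort -k1,1 Olurida_v081.chrom.sizes > Olurida_v081.chrom.sizes.sorted
complementBed -i Olurida_v081_genes.sorted.bed -g Olurida_v081.chrom.sizes.sorted \
> Olurida_v081_intergenic.bed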