Back to Square One?

My analysis outline? Incorrect? It’s more likely than you think

Last time we checked in, I had finally run GO-MWU on my four analyses, and came up with more or less nothing. However, after running GO-MWU, I realized I had made an error when determining what analyses to perform. My analyses largely involved comparing different temperature treatments (elevated vs. ambient vs. low) by pooling together samples from Day 0 and Day 2. For instance, to compare elevated and low, I pooled all elevated-temp crab from Day 0+2, and compared gene expression against all lowered-temp crab from Day 0+2. However, as it turns out, Day 0 samples were taken prior to any exposure to different temperatures! As a result, I decided to redo my whole analysis from scratch. This wasn’t as big of a downside as you may think: I started this project back when I was a tiny little…

from Aidan F. Coyle

WGBS Analysis Part 11

trimgalore output and fastqc

Last week, I started trimgalore. My mox script finished running, so I wanted to check the output before I started bismark.


I checked the mox directories to start transferring files onto gannet. The trimming worked successfully, but there was no fastqc output! This was weird because the script I used for these samples was the same as what I used for the Hawaii samples. Confused, I started this discussion and shared my scripts and slurm output to determine why I didn’t get any fastqc output. Sam looked at the slurm output and saw that I had an error associated with my path:

>>> Now running FastQC on the validated data zr3616_8_R1_val_1.fq.gz<<<
Can't exec "fastqc": No such file or directory at /gscratch/srlab/programs/TrimGalore-0.6.6/trim_galore line 1525, <IN2> line 5536487816.
>>> Now running FastQC on the validated data zr3616_8_R2_val_2.fq.gz<<<
Can't exec "fastqc": No such file or directory at /gscratch/srlab/programs/TrimGalore-0.6.6/trim_galore line 1535, <IN2> line 5536487816.
Deleting both intermediate output files zr3616_8_R1_trimmed.fq.gz and zr3616_8_R2_trimmed.fq.gz

I’m not sure why fastqc would disappear from my path after a few weeks. In any case, I used rsync to transfer all the output to this gannet folder, organized into various subfolders. Then, I followed Sam’s advice to run fastqc separately to determine if it was truly a path issue.
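One way to catch this kind of problem before a long job runs is to confirm that each program resolves on the PATH at the top of the script. The sketch below is not taken from the original script: `check_programs` is a hypothetical helper name, and it relies only on the shell builtin `command -v`.

```shell
#!/bin/bash
# Report whether each named program can be found on the current PATH.
# check_programs is a hypothetical helper, not part of the original script.
check_programs() {
  local missing=0
  local prog
  for prog in "$@"; do
    if command -v "${prog}" > /dev/null 2>&1; then
      echo "found: ${prog} -> $(command -v "${prog}")"
    else
      echo "MISSING: ${prog} is not on PATH" >&2
      missing=1
    fi
  done
  # Non-zero return status if any program was missing
  return "${missing}"
}

# Example call (commented out so the sketch runs anywhere):
# check_programs fastqc multiqc || exit 1
```

Placing a call such as `check_programs fastqc multiqc || exit 1` near the top of a slurm script would stop the job right away if either program could not be found, rather than after trimming has finished.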


I created this script to run fastqc on all my trimmed samples. In the script, I specified the fastqc and multiqc paths, then used the variables throughout the script:

# Paths to programs
fastqc=/gscratch/srlab/programs/fastqc_v0.11.9/fastqc
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc

To run fastqc, I first specified files to analyze by including the absolute path to the directory. I changed the directory path for each trimming iteration:

# Populate array with FastQ files
fastq_array=(/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore/*.fq.gz)

# Pass array contents to new variable
fastqc_list=$(echo "${fastq_array[*]}")

When running fastqc, I also specified the outdir so the output would be written to the same folder as the trimgalore output.

# Run FastQC
# NOTE: Do NOT quote ${fastqc_list}
${fastqc} \
--threads ${threads} \
--outdir /gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore \
${fastqc_list}
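The unquoted `${fastqc_list}` works because the shell splits the string on whitespace, which is fine as long as no path contains a space. As an alternative sketch (not the approach used in the script above), the array can be passed directly with a quoted `[@]` expansion, which yields one argument per file. The temporary directory, empty placeholder files, and the `printf` stand-in for the fastqc call are all assumptions made so the example runs anywhere.

```shell
#!/bin/bash
# Stand-in data: a temporary directory with empty files replaces the real trimmed reads.
demo_dir=$(mktemp -d)
touch "${demo_dir}/sample_R1_val_1.fq.gz" "${demo_dir}/sample_R2_val_2.fq.gz"

# With nullglob set, a pattern that matches nothing expands to an empty array
# instead of the literal pattern string.
shopt -s nullglob
fastq_array=("${demo_dir}"/*.fq.gz)
shopt -u nullglob

# Stop early if the directory holds no FastQ files
if [ "${#fastq_array[@]}" -eq 0 ]; then
  echo "No FastQ files found in ${demo_dir}" >&2
  exit 1
fi

# Quoted [@] expansion passes each file as its own argument.
# printf stands in for the real call, which would look something like:
#   ${fastqc} --threads ${threads} --outdir <output directory> "${fastq_array[@]}"
n_files=${#fastq_array[@]}
printf 'would analyze: %s\n' "${fastq_array[@]##*/}"

# Remove the stand-in directory
rm -rf "${demo_dir}"
```

The empty-array check also guards against a mistyped directory path, which would otherwise hand fastqc a literal `*.fq.gz` pattern.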

Finally, I created new multiqc reports:

# MultiQC
${multiqc} \
/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore/.

Unfortunately, I didn’t include the --outdir argument, so the reports were written to the same directory as the slurm file. Next time! Once the script finished running, I moved all the fastqc and multiqc output files to gannet, then added the html reports to this output subdirectory and my class repository. Tomorrow, I’ll review the output to make sure the trimming went well.
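For next time, a minimal sketch of the MultiQC call with an explicit `--outdir`, so the report lands next to the fastqc output rather than in the working directory. The `trim_dir` and `multiqc_cmd` variable names are hypothetical, and the dry-run branch is only there so the example does something sensible on a machine without MultiQC or the mox directory.

```shell
#!/bin/bash
# Hypothetical variable holding the trimgalore output directory used in this post
trim_dir=/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore

# Use the multiqc path if already set by the script, otherwise look on PATH
multiqc=${multiqc:-multiqc}

# Build the command as an array: the directory to scan, then where to write the report
multiqc_cmd=("${multiqc}" "${trim_dir}" --outdir "${trim_dir}")

if command -v "${multiqc}" > /dev/null 2>&1 && [ -d "${trim_dir}" ]; then
  "${multiqc_cmd[@]}"
else
  # Program or directory not available here: print the command instead of running it
  printf 'dry run: %s\n' "${multiqc_cmd[*]}"
fi
```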

Going forward

  1. Update the repository README files
  2. Check trimming output
  3. Start bismark
  4. Write methods
  5. Write results
  6. Identify DML
  7. Determine if RNA should be extracted
  8. Determine if larval DNA/RNA should be extracted

from the responsible grad student