WGBS Analysis Part 12

Reviewing trimgalore output with multiqc

Yesterday I started running [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) so I could evaluate the trimgalore output. After my script finished, I transferred the files to the relevant subdirectories in this gannet folder and moved the HTML reports to my repository and the class repository. Then I looked at the multiqc reports after the first, second, and third trims.

The main thing I wanted to check was the overrepresented sequences remaining in the analysis files, so I started by checking the summary module in each report:

Figures 1-3. MultiQC status checks after the first, second, and third trims.

All samples passed the overrepresented sequences check! When I dug into the reports further, I found that some files still had adapter sequences after the second trim, but they were gone after the third trim:

Figures 4-6. MultiQC overrepresented sequences for sample 7 read 1
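The same check can be done outside the MultiQC report by reading the overrepresented sequences table from a `fastqc_data.txt` file. What follows is only a minimal sketch: the function name `overrepresented_sequences` and the example text are hypothetical, and it assumes the usual FastQC module layout (a `>>Overrepresented sequences` header, a `#`-prefixed column line, tab-separated rows, and `>>END_MODULE`).

```python
def overrepresented_sequences(fastqc_data_text):
    """Return (sequence, count, percentage, possible_source) rows
    from the text of a fastqc_data.txt file."""
    rows = []
    in_module = False
    for line in fastqc_data_text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if in_module:
            if line.startswith(">>END_MODULE"):
                break  # end of the module we care about
            if line.startswith("#") or not line.strip():
                continue  # skip the column header and blank lines
            sequence, count, percentage, source = line.split("\t")[:4]
            rows.append((sequence, int(count), float(percentage), source))
    return rows


# Hypothetical excerpt, made up for illustration (not from the reports above)
example = "\n".join([
    ">>Per base sequence quality\tpass",
    ">>END_MODULE",
    ">>Overrepresented sequences\twarn",
    "#Sequence\tCount\tPercentage\tPossible Source",
    "AGATCGGAAGAGC\t1200\t0.25\tTruSeq Adapter, Index 1 (100% over 13bp)",
    "GGGGGGGGGGGGG\t900\t0.19\tNo Hit",
    ">>END_MODULE",
])

# Keep only sequences that FastQC matched to a known contaminant or adapter
adapter_hits = [row for row in overrepresented_sequences(example) if row[3] != "No Hit"]
print(adapter_hits)
```

Running this on the `fastqc_data.txt` from each trim would show whether any adapter-matched sequences remain, without opening each HTML report.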

I looked at the rest of the MultiQC modules from the third trim to see if there were any other inconsistencies between samples:

Figures 7-8. Modules with inconsistencies

Two files (sample 2, reads 1 and 2) did not pass the per sequence GC content check. There were no spikes that would indicate a poly-G tail, and the distributions weren’t very different from the other samples, so I’m not concerned.

I also had 10 files (samples 1-4 and 6, reads 1 and 2) that didn’t pass the sequence duplication levels check. Again, the distributions didn’t look too different from the other samples. The one thing that does concern me is that all of these samples are from the same treatment: 3N and high pH. The only other sample in that treatment, sample 5, passed the sequence duplication check. When I look at sample methylation levels in a PCA, I’ll need to check whether all six samples cluster together or whether the sequence duplication levels affect that clustering.
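One way to list which files were flagged for each module, without clicking through the report, is to read the `summary.txt` that FastQC writes for every input file. This is a rough sketch under assumptions: the function name `flagged_by_module` and the file names are hypothetical, and it relies on each `summary.txt` line being tab-separated as status, module name, file name.

```python
from collections import defaultdict


def flagged_by_module(summary_texts):
    """Map each FastQC module to the files that did not PASS it.

    summary_texts: iterable of summary.txt contents, one per FastQC run.
    """
    flagged = defaultdict(list)
    for text in summary_texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            status, module, filename = line.split("\t")
            if status != "PASS":  # keep both WARN and FAIL
                flagged[module].append((filename, status))
    return dict(flagged)


# Hypothetical summaries, made up for illustration
summaries = [
    "PASS\tPer sequence GC content\tsample1_R1.fq.gz\n"
    "FAIL\tSequence Duplication Levels\tsample1_R1.fq.gz",
    "WARN\tPer sequence GC content\tsample2_R1.fq.gz\n"
    "PASS\tSequence Duplication Levels\tsample2_R1.fq.gz",
]

for module, files in flagged_by_module(summaries).items():
    print(module, files)
```

Grouping by module rather than by file makes it easier to see when every flagged file comes from the same treatment.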

Going forward

  1. Start bismark
  2. Update the repository README files
  3. Write methods
  4. Write results
  5. Identify DML
  6. Determine if RNA should be extracted
  7. Determine if larval DNA/RNA should be extracted


from the responsible grad student https://ift.tt/3a7U0AL
via IFTTT