Completed GO-MWU for new analyses

Speeding Things Up

As of my last post (yesterday), I was trying to figure out a way to rapidly obtain large numbers of GO IDs. Last time, I used Sam’s shell script by calling it inside this script. However, Sam’s script took an extremely long time to run. A rough ballpark: for my 2 analyses (each with ~120,000 accession IDs), it would take a total of about 8 days on my local machine. That’s a huge pipeline bottleneck, and so we found an alternative.

New Script for Uniprot to GO

I created an R script for obtaining GO terms from accession IDs. Downside: it requires you to manually download the SwissProt database – with all GO terms included – from https://ift.tt/3p9XlU6. Upside: it’s much, much, much faster. It ran in a few minutes, which by my calculation, is a bit faster than 8 days. After getting GO terms, you need…
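The local-lookup idea described above can be sketched in a few lines of shell. This is not the R script from the post, just a minimal illustration of why joining against a downloaded table is fast. The function name `lookup_go`, the file names, and the two-column tab-delimited layout (accession, then GO IDs) are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: look up GO IDs for a list of accession IDs in a local table.
# Assumed inputs (not taken from the original post):
#   $1 = file with one accession ID per line
#   $2 = tab-delimited table: accession <TAB> GO IDs
lookup_go() {
  local ids_file="$1" table_file="$2"
  # First pass (NR == FNR) loads the table into memory; the second pass
  # prints one line for each requested accession that appears in the table.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { go[$1] = $2; next }
    ($1 in go) { print $1, go[$1] }
  ' "$table_file" "$ids_file"
}
```

Because the whole table is read once and each ID is then a single in-memory lookup, the run time grows with the size of the two files rather than with one slow query per accession.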

from Aidan F. Coyle https://ift.tt/3cXSMd2
via IFTTT

Virginica Gonad DNA Extractions Part 14

Sending samples for sequencing

Quotes obtained? Check. PO numbers generated? Check. Sequence submission forms submitted? Check.

Then there’s nothing left to do but send out the samples! I labelled screw top tubes for all samples with enough DNA and RNA yield based on the extraction metadata. Basically, this meant I labelled DNA and RNA tubes for every sample except for sample 57. I used screw top tubes because they’re more secure and I didn’t want any samples spilling! Once tubes were labelled, I grabbed the samples from the -80 °C freezer and let them equilibrate to room temperature. I centrifuged each sample to ensure all the liquid could easily be pipetted, then transferred it to the appropriate labelled screw top tube. I then put each tube in a labelled ziploc bag, using separate bags for DNA and RNA. I placed the filled ziploc bags at the bottom of a cooler with an insulating lining, then filled the lining with dry ice so there was ample ice to keep the samples cold. Once the samples were packaged properly, I sent the package to Zymo Research via FedEx Overnight!

Going forward

  1. Wait for sequencing data!

from the responsible grad student https://ift.tt/3tJpbdf
via IFTTT

WGBS Analysis Part 13

Starting bismark with Manchester data

After evaluating trimgalore output, I decided it’s time to start [bismark](https://github.com/FelixKrueger/Bismark)! To start, I created this script. It’s the same as the one I used for the Hawaii data, but I updated the paths to all the files. I also moved the bisulfite converted genome from the Hawaii data folder to the Manchester data folder. I transferred the script to mox with rsync and it started running…

…only to fail two seconds later. Looking at the slurm output, I saw that it was unable to navigate to the bisulfite converted genome directory:

[Screenshot of the slurm output]

I navigated to the directory and saw that it was empty! The files must have been on the computer for longer than 30 days before I moved the genome folder from the Hawaii to Manchester subdirectories. I used wget (not curl because mox doesn’t like it) to download the bisulfite converted genome Sam created from this link. I then extracted the files:

tar -xvzf Crassostrea_gigas.oyster_v9.dna_sm.toplevel_bisulfite.tar.gz # Extract (x) the gzip-compressed (z) archive named after -f (f), listing each file as it is extracted (v)

I navigated into the directory to double check that there was actually a bisulfite converted genome inside /gscratch/scrubbed/yaaminiv/Manchester/data/Crassostrea_gigas.oyster_v9.dna_sm.toplevel/. Then, I ran the script. Since I only have eight samples, I’m hoping it’ll finish processing in a week!
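A failure like the empty genome directory above can be caught before a job is submitted with a small guard at the top of the script. The sketch below is hypothetical and not part of the original bismark script: `require_nonempty_dir` and `GENOME_DIR` are names invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): fail early if a
# directory is missing or contains no entries.
require_nonempty_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "ERROR: directory not found: $dir" >&2
    return 1
  fi
  # ls -A lists every entry except . and .. so empty output means an empty directory.
  if [ -z "$(ls -A "$dir")" ]; then
    echo "ERROR: directory is empty: $dir" >&2
    return 1
  fi
}

# Example use near the top of a job script (GENOME_DIR is an assumed variable name):
# require_nonempty_dir "$GENOME_DIR" || exit 1
```

Checking up front costs almost nothing and makes the reason for a failure explicit in the log instead of leaving it to be inferred from the aligner's error message.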

Going forward

  1. Update the repository README files
  2. Write methods
  3. Write results
  4. Identify DML
  5. Determine if RNA should be extracted
  6. Determine if larval DNA/RNA should be extracted

from the responsible grad student https://ift.tt/3jJEQVl
via IFTTT

WGBS Analysis Part 12

Reviewing trimgalore output with multiqc

Yesterday I started running [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) so I could evaluate trimgalore output. After my script finished running, I transferred the files to relevant subdirectories in this gannet folder, and moved the HTML reports to my repository and class repository. Then, I looked at the multiqc reports after the first, second, and third trims.

The main thing I wanted to check was the overrepresented sequences remaining in the analysis files, so I started by checking the summary module in each report:

Figures 1-3. MultiQC status checks after the first, second, and third trims.

All samples passed the overrepresented sequences check! When I dug into the reports further, I found that some files still had adapter sequences after the second trim, but they were gone after the third trim:

Figures 4-6. MultiQC overrepresented sequences for sample 7 read 1

I looked at the rest of the MultiQC modules from the third trim to see if there were any other inconsistencies between samples:

Figures 7-8. Modules with inconsistencies

The per sequence GC content had 2 files (sample 2 reads 1 and 2) that did not pass the test. There were no spikes that indicated a poly-G tail, and the distributions weren’t completely different from the other samples, so I’m not concerned. I also had 10 files (samples 1-4 and 6, reads 1 and 2) that didn’t pass sequence duplication levels. Again, the distributions didn’t look too different from the other samples. The one thing that does concern me is that all of these samples are from the same treatment: 3N and high pH. The only other sample in that treatment, sample 5, passed the sequence duplication test. When looking at sample methylation levels in a PCA, I’ll need to check if all six samples cluster together, or if the sequence duplication levels will affect that clustering.
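For a tally like the one above, the per-file statuses can also be pulled straight from FastQC's own `summary.txt` files instead of being read off the report. The sketch below assumes the FastQC output archives have been unzipped so that each `summary.txt` (status, module name, and file name separated by tabs) is available. The function name `flagged_files` and any paths passed to it are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list files whose status for one FastQC module is not PASS.
# Arguments: the module name, then one or more FastQC summary.txt files
# (each line: status <TAB> module <TAB> file name).
flagged_files() {
  local module="$1"
  shift
  # Print the status and file name for every WARN or FAIL line of that module.
  awk -F'\t' -v module="$module" '
    $2 == module && $1 != "PASS" { print $1 "\t" $3 }
  ' "$@"
}

# Example (the glob is a hypothetical layout of unzipped FastQC output):
# flagged_files "Sequence Duplication Levels" */summary.txt
```

The resulting list can then be compared against the sample metadata to check whether the flagged files all come from the same treatment.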

Going forward

  1. Start bismark
  2. Update the repository README files
  3. Write methods
  4. Write results
  5. Identify DML
  6. Determine if RNA should be extracted
  7. Determine if larval DNA/RNA should be extracted

from the responsible grad student https://ift.tt/3a7U0AL
via IFTTT

TWIP Episode 02: Prepared to Argue

This week we increase in numbers and almost hear an argument. Plus – Laura’s audio is amazing!