Sam’s Notebook: Read Mapping – Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2

0000-0002-2747-368X

We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.

So, we wanted to get a quick idea of how good our geoduck assemblies are by performing some quick alignments using Bowtie2.

Used the following assemblies as references:

  • sn_ph_01 : SuperNova assembly of 10x Genomics data
  • sparse_03 : SparseAssembler assembly of BGI and Illumina project data
  • pga_02 : Hi-C assembly of Phase Genomics data

The analysis is documented in a Jupyter Notebook.

Jupyter Notebook (GitHub):

NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.

Here’s a brief overview of the process:

  1. Generate Bowtie2 indexes for each of the genome assemblies.
  2. Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:
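
The two steps above might look roughly like this. It is a minimal sketch, not the commands from the notebook: the FASTA name, index prefix, read file names and thread count are placeholders, and it assumes paired-end reads. The `run` helper is also hypothetical; by default it only prints each command, and it executes them when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder names: none of these paths come from the notebook.
assembly="sn_ph_01.fasta"        # assembly FASTA (hypothetical file name)
index="sn_ph_01"                 # prefix for the Bowtie2 index files
reads_1="novaseq_R1.fastq.gz"    # hypothetical read 1 file
reads_2="novaseq_R2.fastq.gz"    # hypothetical read 2 file

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# 1. Build the Bowtie2 index for one assembly.
run bowtie2-build "$assembly" "$index"

# 2. Align only the first 1,000,000 read pairs (-u), writing SAM output.
run bowtie2 -x "$index" -1 "$reads_1" -2 "$reads_2" -u 1000000 -p 4 -S "${index}.sam"
```

Bowtie2 prints an overall alignment rate to stderr at the end of each run, which is the kind of quick number that can be compared across assemblies.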

Grace’s Notebook: May 8, 2018 Bioanalyzer and Pubathon

Bioanalyzer

Last week I tried the bioanalyzer a couple times on tubes numbered 274 and 401.
Things didn’t look so great.

Gel:
img

Electropherogram:
img

GitHub issue

Sam suggested I just try again and also use tubes with very high RNA concentration.

So today I tried with tubes 14 (123.0 ng/mL) and 348 (144.0 ng/mL) (FRP 6144; infected; ambient).

Looks weird again.
Gel:
img

Electropherogram:
img

Looking at it now, I think I maybe messed it up by selecting “mRNA pico” instead of “Eukaryote_total-RNA”. Will ask Sam to see what’s up.

Will re-do tomorrow.

Pub-a-thon

Points – get points by commenting and reviewing others’ papers and repos!

DIA paper – get Emma over for the next Pubathon meeting (once date and time determined) to talk about next steps and moving forward with the paper.

Crab paper – add whatever I can to the methods and introduction

DecaPod

Published S1 Ep7 (Crab Meeting #3)

from Grace’s Lab Notebook https://ift.tt/2Ia0jTC
via IFTTT

Yaamini’s Notebook: Gonad Methylation Analysis Part 11

I am Jerry Gergich

Today I’m really riding the struggle bus! In this issue, I found out that I forgot an important argument in my bismark alignment. I did not indicate that I had paired read data. I now have to invoke my inner Jerry Gergich and redo the work. I specified the -1 and -2 “mates” files and reran the code.
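
For reference, a paired-end Bismark call has roughly the following shape. This is a sketch rather than the exact command from the notebook: the read file names are the first pair listed in Part 8, the genome folder is a placeholder, and the hypothetical `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# The genome folder is a placeholder; it must already have been processed
# with bismark_genome_preparation.
genome_dir="genome/"
mates_1="zr2096_1_s1_R1.fastq.gz"
mates_2="zr2096_1_s1_R2.fastq.gz"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Paired-end mode: R1 goes to -1 and R2 to -2, so mates are aligned as a pair
# instead of as two unrelated single-end files.
run bismark --genome "$genome_dir" -1 "$mates_1" -2 "$mates_2"
```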

For my subset, I will also re-extract the methylated data and remake my reports. All of the file names will stay the same, so they should still be easy to find. While I wait for the alignment to finish on the subset, I’ll start working with methylKit. Switching out the correct data will be simple!

P.S. Here’s some evidence I’m visibly on that struggle bus ft. Steven’s classic sass. Is the “Yep” referring to me figuring out the problem, or acknowledging that I’m not thinking clearly? I think it’s both:

screen shot 2018-05-08 at 10 51 34 am

from the responsible grad student https://ift.tt/2K4cH8k
via IFTTT

Yaamini’s Notebook: Gonad Methylation Analysis Part 9

The whole enchilada

In this Jupyter notebook, I ran the bismark alignment on the full range of my samples. The .txt file output can be found here. The .bam files produced were several GBs! I saved them in this OWL folder. I also uploaded all .bam output from my file subsets here.

My next step is to run these samples through methylKit in R. I’m going to test this first with the subset data I have, then start plugging along on these large files.

from the responsible grad student https://ift.tt/2rtP1DL
via IFTTT

Yaamini’s Notebook: Gonad Methylation Analysis Part 10

Visualizing bismark outputs

I started my IGV trials and tribulations yesterday, and today I’m (maybe) struggling a little less.

My goal was to use IGV to visualize 1) my alignment output and 2) bedGraphs from methylation extraction.

Alignment output

To visualize the alignment output, I first needed to sort and index the .bam output. I downloaded igvtools. In this lab notebook, I applied the sort and index command to one of my .bam files. I then uploaded this file to my IGV file.

Figure 1. IGV with .bam file.

I had to zoom to the single nucleotide level to view anything!

Figure 2. Alignment at single nucleotide level.

I wanted to add in my other alignment files, but I didn’t want to go through the pain of writing individual lines of code to convert each file. Sam tried to help me, but right now there doesn’t seem to be a solution. Steven also mentioned that I should have used deduplicate_bismark after the alignment, which would have sorted and indexed all my files. You can see the issue here.
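
One way to avoid writing a line per file is a loop over every .bam in the folder. The sketch below uses samtools instead of the igvtools commands from my notebook, so treat it as an untested alternative; the `.sorted.bam` naming and the `sorted_name` helper are just conventions made up for this example.

```shell
#!/bin/sh
# Name of the sorted copy for a given BAM: x.bam -> x.sorted.bam
sorted_name() {
  echo "${1%.bam}.sorted.bam"
}

# Sort and index every BAM in the current directory.
for bam in *.bam; do
  [ -e "$bam" ] || continue            # no BAM files matched the glob
  case "$bam" in
    *.sorted.bam) continue ;;          # already sorted by a previous run
  esac
  out=$(sorted_name "$bam")
  samtools sort -o "$out" "$bam"       # coordinate-sort the alignments
  samtools index "$out"                # writes "$out.bai" next to the BAM
done
```

IGV looks for the .bai index next to the sorted .bam, so both files need to stay in the same folder.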

bedGraphs

I uploaded all of the bedGraphs from the methylation extractor (found here) into my IGV file.

Figure 3. bedGraphs showing methylation levels for each file.

There isn’t any apparent difference between the two treatments. I looked at Steven’s lab notebook and he noticed the same thing. He also said that going to 100k made a difference. I have no idea what this means so I’ll have to ask him.

from the responsible grad student https://ift.tt/2I0Ebzh
via IFTTT

Sam’s Notebook: BS-seq Mapping – Olympia oyster bisulfite sequencing: TrimGalore > FastQC > Bismark

0000-0002-2747-368X

Steven asked me to evaluate our methylation sequencing data sets for Olympia oyster.

According to our Olympia oyster genome wiki, we have the following two sets of BS-seq data:

All computing was conducted on our Apple Xserve: roadrunner.

All steps were documented in this Jupyter Notebook (GitHub): 20180503_emu_oly_methylation_mapping.ipynb

NOTE: The Jupyter Notebook linked above is very large in size. As such it will not render on GitHub. It will need to be downloaded to a computer that can run Jupyter Notebooks and viewed that way.

Here’s a brief overview of what was done.

Samples were trimmed with TrimGalore and then evaluated with FastQC. MultiQC was used to generate a nice visual summary report of all samples.
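
As a rough illustration of that trimming and QC chain for a single sample, here is a sketch. It is not the code from the notebook: the input file and directory names are placeholders, and the hypothetical `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder input and directory names; not the paths used in the notebook.
reads="sample.fastq.gz"
trim_dir="trimgalore"
fastqc_dir="fastqc"
multiqc_dir="multiqc"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# FastQC expects its output directory to exist already.
run mkdir -p "$trim_dir" "$fastqc_dir" "$multiqc_dir"

# 1. Adapter and quality trimming; single-end input yields sample_trimmed.fq.gz.
run trim_galore --output_dir "$trim_dir" "$reads"

# 2. Per-sample quality report on the trimmed reads.
run fastqc --outdir "$fastqc_dir" "$trim_dir/sample_trimmed.fq.gz"

# 3. Aggregate all FastQC reports into one summary report.
run multiqc --outdir "$multiqc_dir" "$fastqc_dir"
```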

The Olympia oyster genome assembly, pbjelly_sjw_01, was used as the reference genome and was prepared for use in Bismark:

  /home/shared/Bismark-0.19.1/bismark_genome_preparation \
  --path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
  --verbose /home/sam/data/oly_methylseq/oly_genome/ \
  2> 20180507_bismark_genome_prep.err

Bismark was run on trimmed samples with the following command:

  /home/shared/Bismark-0.19.1/bismark \
  --path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
  --genome /home/sam/data/oly_methylseq/oly_genome/ \
  -u 1000000 \
  -p 16 \
  --non_directional \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/1_ATCACG_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/2_CGATGT_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/3_TTAGGC_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/4_TGACCA_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/5_ACAGTG_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/6_GCCAAT_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/7_CAGATC_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/8_ACTTGA_L001_R1_001_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_10_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_11_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_12_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_13_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_14_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_15_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_16_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_17_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_18_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_1_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_2_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_3_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_4_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_5_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_6_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_7_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_8_s456_trimmed.fq.gz \
  /home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_9_s456_trimmed.fq.gz \
  2> 20180507_bismark_02.err

Results:

TrimGalore output folder:

FastQC output folder:

MultiQC output folder:

MultiQC Report (HTML):

Bismark genome folder: 20180503_oly_genome_pbjelly_sjw_01_bismark/

Bismark output folder:

Laura’s Notebook: Where are they now?

9 months later … where are they now?

I measured four of my Olympia oyster seed batches from the 2017 experiment. This is seed I produced from broodstock that were exposed to two pH treatments (7.3, 7.8) prior to reproductive conditioning (check out this repo README). Measured thus far are 385 oysters from each of the following groups:

  1. North Sound, 6-degree low pH
  2. North Sound, 6-degree ambient pH
  3. Hood Canal, 6-degree low pH
  4. Hood Canal, 6-degree ambient pH

Takeaway – adult oysters exposed to low pH prior to reproductive conditioning produced less viable larvae (measured via survival to post-set), and the carry-over effect persists to the 9-month juvenile stage, where size is significantly lower.

As a reminder, here’s survival for the North Sound and Hood Canal groups:

North Sound Survival Chart

Hood Canal Survival Chart

Here’s the new data, which is size (mm) at ~9 months:

2018-05-04_ns-length

2018-05-04_hc-length

Mean length and anova results:

image

from The Shell Game https://ift.tt/2rs3mAz
via IFTTT

Laura’s Notebook: May 2018 goals

1 month left in Seattle before shipping off to AU. Here are things that need to be done:

Finish DNR paper

  • Finish editing discussion & conclusion
  • With newfound knowledge make sure my stats are sound
  • Finalize map & plots
  • Move all pertinent data/scripts/plots over to new clean repo

2017 Oly paper

  • Finish measuring seed @ dock (time consuming, 2-person job) (2017 Oly project)
  • Make headway

Transgenerational OA Oly project (using 2017 seed):

  • Finish prepping small bags of seed to be deployed in eelgrass bed design
  • Figure out logistics for deploying 2017 seed in eelgrass – when to go out, where, when to retrieve – if prior to late September, need to get someone to do this for me.

2018 Oly project

  • Collect recruitment data from 2018 Oly project (this will be completed by 5/24) – image for size if possible.
  • Build mesh bags and figure out/execute where to hang my new seed.
  • Send remaining histology cassettes off for processing – ASAP.

Miscellaneous:

  • Prep for committee meeting
  • Send Polydora paper to potential collaborators for feedback

Things that will be on the back burner until I get back, or that an undergrad could do:

2018 Oly project:

  • Analyze 2018 Oly histology slides for sex, stage, and observations
  • Image frozen larvae and measure for size; focus on those groups that I grew (48 samples)
  • Process larval samples for lipid content
  • Process gonad samples for lipid & glycogen

2017 Oly project:

  • Image frozen larvae from this experiment and use ImageJ to measure size upon release – not sure if this is worth our time, but it could be interesting …

from The Shell Game https://ift.tt/2JZfv6U
via IFTTT

Yaamini’s Notebook: Gonad Methylation Analysis Part 8

What are those?!

What did I generate using bismark alignment and bismark2report? Time to find out.

bismark2report

I thought I’d start with the HTML report since it doesn’t require any additional software finagling. These reports are really cool! They provide basic statistics like the number of analyzed sequences, the kind of alignments, kind of cytosine methylation, and alignment to individual bisulfite strands. I think the most important information at this stage is the alignment. I summarized the alignment information from each file set:

zr2096_1_s1_R1.fastq.gz and zr2096_1_s1_R2.fastq.gz

  • Multiple alignments: 11%
  • Unique alignments: 30-32%
  • No alignment: 56-58%

zr2096_2_s1_R1.fastq.gz and zr2096_2_s1_R2.fastq.gz

  • Multiple alignments: 19%
  • Unique alignments: 54%
  • No alignment: 25%

zr2096_3_s1_R1.fastq.gz and zr2096_3_s1_R2.fastq.gz

  • Multiple alignments: 24%
  • Unique alignments: 62-64%
  • No alignment: 11-12%

zr2096_4_s1_R1.fastq.gz and zr2096_4_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 61%
  • No alignment: 15%

zr2096_5_s1_R1.fastq.gz and zr2096_5_s1_R2.fastq.gz

  • Multiple alignments: 22%
  • Unique alignments: 61-62%
  • No alignment: 15-16%

zr2096_6_s1_R1.fastq.gz and zr2096_6_s1_R2.fastq.gz

  • Multiple alignments: 23%
  • Unique alignments: 63-64%
  • No alignment: 12-13%

zr2096_7_s1_R1.fastq.gz and zr2096_7_s1_R2.fastq.gz

  • Multiple alignments: 21%
  • Unique alignments: 63-64%
  • No alignment: 14%

zr2096_8_s1_R1.fastq.gz and zr2096_8_s1_R2.fastq.gz

  • Multiple alignments: 19-20%
  • Unique alignments: 56-57%
  • No alignment: 23%

zr2096_9_s1_R1.fastq.gz and zr2096_9_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 61-63%
  • No alignment: 13-16%

zr2096_10_s1_R1.fastq.gz and zr2096_10_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 63-65%
  • No alignment: 12-13%
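
Rather than copying numbers out of each HTML report by hand, the same kind of information could probably be pulled from Bismark's plain-text alignment reports. This is a sketch under assumptions: it assumes the reports end in `_report.txt` and contain a line starting with "Mapping efficiency:", and `mapping_efficiency` is a helper name made up for this example.

```shell
#!/bin/sh
# Print the value on the "Mapping efficiency:" line of one Bismark report.
# Assumes a line of the form "Mapping efficiency:<TAB>61.5%".
mapping_efficiency() {
  awk '/^Mapping efficiency:/ { print $NF }' "$1"
}

# Summarize every report in the current directory, one line per file.
for report in *_report.txt; do
  [ -e "$report" ] || continue         # no reports matched the glob
  printf '%s\t%s\n' "$report" "$(mapping_efficiency "$report")"
done
```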

bismark2summary

All of the above information is nicely summarized in the Bismark Project Summary Report!

Figure 1. Collated alignment information for all sequence data. Sample 1 refers to zr2096_1_s1_R1.fastq.gz, sample 2 is zr2096_1_s1_R2.fastq.gz, …, sample 10 is zr2096_10_s1_R2.fastq.gz

Figure 2. Percent of calls with CpG methylation for all sequence data.

bismark alignment

The Bismark User Guide suggests using a genome viewer to visualize the output SAM files. I’m a bit rusty on my IGV skills, but hopefully previous exposure in Steven’s 2016 Bioinformatics class will help!

I downloaded the latest version of the Integrative Genomics Viewer (IGV 2.4) at this website. I then uploaded the C. virginica genome I downloaded for bismark into the viewer. It wouldn’t recognize the .fa file I had originally, so I duplicated the file and changed the extension to .fasta. It allowed me to upload the genome.

Figure 3. C. virginica genome in IGV.

Now, I needed to add my alignment files. I tried adding in one of the files, but I got the following message:

Figure 4. Path to index file request.

I don’t think bismark generated any .bai files, so I’m not sure how to proceed. I saved my IGV file here, then posted this issue to get to the bottom of it.

from the responsible grad student https://ift.tt/2HZf1RC
via IFTTT