Yaamini’s Notebook: DML Analysis Part 35

Finalizing C. virginica tracks and revisiting analyses

Generating genome feature tracks

Two weeks ago, I was struggling to generate a feasible intron track. At FROGER, Jon shared the code he used to generate C. virginica feature tracks to the wiki. The code he used to generate tracks was similar to my code, which was comforting. Looking at his code for intron tracks, he used a combination of complementBed and intersectBed, which is what Sam suggested in this issue. I decided to use that approach with my own tracks.

Instead of just creating an intron track, I decided to create all of the additional tracks Jon created: intergenic regions, non-coding sequences, introns, and mtDNA. Before I could create tracks, I needed to create a bedtools friendly genome file. The genome file needed to have chromosome name and lengths. I previously created this document, so I just copied it to my analysis folder (it can be found here). I also isolated just the chromosome names in this document.

To isolate intergenic regions, I first needed to sort the gene track. I used sortBed to organize the listings by chromosme name:

 !{bedtoolsDirectory}sortBed \ -faidx {chromNames} \ -i C_virginica-3.0_Gnomon_gene_yrv.gff3 \ > C_virginica-3.0_Gnomon_gene_sorted_yrv.gff3  

Jon also included a step where he used mergeBed on the sorted file to merge any transcript variants. I skipped this step for now and sorted my exon track. I used complementBed with the sorted gene track to find all possible locations of intergenic regions, then removed sorted exons with subtractBed:

 !{bedtoolsDirectory}complementBed \ -i {geneList} -sorted \ -g {chromLengths} \ | {bedtoolsDirectory}subtractBed \ -a - \ -b {exonList} \ > C_virginica-3.0_Gnomon_intergenic_yrv.gff3  

Non-coding sequences were created using the complement of exons:

 !{bedtoolsDirectory}complementBed \ -i {exonList} \ -g 2018-06-15-bedtools-Chromosome-Lengths.txt \ > C_virginica-3.0_Gnomon_noncoding_yrv.gff3  

I then used intersectBed with non-coding sequences and genes to find introns, since introns are non-coding sequences found in genes:

 !{bedtoolsDirectory}intersectBed \ -a {nonCDS} \ -b {geneList} -sorted \ > C_virginica-3.0_Gnomon_intron_yrv.gff3  

Coding sequences are those that are translated into proteins. By subtracting coding sequences from exons, I also derived untranslated regions of exons:

 !{bedtoolsDirectory}subtractBed \ -a {exonList} \ -b {CDSList} \ -sorted \ -g {chromLengths} \ > C_virginica-3.0_Gnomon_exonUTR_yrv.gff3  

Generating the mitochondrial DNA track was the easiest. The smallest chromosome, NC_007175.2, is mtDNA, so I just needed to isolate genes from that chromosome:

 !grep "NC_007175.2" ref_C_virginica-3.0_top_level.gff3 > C_virginica-3.0_Gnomon_mtDNA_yrv.gff3  

Visualizing tracks in IGV

Of course, no track generation is complete without looking at the products. I perused the tracks in this IGV session and was pretty happy with how they turned out! I am more confident in these tracks than the pre-generated ones because I understand how they were made.

Screen Shot 2019-05-28 at 6 42 36 PM

Screen Shot 2019-05-28 at 6 42 05 PM

Screen Shot 2019-05-28 at 6 41 28 PM

Figures 1-3. All tracks in IGV.

Since my tracks looked good, I posted the .gff3 and .bed verisions here.

Characterizing feature tracks

The reason I started generating new feature tracks is because Steven asked me to do so in this issue. I obtained the length of each feature track and counted overlaps with the pre-generated CG motif track.

Table 1. Length of feature tracks and number of overlaps with CG motifs. intersectBed was used to obtain CG motif overlaps with various feature tracks, defining the CG motif track as -a and the various feature tracks as -b.

Feature Number of Elements CG Motif Overlaps with Feature
Genes 38,929 7,914,842
Intergenic regions 34,557 6,545,363
mRNA 60,201 7,507,167
Exons 731,279 2,330,546
Introns 316,614 5,596,808
Coding sequences 645,335 1,728,032
Non-coding sequences 336,677 12,142,171
Untranslated regions of exons 182,752 602,551
lncRNA 4750 281,715
mtDNA 78 431

I also posted Table 1 in the issue. I noticed that there are more CG motif overlaps in the introns over exons, and non-coding sequences over coding sequences. However, genes have more overlaps than intergenic regions.

Characterizing overlaps with DML

Now that I had better feature tracks, I figured I’d recharacterize my DML overlaps. Although I have more tracks I can use, I decided to focus on intersections with genes, exons, introns, transposable elements, and putative promoters. Anything that did not overlap with these categories could be considered “other” or “intergenic.” I opened this Jupyter notebook, downloaded the new tracks from gannet, and changed the variable paths (again…so. forking. handy). I counted overlaps and wrote out files to this directory. I didn’t want to overwrite any of the files I generated with previous versions of my feature tracks, so I tagged everything with the date (2019-05-29). I also created a new directory for my revised flanking analysis, which can be found here. The previous version of my Jupyter notebook has overlap proportion calculations and extraneous chunks of code that were confusing, so I removed them from the notebook. It’s not much cleaner and easier to follow (I think).

Table 2. Overlaps between DML and various genome feature tracks.

Feature Hypermethylated DML Hypomethylated DML All DML
Genes 289 271 560
Unique Genes 269 241 481
Exons 190 178 368
Introns 99 93 192
Transposable Elements (All) 26 31 57
Transposable Elements (C. gigas only) 16 23 39
Putative promoters 44 23 67
Other 9 6 15

One good thing from Table 2: the counts for the hypermethylated and hypomethylated DML always sum to the counts for all DML. For the most part, there’s an even split between hyper- and hypomethylation. There are twice as many hypermethylated DML in putative promoters than hypomethylated DML. This makes sense since promoters have “housekeeping” functions. There are more DML found in exons than introns, which is also interesting. Exons are the sections of the genes that are most likely retained in coding sequences. There are 481 unique genes with DML, and 15 in other (potentially intragenic) regions.

Characterizing general methylation trends

Keeping the bedtools party going, I decided to recharacterize overlaps of feature tracks with methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs. Again, I generated new files and tagged them with the date (2019-05-29).

Table 3. Overlaps between various genome feature tracks and all 5x CpGs, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs.

Feature All 5x CpGs Methylated CpGs Sparsely Methylated CpGs Unmethylated CpGs
Genes 3,255,049 2,521,653 317,249 416,147
Unique Genes 33,126 25,496 26,953 27,753
Exons 1,366,779 1,013,691 105,871 247,217
Introns 1,884,429 1,504,791 211,143 168,495
Transposable Elements (All) 1,011,883 755,222 155,293 101,368
Transposable Elements (C. gigas only) 767,604 610,208 108,858 48,538
Putative promoters 203,330 134,468 27,429 41,433
Other 579,159 349,064 81,168 148,927

The main concern Mac brought up in this issue that sent me on a saga of feature track generation was that overlap counts for methylated CpGs (methylatedCpG) were greater than those for CG motifs (totalCpG). That doesn’t seem to be the case this time!

Returning to the chi-squared test

Since I felt better about my overlap counts, I updated this document so I could them for statistical analyses. Mac said she talked to Loveday Conquest about statistical tests, and that after some discussion chi-squared tests were deemed appropriate. I’m just going to continue with the chi-squared tests in this R Markdown document and perhaps ask Katie Lotterhos if she has other ideas or knows of any post-hoc tests during our next meeting.

Table 4. Results of chi-squared tests between various CpG groupings. All CpGs refer to all CG motifs in the C. virginica genome. MBD-Enriched CpGs are those found in C. virginica samples with at least 5x sequencing depth. Of those CpG enriched by MBD treatment, methylated CpGs were those with at least 50% methylation. DML were at lest 50% different between treatment with q-values < 0.01. Comparisons were only conducted with a particular CpG grouping and it’s associated background (i.e. All CpGs vs. MBD-Enriched, MBD-Enriched vs. Methylated CpGs, and Methylated CpGs vs. DML), with N/A referring to tests that were not conducted. Values above the diagonal are χ² statistics with df = 4, and values below the diagonal are associated p-values.

CpG Category All CpGs MBD-Enriched Methylated CpGs DML
All CpGs 906940 N/A N/A
MBD-Enriched < 2.2e-16 15016 N/A
Methylated CpGs N/A < 2.2e-16 356.7
DML N/A N/A < 2.2e-16

Screen Shot 2019-05-30 at 4 18 00 PM

Screen Shot 2019-05-30 at 4 18 15 PM

Screen Shot 2019-05-30 at 4 18 48 PM

Screen Shot 2019-05-30 at 4 18 33 PM

Figures 4-7. Proportion CpGs located in various genomic features for different CpG group comparisons. The figure comparing MBD-Enriched, methylated, sparsely methylated, and unmethylated proportions may be overkill.

All comparisons were significant, indicating that the distribution of CpGs in various genomic features is likely nonrandom. For All CpGs vs. MBD-Enriched, there was a higher proportion of all CpGs located in other regions (likely intergenic), but there was a higher proportion of enriched CpGs in exons and introns. The MBD-Enriched and methylated CpG groups have similar distributions, even though the test said the distriutions were significantly different. My guess is that this was driven by the slight differences between proportion CpGs in exons, promoters, and other regions. There was a much higher proportion of DML in exons and promoters versus methylated CpGs. Higher proportions of methylated CpGs were in introns, transposable elements, and other regions. It’s interesting to see a higher proportion of DML in exons, since those are the regions that could get translated into proteins.

Going forward

  1. Figure out what’s going on with DMR
  2. Look at overlap proportions for DMR
  3. Generate figures for DMR analyses
  4. Conduct a gene enrichment for DML and DMR
  5. Work through gene-level analysis
  6. Update methods and results
  7. Update paper repository
  8. Write the discussion
  9. Write the introduction
  10. Revise my abstract
  11. Share draft paper at the next Eastern Oyster Project Meeting

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student http://bit.ly/2Xf7VMt

Grace’s Notebook: Day 1 Geoduck Stripspawn Project at Taylor Shellfish Hathcery – Quilcene, WA

Today was Day 1 of the Geoduck strip spawn experiment at Taylor Shellfish Hatchery. This wil be one of my side projects for my MS program because I realized I really wanted some hands-on, hatchery work experience! Lots and lots and lots of details (and photos) in the post.

Prep work done yesterday (Tues, May 28, 2019):

(Google doc with thoughts, materials, general plan: here)
Gathered some material from FTR:

  • P1000 pipet (has “Roberts” tape label on it)
  • P1000 tips
  • Label tape (red and blue)
  • Calipers
  • CLear plastic small ruler
  • Medium gloves
  • 10 razor blades
  • large cooler
  • squirt bottle

Made 1L of 2M KCl and DI water stock solution: Dissolved 149.11 g KCl into ~800mL DI water to stir (stir bar)
Added water up to 1L and placed in 1L glass bottle with lid

I also talked the general plan detailed in the google doc linked above with Brent.

Today – Pt Whitney:

Met Brent and Shelly at Pt Whitney at ~7:30am
Got some materials from Shelly:

  • 1 full box of biopsy needles
  • large metal spatulas (wooden handles) to open geoduck with
  • two scalpel handles and a bunch of blades

Brent looked at the geoduck and chose ones that had shells that were squatter (smaller shell length to width ratio = diploid) as those were likely diploid. Did not measure, but chose ones that were clearly not on the cusp of the cutoff of <1.62 ratio length:width.

He then biopsy punched the gonads by putting the needle into their pedal gape, feeling around for the gonad ball, then biopsy-ing a small piece.

Then, he put that small piece of gonad onto a slide, then added a small amount of seawater.

I looked at the slides under the microscope to look for sperm and eggs. Here are some very obvious females:

We identified and saved 4 males and 4 females (green rubberbands) using the biopsy punch method:

With our geoduck cooled in the cooler, and materials from Shelly, we headed over to Taylor Shellfish Farm where these geoduck were first created.

Today – Taylor Hatchery:

We discuessed the plan with Benoit and made adjustments before starting anything.

We started by delegating tasks: Brent strip spawned three females, and I set up the tripours with their KCl _mM dosages. I’ll start with some strips spawning notes, then share how I (with Benoit and Brent’s suggestions) set up the dosages in the tripours.

Female geoduck strip spawning:

Brent opened up the females (as determined by the biopsy punches at Pt Whitney)

He then, one at a time, removed the gonad ball/visceral mass and began creating cross-hatches with the razor blade in order to leach out the eggs. Have to be careful not to cut too deep and risk opening up visceral mass and contaminating the eggs with their digestive stuff.

I didn’t get a picture of this, but he used the back of a serated knife to scrape the eggs into the tripour after using the razor blade to create the cross hatches.


Next, he rinsed the eggs through a 100µm screen to catch all the tissue chunks and other junk (caught eggs on a 20µm screen). Then he rinsed them back into a tripour for hydrating.
There were a lot of eggs, so it took some time to drain the 20µm screen:


He then let those hydrate in the seawater for 30 mins before their exposures to the different doses of KCl.

However, there was one set of triplicates that were not allowed to hydrate. They were dry stripped and Brent put them directly into tripours that I made to be 50mM KCl (~800ml). They sat in the 50mM KCl for 20 minutes. He did not count the eggs – we did this later after we put the eggs back into regular seawater.

Creating the different dosages of KCl

My original calculations in the google doc were for 1L, and also did not take into account the fact that we’d be addng more seawater into the tripours with the eggs (they have to be suspended in some seawater).

After some math happened, we adjusted everything down to having the end desired volume be ~800ml including the eggs.

We had 33 tripours, and in each one we wanted to have 10,000 eggs (10 eggs/ml). Total: 330,000 eggs. Additionally, we had a small vial that was 33ml when full, and decided that that would be what we use to add the eggs to each tripour.

I made the concentrations in large batches, since each one would be separated into triplicates. I made it so that there’d be extra, with end volume of 2.875 L (to be distributed into three tripours ~767ml/tripour).

Concentration mM mL of 2M stock KCl
0 0
10 15
20 30
30 45
40 60
50 75
60 90
70 105
80 120

I made three sets of the 50mM in triplicate (one triplicate for the instant KCl exposure (blue tape), one triplicate as extras to play around with). I placed them all in the bath that they’ll be sitting in for the next couple of days in order to control their temperature. The seawater is at __ ˚C.


As soon as Brent had stripped some eggs and didn’t hydrate them, he placed some in the 50mM KCl tripours that I labeled with blue tape in order to differentiate from the others:

Tripour labeling scheme:

Each tripour has a unique number label and I wrote down what each number corresponds to in my notebook. Here’s a table to make it more clear:


Adding eggs to KCl+seawter tripours

Rinsing and placing in seawater

Stripspawning male geoducks (2)

Adding sperm to tripours

Look at some eggs under microscope: polar bodies already???!!!!

Thoughts on why the polar bodies may be there

from Grace’s Lab Notebook http://bit.ly/2VYQVsc

Yaamini’s Notebook: FROGER Recap


Last week, I hopped over to Rhode Island for the Functional Re-annotation of Oyster Genomes with Epigenetic Resources (FROGER) workshop. Our goals were to 1) use epigenetic resources to annotate the C. virginica genome and 2) compare whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), and MBD-BS methods for a single C. virginica sample. We expected to write a methods comparison paper while we were there, but data turned out to be really poor-quality for making any useful comparisons.

To contribute to the first goal, I created a long non-coding RNA track with associated exons. I used code I’d been working with for my own analyses and created this Github-friendly Markdown document with my code. Once I created the lncRNA track, I helped the groups calculating CpGoe ratios for genes, coding sequences, exons, chromatin-associated proteins, 1000 kb windows throughout the genome. I worked with the code this Markdown document and in this gannet directory. I was able to test the different code chunks and run most of the pipeline with all five categories. At one point, I ran into an issue trying to use sed and join functions not available on OSX. Sam suggested I download Homebrew and specify gsed and gjoin. This worked!…but it also caused an issue with gawk as I was generating sample-specific CpGoe counts for genes and windows while testing the concatenation code with the CAP tracks. I wrestled with this for a full day until I realized the code still worked when I changed gawk to awk! Shelly suggested I run the concatenation code with the CAP tracks on mox, so she set that up for me. Hopefully, I can finish the rest of the concatenation on mox in before I leave for Friday Harbor.

While I was toying with CpGoe code, I was also able to participate in discussions about genome feature tracks and bismark and methylKit parameters. Jon Puritz posted his code for creating genome feature tracks here. He used complementBed and intersectBed to create an intron track, which is what Sam suggested in this issue. I’m going to use his code to finalize my C. virginica tracks. When mapping the data we did have to the C. virginica genome, Mac pointed out that the alignment stringency I set for my samples was potentially too loose. She explained that looking at CHC and CHG methylation rates was important in addition to mapping efficiency. Shelly and I tried to figure out why we both decided on L,0,-1.2 instead of L,0,-0.6 (what FROGER used for mapping), but we couldn’t remember our conversations. We’re going to bring this up at the next lab meeting.

I’ve never been a part of a working group before, so FROGER was a great experience! It was interesting to see how other people approach similar analyses and learn from a group of experts. I’m looking forward to getting more data and continuing a virtual FROGER working group to conduct the method comparisons.

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student http://bit.ly/2K93cIs

Sam’s Notebook: Data Wrangling – Create Pgenerosa_v070 GFFs

Since the MAKER genome annotation of our Pgenerosa_v070 genome assembly finally completed last week, I decided to separate out the various features into separate GFF files, as I’m sure we’ll need/want them at some point. This was all done in a Jupyter Notebook on my computer (swoose) – see notebook linked below.

Created GFF files for:

Files were rsync‘d to (each one is linked above):

Jupyter Notebook (GitHub):


Will add these files to our Genomic Resources Wiki.

from Sam’s Notebook http://bit.ly/2X3vZl6

Sam’s Notebook: Metagenomics – BLASTx of Individual Water Sample MEGAHIT Assemblies on Mox

After a meeting on this project yesterda, we decided to try a few things to continue with various approaches to assessing the metagenome. One of the approaches is to run BLASTx on the individual water sample MEGAHIT assemblies from 20190327 and obtain taxonomy info for them, so that’s what I did here.

Samples breakdown like so, for reference:

  • pH=7.1: MG3, MG6, MG7
  • pH=8.2: MG1, MG2, MG5

SBATCH script (GitHub):

  #!/bin/bash ## Job Name #SBATCH --job-name=blastx_metagenomics ## Allocation Definition #SBATCH --account=coenv #SBATCH --partition=coenv ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=25-00:00:00 ## Memory per node #SBATCH --mem=120G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190516_metagenomics_pgen_blastx # Exit script if any command fails set -e # Load Python Mox module for Python module availability module load intel-python3_2017 # Document programs in PATH (primarily for program version ID) date >> system_path.log echo "" >> system_path.log echo "System PATH for $SLURM_JOB_ID" >> system_path.log echo "" >> system_path.log printf "%0.s-" {1..10} >> system_path.log echo "${PATH}" | tr : \\n >> system_path.log threads=28 # Paths to programs blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin" blastx="${blast_dir}/blastx" # Paths to blastdbs blastdb_dir="/gscratch/srlab/blastdbs/ncbi-sp-v5" blast_db="${blastdb_dir}/swissprot_v5" # Input files fasta_dir="/gscratch/srlab/sam/data/metagenomics/P_generosa/assemblies" ## Inititalize arrays fasta_array=() names_array=() # Export BLAST database directory export BLASTDB=${blastdb_dir} # Create array of FastA files # Create array of sample names ## Uses parameter substitution to strip leading path from filename for fasta in ${fasta_dir}/MG*.fa do fasta_array+=(${fasta}) names_array+=($(echo "${fasta#${fasta_dir}/}" | awk -F"." '{print $1}')) done # Loop through arrays to customize sample names # and run BLASTx on each FastA for index in "${!names_array[@]}" do # Loops through sample names and appends appropriate treatment to each sample name sample_name=$(echo "${names_array[index]}") if [ "${sample_name}" == "MG1" ] \ || [ "${sample_name}" == "MG2" ] \ || [ "${sample_name}" == "MG5" ] then sample_name="${sample_name}"_pH82 else sample_name="${sample_name}"_pH71 fi # Create list of input FastA files echo "${fasta}" >> input.fasta.list.txt # Run BLASTx on each FastA ${blastx} \ -query "${fasta_array[index]}" \ -db ${blast_db} \ -max_hsps 1 \ -max_target_seqs 1 \ -outfmt "6 std staxid ssciname" \ -evalue 1e-10 \ -num_threads ${threads} \ > "${sample_name}".blastx.outfmt6 done 

Kaitlyn’s notebook: Quantifying geoduck histology

These are possible measurements that can be taken on the geoduck histology slides based on tools I’ve found:


  • pixel classification: acini vs connective tissue
  • average acinus size (measure widest portion of 25 acini/individual)


  • oocyte length
    • measure only round eggs with visible nucleus
  • average follicle size (measure widest potion of 25 follicles/individual)

Unfortunately I can’t use a pixel classifier for the females because the oocytes and connective tissue stain is very similar in color and there are significant white space between the tissue and in the nucleus. 

I have been using the Padilla-Gamiño micrscope (Nikon Eclipse Ni-U with the Nikon Elements BR [basic research] program). I have been exploring the program to find features useful to our histology slides. I can get the % area using the pixel classifier.



Acini are deeply stained. Using the pixel classifier I can make these pixels red. The percent area for each phase (pixel stain) is shown to the right.


Connective tissue is shown with the green pixels.


A possible third phase in blue…

Alternatively, I can do just two phases-


this is using the manual method (the bayes method has you pick sample pixels for each phase rather than showing a graph for the wavelengths).

However, the inability to remove the phase overlapping on the image makes it very difficult to find another spot to measure on the slide, and I surmise that at least 3-4 measurements throughout the slide should be taken to accurately determine the % area of the acini.

Other possible option are thresholding in Fiji (Fiji is just ImageJ [but updated]), color deconvolution in ImageJ (but I have not been able to find a way for the program to give me % area), or pixel classifier qith Qupath (a new addition to the program that still has soem kinks and I haven’t been able to use yet).

It is easy to take measurments and images with a scale burned onto the image so measuring oocytes, collicles and acini using either this program or ImageJ is easy! I am working on females first, and during any down time, i.e., someone else needs to use the scope, I am trying to learn how to sue Fiji, Qupath, or the Element BR program. 

Sam’s Notebook: Sample Submission – Tanner Crab Infected vs Uninfected RNAseq

After reviewing our options for sample pooling, we decided to do a comparison of Infected vs. Uninfected crabs.

I pooled equal amounts of RNA from each individual in a given set (e.g. Day 9 Infected) to achieve 1000ng. Samples were concentrated using the Friedman Lab SpeedVac to target the “required” concentration specified by the sequencing facility (60ng/uL – UW’s Northwest Genomics Center). I put “required” in quotes because, it turns out, that the amount of total RNA and the concentration are not actually required! Here’s an email exchange when I asked if there was any wiggle room:

Hi Sam,

Thank you for your questions. 1000ng is our “ideal” amount with a built-in 2nd run along the way just in case. We have received some samples with their volume too low to work with (regardless of concentration), but we can take how much ever you can give us.

I also tried to get them to commit to an absolute minimum for input RNA, but the correspondent just kind of talked around the question. Regardless, I submitted the sample manifest (see below) and they accepted it with samples below the “required” concentration and minimum amount of input RNA…

Samples will be sequenced on the NovaSeq S2 200 cycle flow cell with ~50M reads per sample.

Here’s the manifest sample sheet I submitted:

UDF/Investigator Sample Name UDF/Replicate Sample Id UDF/Organism UDF/Gender UDF/Race UDF/Concentration (ng\/uL) UDF/Total Volume (uL)
D9_infected Chionoecetes bairdi 75 10
D9_uninfected Chionoecetes bairdi 81 10
D12_infected Chionoecetes bairdi 57 10
D12_uninfected Chionoecetes bairdi 74 10
D26_infected Chionoecetes bairdi 59 10
D26_uninfected Chionoecetes bairdi 70 10

from Sam’s Notebook http://bit.ly/2Jxr74S

Kaitlyn’s notebook: 2018 geoduck juveniles

Juveniles reared in heath stacks:

ID n Treatment Length Width
H0_T 7 E*A 10.78 7.36
H0_B 9 E*E 12.21 7.50
H1_T 15 E*A 9.71 6.75
H1_B 16 A*A 11.32 7.88
H2_T 14 E*A 9.77 6.84
H2_B 11 A*A 11.20 7.87
H3_T 14 E*A 11.35 7.65
H3_B 13 E*E 11.71 8.19

Checking on outplanted juveniles:

  • -0.76 low tide
    • We planted on very low point of the beach so it goes underwater very quickly. The ferries were very busy so I got out there alter than anticipated. I cut open a quarter of the tubes for thorough inspections. I checked another 50% without cutting open the tubes because the tide was coming in quickly, but it was difficult to see anything with the mesh shadows.
  • 3 zip-ties on each mesh tube were hard to cut off (return with more people!)
  • In the tubes:
    • 3-4 crabs (up to 3 inches big)
      • Many of the tubs had crabs the crabs were crawling in and out of
    • Cockles
      • Difficult to tell if holes are geoduck or cockles
        • Would recommend coming back with the intention of digging them up to be sure of geoduck survival
Tray Treatment Outplant Color # Holes Tube(s)
H0_T E*A Yellow 1 H12
H0_B E*E Grey 1 B1
H1_T E*A Pink 2 A1, B7
H1_B A*A Purple 1 A10
H2_T E*A Green 0
H2_B A*A Red 3 A3, A7, B11
H3_T E*A Dark blue 2 B12, D11
H3_B E*E Orange 1 B5



Shelly’s Notebook: Wed. May 15, 2019 Salmon-sea lice salinity x temp methylomes

gDNA quality from yesterday’s extractions

0.8% agarose 1x TAE (low EDTA) 0.5ug/mL EtBr, 90min. @ 100V


Row 1:

  1. GeneRuler high range ladder 500ng
  2. empty
  3. 16C_32psu (1) 404ng (*stabbed the side of the well so some leaked out during loading)
  4. empty
  5. 16C_32psu (2) 344ng
  6. empty
  7. 16C_32psu (3) 418ng
  8. empty
  9. 16C_32psu (4) 436ng
  10. empty
  11. 8C_32psu (5) 346ng
  12. empty
  13. 8C_32psu (6) 308ng
  14. empty
  15. 8C_32psu (7) 344ng (bubble in well made some sample leak out)
  16. empty
  17. 8C_32psu (8) 306ng
  18. empty
  19. 8C_26psu (9) 314ng
  20. GeneRuler high range ladder 500ng

Row 2:

  1. GeneRuler high range ladder 500ng
  2. 8C_26psu (10) 368ng
  3. 8C_26psu (11) 333ng
  4. 8C_26psu (12) 395ng
  5. empty
  6. 16C_26psu (13) 362ng
  7. 16C_26psu (14) 386ng
  8. 16C_26psu (15) 309ng
  9. 16C_26psu (16) 315ng
  10. empty
  11. empty
  12. empty
  13. empty
  14. empty
  15. empty
  16. CTRL_8C_26psu (17) 432ng
  17. CTRL_8C_26psu (18) 484ng
  18. CTRL_16C_26psu (19) 420ng
  19. CTRL_16C_26psu (20) 433ng
  20. GeneRuler high range ladder 500ng

After 90 min. dye front ~3/4 to the bottom of the gel. Kind of hard to tell, but you can see faint purple right above the red plastic towards the bottom. uc?export=view&id=1d-hiAAUyS_gZ0dvodDzN2k5a1WKjcl6C

Ladder: uc?export=view&id=1PQWdM1cSjCrbN5In3nFU8sKK57UrZT8N

Because I ran a 0.8% the separation of high molec. weight wasn’t really great, but it was enough to see that most of the DNA is above the 4th ladder band (15KB) which is great! uc?export=view&id=1gFfgg9meGnHmyu40kPAbWTfGz9zkU1Ox

There is definitely some degradation () but it’s not too bad. uc?export=view&id=1iN3Uf5jVd_kNk95rA7kX947RqOlBsiLx


It’s probably possible to remove small molec. weight DNA with beads, but not sure if it’s worth trying to clean these samples up and how these small fragments might affect the data.

from shellytrigg http://bit.ly/2Hk0qxX

Yaamini’s Notebook: DML Analysis Part 34

Even more troubleshooting with IGV

Gene background issues

Before I could proceed with a gene enrichment, I needed to sort through my gene background issues. The gene background consists of all loci with 5x coverage in my dataset. Previously, my gene background was “hot flaming garbage” when I looked at it in IGV. Assuming that the gene background was working correctly in methylKit, I returned to my R Markdown file to see if the gene background issues were a result of the way I exported the file.

Looking at the gene background, I noticed that there were columns with coverage metrics, as well as the number of cytosines and thymines at each locus. The start and stop position for each locus was the same, which could be affecting how the visualization occurs in IGV.

Screen Shot 2019-05-15 at 1 56 52 PM

Figure 1. Gene background information in methylKit.

I decided to subset the first four columns for exporting: chromosome (chr), start position (start), stop position (stop), and strand (strand). I did this by creating a new dataframe. I also subtracted 1 from each start position to visualize the data. Finally, I exported the gene background as a BEDfile and looked at it in IGV.

 methylationInformationFilteredCov5DestrandReduced <- data.frame("chr" = methylationInformationFilteredCov5Destrand$chr, "start" = methylationInformationFilteredCov5Destrand$start, "stop" = methylationInformationFilteredCov5Destrand$end, "strand" = methylationInformationFilteredCov5Destrand$strand) #Subset data methylationInformationFilteredCov5DestrandReduced$start <- (methylationInformationFilteredCov5DestrandReduced$start - 1) #Subtract 1 from the start position for visualization write_delim(methylationInformationFilteredCov5DestrandReduced, "2019-05-14-Methylation-Information-Filtered-Destrand-Cov5.bed", delim = "\t", col_names = FALSE) #Write out all methylation information as a background to be used for gene enrichment analyses.  

I opened this IGV session, which had my previous gene background file in it. I added the new file and compared the two versions:

Screen Shot 2019-05-14 at 2 20 13 PM

Screen Shot 2019-05-14 at 2 21 09 PM

Screen Shot 2019-05-14 at 2 22 43 PM

Figures 2-4. Gene background visualization in IGV.

Looking at different sections, I found that the gene background aligned with CG motifs and various DML instead of looking like large regions of information! I feel confident with this new version of the gene background and think I could use it for gene enrichment.

Difference between gene and mRNA tracks

During lab meeting, I pointed out that there were coding sequences included in genes that were not part of the mRNA track. These coding sequences had defined exons. Kaitlyn pointed out that the exons are the elements retained in mRNA, but the exons in coding sequence (CDS) are the parts that get translated. That clears up the question I posed in this issue.

Refining the intron track

Based on this information, I decided to use subtractBed with the gene ane exon tracks to create my intron track. When I looked through that track in IGV, I found that there were some areas where the intron track looked good. Introns from coding sequences were included (unlike the one I generated using mRNA), and the first bp in every exon sequence was no longer being included in the intron track.

Screen Shot 2019-05-15 at 3 23 05 PM

Figure 5 Example of introns generated from the gene and exon tracks that are not included in the intron track I generated from mRNA or the intron track I downloaded from the Genomic Resources wiki.

Screen Shot 2019-05-15 at 3 22 40 PM

Screen Shot 2019-05-15 at 3 21 51 PM

Figure 6-7. Instances were the intron track from the Genomic Resources wiki includes the first bp of each exon, but the intron track I generated does not.

However, there were instances of the intron track I generated still including exons:

Screen Shot 2019-05-15 at 3 28 58 PM

Screen Shot 2019-05-15 at 3 25 57 PM

Figures 8-9. Areas where the intron track still contained exons.

I posted my findings in this comment and asked if it was an artifact of enforcing same strandedness with -s, or if it was worth using subtractBed on the intron and exon tracks to get rid of the overlaps. Since exons and introns are on the same strand, I don’t think -s is the issue, but perhaps there’s a different way I need to code it.

Going forward

  1. Finalize the intron track
  2. Conduct a gene enrichment for DML
  3. Figure out what’s going on with DMR
  4. Work through gene-level analysis
  5. Update paper repository
  6. Update methods and results
  7. Start writing the discussion

// Please enable JavaScript to view the comments powered by Disqus.

from the responsible grad student http://bit.ly/2HuOOXP