Yaamini’s Notebook: Final Virginica Tasks

Tying up loose ends for the C. virginica gonad paper

Although I submitted the draft manuscript last month, I told myself I would finish up some small tasks for the C. virginica gonad methylation paper once I got back from vacation. I took care of those tasks over the past week!

The first thing I did was upload our submitted manuscript to bioRxiv. The paper, found here, is now a citable product! I also copied the entire repository into this gannet folder. The only other thing I wanted to do was clean up the paper repository. I thought it would just involve adding details to each directory README.md, but was I wrong. Converting a file dump into a usable repository, and chunking the work into small tasks to complete, took me about a week. It probably would have taken me less time if the work didn’t put me to sleep :grimacing:

Steven posted issues for me to work on, so I started with those. First, I updated the genome feature track README.md. Before I could update the README, I wanted to make sure all of the genome feature tracks were represented. I cross-referenced the repository with my large file folder on gannet, moved the GFFs for each feature track into the repository, and updated the path to this directory in this Jupyter notebook. When updating the README, I included links to each feature track’s GFF and the code used to generate that track.

Next, I worked on adding descriptions to the code README.md (also referenced here). Before I updated the code descriptions, I thought I’d check whether all the code made sense. For all R Markdown files and Jupyter notebooks, I changed file paths to reflect the paper repository’s structure. I also removed lots of redundant code, including code for generating DMRs and downloading genome feature tracks. Since the paper repository has a separate folder for genome feature tracks, I didn’t need to keep downloading them. Instead, I made sure the paths to the necessary genome feature tracks were updated to relative paths within the repository. I updated all header information and filenames so they accurately reflected what each script did. It’s a good thing I did, too, because some descriptions were confusing and some filenames were completely incorrect. For example, one chunk of code said it was creating a “scaled DML distribution” instead of a “scaled methylated CpG distribution,” which caused me a mini freak-out because I thought I had used the wrong figure in my final paper (don’t worry, I did not). I thought about the order in which the files would be used and changed it a few times before finalizing the filenames and the README.

Even though it wasn’t an explicit ask, I wanted to make sure the data README.md was also clear. Locally, I updated links and made sure the file structure made sense. When I tried to commit my changes, I realized that the entire data folder had been added to the .gitignore! I’ve ignored previously committed files before, and I thought fixing this would be similar. I couldn’t find a workaround on Google, so I simply deleted my .gitignore, pushed the data subdirectory to my remote repository, and added the large files back into the .gitignore. An annoying, but easy, fix.

The last major task involved cleaning up the analyses subdirectory. Steven noted that not all of the files in the repository were needed to reproduce the analyses in the paper. Going through each subdirectory within analyses, I removed all files from defunct analyses. These included anything with DMRs, old versions of DML overlaps with various genome feature tracks, and anything that used the C. gigas transposable element track. I had to go back and forth between my lab notebook, the paper repository, and my old repository a few times to make sure I didn’t remove files I needed, and to add files I had missed in my initial file dump. Once I felt I had removed most of the unnecessary files, I started updating the analyses README.md. For each analysis subdirectory, I indicated which scripts the files came from and highlighted important files with descriptions. As I went through each folder, I found even more redundant files that I no longer needed. For example, there were several GO term annotation files that I never ended up using once I annotated the GO-MWU output with GOSlim terms. I removed these files, as well as the lines of code used to generate them. Additionally, I still had code annotating differentially methylated genes with uncorrected p-values. Since we decided to only use results with corrected p-values, I didn’t need the annotation files or the associated code, so I promptly removed both. Once I did this, I realized the only folder that used the blastx-to-GOSlim output from this code was the gene enrichment analysis folder. I moved these files from the differentially methylated gene folder to the gene enrichment folder and changed all necessary file paths in other scripts.

While updating the analyses README.md, I realized I needed to update all of the file links in my IGV sessions. I first opened the genome feature track IGV session. I was able to change the path to the genome on gannet in TextWrangler, but I couldn’t change the other file paths in TextWrangler without breaking the entire session. Instead, I painstakingly deleted the old file URLs and added new ones (there has to be a better way…) for the genome feature tracks on gannet. I wasn’t able to display GFFs for the intergenic, intron, or noncoding feature tracks, so I used BED files for those. Since the BED files weren’t in my repository to begin with, I added them to the repository and to gannet. I also added the CG motif track to gannet and included a link in the genome feature track README.md. For the DML verification IGV session, I used a path to the DML list within the repository. I also added the 5x sample bedgraphs to the repository and used repository paths for those files. The only files I had to use gannet links for were the C. virginica genome and the CG motif track. I updated the analyses and genome feature track README.md files with information about the file links used in the IGV sessions.
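Swapping track paths one at a time is tedious. As a hedged sketch (not what was done here), an IGV session file is XML, so a short Python script could rewrite path prefixes in bulk. It assumes the session stores file locations in the `genome` attribute of `Session`, the `path` attribute of each `Resource`, and the `id` attribute of each `Track` (IGV typically reuses the file path as the track id, so changing one without the other may be what breaks a hand-edited session). The function names and prefixes are hypothetical; run it on a copy of the session file.

```python
import xml.etree.ElementTree as ET

# Assumption: these attributes hold file paths/URLs in an IGV session XML
# (Session@genome, Resource@path, Track@id).
PATH_ATTRIBUTES = ("genome", "path", "id")


def rewrite_session_paths(root, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix at the start of every path-like attribute.

    Walks every element (including root) and returns how many values changed.
    Values that do not start with old_prefix are left untouched.
    """
    changed = 0
    for element in root.iter():
        for attr in PATH_ATTRIBUTES:
            value = element.get(attr)
            if value is not None and value.startswith(old_prefix):
                element.set(attr, new_prefix + value[len(old_prefix):])
                changed += 1
    return changed


def rewrite_session_file(in_path, out_path, old_prefix, new_prefix):
    """Read a session file, rewrite its paths, and write the result to out_path."""
    tree = ET.parse(in_path)
    changed = rewrite_session_paths(tree.getroot(), old_prefix, new_prefix)
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)
    return changed
```

A call such as `rewrite_session_file("session.xml", "session_new.xml", OLD_PREFIX, NEW_PREFIX)` (all placeholders) writes a new file instead of overwriting the original, so the result can be opened in IGV and checked before replacing the old session.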

After checking all the README.md files and repository one more time, I copied the entire local repository to gannet. That’s the last of my tasks!

Going forward

…just wait for reviewer comments I guess.


from the responsible grad student https://ift.tt/2RisNBJ

Sam’s Notebook: Data Wrangling – Arthropoda and Alveolata Taxonomic RNAseq FastQ Extractions

After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114, I realized that the FastA headers were incomplete and did not distinguish between paired reads. Here’s an example:

R1 FastQ header:

@A00147:37:HG2WLDMXX:1:1101:5303:1000 1:N:0:AGGCGAAG+AGGCGAAG

R2 FastQ header:

@A00147:37:HG2WLDMXX:1:1101:5303:1000 2:N:0:AGGCGAAG+AGGCGAAG

However, the reads extracted via MEGAN have FastA headers like this:

>A00147:37:HG2WLDMXX:1:1101:5303:1000
SEQUENCE1
>A00147:37:HG2WLDMXX:1:1101:5303:1000
SEQUENCE2

Those are a pair of reads, but there’s no way to distinguish R1 from R2. This may not be an issue, but I’m not sure how downstream programs (e.g., Trinity) will handle duplicate FastA IDs as input. To avoid any headaches, I’ve decided to parse out the corresponding FastQ reads, which have the full header info.

Here’s a brief rundown of the approach:

  1. Create a list of unique read headers from the MEGAN6 FastA files.
  2. Use the list with the seqtk program to pull out the corresponding FastQ reads from the trimmed FastQ R1 and R2 files.
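The two steps above can be sketched in Python. This is a hedged illustration, not the code in the linked notebook: it assumes the read ID is the first whitespace-delimited token of each header (the part R1 and R2 share), and `filter_fastq` is a hypothetical pure-Python stand-in for what seqtk does in step 2, assuming well-formed four-line FastQ records.

```python
def unique_read_ids(fasta_lines):
    """Collect read IDs from FastA header lines, de-duplicated, in first-seen order.

    Mates share an ID like 'A00147:37:HG2WLDMXX:1:1101:5303:1000', so each pair
    contributes a single entry.
    """
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                seen.setdefault(tokens[0], None)
    return list(seen)


def write_id_list(fasta_path, out_path):
    """Write one read ID per line (a plain name list)."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for read_id in unique_read_ids(fasta):
            out.write(read_id + "\n")


def filter_fastq(fastq_lines, wanted_ids):
    """Yield (header, seq, plus, qual) records whose read ID is in wanted_ids.

    Assumes well-formed 4-line records; an incomplete trailing record is dropped.
    """
    lines = iter(fastq_lines)
    for header, seq, plus, qual in zip(lines, lines, lines, lines):
        read_id = header[1:].split()[0]
        if read_id in wanted_ids:
            yield header, seq, plus, qual
```

With the ID list written, the seqtk version of step 2 would be run once per mate file, e.g. `seqtk subseq R1.fq.gz ids.txt > R1_subset.fq` and likewise for R2 (file names hypothetical). Because matching uses only the shared ID, the extracted reads keep their full `1:N:0:...` and `2:N:0:...` headers.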

The entire procedure is documented in a Jupyter Notebook below.

Jupyter notebook (GitHub):


Output folders:

We now have two distinct sets of RNAseq reads to create separate transcriptome assemblies for C. bairdi (Arthropoda) and Hematodinium (Alveolata)! Next, I’ll get de novo assemblies with Trinity going on Mox.

from Sam’s Notebook https://ift.tt/2RhkjuM

Shelly’s Notebook: Tues. Jan. 21, 2020 Pt Whitney broodstock conditioning

Strip spawn prep

  1. ID individuals with ripe gonads
    • Last year we used punch biopsy tools for this, but decided they were too harmful to the animal: to get a biopsy this way, we had to really scrape and scoop the slippery tissue to keep it from slipping out of the tool
    • This time, we did the following:
      1. Gently insert a cut-off 15mL Falcon tube (1”) into the pedal gape to view the gonad
        • We had to move the foot aside with a finger first, then insert the tube. [photo]
        • We first tried a cut-off pipette tip, shown here: [photo]
        • Once the tube was inserted, the animal would typically release water, so, holding the tube in place, we held the animal with the pedal gape aimed toward the floor and let it drain. Sometimes we gave it a gentle squeeze. We also used a transfer pipette to remove excess water before going in for the biopsy. [photo]
      2. Use an 18 G needle on a 3mL syringe to pierce the gonad and withdraw some tissue. Barely piercing the surface should suffice; going too deep would pierce into digestive tissue. We repeated this procedure 3 times on some individuals to confirm they were indeed not ripe.
      3. Squirt the biopsy out onto a glass slide. Sometimes a drop of water is necessary to get the tissue out of the needle and onto the slide
      4. Visualize under the microscope at 4-10x magnification
        • Eggs or sperm should be visible, as shown here: Tank 2 male, 10x mag [photo]; Tank 2 female, 10x mag [photo]; Tank 2 female, 4x mag [photo]; Tank 4 male, 10x mag [photo]

    RESULTS: We checked 2 individuals in each of tanks 1 (pH 7.2), 2 (pH amb), 4 (pH amb), and 5 (pH 7.2). We also checked 2 individuals in tank 6 (pH amb) from the broodstock cohort harvested in October and 3 individuals in tank 6 from the broodstock cohort harvested in December. We found one ripe male and one ripe female in tank 2 (pH amb), one ripe male in tank 4 (pH amb), and one ripe male in tank 6 (pH amb).

  2. Prepare 50mM KCl solution
    • I made 200mL of 50mM KCl by diluting 5mL 2M stock solution in 195mL 15C? sea water
    • Once we decided to do a priming bath, I made 4L of 50mM KCl by diluting 100mL 2M stock solution in 3.9L 15C? sea water
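The KCl volumes above follow from the dilution relation C1V1 = C2V2 (a 2 M stock is 2000 mM). A minimal Python check of that arithmetic, using only the concentrations and volumes given in the notes; the function name is just for illustration:

```python
def stock_volume_ml(stock_mM, final_mM, final_volume_ml):
    """Volume of stock needed for a dilution, from C1 * V1 = C2 * V2."""
    return final_mM * final_volume_ml / stock_mM


STOCK_mM = 2000  # 2 M KCl stock
TARGET_mM = 50   # 50 mM working solution

for final_volume_ml in (200, 4000):  # first batch, then the priming bath
    stock = stock_volume_ml(STOCK_mM, TARGET_mM, final_volume_ml)
    seawater = final_volume_ml - stock
    print(f"{final_volume_ml} mL: {stock:g} mL stock + {seawater:g} mL seawater")
# 200 mL: 5 mL stock + 195 mL seawater
# 4000 mL: 100 mL stock + 3900 mL seawater
```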

Strip spawn steps

  1. Strip males and females
    • Using a razor blade, score the gonad. Be careful not to cut too deep, to avoid digestive tissue
    • Use back of razor blade to gently scrape sperm or eggs from gonad tissue into a clean 1L tripour beaker
      • rinse scored gonads with filtered sea water from squirt bottle to transfer excess sperm and eggs into tripour beaker
      • Repeat gentle scraping and rinsing until most sperm or eggs have been extracted, while avoiding the transfer of chunks of tissue. Male gonads that have been scored, scraped, and rinsed: [photo] Sperm collected from stripped males: [photo]
  2. Rinse eggs with warm seawater (15C?)
    • Transfer eggs from the tripour to a clean 100 µm screen stacked on top of a clean 20 µm screen. The 100 µm screen removes the tissue debris transferred during stripping; the eggs are caught on the 20 µm screen. [photo]
    • Rinse the tripour and screen with warm filtered seawater (15C?) [photo]
    • Allow the screen to drain. It should drain easily; if it doesn’t, it is likely clogged with eggs (as pictured below). The jelly coat on the eggs can get stuck in the mesh and clog the screen. To alleviate this, transfer the eggs onto a clean 20 µm screen and spray the original screen from the back side to release the eggs (this helped tremendously). If two 20 µm screens were used during rinsing, transfer the eggs back to one screen. Clogged screen: [photo] Transferring eggs onto a clean 20 µm screen: [photo] Unclogged screen: [photo] Screen speckled with eggs: [photo] Combining eggs after splitting the rinse onto two 20 µm screens: [photo]
  3. Prime eggs in KCl
    • Submerge the screen with eggs in 50mM KCl solution [photo]
    • allow eggs to incubate in KCl solution for 20 minutes.
  4. Rinse eggs to remove KCl
    • Rinse eggs on the 20 µm screen with warm filtered seawater (15C?), as in the previous rinse step. After rinsing, transfer the eggs to a 2L easy-pour skinny beaker with handle and top off with warm filtered seawater (15C?) to a final volume that can be easily divided into the desired number of fertilizations
      • Since we had stripped 3 males, we made a final volume of 1200mL to be able to add 400mL eggs to each fertilization.
  5. Determine the number of eggs in the final egg solution
    • Mix the 1200mL egg solution with a mixer while sampling for counts (the mixer was a PVC pipe glued onto a plastic disc studded with 1/2” holes, used like a plunger while rotating it)
    • Microscope counts:
      • Added 3 x 20uL of egg solution to a microscope slide and counted eggs with a tally counter
    • Cellometer counts:
      • Added 2 x 57uL of egg solution to a PD300 slide and counted cells on a Nexcelom Cellometer
    RESULTS:
    • Microscope counts:
      • 148 eggs/20uL = 7400 eggs/mL; 7400 eggs/mL * 1200mL = 8.88 * 10^6 eggs
    • Cellometer counts:
      • 3500 eggs/mL * 1200mL = 4.2 * 10^6 eggs
  6. Combine sperm and eggs
    • Check sperm quality under the microscope and discard any collections that seem non-viable
    RESULTS: One male’s sperm collection was not very active, so we did not use it for fertilization
    • Mix eggs and aliquot into tripours for fertilization
      • We ended up splitting the eggs into 2 x 600mL aliquots for 2 fertilizations because one male’s collection seemed non-viable
    • Added 6mL of sperm to each 600mL aliquot (@2:54pm)
    • Incubate for 15-20 minutes and transfer to the LRT
    RESULTS:
    • 20 min post-fertilization: sperm is all around the eggs, but no polar bodies are visible in a 20uL drop [photo]
    • 4-5 hours post-fertilization: starting to show cleavage [photo]
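The egg-count arithmetic in step 5 can be reproduced with a short Python sketch. It uses only the numbers recorded above and, as the notes do, treats 148 as the count per 20 uL drop; the helper names are illustrative. Note that the two methods differ by roughly a factor of two.

```python
def eggs_per_ml(eggs_counted, sample_volume_ul):
    """Scale a count from a small drop up to eggs per mL (1 mL = 1000 uL)."""
    return eggs_counted * 1000 / sample_volume_ul


def total_eggs(eggs_per_ml_value, total_volume_ml):
    """Total eggs in the well-mixed egg solution."""
    return eggs_per_ml_value * total_volume_ml


FINAL_VOLUME_ML = 1200

microscope_per_ml = eggs_per_ml(148, 20)  # 148 eggs in a 20 uL drop
microscope_total = total_eggs(microscope_per_ml, FINAL_VOLUME_ML)
cellometer_total = total_eggs(3500, FINAL_VOLUME_ML)  # 3500 eggs/mL from the Cellometer

print(f"microscope: {microscope_per_ml:g} eggs/mL, {microscope_total:.3g} eggs")
print(f"cellometer: {cellometer_total:.3g} eggs")
# microscope: 7400 eggs/mL, 8.88e+06 eggs
# cellometer: 4.2e+06 eggs
```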

from shellytrigg https://ift.tt/2Gdn61H

Kaitlyn’s notebook: 20200121 Geoduck Strip spawn @ Pt. Whitney

Shelly, Brent and I helped Matt with a strip spawn yesterday!

  • Modified and tested a 1000ul pipette tip and the end of a 50ml Falcon tube as a speculum
    • Inserted speculum into pedal gape
      • Used 18g needle to sample gonad
        • Put on slide and examined at 4X and 10X for sperm or egg

The modified falcon tube worked best because it was slightly larger and remained in place better when the geoduck contracted. Using a headlamp, we could actually see the gonad before taking a sample.

  • Tested 2 geoduck from each tank (1, 2, 4, and 5)
    • Found 3 ripe males and a ripe female
  • Shucked and removed gonad
  • Cross-hatched gonad; added egg and sperm into individual tripours
  • Rinsed eggs on a 20um screen
    • lots of clogging from egg jelly –> used 2 screens
  • Primed eggs with ~4000 mL of 50mM KCl
  • Rinsed over 20um screen and added to 2L pitcher
    • Final volume of 1200ml; Cellometer estimated 4.2*10^6 eggs and microscope counts estimated 8.88*10^6 eggs

Checked on quality of sperm. 2/3 males had high quality sperm based on appearance and activity.

  • 600mL of eggs in 2 tripours
    • Added 6mL of sperm
  • Checked the egg:sperm ratio under the scope: over 30 sperm per egg
  • Added sperm to LRT (5000L tank) after 5 minutes.

The LRT was at 10C which may slow down the fertilization and development process. A sample after ~40 minutes showed some polar body formation.


Matt sent over a picture of polar bodies from the LRT later that day: