Roberto’s Notebook: Differential expression Analysis

After finish the analyses using Hisat program and complemented with stringtie, the differential expression analysis in R using the libraries:
library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)
and command lines like:

pheno_data = read.csv(“/Volumes/toaster/roberto/phenodata_day30.csv”)
bg_Cragi = ballgown(dataDir = “/Volumes/toaster/roberto/Hisat_results/stringtie_results/Est_abundance/ballgown30/”, samplePattern = “Os”, pData=pheno_data)
bg_Cragi_filt = subset(bg_Cragi,”rowVars(texpr(bg_Cragi)) >1″,genomesubset=TRUE)

#To look for diff expr between thermal tolerance:

results_transcripts = stattest(bg_Cragi_filt, feature=”transcript”,covariate=”thermal.tolerance”,adjustvars = c(“family”), getFC=TRUE, meas=”FPKM”)
results_genes = stattest(bg_Cragi_filt, feature=”gene”, covariate=”thermal.tolerance”, adjustvars = c(“family”), getFC=TRUE, meas=”FPKM”)

#This is to add gene names and gene IDs to the results_transcripts data frame

results_transcripts = data.frame(geneNames=ballgown::geneNames(bg_Cragi_filt), geneIDs=ballgown::geneIDs(bg_Cragi_filt), results_transcripts)

#Then, to sort the results from the smallest P value to the largest:

results_transcripts = arrange(results_transcripts,pval)
results_genes = arrange(results_genes,pval)

#To write the results to a csv file:

write.csv(results_transcripts, “Cragi_transcript_results_D30.csv”, row.names=FALSE)
write.csv(results_genes, “Cragi_gene_results_D30.csv”, row.names=FALSE)

#To identify transcripts and genes with a q value subset(results_transcripts,results_transcripts$qvalsubset(results_genes,results_genes$qvaltropical= c(‘darkorange’, ‘dodgerblue’, ‘hotpink’, ‘limegreen’, ‘yellow’)
palette(tropical)

fpkm = texpr(bg_Cragi,meas=”FPKM”)
fpkm = log2(fpkm+1)
boxplot(fpkm,col=as.numeric(pheno_data$thermal.tolerance),las=2,ylab=’log2(FPKM+1)’)

Screen Shot 2018-08-07 at 3.24.25 PM

##As an example, the observation of differential expression of particular gene (with its isoforms):

ballgown::transcriptNames(bg_Cragi)[4295]
plot(fpkm[4295,] ~ pheno_data$thermal.tolerance, border=c(1,2), main=paste(ballgown::geneNames(bg_Cragi)[4295],’ : ‘, >ballgown::transcriptNames(bg_Cragi)[4295]),pch=19, xlab=”thermal.tolerance”, ylab=’log2(FPKM+1)’)
points(fpkm[4295,] ~ jitter(as.numeric(pheno_data$thermal.tolerance)), col=as.numeric(pheno_data$thermal.tolerance))
Gene_expression

#Then, the observation of this gene differentially expressed in a single sample:

plotTranscripts(ballgown::geneIDs(bg_Cragi)[4295], bg_Cragi, main=c(‘Gene MSTRG.3053 in sample Os15’), sample=c(‘Os15’))
Gene_and_isoforms
For this it was necessary to look for the gene id (MSTRG.3053) in matrix file and add it to the command line.

#As a las step. The visualization of expression of isoforms between resistant and susceptible families and the position of these in the genome:

plotMeans(‘MSTRG.3053’, bg_Cragi_filt,groupvar=”thermal.tolerance”,legend=FALSE)
Gene_expressed_resistantvssusceptible

I am still working in the comparison of data obtained with Hisat/stringtie and the data from Trinity.