While the exact details for the fitting could be found in linked articles at the end of the blog, the end result is coefficients that represent expression change between conditions/samples (i.e log fold changes) (Formula 2, 3).įormula 1 : The read count K ij (for gene i in sample j) is fitted to a Negative Binomial GLM with two parameters mean expression value and dispersion (which we have calculated and estimated in step 1)įormula 2 : An easier interpretation of the Negative Binomial GLM in Formula 1 with y is the gene expression (obtained from the mean and dispersion), x i is the design factor i (e.g control = 0, case = 1), ẞ i is the coefficient for design factor i (explained in detail here ). Second : with the dispersion estimated, the count value of each gene is fitted to a Negative Binomial Generalized Linear Model (Formula 1). Due to the nature of single-cell RNA seq data, the ratio of mean and variance shifts when the expression level of a gene changes rather than remains constant, thus it is important to accurately estimate the relationship between these two statistics via dispersion for subsequent model fitting step. The highlight of this step is the estimation of dispersion, a statistic reflecting the relationship between mean and variance. Here’s an oversimplified description of what DESeq2 does stepwise and what each step means.įirst : calculate the mean and estimate the dispersion of gene expression for each and every gene. What makes DESeq2 a widely popular tool in single-cell analysis? In this article, we’ll break down the basics of DESeq2 and highlight 2 advantages of DESeq2 in differential gene expression for single-cell RNA-seq. This time, we’d like to discuss a frequently used tool – DESeq2 ( Love, Huber, & Anders, 2014).Īccording to Squair et al., (2021), in 500 latest scRNA-seq studies, only 11 methods made up to over 80% of DE analysis. In our previous post, we have given an overview of differential expression analysis tools in single-cell RNA-Seq. Volcanoplot(resdata, lfcthresh=1, sigthresh=0.05, textcx=.8, xlim=c(-2.Differential expression analysis is a common step in a Single-cell RNA-Seq data analysis workflow. With(subset(res, padj lfcthresh), points(log2FoldChange, -log10(pvalue), labs=Gene, cex=textcx. With(subset(res, padj lfcthresh), points(log2FoldChange, -log10(pvalue), pch=20, col="green". Volcanoplot lfcthresh), points(log2FoldChange, -log10(pvalue), pch=20, col="orange". # Volcano plot with "significant" genes labeled Setwd ( "~/Atrx/" ) dat <- read.table ( "At_count.txt", header = T, quote = "", row.names = 1 ) # Convert to matrix dat <- as.matrix ( dat ) head ( dat ) # Assign condition (first three are WT, next three are mutants) condition <- factor ( c ( rep ( "WT", 3 ), rep ( "Mut", 3 ))) condition = relevel ( condition, ref = "WT" ) # Create a coldata frame: its rows correspond to columns of dat (i.e., matrix representing the countData) coldata <- ame ( row.names = colnames ( dat ), condition ) head ( coldata ) # condition # S293 WT # S294 WT # S295 WT # S296 Mut # S297 Mut # S298 Mut # DESEq pipeline, first the design and the next step, normalizing to model fitting dds <- DESeqDataSetFromMatrix ( countData = dat, colData = coldata, design =~ condition ) dds <- DESeq ( dds ) # Plot Dispersions: png ( "qc-dispersions.png", 1000, 1000, pointsize = 20 ) plotDispEsts ( dds, main = "Dispersion plot" ) dev.off ()
0 Comments
Leave a Reply. |