RNA Sequencing For Differential Gene Expression Analysis

RNA sequencing (RNA-Seq) uses next-generation sequencing (NGS) to perform genome-wide transcriptional analysis of any biological organism. By comparing two or more conditions, RNA-Seq identifies differentially expressed genes, i.e. genes that are up- or down-regulated under specific conditions. Typical examples include comparing the transcription profiles of normal versus cancer tissues, cells in high- versus low-nutrient environments, unstressed versus stressed cells, or distinct developmental stages of an organism. A prerequisite for any RNA-Seq study is the availability of an annotated reference genome or a reference transcriptome (see also the application note "Illumina RNA-Seq" under "Related Downloads").

 

Why RNA-Seq?

RNA-Seq has two advantages over conventional microarray studies: (i) no prior knowledge of gene models is necessary, and (ii) it offers a wider dynamic range with overall higher sensitivity, reliability and reproducibility. In addition, many RNA-Seq protocols allow the analysis of both the sense transcripts and the natural antisense transcripts (NATs) of genes. NATs are widespread in eukaryotic and prokaryotic genomes and are now acknowledged as important modulators of gene expression.

 

Microsynth Competences and Services

Experimental Design: As an expert in the area of RNA-Seq, Microsynth can provide a full service, from experimental design consulting through to bioinformatics analysis. If you do not involve Microsynth in your experimental design, please consider the importance of the number of biological replicates. To obtain statistical significance in your differential gene expression analysis, we usually advise including at least 3 biological replicates per condition.
RNA Isolation: You can either leave the isolation of total RNA to Microsynth or perform it yourself with a commercial kit.
Library Preparation and Sequencing: Following a quality check of your samples, Microsynth will perform an mRNA enrichment or an rRNA depletion, depending on the studied organism. This step is essential because the fraction of rRNA is high and sequencing should be restricted to mRNA (or miRNA). An Illumina cDNA library is generated by reverse transcription, including specific sequencing adaptors with barcodes. Finally, the libraries are pooled and sequenced on an Illumina sequencer. The envisaged number of reads per library depends on the organism under study and the desired sensitivity. Whereas the benchmark for complex eukaryotic genomes (e.g. human, rat, mouse) is 100-150 M reads for high sensitivity and 20-30 M reads for low sensitivity, 10-fold fewer reads are required for bacteria.
Bioinformatics Analysis: Reads derived from the sequencing are mapped against the reference genome of the organism under study using the Bowtie2 and TopHat software. TopHat primarily addresses the difficulty of mapping spliced reads in eukaryotic genomes (i.e. reads spanning two exons). Finally, the reads per gene are counted and used as input for statistical analysis. Dedicated statistical packages are then used to identify differentially expressed genes. These packages first normalize the data, then estimate the variance from the replicates of each condition, and finally apply statistical tests to find differentially expressed genes.
Provided Output Files:
You will receive a report with the following content:
  • raw counts of the mapping
  • differential analysis results
  • heatmap with top 30 genes
  • sample clustering 

In addition, the raw sequence data, BAM mapping files and a brochure describing some statistical details will be provided.
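
The normalization step mentioned under Bioinformatics Analysis can be illustrated with a small sketch. This page does not state which normalization method is used, so the example below assumes one widely used scheme, median-of-ratios size factors, purely for illustration. The function names, gene names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample (median-of-ratios, assumed method).

    counts: dict mapping gene name -> list of raw read counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Genes with a zero count in any sample are skipped, because their
        # geometric mean across samples would be zero.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is robust against a few strongly
    # regulated genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
raw_counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [10, 20],
    "geneD": [0, 5],
}
factors = size_factors(raw_counts)
normalized = normalize(raw_counts)
```

After normalization, genes A to C have equal values in both samples, so the difference in sequencing depth no longer looks like differential expression.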

 

Examples for Most Important Output Files Provided by Microsynth


Figure 1: Small section from an Excel spreadsheet containing all the raw read counts of a gene expression study comprising 6 samples (two conditions, 3 replicates for each condition). For each sample, the number of reads assigned to each gene of the studied organism is listed.

Figure 2: Small section from an Excel spreadsheet summarizing the main differential analysis results of another gene expression study. Under "Read Counts", the column baseMean lists the average read counts across conditions A and B, whereas baseMeanA and baseMeanB give the average counts for each condition separately. Under "Fold Change", the conventional fold change and the logarithmic fold change between conditions A and B are listed. Under "Statistics", the customer will find the p value as well as the p value adjusted for multiple testing. Under "Significant (>0.05)", differentially expressed genes are flagged based on the adjusted p value.
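
The quantities in Figure 2 can be related to each other with a short sketch. It assumes that the fold change is computed as baseMeanB divided by baseMeanA, that the logarithm is base 2, and that p values are adjusted with the Benjamini-Hochberg procedure. None of these choices is stated on this page, and all numbers are made up.

```python
import math


def fold_change(base_mean_a, base_mean_b):
    """Return (fold change, log2 fold change) of condition B relative to A.

    The direction B/A and the base-2 logarithm are assumptions.
    """
    fc = base_mean_b / base_mean_a
    return fc, math.log2(fc)


def benjamini_hochberg(p_values):
    """Adjust p values for multiple testing (Benjamini-Hochberg, assumed method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that adjusted
    # values never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example values.
fc, log2_fc = fold_change(base_mean_a=50.0, base_mean_b=200.0)
p_values = [0.01, 0.04, 0.03, 0.20]
p_adjusted = benjamini_hochberg(p_values)
significant = [p < 0.05 for p in p_adjusted]
```

In this toy example the second and third genes have raw p values below 0.05 but adjusted p values above it, which is why flagging is based on the adjusted value.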

Figure 3: Heatmap of up- and down-regulated genes. For each studied condition, customers will obtain a heatmap in which the top 30 up- and down-regulated genes are clustered and displayed.
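
How the top genes for such a heatmap might be selected can be sketched as follows. The page does not say how "top" is defined. The example assumes ranking by log2 fold change and adds row-wise z-scoring, a common way to make genes with different expression levels comparable in a heatmap. All names and values are hypothetical.

```python
from statistics import mean, pstdev


def top_regulated(log2_fc, n=30):
    """Return the n most up-regulated and n most down-regulated gene names.

    log2_fc: dict mapping gene name -> log2 fold change.
    Ranking by fold change alone is an assumption for illustration.
    """
    up = sorted((g for g, v in log2_fc.items() if v > 0),
                key=lambda g: -log2_fc[g])[:n]
    down = sorted((g for g, v in log2_fc.items() if v < 0),
                  key=lambda g: log2_fc[g])[:n]
    return up, down


def row_z_scores(values):
    """Centre and scale one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant row carries no pattern
    return [(v - mu) / sd for v in values]


# Hypothetical log2 fold changes.
example_fc = {"g1": 3.2, "g2": -1.5, "g3": 0.4, "g4": -2.8, "g5": 0.0}
up_genes, down_genes = top_regulated(example_fc, n=2)
scaled = row_z_scores([1.0, 2.0, 3.0])
```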

Figure 4: Example of a sample clustering analysis. Customers will also receive a clustered heatmap showing the sample-to-sample distances for a given condition. This analysis is helpful in detecting possible outliers in a sample pool.
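
The sample-to-sample distances shown in such a heatmap can be sketched as below. The transformation and distance measure are not specified on this page. The example assumes log2(count + 1) transformed counts and Euclidean distance, and uses made-up sample names and counts.

```python
import math


def sample_distances(counts, sample_names):
    """Return a dict mapping (sample, sample) -> Euclidean distance.

    counts: dict mapping gene name -> list of counts, one per sample,
    in the same order as sample_names.
    """
    # Build one expression profile per sample on a log2(count + 1) scale,
    # so that a few highly expressed genes do not dominate the distance.
    profiles = {
        name: [math.log2(row[j] + 1) for row in counts.values()]
        for j, name in enumerate(sample_names)
    }
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a in sample_names
        for b in sample_names
    }


# Hypothetical counts: A1 and A2 are replicates, B1 is another condition.
toy_counts = {
    "geneX": [100, 110, 10],
    "geneY": [20, 18, 200],
    "geneZ": [5, 6, 5],
}
distances = sample_distances(toy_counts, ["A1", "A2", "B1"])
```

Replicates of the same condition are expected to lie close together. A sample that is far from its own replicates is a candidate outlier.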
Contact Form
Interested in discussing your NGS project with an expert or in receiving an offer? Then please fill in our NGS contact form.

Related Downloads
AppNote_Illumina_RNASeq


