Normalization against reference housekeeping genes that are expected to be expressed consistently across all samples
After data are extracted from the FASTQ file, the read counts for each gene must be normalized before analysis. We recommend normalizing the read counts for each gene amplicon against endogenous housekeeping genes so that expression levels can be compared accurately across the series of samples. Because housekeeping genes are expected to be expressed at relatively consistent levels in all biological samples, they provide a stable reference against which counts of the regulated genes of interest can be compared between samples. The procedure outlined below can normalize reads against any set of genes whose expression is not expected to change across the samples. Refer to Appendix D, Housekeeping Control Genes, for the list of housekeeping genes we recommend for this analysis.
For housekeeping gene normalization, use the average of the per-sample geometric means of the housekeeping-gene read counts as a common reference, and scale each individual sample to that reference as follows:
- For each sample, calculate the geometric mean of the read counts of the housekeeping control genes.
- Calculate the average of the geometric means across all samples.
- Divide this average by the geometric mean in each sample to get a sample-specific normalization factor.
- Multiply all DriverMap gene counts in a sample by that sample’s normalization factor.
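The four steps above can be sketched in Python as follows. This is a minimal illustration, not Cellecta's own implementation: it assumes the counts are held in a dictionary mapping each sample name to a dictionary of gene-to-read-count values, and the helper names `geometric_mean` and `normalize_to_housekeeping`, as well as the gene names in the usage example, are hypothetical. The geometric mean is computed in log space to avoid overflow, and the sketch assumes every housekeeping count is positive (a zero count would make the geometric mean zero).

```python
import math
from typing import Dict, Iterable

# Hypothetical layout: sample name -> {gene name -> read count}
Counts = Dict[str, Dict[str, float]]


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean computed as exp(mean(log(x))) to avoid overflow."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive counts")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_to_housekeeping(counts: Counts, housekeeping: Iterable[str]) -> Counts:
    """Scale every sample so its housekeeping geometric mean matches the cross-sample average."""
    hk = list(housekeeping)
    # Step 1: geometric mean of the housekeeping-gene counts in each sample.
    gm = {sample: geometric_mean(genes[g] for g in hk) for sample, genes in counts.items()}
    # Step 2: average of those geometric means across all samples.
    reference = sum(gm.values()) / len(gm)
    normalized: Counts = {}
    for sample, genes in counts.items():
        # Step 3: sample-specific normalization factor.
        factor = reference / gm[sample]
        # Step 4: multiply every gene count in the sample by that factor.
        normalized[sample] = {gene: count * factor for gene, count in genes.items()}
    return normalized
```

For example, with placeholder housekeeping genes `HK1` and `HK2`, sample `S1` (`HK1`=100, `HK2`=400) has a geometric mean of 200 and sample `S2` (`HK1`=200, `HK2`=800) has 400, so the reference is 300 and the factors are 1.5 and 0.75. A gene counted 50 in `S1` and 60 in `S2` is scaled to about 75 and 45, respectively. After scaling, both samples have a housekeeping geometric mean of 300.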
After normalization, standard statistical tests can be used to identify differentially expressed genes. For normally distributed data, we recommend a paired t-test for two conditions (appropriate when samples are matched, e.g., the same subjects before and after treatment) or a one-way ANOVA for three or more conditions. Because a separate test is run for every gene, we suggest adjusting the resulting p-values with the Benjamini & Hochberg false discovery rate (FDR) procedure or a similar multiple-testing correction.
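The Benjamini & Hochberg adjustment can be sketched in a few lines of standard-library Python. This sketch assumes the per-gene p-values have already been computed elsewhere, for example with `scipy.stats.ttest_rel` for paired two-condition comparisons or `scipy.stats.f_oneway` for three or more conditions. The helper name `benjamini_hochberg` is hypothetical. The function ranks the p-values, multiplies each by m/rank (m being the number of tests), and then takes a running minimum from the largest p-value downward so that the adjusted values stay monotone and never exceed 1.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values are non-decreasing with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. Genes whose adjusted p-value falls below a chosen FDR threshold (0.05 is a common, but assumed, choice) would then be reported as differentially expressed. If statsmodels is available, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` performs the same adjustment.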