Unambiguous Sequenceable Barcodes
Quality Control of the libraries and the final screening representation analysis are greatly facilitated by the incorporation of easily sequenced barcodes in each shRNA construct. The barcodes enable unambiguous identification of each shRNA species with Next-Generation Sequencing (NGS). Depending on the shRNA library that you have chosen, the barcodes and flanking primers will vary. For example, the barcodes in the DECIPHER shRNA Libraries are 18 nucleotides long, while the barcodes in the Human Genome-Wide shRNA Library are 22 nucleotides long. Upon lentiviral transduction, barcodes integrate into the genomic DNA along with the shRNA expression cassette and are permanently present but not expressed in the cell. Lastly, some libraries contain clonal barcodes, which enable tracking of individual cell clones expressing specific shRNA sequences. These allow for a wider variety of screening protocols that involve cell proliferation, differentiation, migration, metastasis, or apoptosis in specific clones.
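As a rough illustration of how barcode reads can be turned into per-shRNA counts, the sketch below locates a known flanking sequence in each read, takes the fixed-length barcode that follows it, and looks it up in a reference table. The flanking sequence, the barcode sequences, and the function name `count_barcodes` are hypothetical placeholders chosen for this example; the actual barcode and primer sequences are those given in the documentation for your library.

```python
from collections import Counter


def count_barcodes(reads, barcode_to_shrna, left_flank, barcode_len):
    """Count reads per shRNA using the barcode that follows a known flank.

    Only exact barcode matches are counted; reads without the flank or
    with an unknown barcode are tallied as unmatched.
    """
    counts = Counter()
    unmatched = 0
    for read in reads:
        pos = read.find(left_flank)
        if pos == -1:
            unmatched += 1  # flanking sequence not found in this read
            continue
        start = pos + len(left_flank)
        barcode = read[start:start + barcode_len]
        shrna = barcode_to_shrna.get(barcode)
        if shrna is None:
            unmatched += 1  # barcode is truncated or not in the reference
        else:
            counts[shrna] += 1
    return counts, unmatched


# Toy data: every sequence below is made up for illustration only.
LEFT_FLANK = "ACCGGT"  # hypothetical flanking sequence
BARCODES = {
    "ACGTACGTACGTACGTAC": "shRNA_A",  # 18-nt example barcode
    "TTGCAATTGCAATTGCAA": "shRNA_B",  # 18-nt example barcode
}
reads = [
    "GG" + LEFT_FLANK + "ACGTACGTACGTACGTAC" + "TTT",
    LEFT_FLANK + "TTGCAATTGCAATTGCAA",
    LEFT_FLANK + "ACGTACGTACGTACGTAC" + "A",
    "NNNNNNNNNNNN",  # read with no recognizable flank
]

counts, unmatched = count_barcodes(reads, BARCODES, LEFT_FLANK, 18)
print(dict(counts), "unmatched:", unmatched)
```

This sketch uses exact matching only. A production pipeline would typically also handle sequencing errors and quality filtering, which are omitted here.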
Please refer to the User Manual and Product Certificate (PC or PAC) for the specific library you are using for detailed information on the barcodes. An example barcode structure is shown below.
Representation Levels of Individual shRNA Sequences
Cellecta designs and constructs pooled shRNA libraries using proven library construction procedures, not by re-amplifying and mixing pre-made individual shRNA constructs. As a result, it is possible to obtain a narrow representation range for virtually all shRNAs. The use of our optimized and unambiguous barcodes in combination with NGS enables Cellecta to ensure that more than 99% of shRNA-encoding inserts are present in every library and that the representation frequencies of 80-90% of them fall within a 10-fold range.
In the shRNA Representation Histogram figure below, the upper panel shows a pooled library of 27,000 shRNAs with very good representation. Virtually all the shRNAs are detected at between 100 and 1,000 copies in 20 million reads. Thus, for about 90% of the shRNAs, there is just a 10-fold difference between the most and least represented, and the library has a relatively balanced representation of all shRNAs. The lower panel, in contrast, shows a poor library in which almost half of the shRNAs are present at fewer than 100 copies while the others are very highly represented. Overall, the distribution is very broad, and readable signals can be obtained for only about half of the shRNAs in that library.
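One way to put numbers on a representation histogram like this is to compute the fraction of expected shRNAs that were detected at all, and the largest fraction whose read counts fit inside a single 10-fold window. The sketch below is a minimal illustration under those assumptions; the function names and the toy counts are hypothetical, and this is not presented as Cellecta's own quality control procedure.

```python
def fraction_present(counts):
    """Fraction of expected shRNAs detected with at least one read."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for c in counts if c > 0) / len(counts)


def best_fold_window_fraction(counts, fold=10.0):
    """Largest fraction of all expected shRNAs whose read counts fit
    inside one window spanning at most `fold`-fold (max <= fold * min).

    counts: one read count per expected shRNA (0 if undetected).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    present = sorted(c for c in counts if c > 0)
    best = 0
    left = 0
    # Slide a window over the sorted counts, shrinking it from the left
    # whenever the highest count exceeds fold * the lowest count.
    for right, high in enumerate(present):
        while high > fold * present[left]:
            left += 1
        best = max(best, right - left + 1)
    return best / len(counts)


# Toy read counts for a 10-shRNA library (illustrative values only).
toy_counts = [100, 150, 200, 400, 800, 1000, 5, 0, 20000, 300]

present_fraction = fraction_present(toy_counts)
window_fraction = best_fold_window_fraction(toy_counts, fold=10.0)
print(f"detected: {present_fraction:.0%}, within 10-fold: {window_fraction:.0%}")
```

In this toy example, seven of the ten shRNAs (those with 100 to 1,000 reads) fit in one 10-fold window, while the shRNAs with 0, 5, and 20,000 reads fall outside it.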
This definitive representation data at the start of a screen provides a baseline for the analysis, which looks for shRNAs that significantly increase or decrease during screening and so indicate relevant targets. With a poorly defined distribution, it is difficult to distinguish signal from noise in any screening assay, or even to tell which shRNAs are actually missing from the screen. In other words, you need this data to know what is truly being screened.
Quantifiable Next-Generation Sequencing (NGS)
Next-Gen Sequencing (NGS) significantly outperforms the hybridization-based approach for identification of individual shRNA species because of the high-quality “digital” expression data generated by using barcodes. Even with optimized barcode sequences, array hybridization suffers from a limited dynamic range of approximately 2 orders of magnitude, which results in a loss of as much as 30% of the signals because they fall outside its effective range. Spot-to-spot cross-hybridization on arrays also produces significant noise. This noise does not occur with NGS, where virtually every shRNA in the population is detected and counted, from those present in only a few copies to those present in several million. Differences in shRNA species between control and test populations are very easily detected and statistically analyzed, so that hits can be confidently identified.
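To illustrate how digital read counts from control and test populations might be compared, the sketch below scales each sample to reads per million, adds a pseudocount so that shRNAs with zero reads do not cause division by zero, and reports a log2 fold change per shRNA. The function name, the pseudocount value, and the toy counts are assumptions made for this example. The calculation gives an effect size only; it is not a statistical test and is not presented as the analysis method used for any particular library.

```python
import math


def log2_fold_changes(control, test, pseudocount=1.0):
    """Per-shRNA log2(test / control) after reads-per-million scaling.

    control, test: dicts mapping shRNA name -> raw read count.
    shRNAs missing from `test` are treated as having zero reads.
    """
    control_total = sum(control.values())
    test_total = sum(test.values())
    if control_total == 0 or test_total == 0:
        raise ValueError("each sample needs at least one read")
    result = {}
    for shrna, control_reads in control.items():
        # Scale to reads per million so samples of different depth compare.
        c = control_reads / control_total * 1e6 + pseudocount
        t = test.get(shrna, 0) / test_total * 1e6 + pseudocount
        result[shrna] = math.log2(t / c)
    return result


# Toy counts (illustrative values only).
control_counts = {"shRNA_A": 500, "shRNA_B": 500}
test_counts = {"shRNA_A": 900, "shRNA_B": 100}

lfc = log2_fold_changes(control_counts, test_counts)
for name, value in sorted(lfc.items()):
    print(f"{name}: {value:+.2f}")
```

In this toy example shRNA_A is enriched in the test population (positive value) and shRNA_B is depleted (negative value). A real analysis would also use replicates and a statistical model to decide which changes are significant.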
shRNA Sequence Design (Purposeful Mismatches)
Cellecta has developed its own in-house shRNA design algorithm. It draws on internal studies primarily focused on the most functionally effective structural features (e.g., length, loop size, and mismatches), combined with published information on sequence preferences and with known sequences that have been shown to be effective for a particular target.