Massively parallel genome engineering followed by pooled growth selections for rapid target discovery in microbes
Tyson Shepherd, INSCRIPTA
Aug 12, 2021
The ability to engineer the entire genome of microorganisms at a massively parallel scale using the Onyx platform can reshape how researchers perform pooled cultivation experiments, such as adaptive laboratory evolution (ALE), and generate data sets. Here, Onyx technology was used to rapidly generate genome-wide knockout and promoter-ladder libraries in E. coli. The resulting libraries were applied to pooled growth cultivations in the presence of four inhibitory compounds common in biomass hydrolysates, quickly producing a rich data set that both confirms known gene functions and reveals novel ones. This is one of many ways in which investigators can leverage the Onyx genome engineering platform.
Among proven approaches for understanding complex biological systems, pooled cultivation strategies have been useful for interrogating large and diverse populations of genetic variants across a range of experimentally relevant growth conditions and evolutionary scenarios. Pooled cultivation methods such as adaptive laboratory evolution (1), chemogenomic profiling (2, 3), and evolutionary engineering (4) have been used successfully for target discovery in both basic and applied research. However, outcomes can be constrained by limited genetic diversity, either in the initial population or because of intrinsically low mutation rates during cultivation. As new high-throughput genome engineering tools emerge, the scope and pace of target discovery using pooled growth selections are expected to accelerate (5). One such tool is the Onyx™ platform from Inscripta™, which enables genome-wide CRISPR editing at scale in an automated benchtop device. Here we describe the use of the Onyx platform for massively parallel, targeted strain engineering in E. coli to generate a genetically diversified seed population for pooled cultivations under selective pressure (Figure 1). This extensive and precise strain engineering strategy enables rapid discovery and ranking of loci whose perturbation either sensitizes cells to, or protects them from, the applied cultivation conditions. In the example presented here, engineered strain populations were grown in the presence of each of four known growth-inhibitory compounds commonly found in biomass hydrolysates: furfural, hydroxymethylfurfural, vanillin, and syringic acid (6).
Reagents and Equipment
Figure 1. Microbial genome engineering workflow using the Onyx benchtop platform. Design: the edit library is designed using InscriptaDesigner™. Engineer cells: cells are engineered using the Onyx system and Onyx genome engineering chemistry. Genotype: cells are genotyped using Onyx assays. Analyze and learn: results are analyzed using InscriptaResolver™.
Rapid Generation of Precision-Engineered E. coli Populations
Genome-wide engineering was performed on E. coli strain MG1655 using Onyx technology. A total of six strain engineering oligonucleotide libraries were designed to target all 4,336 annotated protein-coding genes in the E. coli genome, either for knockout by premature stop codon insertion or for expression modulation by insertion of five small constitutive synthetic promoters of defined expression strengths (7). Only high-quality designs were retained during the design process, yielding promoter designs for 3,676 genes and 3,966 knockout designs (Table 1). Each library design construct included an edit-specific trackable barcode sequence. The six engineered cell libraries were constructed individually using Onyx technology and banked as individual glycerol stocks.
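Because every design carries its own barcode, the design set behaves like a lookup table from barcode to edit. The Python sketch below shows one way such a manifest could be represented and checked for barcode uniqueness and gene coverage; the `Design` record, its field names, and the "KO"/promoter labels are hypothetical and do not reflect the InscriptaDesigner™ output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One editing design (hypothetical record, not an InscriptaDesigner format)."""
    barcode: str    # edit-specific trackable barcode
    gene: str       # targeted gene
    edit_type: str  # "KO" for stop-codon knockout, otherwise a promoter label


def build_barcode_index(designs):
    """Map each barcode to its design, rejecting duplicate barcodes."""
    index = {}
    for d in designs:
        if d.barcode in index:
            raise ValueError(f"duplicate barcode: {d.barcode}")
        index[d.barcode] = d
    return index


def gene_coverage(designs, n_genes):
    """Fraction of annotated genes with >=1 knockout and >=1 promoter design."""
    ko_genes = {d.gene for d in designs if d.edit_type == "KO"}
    promoter_genes = {d.gene for d in designs if d.edit_type != "KO"}
    return len(ko_genes) / n_genes, len(promoter_genes) / n_genes
```

With the counts reported above, promoter designs cover 3,676 / 4,336, or about 85%, of annotated genes; if each of the 3,966 knockout designs targets a distinct gene, knockout coverage is about 91%.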
Table 1. Genome-wide engineering designs.
Pooled Growth Selections in the Presence of Inhibitory Compounds
The engineered E. coli libraries were combined in equal amounts to generate a starting inoculum for pooled cultivation experiments. For each test compound, triplicate baffled 250 ml shake flasks were prepared with 50 ml of sterile supplemented M9 medium containing 1 g/L of that compound. Three negative control flasks containing supplemented M9 medium only were also prepared. All flasks were inoculated with the pooled cell library at a 1:1,000 volume ratio and incubated at 30 °C with shaking at 250 rpm. After 48 hours of growth, five aliquots from each flask were banked as glycerol stocks for later analysis and characterization.
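The cultivation setup implies some simple bookkeeping: five conditions (four compounds plus the untreated control) in triplicate gives 15 flasks, and a 1:1,000 inoculation of 50 ml corresponds to about 50 µl of pooled library per flask. The Python sketch below encodes that arithmetic; it assumes "1:1,000" means an inoculum volume equal to 1/1,000 of the culture volume, and the condition labels are illustrative.

```python
CONDITIONS = ["untreated", "furfural", "hydroxymethylfurfural", "vanillin", "syringic acid"]
REPLICATES = 3
CULTURE_ML = 50.0   # medium per 250 ml baffled flask
DILUTION = 1000     # assumed: inoculum volume = culture volume / 1,000


def inoculum_ul(culture_ml=CULTURE_ML, dilution=DILUTION):
    """Pooled-library volume (microlitres) needed to seed one flask."""
    return culture_ml * 1000.0 / dilution


def flask_layout(conditions=CONDITIONS, replicates=REPLICATES):
    """One (condition, replicate) tuple per shake flask."""
    return [(c, r) for c in conditions for r in range(1, replicates + 1)]


flasks = flask_layout()
total_inoculum_ul = inoculum_ul() * len(flasks)
```

Seeding all 15 flasks therefore needs roughly 750 µl of pooled inoculum, before any allowance for pipetting loss.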
Post-Growth Selection Barcode Sequencing
Plasmid DNA was isolated from the 15 glycerol stocks of pooled cultivation samples using the QIAprep® miniprep kit (QIAGEN®). The extracted plasmid pools were PCR-amplified across the region containing the edit-specific barcode. The amplicons were sequenced in three runs on an Illumina® MiSeq® sequencer according to the manufacturer’s guidelines (1 × 75 bp read per cluster; one sequencing run per biological replicate, each covering all five experimental conditions). Sequencing reads were mapped back to individual editing designs in the original six libraries using the barcodes, establishing the relative abundance of every engineered strain variant in the population within each shake flask.
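The read-to-design mapping step can be sketched in Python as exact barcode lookup followed by normalization to relative abundance. This is an illustration rather than the pipeline used here: the barcode's position within the read (`start`, `length`) is a hypothetical parameter, and real amplicon pipelines may also tolerate sequencing errors in the barcode, which this sketch does not.

```python
from collections import Counter


def count_barcodes(reads, index, start, length):
    """Count reads per design by exact barcode match.

    reads:  iterable of read sequences (str)
    index:  dict mapping barcode -> design identifier
    start, length: hypothetical barcode position within each read
    Returns (Counter of design -> read count, number of unmatched reads).
    """
    counts = Counter()
    unmatched = 0
    for read in reads:
        barcode = read[start:start + length]
        design = index.get(barcode)
        if design is None:
            unmatched += 1
        else:
            counts[design] += 1
    return counts, unmatched


def relative_abundance(counts):
    """Convert raw counts into fractions of all matched reads in one sample."""
    total = sum(counts.values())
    return {design: n / total for design, n in counts.items()} if total else {}
```

Tracking the unmatched count alongside the per-design counts gives a quick check on barcode extraction quality for each flask.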
Correlation of the counts of editing cassettes from barcode sequencing of strain variants across three biological replicates indicates high experimental reproducibility (Figure 2A). Principal component analysis of the populations by treatment condition supports this reproducibility while demonstrating meaningful separation across test conditions relative to both the untreated control and one another (Figure 2B).
Figure 2. Data quality and biological signal. (A) Correlation of the counts of editing cassettes of strain variants across three biological replicates indicates high reproducibility. (B) Principal component analysis of the populations by treatment condition shows high reproducibility while demonstrating meaningful separation across test conditions relative to both the untreated control and one another.
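Replicate agreement of the kind summarized in Figure 2A is commonly quantified as a pairwise correlation of log-transformed counts. The Python sketch below computes Pearson's r on log2(count + 1) for every pair of replicates; the log transform and pseudocount are assumptions, since the figure does not state how counts were transformed, and the principal component analysis is not reproduced.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlations(replicates, pseudocount=1.0):
    """Pairwise Pearson r of log2(count + pseudocount) across replicates.

    replicates: list of count lists, each aligned to the same design order.
    Returns {(i, j): r} for every replicate pair i < j.
    """
    logged = [[math.log2(c + pseudocount) for c in rep] for rep in replicates]
    return {(i, j): pearson(logged[i], logged[j])
            for i, j in combinations(range(len(logged)), 2)}
```

Log-transforming first keeps a handful of very abundant variants from dominating the correlation.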
Enrichment and Depletion Profiles of Engineered Strain Variants in the Presence of Biomass Growth Inhibitors
Comparing the normalized relative abundance of each strain variant under untreated and test conditions yielded a log2 fold change value for every variant in each of the four test cultivation conditions. Analyzing these enrichment and depletion profiles by library type (knockout or promoter) and by inhibitory compound made it possible to dissect how engineered strain variants behaved under each condition (Figure 3).
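The methods note that normalization and fold changes were computed with DESeq2 and ashr shrinkage. As a rough illustration, the Python sketch below implements the median-of-ratios size factors that DESeq2 uses by default, then substitutes a simple unshrunken log2 ratio of mean normalized counts for DESeq2's model-based estimate. It omits the dispersion model, statistical testing, and the ashr shrinkage step, and the pseudocount of 0.5 is an arbitrary choice.

```python
import math
from statistics import median


def size_factors(samples):
    """Median-of-ratios size factors (the DESeq2 default normalization).

    samples: list of count lists, one per sample, aligned by design.
    Designs with a zero count in any sample are excluded from the estimate.
    """
    n_designs = len(samples[0])
    log_geo_means = []
    for i in range(n_designs):
        values = [s[i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = []
    for s in samples:
        log_ratios = [math.log(s[i]) - g
                      for i, g in enumerate(log_geo_means) if g is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_fold_changes(treated, control, pseudocount=0.5):
    """Unshrunken log2 fold change of mean normalized counts (treated vs control)."""
    samples = treated + control
    factors = size_factors(samples)
    normalized = [[c / f for c in s] for s, f in zip(samples, factors)]
    t_norm, c_norm = normalized[:len(treated)], normalized[len(treated):]
    changes = []
    for i in range(len(samples[0])):
        t_mean = sum(s[i] for s in t_norm) / len(t_norm)
        c_mean = sum(s[i] for s in c_norm) / len(c_norm)
        changes.append(math.log2((t_mean + pseudocount) / (c_mean + pseudocount)))
    return changes
```

Because the size factor is a median over designs, a minority of strongly enriched variants does not shift the normalization the way simple total-count scaling would.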
A total of 2,404 promoter edits and 1,405 knockouts, across 1,313 and 1,076 genes respectively, were significantly enriched or depleted in response to furfural. This broad response indicates that perturbation of at least a quarter of all genes in the genome produces a significant response, consistent with previous work in yeast showing that nearly all genes affect at least one phenotype (8). For vanillin, 716 promoter edits and 1,109 knockouts, across 484 and 924 genes respectively, were significantly enriched or depleted, implicating a similarly broad swath of genes in vanillin tolerance.
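The "at least a quarter" statement follows from a simple bound: without knowing how the promoter-responsive and knockout-responsive gene sets overlap, their union must contain at least as many genes as the larger set and at most the sum of both. A short Python check of that bound against the 4,336 annotated genes:

```python
N_GENES = 4336  # annotated protein-coding genes in MG1655, as stated in the text


def union_fraction_bounds(set_sizes, n_genes=N_GENES):
    """Lower and upper bounds on the fraction of genes in the union of
    several gene sets when only the set sizes, not their overlaps, are known."""
    lower = max(set_sizes) / n_genes
    upper = min(sum(set_sizes), n_genes) / n_genes
    return lower, upper


furfural = union_fraction_bounds([1313, 1076])  # promoter genes, knockout genes
vanillin = union_fraction_bounds([484, 924])
```

For furfural the responsive genes make up between about 30% and 55% of the genome, so the "at least a quarter" statement holds regardless of overlap; for vanillin the corresponding range is about 21% to 32%.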
Figure 3. Global enrichment and depletion profiles. (A) Enrichment or depletion of individual cassettes under furfural selective pressure. Fold changes are calculated relative to an untreated negative control. Significantly enriched or depleted cassettes are highlighted. Promoter edits and knockouts show a similar trend of overall fitness increase. (B) Gene set enrichments for promoter edits and knockouts show a pattern of increased tolerance under fermentative metabolism with concomitant knockdown of ATP synthesis in aerobic metabolism. (C) Enrichment or depletion of individual cassettes under vanillin selective pressure. Knockouts show a predominantly negative pattern, with only a handful conferring vanillin resistance. (D) Promoter edits confer tolerance via overexpression of sugar transport and components of iron homeostasis. Knockouts are broadly deleterious, hitting non-essential but conditionally restrictive components of the translation machinery and cell wall biosynthesis.
Genes previously identified for their roles in furfural tolerance (9, 10, 11) or vanillin response (12) were observed among the edited loci in strains responding to those treatments in this experiment, providing a subset of validation data within the larger data set (Figure 3A and 3C). For example, in the furfural tolerance experiment, we observed significant enrichment of two known fitness-conferring knockouts of the NADPH-dependent reductase genes yqhD (2.95 log2 fold change, 8.19E-75 adj pval) and dkgA (3.54 log2 fold change, 2.96E-110 adj pval). In the vanillin tolerance experiment, knockout of gltA significantly decreased fitness (-1.09 log2 fold change, 2.66E-07 adj pval), while perturbation of gltA expression produced a modest fitness increase (0.32 log2 fold change, 8.55E-02 adj pval) that falls short of the conventional 0.05 significance threshold. These outcomes are consistent with gltA catalytic variants having repeatedly been identified by adaptive laboratory evolution (12).
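Log2 fold changes are easier to compare once converted back to linear fold changes. The Python helper below does that conversion and flags significance against a threshold; the 0.05 cutoff is an assumed convention, not one stated in this note, and the 0.32 value reported for gltA expression perturbation is treated as a log2 fold change like the other values in the paragraph.

```python
def linear_fold(log2_fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


def summarize(name, log2_fc, padj, alpha=0.05):
    """One-line summary; alpha is an assumed significance cutoff."""
    direction = "enriched" if log2_fc > 0 else "depleted"
    return (f"{name}: {linear_fold(log2_fc):.2f}-fold ({direction}), "
            f"padj={padj:.2e}, significant={padj < alpha}")


for row in [("yqhD KO, furfural", 2.95, 8.19e-75),
            ("dkgA KO, furfural", 3.54, 2.96e-110),
            ("gltA KO, vanillin", -1.09, 2.66e-07),
            ("gltA promoter, vanillin", 0.32, 8.55e-02)]:
    print(summarize(*row))
```

In linear terms, the yqhD and dkgA knockouts are roughly 7.7- and 11.6-fold enriched, the gltA knockout falls to about 0.47 of its control abundance, and the gltA promoter variants rise only about 1.25-fold, with an adjusted p-value above 0.05.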
Target Identification: Rapid Discovery of Thousands of Novel Genotype-Phenotype Links for Biomass Hydrolysate Inhibitors
The availability of a ladder of gene expression variants, paired with the corresponding knockout strain, for nearly every gene makes it possible to obtain a nuanced view of the mechanisms of inhibitor tolerance in E. coli. Clustering genes according to their enrichment/depletion response profiles reveals distinct groups that share common responses to furfural exposure (Figure 4A). The distribution of enrichment/depletion values for all genes in two of the four clusters is shown as boxplots in Figure 4B and 4C.
Figure 4. Coordinated genotypic response to furfural stress. (A) Hierarchical density-based clustering with HDBSCAN following UMAP dimensionality reduction of the gene-based log2 fold changes between an untreated control and furfural-treated selections. Four distinct gene clusters are readily identified, each with a unique profile of response to furfural across the knockout and promoter ladders. The positions of key genes known to be involved in furfural tolerance (10, 11) are shown within their corresponding clusters. (B) A cluster of genes conferring furfural tolerance, characterized by decreased fitness upon gene knockout and increased fitness upon overexpression. The key genes in this cluster (pntA, fucO) are known to increase furfural tolerance upon overexpression (11). The table shows genes enriched in this cluster. (C) A cluster of genes conferring furfural susceptibility, characterized by increased fitness upon gene knockout and decreased fitness upon overexpression. The key genes in this cluster (metJ, yqhC) have been shown to regulate furfural tolerance (10, 13). (D) Mechanism of furfural toxicity (11). NADPH-dependent oxidoreductases reduce furfural to the less toxic furfuryl alcohol. This NADPH-dependent detoxification depletes cellular NADPH, resulting in stunted growth.
As shown in Figure 4B, genes in a 246-member cluster (gray cluster in Figure 4A) show growth inhibition upon knockout in the presence of furfural, whereas their overexpression confers a growth benefit. Importantly, two genes previously validated for furfural tolerance (11), pntA and fucO, fall into this cluster. The fucO gene encodes an NADH-dependent reductase whose overexpression improves furfural tolerance by compensating for the drain of NADPH during furfural detoxification. The pntA gene encodes a subunit of the primary membrane-bound transhydrogenase, which transfers reducing equivalents from NADH to NADP+ to generate NADPH and thus helps mitigate NADPH starvation. This cluster is also heavily enriched for genes involved in acyltransferase activity and in general transcription. Many of the transcription-associated genes encode well-studied, broad-acting regulators such as arcA, pdhR, ihfA, iclR, and xylR, along with many other regulatory proteins. This result suggests that, at least for single-gene perturbations, knockout of transcription factors likely disrupts regulatory programs needed for survival under stressful conditions. Another cluster of 201 genes (orange cluster in Figure 4A) confers furfural tolerance when deleted and decreases fitness when overexpressed (Figure 4C). Two additional key furfural response genes, yqhC and metJ, fall into this cluster. Because YqhC activates transcription of yqhD (13), deleting yqhC predictably increases fitness by limiting yqhD expression, whereas perturbing yqhC expression decreases fitness. The metJ gene follows the same pattern; it has been implicated in furfural tolerance through the strong NADPH requirement of sulfur assimilation (10).
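The two clusters described above are distinguished mainly by the sign of the knockout response relative to the overexpression response. The clustering itself used UMAP and HDBSCAN on full profiles; the Python function below is only a simplified, rule-based stand-in that labels a gene from the sign of its knockout log2 fold change and the mean of its promoter-ladder log2 fold changes. Averaging the ladder and the `threshold` parameter are assumptions made for illustration.

```python
def classify_profile(ko_lfc, promoter_lfcs, threshold=0.0):
    """Label a gene by its knockout vs. promoter-ladder response pattern.

    ko_lfc:        log2 fold change of the knockout variant
    promoter_lfcs: log2 fold changes of the promoter-ladder variants
    threshold:     minimum |effect| required on each side (assumed parameter)
    """
    if not promoter_lfcs:
        return "unclassified"
    mean_promoter = sum(promoter_lfcs) / len(promoter_lfcs)
    if ko_lfc < -threshold and mean_promoter > threshold:
        # Knockout hurts, overexpression helps (cf. the gray cluster).
        return "tolerance"
    if ko_lfc > threshold and mean_promoter < -threshold:
        # Knockout helps, overexpression hurts (cf. the orange cluster).
        return "susceptibility"
    return "other"
```

A sign rule like this cannot recover finer structure that density-based clustering can find, such as genes whose response changes direction along the promoter ladder, which is one reason to cluster full profiles instead.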
Massively Parallel Genome-Edited Libraries Provide Distinct Advantages Compared to Traditional Omics Experiments
Large-scale measurements of gene expression and protein abundance can readily identify genes that are involved in tolerance mechanisms. However, translating observed up- or downregulation into actionable information can be challenging. The high-throughput genotype-phenotype connections made possible by the Onyx platform provide a large amount of causal information in a single experiment. For example, proteomic analyses have indicated that nanA is downregulated 3.7-fold under vanillin stress, suggesting that a knockout or knockdown of nanA would improve vanillin tolerance. However, the work presented here indicates that a knockout of nanA is highly deleterious (-0.9 log2 fold change, 3.62E-15 adj pval), whereas its overexpression is strongly beneficial (1.02 log2 fold change, 6.54E-16 adj pval). Such direct causal measurements can be especially valuable for poorly characterized genes, and for industrial production conditions that diverge from the standard experimental conditions under which the responses of well-characterized genes are known.
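The contrast drawn in the nanA example can be written as a simple consistency check between an expression-based prediction and the measured perturbation phenotype. In the Python sketch below, the rule that a stress-downregulated gene is predicted to benefit from knockout is a deliberately naive, hypothetical heuristic used only to mirror the reasoning in the text.

```python
import math


def omics_prediction(fold_change_under_stress):
    """Naive prediction from expression data (hypothetical heuristic)."""
    if fold_change_under_stress < 1:
        return "knockout helps"
    if fold_change_under_stress > 1:
        return "overexpression helps"
    return "no prediction"


def perturbation_verdict(ko_lfc, oe_lfc):
    """What the measured knockout and overexpression log2 fold changes indicate."""
    if ko_lfc < 0 < oe_lfc:
        return "overexpression helps"
    if oe_lfc < 0 < ko_lfc:
        return "knockout helps"
    return "ambiguous"


# nanA: 3.7-fold downregulation versus measured knockout/overexpression effects.
expression_log2 = math.log2(1 / 3.7)
predicted = omics_prediction(1 / 3.7)
observed = perturbation_verdict(ko_lfc=-0.9, oe_lfc=1.02)
concordant = predicted == observed
```

Here the 3.7-fold downregulation (about -1.89 on a log2 scale) points one way and the measured knockout and overexpression effects point the other, which is the kind of discordance that expression data alone cannot reveal.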
Normalization and fold-change calculation were carried out with DESeq2 (14) and the ‘ashr’ adaptive shrinkage method (15). Functional classification of the broad response categories for knockout or promoter perturbation (Figure 3B, 3D) was carried out with a gene set Z-score-based method (9) and limma (16). Combined promoter and knockout profiles were further clustered by UMAP (17) dimensionality reduction followed by HDBSCAN clustering (18). GO term enrichment for the members of each cluster was assessed with a hypergeometric test. Because an expected dropout rate arising from known incomplete library coverage leaves the gene universe uncertain, multiple testing correction was not applied to these enrichment tests. Future releases of Inscripta’s trackability reagents are expected to enable proper assessment of the gene universe for enrichment testing and to support procedures equivalent to those used for omics measurements.
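The GO term enrichment step uses a hypergeometric test. A minimal stdlib Python version of the upper-tail probability is sketched below; the argument names are illustrative, and the result depends on the choice of gene universe `M`, which ideally should be the genes actually represented in the libraries. Like the analysis described, it applies no multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: genes in the universe (e.g. genes represented in the libraries)
    n: universe genes annotated with the GO term
    N: genes in the cluster
    k: cluster genes annotated with the GO term
    """
    total = comb(M, N)
    upper = min(n, N)
    # math.comb returns 0 when the lower index exceeds the upper, so
    # impossible outcomes contribute nothing to the sum.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` should give the same value, since SciPy's `sf` returns P(X > k) rather than P(X >= k).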
Notes and Comments
Copyright © 2020 Inscripta, Inc. For Research Use Only. Not for use in diagnostic procedures. Inscripta, Onyx, and the Inscripta logo are trademarks of Inscripta, Inc. in the United States and/or other countries. QIAprep is a registered trademark of QIAGEN. MiSeq is a trademark of Illumina, Inc. All other trademarks are the property of their respective owners.
1. Sandberg, T.E., Salazar, M.J., Weng, L.L., Palsson, B.O., and Feist, A.M. (2019). The emergence of adaptive laboratory evolution as an efficient tool for biological discovery and industrial biotechnology. Metab. Eng. 56, 1–16.
2. Giaever, G., Flaherty, P., Kumm, J., Proctor, M., Nislow, C., Jaramillo, D.F., Chu, A.M., Jordan, M.I., Arkin, A.P., and Davis, R.W. (2004). Chemogenomic profiling: identifying the functional interactions of small molecules in yeast. Proc. Natl. Acad. Sci. U. S. A. 101, 793–798.
3. Girgis, H.S., Hottes, A.K., and Tavazoie, S. (2009). Genetic architecture of intrinsic antibiotic susceptibility. PLoS One 4, e5629.
4. Shepelin, D., Hansen, A.S.L., Lennen, R., Luo, H., and Herrgard, M.J. (2018). Selecting the Best: Evolutionary Engineering of Chemical Production in Microbes. Genes 9.
5. Garst, A.D., Bassalo, M.C., Pines, G., Lynch, S.A., Halweg-Edwards, A.L., Liu, R., Liang, L., Wang, Z., Zeitoun, R., Alexander, W.G., et al. (2017). Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat. Biotechnol. 35, 48–55.
6. Zha, Y., Muilwijk, B., Coulier, L., and Punt, P.J. (2012). Inhibitory Compounds in Lignocellulosic Biomass Hydrolysates during Hydrolysate Fermentation Processes. J Bioprocess Biotechniq 2:1.
8. Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St Onge, R.P., Tyers, M., Koller, D., et al. (2008). The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320, 362–365.
9. Lee, C., Kim, I., Lee, J., Lee, K.-L., Min, B., and Park, C. (2010). Transcriptional activation of the aldehyde reductase YqhD by YqhC and its implication in glyoxal metabolism of Escherichia coli K-12. J. Bacteriol. 192, 4205–4214.
10. Miller, E.N., Jarboe, L.R., Turner, P.C., Pharkya, P., Yomano, L.P., York, S.W., Nunn, D., Shanmugam, K.T., and Ingram, L.O. (2009). Furfural inhibits growth by limiting sulfur assimilation in ethanologenic Escherichia coli strain LY180. Appl. Environ. Microbiol. 75, 6132–6141.
11. Wang, X., Yomano, L.P., Lee, J.Y., York, S.W., Zheng, H., Mullinnix, M.T., Shanmugam, K.T., and Ingram, L.O. (2013). Engineering furfural tolerance in Escherichia coli improves the fermentation of lignocellulosic sugars into renewable chemicals. Proc. Natl. Acad. Sci. U. S. A. 110, 4021–4026.
12. Pattrick, C.A., Webb, J.P., Green, J., Chaudhuri, R.R., Collins, M.O., and Kelly, D.J. (2019). Proteomic Profiling, Transcription Factor Modeling, and Genomics of Evolved Tolerant Strains Elucidate Mechanisms of Vanillin Toxicity in Escherichia coli. mSystems 4.
13. Turner, P.C., et al. (2011). YqhC regulates transcription of the adjacent Escherichia coli genes yqhD and dkgA that are involved in furfural tolerance. J. Ind. Microbiol. Biotechnol. 38, 431–439.
14. Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
15. Stephens, M. (2017). False discovery rates: a new deal. Biostat. Oxf. Engl. 18, 275–294.
16. Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47.
17. McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat].
18. Rahman, M.F., et al. (2016). HDBSCAN: Density based Clustering over Location Based Services. arXiv:1602.03730 [cs].