Engineering new functionality into biology is inherently high-risk: we experience many failures along the way, and interpreting and learning from them is an important part of the process. To mitigate the risks and improve the scope and scale of what we can engineer, we are always experimenting with new tools and technologies for biomolecular engineering. We recently developed a deep learning model to help guide library design for directed evolution; a schematic of the algorithm, depicting the data pipeline and neural network architecture, is shown below. Given high-resolution structural data, this model classifies wild-type residues with high accuracy. Wild-type residues that are assigned a very low probability deviate from the structural or chemical consensus and may be locally destabilizing. Several discrete changes made to the original neural network framework have a significant effect on classification accuracy. *Note that normalizing the amino acid abundance of the training data increased the size of the dataset by ~4-fold. Bias in wild-type amino acid classification is shown in a confusion matrix (lower right). The structurally unique amino acids Gly and Pro are assigned as wild-type with very high probability.
Using this algorithm, we have successfully engineered several diverse model proteins for improved folding and activity in vivo. Other active areas of technology and method development include using live-cell emulsions to compartmentalize interesting biochemistry and expanding our use of robotics and automated systems.
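As a rough illustration of how low-probability wild-type residues might be picked out as candidates for a directed-evolution library, here is a minimal Python sketch. It is not the model or pipeline described above: it assumes only that some classifier has already produced a probability for each amino acid at every position. The function name `flag_low_probability_residues`, the 0.05 threshold, and the toy data are all hypothetical.

```python
def flag_low_probability_residues(sequence, probabilities, threshold=0.05):
    """Return flagged residues as (position, wild_type, p_wild_type, top_prediction).

    sequence: wild-type sequence in one-letter codes.
    probabilities: one dict per position mapping amino acid -> predicted probability
        (assumed to come from a structure-based classifier; not computed here).
    threshold: illustrative cutoff chosen for this sketch, not a published value.
    """
    if len(sequence) != len(probabilities):
        raise ValueError("sequence and probabilities must have the same length")

    flagged = []
    for index, (wild_type, dist) in enumerate(zip(sequence, probabilities)):
        # Probability the classifier assigns to the residue actually present.
        p_wild_type = dist.get(wild_type, 0.0)
        if p_wild_type < threshold:
            # The most probable amino acid at this position.
            top_prediction = max(dist, key=dist.get)
            flagged.append((index + 1, wild_type, p_wild_type, top_prediction))

    # Least probable wild-type residues first.
    return sorted(flagged, key=lambda item: item[2])


# Toy, made-up probabilities for a three-residue sequence (illustration only).
toy_sequence = "GKP"
toy_probabilities = [
    {"G": 0.97, "A": 0.03},
    {"K": 0.02, "L": 0.70, "I": 0.28},
    {"P": 0.97, "A": 0.03},
]
candidates = flag_low_probability_residues(toy_sequence, toy_probabilities)
```

In this toy case only the lysine at position 2 falls below the cutoff, so it would be the single position proposed for diversification. In practice the threshold and the number of positions to vary would depend on the library size that can be screened.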