Discovery and Utilization of Industrial Enzymes in the Era of Big Data
Big data has ushered in an era of major transformation. It is changing how we live and how we understand the world, and it has become a source of new inventions and services. In biotechnology, the sharp fall in sequencing costs means that gene sequences can now be obtained at a rate that was unimaginable 10 years ago. How can industrially useful target enzyme genes, and the active enzyme proteins they encode, be identified and obtained quickly from these massive genetic data resources? Genome mining addresses this question. Starting from the need to catalyze a specific reaction, sequences of relevant enzymes reported in the literature are used as gene probes for sequence comparison against genome databases, and the sequences encoding homologous enzymes are retrieved. The candidate enzymes are then heterologously expressed in batches and screened at high throughput, ultimately yielding new biocatalysts with better catalytic performance.
With the rapid development of structural biology, our understanding of protein structure has steadily deepened. Together with the large amount of structural and functional information provided by genomics and proteomics, this has moved the molecular engineering of enzymes from random mutagenesis toward more precise methods such as semi-rational and rational design; computers can even be used to design enzymes that do not exist in nature. In short, designing and preparing biocatalysts has become progressively easier, so meeting the increasingly urgent demand for specific enzymes in various industries has become more feasible.
This article starts from today's bioinformatics big data and proposes an approach to industrial enzyme screening that differs from traditional methods. On that basis, it briefly introduces several new strategies for designing and mining industrial enzymes.
1. Mining target enzyme genes from sequenced microbial genomes
With the rapid development of gene sequencing technology, more and more microbial genomes have been sequenced. For some genes, the potential function of the enzyme encoded by the open reading frame has been predicted but not necessarily confirmed experimentally. For a large number of other open reading frames, the encoded enzyme has been neither annotated nor studied experimentally. This suggests two routes. On the one hand, an annotated putative enzyme gene can be cloned and expressed directly, and the required candidate biocatalyst obtained by activity assay. On the other hand, unannotated open reading frames can be analyzed and compared with the conserved sequences of reported enzymes of the same type to find new coding sequences with the potential target function; cloning and expressing these can yield a target biocatalyst with a completely new structure or function. The latter route is riskier (lower success rate), but it is more innovative and makes intellectual property easier to obtain.
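As a rough illustration of the second route, the sketch below translates a DNA sequence in all six reading frames, collects open reading frames, and flags those whose protein contains a conserved motif. The motif used here, G-x-S-x-G (the pentapeptide around the catalytic serine of many lipases and esterases), is only an example. The function names, the minimum-length cutoff, and the toy sequence are assumptions made for this sketch, not part of any published pipeline; real projects would normally rely on dedicated gene-prediction and profile-search tools.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Example conserved motif (assumption for this sketch): G-x-S-x-G.
MOTIF = re.compile(r"G.S.G")


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_aa=10):
    """Yield (strand, frame, protein) for Met-to-stop ORFs in all six frames."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Every piece except the last one is terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    yield strand, frame, segment[start:]


def motif_hits(dna, motif=MOTIF, min_aa=10):
    """Yield (strand, frame, protein, motif_position) for ORFs with the motif."""
    for strand, frame, protein in find_orfs(dna, min_aa):
        match = motif.search(protein)
        if match:
            yield strand, frame, protein, match.start()


if __name__ == "__main__":
    # Toy sequence invented for illustration; it is not a real gene.
    toy = "ATGAAAACCCTGGCGGGCCATAGCTTTGGCGGCGCGCTGGCGTATTAA"
    for hit in motif_hits(toy):
        print(hit)
```

A motif match only marks an ORF as a candidate; as the text notes, function still has to be confirmed by cloning, expression, and activity assays.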
2. Gene mining based on probe enzyme sequences
Once the gene sequence of an enzyme that catalyzes a certain type of reaction has been reported in the literature, that sequence can be used as a gene probe (or template) to search public genome databases for candidate enzymes homologous to the probe. Primers are then designed from the retrieved gene sequences, the DNA encoding these enzymes is obtained by PCR amplification, and the genes are cloned and expressed. Finally, the expressed enzymes are screened for activity against the target substrate, which may yield the required biocatalyst with the specific catalytic function.
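The first two steps of this workflow can be sketched in a few lines. The example below ranks candidate protein sequences by k-mer Jaccard similarity to the probe, which is a deliberately crude stand-in for an alignment-based homology search such as BLAST, and then takes the two ends of a retrieved gene as naive PCR primers. All names, the similarity threshold, and the toy sequences are assumptions for illustration; real primer design would also check melting temperature, GC content, and secondary structure, and would usually add restriction sites for cloning.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def rank_candidates(probe, database, k=3, min_score=0.2):
    """Return (score, name) pairs for database entries similar to the probe,
    best first. `database` maps a name to a protein sequence."""
    scored = ((jaccard(probe, seq, k), name) for name, seq in database.items())
    return sorted((pair for pair in scored if pair[0] >= min_score), reverse=True)


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def naive_primers(gene, length=18):
    """Take the 5' end as forward primer and the reverse complement of the
    3' end as reverse primer. No thermodynamic checks are performed."""
    if len(gene) < length:
        raise ValueError("gene is shorter than the requested primer length")
    return gene[:length], reverse_complement(gene[-length:])


if __name__ == "__main__":
    # Toy data invented for illustration; these are not real enzymes.
    probe = "MKTLAGHSFGGALAY"
    database = {
        "candidate_a": "MKTLAGHSFGGALAF",
        "candidate_b": "MQQQPPPWWWCCCDDD",
    }
    for score, name in rank_candidates(probe, database):
        print(f"{name}: {score:.2f}")

    gene = "ATGAAAACCCTGGCGGGCCATAGCTTTGGCGGCGCGCTGGCGTATTAA"
    forward, reverse = naive_primers(gene)
    print("forward:", forward)
    print("reverse:", reverse)
```

K-mer overlap is used here only because it fits in a short self-contained example; it is far less sensitive than alignment for distant homologs, which are often the more interesting candidates in gene mining.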
3. New enzyme gene mining based on the combination of sequence and structure information
The two techniques described above, cloning enzyme genes directly from sequenced microbial genomes for heterologous expression and mining gene databases for target enzymes using a probe enzyme sequence, are now mature and have been applied with good results.
Their premise, however, is that either the function of the unknown enzyme has already been predicted or the gene sequence of an enzyme that catalyzes conversion of the specific substrate has been publicly reported.
For some substrates, moreover, enzymes mined using the sequences of enzymes that catalyze similar reactions often cannot convert the target substrate, or can convert it but without reaching the desired performance. Combining sequence-based gene mining with structural analysis is expected to significantly improve the efficiency of gene mining.