Organisms are composed of thousands of different proteins, each of which is encoded by a specific gene. For a cell type to acquire its unique identity, form and function, the right genes must be switched on by DNA elements called "enhancers". Scientists have long tried to work out how enhancers do this. Now, in a new study, Alexander Stark's lab at the Research Institute of Molecular Pathology (IMP) at the Vienna BioCenter in Austria and Eileen Furlong's lab at the European Molecular Biology Laboratory (EMBL) have used genomics and artificial intelligence to crack a second genetic code: the one underlying gene regulation. The paper, titled "Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo," was published online Dec. 12, 2023, in Nature.
Every healthy cell of a complex organism carries the same genome, with its thousands of genes, the blueprints for building proteins. To form different cell types, tissues and organs, additional mechanisms are needed to switch specific genes on and off with high precision.
Enhancers are short segments of DNA in the genome and a key element in switching genes on. The Stark lab has made it its mission to crack the code that links an enhancer's DNA sequence to its gene-regulatory function. Although the first enhancers were discovered in the early 1980s, only in the past decade have scientists developed ways to identify enhancers experimentally across the whole genome.
Building on this foundation, the Stark lab and its collaborators are pursuing three tasks that together make up a seemingly impossible long-term goal. The first is predicting enhancer activity from DNA sequence. The second is predicting the consequences of enhancer mutations. The third is designing enhancers from scratch for specific tissues. In other words: reading, understanding and writing a second genetic code.
Recent advances in genomics and artificial intelligence have opened up an opportunity to crack this code. The researchers developed a deep learning model that uses transfer learning. They trained it on large amounts of data from previous studies in Drosophila melanogaster, the fruit fly, a widely used model organism in developmental biology.
From lab to AI and back again
First, a deep learning model was trained on genome-wide DNA sequences and the corresponding DNA accessibility data. This pretrained model then served as the starting point for fine-tuning a transfer learning model, which learns to link DNA sequences directly to tissue-specific enhancer activity.
Stark says, "You can explain transfer learning this way: imagine you want to train a model to recognize cats in pictures, but you have very few pictures of cats available. You do, however, have a lot of pictures of dogs. So, you first train an AI model on the dog pictures, then fine-tune it in a second step, and now you can recognize cats."
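The two-step idea described above can be illustrated with a deliberately tiny sketch. The model first learns from a large dataset. Its learned weights then become the starting point for training on a small one.

This is not the authors' model. The study used deep neural networks, whereas this sketch swaps in plain logistic regression on one-hot-encoded DNA. Both datasets and their labelling rules are invented for illustration: "starts with G" stands in for accessibility, and "starts with GA" stands in for enhancer activity. The helper names `one_hot`, `train` and `predict` are likewise hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat list of 0/1 values, four per base (A, C, G, T)."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(model, x):
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(model, seq):
    """Probability that `seq` is positive for the task `model` was trained on."""
    return sigmoid(logit(model, one_hot(seq)))


def train(data, init=None, lr=0.5, epochs=500):
    """Full-batch gradient descent on the logistic loss.

    `data` is a list of (sequence, label) pairs. When `init` is a
    (weights, bias) pair, training starts from it instead of from zeros:
    this is the fine-tuning (transfer) step.
    """
    n_features = 4 * len(data[0][0])
    if init is None:
        weights, bias = [0.0] * n_features, 0.0
    else:
        weights, bias = list(init[0]), init[1]  # copy: leave the pretrained model untouched
    encoded = [(one_hot(seq), label) for seq, label in data]
    n = len(encoded)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in encoded:
            err = sigmoid(logit((weights, bias), x)) - y  # d(loss)/d(logit)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Step 1, pretraining: hypothetical stand-in for the large accessibility dataset.
# All 64 three-base sequences; toy rule: "accessible" iff the sequence starts with G.
pretrain_data = [("".join(s), 1 if s[0] == "G" else 0) for s in product(BASES, repeat=3)]
base_model = train(pretrain_data)

# Step 2, fine-tuning: hypothetical small enhancer-activity dataset (8 examples).
# Toy rule: "active" iff the sequence starts with GA.
finetune_data = [("GAA", 1), ("GAT", 1), ("GCA", 0), ("GTT", 0),
                 ("GGC", 0), ("CAA", 0), ("TAT", 0), ("AAC", 0)]
tissue_model = train(finetune_data, init=base_model, epochs=3000)

print(round(predict(tissue_model, "GAT"), 2), round(predict(tissue_model, "GTT"), 2))
```

The two `train` calls differ only in the `init` argument. The second model begins from the weights learned on the larger dataset rather than from zeros, which is the essence of the "train on dogs, then fine-tune on cats" analogy.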

Image from Nature, 2023, doi:10.1038/s41586-023-06905-9.
Through transfer learning, the model was able to predict enhancer activity in five tissue types of the Drosophila embryo: the central nervous system, brain subsections, epidermis, gut and muscle.
Building on these predictions, the researchers took their work from the abstract world of big data and artificial intelligence back to the lab bench. Using sophisticated molecular biology tools, they tested 40 computationally designed synthetic enhancers in living Drosophila embryos. Indeed, these enhancers were active and drove gene expression in the target tissues.
"The ability to construct synthetic enhancers with specific properties offers unprecedented opportunities to control the targeted expression of genes," says Bernardo de Almeida of the Vienna BioCenter, first author of the paper. "Future applications could be in synthetic biology or gene therapy, where precise design and manipulation of gene expression patterns is a prerequisite."
For Stark, however, the most important aspect of this research is the new insight it provides into a phenomenon fundamental to life: "About 60 years ago, scientists learned how the first genetic code worked, how the molecular blueprint in DNA is translated into proteins. With the power of genomics and artificial intelligence, we have now succeeded in cracking the second genetic code of life, namely how gene activity is controlled."