The AI system Evo, originally trained on bacterial genomes to predict protein sequences, has been substantially upgraded and released as Evo 2. The new open-source model was trained on trillions of DNA base pairs drawn from all domains of life, including complex eukaryotes.
As a result, Evo 2 has learned to identify intricate genomic features in eukaryotes, such as regulatory DNA and splice sites, which are hard to detect because their sequence signals are weak and scattered across the genome. This addresses a limitation of the original Evo, whose approach had not been applied to the more complex genome structures of eukaryotes.
The main topics covered are the evolution of the Evo AI system, the structural differences between bacterial and eukaryotic genomes, and the new capabilities of Evo 2 in understanding complex genetic features.