MERGE integrates eight machine learning models to predict the pathogenicity of genetic variants. The set spans DNA sequence models and protein-level models, so each one contributes a different view of a variant.
AlphaGenome is a deep learning model that takes long DNA sequences (up to about 1 million base pairs) as input and predicts functional genomic signals such as gene expression, splicing, and chromatin accessibility. Variant effects are estimated by comparing its predictions for the reference and alternate sequences.
HyenaDNA is a long-range genomic foundation model capable of processing contexts of up to 1 million base pairs at single-nucleotide resolution. It replaces attention with the Hyena operator, which is based on long convolutions, so it can capture distant regulatory interactions and variant effects that fall outside the window of shorter-context models.
The Nucleotide Transformer (NT) is a family of DNA language models pre-trained on human genomes and on genomes from many other species. Its learned sequence representations are used to predict molecular phenotypes, identify regulatory elements, and assess the impact of non-coding variants.
AlphaMissense is built on AlphaFold's protein structure prediction capabilities. It combines structural context with evolutionary information to score the pathogenicity of missense variants, with predictions covering single amino acid substitutions across the human proteome.
ESM-1b (Evolutionary Scale Modeling) is a large-scale protein language model trained on millions of protein sequences. It captures evolutionary patterns to predict how mutations affect protein function without requiring explicit structural data.
GPN-MSA (Genomic Pre-trained Network with Multiple Sequence Alignments) is a DNA language model trained on whole-genome alignments of multiple species. It uses the evolutionary constraint visible in those alignments to score variants in both coding and regulatory regions.
popEVE extends EVE (Evolutionary model of Variant Effect) by incorporating population-scale human genetic data. It combines deep generative models of evolutionary sequences with observed human variation to assess the clinical significance and fitness effects of missense mutations on a scale that is comparable across proteins.
Evo2 is a genomic foundation model trained on DNA from all domains of life (bacteria, archaea, and eukaryotes). It uses the sequence likelihoods learned from this cross-species data to predict variant effects without task-specific training.
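The descriptions above do not say how MERGE combines the eight model outputs. As a minimal sketch of one possible integration step, the Python snippet below averages whichever per-model scores are available for a single variant. The function name ensemble_score, the rescaling of every score to [0, 1] with higher meaning more likely pathogenic, and the unweighted mean are all assumptions made for illustration, not MERGE's actual method.

```python
from statistics import mean
from typing import Mapping, Optional

# Model names taken from the text above; the scores are hypothetical inputs.
MODELS = (
    "AlphaGenome", "HyenaDNA", "NucleotideTransformer", "AlphaMissense",
    "ESM-1b", "GPN-MSA", "popEVE", "Evo2",
)


def ensemble_score(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average the per-model scores available for one variant.

    Assumption: each score was already rescaled to [0, 1], with higher
    meaning more likely pathogenic. A None entry means the model does not
    apply (for example, a protein model on a non-coding variant).
    """
    unknown = set(scores) - set(MODELS)
    if unknown:
        raise ValueError(f"unknown model(s): {sorted(unknown)}")

    available = [s for s in scores.values() if s is not None]
    if not available:
        return None  # no model could score this variant

    for s in available:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be rescaled to [0, 1]")

    return mean(available)


# Hypothetical missense variant scored by two models; a third does not apply.
example = {"AlphaMissense": 1.0, "ESM-1b": 0.5, "HyenaDNA": None}
print(ensemble_score(example))  # 0.75
```

Skipping missing scores rather than treating them as zero keeps protein-only models from dragging down the result for non-coding variants. A real pipeline would also need to align score directions and scales across models before averaging.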