---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
  - biology
  - genomics
datasets:
  - Genentech/enformer-data
---

Enformer Model (Avsec et al. 2021)

Model Description

This repository contains the weights for the Enformer model, a long-range transformer architecture designed to predict functional genomic tracks from genomic DNA sequences.

  • Architecture: Convolutions followed by Transformer layers
  • Input: 196,608 bp of genomic DNA sequence
  • Output Resolution: 128 bp bins
  • Parameters: 246M
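  • Source: Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18, 1196–1203 (2021)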
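
The input length and bin size above imply some simple bookkeeping when mapping predictions back to genomic coordinates. The sketch below works through that arithmetic. The 896-bin cropped output matches the published Enformer; whether the converted checkpoints in this repository apply the same cropping is an assumption that should be confirmed from the output shape of the loaded model. All constant and function names are illustrative.

```python
INPUT_LEN = 196_608   # bp of input sequence (from the model description)
BIN_SIZE = 128        # bp per output bin (from the model description)

# Assumption: as in the published Enformer, only the central 896 bins are
# predicted. Check the output shape of the loaded model to confirm.
OUTPUT_BINS = 896

total_bins = INPUT_LEN // BIN_SIZE                       # bins spanning the full input
cropped_bins_per_side = (total_bins - OUTPUT_BINS) // 2  # bins dropped at each edge
output_span_bp = OUTPUT_BINS * BIN_SIZE                  # bp covered by predictions


def bin_to_interval(bin_index, input_start):
    """Return the half-open genomic interval [start, end) of one output bin.

    input_start is the genomic coordinate of the first base of the input.
    """
    if not 0 <= bin_index < OUTPUT_BINS:
        raise IndexError(f"bin_index must be in [0, {OUTPUT_BINS})")
    start = input_start + (cropped_bins_per_side + bin_index) * BIN_SIZE
    return start, start + BIN_SIZE


print(total_bins, cropped_bins_per_side, output_span_bp)  # 1536 320 114688
print(bin_to_interval(0, input_start=0))                  # (40960, 41088)
```

Under this assumption, predictions cover only the central 114,688 bp of each input window, so regions of interest should be centered in the input.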

Model Heads & Output Tracks

| Model | Tracks | Genome |
|-------|--------|--------|
| Human | 5,313  | hg38   |
| Mouse | 1,643  | mm10   |

Repository Content

The repository includes both full PyTorch Lightning checkpoints and raw state dictionaries for the human and mouse versions of the model. The weights are derived from the original publication; the model has been converted into the PyTorch Lightning format used by gReLU (https://github.com/Genentech/gReLU).

| File | Type | Description |
|------|------|-------------|
| human.ckpt | PyTorch Lightning | Full checkpoint including base model and human head |
| mouse.ckpt | PyTorch Lightning | Full checkpoint including base model and mouse head |
| human_state_dict.h5 | HDF5 | Weights-only state dictionary for the human model |
| mouse_state_dict.h5 | HDF5 | Weights-only state dictionary for the mouse model |
| save_wandb_enformer_human.ipynb | Jupyter Notebook | Code used to create human.ckpt |
| save_wandb_enformer_mouse.ipynb | Jupyter Notebook | Code used to create mouse.ckpt |

Usage

The models are intended for use with the gReLU library (Python package grelu).

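```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the desired checkpoint (use "mouse.ckpt" for the mouse model)
ckpt_path = hf_hub_download(
    repo_id="Genentech/enformer-model",
    filename="human.ckpt",
)

# Load the model. weights_only=False unpickles the full checkpoint,
# so only load checkpoint files from a source you trust.
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # switch to evaluation mode (disables dropout)
```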
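
Once loaded, the model can be applied to a sequence of the expected input length. The following is a minimal sketch rather than a confirmed recipe. It assumes that gReLU's LightningModel provides a predict_on_seqs method that accepts DNA strings and a device argument; check this against the gReLU documentation for the installed version. The random_dna and predict_placeholder helpers are hypothetical and only produce placeholder input.

```python
import random

SEQ_LEN = 196_608  # input length in bp, from the model description


def random_dna(length, seed=0):
    """Generate a random DNA string (placeholder input, not real genome)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def predict_placeholder(model, device="cpu"):
    """Run a loaded model on one random sequence of the expected length.

    Assumption: `model` is a gReLU LightningModel and exposes
    predict_on_seqs(seqs, device=...); verify this in the gReLU docs.
    """
    seq = random_dna(SEQ_LEN)
    return model.predict_on_seqs([seq], device=device)
```

Calling predict_placeholder(model) with the human checkpoint should return predictions for 5,313 tracks. Inspect the shape of the result rather than relying on a particular axis order.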