Description

Open source code for AlphaFold


**Software Category:** bio

For detailed information, visit the AlphaFold website.

Available Versions

The current installation of AlphaFold includes the most commonly used packages. To find the available versions and learn how to load them, run:

module spider alphafold

The output of the command shows the available AlphaFold module versions.

For detailed information about a particular AlphaFold module, including how to load the module, run the module spider command with the module’s full version label. For example:

module spider alphafold/2.0.0
Module Version     Module Load Command
alphafold/2.0.0    module load singularity/3.7.1 alphafold/2.0.0
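
For example, to load this version and confirm that both modules are active (module list is a standard module-system command):

module load singularity/3.7.1 alphafold/2.0.0
module list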

AlphaFold Installation Details

Dockerfile

We prepared a Docker image based on the official Dockerfile, with some modifications. The biggest issues are the TensorFlow version and the missing cudnn/cusolver libraries, as reported here.

Our current solution is:

  • keep CUDA version at 11.0;
  • downgrade Python to 3.8.10;
  • downgrade TensorFlow to 2.4.1;
  • add libcudnn8 and libcusolver-11-0 in the production stage.

We did not use TensorFlow 2.5.0 with CUDA 11.2 because our current NVIDIA driver does not support that combination.
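
In the production stage of the image, these changes correspond to build steps along the following lines (a minimal sketch, not the literal Dockerfile; it assumes an Ubuntu base with the NVIDIA apt repository and conda already set up, as in the official image):

apt-get update && apt-get install -y libcudnn8 libcusolver-11-0   # the missing CUDA 11.0 libraries
conda install -y python=3.8.10                                    # downgrade Python
pip3 install tensorflow==2.4.1                                    # downgrade TensorFlow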

For further details see here.

AlphaFold launch command

The full Singularity command to launch AlphaFold looks like this:

singularity run -B $ALPHAFOLD_DATA_PATH:/data -B .:/etc --pwd /app/alphafold --nv $CONTAINERDIR/alphafold-2.0.0.sif \
    --fasta_paths=/full/path/to/fasta \
    --output_dir=/full/path/to/outdir \
    --model_names= \
    --preset=[full_dbs|casp14] \
    --max_template_date= \
    --data_dir=/data \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path=/data/mgnify/mgy_clusters.fa \
    --uniclust30_database_path=/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/data/pdb70/pdb70 \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat

Explanation of Singularity flags

  1. The database and models are stored in $ALPHAFOLD_DATA_PATH.
  2. A cache file ld.so.cache will be written to /etc, which is not allowed on Rivanna. The workaround is to bind-mount e.g. the current working directory to /etc inside the container. [-B .:/etc]
  3. You must launch AlphaFold from /app/alphafold inside the container due to this issue. [--pwd /app/alphafold]
  4. The --nv flag enables GPU support (see the interactive check below).
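
You can verify the effect of --pwd and --nv interactively by opening a shell in the container with the same flags (paths as in the launch command above):

singularity shell -B $ALPHAFOLD_DATA_PATH:/data -B .:/etc --pwd /app/alphafold --nv $CONTAINERDIR/alphafold-2.0.0.sif
Singularity> pwd          # should print /app/alphafold
Singularity> nvidia-smi   # should list the allocated GPU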

Explanation of AlphaFold flags

  1. The default command of the container is /app/run_alphafold.sh. All flags shown above are required and are passed to /app/run_alphafold.sh.
  2. As a consequence of the Singularity --pwd flag, the fasta and output paths must be full paths (e.g. /scratch/$USER/mydir), not relative paths (e.g. ./mydir).
  3. The model_names should be a comma-separated list of model_* names; see $ALPHAFOLD_DATA_PATH/params for the complete set, and the check after this list. In run_docker.py, model_1,model_2,model_3,model_4,model_5 is used.
  4. The max_template_date is of the form YYYY-MM-DD.
  5. For further explanations and additional options please see run_alphafold.py.
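
For instance, to see what values model_names can take, list the parameter files; they should be named params_<model_name>.npz, so stripping the prefix and suffix yields the valid names:

ls $ALPHAFOLD_DATA_PATH/params
# expect files like params_model_1.npz ... params_model_5.npz (plus *_ptm variants)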

Launch script run

For your convenience, we have prepared a launch script run that takes care of the Singularity command and the database paths, since these are unlikely to change. If you do need to customize anything, please use the full Singularity command in the previous section.
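
For example, a hypothetical invocation with all placeholders filled in (paths and date are illustrative only):

run --fasta_paths=/scratch/$USER/mydir/query.fasta \
    --output_dir=/scratch/$USER/mydir/out \
    --model_names=model_1,model_2,model_3,model_4,model_5 \
    --preset=full_dbs \
    --max_template_date=2020-05-14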

SLURM Script

Please copy and paste the following as a template for your SLURM script.

#!/bin/bash
#SBATCH -A mygroup      # your allocation account
#SBATCH -p gpu          # partition
#SBATCH --gres=gpu:1    # number of GPUs
#SBATCH -N 1            # number of nodes
#SBATCH -c 8            # number of cores
#SBATCH -t 10:00:00     # time

module purge
module load singularity alphafold

run --fasta_paths=/full/path/to/fasta \
    --output_dir=/full/path/to/outdir \
    --model_names= \
    --preset=[full_dbs|casp14] \
    --max_template_date=
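
Save the script (the file name is arbitrary; alphafold.slurm is used here only for illustration) and submit it:

sbatch alphafold.slurm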

You may need at least 8 CPU cores, as indicated by the --cpu 8 in this line printed in the output:

Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpys2ocad8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./seq.fasta /share/resources/data/alphafold/mgnify/mgy_clusters.fa"