« Return to HPC Overview

BWA and UVA HPC

Description

Burrows-Wheeler Aligner (BWA) is an efficient program that aligns
relatively short nucleotide sequences against a long reference sequence such as the human genome.

BWA provides three alignment algorithms:

BWA-backtrack
BWA-SW
BWA-MEM

The BWA-backtrack algorithm is exclusively used for short sequence reads up to 100bp, the latter two can be used for sequence reads of up to 1MB. The BWA-MEM algorithm can also be used for high-quality short Illumina sequence reads (< 100bp) in many cases with better performance compared to the original BWA-backtrack algorithm. Therefore, the more universal BWA-MEM algorithm is recommended as a starting point for most alignment scenarios.

Before any of the alignment algorithms can be used, a FM-index needs to be constructed for the reference genome (see below).

**Software Category:** bio

For detailed information, visit the BWA website.

For a GitHub reference, visit: https://github.com/lh3/bwa

Available Versions

The current installation of BWA incorporates the most popular packages. To find the available versions and learn how to load them, run:

module spider bwa

The output of the command shows the available BWA module versions.

For detailed information about a particular BWA module, including how to load the module, run the module spider command with the module’s full version label. For example:

module spider bwa/0.7.17

Module	Version	Module Load Command
bwa	0.7.17	module load bwa/0.7.17

Slurm Script Examples

Creating a BWA Index for a Reference Genome

Index files are created with the bwa index command. A reference genome sequence in FASTA format needs to be provided, e.g. /scratch/$USER/bwaanalysis/refgenome.fa

#!/bin/bash
#SBATCH -A YOUR_ALLOCATION
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=64000
#SBATCH -p standard

# Run program
module purge
module load bwa
module list

cd /scratch/$USER/bwaanalysis

# replace refgnome.fa with the name of your reference genome
# reference in FASTA format
bwa index refgenome.fa

Alignment of Sequence Reads to a Reference Genome

BWA provides three basic alignment algorithms to align sequence reads to a reference genome, BWA-backtrack, BWA-SW, and BWA-MEM. Below we show an example for using the BWA-MEM algorithm (command bwa mem), which can process short Illumina reads (70bp) as well as longer reads up to 1 MB. The alignment output is saved in SAM file format. The use of SAMtools on the HPC system is documented here.

Specification of files

Reference genome file: /scratch/$USER/bwaanalysis/refgenome.fa
Sequence read file 1: /scratch/$USER/bwaanalysis/read1.fq
Sequence read file 2: /scratch/$USER/bwaanalysis/read2.fq
Output Alignment (SAM file): /scratch/$USER/bwaanalysis/aln-pe.sam

#!/bin/bash
#SBATCH -A YOUR_ALLOCATION
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=9000
#SBATCH -p standard

# Run program
module purge
module load bwa
module list

cd /scratch/$USER/bwaanalysis

# using paired-ends reads from two .fq sequence files
bwa mem refgenome.fa read1.fq read2.fq -t $SLURM_CPUS_PER_TASK > aln-pe.sam

Notes:

The use of -t $SLURM_CPUS_PER_TASK to define the numbe of processing threads based on the numbe of requested cpu core (1 thread / cpu core). Follow the online BWA documentation to adjust parameters for aligning single-end reads.
The use of --mem and --mem-per-cpu options are mutually exclusive. Job scripts should specify one or the other but not both.

Using an Interactive Session to run BWA

You should NOT do your computational processing on the head node. In order to obtain a login shell on a compute node, use the ijob command.

ijob -N 1 -n 1 -A <YOUR_ALLOCATION> -p standard -c 20 --mem=20000

Replace <YOUR_ALLOCATION> with your account name to charge SUs. The arguments for -c and --mem options depend on the resources you will use for the alignment step. For more details about submitting interactive jobs please see here.

Load module

First, let us load the bwa module:

module load bwa

In order to check all available bwa commands run:

bwa

If you wish to check various options for each command run:

bwa index

bwa mem

Updated June 22, 2019 | HPC, software, bio multi-core