Burrows-Wheeler Aligner (BWA) is an efficient program that aligns
relatively short nucleotide sequences against a long reference sequence such as the human genome.
BWA provides three alignment algorithms: BWA-backtrack, BWA-SW, and BWA-MEM.
The BWA-backtrack algorithm is designed exclusively for short sequence reads of up to 100 bp, while the latter two can be used for sequence reads of up to 1 Mbp. BWA-MEM can also be used for high-quality short Illumina reads (< 100 bp), in many cases with better performance than the original BWA-backtrack algorithm. Therefore, the more universal BWA-MEM algorithm is recommended as a starting point for most alignment scenarios.
Before any of the alignment algorithms can be used, an FM-index needs to be constructed for the reference genome (see below).
**Software Category:** bio
For detailed information, visit the BWA website.
For a GitHub reference, visit: https://github.com/lh3/bwa
The current installation of BWA incorporates the most popular packages. To find the available versions and learn how to load them, run:
module spider bwa
The output of the command shows the available BWA module versions.
For detailed information about a particular BWA
module, including how to load the module, run the
module spider command with the module’s full version label. For example:
module spider bwa/0.7.15
| Module | Version | Module Load Command |
| --- | --- | --- |
| bwa | 0.7.15 | module load gcc/7.1.0 bwa/0.7.15 |
| bwa | 0.7.17 | module load gcc/9.2.0 bwa/0.7.17 |
SLURM Script Examples
Creating a BWA Index for a Reference Genome
Index files are created with the
bwa index command. A reference genome sequence in FASTA format needs to be provided, e.g.
#!/bin/bash
#SBATCH -A YOUR_ALLOCATION
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=64000
#SBATCH -p standard

# Run program
module purge
module load bwa
module list

cd /scratch/$USER/bwaanalysis

# replace refgenome.fa with the name of your reference genome
# (reference in FASTA format)
bwa index refgenome.fa
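Once the indexing job has finished, it can be useful to confirm that the index is complete before submitting an alignment job. The sketch below assumes the default output prefix, in which case bwa index writes five files next to the FASTA file, with the extensions .amb, .ann, .bwt, .pac, and .sa. The function name bwa_index_ready is hypothetical and not part of BWA.

```shell
#!/bin/bash
# Hypothetical helper (not part of BWA): returns 0 only if all five index
# files that "bwa index" writes by default exist and are non-empty.
bwa_index_ready() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing index file: ${ref}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Possible use in a job script, before the alignment step:
#   bwa_index_ready refgenome.fa || exit 1
```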
Alignment of Sequence Reads to a Reference Genome
BWA provides three basic algorithms for aligning sequence reads to a reference genome: BWA-backtrack, BWA-SW, and BWA-MEM. Below we show an example using the BWA-MEM algorithm (command
bwa mem), which can process short Illumina reads (70 bp) as well as longer reads of up to 1 Mbp. The alignment output is saved in SAM file format. The use of SAMtools on Rivanna is documented here.
Specification of files
- Reference genome file: refgenome.fa
- Sequence read file 1: read1.fq
- Sequence read file 2: read2.fq
- Output Alignment (SAM file): aln-pe.sam
#!/bin/bash
#SBATCH -A YOUR_ALLOCATION
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=9000
#SBATCH -p standard

# Run program
module purge
module load bwa
module list

cd /scratch/$USER/bwaanalysis

# using paired-end reads from two .fq sequence files
bwa mem -t $SLURM_CPUS_PER_TASK refgenome.fa read1.fq read2.fq > aln-pe.sam
- The -t $SLURM_CPUS_PER_TASK option sets the number of processing threads to the number of requested CPU cores (1 thread per CPU core). Follow the online BWA documentation to adjust parameters for aligning single-end reads.
- The --mem and --mem-per-cpu options are mutually exclusive. Job scripts should specify one or the other, but not both.
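The two notes above can be sketched in a few lines of shell. The snippet assumes the resource request from the example script (16 CPUs per task, 9000 MB per CPU). The helper names thread_count and total_mem_mb, the fallback of one thread outside a SLURM job, and the file names reads.fq and aln-se.sam are illustrative choices, not part of BWA or SLURM.

```shell
#!/bin/bash
# Use the SLURM-provided core count inside a job; fall back to one thread
# otherwise (the fallback value is an assumption for illustration).
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# With --mem-per-cpu, the job's memory scales with the CPU request:
# total memory (MB) = CPUs per task * memory per CPU (MB).
total_mem_mb() {
    local cpus="$1" mem_per_cpu="$2"
    echo $(( cpus * mem_per_cpu ))
}

total_mem_mb 16 9000    # prints 144000 (MB) for the example job

# Single-end reads take one FASTQ file instead of two, for example
# (reads.fq and aln-se.sam are placeholder names):
#   bwa mem -t "$(thread_count)" refgenome.fa reads.fq > aln-se.sam
```

A job that instead used --mem would request that total directly and omit --mem-per-cpu.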
Using an Interactive Session to run BWA
You should NOT do your computational processing on the head node. To obtain a login shell on a compute node, use the ijob command:
ijob -N 1 -n 1 -A <YOUR_ALLOCATION> -p standard -c 20 --mem=20000
Replace <YOUR_ALLOCATION> with your account name to charge SUs. The values for the -c and
--mem options depend on the resources you will use for the alignment step. For more details about submitting interactive jobs please see here.
First, let us load the bwa module:
module load bwa
In order to check all available
bwa commands run:
If you wish to check various options for each command run:
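As a minimal sketch, assuming the loaded module provides the upstream bwa executable: running bwa with no arguments prints the list of available commands, and running a command such as bwa mem with no further arguments prints that command's options. The wrapper name bwa_help is hypothetical; it only guards against bwa not being on the PATH.

```shell
#!/bin/bash
# Hypothetical wrapper: print BWA's built-in usage text.
#   bwa_help        -> list of available bwa commands
#   bwa_help mem    -> options of "bwa mem"
bwa_help() {
    if ! command -v bwa >/dev/null 2>&1; then
        echo "bwa not found; run 'module load bwa' first" >&2
        return 127
    fi
    # bwa typically writes its usage text to stderr and exits non-zero when
    # required arguments are missing, so merge the streams and ignore the status.
    bwa "$@" 2>&1 || true
}
```

In an interactive session the same result is obtained by typing bwa or bwa mem directly at the prompt.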