BWA provides three alignment algorithms:
- BWA-backtrack
- BWA-SW
- BWA-MEM
The BWA-backtrack algorithm is designed for short sequence reads of up to 100 bp, while BWA-SW and BWA-MEM can be used for sequence reads of up to 1 Mbp. The BWA-MEM algorithm can also be used for high-quality short Illumina sequence reads (< 100 bp), in many cases with better performance than the original BWA-backtrack algorithm. The more universal BWA-MEM algorithm is therefore recommended as a starting point for most alignment scenarios.
Before any of the alignment algorithms can be used, an FM-index must be constructed for the reference genome (see below).
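Because the choice of algorithm depends on read length, it can help to check the longest read in a FASTQ file before deciding. The following is a minimal sketch, assuming an uncompressed FASTQ file with four lines per record; `max_read_length` is a hypothetical helper name and `reads.fq` is a placeholder file name.

```shell
#!/bin/bash
# Hypothetical helper: print the longest read length (in bp) found in an
# uncompressed FASTQ file with four lines per record.
# Sequence lines are lines 2, 6, 10, ... of the file.
max_read_length() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }' "$1"
}

# Example use (reads.fq is a placeholder):
#   len=$(max_read_length reads.fq)
#   if [ "$len" -le 100 ]; then
#       echo "short reads: BWA-MEM or BWA-backtrack"
#   else
#       echo "longer reads: BWA-MEM or BWA-SW"
#   fi
```

The 100 bp threshold in the usage comment simply mirrors the guidance above; it is not a limit enforced by BWA-MEM.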
Software Category: bio
For detailed information, visit the BWA website.
To find the available versions and learn how to load them, run:
module spider bwa
The output of the command shows the available BWA module versions.
For detailed information about a particular BWA module, including how to load the module, run the module spider command with the module’s full version label. For example:
module spider bwa/0.7.15
| Module | Version | Module Load Command |
|--------|---------|---------------------|
| bwa | 0.7.15 | module load gcc/7.1.0 bwa/0.7.15 |
| bwa | 0.7.17 | module load gcc/7.1.0 bwa/0.7.17 |
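After loading a module, a job script can confirm that the `bwa` executable is actually on the `PATH` before doing any work. This is a minimal sketch; `require_tool` is a hypothetical helper name and is not part of BWA or the module system.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named command is on the PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found; load its module first" >&2
        return 1
    fi
}

# Typical use in a job script, after a module load line from the table:
#   module load gcc/7.1.0 bwa/0.7.17
#   require_tool bwa || exit 1
```

Failing early this way avoids spending queue time on a job that cannot find its executable.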
SLURM Script Examples
Creating a BWA Index for a Reference Genome
Index files are created with the bwa index command. A reference genome sequence in FASTA format must be provided, for example:
#!/bin/bash
#SBATCH -A YOUR_ACCOUNT
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=64000
#SBATCH -p standard

# Run program
module purge
module load bwa

cd /scratch/$USER/bwaanalysis

# reference in FASTA format
bwa index refgenome.fa
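The bwa index command writes five files next to the reference FASTA file, with the extensions .amb, .ann, .bwt, .pac, and .sa. A later alignment job can check for them before starting. The sketch below assumes the index sits beside the FASTA file under its default prefix; `bwa_index_complete` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if all five BWA index files exist
# and are non-empty for the given reference FASTA path.
bwa_index_complete() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing index file: ${ref}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Example use: build the index only when it is not already present.
#   bwa_index_complete refgenome.fa || bwa index refgenome.fa
```

Skipping the indexing step when the files already exist saves time on large genomes, where building the index can take hours.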
Alignment of Sequence Reads to a Reference Genome
BWA provides three basic algorithms to align sequence reads to a reference genome: BWA-backtrack, BWA-SW, and BWA-MEM. Below we show an example using the BWA-MEM algorithm (command bwa mem), which can process short Illumina reads as well as longer reads of up to 1 Mbp. The alignment output is saved in SAM file format. The use of SAMtools on Rivanna is documented here.
Specification of files
- Reference genome file: refgenome.fa
- Sequence read file 1: read1.fq
- Sequence read file 2: read2.fq
- Output alignment (SAM file): aln-pe.sam
#!/bin/bash
#SBATCH -A YOUR_ACCOUNT
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=6000
#SBATCH -p standard

# Run program
module purge
module load bwa

cd /scratch/$USER/bwaanalysis

# using paired-end reads from two .fq sequence files
# (options are placed before the positional arguments)
bwa mem -t $SLURM_CPUS_PER_TASK refgenome.fa read1.fq read2.fq > aln-pe.sam
Note the use of -t $SLURM_CPUS_PER_TASK to set the number of processing threads to the number of requested CPU cores (1 thread per CPU core). Follow the online BWA documentation to adjust parameters for aligning single-end reads.
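SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, so a script that is reused without that option would pass an empty value to -t. The sketch below falls back to one thread in that case and shows a single-end BWA-MEM call, which takes one read file instead of two. `bwa_threads` is a hypothetical helper name, and `reads.fq` and `aln-se.sam` are placeholder file names.

```shell
#!/bin/bash
# Hypothetical helper: print the thread count for bwa, using the SLURM
# allocation if it is set and 1 otherwise.
bwa_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Single-end alignment with BWA-MEM: one read file instead of two.
# The guard skips the call when bwa or the input files are not available.
if command -v bwa >/dev/null 2>&1 && [ -f refgenome.fa ] && [ -f reads.fq ]; then
    bwa mem -t "$(bwa_threads)" refgenome.fa reads.fq > aln-se.sam
fi
```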