BWA-MEM2 — Faster BWA-MEM

BWA-MEM2 is a drop-in replacement for BWA-MEM that uses SIMD vectorization and multi-threading for 2-3x speedup. The alignment results are essentially identical to BWA-MEM (some tie-breaking edge cases differ). If you're running BWA-MEM in 2026 without considering BWA-MEM2, you're leaving 50% of your CPU budget on the table.

When to use

Any place you'd use BWA-MEM with read length ≥ 70 bp.
Production WGS / WES / exome pipelines where alignment time is the bottleneck.
Cloud or on-prem where CPU costs matter.

When NOT to use

BWA-ALN-style very short reads (BWA-MEM2 doesn't replace BWA-ALN, but BWA-ALN is also rarely the right choice in 2026).
Long reads — use minimap2.
When exact parity with BWA-MEM v0.7.17 is required (rare; aligner tie-breaking can differ for low MAPQ reads).

Prerequisites

bwa-mem2 ≥ 2.2.1 (built with AVX2 / AVX-512)
samtools ≥ 1.19
Reference FASTA

Code patterns

Index the reference

bwa-mem2 index reference/genome.fa
# Creates genome.fa.{0123, amb, ann, bwt.2bit.64, pac}

The index format is different from BWA-MEM (bwt.2bit.64 vs bwt), so a BWA-MEM2 index is not compatible with BWA-MEM and vice versa. Pick one and stick with it.

Paired-end alignment

The CLI is intentionally a strict superset of BWA-MEM, so the standard pipeline works:

bwa-mem2 mem -t 16 -M -K 100000000 \
    -R '@RG\tID:s1\tSM:s1\tPL:ILLUMINA\tLB:lib1' \
    reference/genome.fa \
    reads/R1.fq.gz reads/R2.fq.gz |
  samtools sort -@ 8 -m 4G -o s1.sorted.bam -
samtools index s1.sorted.bam

Single-end alignment

bwa-mem2 mem -t 16 -M -R '@RG\tID:s1\tSM:s1' ref.fa reads.fq.gz |
  samtools sort -@ 8 -o s1.bam -
samtools index s1.bam

Index a reference for the mem command

BWA-MEM2 2.2.x has a single index command for the modern mem algorithm (the older bwtgen and bwt2ix are deprecated):

bwa-mem2 index ref.fa

Benchmark vs BWA-MEM

# Time both on the same input
time bwa mem -t 16 -M ref.fa R1.fq.gz R2.fq.gz > /dev/null
time bwa-mem2 mem -t 16 -M ref.fa R1.fq.gz R2.fq.gz > /dev/null

Typical on a 30x human WGS sample: BWA-MEM takes ~6-8 hours with 16 cores; BWA-MEM2 takes ~2.5-3.5 hours on the same hardware.

Read-group aware multi-sample loop

for r1 in reads/*_R1.trimmed.fq.gz; do
  base=$(basename "$r1" _R1.trimmed.fq.gz)
  r2="reads/${base}_R2.trimmed.fq.gz"
  bwa-mem2 mem -t 16 -M -R "@RG\tID:${base}\tSM:${base}\tPL:ILLUMINA" \
      ref.fa "$r1" "$r2" |
    samtools sort -@ 8 -o "bam/${base}.bam" -
  samtools index "bam/${base}.bam"
done

Plug into nf-core/sarek

nf-core/sarek (the standard germline + somatic pipeline) defaults to BWA-MEM2 as of v3.2. No CLI changes needed for users.

Flags that matter

Flag	Behavior
`-t N`	Threads (BWA-MEM2 auto-uses multiple threads internally even without this, but `-t` controls the parallel thread pool)
`-M`	Picard-style split/marking
`-K 100M`	Disable seed chunking (good for high-quality short reads)
`-Y`	Soft-clip supplementary (for variant callers)
`-j`	ALT-aware soft clipping (GRCh38)
`-R '@RG...'`	Read group
`-k 19`	Min seed length (default 19)
`-a`	Output all alignments (chimera detection)

Common pitfalls

Index file format incompatibility. BWA-MEM2's .bwt.2bit.64 is not BWA-MEM's .bwt. If you bwa-mem2 index then bwa mem, the alignment will fail with a missing file.
Single-threaded alignment gives no speedup. BWA-MEM2's wins are all from internal threading. Always use -t 16 or higher.
AVX-512 build requires compatible CPU. Most x86 servers since 2017 have AVX-512; consumer CPUs may only have AVX2 (still 1.5-2x faster than BWA-MEM).
Memory. BWA-MEM2 peaks at ~10-12 GB for human reference + 8 threads.
Read-group format must be valid tab-separated. A space inside @RG will break it; use \t.

Validation

samtools flagstat output is essentially identical to BWA-MEM v0.7.17.
For a parity check, align a small test sample with both and diff the FLAG/MAPQ columns — only a tiny fraction of MAPQ=0 reads should differ.
Coverage and mapping rate should be within 0.1% of BWA-MEM.

Open alternatives

Need	Tool
Even faster (GPU)	`fast-bwa-mem2` (NVIDIA Parabricks), but closed source
Long reads	`minimap2`
Splice-aware RNA	`STAR`, `HISAT2`
Short ChIP-seq reads	`bowtie2`

References

BWA-MEM2 paper: Vasimuddin et al. 2019 — 10.1109/HPEC.2019.8916176
BWA-MEM2 GitHub: https://github.com/bwa-mem2/bwa-mem2
Companion: ors-bioinformatics-sequence-bwa-alignment, ors-bioinformatics-sequence-samtools-bam-processing.

Changelog

1.0.0 (2026-06-10): Initial adaptation by Pradyumna Jayaram from SciAgent bwa-mem2-dna-aligner skill; brought in line with BWA-MEM2 2.2.1.

skills/bioinformatics-sequence/bwa-mem2-alignment