BWA-MEM2 — Faster BWA-MEM

BWA-MEM2 is a drop-in replacement for BWA-MEM. It runs the same algorithm, but its SIMD-vectorized kernels and more cache-friendly index lookups typically give a 2-3x speedup on the same hardware and thread count. The alignment results are essentially identical to BWA-MEM (some tie-breaking edge cases differ). If you're running BWA-MEM in 2026 without considering BWA-MEM2, you are likely paying for at least twice the CPU time the alignment needs.

When to use

  • Any place you'd use BWA-MEM with read length ≥ 70 bp.
  • Production WGS / WES pipelines where alignment time is the bottleneck.
  • Cloud or on-prem where CPU costs matter.

When NOT to use

  • BWA-ALN-style very short reads (BWA-MEM2 doesn't replace BWA-ALN, but BWA-ALN is also rarely the right choice in 2026).
  • Long reads — use minimap2.
  • When exact parity with BWA-MEM v0.7.17 is required (rare; aligner tie-breaking can differ for low MAPQ reads).

Prerequisites

  • bwa-mem2 ≥ 2.2.1 (built with AVX2 / AVX-512)
  • samtools ≥ 1.19
  • Reference FASTA
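
Which SIMD level a host supports can be checked before picking a build. The following is a minimal sketch, assuming a Linux host that exposes CPU flags in /proc/cpuinfo; the helper name best_simd and the labels it prints are illustrative, not part of BWA-MEM2.

```shell
#!/usr/bin/env bash
# Report the widest SIMD level found in a space-separated CPU flag list.
# Flag names follow Linux /proc/cpuinfo (avx512bw, avx2, sse4_1).
# best_simd is a hypothetical helper, not a BWA-MEM2 command.
best_simd() {
  local flags=" $1 "
  case "$flags" in
    *" avx512bw "*) echo "avx512bw" ;;
    *" avx2 "*)     echo "avx2" ;;
    *" sse4_1 "*)   echo "sse41" ;;
    *)              echo "none" ;;
  esac
}

# Usage on Linux: inspect the flags of the first core, if available.
if [ -r /proc/cpuinfo ]; then
  cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)
  echo "Best SIMD level on this host: $(best_simd "$cpu_flags")"
fi
```

A result of "none" or "sse41" suggests the speedup over BWA-MEM will be smaller than on an AVX2 or AVX-512 machine.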

Code patterns

Index the reference

bwa-mem2 index reference/genome.fa
# Creates genome.fa.{0123, amb, ann, bwt.2bit.64, pac}

The index format is different from BWA-MEM (bwt.2bit.64 vs bwt), so a BWA-MEM2 index is not compatible with BWA-MEM and vice versa. Pick one and stick with it.
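
Because the two index formats sit next to the same FASTA, it can help to check which one is actually present before aligning. This is a small sketch that looks for the five BWA-MEM2 file suffixes listed above; the function name has_bwamem2_index is an illustrative assumption.

```shell
#!/usr/bin/env bash
# Return 0 only if every BWA-MEM2 index file exists and is non-empty
# next to the given FASTA path. has_bwamem2_index is a hypothetical helper.
has_bwamem2_index() {
  local prefix="$1" ext
  for ext in 0123 amb ann bwt.2bit.64 pac; do
    [ -s "${prefix}.${ext}" ] || return 1
  done
  return 0
}

# Usage: decide whether indexing is still needed before aligning.
ref="reference/genome.fa"
if has_bwamem2_index "$ref"; then
  echo "BWA-MEM2 index found for $ref"
else
  echo "No complete BWA-MEM2 index for $ref; run: bwa-mem2 index $ref"
fi
```

A reference that only has a classic .bwt file fails this check, which is the situation described above.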

Paired-end alignment

The mem subcommand accepts the same options as bwa mem, so the standard pipeline works unchanged:

bwa-mem2 mem -t 16 -M -K 100000000 \
    -R '@RG\tID:s1\tSM:s1\tPL:ILLUMINA\tLB:lib1' \
    reference/genome.fa \
    reads/R1.fq.gz reads/R2.fq.gz |
  samtools sort -@ 8 -m 4G -o s1.sorted.bam -
samtools index s1.sorted.bam
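
The memory needed by this pipeline is the aligner's footprint plus the sort buffers, and samtools sort applies -m per thread. A rough back-of-the-envelope sketch, assuming the ~12 GB aligner figure quoted under Common pitfalls and a hypothetical helper name pipeline_mem_gb:

```shell
#!/usr/bin/env bash
# Rough peak-memory estimate in GB: aligner footprint plus
# (samtools sort threads x per-thread -m buffer).
# pipeline_mem_gb is a hypothetical helper; inputs are whole GB.
pipeline_mem_gb() {
  local aligner_gb="$1" sort_threads="$2" sort_gb_per_thread="$3"
  echo $(( aligner_gb + sort_threads * sort_gb_per_thread ))
}

# Example: ~12 GB for the aligner, samtools sort -@ 8 -m 4G
echo "Estimated peak: $(pipeline_mem_gb 12 8 4) GB"
# prints "Estimated peak: 44 GB"
```

If the node has less memory than the estimate, lowering -m or -@ on the sort side is usually the cheaper adjustment.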

Single-end alignment

bwa-mem2 mem -t 16 -M -R '@RG\tID:s1\tSM:s1' ref.fa reads.fq.gz |
  samtools sort -@ 8 -o s1.bam -
samtools index s1.bam

Index a reference for the mem command

BWA-MEM2 2.2.x builds every file that mem needs with a single index command; no separate index-building steps are required:

bwa-mem2 index ref.fa

Benchmark vs BWA-MEM

# Time both on the same input
time bwa mem -t 16 -M ref.fa R1.fq.gz R2.fq.gz > /dev/null
time bwa-mem2 mem -t 16 -M ref.fa R1.fq.gz R2.fq.gz > /dev/null

Typical on a 30x human WGS sample: BWA-MEM takes ~6-8 hours with 16 cores; BWA-MEM2 takes ~2.5-3.5 hours on the same hardware.
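
To turn two wall-clock times into a speedup factor and a percentage of time saved, a little awk arithmetic is enough. The sketch below uses 7 h and 3 h, the midpoints of the ranges quoted above; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Speedup factor: baseline time divided by new time.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# Percentage of wall-clock time saved relative to the baseline.
saved_pct() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.0f\n", 100 * (base - new) / base }'
}

# Midpoints of the ranges above: 7 h for BWA-MEM, 3 h for BWA-MEM2.
echo "Speedup: $(speedup 7 3)x, time saved: $(saved_pct 7 3)%"
# prints "Speedup: 2.33x, time saved: 57%"
```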

Read-group aware multi-sample loop

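mkdir -p bam   # samtools sort does not create the output directory
for r1 in reads/*_R1.trimmed.fq.gz; do
  base=$(basename "$r1" _R1.trimmed.fq.gz)
  r2="reads/${base}_R2.trimmed.fq.gz"
  # Skip samples whose mate file is missing rather than failing mid-run
  [ -e "$r2" ] || { echo "missing $r2, skipping" >&2; continue; }
  bwa-mem2 mem -t 16 -M -R "@RG\tID:${base}\tSM:${base}\tPL:ILLUMINA" \
      ref.fa "$r1" "$r2" |
    samtools sort -@ 8 -o "bam/${base}.bam" -
  samtools index "bam/${base}.bam"
done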
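
Before launching a long loop like this, a dry run that lists which samples have both mates can save a failed overnight job. A minimal sketch, assuming the same _R1/_R2.trimmed.fq.gz naming as above; list_pairs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print "sample<TAB>R1<TAB>R2" for every complete pair in a directory;
# warn on stderr for R1 files without a matching R2.
list_pairs() {
  local dir="$1" r1 r2 base
  for r1 in "$dir"/*_R1.trimmed.fq.gz; do
    [ -e "$r1" ] || continue   # the glob matched nothing
    base=$(basename "$r1" _R1.trimmed.fq.gz)
    r2="$dir/${base}_R2.trimmed.fq.gz"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$base" "$r1" "$r2"
    else
      echo "warning: no R2 mate for $r1" >&2
    fi
  done
}

# Usage: dry run over the reads directory used in the loop.
list_pairs reads
```

The tab-separated output can also be saved and used as a sample sheet for the real run.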

Plug into nf-core/sarek

nf-core/sarek (a widely used germline + somatic pipeline) supports BWA-MEM2 as an alternative aligner. The default aligner is bwa-mem; pass --aligner bwa-mem2 to switch. No other CLI changes are needed.

Flags that matter

Flag            Behavior
-t N            Number of threads. The default is 1, so set this explicitly.
-M              Mark shorter split hits as secondary (Picard compatibility)
-K 100000000    Process a fixed number of input bases per batch regardless of -t, which makes output reproducible across thread counts
-Y              Use soft clipping for supplementary alignments (for variant callers)
-j              Treat ALT contigs as part of the primary assembly, i.e. ignore the .alt file (GRCh38)
-R '@RG...'     Read group header line
-k 19           Min seed length (default 19)
-a              Output all alignments for single-end or unpaired paired-end reads, flagged as secondary

Common pitfalls

  • Index file format incompatibility. BWA-MEM2's .bwt.2bit.64 is not BWA-MEM's .bwt. If you run bwa-mem2 index and then bwa mem, the alignment fails because bwa cannot find its index files.
  • Threads are not automatic. mem defaults to a single thread, so pass -t explicitly (16 or more on a typical server). The SIMD gains apply per thread, but a single-threaded run leaves most of the machine idle.
  • AVX-512 build requires a compatible CPU. Intel Xeon servers have had AVX-512 since 2017 (Skylake-SP) and AMD EPYC since Zen 4 (2022); many consumer CPUs only have AVX2 (still 1.5-2x faster than BWA-MEM).
  • Memory. BWA-MEM2 peaks at ~10-12 GB for human reference + 8 threads.
  • Read-group format must be valid. Fields in the -R string are separated by the two-character escape \t; a space between fields will break it.
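
The read-group pitfall is easy to guard against by building the string in one place. The sketch below is one way to do it, with hypothetical helper names make_rg and rg_is_valid. The check is deliberately conservative: it rejects any space and any literal tab character. Classic bwa reports an error for literal tabs, and this sketch assumes BWA-MEM2 behaves the same.

```shell
#!/usr/bin/env bash
# Build an @RG string whose fields are joined by the two-character
# escape "\t" (backslash + t), the form the -R option expects.
# PL and LB fall back to illustrative defaults.
make_rg() {
  local id="$1" sm="$2" pl="${3:-ILLUMINA}" lb="${4:-lib1}"
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s\n' "$id" "$sm" "$pl" "$lb"
}

# Conservative check: must start with "@RG\tID:" and contain
# neither spaces nor literal tab characters.
rg_is_valid() {
  case "$1" in
    "@RG\\tID:"*) ;;
    *) return 1 ;;
  esac
  case "$1" in
    *" "*|*$'\t'*) return 1 ;;
  esac
  return 0
}

# Usage: build, validate, and show the string to pass to -R.
rg=$(make_rg s1 s1)
if rg_is_valid "$rg"; then
  printf '%s\n' "$rg"
fi
# prints: @RG\tID:s1\tSM:s1\tPL:ILLUMINA\tLB:lib1
```

The resulting value can be passed as -R "$rg" in any of the alignment commands above.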

Validation

  • samtools flagstat output is essentially identical to BWA-MEM v0.7.17.
  • For a parity check, align a small test sample with both and diff the FLAG/MAPQ columns — only a tiny fraction of MAPQ=0 reads should differ.
  • Coverage and mapping rate should be within 0.1% of BWA-MEM.
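
The FLAG/MAPQ parity check can be sketched with standard text tools. This assumes both aligners wrote uncompressed SAM for the same input, with the same records in the same order (unsorted output, same record count); the function name count_flag_mapq_diffs is illustrative.

```shell
#!/usr/bin/env bash
# Count records whose read name (col 1), FLAG (col 2) or MAPQ (col 5)
# differ between two SAM text files, compared line by line.
# Header lines (starting with "@") are ignored.
count_flag_mapq_diffs() {
  paste <(grep -v '^@' "$1" | cut -f1,2,5) \
        <(grep -v '^@' "$2" | cut -f1,2,5) |
    awk -F'\t' '$1 != $4 || $2 != $5 || $3 != $6 { n++ } END { print n + 0 }'
}

# Usage (file names are placeholders):
#   count_flag_mapq_diffs bwa.sam bwa-mem2.sam
```

For BAM input, each file argument can be replaced by a process substitution such as <(samtools view -h in.bam). Dividing the count by the total number of records gives the fraction to compare against the expectation above.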

Open alternatives

Need                    Tool
Even faster (GPU)       NVIDIA Parabricks fq2bam (GPU-accelerated BWA-MEM), but closed source
Long reads              minimap2
Splice-aware RNA        STAR, HISAT2
Short ChIP-seq reads    bowtie2

References

  • BWA-MEM2 paper: Vasimuddin et al. 2019 (IEEE IPDPS), DOI 10.1109/IPDPS.2019.00041
  • BWA-MEM2 GitHub: https://github.com/bwa-mem2/bwa-mem2
  • Companion: ors-bioinformatics-sequence-bwa-alignment, ors-bioinformatics-sequence-samtools-bam-processing.

Changelog

  • 1.0.0 (2026-06-10): Initial adaptation by Pradyumna Jayaram from SciAgent bwa-mem2-dna-aligner skill; brought in line with BWA-MEM2 2.2.1.