Bowtie2 Alignment for ChIP-seq / ATAC-seq

Bowtie2 is the workhorse for ChIP-seq, ATAC-seq, and other short-fragment applications. It is typically faster than BWA-MEM on short reads (≤ 100 bp) and handles gapped alignment with reasonable sensitivity. The 2026 reality: Bowtie2 remains a standard choice in ChIP-seq and ATAC-seq pipelines, with BWA-MEM2 and STAR as alternatives for specific use cases.

When to use

  • ChIP-seq / CUT&RUN / CUT&Tag alignment.
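  • ATAC-seq alignment (with --local, so adapter read-through on short fragments is soft-clipped).
  • Whole-genome bisulfite-seq, but only through a bisulfite-aware wrapper such as Bismark, which runs Bowtie2 on in-silico converted reads and genome; plain Bowtie2 is not bisulfite-aware.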
  • Any read length 50-150 bp where speed matters.

When NOT to use

  • Long reads → use minimap2.
  • Variant calling on whole-genome data → use bwa-mem or bwa-mem2.
  • RNA-seq splice alignment → use STAR or HISAT2.

Prerequisites

  • bowtie2 ≥ 2.5
  • samtools ≥ 1.19
  • Reference FASTA + Bowtie2 index (bowtie2-build)

Code patterns

Index the reference

bowtie2-build --threads 8 reference/genome.fa genome_bt2
# Creates genome_bt2.{1,2,3,4,rev.1,rev.2}.bt2

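For very large genomes, bowtie2-build switches to a large index (.bt2l files) automatically once the reference exceeds about 4 billion nucleotides; --large-index forces a large index for smaller references.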
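An interrupted bowtie2-build can leave a partial set of files behind, so a pre-flight check before alignment may save a failed run. A minimal sketch, assuming the standard file naming (prefix.1.bt2 through prefix.rev.2.bt2, or .bt2l for a large index); bt2_index_ok is a hypothetical helper name, not part of Bowtie2:

```bash
# Hypothetical helper: print "bt2" or "bt2l" and succeed if all six index
# files exist (and are non-empty) for the given prefix; fail otherwise.
bt2_index_ok() {
  local prefix=$1 ext part complete
  for ext in bt2 bt2l; do
    complete=1
    for part in 1 2 3 4 rev.1 rev.2; do
      if [[ ! -s "${prefix}.${part}.${ext}" ]]; then
        complete=0
        break
      fi
    done
    if (( complete )); then
      printf '%s\n' "$ext"
      return 0
    fi
  done
  return 1
}
```

Usage: bt2_index_ok genome_bt2 || { echo "incomplete index: genome_bt2" >&2; exit 1; }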

Paired-end alignment (ChIP-seq default)

bowtie2 -p 16 --no-mixed --no-discordant \
    -x genome_bt2 \
    -1 reads_R1.fq.gz -2 reads_R2.fq.gz \
    --rg-id sample1 --rg SM:sample1 --rg PL:ILLUMINA \
    --rg LB:lib1 |
  samtools sort -@ 8 -m 4G -o sample1.bam -
samtools index sample1.bam

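--no-mixed --no-discordant reports alignments only for concordant pairs; mates that cannot be placed as a concordant pair are written as unaligned records (add --no-unal to omit them). This keeps fragment-length analysis restricted to proper pairs.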
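Filters such as samtools view -f 2 select records by FLAG bits: Bowtie2 sets 0x2 (properly paired) for concordant pair alignments, and pairs that fail to align under these options carry 0x4 and 0x8. A small decoder can make a FLAG value readable. A minimal sketch; describe_flag is a hypothetical name, and only the commonly used bits from the SAM specification are listed:

```bash
# Hypothetical helper: decode a SAM FLAG into the names of its set bits
# (bit values from the SAM specification).
describe_flag() {
  local flag=$1
  local -a out=()
  (( flag & 0x1 ))   && out+=(paired)
  (( flag & 0x2 ))   && out+=(proper_pair)
  (( flag & 0x4 ))   && out+=(unmapped)
  (( flag & 0x8 ))   && out+=(mate_unmapped)
  (( flag & 0x10 ))  && out+=(reverse)
  (( flag & 0x20 ))  && out+=(mate_reverse)
  (( flag & 0x40 ))  && out+=(first_in_pair)
  (( flag & 0x80 ))  && out+=(second_in_pair)
  (( flag & 0x100 )) && out+=(secondary)
  (( flag & 0x800 )) && out+=(supplementary)
  printf '%s\n' "${out[*]}"
}
```

For example, describe_flag 99 prints "paired proper_pair mate_reverse first_in_pair", and describe_flag 77 prints "paired unmapped mate_unmapped first_in_pair".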

Single-end alignment (CUT&RUN, ATAC-seq)

bowtie2 -p 16 --local -x genome_bt2 -U reads.fq.gz |
  samtools sort -@ 8 -o s1.bam -
samtools index s1.bam

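--local lets Bowtie2 soft-clip bases at either read end instead of forcing an end-to-end alignment. In ATAC-seq many Tn5 fragments are shorter than the read length, so reads run into the Nextera adapter at their 3' end; --local clips that tail (trimming adapters before alignment is the alternative).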
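In local mode, clipped bases appear as S operations in the CIGAR string (SAM column 6), so summarizing them gives a rough idea of how much read-through --local absorbed. A minimal sketch in awk; softclip_summary is a hypothetical name, and it treats every mapped record equally (including secondary alignments, if any were requested):

```bash
# Hypothetical helper: read SAM records on stdin and report how many mapped
# records carry soft clips and the mean number of soft-clipped bases per record.
softclip_summary() {
  awk -F '\t' '
    /^@/ { next }                 # skip header lines
    $6 == "*" { next }            # skip unmapped records (no CIGAR)
    {
      n++
      cigar = $6
      clipped = 0
      # Walk the CIGAR: each operation is a length followed by one letter.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        op = substr(cigar, RLENGTH, 1)
        len = substr(cigar, 1, RLENGTH - 1) + 0
        if (op == "S") clipped += len
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (clipped > 0) with_clip++
      total += clipped
    }
    END {
      if (n == 0) { print "records=0"; exit }
      printf "records=%d clipped_records=%d mean_clipped_bases=%.2f\n", n, with_clip, total / n
    }'
}
```

Usage: samtools view s1.bam | softclip_summary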

--very-sensitive for low-input or divergent samples

bowtie2 --very-sensitive-local -p 16 -x bt2 -1 R1.fq -2 R2.fq | \
  samtools sort -@ 8 -o s.bam -

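Presets, from fastest to most sensitive: --very-fast, --fast, --sensitive (the default), --very-sensitive. Each has a -local counterpart (e.g. --very-sensitive-local) for local mode; passing --local alongside it keeps the mode explicit. Each step up trades speed for sensitivity, and the actual slowdown depends on the data.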
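The preset names follow a regular pattern, so a script that switches between end-to-end and local mode can build the option instead of hard-coding it. A minimal sketch; bt2_preset is a hypothetical helper, and emitting --local together with the -local preset is a deliberate choice to keep the mode explicit:

```bash
# Hypothetical helper: map a preset level and an alignment mode to Bowtie2
# options. In local mode it emits --local plus the matching -local preset.
bt2_preset() {
  local level=$1 mode=${2:-end-to-end}
  case $level in
    very-fast|fast|sensitive|very-sensitive) ;;
    *) echo "unknown preset: $level" >&2; return 1 ;;
  esac
  case $mode in
    end-to-end) printf '%s\n' "--${level}" ;;
    local)      printf '%s\n' "--local --${level}-local" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

Usage (left unquoted on purpose so the output splits into separate options): bowtie2 $(bt2_preset very-sensitive local) -p 16 -x bt2 -1 R1.fq -2 R2.fq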

Allow unpaired reads from a paired run

Drop --no-mixed --no-discordant if you want to keep the unpaired mates:

bowtie2 -p 16 -x bt2 -1 R1.fq -2 R2.fq | samtools sort -@ 8 -o s.bam -

ATAC-seq specific: soft-clip adapter read-through, keep mitochondrial reads but flag them later

bowtie2 -p 16 --local --no-mixed --no-discordant -X 2000 \
    -x bt2 -1 R1.fq.gz -2 R2.fq.gz |
  samtools sort -@ 8 -o s.bam -

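Then remove chrM reads. Select every contig except chrM by region on the indexed BAM rather than with grep, which also drops reads whose mate maps to chrM or whose other fields mention chrM (unmapped reads are dropped too). Duplicate marking is a separate step.

samtools index s.bam
samtools view -b -o s.no_chrM.bam s.bam \
  $(samtools idxstats s.bam | awk '$1 != "chrM" && $1 != "*" {print $1}')
samtools index s.no_chrM.bam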
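Duplicate marking for paired-end data is commonly done with samtools markdup, which needs the mate tags that fixmate -m adds to name-grouped reads; the usual chain is collate → fixmate → sort → markdup. A minimal sketch, assuming samtools ≥ 1.19; mark_duplicates and its argument order are hypothetical:

```bash
# Hypothetical helper: mark_duplicates <in.bam> <out.bam> [threads]
# fixmate -m needs reads grouped by name, so they are collated first and
# re-sorted by coordinate before markdup. The body runs in a subshell so
# that `set -o pipefail` does not leak into the calling shell.
mark_duplicates() (
  set -o pipefail
  in_bam=$1
  out_bam=$2
  threads=${3:-8}
  samtools collate -@ "$threads" -O -u "$in_bam" |
    samtools fixmate -@ "$threads" -m -u - - |
    samtools sort -@ "$threads" -u - |
    samtools markdup -@ "$threads" -s - "$out_bam" &&
    samtools index "$out_bam"
)
```

Usage: mark_duplicates s.bam s.markdup.bam 8. Adding -r to the markdup call removes duplicates instead of flagging them.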

Read group in the header (for downstream tools)

bowtie2 -p 16 --rg-id s1 --rg SM:s1 --rg PL:ILLUMINA --rg LB:lib1 \
    -x bt2 -1 R1.fq -2 R2.fq | samtools sort -@ 8 -o s.bam -
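To confirm the read group actually reached the BAM header, the @RG lines can be checked for the ID and SM fields. A minimal sketch in awk; check_read_groups is a hypothetical name, and requiring only ID: and SM: is an assumption (some tools also expect PL: or LB:):

```bash
# Hypothetical helper: read a SAM header on stdin and fail unless there is at
# least one @RG line and every @RG line has both an ID: and an SM: field.
check_read_groups() {
  awk -F '\t' '
    $1 == "@RG" {
      n++
      has_id = 0
      has_sm = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) has_id = 1
        if ($i ~ /^SM:/) has_sm = 1
      }
      if (!has_id || !has_sm) { bad++; print "incomplete @RG: " $0 > "/dev/stderr" }
    }
    END {
      if (n == 0) { print "no @RG lines" > "/dev/stderr"; exit 1 }
      if (bad > 0) exit 1
      print n " read group(s) OK"
    }'
}
```

Usage: samtools view -H s.bam | check_read_groups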

Insert size distribution (for ChIP-seq fragment QC)

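samtools view -f 66 s.bam | awk '{ print ($9 < 0 ? -$9 : $9) }' | sort -n | uniq -c > insert_sizes.txt

-f 66 keeps records that are both properly paired (0x2) and first in pair (0x40), so each fragment is counted once; TLEN (column 9) is negative for the rightmost mate, so the absolute value is taken.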
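Per-base counts are noisy for plotting; binning fragment lengths makes the ChIP-seq peak or the ATAC-seq nucleosomal ladder easier to see. A minimal sketch in awk that applies the same properly-paired, first-in-pair filter on the FLAG column; insert_size_hist and the 10 bp default bin are hypothetical choices:

```bash
# Hypothetical helper: read SAM records on stdin and print "bin<TAB>count"
# rows of fragment lengths in fixed-width bins (default 10 bp). Only properly
# paired (0x2), first-in-pair (0x40) records are counted, one per fragment.
insert_size_hist() {
  local bin=${1:-10}
  awk -F '\t' -v bin="$bin" '
    /^@/ { next }
    {
      flag = $2 + 0
      proper = int(flag / 2) % 2      # bit 0x2
      first  = int(flag / 64) % 2     # bit 0x40
      if (!proper || !first) next
      tlen = $9 + 0
      if (tlen < 0) tlen = -tlen      # TLEN is negative for the rightmost mate
      if (tlen == 0) next
      count[int(tlen / bin) * bin]++
    }
    END { for (b in count) print b "\t" count[b] }' | sort -n -k1,1
}
```

Usage: samtools view s.bam | insert_size_hist 10 > insert_hist.tsv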

nf-core integration

nf-core/chipseq and nf-core/atacseq both support Bowtie2; select it with --aligner bowtie2 (their default aligner is BWA).

Common pitfalls

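  • Using --end-to-end for ATAC-seq on untrimmed reads. Many Tn5 fragments are shorter than the read length, so reads run into the Nextera adapter at their 3' end; trim adapters first or use --local so the adapter tail is soft-clipped.
  • No read group. Some downstream tools (e.g. GATK) require @RG headers, and read groups keep samples and libraries distinguishable after BAMs are merged.
  • Using --no-mixed --no-discordant for very low input. If many of your pairs fail to align concordantly, these flags leave both mates unaligned. Drop the flags to keep those alignments.
  • Indexing the wrong reference. The index must match the assembly and contig naming (e.g. chr1 vs 1) of the annotations, blacklists and peak files used downstream. Keep each set of .bt2 files next to the FASTA it was built from, e.g. in a reference/ directory.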
  • Confusing Bowtie1 and Bowtie2. Bowtie1 is ungapped; for ChIP/ATAC, you almost always want Bowtie2.
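A contig-naming mismatch between the reference and downstream files is easy to miss. One way to check is to compare column 1 of the FASTA index (.fai, written by samtools faidx) with column 1 of a BED file. A minimal sketch; missing_contigs is a hypothetical helper, and skipping lines starting with #, track or browser is an assumption about the BED header style:

```bash
# Hypothetical helper: missing_contigs <ref.fa.fai> <file.bed>
# Prints each BED contig (column 1) that the reference index does not contain,
# a typical symptom of chr1-vs-1 or chrM-vs-MT naming mismatches.
missing_contigs() {
  local fai=$1 bed=$2
  awk -F '\t' '
    NR == FNR { ref[$1] = 1; next }            # first file: reference contigs
    /^(#|track|browser)/ { next }              # skip BED header lines
    !($1 in ref) && !seen[$1]++ { print $1 }   # report each unknown contig once
  ' "$fai" "$bed"
}
```

Usage: samtools faidx reference/genome.fa, then missing_contigs reference/genome.fa.fai peaks.bed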

Validation

  • samtools flagstat s.bam: should show a high mapping rate (as a rough guide, ≥80% for ChIP-seq and ≥95% for ATAC-seq excluding chrM).
  • samtools view -c -f 2 s.bam: properly-paired count.
  • Insert size histogram: ChIP-seq TF should peak at ~150-300 bp; nucleosome-depleted ChIP-seq at ~50-100 bp; ATAC-seq shows a sub-nucleosomal peak below ~100 bp plus mono- and di-nucleosomal modes near ~200 and ~400 bp.
  • samtools view -c -F 0x904 s.bam: mapped primary records (excludes unmapped 0x4, secondary 0x100 and supplementary 0x800 records).
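The mapping-rate and chrM checks in the list above can also be computed directly, which helps when the numbers feed a script. A minimal sketch with two hypothetical helpers: one reads SAM records and reports the mapped percentage of primary records, the other reads samtools idxstats output (name, length, mapped, unmapped) and reports the share of mapped reads on the mitochondrial contig, assumed to be named chrM unless given:

```bash
# Hypothetical helper: percentage of primary records (not 0x100 secondary,
# not 0x800 supplementary) that are mapped, from SAM records on stdin.
primary_mapping_rate() {
  awk -F '\t' '
    /^@/ { next }
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # skip 0x100, 0x800
      total++
      if (int(flag / 4) % 2 == 0) mapped++                    # 0x4 clear: mapped
    }
    END { printf "%.2f\n", total ? 100 * mapped / total : 0 }'
}

# Hypothetical helper: percentage of mapped reads on the mitochondrial contig,
# from `samtools idxstats` output on stdin (column 3 = mapped read count).
chrM_fraction() {
  awk -F '\t' -v mt="${1:-chrM}" '
    { mapped += $3 }
    $1 == mt { mt_mapped = $3 }
    END { printf "%.2f\n", mapped ? 100 * mt_mapped / mapped : 0 }'
}
```

Usage: samtools view s.bam | primary_mapping_rate, and samtools idxstats s.bam | chrM_fraction chrM (idxstats needs the BAM index).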

Open alternatives

Need                    Tool
RNA-seq splice-aware    STAR, HISAT2
WGS variant calling     bwa-mem, bwa-mem2
Long reads              minimap2
ChIP-seq peak caller    MACS2, MACS3
ATAC-seq peak caller    MACS2, Genrich

Changelog

  • 1.0.0 (2026-06-10): Initial adaptation by Pradyumna Jayaram from bio-bowtie2-alignment (bioSkills-main/read-alignment/bowtie2-alignment).