Version Compatibility

Reference examples tested with: AutoDock Vina 1.2.5+, SMINA 2020-12+, GNINA 1.1+, RDKit 2024.09+, meeko 0.5+, P2Rank 2.4+, ProDy 2.4+, pdb2pqr 3.6+.

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: vina --version; gnina --version; smina --version

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Virtual Screening

Screen chemical libraries against protein targets via molecular docking. Vina is the de-facto default, SMINA adds flexibility (Vinardo scoring, custom scoring), and GNINA adds CNN-based pose scoring. Deep-learning docking (DiffDock-L, EquiBind, NeuralPLexer) competes in pose accuracy but fails physical-plausibility tests (PoseBusters) ~50% of the time; the postdoc-grade workflow combines ML pose sampling with classical scoring and physical validation. For ultralarge libraries (>1M), library preparation, hierarchical filtering, and HPC orchestration become the limiting steps.

For pose physical-validity QC, see chemoinformatics/pose-validation. For ML-driven docking + rescoring, see chemoinformatics/docking-rescoring. For covalent docking, see chemoinformatics/covalent-design. For affinity calculations (FEP), see chemoinformatics/free-energy-calculations.

Docking Tool Taxonomy

Tool	Scoring	Speed (sec/lig)	Best at	Fails when
AutoDock Vina 1.2	Vina (empirical)	5-30	Default, well-validated	Cross-dock; cryptic pockets; metal centers
SMINA	Vina + flexible + custom	5-30	Custom scoring; flexible side chains	Same Vina-scoring caveats
Vinardo	Vinardo (improved Vina)	5-30	Better scoring than Vina on DUD-E	Limited adoption
GNINA 1.1	CNN (default) or Vina	30-120	Pose ranking, redocking	Slower; GPU recommended
AutoDock 4	AD4 + grid maps	30-60	Legacy reference	Slower than Vina
DOCK 6/7	DOCK + Amber	60-300	UCSF DOCK ecosystem	Steep learning curve
Glide (Schrodinger)	GlideScore	5-30	Commercial SOTA	License cost
GOLD (CCDC)	GOLDScore / ChemScore	60-180	Commercial; metal coordination	License cost
FlexX (BioSolveIT)	FlexX	30-60	Fragment-based	License cost
rDock	rDock	30-60	Open-source alternative	Slower than Vina
DiffDock-L	Diffusion-generative	5 (GPU)	Pose sampling for cross-docking	High PB-invalid rate (~50%)
EquiBind	Equivariant NN	<1 (GPU)	Single-shot pose	Lowest accuracy in PoseBuster benchmarks
Boltz-2 + GNINA rescore	Foundation model + CNN	10 (GPU)	Modern SOTA hybrid	High GPU; not all proteins

Decision: For most screens, GNINA 1.1 with CNN scoring is the modern default (better than Vina on every benchmark; 30s/ligand on GPU). For >1M library scale, hierarchical Vina -> GNINA -> MM/GBSA. For cross-docking (predicted target structure or apo-holo), GNINA's CNN scoring transfers better than Vina.

Decision Tree by Scenario

Scenario	Recommended workflow
Self-dock against known ligand pocket	GNINA `gnina --cnn_scoring rescore`
Cross-dock to apo or related-target structure	DiffDock-L pose + GNINA rescore + PoseBusters
Ultralarge library (10M+)	Vina hierarchical: pre-filter (Lipinski, PAINS) -> Vina dock -> top 1% GNINA rescore -> top 0.1% MM/GBSA
Cryptic pocket / induced fit	Ensemble docking (10 conformer snapshots) + AlphaFold3 or Boltz-1 holo prediction
Allosteric / undefined site	P2Rank for pocket detection -> ensemble dock all pockets
Metal-coordinated ligand	GOLD (commercial) or manually parameterize Vina metal scoring
Covalent inhibitor	See `chemoinformatics/covalent-design`: DOCKovalent, HCovDock
Fragment screen (<300 Da)	rDock or constrained Vina with seed atoms
Hit-to-lead refinement	Use co-crystal structure if available; MD-relaxed receptor; FEP for affinity

Receptor Preparation

Goal: Convert a protein PDB into a docking-ready format with correct protonation, missing atoms, and removed waters.

Approach: Strip ligands and waters -> fill missing atoms (PROPKA or pdbfixer) -> add hydrogens at pH 7.4 (PDB2PQR / Reduce) -> assign partial charges -> convert to PDBQT (Open Babel).

import subprocess

def prepare_receptor(pdb_in, pdbqt_out, pH=7.4, remove_het=True):
    base = pdb_in.replace('.pdb', '')
    fixed = f'{base}_fixed.pdb'
    propka_pdb = f'{base}_propka.pdb'

    if remove_het:
        with open(pdb_in) as fin, open(f'{base}_clean.pdb', 'w') as fout:
            for line in fin:
                if line.startswith('HETATM') and 'HOH' in line:
                    continue
                if line.startswith('HETATM'):
                    continue
                fout.write(line)

    subprocess.run(['pdb2pqr', '--ff=AMBER', f'--with-ph={pH}',
                    f'{base}_clean.pdb', propka_pdb], check=True)
    subprocess.run(['obabel', propka_pdb, '-O', pdbqt_out,
                    '-xr', '--partialcharge', 'gasteiger'], check=True)
    return pdbqt_out

Common pitfall: Forgetting to add hydrogens at protein pH (7.4) but using pH 7.0 ligand charges. Hist mistakenly protonated. Use PROPKA + manual review of catalytic residues.

Ligand Preparation

Goal: Generate a 3D, docking-ready ligand file from SMILES with appropriate protonation and conformation.

Approach: Protonate at pH 7.4 with Chem.MolFromSmiles then rdMolStandardize.Uncharger -> embed 3D with ETKDGv3 -> minimize with MMFF94 -> assign Gasteiger charges -> write PDBQT with meeko.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize
from meeko import MoleculePreparation, PDBQTWriterLegacy

def prepare_ligand(smiles, pdbqt_out):
    mol = Chem.MolFromSmiles(smiles)
    uncharger = rdMolStandardize.Uncharger()
    mol = uncharger.uncharge(mol)
    mol = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())
    AllChem.MMFFOptimizeMolecule(mol)

    mk_prep = MoleculePreparation()
    setups = mk_prep.prepare(mol)
    pdbqt_text, is_ok, err = PDBQTWriterLegacy.write_string(setups[0])
    if not is_ok:
        raise RuntimeError(f'meeko PDBQT export failed: {err}')
    with open(pdbqt_out, 'w') as f:
        f.write(pdbqt_text)
    return pdbqt_out

meeko (AutoDock developers' tool) handles torsion tree creation, rotamer flagging, and PDBQT writing -- preferred over Open Babel's PDBQT writer. Note: meeko 0.5+ separated the writer (PDBQTWriterLegacy) from MoleculePreparation; older code using prep.write_pdbqt_file() is deprecated.

Binding Site Detection

When the binding pocket is not known (apo target, novel allosteric site):

Tool	Approach	Output
P2Rank	ML on protein surface descriptors	Ranked pocket list with center coords
fpocket	Voronoi tessellation	Pocket descriptor list
DoGSiteScorer	Geometric + drugability	Pocket list with score
AutoSite (Vina)	Affinity map clustering	Pocket centers
AlphaFill	Co-fold known ligand	Plausible binding site

prank predict -f receptor.pdb -o pockets/

P2Rank output <receptor>_predictions.csv lists pocket centers with scores. Pocket 1 (highest score) is typically the orthosteric site for known protein families.

Vina Docking (Single Ligand)

# AutoDock Vina Python API requires Vina 1.2+; for Vina 1.1 use subprocess CLI:
# subprocess.run(['vina', '--receptor', ..., '--ligand', ..., '--center_x', ...], check=True)
from vina import Vina

def dock_single(receptor_pdbqt, ligand_pdbqt, center, box_size,
                exhaustiveness=8, n_poses=10):
    v = Vina(sf_name='vina')
    v.set_receptor(receptor_pdbqt)
    v.set_ligand_from_file(ligand_pdbqt)
    v.compute_vina_maps(center=center, box_size=box_size)
    v.dock(exhaustiveness=exhaustiveness, n_poses=n_poses)
    return v.energies(), v.poses()

Exhaustiveness:

8: default, ~30s/ligand, acceptable for screening
16: ~60s/ligand, lead-like prioritization
32: ~2 min/ligand, top-hit re-docking
64: ~4 min/ligand, production / co-crystal comparison

Vina pose RMSD lower bound rmsd_lb and upper bound rmsd_ub are NOT pose-vs-reference RMSDs; they are bounds on the conformational space sampled. Use external RMSD to a reference pose for accuracy QC.

GNINA with CNN Scoring (modern default)

gnina -r receptor.pdb -l ligand.sdf \
      --autobox_ligand reference_ligand.sdf \
      --cnn_scoring rescore \
      -o poses.sdf.gz \
      --num_modes 9 --exhaustiveness 8

--cnn_scoring:

rescore (default): Vina sampling + CNN scoring (best validated)
refinement: CNN refines top Vina pose
metrorefine: Metropolis-style CNN refinement (slower, ~10% better pose accuracy)
none: Vina-only (use SMINA equivalently)

--autobox_ligand: define box from reference ligand SDF/PDB. Otherwise specify --center_x/y/z + --size_x/y/z.

Critical: GNINA's CNN was trained on PDBbind 2019; performance on novel chemotypes outside this distribution is reduced. Validate with a known co-crystal redock if possible.

Virtual Screening Pipeline (Hierarchical)

Goal: Screen 10M-compound library down to top-1k candidates for follow-up.

Approach: Three-stage filter:

import pandas as pd
from concurrent.futures import ProcessPoolExecutor

# Stub helpers to be implemented per project; see the cross-referenced skills.
def drug_like_filter(df):
    raise NotImplementedError('See chemoinformatics/admet-prediction (Lipinski+Veber+PAINS)')
def vina_dock(smi, receptor_pdbqt, center, box):
    raise NotImplementedError('Wrap dock_single() above; return best affinity')
def gnina_rescore(smi, receptor_pdbqt, center, box):
    raise NotImplementedError('Wrap gnina --cnn_scoring rescore subprocess call')
def pose_validate(df):
    raise NotImplementedError('See chemoinformatics/pose-validation (PoseBusters)')

def vs_pipeline(library_smi, receptor_pdbqt, center, box, output_dir, n_workers=16):
    df = pd.read_csv(library_smi)
    df_stage1 = drug_like_filter(df)  # ~10x reduction

    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        affinities = list(ex.map(
            lambda smi: vina_dock(smi, receptor_pdbqt, center, box),
            df_stage1['smiles']))
    df_stage1['vina_affinity'] = affinities
    df_stage2 = df_stage1.nsmallest(int(len(df_stage1) * 0.01), 'vina_affinity')

    df_stage2['gnina_affinity'] = df_stage2['smiles'].apply(
        lambda smi: gnina_rescore(smi, receptor_pdbqt, center, box))
    df_stage3 = df_stage2.nsmallest(1000, 'gnina_affinity')

    return pose_validate(df_stage3)

For 10M-compound libraries (ZINC22, Enamine REAL), use slurm-orchestrated parallel docking. Single-node GPU GNINA: ~3000 ligands/day. SLURM cluster (50 nodes): ~150k ligands/day.

Ultralarge Library Screening (ZINC22, Enamine REAL)

Library	Size	Format	Notes
ZINC22	4.1B	3D SMILES + xyz	Pre-built; 37GB tranches
Enamine REAL	29B	SMILES	Make-on-demand; via Enamine
Enamine HTS	4M	SMILES + SDF	Stock for HTS
Mcule	25M	SMILES	Real, in-stock
ChEMBL	2.5M	SMILES + bioactivity	Activity-labeled

For ultralarge VS:

Pre-filter to drug-like (Lipinski + Ro5 + PAINS) -> typically 10-20x reduction
Pre-filter by 2D similarity to known actives (Tanimoto >= 0.4 via ECFP4) -> 100x reduction
Vina dock the filtered subset
Rescore top 1% with GNINA
Rescore top 0.1% with MM/GBSA or FEP

Lyu et al. 2019 ZINC15 ultralarge VS gold standard: 138M docked, top 96 hits picked, 13 nM-µM actives.

Per-Tool Failure Modes

Vina -- cross-dock failure

Trigger: Receptor structure not the holo (co-crystal with ligand from another binder).

Mechanism: Vina pose accuracy degrades 2-3x from self-dock (RMSD<=2A 70%) to cross-dock (RMSD<=2A 30%).

Symptom: Top-ranked pose makes no geometric sense; key contacts missing.

Fix: GNINA CNN scoring or ensemble docking. For genuine apo, predict holo with AlphaFold3 / Boltz-1 then dock.

GNINA CNN -- novel chemotype out-of-distribution

Trigger: Ligand chemotype not in PDBbind training.

Mechanism: CNN scoring overfits to PDBbind chemotypes; novel macrocycle / peptide / PROTAC scores poorly.

Symptom: Affinity prediction far worse than Vina alone.

Fix: Use --cnn_scoring rescore (sampling still by Vina) rather than CNN sampling. Validate against co-crystal of close analog.

Box too small

Trigger: Binding box defined tightly around small ligand reference.

Mechanism: Vina explores only within the box; large analogs cannot fit. " Symptom: Many ligands report "no valid pose"; chemotype-biased hits.

Fix: Add padding 5-10 A beyond reference ligand bounding box. For unknown ligand size, use 25x25x25 A.

Multi-pocket protein -- wrong site

Trigger: Protein has multiple binding sites (orthosteric + allosteric).

Mechanism: P2Rank or AutoBox picks the most "drugable" pocket; not always the desired one.

Symptom: Hits dock in wrong pocket; SAR confusing.

Fix: Verify pocket from co-crystal data; explicitly set center_x/y/z from known ligand centroid.

DiffDock-L -- PoseBusters invalid

Trigger: Default DiffDock-L output for any receptor.

Mechanism: Diffusion generates poses without physical-validity loss; ~50% fail planarity / stereochemistry / vdW overlap tests.

Symptom: Poses look reasonable but fail PoseBusters checks.

Fix: Filter to PB-valid (PoseBusters); rescore with GNINA. See chemoinformatics/pose-validation.

Wrong ionization state

Trigger: Ligand or receptor residues protonated incorrectly at pH 7.4.

Mechanism: Aspartate/glutamate/histidine protonation depends on local environment; default protonation may be wrong.

Symptom: Salt bridges missing; poses misranked.

Fix: Run PROPKA on receptor (cabinet residue pKas); for catalytic His, manually inspect rotamer state.

Reconciliation: Vina vs GNINA Disagreement

Vina top pose	GNINA top pose	Action
Same pose, similar score	Same pose, similar score	Trust pose; use GNINA score for ranking
Vina top pose ≠ GNINA top pose	Same pocket, different orientation	Use GNINA (better trained for pose ranking)
Vina excellent, GNINA mediocre	Different pose, very different score	GNINA found pose Vina missed; trust GNINA
Both poor scores	Many ligands score similarly poor	Wrong pocket / protein conformation; reconsider receptor

Common Errors

Symptom	Cause	Fix
Vina segfault	PDBQT corrupted (atom names)	Re-prep with meeko
GNINA hangs	GPU OOM	Reduce batch / `-num_modes 5`
All affinities very poor (-3 to -5)	Wrong protonation; ligand too large for box	Re-check pKa; expand box
Identical affinity across ligands	Receptor grid not computed	Call `v.compute_vina_maps()` before dock
Pose poses make no sense	Receptor and ligand in different frames	Ensure same coordinate origin
AutoDock 4 needed	Vina cannot handle Zn / Fe metals	Use AutoDock 4 with metal scoring or GOLD
GPU mode slow	Vina is CPU-only; only GNINA is GPU	Use GNINA for GPU; Vina is multi-core CPU

References

Trott & Olson, J. Comput. Chem. 31:455 -- AutoDock Vina.
Eberhardt et al., J. Chem. Inf. Model. 61:3891 -- Vina 1.2 features.
Quiroga & Villarreal, PLoS ONE 11:e0155183 -- Vinardo scoring.
McNutt et al., J. Cheminformatics 13:43 -- GNINA 1.0 CNN docking.
Buttenschoen et al., Chem. Sci. 15:3130 -- PoseBusters (DL methods fail).
Lyu et al., Nature 566:224 -- ultralarge VS proof-of-concept.
Krivak & Hoksza, J. Cheminformatics 10:39 -- P2Rank.
Forli et al., Nat. Protoc. 11:905 -- AutoDockTools / meeko.

Related Skills

chemoinformatics/molecular-io - Parse ligands
chemoinformatics/conformer-generation - Generate 3D for ligand prep
chemoinformatics/molecular-standardization - Canonicalize before docking
chemoinformatics/pose-validation - PoseBusters physical-validity QC
chemoinformatics/docking-rescoring - DiffDock-L + GNINA hybrid
chemoinformatics/covalent-design - Covalent docking
chemoinformatics/free-energy-calculations - FEP for refined affinity
chemoinformatics/admet-prediction - Filter library before docking
structural-biology/structure-io - PDB / mmCIF handling
structural-biology/modern-structure-prediction - AlphaFold3 / Boltz-1 for apo receptors

skills/chemoinformatics/virtual-screening