skills/chemoinformatics/conformer-generation
Version Compatibility
Reference examples tested with: RDKit 2024.09+, xtb 6.7+, CREST 3.0+, OpenMM 8.1+ for follow-up MD.
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures - CLI:
xtb --version;crest --version
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Conformer Generation
Generate 3D conformer ensembles for molecules from 2D structures. The choice of method depends on molecule size, flexibility, and downstream use: ETKDG / ETKDGv3 (Riniker & Landrum 2015) is the modern default for drug-like molecules, MMFF94/UFF for fast energy minimization, CREST + GFN2-xTB for high-accuracy semi-empirical sampling of macrocycles and peptides. A single conformer is rarely sufficient: descriptor variance across the ensemble can exceed the descriptor signal, and docking pose accuracy degrades if the starting conformer is non-bioactive.
For docking pose validation, see chemoinformatics/pose-validation. For free-energy methods (which require ensemble sampling), see chemoinformatics/free-energy-calculations.
Conformer Method Taxonomy
| Method | Cost / mol | Quality | Use case | Fails when |
|---|---|---|---|---|
| ETKDGv3 + MMFF94 | <1s | Good for drug-like | Default; docking input; descriptors | Macrocycles, peptides, transition metals |
| ETKDGv3 + UFF | <1s | Lower-quality MMFF94 alternative | Fallback when MMFF94 fails to parameterize | Same as MMFF94 |
| Omega (OpenEye) | 1s | Industry-standard commercial | Commercial pipelines | License cost |
| Confab (Open Babel) | 5s | Systematic torsion search | Patent expiration | Quality limited |
| RDKit ETKDGv3 + macrocycle preferences | 10-60s | Drug-like macrocycles | Macrocyclic peptides | Still limited; CREST better |
| CREST + GFN2-xTB | minutes | High-accuracy semi-empirical | Macrocycles, peptides, conformer ensembles for QSAR | Computationally expensive; metal centers |
| CREST + GFN-FF | seconds | GFN2 quality at FF speed | Quick screening | Limited element coverage |
| GeoMol (Ganea 2021) | <0.1s GPU | ML-fast, ETKDGv3-quality | Large library 3D conformers | ML training distribution |
| TorsionNet (Gogineni 2020) | <0.1s GPU | ML-fast | Drug-like | ML training distribution |
| MD sampling (OpenMM) | hours | High-quality dynamic | Free energy, induced fit | Computational cost |
Decision: For drug-like molecules (<500 Da, <8 rotatable bonds), ETKDGv3 + MMFF94 with 20-100 conformers is the modern default. For macrocycles, peptides, or molecules with >12 rotatable bonds, CREST + GFN2-xTB captures the conformational diversity. For ML-scale (>1M molecules), GeoMol trades accuracy for speed.
Decision Tree by Scenario
| Scenario | Method | Conformer count | Energy window |
|---|---|---|---|
| Single docking pose (initial 3D) | ETKDGv3 + MMFF94 | 1 | n/a |
| Multi-conformer docking | ETKDGv3 + MMFF94 | 10-50 | 10 kcal/mol |
| 3D QSAR descriptor input | ETKDGv3 + MMFF94 | 50-200 | 5 kcal/mol |
| Pharmacophore search | ETKDGv3 + MMFF94 | 100-500 | 5 kcal/mol |
| Macrocycle / peptide | CREST + GFN2-xTB | 50-200 (auto from CREST) | 5-8 kcal/mol |
| FEP input | CREST + GFN2-xTB then MD relax | 1-3 representative | 3 kcal/mol |
| Bioactive conformer search | ETKDGv3 + MMFF94 then dock with rescore | 100-500 | 10 kcal/mol |
| Shape similarity / ROCS | ETKDGv3 + MMFF94 | 50-200 | 10 kcal/mol |
| Conformer-dependent descriptors | ETKDGv3 ensemble + Boltzmann avg | 20-100 | 5 kcal/mol |
ETKDGv3 (Modern Default)
ETKDGv3 (Riniker & Landrum 2015) incorporates experimental torsion preferences into distance geometry: starts from random embeddings, refines by satisfying experimentally-derived bond, angle, and torsion preferences.
Goal: Generate an ensemble of 3D conformers from a SMILES with the modern default embedding algorithm.
Approach: Add explicit hydrogens, configure ETKDGv3 params (random seed, max attempts, random coords), and embed multiple conformers via EmbedMultipleConfs.
from rdkit import Chem
from rdkit.Chem import AllChem
def gen_conformers(smiles, n_conf=20, seed=42):
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
params = AllChem.ETKDGv3()
params.randomSeed = seed
params.useRandomCoords = True
params.maxAttempts = 1000
ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_conf, params=params)
return mol, list(ids)
useRandomCoords=True improves convergence for macrocycles and heavily-rotated molecules. maxAttempts=1000 handles difficult embeddings.
Force-Field Optimization
After embedding, minimize each conformer to a local minimum.
Goal: Reduce strain in each embedded conformer to a stable local minimum and record the resulting energies.
Approach: Build MMFF94s force-field parameters, minimize each conformer in place, and collect energies; fall back to UFF when MMFF94 cannot parameterize the molecule.
def optimize_conformers(mol, conf_ids, force_field='mmff94'):
energies = []
if force_field == 'mmff94':
mmff_props = AllChem.MMFFGetMoleculeProperties(mol, mmffVariant='MMFF94s')
for cid in conf_ids:
ff = AllChem.MMFFGetMoleculeForceField(mol, mmff_props, confId=cid)
ff.Minimize()
energies.append(ff.CalcEnergy())
else: # UFF fallback
for cid in conf_ids:
ff = AllChem.UFFGetMoleculeForceField(mol, confId=cid)
ff.Minimize()
energies.append(ff.CalcEnergy())
return energies
" MMFF94 vs MMFF94s: MMFF94s is the "standard" set with simpler aromatic nitrogen handling; preferred for most drug-like.
UFF (Universal Force Field): Lower quality but handles any element including transition metals. Use as fallback when MMFF94 cannot parameterize (uncommon elements, charged species).
RMSD Pruning
Remove near-duplicate conformers within a chosen RMSD cutoff to keep the ensemble diverse:
import numpy as np
def prune_conformers_rmsd(mol, conf_ids, rmsd_cutoff=0.5):
n = len(conf_ids)
keep = []
for i, cid in enumerate(conf_ids):
is_unique = True
for kept_cid in keep:
rmsd = AllChem.GetBestRMS(mol, mol, cid, kept_cid)
if rmsd < rmsd_cutoff:
is_unique = False
break
if is_unique:
keep.append(cid)
return keep
Typical RMSD cutoff (Source / Rationale):
| Cutoff | Use case | Source |
|---|---|---|
| 0.5 Å | Drug-like ensemble for descriptors / docking | Empirical: below this conformers represent same minimum (Hawkins 2007) |
| 1.0 Å | Drug-like ensemble for pharmacophore | Standard ROCS / pharmacophore practice |
| 1.5-2.0 Å | Macrocycles / peptides | Higher conformational freedom; Tan 2018 macrocycle benchmarks |
| 2.0+ Å | Cluster-centroid representative ensembles | Coarse representative sampling |
Energy Window Filtering
Remove conformers above an energy cutoff (high-energy conformers are unlikely to be bioactive):
def filter_by_energy(mol, conf_ids, energies, window_kcal=10.0):
min_e = min(energies)
keep = []
for cid, e in zip(conf_ids, energies):
if e - min_e <= window_kcal:
keep.append(cid)
return keep
Window choice:
- 3 kcal/mol: very strict, only near-global-min conformers (FEP, MD setup)
- 5 kcal/mol: typical for 3D QSAR, pharmacophore
- 10 kcal/mol: typical for docking input (bioactive conformer may be higher)
- 25 kcal/mol: macrocycles, no filter (bioactive conformer can be high-energy when bound)
Macrocycle Handling
Macrocycles (>=12 atom rings) have distinct conformational issues: ETKDGv3 default knowledge base under-samples macrocycle torsions. Use macrocycle-specific torsion preferences:
from rdkit.Chem import AllChem
def macrocycle_conformers(smiles, n_conf=200, seed=42):
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
params = AllChem.ETKDGv3()
params.randomSeed = seed
params.useRandomCoords = True
params.useMacrocycleTorsions = True
params.useSmallRingTorsions = True
params.maxAttempts = 5000
ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_conf, params=params)
return mol, list(ids)
For pharmaceutical macrocycles (cyclosporine, paclitaxel, large peptides), CREST + GFN2-xTB is the gold standard.
CREST + GFN2-xTB for High-Quality Sampling
CREST (Grimme 2024) performs iterative meta-dynamics + GFN2-xTB optimization for conformer sampling.
Goal: Sample high-quality conformer ensembles for macrocycles, peptides, or molecules where ETKDGv3 + MMFF94 is inadequate.
Approach: Start from an RDKit-generated MMFF94-relaxed conformer, write to XYZ, and run CREST with GFN2-xTB driver to perform iterative meta-dynamics + reoptimization.
xtb mol.xyz --opt extreme
crest opt.xyz --gfn2 --T 12 -ewin 6
--gfn2: use GFN2-xTB (most accurate of GFN family for drug-like molecules).
--gfn-ff: use GFN-FF (faster, less accurate).
-ewin 6: 6 kcal/mol energy window above global min.
-T 12: use 12 CPU threads.
Output: crest_conformers.xyz with sampled ensemble.
Workflow: Start from RDKit ETKDGv3 + MMFF94 (cheap initial structure) -> save as XYZ -> CREST refinement.
from rdkit import Chem
from rdkit.Chem import AllChem
import subprocess
def crest_workflow(smiles, out_dir='crest_out'):
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())
AllChem.MMFFOptimizeMolecule(mol)
xyz = Chem.MolToXYZBlock(mol)
with open(f'{out_dir}/input.xyz', 'w') as f:
f.write(xyz)
subprocess.run(['crest', f'{out_dir}/input.xyz', '--gfn2', '-T', '12'],
cwd=out_dir, check=True)
return f'{out_dir}/crest_conformers.xyz'
Boltzmann Averaging of Properties
For ensemble descriptors (3D shape, dipole moment, polar surface area in 3D), Boltzmann-weight by energy:
import numpy as np
def boltzmann_weights(energies, T=300.0):
energies = np.array(energies)
kt = 0.001987 * T # kcal/mol at 300K
rel = energies - energies.min()
w = np.exp(-rel / kt)
return w / w.sum()
def boltzmann_average(values, energies, T=300.0):
w = boltzmann_weights(energies, T)
return float(np.sum(np.array(values) * w))
For Boltzmann averaging, energies should be MMFF94 or higher quality. UFF energies are unreliable for Boltzmann weighting.
ML-Based Conformer Generation (GeoMol, TorsionNet)
For very large libraries (>1M compounds), classical methods become bottlenecks. ML-based methods generate conformers in <0.1s/mol on GPU:
# Pseudo-code for GeoMol-style ML conformer generation
# (Requires pre-trained model + dependencies)
# from geomol import generate_conformers
# conformers = generate_conformers(smiles, n_conformers=10)
Trade-off: ML methods (GeoMol, TorsionNet) match ETKDGv3 quality on drug-like molecules but extrapolate poorly outside training distribution (macrocycles, organometallics).
Per-Tool Failure Modes
ETKDGv3 -- failed embedding
Trigger: Macrocycle, highly constrained polycyclic, or sterically crowded molecule.
Mechanism: Distance geometry cannot find consistent 3D structure within max attempts.
Symptom: EmbedMolecule returns -1; EmbedMultipleConfs returns empty list.
Fix: Set useRandomCoords=True, increase maxAttempts to 5000+; for macrocycles, set useMacrocycleTorsions=True. As fallback, use CREST.
MMFF94 -- parameter missing
Trigger: Molecule contains element not parameterized (transition metals, certain S+ species).
Mechanism: MMFF94 only covers H, C, N, O, F, Si, P, S, Cl, Br, I + select cations.
Symptom: MMFFGetMoleculeProperties returns None; optimization silently no-ops.
Fix: Fall back to UFF; or for metals, use GFN2-xTB.
Conformer ensemble too small
Trigger: n_conf=10 for a flexible molecule (>5 rotatable bonds).
Mechanism: 10 conformers insufficient to sample conformational space; many minima missed.
Symptom: RMSD distribution narrow; descriptor variance underestimated.
Fix: Use n_conf = max(10, 5 * NumRotatableBonds + 10) heuristic (Hawkins 2017).
Single-conformer 3D descriptor
Trigger: Calculating 3D descriptors from a single conformer.
Mechanism: 3D descriptor variance across conformers can be 50%+ of mean.
Symptom: Same molecule produces different 3D descriptors on rerun.
Fix: Always compute descriptor over ensemble; report mean ± std, or Boltzmann-weighted mean.
CREST -- timeout on flexible molecule
Trigger: Cyclosporin or large peptide.
Mechanism: CREST metadynamics scales poorly with rotational complexity.
Symptom: Hours of CPU time per molecule; incomplete sampling.
Fix: Use --gfn-ff for faster initial sampling; reduce metadynamics time --mdtime 5 or skip metadyn with --noopt.
GFN2-xTB conformer reordering
Trigger: Comparing conformer energies between GFN2-xTB and DFT.
Mechanism: GFN2-xTB is parameterized for energies; relative conformer ordering can differ from DFT by 1-2 kcal/mol.
Symptom: "Wrong" conformer reported as global minimum vs DFT reference.
Fix: For high-stakes work, re-rank top GFN2-xTB conformers with DFT single-points (e.g., r2SCAN-3c).
Reconciliation: ETKDGv3 vs CREST
| Use case | ETKDGv3 | CREST |
|---|---|---|
| Drug-like, <500 Da, <8 RotBonds | Sufficient | Overkill |
| 8-12 RotBonds | OK with n_conf>=100 | Better at expense of cost |
| Macrocycle, peptide, >12 RotBonds | Inadequate | Required |
| Boltzmann-weighted descriptors | OK but energies less accurate | Better |
| FEP input | Possible | Preferred (after MMFF cleanup) |
For ETKDGv3 ensembles, run CREST on a subset for benchmarking; if RMSD < 1A across methods, ETKDGv3 is adequate.
Common Errors
| Symptom | Cause | Fix |
|---|---|---|
EmbedMolecule returns -1 | Embed failed | Set useRandomCoords=True; raise maxAttempts |
| MMFFOptimize no-op | MMFF parameters missing | Use UFF fallback |
| All conformers identical | Stiff molecule | OK; molecule is rigid |
| Conformers physically wrong | Stereochemistry lost | Re-add explicit stereo before embedding |
| 3D descriptors differ per run | Random seed not set | params.randomSeed = 42 |
| CREST out-of-memory | Too many conformers in search | Reduce --T threads; raise --ewin window |
| Macrocycle ring inverted | Default torsion preferences wrong | Set useMacrocycleTorsions=True |
| AddHs not called | Implicit H not embedded | mol = Chem.AddHs(mol) before EmbedMolecule |
References
- Hawkins et al., J. Chem. Inf. Model. 50:572 -- OMEGA conformer sampling.
- Riniker & Landrum, J. Chem. Inf. Model. 55:2562-2574 -- ETKDG / ETKDGv3.
- Halgren, J. Comput. Chem. 17:490 -- MMFF94 force field.
- Rappe et al., J. Am. Chem. Soc. 114:10024 -- UFF.
- Pracht, Bohle, Grimme, J. Chem. Phys. 160:114110 -- CREST 3.0.
- Bannwarth et al., J. Chem. Theory Comput. 15:1652 -- GFN2-xTB.
- Ganea et al., NeurIPS -- GeoMol ML conformer generation.
- Hawkins, J. Chem. Inf. Model. 57:1747 -- conformer count heuristics.
Related Skills
- chemoinformatics/molecular-io - Parse molecules
- chemoinformatics/molecular-standardization - Standardize before embedding
- chemoinformatics/molecular-descriptors - 3D descriptors from ensembles
- chemoinformatics/shape-similarity - Multi-conformer 3D shape matching
- chemoinformatics/virtual-screening - Generate 3D ligands for docking
- chemoinformatics/free-energy-calculations - Sample conformers for MD setup
- chemoinformatics/pharmacophore-modeling - 3D pharmacophore from ensembles
