PyOpenMS
Overview
PyOpenMS provides Python bindings to the OpenMS library for computational mass spectrometry, enabling analysis of proteomics and metabolomics data. Use it to read/write MS file formats, process raw spectra, detect and quantify features, identify peptides and proteins, and run end-to-end LC-MS/MS pipelines.
This skill ships ready-to-run scripts in scripts/ covering the most common
high-level workflows. Prefer running a script over writing new code: each is a
parameterized CLI tool that handles loading, processing, and export. Drop into the
Python API (and the references/ files) only when no script fits.
Installation
uv pip install pyopenms
Verify the install. `__version__` works; on import the bundled binary also prints a
harmless one-line memory-status notice:
import pyopenms as ms
print(ms.__version__) # 3.5.0
Scripts (start here)
Run with python scripts/<name>.py --help for full options. All accept standard
MS file formats and write featureXML/consensusXML/CSV/mzTab/PNG as appropriate.
Inspect & convert
| Script | What it does |
|---|---|
| `inspect_ms_data.py` | Summarize any mzML/mzXML/featureXML/consensusXML/idXML (counts, RT/m/z ranges, TIC, metadata); optional per-spectrum CSV. |
| `convert_format.py` | Convert between mzML/mzXML/MGF with optional MS-level, RT, and intensity filtering. |
| `process_spectra.py` | Configurable signal-processing chain: smoothing (Gauss/SGolay), centroiding (PeakPickerHiRes), normalization, S/N and intensity thresholds. |
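To make the smoothing and centroiding steps concrete, here is a minimal pure-Python sketch of the idea: Gaussian smoothing followed by local-maximum peak picking with an intensity threshold. It is not the pyOpenMS implementation (PeakPickerHiRes, for instance, fits peaks rather than taking raw maxima), and the function names `gaussian_smooth` and `pick_peaks` are hypothetical.

```python
import math

def gaussian_smooth(intensities, sigma=1.0, radius=2):
    """Convolve with a normalized Gaussian kernel; points past the edges count as 0."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(intensities)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            if 0 <= i + k < n:
                acc += w * intensities[i + k]
        smoothed.append(acc)
    return smoothed

def pick_peaks(mzs, intensities, min_intensity=0.0):
    """Return (mz, intensity) at local maxima at or above min_intensity."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_intensity and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((mzs[i], y))
    return peaks

# Toy profile spectrum with two peaks (made-up values for illustration)
mzs = [100.00 + 0.01 * i for i in range(12)]
raw = [0, 1, 5, 10, 5, 1, 0, 0, 2, 6, 2, 0]
smoothed = gaussian_smooth(raw, sigma=1.0, radius=2)
centroids = pick_peaks(mzs, smoothed, min_intensity=2.0)
print(centroids)
```

For real data, prefer `process_spectra.py`; the sketch only shows why smoothing precedes centroiding and where an intensity threshold enters.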
Feature detection & quantification
| Script | What it does |
|---|---|
| `detect_features_metabo.py` | Untargeted metabolomics feature finding: MassTraceDetection → ElutionPeakDetection → FeatureFindingMetabo. |
| `detect_features_centroided.py` | Peptide/centroided feature detection via FeatureFinderAlgorithmPicked. |
| `align_link_quantify.py` | Multi-sample pipeline: detect (or load) features → RT alignment → consensus linking → quant matrix CSV. |
| `consensus_to_matrix.py` | consensusXML → wide intensity matrix + metadata, with optional median/quantile normalization and long format. |
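The exact normalization used by `consensus_to_matrix.py` is not spelled out here. One common definition of median normalization scales each sample so that its median intensity equals the median of all sample medians. The sketch below assumes that definition; `median_normalize` and the toy matrix are hypothetical.

```python
from statistics import median

def median_normalize(columns):
    """columns: {sample_name: [intensity or None, ...]} -> normalized copy.

    Missing values (None) are ignored when computing medians and preserved
    in the output. A column with no positive values raises StatisticsError.
    """
    medians = {
        name: median(v for v in values if v is not None and v > 0)
        for name, values in columns.items()
    }
    target = median(medians.values())  # common level all samples are scaled to
    return {
        name: [None if v is None else v * target / medians[name] for v in values]
        for name, values in columns.items()
    }

# Toy quant matrix: three samples, three consensus features
quant = {
    "s1": [100.0, 200.0, 300.0],
    "s2": [200.0, 400.0, 600.0],
    "s3": [50.0, None, 150.0],
}
normalized = median_normalize(quant)
print(normalized)
```

After scaling, every sample has the same median, so systematic loading differences between runs no longer dominate the comparison.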
Annotation
| Script | What it does |
|---|---|
| `detect_adducts.py` | Group adducts/charge variants of the same neutral mass (MetaboliteFeatureDeconvolution). |
| `accurate_mass_search.py` | Annotate features against HMDB by accurate mass (AccurateMassSearchEngine → mzTab/CSV). |
| `export_gnps_sirius.py` | Export GNPS FBMN inputs (MGF + quant table) or a SIRIUS .ms file. |
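Conceptually, accurate-mass annotation converts an observed m/z to a neutral mass for an assumed adduct and keeps database entries within a ppm tolerance. The sketch below illustrates that with protonation only. It does not use AccurateMassSearchEngine, and `toy_db`, `neutral_mass_from_mz`, and `match_by_mass` are hypothetical names.

```python
PROTON = 1.007276466  # proton mass in Da

def neutral_mass_from_mz(mz, charge=1):
    """Neutral monoisotopic mass, assuming a protonated ion [M+zH]z+."""
    return mz * charge - charge * PROTON

def match_by_mass(mz, database, charge=1, tol_ppm=5.0):
    """Return [(name, ppm_error)] for entries within tol_ppm, best match first."""
    mass = neutral_mass_from_mz(mz, charge)
    hits = []
    for name, ref_mass in database.items():
        ppm = (mass - ref_mass) / ref_mass * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((name, ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))

# Tiny illustrative database of monoisotopic neutral masses (Da)
toy_db = {"glucose": 180.06339, "caffeine": 194.08038, "tryptophan": 204.08988}
hits = match_by_mass(195.08765, toy_db)  # m/z of protonated caffeine
print(hits)
```

Real searches consider many adducts per ionization mode, which is why grouping adducts first (`detect_adducts.py`) reduces redundant annotations.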
Identification
| Script | What it does |
|---|---|
| `process_identifications.py` | Re-index against FASTA, estimate FDR/q-values, filter (FDR/length/best-per-spectrum), export idXML + CSV. |
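As background for the FDR step, the following is a minimal sketch of target-decoy q-value estimation: rank hits by score, estimate FDR as decoys/targets at each rank, then take the running minimum from the worst rank upward. The exact formula OpenMS applies may differ; `q_values` and the toy scores are hypothetical.

```python
def q_values(psms):
    """psms: list of (score, is_decoy), higher score = better.

    Returns [(score, is_decoy, q_value)] sorted best first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or at any worse rank
    q = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]

# Toy peptide-spectrum matches: (score, is_decoy)
psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
results = q_values(psms)
accepted = [score for score, is_decoy, qv in results if not is_decoy and qv <= 0.25]
print(results)
print(accepted)
```

Filtering on q-values rather than raw FDR keeps the accepted set monotone in score: a better-scoring hit is never rejected while a worse one passes.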
Chemistry
| Script | What it does |
|---|---|
| `mass_calculator.py` | Monoisotopic/average mass, charged m/z, formula, and isotope pattern for peptides or empirical formulas. |
| `digest_protein.py` | In-silico protease digestion of FASTA/sequence → theoretical peptides with masses and m/z. |
| `theoretical_spectrum.py` | Generate annotated theoretical fragment spectra (b/y/a/c/x/z, losses) for a peptide. |
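The arithmetic behind digestion and m/z calculation is simple enough to sketch without the library. The example assumes the usual trypsin rule (cleave after K or R unless followed by P), unmodified residues, and protonated ions. `tryptic_digest`, `mono_mass`, and `mz` are hypothetical helpers, not the scripts' internals.

```python
WATER = 18.010565     # H2O, monoisotopic, Da
PROTON = 1.007276466  # proton mass, Da
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K/R unless the next residue is P; allow missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides

def mono_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def mz(peptide, charge):
    """m/z of the [M+zH]z+ ion."""
    return (mono_mass(peptide) + charge * PROTON) / charge

peptides = tryptic_digest("AAKPEPRGGKTT", missed_cleavages=1)
for pep in peptides:
    print(pep, round(mono_mass(pep), 4), round(mz(pep, 2), 4))
```

Note that the K at position 3 is not cleaved because proline follows it. Modified residues such as `M(Oxidation)` are out of scope for this sketch; use `mass_calculator.py` for those.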
Targeted & visualization
| Script | What it does |
|---|---|
| `extract_chromatograms.py` | Build TIC/BPC and XIC traces for target m/z (CSV + optional plot). |
| `plot_ms_data.py` | Quick plots: single spectrum, TIC, 2D feature map, MS1 signal map. |
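For readers unfamiliar with the three trace types, here is a small sketch of how TIC, BPC, and XIC are derived from centroided spectra. It assumes each spectrum is a plain `(rt, [(mz, intensity), ...])` tuple rather than an `MSSpectrum`, and the function names are hypothetical.

```python
def tic(spectra):
    """Total ion chromatogram: summed intensity per spectrum."""
    return [(rt, sum(inten for _, inten in peaks)) for rt, peaks in spectra]

def bpc(spectra):
    """Base peak chromatogram: most intense peak per spectrum."""
    return [(rt, max((inten for _, inten in peaks), default=0.0))
            for rt, peaks in spectra]

def xic(spectra, target_mz, tol_ppm=10.0):
    """Extracted ion chromatogram: summed intensity within +/- tol_ppm of target_mz."""
    half_width = target_mz * tol_ppm * 1e-6
    lo, hi = target_mz - half_width, target_mz + half_width
    return [(rt, sum(inten for m, inten in peaks if lo <= m <= hi))
            for rt, peaks in spectra]

# Toy MS1 scans: (retention time in s, [(m/z, intensity), ...])
spectra = [
    (1.0, [(100.0, 10.0), (500.001, 50.0)]),
    (2.0, [(500.0, 80.0), (500.02, 5.0)]),
    (3.0, [(250.0, 7.0)]),
]
print(tic(spectra))
print(bpc(spectra))
print(xic(spectra, 500.0, tol_ppm=10.0))
```

At m/z 500 a 10 ppm window is only ±0.005, so the peak at 500.02 is excluded from the XIC even though it contributes to the TIC.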
Common script recipes
# Inspect a file
python scripts/inspect_ms_data.py sample.mzML --spectra-csv spectra.csv
# Untargeted metabolomics: features for one sample
python scripts/detect_features_metabo.py sample.mzML --out-csv features.csv
# Full multi-sample quantification study
python scripts/align_link_quantify.py s1.mzML s2.mzML s3.mzML --out-prefix study
python scripts/consensus_to_matrix.py study.consensusXML --out quant.csv --normalize median
# Peptide chemistry
python scripts/mass_calculator.py --peptide "PEPTIDEM(Oxidation)K" --charges 1 2 3 --isotopes 5
python scripts/digest_protein.py proteins.fasta --enzyme Trypsin --missed 2 --out peptides.csv
# Identification post-processing
python scripts/process_identifications.py search.idXML --fasta db.fasta --fdr 0.01 --out filtered.idXML --csv hits.csv
Key 3.5.0 API notes
These changed from older OpenMS releases, so older tutorials and code will break:
- Feature finding: `FeatureFinder("centroided")` was removed. Use `FeatureFinderAlgorithmPicked` (proteomics/centroided) or the `MassTraceDetection → ElutionPeakDetection → FeatureFindingMetabo` pipeline (metabolomics). See `detect_features_*.py`.
- idXML I/O: `IdXMLFile().load/store` require a `ms.PeptideIdentificationList()` for peptide IDs (a plain Python `list` raises "can not handle type"). Protein IDs remain a plain list.
- Adduct decharging: the class is `MetaboliteFeatureDeconvolution`, and adducts use `Elements:Charge:Probability` syntax (e.g. `H:+:0.4`, `H-2O-1:0:0.05`), not bracket notation like `[M+H]+`.
- DataFrame columns: `FeatureMap.get_df()` uses lowercase `rt`/`mz` (not `RT`). `ConsensusMap` provides `get_intensity_df()` and `get_metadata_df()`.
- Bundled data caveat: the pip wheel ships `HMDBMappingFile.tsv` but not `HMDB2StructMapping.tsv`; `accurate_mass_search.py` detects this and explains how to supply it.
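Because the adduct syntax trips up people used to bracket notation, a small validator can catch mistakes before they reach the algorithm. The sketch below parses the `Elements:Charge:Probability` form from the note above. It assumes that multiple charges are written as repeated signs (for example `++`) and ignores any fields after the third; `parse_adduct` is a hypothetical helper, not part of pyOpenMS.

```python
def parse_adduct(spec):
    """Parse 'Elements:Charge:Probability' into (elements, charge, probability).

    Assumption: charge is '0' for neutral, or a run of '+' / '-' signs whose
    length is the charge magnitude. Raises ValueError for anything else,
    including bracket notation such as '[M+H]+'.
    """
    parts = spec.split(":")
    if len(parts) < 3:
        raise ValueError(f"expected Elements:Charge:Probability, got {spec!r}")
    elements, charge_str, prob_str = parts[:3]
    if charge_str == "0":
        charge = 0
    elif charge_str and set(charge_str) == {"+"}:
        charge = len(charge_str)
    elif charge_str and set(charge_str) == {"-"}:
        charge = -len(charge_str)
    else:
        raise ValueError(f"bad charge field {charge_str!r} in {spec!r}")
    probability = float(prob_str)
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range in {spec!r}")
    return elements, charge, probability

for spec in ["H:+:0.4", "H-2O-1:0:0.05"]:
    print(spec, "->", parse_adduct(spec))
```

Running candidate adduct strings through a check like this gives a clear error message up front instead of a failure inside the deconvolution step.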
Core data structures
- MSExperiment – collection of spectra and chromatograms
- MSSpectrum / MSChromatogram – a single spectrum / chromatographic trace
- Feature / FeatureMap – a detected LC-MS peak / collection of features
- ConsensusMap – features linked across samples (the quant table)
- PeptideIdentification / ProteinIdentification – search results
- AASequence / EmpiricalFormula – sequence and formula chemistry
For details: see references/data_structures.md.
Parameter management
Most algorithms expose an OpenMS Param object:
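import pyopenms as ms
algo = ms.FeatureFindingMetabo()
p = algo.getDefaults()
# List every parameter with its current value and description
for key in p.keys():
    print(key.decode(), "=", p.getValue(key), "|", p.getDescription(key))
# Override a default, then hand the modified Param back to the algorithm
p.setValue("charge_lower_bound", 1)
algo.setParameters(p)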
Export to pandas
fm = ms.FeatureMap()
ms.FeatureXMLFile().load("features.featureXML", fm)
df = fm.get_df()  # columns include lowercase rt, mz, intensity, charge, quality
cm = ms.ConsensusMap()
ms.ConsensusXMLFile().load("study.consensusXML", cm)
intensities = cm.get_intensity_df()  # features x samples
metadata = cm.get_metadata_df()  # rt, mz, charge, quality, ...
Integration with other tools
Pandas (DataFrames), NumPy (peak arrays), scikit-learn (ML), Matplotlib/Seaborn (plots), and downstream tools via export: GNPS (FBMN), SIRIUS, and mzTab.
Resources
- Official docs (3.5.0): https://pyopenms.readthedocs.io/en/release-3.5.0/
- OpenMS: https://www.openms.org
- GitHub: https://github.com/OpenMS/OpenMS
References
- `references/file_io.md` – file format handling
- `references/signal_processing.md` – signal processing algorithms
- `references/feature_detection.md` – feature detection and linking
- `references/identification.md` – peptide and protein identification
- `references/metabolomics.md` – metabolomics-specific workflows
- `references/data_structures.md` – core objects and data structures
