skills/lab-automation/protocolsio-protocol-repository
protocols.io Protocol Repository
Overview
protocols.io is the world's largest open repository of experimental protocols with 90,000+ submissions across molecular biology, biochemistry, cell biology, bioinformatics, clinical research, lab automation, and microfluidics. The REST API enables programmatic access to search the database, retrieve complete protocol details (steps, reagents, materials, timing), browse version history, and discover community-adapted protocols. Public protocols are freely accessible without authentication; OAuth2 tokens are needed for private protocols or creating new protocols. API responses return structured data that can be integrated directly into automation scripts, ELN entries, or protocol editors.
Author: Pradyumna Jayaram.
When to Use
- Finding peer-reviewed, field-tested protocols by technique, reagent, or paper DOI
- Extracting detailed reagent lists, timing steps, and equipment lists for protocol implementation
- Building lab automation workflows by adapting community-validated methods
- Getting real-world tips and error notes that are not in vendor manuals
- Citing the specific version DOI in methods sections for reproducibility
- Discovering open alternatives to commercial kits or proprietary methods
- Using alongside
opentrons-protocol-apito convert published protocols into Python scripts - For writing and publishing your own protocols, use the protocols.io web interface
Prerequisites
- Python packages:
requests,pandas - Data requirements: protocol keywords, DOIs, or protocol IDs for retrieval
- Environment: Internet connection
- Rate limits: 10 requests/second for public API; unauthenticated requests allowed for public protocols
- Authentication: None needed for public protocols; OAuth2 token for private content (optional)
pip install requests pandas
Quick Start
import requests
import pandas as pd
"
BASE = "https://www.protocols.io/api/v4"
HEADERS = {"Authorization": "YOUR_TOKEN_HERE"} # Optional for public
# Search CRISPR protocols with filters
r = requests.get(
f"{BASE}/protocols",
params={"q": "CRISPR guide RNA design", "order_field": "views", "page_size": 10},
headers=HEADERS,
)
r.raise_for_status()
data = r.json()
print(f"Total protocols: {data['pagination']['total_results']}")
for p in data["items"][:3]:
print(f" {p['title']} (DOI: {p.get('doi')} | Views: {p.get('stats', {}).get('number_of_views', 0)})")
Core API
Module 1: Protocol Search and Discovery
Search the public protocol library by keyword, technique, or full text. Use filters to narrow by category, author, or performance metrics.
def search_protocols(query, page_size=20, order_field="relevance", category_id=None):
params = {"q": query, "page_size": page_size, "order_field": order_field}
if category_id:
params["filter[categories_ids][]"] = category_id
r = requests.get(f"{BASE}/protocols", params=params)
r.raise_for_status()
return r.json()
# Search by keyword
data = search_protocols("RNA extraction tissue", order_field="views", page_size=15)
total = data["pagination"]["total_results"]
print(f"RNA extraction protocols found: {total}")
# Extract structured info
rows = []
for p in data["items"][:10]:
stats = p.get("stats", {})
rows.append({
"id": p.get("id"),
"title": p.get("title"),
"doi": p.get("doi"),
"views": stats.get("number_of_views", 0),
"forks": stats.get("number_of_forks", 0),
"steps": p.get("number_of_steps", 0),
"created": p.get("created_on")[:10],
"category": p.get("categories", [{}])[0].get("name", "n/a"),
"authors": [a["name"] for a in p.get("creators", [])[:2]],
})
df = pd.DataFrame(rows).sort_values("views", ascending=False)
print("\nTop protocols:")
print(df[["title", "views", "forks", "steps"]].head(5).to_string(index=False))
# Search with category filter
category_id = 22 # Molecular Biology category
data = search_protocols("qPCR primer design", category_id=category_id, order_field="views")
for p in data["items"][:3]:
print(f" {p['title'][:60]} (DOI: {p.get('doi', 'n/a')})")
Module 2: Retrieve Full Protocol Details
Fetch the complete protocol with structured data for steps, materials, and equipment.
def get_protocol(protocol_id):
r = requests.get(f"{BASE}/protocols/{protocol_id}")
r.raise_for_status()
return r.json()
# Retrieve protocol by ID (from search results or DOI)
protocol_id = 45979 # Example: a publicly published protocol
data = get_protocol(protocol_id)
protocol = data.get("payload", data) # Handle response structure
print(f"Title: {protocol.get('title')}")
print(f"DOI: {protocol.get('doi')}")
print(f"Authors: {', '.join(a['name'] for a in protocol.get('creators', []))}")
print(f"Steps: {len(protocol.get('steps', []))}")
print(f"Materials: {len(protocol.get('materials', []))}")
print(f"Abstract: {protocol.get('description', '')[:200]}...")
# Extract structured steps
steps_df = []
for i, step in enumerate(protocol.get("steps", []), 1):
duration = step.get("duration", {})
steps_df.append({
"step_number": i,
"description": step.get("description", ""),
"duration_value": duration.get("duration"),
"duration_unit": duration.get("unit_label", ""),
"temperature": step.get("temperature", {}).get("value"),
"temp_unit": step.get("temperature", {}).get("unit_label", ""),
"equipment": [eq.get("name") for eq in step.get("equipment", [])],
})
steps_df = pd.DataFrame(steps_df)
print(f"\nFirst 5 steps:")
print(steps_df[["step_number", "description", "duration_value", "duration_unit"]].head().to_string(index=False))
# Extract materials and equipment
materials = protocol.get("materials", [])
equipment_list = []
for material in materials[:10]:
equipment_list.append({
"name": material.get("name"),
"quantity": material.get("quantity"),
"unit": material.get("unit", {}).get("name", ""),
"supplier": material.get("supplier", {}).get("name", ""),
"catalog": material.get("sku"),
})
print(f"\nSample materials:")
pd.DataFrame(equipment_list[:5])
Module 3: Retrieve Protocol by DOI
Fetch a protocol using its exact DOI for precise citation-based retrieval.
def get_protocol_by_doi(doi):
r = requests.get(f"{BASE}/protocols",
params={"q": doi, "page_size": 5})
r.raise_for_status()
items = r.json()["items"]
for item in items:
if item.get("doi") == doi:
return item
return None
doi = "10.17504/protocols.io.bvb3n2qn" # protocols.io DOI format
protocol = get_protocol_by_doi(doi)
if protocol:
print(f"Found: {protocol['title']}")
print(f" ID: {protocol['id']}")
print(f" Version: {protocol.get('version_id')}")
Module 4: Browse Protocol Categories
Find available categories for targeted searches, useful for building filter interfaces.
def get_categories():
r = requests.get(f"{BASE}/categories")
r.raise_for_status()
return r.json().get("items", [])
cats = get_categories()
print("Available protocol categories:")
for cat in cats[:10]:
print(f" {cat['id']}: {cat['name']} (protocols: {cat.get('protocol_count', 0)})")
Module 5: Protocol Versioning and History
Track updates and cite the correct version in publications.
def get_protocol_versions(protocol_id):
r = requests.get(f"{BASE}/protocols/{protocol_id}")
r.raise_for_status()
protocol = r.json().get("payload", r.json())
return {
"title": protocol.get("title"),
"version": protocol.get("version_id"),
"published": protocol.get("published_on"),
"doi": protocol.get("doi"),
"parent_doi": protocol.get("parent_publication", {}).get("doi"),
"current_version": protocol.get("is_current_version", False),
}
versions = get_protocol_versions
print("\nVersion info:")
for k, v in versions.items():
print(f" {k}: {v}")
Key Concepts
Protocol DOIs and Versioning
- Each published protocol has a citable DOI:
10.17504/protocols.io.XXXXX - When a protocol is updated, it creates a new version with a new DOI
- The original DOI remains valid and points to the first version
- Always cite the specific version DOI for reproducibility
API Authentication and Rate Limits
- Public protocols: Accessible without authentication, but tokens get higher rate limits (20 requests/second vs 10)
- Private protocols: Require an OAuth2 token from
https://www.protocols.io/developers - Rate limiting: 10 requests/second for public; include
time.sleep(0.15)between requests to avoid hitting limits
Protocol Structure
A protocol contains:
- Steps: Ordered list of operations with timing, temperature, equipment, and safety notes
- Materials: Reagents, consumables, equipment with suppliers and catalog numbers
- Equipment: Hardware requirements per step
- Metadata: Authors, tags, publication status, community metrics
Common Workflows
Workflow 1: Protocol Discovery and Quality Scoring
Find protocols with the most community validation (high forks = widely adapted).
def search_and_rank(query, top_n=20):
"""Search and rank protocols by views + forks (community adaptation)."""
r = requests.get(f"{BASE}/protocols",
params={"q": query, "page_size": top_n, "order_field": "views"})
r.raise_for_status()
data = r.json()
rows = []
for p in data["items"]:
stats = p.get("stats", {})
rows.append({
"id": p.get("id"),
"title": p.get("title"),
"doi": p.get("doi"),
"views": stats.get("number_of_views", 0),
"forks": stats.get("number_of_forks", 0),
"steps": p.get("number_of_steps"),
"created": p.get("created_on")[:10] if p.get("created_on") else "n/a",
"category": p.get("categories", [{}])[0]. for c in p.get("creators", [])[:3],
})
df = pd.DataFrame(rows)
df["popularity_score"] = df["views"] * 0.7 + df["forks"] * 0.3
return df.sort_values("popularity_score", ascending=False)
# Find high-quality western blot protocols
df = search_and_rank("western blot protein detection", top_n=15)
df.to_csv("western_blot_protocols.csv", index=False)
print("\nTop western blot protocols:")
print(df[["title", "views", "forks", "steps"]].head(5).to_string(index=False))
Workflow 2: Extract Protocol Data for Automation
Parse steps, reagents, and timing to build scripts or ELN entries.
def extract_protocol_for_protocol_id(protocol_id):
"""Extract structured data for protocol automation."""
r = requests.get(f"{BASE}/protocols/{protocol_id}")
r.raise_for_status()
protocol = r.json().get("payload", r.json())
# Steps with timing
steps = [{
"step_number": i,
"description": step.get("description", ""),
"duration_value": step.get("duration", {}).get("duration"),
"duration_unit": step.get("duration", {}).get("unit_label", ""),
"temperature": step.get("temperature", {}).get("value"),
"equipment": [eq.get("name") for eq in step.get("equipment", [])],
} for i, step in enumerate(protocol.get("steps", []), 1)]
# Materials with suppliers
materials = [{
"name": m.get("name"),
"quantity": m.get("quantity"),
"unit": m.get("unit", {}).get("name", ""),
"supplier": m.get("supplier", {}).get("name", ""),
"catalog": m.get("sku"),
} for m in protocol.get("materials", [])]
return {
"title": protocol.get("title"),
"doi": protocol.get("doi"),
"steps": pd.DataFrame(steps),
"materials": pd.DataFrame(materials),
}
# Extract a protocol for automation
result = extract_protocol_for_protocol_id
print(f"\nProtocol: {result['title']} (DOi: {result['doi']})")
print(f"\nSteps summary:")
print(result["steps"].head())
print(f"\nMaterials summary:")
print(result["materials"].head())
# Export for integration
result["steps"].to_csv("protocol_steps.csv", index=False)
result["materials"].to_csv("protocol_materials.csv", index=False)
print("\nExported: protocol_steps.csv, protocol_materials.csv")
Workflow 3: Reorder Protocol Steps by Equipment
Group steps by the equipment used, useful for batch operations.
def group_steps_by_equipment(protocol_id):
"""Group steps by required equipment for workflow planning."""
r = requests.get(f"{BASE}/protocols/{protocol_id}")
r.raise_for_status()
protocol = r.json().get("payload", r.json())
equipment_groups = {}
for i, step in enumerate(protocol.get("steps", []), 1):
equipment = [eq.get("name") for eq in step.get("equipment", [])]
if not equipment:
equipment = ["manual"]
for eq_name in equipment:
if eq_name not in equipment_groups:
equipment_groups[eq_name] = []
equipment_groups[eq_name].append({
"step": i,
"desc": step.get("description", "")[:80],
"duration": step.get("duration", {}),
})
return equipment_groups
# Group a protocol by equipment
groups = group_steps_by_equipment
print("\nSteps grouped by equipment:")
for eq, steps in groups.items():
print(f"\nEquipment: {eq}")
for step in steps[:3]:
dur = step["duration"]
dur_str = f" {dur.get('duration')}{dur.get('unit_label', '')}" if dur else ""
print(f" Step {step['step']}: {step['desc']}{dur_str}")
Key Parameters
| Parameter | Function | Default | Range / Options | Effect |
|---|---|---|---|---|
q | Search | — | string | Full-text search query |
order_field | Search | "relevance" | "relevance", "views", "date", "activity" | Result sort order |
page_size | Search | 10 | 1–50 | Items per page |
filter[categories_ids][] | Search | — | category ID | Filter by category |
protocol_id | Get protocol | required | integer | Specific protocol to fetch |
DOI | Search by DOI | — | string | Exact DOI match for retrieval |
Best Practices
-
Sort by views first for quality protocols:
order_field=viewsgives protocols that have been tested by many users, increasing the chance of catching hidden pitfalls early. -
Always cite the specific version DOI: When including a protocol in methods, cite
doi:10.17504/protocols.io.XXXXXwith the version number. protocols.io DOIs are versioned; the DOI in thepayloadis always the latest. -
Cross-check vendor info: The materials list often has supplier and catalog numbers — use this for reordering, but double-check that it's current with the vendor before placing orders.
-
Extract safety notes: Some steps include safety warnings or special precautions that aren't in the original vendor manual. Import these into your automated protocol.
-
Cache high-quality protocols: After finding a good protocol, store the steps and materials locally in case the protocol is removed or changes.
-
Use the
forksmetric for adaptation: A protocol with many forks has been adapted by many labs — check the forks for common workarounds. -
Check license before use: All public protocols.io protocols are CC-BY 4.0 by default. Check the license field for commercial use restrictions.
Common Recipes
Recipe: Find and rank reagent-specific protocols
# Find protocols using a specific kit
r = requests.get(f"{BASE}/protocols",
params={"q": "RNeasy Mini Kit RNA extraction", "order_field": "views"},
headers=HEADERS)
data = r.json()
print(f"Protocols using RNeasy: {data['pagination']['total_results']}")
for p in data["items"][:3]:
print(f" {p['title'][:60]} (DOI: {p.get('doi', 'n/a')})")
Recipe: Generate citation strings for methods
def generate_citation(doi):
"""Generate a formatted citation string."""
protocol = get_protocol_by_doi(doi)
if not protocol:
return "Not found"
authors = "; ".join(a["name"] for a in protocol.get("creators", [])[:3])
year = protocol.get("created_on", "")[:4] or "n.d."
title = protocol.get("title") or "(untitled)"
print(f"{authors} ({year}). {title}. protocols.io. https://doi.org/{protocol.get('doi')}")
return f"{authors} ({year}). {title}. protocols.io. https://doi.org/{protocol.get('doi')}"
# Generate citation for a protocol
citation = generate_citation("10.17504/protocols.io.bvb3n2qn")
Recipe: Build an equipment list for a lab
def extract_equipment_from_category(category_id, sample_size=50):
"""Get unique equipment from a category of protocols."""
r = requests.get(f"{BASE}/categories")
cats = r.json().get("items", [])
for cat in cats:
if cat["id"] == category_id:
category_name = cat["name"]
break
else:
return "Category not found"
equipment = set()
page = 1
total_gathered = 0
while total_gathered < sample_size:
r = requests.get(f"{BASE}/protocols",
params={"filter[categories_ids][]": category_id,
"page_size": 50, "page_id": page})
data = r.json()
if not data["items"]:
break
for p in data["items"][:sample_size - total_gathered]:
for step in p.get("steps", []):
for eq in step.get("equipment", []):
equipment.add(eq.get("name", "Unknown"))
total_gathered += 1
page += 1
print(f"Equipment from {category_name} protocols:")
for eq in sorted(equipment):
print(f" - {eq}")
return sorted(equipment)
# Get equipment from a category
equipment = extract_equipment_from_category(22, 20) # Molecular Biology
Expected Outputs
- CSV/Excel files with protocol metadata, steps, and materials
- DOI strings for citation in publications
- Structured JSON for protocol automation
- Equipment lists and reagent lists for ordering
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
401 error accessing a protocol | Private protocol without auth | Get an OAuth2 token from https://www.protocols.io/developers |
| Search returns empty results | Query too specific or no match | Broaden the query; try fewer keywords or related terms |
| Protocol steps are empty or HTML-formatted | Protocol uses rich text | Use BeautifulSoup or re.sub(r'<[^>]+>', '', text) to strip HTML |
materials list is empty | Materials embedded in step text | Check steps for reagent names and quantities listed inline |
| DOI lookup returns wrong protocol | Similar title match | Compare DOI field exactly using if item.get("doi") == expected_doi |
Rate limit errors (429 Too Many Requests) | Exceeding 10 requests/second | Add time.sleep(0.15) between requests or get a token for 20/second |
References
- protocols.io API documentation — official REST API reference
- protocols.io platform — browse and submit protocols
- protocols.io citation guide — how to cite in publications
- Teytelman et al. Nature Methods 13, 401 — platform description
- CC-BY license — terms of use
Related Skills
opentrons-ot2-protocols— convert downloaded protocols into automated Python scriptseln-elabftw,eln-chemotion,eln-openbis— store protocols and run details in an ELNpylabrobot-vendor-agnostic— run protocols across Hamilton, Tecan, Opentrons robotswestern-blot-quantification— follow published protocols for WB analysis
