Hypogenic
Overview
Hypogenic provides automated hypothesis generation and testing using large language models to accelerate scientific discovery. The framework supports three approaches: HypoGeniC (data-driven hypothesis generation), HypoRefine (synergistic literature and data integration), and Union methods (mechanistic combination of literature and data-driven hypotheses).
Quick Start
Get started with Hypogenic in minutes:
# Install the package
uv pip install hypogenic
# Clone example datasets
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Run basic hypothesis generation
hypogenic_generation --config ./data/your_task/config.yaml --method hypogenic --num_hypotheses 20
# Run inference on generated hypotheses
hypogenic_inference --config ./data/your_task/config.yaml --hypotheses output/hypotheses.json
Or use Python API:
from hypogenic import BaseTask
# Create task with your configuration
task = BaseTask(config_path="./data/your_task/config.yaml")
# Generate hypotheses
task.generate_hypotheses(method="hypogenic", num_hypotheses=20)
# Run inference
results = task.inference(hypothesis_bank="./output/hypotheses.json")
When to Use This Skill
Use this skill when working on:
- Generating scientific hypotheses from observational datasets
- Testing multiple competing hypotheses systematically
- Combining literature insights with empirical patterns
- Accelerating research discovery through automated hypothesis ideation
- Domains requiring hypothesis-driven analysis: deception detection, AI-generated content identification, mental health indicators, predictive modeling, or other empirical research
Key Features
Automated Hypothesis Generation
- Generate 10-20+ testable hypotheses from data in minutes
- Iterative refinement based on validation performance
- Support for both API-based (OpenAI, Anthropic) and local LLMs
Literature Integration
- Extract insights from research papers via PDF processing
- Combine theoretical foundations with empirical patterns
- Systematic literature-to-hypothesis pipeline with GROBID
Performance Optimization
- Redis caching reduces API costs for repeated experiments
- Parallel processing for large-scale hypothesis testing
- Adaptive refinement focuses on challenging examples
Flexible Configuration
- Template-based prompt engineering with variable injection
- Custom label extraction for domain-specific tasks
- Modular architecture for easy extension
Proven Results
- 8.97% improvement over few-shot baselines
- 15.75% improvement over literature-only approaches
- 80-84% hypothesis diversity (non-redundant insights)
- Human evaluators report significant decision-making improvements
Core Capabilities
1. HypoGeniC: Data-Driven Hypothesis Generation
Generate hypotheses solely from observational data through iterative refinement.
Process:
- Initialize with a small data subset to generate candidate hypotheses
- Iteratively refine hypotheses based on performance
- Replace poorly-performing hypotheses with new ones from challenging examples
Best for: Exploratory research without existing literature, pattern discovery in novel datasets
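The iterative process above amounts to a generate-evaluate-replace loop. The sketch below is conceptual only, not the library's implementation; generate_from_examples and evaluate are hypothetical stand-ins for the LLM-backed generation and judging calls:
import random

def hypogenic_loop(train_data, generate_from_examples, evaluate,
                   num_rounds=10, seed_size=10):
    """Conceptual sketch of the HypoGeniC refinement loop (not the real API).

    generate_from_examples(examples) -> list of hypothesis strings
    evaluate(hypothesis, example)    -> True if the hypothesis fits the example
    """
    # Initialize candidate hypotheses from a small data subset
    bank = generate_from_examples(train_data[:seed_size])
    scores = {h: 0.0 for h in bank}
    for _ in range(num_rounds):
        hard_examples = []
        for h in bank:
            results = [evaluate(h, ex) for ex in train_data]
            scores[h] = sum(results) / len(results)
            hard_examples += [ex for ex, ok in zip(train_data, results) if not ok]
        if not hard_examples:
            break  # every hypothesis already fits all training examples
        # Replace the worst performer with a hypothesis generated
        # from a sample of challenging examples
        worst = min(bank, key=scores.get)
        bank.remove(worst)
        del scores[worst]
        sample = random.sample(hard_examples, min(seed_size, len(hard_examples)))
        for new in generate_from_examples(sample)[:1]:
            bank.append(new)
            scores[new] = 0.0
    return sorted(bank, key=scores.get, reverse=True)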
2. HypoRefine: Literature and Data Integration
Synergistically combine existing literature with empirical data through an agentic framework.
Process:
- Extract insights from relevant research papers (typically 10 papers)
- Generate theory-grounded hypotheses from literature
- Generate data-driven hypotheses from observational patterns
- Refine both hypothesis banks through iterative improvement
Best for: Research with established theoretical foundations, validating or extending existing theories
3. Union Methods
Mechanistically combine literature-only hypotheses with framework outputs.
Variants:
- Literature ∪ HypoGeniC: Combines literature hypotheses with data-driven generation
- Literature ∪ HypoRefine: Combines literature hypotheses with integrated approach
Best for: Comprehensive hypothesis coverage, eliminating redundancy while maintaining diverse perspectives
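To illustrate the deduplication idea only (the library may use LLM-based redundancy checks rather than string similarity), a minimal union of two hypothesis banks could look like this; the 0.8 similarity threshold is an arbitrary choice for the sketch:
from difflib import SequenceMatcher

def union_hypotheses(literature_bank, data_bank, threshold=0.8):
    """Merge two hypothesis banks, dropping near-duplicate statements."""
    merged = list(literature_bank)
    for hyp in data_bank:
        # Keep a data-driven hypothesis only if it is not too similar
        # to anything already in the merged bank
        is_duplicate = any(
            SequenceMatcher(None, hyp.lower(), kept.lower()).ratio() >= threshold
            for kept in merged
        )
        if not is_duplicate:
            merged.append(hyp)
    return merged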
Installation
Install via pip:
uv pip install hypogenic
Optional dependencies:
- Redis server (port 6832): Enables caching of LLM responses to significantly reduce API costs during iterative hypothesis generation
- s2orc-doc2json: Required for processing literature PDFs in HypoRefine workflows
- GROBID: Required for PDF preprocessing (see Literature Processing section)
Clone example datasets:
# For HypoGeniC examples
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# For HypoRefine/Union examples
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
Dataset Format
Datasets must follow HuggingFace datasets format with specific naming conventions:
Required files:
- <TASK>_train.json: Training data
- <TASK>_val.json: Validation data
- <TASK>_test.json: Test data
Required keys in JSON:
- text_features_1 through text_features_n: Lists of strings containing feature values
- label: List of strings containing ground truth labels
Example (headline click prediction):
{
  "headline_1": [
    "What Up, Comet? You Just Got *PROBED*",
    "Scientists Made a Breakthrough in Quantum Computing"
  ],
  "headline_2": [
    "Scientists Everywhere Were Holding Their Breath Today. Here's Why.",
    "New Quantum Computer Achieves Milestone"
  ],
  "label": [
    "Headline 2 has more clicks than Headline 1",
    "Headline 1 has more clicks than Headline 2"
  ]
}
Important notes:
- All lists must have the same length (a quick validation sketch follows this list)
- Label format must match your extract_label() function output format
- Feature keys can be customized to match your domain (e.g., review_text, post_content, etc.)
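Before running generation, it can help to sanity-check these constraints. The following is a minimal validation sketch (validate_dataset is a hypothetical helper, not part of the package):
import json

def validate_dataset(path: str) -> None:
    """Check that every feature list and the label list share one length."""
    with open(path) as f:
        data = json.load(f)
    assert "label" in data, "dataset must contain a 'label' key"
    lengths = {key: len(values) for key, values in data.items()}
    assert len(set(lengths.values())) == 1, f"list lengths differ: {lengths}"
    features = [key for key in data if key != "label"]
    print(f"{path}: {lengths['label']} examples, features: {features}")

validate_dataset("./data/your_task/your_task_train.json")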
Configuration
Each task requires a config.yaml file specifying:
Required elements:
- Dataset paths (train/val/test)
- Prompt templates for:
  - Observations generation
  - Batched hypothesis generation
  - Hypothesis inference
  - Relevance checking
  - Adaptive methods (for HypoRefine)
Template capabilities:
- Dataset placeholders for dynamic variable injection (e.g., ${text_features_1}, ${num_hypotheses})
- Custom label extraction functions for domain-specific parsing
- Role-based prompt structure (system, user, assistant roles)
Configuration structure:
task_name: your_task_name
train_data_path: ./your_task_train.json
val_data_path: ./your_task_val.json
test_data_path: ./your_task_test.json
prompt_templates:
  # Extra keys for reusable prompt components
  observations: |
    Feature 1: ${text_features_1}
    Feature 2: ${text_features_2}
    Observation: ${label}
  # Required templates
  batched_generation:
    system: "Your system prompt here"
    user: "Your user prompt with ${num_hypotheses} placeholder"
  inference:
    system: "Your inference system prompt"
    user: "Your inference user prompt"
  # Optional templates for advanced features
  few_shot_baseline: {...}
  is_relevant: {...}
  adaptive_inference: {...}
  adaptive_selection: {...}
Refer to references/config_template.yaml for a complete example configuration.
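The ${...} placeholders are filled with dataset values at generation time. As an illustration of the mechanism only (the package's internal substitution may differ), Python's string.Template handles the same syntax:
from string import Template

# The observations template from the configuration above
observations = Template(
    "Feature 1: ${text_features_1}\n"
    "Feature 2: ${text_features_2}\n"
    "Observation: ${label}"
)
# Substituting one training example from the headline dataset
print(observations.substitute(
    text_features_1="What Up, Comet? You Just Got *PROBED*",
    text_features_2="Scientists Everywhere Were Holding Their Breath Today. Here's Why.",
    label="Headline 2 has more clicks than Headline 1",
))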
Literature Processing (HypoRefine/Union Methods)
To use literature-based hypothesis generation, you must preprocess PDF papers:
Step 1: Setup GROBID (first time only)
bash ./modules/setup_grobid.sh
Step 2: Add PDF files
Place research papers in literature/YOUR_TASK_NAME/raw/
Step 3: Process PDFs
# Start GROBID service
bash ./modules/run_grobid.sh
# Process PDFs for your task
cd examples
python pdf_preprocess.py --task_name YOUR_TASK_NAME
This converts PDFs to structured format for hypothesis extraction. Automated literature search will be supported in future releases.
CLI Usage
Hypothesis Generation
hypogenic_generation --help
Key parameters:
- Task configuration file path
- Model selection (API-based or local)
- Generation method (HypoGeniC, HypoRefine, or Union)
- Number of hypotheses to generate
- Output directory for hypothesis banks
Hypothesis Inference
hypogenic_inference --help
Key parameters:
- Task configuration file path
- Hypothesis bank file path
- Test dataset path
- Inference method (default or multi-hypothesis)
- Output file for results
Python API Usage
For programmatic control and custom workflows, use Hypogenic directly in your Python code:
Basic HypoGeniC Generation
from hypogenic import BaseTask
# Clone example datasets first
# git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Load your task with custom extract_label function
task = BaseTask(
    config_path="./data/your_task/config.yaml",
    extract_label=lambda text: extract_your_label(text)
)

# Generate hypotheses
task.generate_hypotheses(
    method="hypogenic",
    num_hypotheses=20,
    output_path="./output/hypotheses.json"
)

# Run inference
results = task.inference(
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
HypoRefine/Union Methods
# For literature-integrated approaches
# git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
# Generate with HypoRefine
task.generate_hypotheses(
    method="hyporefine",
    num_hypotheses=15,
    literature_path="./literature/your_task/",
    output_path="./output/"
)
# This generates 3 hypothesis banks:
# - HypoRefine (integrated approach)
# - Literature-only hypotheses
# - Literature∪HypoRefine (union)
Multi-Hypothesis Inference
from examples.multi_hyp_inference import run_multi_hypothesis_inference
# Test multiple hypotheses simultaneously
results = run_multi_hypothesis_inference(
    config_path="./data/your_task/config.yaml",
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
Custom Label Extraction
The extract_label() function is critical for parsing LLM outputs. Implement it based on your task:
import re

def extract_label(llm_output: str) -> str:
    r"""Extract predicted label from LLM inference text.

    Default behavior: searches for the 'final answer:\s+(.*)' pattern.
    Customize for your domain-specific output format.
    """
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
Important: Extracted labels must match the format of label values in your dataset for correct accuracy calculation.
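For example, applied to a plausible inference output for the headline task above:
sample_output = (
    "The second headline opens a curiosity gap, which drives clicks.\n"
    "Final answer: Headline 2 has more clicks than Headline 1"
)
print(extract_label(sample_output))
# -> Headline 2 has more clicks than Headline 1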
Workflow Examples
Example 1: Data-Driven Hypothesis Generation (HypoGeniC)
Scenario: Detecting AI-generated content without prior theoretical framework
Steps:
- Prepare dataset with text samples and labels (human vs. AI-generated)
- Create config.yaml with appropriate prompt templates
- Run hypothesis generation:
  hypogenic_generation --config config.yaml --method hypogenic --num_hypotheses 20
- Run inference on test set:
  hypogenic_inference --config config.yaml --hypotheses output/hypotheses.json --test_data data/test.json
- Analyze results for patterns like formality, grammatical precision, and tone differences
Example 2: Literature-Informed Hypothesis Testing (HypoRefine)
Scenario: Deception detection in hotel reviews building on existing research
Steps:
- Collect 10 relevant papers on linguistic deception cues
- Prepare dataset with genuine and fraudulent reviews
- Configure config.yaml with literature processing and data generation templates
- Run HypoRefine:
  hypogenic_generation --config config.yaml --method hyporefine --papers papers/ --num_hypotheses 15
- Test hypotheses examining pronoun frequency, detail specificity, and other linguistic patterns
- Compare literature-based and data-driven hypothesis performance
Example 3: Comprehensive Hypothesis Coverage (Union Method)
Scenario: Mental stress detection maximizing hypothesis diversity
Steps:
- Generate literature hypotheses from mental health research papers
- Generate data-driven hypotheses from social media posts
- Run Union method to combine and deduplicate:
  hypogenic_generation --config config.yaml --method union --literature_hypotheses lit_hyp.json
- Inference captures both theoretical constructs (posting behavior changes) and data patterns (emotional language shifts)
Performance Optimization
Caching: Enable Redis caching to reduce API costs and computation time for repeated LLM calls
Parallel Processing: Leverage multiple workers for large-scale hypothesis generation and testing
Adaptive Refinement: Use challenging examples to iteratively improve hypothesis quality
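Before a long run, it can be worth confirming the cache is reachable. A minimal check with the redis-py client, assuming the port 6832 convention noted under Installation:
import redis

try:
    # Port 6832 matches the caching convention noted under Installation
    client = redis.Redis(host="localhost", port=6832)
    client.ping()
    print("Redis cache reachable; repeated LLM calls will be served from cache.")
except redis.exceptions.ConnectionError:
    print("Redis not running; every call will hit the LLM API.")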
Expected Outcomes
Research using Hypogenic has demonstrated:
- 14.19% accuracy improvement in AI-content detection tasks
- 7.44% accuracy improvement in deception detection tasks
- 80-84% of hypothesis pairs offering distinct, non-redundant insights
- High helpfulness ratings from human evaluators across multiple research domains
Troubleshooting
Issue: Generated hypotheses are too generic
Solution: Refine prompt templates in config.yaml to request more specific, testable hypotheses
Issue: Poor inference performance
Solution: Ensure the dataset has sufficient training examples, adjust hypothesis generation parameters, or increase the number of hypotheses
Issue: Label extraction failures
Solution: Implement custom extract_label() function for domain-specific output parsing
Issue: GROBID PDF processing fails
Solution: Ensure GROBID service is running (bash ./modules/run_grobid.sh) and PDFs are valid research papers
Creating Custom Tasks
To add a new task or dataset to Hypogenic:
Step 1: Prepare Your Dataset
Create three JSON files following the required format:
- your_task_train.json
- your_task_val.json
- your_task_test.json
Each file must have keys for text features (text_features_1, etc.) and label.
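A minimal sketch that writes one such split to disk (the review_text feature name is illustrative; use keys that match your config.yaml templates):
import json

split = {
    "review_text": [
        "The room was spotless and the staff went out of their way to help.",
        "Absolutely wonderful stay, everything was perfect in every possible way!!!",
    ],
    "label": ["truthful", "deceptive"],
}
with open("your_task_train.json", "w") as f:
    json.dump(split, f, indent=2)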
Step 2: Create config.yaml
Define your task configuration with:
- Task name and dataset paths
- Prompt templates for observations, generation, inference
- Any extra keys for reusable prompt components
- Placeholder variables (e.g., ${text_features_1}, ${num_hypotheses})
Step 3: Implement extract_label Function
Create a custom label extraction function that parses LLM outputs for your domain:
import re
from hypogenic import BaseTask

def extract_my_label(llm_output: str) -> str:
    """Custom label extraction for your task.

    Must return labels in the same format as the dataset 'label' field.
    """
    # Example: extract from a task-specific format
    if "Final prediction:" in llm_output:
        return llm_output.split("Final prediction:")[-1].strip()
    # Fall back to the default pattern
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    return match.group(1).strip() if match else llm_output.strip()

# Use your custom task
task = BaseTask(
    config_path="./your_task/config.yaml",
    extract_label=extract_my_label
)
Step 4: (Optional) Process Literature
For HypoRefine/Union methods:
- Create a literature/your_task_name/raw/ directory
- Add relevant research paper PDFs
- Run GROBID preprocessing
- Process with pdf_preprocess.py
Step 5: Generate and Test
Run hypothesis generation and inference using CLI or Python API:
# CLI approach
hypogenic_generation --config your_task/config.yaml --method hypogenic --num_hypotheses 20
hypogenic_inference --config your_task/config.yaml --hypotheses output/hypotheses.json
# Or use Python API (see Python API Usage section)
Repository Structure
Understanding the repository layout:
hypothesis-generation/
├── hypogenic/ # Core package code
├── hypogenic_cmd/ # CLI entry points
├── hypothesis_agent/ # HypoRefine agent framework
├── literature/ # Literature processing utilities
├── modules/ # GROBID and preprocessing modules
├── examples/ # Example scripts
│ ├── generation.py # Basic HypoGeniC generation
│ ├── union_generation.py # HypoRefine/Union generation
│ ├── inference.py # Single hypothesis inference
│ ├── multi_hyp_inference.py # Multiple hypothesis inference
│ └── pdf_preprocess.py # Literature PDF processing
├── data/ # Example datasets (clone separately)
├── tests/ # Unit tests
└── IO_prompting/ # Prompt templates and experiments
Key directories:
- hypogenic/: Main package with BaseTask and generation logic
- examples/: Reference implementations for common workflows
- literature/: Tools for PDF processing and literature extraction
- modules/: External tool integrations (GROBID, etc.)
Related Publications
HypoBench (2025)
Liu, H., Huang, S., Hu, J., Zhou, Y., & Tan, C. (2025). HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation. arXiv preprint arXiv:2504.11524.
- Paper: https://arxiv.org/abs/2504.11524
- Description: Benchmarking framework for systematic evaluation of hypothesis generation methods
BibTeX:
@misc{liu2025hypobenchsystematicprincipledbenchmarking,
  title={HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation},
  author={Haokun Liu and Sicong Huang and Jingyu Hu and Yangqiaoyu Zhou and Chenhao Tan},
  year={2025},
  eprint={2504.11524},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2504.11524},
}
Literature Meets Data (2024)
Liu, H., Zhou, Y., Li, M., Yuan, C., & Tan, C. (2024). Literature Meets Data: A Synergistic Approach to Hypothesis Generation. arXiv preprint arXiv:2410.17309.
- Paper: https://arxiv.org/abs/2410.17309
- Code: https://github.com/ChicagoHAI/hypothesis-generation
- Description: Introduces HypoRefine and demonstrates synergistic combination of literature-based and data-driven hypothesis generation
BibTeX:
@misc{liu2024literaturemeetsdatasynergistic,
  title={Literature Meets Data: A Synergistic Approach to Hypothesis Generation},
  author={Haokun Liu and Yangqiaoyu Zhou and Mingxuan Li and Chenfei Yuan and Chenhao Tan},
  year={2024},
  eprint={2410.17309},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2410.17309},
}
Hypothesis Generation with Large Language Models (2024)
Zhou, Y., Liu, H., Srivastava, T., Mei, H., & Tan, C. (2024). Hypothesis Generation with Large Language Models. In Proceedings of EMNLP Workshop of NLP for Science.
- Paper: https://aclanthology.org/2024.nlp4science-1.10/
- Description: Original HypoGeniC framework for data-driven hypothesis generation
BibTeX:
@inproceedings{zhou2024hypothesisgenerationlargelanguage,
  title={Hypothesis Generation with Large Language Models},
  author={Yangqiaoyu Zhou and Haokun Liu and Tejes Srivastava and Hongyuan Mei and Chenhao Tan},
  booktitle={Proceedings of EMNLP Workshop of NLP for Science},
  year={2024},
  url={https://aclanthology.org/2024.nlp4science-1.10/},
}
Additional Resources
Official Links
- GitHub Repository: https://github.com/ChicagoHAI/hypothesis-generation
- PyPI Package: https://pypi.org/project/hypogenic/
- License: MIT License
- Issues & Support: https://github.com/ChicagoHAI/hypothesis-generation/issues
Example Datasets
Clone these repositories for ready-to-use examples:
# HypoGeniC examples (data-driven only)
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# HypoRefine/Union examples (literature + data)
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
Community & Contributions
- Contributors: 7+ active contributors
- Stars: 89+ on GitHub
- Topics: research-tool, interpretability, hypothesis-generation, scientific-discovery, llm-application
For contributions or questions, visit the GitHub repository and check the issues page.
Local Resources
references/
config_template.yaml - Complete example configuration file with all required prompt templates and parameters. This includes:
- Full YAML structure for task configuration
- Example prompt templates for all methods
- Placeholder variable documentation
- Role-based prompt examples
scripts/
The scripts directory is available for:
- Custom data preparation utilities
- Format conversion tools
- Analysis and evaluation scripts
- Integration with external tools
assets/
The assets directory is available for:
- Example datasets and templates
- Sample hypothesis banks
- Visualization outputs
- Documentation supplements
