SAP-RPT-1-OSS Predictor
SAP-RPT-1-OSS is SAP's open-source tabular foundation model (Apache 2.0) for predictions on structured business data. Unlike LLMs, which predict text, RPT-1 predicts field values in table rows using in-context learning: labeled example rows are supplied as context at prediction time, so no model training is required.
- Repository: https://github.com/SAP-samples/sap-rpt-1-oss
- Model: https://huggingface.co/SAP/sap-rpt-1-oss
Setup
1. Install Package
pip install git+https://github.com/SAP-samples/sap-rpt-1-oss
2. Hugging Face Authentication
Model weights require HF login and license acceptance:
# Install HF CLI
pip install huggingface_hub
# Login (stores the access token locally, by default under ~/.cache/huggingface/token)
huggingface-cli login
Then accept model terms at: https://huggingface.co/SAP/sap-rpt-1-oss
3. Hardware Requirements
| Config | GPU Memory | Context Size | Bagging | Use Case |
|---|---|---|---|---|
| Optimal | 80GB (A100) | 8192 | 8 | Production, best accuracy |
| Standard | 40GB+ (e.g. A6000, 48GB) | 4096 | 4 | Good balance |
| Minimal | 24GB (RTX 4090) | 2048 | 2 | Development |
| CPU | N/A | 1024 | 1 | Testing only (slow) |
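The table above can be turned into a small lookup so that a script picks its settings from the memory actually available. This is a minimal sketch: the function name `pick_config` is hypothetical, and the thresholds simply mirror the table rows rather than any limit enforced by the library.

```python
from typing import Optional

# (minimum GPU memory in GB, max_context_size, bagging), mirroring the table above
_GPU_TIERS = [
    (80, 8192, 8),  # Optimal
    (40, 4096, 4),  # Standard
    (24, 2048, 2),  # Minimal
]
_CPU_FALLBACK = {"max_context_size": 1024, "bagging": 1}


def pick_config(gpu_memory_gb: Optional[float]) -> dict:
    """Return constructor settings for the largest tier that fits (hypothetical helper)."""
    if gpu_memory_gb is not None:
        for min_gb, context, bagging in _GPU_TIERS:
            if gpu_memory_gb >= min_gb:
                return {"max_context_size": context, "bagging": bagging}
    # No GPU, or less memory than the smallest tier: fall back to the CPU row
    return dict(_CPU_FALLBACK)
```

The result can be unpacked into the constructor, for example `SAP_RPT_OSS_Classifier(**pick_config(48))`.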
Quick Start
Classification (Customer Churn, Payment Default)
import pandas as pd
from sap_rpt_oss import SAP_RPT_OSS_Classifier
# Load SAP data export
df = pd.read_csv("sap_customers.csv")
X = df.drop(columns=["CHURN_STATUS"])
y = df["CHURN_STATUS"]
# Split data: the first 400 rows serve as labeled context, the remaining rows are predicted
X_train, X_test = X.iloc[:400], X.iloc[400:]
y_train, y_test = y.iloc[:400], y.iloc[400:]
# Initialize and predict
clf = SAP_RPT_OSS_Classifier(max_context_size=4096, bagging=4)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
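Taking the first 400 rows as context works only if the export is in no particular order; a file sorted by date or status would give the model a skewed context. A minimal sketch of a seeded shuffle split follows. `shuffled_split_indices` is a hypothetical helper, not part of `sap_rpt_oss`; it returns positional indices that can be passed to `DataFrame.iloc`.

```python
import random
from typing import List, Tuple


def shuffled_split_indices(n_rows: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row positions reproducibly and split them into train/test lists."""
    if not 0 < n_train < n_rows:
        raise ValueError("n_train must be between 1 and n_rows - 1")
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # local RNG, so global random state is untouched
    return positions[:n_train], positions[n_train:]


train_idx, test_idx = shuffled_split_indices(n_rows=500, n_train=400, seed=42)
# With pandas: X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
#              y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
```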
Regression (Delivery Delay Days, Demand Quantity)
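from sap_rpt_oss import SAP_RPT_OSS_Regressor
# y_train must hold a numeric target here (e.g. a delay-in-days column), not the churn labels used above
reg = SAP_RPT_OSS_Regressor(max_context_size=4096, bagging=4)
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)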
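To judge regression output on held-out rows, a simple error metric is usually enough. The sketch below computes the mean absolute error in plain Python; `mean_absolute_error` here is a local helper written for illustration (scikit-learn ships a function of the same name), and the sample numbers are made up.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only: actual vs. predicted delivery delay in days
mae = mean_absolute_error([2.0, 0.0, 5.0, 1.0], [3.0, 0.0, 4.0, 3.0])
print(f"MAE: {mae:.2f} days")  # MAE: 1.00 days
```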
Core Workflow
- Extract SAP data → Export to CSV from relevant tables
- Prepare dataset → Include 50-500 rows with known outcomes
- Rename fields → Use semantic names (see Data Preparation)
- Run prediction → Fit on training data, predict on new data
- Interpret results → Probabilities for classification, values for regression
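For classification, the last step typically means turning probabilities into a decision. The sketch below assumes `predict_proba` follows the scikit-learn convention of one row per sample and one column per class, with column order given by a `classes` list. That convention, the helper name `flag_positive`, and the 0.7 threshold are assumptions for illustration.

```python
from typing import List, Sequence


def flag_positive(
    probabilities: Sequence[Sequence[float]],
    classes: Sequence[str],
    positive_class: str,
    threshold: float = 0.5,
) -> List[bool]:
    """Mark each row whose probability for positive_class reaches the threshold."""
    column = list(classes).index(positive_class)  # raises ValueError if the class is unknown
    return [row[column] >= threshold for row in probabilities]


# Illustrative probabilities for classes ["ACTIVE", "CHURNED"]
probs = [[0.9, 0.1], [0.25, 0.75], [0.4, 0.6]]
flags = flag_positive(probs, ["ACTIVE", "CHURNED"], "CHURNED", threshold=0.7)
print(flags)  # [False, True, False]
```

A higher threshold flags fewer rows but with more confidence; the right value depends on the cost of a missed case versus a false alarm.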
SAP Use Cases
See references/sap-use-cases.md for detailed extraction queries:
- FI-AR: Payment default probability (BSID, BSAD, KNA1)
- FI-GL: Journal entry anomaly detection (ACDOCA, BKPF)
- SD: Delivery delay prediction (VBAK, VBAP, LIKP)
- SD: Customer churn likelihood (VBRK, VBRP, KNA1)
- MM: Vendor performance scoring (EKKO, EKPO, EBAN)
- PP: Production delay risk (AFKO, AFPO)
Data Preparation
Semantic Column Names (Important!)
RPT-1-OSS uses an LLM to embed column names and values. Descriptive names improve accuracy:
# Good: Model understands business context
CUSTOMER_CREDIT_LIMIT, DAYS_SINCE_LAST_ORDER, PAYMENT_DELAY_DAYS
# Bad: Generic names lose semantic value
COL1, VALUE, FIELD_A
Use scripts/prepare_sap_data.py to rename SAP technical fields:
from scripts.prepare_sap_data import SAPDataPrep
prep = SAPDataPrep()
df = prep.rename_sap_fields(df) # BUKRS → COMPANY_CODE, etc.
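The contents of `scripts/prepare_sap_data.py` are not shown here, so the following is only a sketch of the idea behind `rename_sap_fields`, not that script's implementation. The mapping covers a handful of common SAP technical field names; the descriptive names on the right are illustrative choices, and `rename_fields` is a hypothetical helper.

```python
from typing import Dict, List

# Small illustrative mapping: SAP technical field -> descriptive column name
SAP_FIELD_NAMES: Dict[str, str] = {
    "BUKRS": "COMPANY_CODE",
    "KUNNR": "CUSTOMER_NUMBER",
    "LIFNR": "VENDOR_NUMBER",
    "WAERS": "CURRENCY",
    "NETWR": "NET_VALUE",
    "ERDAT": "CREATED_ON_DATE",
}


def rename_fields(columns: List[str], mapping: Dict[str, str] = SAP_FIELD_NAMES) -> List[str]:
    """Replace known technical names and keep unknown columns unchanged."""
    return [mapping.get(name.upper(), name) for name in columns]


print(rename_fields(["BUKRS", "KUNNR", "NETWR", "CHURN_STATUS"]))
# ['COMPANY_CODE', 'CUSTOMER_NUMBER', 'NET_VALUE', 'CHURN_STATUS']
```

With pandas, the same mapping can be applied via `df.rename(columns=SAP_FIELD_NAMES)`.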
Dataset Size
- Minimum: 50 training examples
- Recommended: 200-500 examples
- Maximum context: 8192 rows (GPU dependent)
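These bounds can be checked before fitting. A minimal sketch, with the limits taken from the list above; `check_training_size` and its return strings are hypothetical, and the library itself may handle oversized inputs differently.

```python
MIN_ROWS = 50          # minimum training examples
RECOMMENDED_MIN = 200  # lower end of the recommended range
MAX_CONTEXT = 8192     # largest context size listed (GPU dependent)


def check_training_size(n_rows: int, max_context_size: int = MAX_CONTEXT) -> str:
    """Classify a training-set size against the documented bounds."""
    if n_rows < MIN_ROWS:
        return "too_small"
    if n_rows > max_context_size:
        return "exceeds_context"  # consider sampling down to max_context_size rows
    if n_rows < RECOMMENDED_MIN:
        return "below_recommended"
    return "ok"


for n in (30, 120, 400, 10000):
    print(n, check_training_size(n))
```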
Scripts
- scripts/rpt1_oss_predict.py - Local model prediction wrapper
- scripts/prepare_sap_data.py - SAP field renaming and SQL templates
- scripts/batch_predict.py - Chunked processing for large datasets
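The source of `scripts/batch_predict.py` is not reproduced here; the sketch below only illustrates the chunking idea. `iter_chunks` and `predict_in_chunks` are hypothetical names, and `predict_fn` stands in for any callable that maps a batch of rows to a list of predictions, such as a wrapper around a fitted model's `predict`.

```python
from typing import Callable, Iterator, List, Sequence


def iter_chunks(rows: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def predict_in_chunks(rows: Sequence, predict_fn: Callable[[Sequence], List], chunk_size: int) -> List:
    """Run predict_fn chunk by chunk and concatenate the results in order."""
    results: List = []
    for chunk in iter_chunks(rows, chunk_size):
        results.extend(predict_fn(chunk))
    return results


# Stand-in predictor for demonstration: labels each row by its length
demo = predict_in_chunks(
    ["a", "bb", "ccc", "dddd", "eeeee"],
    lambda chunk: [len(r) for r in chunk],
    chunk_size=2,
)
print(demo)  # [1, 2, 3, 4, 5]
```

Keeping each chunk small bounds the memory used per call, at the cost of more calls.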
Alternative: RPT Playground API
For users with SAP access, the closed-source RPT-1 is available via API:
from scripts.rpt1_api import RPT1Client
client = RPT1Client(token="YOUR_RPT_TOKEN") # Get from rpt-playground.sap.com
result = client.predict(data="data.csv", target_column="TARGET", task_type="classification")
See references/api-reference.md for RPT Playground API documentation.
Limitations
- Tabular data only (no images or free-text documents)
- Requires labeled examples for in-context learning
- First prediction is slow (model loading)
- GPU strongly recommended for production use
