SLURM Calculator
The SLURM calculator allows FZ to execute calculations on HPC clusters using the SLURM Workload Manager.
Overview
SLURM (Simple Linux Utility for Resource Management) is a widely used job scheduler for HPC clusters. The FZ SLURM calculator provides seamless integration for submitting and managing jobs on SLURM-enabled systems.
URI Format

    slurm://[user@host[:port]]:partition/script
Components
- user (optional): Username for remote SLURM clusters
- host (optional): Hostname for remote SLURM clusters
- port (optional): SSH port for remote access (default: 22)
- partition (required): SLURM partition name (e.g., compute, gpu, debug)
- script: Shell command or script to execute
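Putting the pieces together, the URI can be unpacked with a short regular expression. The sketch below illustrates the grammar only; it is not FZ's internal parser:

```python
import re

# Illustrative grammar for slurm:// URIs: slurm://[user@host[:port]]:partition/script
_SLURM_URI = re.compile(
    r"^slurm://"
    r"(?:(?:(?P<user>[^@]+)@)?(?P<host>[^:/]+)(?::(?P<port>\d+))?)?"
    r":(?P<partition>[^/]+)"
    r"/(?P<script>.+)$"
)

def parse_slurm_uri(uri: str) -> dict:
    """Split a slurm:// calculator URI into its components."""
    m = _SLURM_URI.match(uri)
    if m is None:
        raise ValueError(f"not a valid slurm:// URI: {uri!r}")
    parts = m.groupdict()
    parts["port"] = int(parts["port"]) if parts["port"] else 22  # SSH default
    return parts

# Local cluster: no user/host, partition 'compute'
print(parse_slurm_uri("slurm://:compute/bash script.sh"))
# Remote cluster with a custom SSH port
print(parse_slurm_uri("slurm://alice@hpc.example.com:2222:gpu/bash gpu_job.sh"))
```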
Local SLURM Execution
For local SLURM clusters (when FZ runs on the cluster login node):
    import fz

    results = fz.fzr(
        "input.txt",
        {"param1": [1, 2, 3]},
        model,
        calculators="slurm://:compute/bash script.sh",
        results_dir="results"
    )
Example with GPU partition:
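Assuming the cluster exposes a partition named gpu, only the partition segment of the URI changes (a sketch, reusing the model from the example above):

```python
results = fz.fzr(
    "input.txt",
    {"param1": [1, 2, 3]},
    model,
    calculators="slurm://:gpu/bash gpu_job.sh",
    results_dir="results"
)
```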
Remote SLURM Execution
For remote SLURM clusters accessed via SSH:
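A sketch with placeholder credentials (alice and hpc.example.com are not real values):

```python
calculators = "slurm://alice@hpc.example.com:compute/bash script.sh"
```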
With custom SSH port:
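The port goes after the host and before the partition; 2222 here is a placeholder:

```python
calculators = "slurm://alice@hpc.example.com:2222:compute/bash script.sh"
```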
Features
Automatic Job Management
- Submits jobs to the SLURM scheduler using sbatch
- Monitors job status using squeue
- Retrieves results when jobs complete
- Handles job failures and retries
Interrupt Handling
Press Ctrl+C to gracefully terminate SLURM jobs:
- Cancels all running SLURM jobs using scancel
- Cleans up temporary files
- Preserves completed results
File Transfer
For remote execution:
- Automatically uploads input files to the cluster
- Downloads output files after job completion
- Uses SSH/SCP for secure file transfer
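Conceptually, this corresponds to the following manual steps (username, hostname, and paths are placeholders):

```shell
# Upload inputs to the cluster
scp input.txt alice@hpc.example.com:fz_run/
# Submit the job from the login node
ssh alice@hpc.example.com 'cd fz_run && sbatch job.sh'
# Fetch outputs once the job has completed
scp alice@hpc.example.com:fz_run/output.txt results/
```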
Configuration
SLURM Script Headers
The calculator automatically adds appropriate SLURM directives to job scripts:
    #!/bin/bash
    #SBATCH --job-name=fz_case_001
    #SBATCH --output=output_%j.log
    #SBATCH --error=error_%j.log
    #SBATCH --partition=compute
Custom SLURM Options
You can specify additional SLURM options in your model configuration:
    model = {
        "varprefix": "$",
        "slurm_options": {
            "nodes": 1,
            "ntasks": 4,
            "time": "01:00:00",
            "mem": "8GB"
        }
    }
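These options end up as additional #SBATCH directives in the generated job script. The sketch below shows that mapping; the function name and signature are hypothetical, not part of the FZ API:

```python
def sbatch_headers(job_name: str, partition: str, slurm_options: dict) -> str:
    """Illustrative: render SLURM options as #SBATCH header lines."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
    ]
    for key, value in slurm_options.items():
        lines.append(f"#SBATCH --{key}={value}")
    return "\n".join(lines)

print(sbatch_headers("fz_case_001", "compute",
                     {"nodes": 1, "ntasks": 4, "time": "01:00:00", "mem": "8GB"}))
```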
Examples
Basic Parametric Study
    import fz

    model = {
        "varprefix": "$",
        "output": {
            "result": "grep 'Result:' output.txt | awk '{print $2}'"
        }
    }

    results = fz.fzr(
        "simulation.input",
        {
            "temperature": [300, 350, 400, 450],
            "pressure": [1.0, 2.0, 3.0]
        },
        model,
        calculators="slurm://:compute/bash run_simulation.sh",
        results_dir="slurm_results",
        n_parallel=6  # Submit up to 6 jobs simultaneously
    )
Multiple Partitions
Use different partitions for different job types:
    calculators = [
        "slurm://:compute/bash short_job.sh",  # Quick jobs
        "slurm://:gpu/bash gpu_job.sh"         # GPU-intensive jobs
    ]
Remote HPC Cluster
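A sketch combining the remote URI with a parametric study; the username, hostname, and partition name are placeholders:

```python
import fz

model = {
    "varprefix": "$",
    "output": {
        "result": "grep 'Result:' output.txt | awk '{print $2}'"
    }
}

results = fz.fzr(
    "simulation.input",
    {"temperature": [300, 350, 400]},
    model,
    calculators="slurm://alice@hpc.example.com:compute/bash run_simulation.sh",
    results_dir="remote_results"
)
```

Input files are uploaded over SSH before submission, and outputs are downloaded when the jobs finish.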
Requirements
- SLURM commands must be available: sbatch, squeue, scancel
- For remote execution: SSH access with key-based authentication
- Python paramiko package for remote SSH connections
Limitations
- Requires the SLURM workload manager to be installed on the target system
- Job scheduling may introduce delays depending on cluster load
- Remote execution requires SSH key authentication (password authentication is not supported)
Troubleshooting
Job Submission Fails
Check that SLURM is available:
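For example, from the node where FZ runs:

```shell
sinfo --version               # prints the SLURM version if the tools are installed
which sbatch squeue scancel   # all three must be on PATH
```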
Partition Not Found
Verify partition names:
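sinfo lists the partitions the scheduler knows about; the partition in your calculator URI must appear here:

```shell
sinfo -s    # one summary line per partition
```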
Remote Connection Issues
Test SSH connection:
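With placeholder credentials:

```shell
ssh alice@hpc.example.com sinfo    # should print the partition table without prompting for a password
```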
Ensure SSH key authentication is configured.
See Also
- SSH Calculator - For remote execution without SLURM
- Local Shell Calculator - For local execution
- Environment Variables - Configuration options