Comparison of cluster software

{{short description|None}}

The following tables compare general and technical information for notable computer cluster software. This software can be broadly divided into four categories: job schedulers, node management, node installation, and integrated stacks (which combine all of the above).
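The job scheduler category can be illustrated concretely. The sketch below, assuming a working SLURM installation with the <code>sbatch</code> command on the PATH (the function name <code>submit_job</code> is illustrative, not part of any listed product), submits a shell command as a batch job and returns the scheduler-assigned job ID:

<syntaxhighlight lang="python">
import subprocess

def submit_job(command: str, ntasks: int = 1, walltime: str = "00:10:00") -> str:
    """Submit a shell command to SLURM via sbatch and return the job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable",      # --parsable prints only the job ID
         f"--ntasks={ntasks}",        # number of tasks to allocate
         f"--time={walltime}",        # wall-clock limit, HH:MM:SS
         "--wrap", command],          # wrap the command in a batch script
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    job_id = submit_job("hostname", ntasks=4)
    print(f"Submitted batch job {job_id}")
</syntaxhighlight>

The same pattern — describe a job, hand it to a queue, poll for status — applies to the other job schedulers in the tables below (PBS Pro's <code>qsub</code>, Spectrum LSF's <code>bsub</code>, HTCondor's <code>condor_submit</code>).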

General information

{{sort-under}}

class="wikitable sortable sort-under"
Software

! Maintainer

! Category

! Development status

! Latest release

! ArchitectureOCS

! High-Performance / High-Throughput Computing

! License

! Platforms supported

! Cost

! {{verth|Paid support
available}}

{{rh}} | Amoeba

|

|

| {{No}} active development

|

|

|

| {{Open source|MIT}}

|

|

|

{{rh}} | Base One Foundation Component Library

|

|

|

|

|

|

| {{Proprietary}}

|

|

|

{{rh}} class="table-rh" | DIET

| INRIA, SysFera, Open Source

| All in one

|

|

| GridRPC, SPMD, Hierarchical and distributed architecture, CORBA

| HTC/HPC

| {{Open source|CeCILL}}

| Unix-like, Mac OS X, AIX

| {{Free}}

|

{{rh}} | [https://dh2i.com/dxenterprise DxEnterprise]

| [https://dh2i.com DH2i]

| Nodes management

| {{Active|Actively}} developed

| v23.0

|

|

| {{Proprietary}}

| Windows 2012R2/2016/2019/2022 and 8+, RHEL 7/8/9, CentOS 7, Ubuntu 16.04/18.04/20.04/22.04, SLES 15.4

|Cost

| {{Yes}}

{{rh}} class="table-rh" | Enduro/X

| Mavimax, Ltd.

| Job/Data Scheduler

| {{Active|Actively}} developed

|

| SOA Grid

| HTC/HPC/HA

| GPLv2 or Commercial

| Linux, FreeBSD, MacOS, Solaris, AIX

| Free / Cost

| {{Yes}}

{{rh}} class="table-rh" | Ganglia

|

| Monitoring

| {{Active|Actively}} developed

|{{wikidata|property|preferred|references|edit@end|Q1193169|P348|P548=Q2804309}} {{Start date and age|{{wikidata|qualifier|preferred|single|Q1193169|P348|P548=Q2804309|P577}}}}

|

|

| {{BSD-lic}}

| Unix, Linux, Microsoft Windows NT/XP/2000/2003/2008, FreeBSD, NetBSD, OpenBSD, DragonflyBSD, Mac OS X, Solaris, AIX, IRIX, Tru64, HPUX.

| {{Free}}

|

{{rh}} | Grid MP

| Univa (formerly United Devices)

| Job Scheduler

| {{No}} active development

|

| Distributed master/worker

| HTC/HPC

| {{Proprietary}}

| Windows, Linux, Mac OS X, Solaris

| {{Nonfree|Cost}}

|

{{rh}} class="table-rh" | Apache Mesos

| Apache

|

| {{Active|Actively}} developed

|

|

|

| {{Open source|Apache license v2.0}}

| Linux

| {{Free}}

| {{Yes}}

{{rh}} class="table-rh" | Moab Cluster Suite

| Adaptive Computing

| Job Scheduler

| {{Active|Actively}} developed

|

|

| HPC

| {{Proprietary}}

| Linux, Mac OS X, Windows, AIX, OSF/Tru-64, Solaris, HP-UX, IRIX, FreeBSD & other UNIX platforms

| {{Nonfree|Cost}}

| {{Yes}}

{{rh}} class="table-rh" | NetworkComputer

| Runtime Design Automation

|

| {{Active|Actively}} developed

|

|

| HTC/HPC

| {{Proprietary}}

| Unix-like, Windows

| {{Nonfree|Cost}}

|

{{rh}} class="table-rh" | OpenHPC

| OpenHPC project

| all in one

| {{Active|Actively}} developed

| v2.61 {{Start date and age|2023|02|02}}

|

| HPC

|

| Linux (CentOS / OpenSUSE Leap)

| {{Free}}

| {{No}}

{{rh}} class="table-rh" | OpenLava

| {{CNone|None.}} Formerly Teraproc

| Job Scheduler

| Halted by injunction

|

| Master/Worker, multiple admin/submit nodes

| HTC/HPC

| Illegal due to being a pirated version of IBM Spectrum LSF

| Linux

| {{n/a|Not legally available}}

| {{No}}

{{rh}} class="table-rh" | PBS Pro

| Altair

| Job Scheduler

| {{Active|Actively}} developed

|

| Master/worker distributed with fail-over

| HPC/HTC

| AGPL or Proprietary

| Linux, Windows

| {{Free}} or Cost

| {{Yes}}

{{rh}} class="table-rh" | Proxmox Virtual Environment

| Proxmox Server Solutions

| Complete

| {{Active|Actively}} developed

|

|

|

| {{Open source|AGPL v3}}

| Linux, Windows, other operating systems are known to work and are community supported

| {{Free}}

| {{Yes}}

{{rh}} | Rocks Cluster Distribution

| Open Source/NSF grant

| All in one

| {{Active|Actively}} developed

|{{wikidata|property|references|edit|Q972850|P348|P548=Q2804309}} (Manzanita) {{Start date and age|{{wikidata|qualifier|single|Q972850|P348|P548=Q2804309|P577}}}}

|

| HTC/HPC

| {{Open source}}

| CentOS

| {{Free}}

|

{{rh}} | Popular Power

|

|

|

|

|

|

|

|

|

|

{{rh}} | ProActive

| INRIA, ActiveEon, Open Source

| All in one

| {{Active|Actively}} developed

|

| Master/Worker, SPMD, Distributed Component Model, Skeletons

| HTC/HPC

| {{GPL-lic}}

| Unix-like, Windows, Mac OS X

| {{Free}}

|

{{rh}} | RPyC

| Tomer Filiba

|

| {{Active|Actively}} developed

|

|

|

| {{Open source|MIT License}}

| *nix/Windows

| {{Free}}

|

{{rh}} | SLURM

| SchedMD

| Job Scheduler

| {{Active|Actively}} developed

| v23.11.3 {{Start date and age|2024|01|24}}

|

| HPC/HTC

| {{GPL-lic}}

| Linux/*nix

| {{Free}}

| {{Yes}}

{{rh}} class="table-rh" | Spectrum LSF

| IBM

| Job Scheduler

| {{Active|Actively}} developed

|

| Master node with failover/exec clients, multiple admin/submit nodes, Suite addOns

| HPC/HTC

| {{Proprietary}}

| Unix, Linux, Windows

| {{Nonfree|Cost}} and Academic - model - Academic, Express, Standard, Advanced and Suites

| {{Yes}}

{{rh}} | Oracle Grid Engine (Sun Grid Engine, SGE)

| Altair

| Job Scheduler

| active Development moved to Altair Grid Engine

|

| Master node/exec clients, multiple admin/submit nodes

| HPC/HTC

| {{Proprietary}}

| *nix/Windows

| {{Nonfree|Cost}}

|

{{rh}} | Some Grid Engine / Son of Grid Engine / Sun Grid Engine

| daimh

| Job Scheduler

| {{Active|Actively}} developed (stable/maintenance)

|

| Master node/exec clients, multiple admin/submit nodes

| HPC/HTC

| {{Open source|SISSL}}

| *nix

| {{Free}}

| {{No}}

{{rh}} | SynfiniWay

| Fujitsu

|

| {{Active|Actively}} developed

|

|

| HPC/HTC

| {{dunno}}

| Unix, Linux, Windows

| {{Nonfree|Cost}}

|

{{rh}} class="table-rh" | Techila Distributed Computing Engine

| [https://www.techilatechnologies.com/ Techila Technologies Ltd.]

| All in one

| {{Active|Actively}} developed

|

| Master/worker distributed

| HTC

| {{Proprietary}}

| Linux, Windows

| {{Nonfree|Cost}}

| {{Yes}}

{{rh}} | TORQUE Resource Manager

| Adaptive Computing

| Job Scheduler

| {{Active|Actively}} developed

|

|

|

| {{Proprietary}}

| Linux, *nix

| {{Nonfree|Cost}}

| {{Yes}}

{{rh}} | TrinityX

| [https://www.clustervision.com/ ClusterVision]

| All in one

| {{Active|Actively}} developed

| v15 {{Start date and age|2025|02|27}}

|

| HPC/HTC

| {{GPL-lic}} v3

| Linux/*nix

| {{Free}}

| {{Yes}}

{{rh}} class="table-rh" | UniCluster

| Univa

| All in One

| Functionality and development moved to UniCloud (see above)

|

|

|

|

|

| {{Free}}

| {{Yes}}

{{rh}} | UNICORE

|

|

|

|

|

|

|

|

|

|

{{rh}} | Xgrid

| Apple Computer

|

|

|

|

|

|

|

|

|

{{rh}} | Warewulf

|

|Provision and clusters management

| {{Active|Actively}} developed

|v4.4.1 {{Start date and age|2023|07|06}}

|

|HPC

|{{Open source}}

|Linux

| {{Free}}

|

{{rh}} | xCAT

|

|Provision and clusters management

| {{Active|Actively}} developed

|v2.16.5 {{Start date and age|2023|03|07}}

|

|HPC

| Eclipse Public License

|Linux

| {{Free}}

|

Software

! Maintainer

! Category

! Development status

!Latest release

! Architecture

! High-Performance/ High-Throughput Computing

! License

! Platforms supported

! Cost

! {{verth|Paid support
available}}

Table explanation

  • Software: The name of the application that is described

Technical information

{{sort-under}}

class="wikitable sortable sort-under"

! Software

! Implementation Language

! Authentication

! Encryption

! Integrity

! Global File System

! Global File System + Kerberos

! Heterogeneous/ Homogeneous exec node

! Jobs priority

! Group priority

! Queue type

! SMP aware

! Max exec node

! Max job submitted

! CPU scavenging

! Parallel job

! Job checkpointing

! {{verth|Python
interface}}

{{rh}} class="table-rh" | Enduro/X

| C/C++

| OS Authentication

| GPG, AES-128, SHA1

| {{CNone|None}}

| {{Any}} cluster Posix FS (gfs, gpfs, ocfs, etc.)

| {{Any}} cluster Posix FS (gfs, gpfs, ocfs, etc.)

| Heterogeneous

| OS Nice level

| OS Nice level

| SOA Queues, FIFO

| {{Yes}}

| OS Limits

| OS Limits

| {{Yes}}

| {{Yes}}

| {{No}}

| {{No}}

{{rh}} class="table-rh" | HTCondor

| C++

| GSI, SSL, Kerberos, Password, File System, Remote File System, Windows, Claim To Be, Anonymous

| None, Triple DES, BLOWFISH

| None, MD5

| None, NFS, AFS

| {{unofficial|Not official, hack with ACL and NFS4}}

| Heterogeneous

| {{Yes}}

| {{Yes}}

| Fair-share with some programmability

| basic (hard separation into different node)

| tested ~10000?

| tested ~100000?

| {{Yes}}

| MPI, OpenMP, PVM

| {{Yes}}

| {{Yes}}https://github.com/dasayan05/condor, and [https://htcondor.readthedocs.io/en/latest/apis/python-bindings/install.html native Python Binding]

{{rh}} class="table-rh" | PBS Pro

| C/Python

| OS Authentication, Munge

|

|

| {{Any}}, e.g., NFS, Lustre, GPFS, AFS

| Limited availability

| Heterogeneous

| {{Yes}}

| {{Yes}}

| Fully configurable

| {{Yes}}

| tested ~50,000

| Millions

| {{Yes}}

| MPI, OpenMP

| {{Yes}}

| {{Yes}}https://github.com/prisms-center/pbs

{{rh}} class="table-rh" | OpenLava

| C/C++

| OS authentication

| {{CNone|None}}

|

| NFS

|

| Heterogeneous Linux

| {{Yes}}

| {{Yes}}

| Configurable

| {{Yes}}

|

|

| {{Yes}}, supports preemption based on priority

| {{Yes}}

| {{Yes}}

| {{No}}

{{rh}} class="table-rh" | Slurm

| C

| Munge, None, Kerberos

|

|

|

|

| Heterogeneous

| {{Yes}}

| {{Yes}}

| Multifactor Fair-share

| {{Yes}}

| tested 120k

| tested 100k

| {{No}}

| {{Yes}}

| {{Yes}}

| {{Yes}}[https://github.com/PySlurm/pyslurm PySlurm]

{{rh}} class="table-rh" | Spectrum LSF

| C/C++

| Multiple - OS Authentication/Kerberos

| {{Optional}}

| {{Optional}}

| {{Any}} - GPFS/Spectrum Scale, NFS, SMB

| {{Any}} - GPFS/Spectrum Scale, NFS, SMB

| Heterogeneous - HW and OS agnostic (AIX, Linux or Windows)

| Policy based - no queue to computenode binding

| Policy based - no queue to computegroup binding

| Batch, interactive, checkpointing, parallel and combinations

| {{Yes}} and GPU aware (GPU License free)

| > 9.000 compute hots

| > 4 mio jobs a day

| {{Yes}}, supports preemption based on priority, supports checkpointing/resume

| {{Yes}}, fx parallel submissions for job collaboration over fx MPI

| {{Yes}}, with support for user, kernel or library level checkpointing environments

| {{Yes}}https://github.com/IBMSpectrumComputing/lsf-python-api

{{rh}} | Torque

| C

| SSH, munge

|

|

| None, any

|

| Heterogeneous

| {{Yes}}

| {{Yes}}

| Programmable

| {{Yes}}

| tested

| tested

| {{Yes}}

| {{Yes}}

| {{Yes}}

| {{Yes}}https://github.com/jkitchin/python-torque

Software

! Implementation Language

! Authentication

! Encryption

! Integrity

! Global File System

! Global File System + Kerberos

! Heterogeneous/ Homogeneous exec node

! Jobs priority

! Group priority

! Queue type

! SMP aware

! Max exec node

! Max job submitted

! CPU scavenging

! Parallel job

! Job checkpointing

! {{verth|Python
interface}}

Table explanation

  • Software: The name of the application that is described
  • SMP aware:
    • basic: hard split into multiple virtual hosts
    • basic+: hard split into multiple virtual hosts, with some minimal/incomplete communication between virtual hosts on the same computer
    • dynamic: splits the resources of the computer (CPU/RAM) on demand
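For rows where the Python interface column says {{Yes}}, jobs can also be described and submitted from Python rather than via the command line. Below is a minimal sketch using HTCondor's native Python bindings (the <code>htcondor</code> package linked in the table; API details as in recent releases — treat the exact calls as an assumption and check the linked documentation):

<syntaxhighlight lang="python">
import htcondor  # HTCondor's native Python bindings

# Describe the job in submit-file syntax; keys mirror condor_submit.
submit = htcondor.Submit({
    "executable": "/bin/echo",
    "arguments": "hello from the cluster",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()      # connect to the local scheduler daemon
result = schedd.submit(submit)  # queue one instance of the job
print("Submitted cluster", result.cluster())
</syntaxhighlight>

PySlurm and lsf-python-api, linked in the table above, play the analogous role for Slurm and Spectrum LSF.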

See also

References