{{Short description|Open standard for programming heterogeneous computing systems, such as CPUs or GPUs}}
{{Distinguish|OpenGL}}
{{For|the cryptographic library initially known as OpenCL|Botan (programming library)}}
{{Technical|date=October 2021}}
{{Use mdy dates|date=October 2018}}
{{Infobox software
| name = OpenCL API
| title = OpenCL API
| logo = OpenCL logo.svg
| logo caption =
| logo size = 200px
| logo alt = OpenCL logo
| screenshot =
| caption =
| screenshot size =
| screenshot alt =
| collapsible =
| author = Apple Inc.
| developer = Khronos Group
| released = {{Start date and age|2009|08|28}}
| discontinued =
| latest release version = {{wikidata|property|preferred|references|edit|P348|P548=Q2804309}}
| latest release date = {{Start date and age|{{wikidata|qualifier|preferred|single|P348|P548=Q2804309|P577}}|df=yes}}
| latest preview version =
| latest preview date =
| programming language = C with C++ bindings
| operating system = Android (vendor dependent),{{cite web |title=Android Devices With OpenCL support |url=https://docs.google.com/a/arrayfire.com/spreadsheets/d/1Mpzfl2NmLUVSAjIph77-FOsJeuyD9Xjha89r5iHw1hI/edit?pli=1#gid=0 |website=Google Docs |publisher=ArrayFire |access-date=April 28, 2015}} FreeBSD,{{cite web |title=FreeBSD Graphics/OpenCL |url=https://wiki.freebsd.org/Graphics/OpenCL |publisher=FreeBSD |access-date=December 23, 2015}} Linux, macOS (via Pocl), Windows
| platform = ARMv7, ARMv8,{{cite web |title=Conformant Products |url=https://www.khronos.org/conformance/adopters/conformant-products/opencl |publisher=Khronos Group |access-date=May 9, 2015}} Cell, IA-32, Power, x86-64
| size =
| language =
| language count =
| language footnote =
| genre = Heterogeneous computing API
| license = OpenCL specification license
| alexa =
| website = {{URL|https://www.khronos.org/opencl/}}
| standard =
}}
{{Infobox programming language
| name = OpenCL C/C++ and C++ for OpenCL
| logo =
| logo caption =
| paradigm = Imperative (procedural), structured, (C++ only) object-oriented, generic programming
| family = C
| designer =
| developer =
| released =
| latest release version = OpenCL C++ 1.0 revision V2.2–11{{cite web|last1=Sochacki|first1=Bartosz|title=The OpenCL C++ 1.0 Specification|url=https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.pdf|publisher=Khronos OpenCL Working Group|access-date=Jul 19, 2019|date=Jul 19, 2019}}
| latest release date = {{Start date and age|2021|12|20}}
| latest preview version =
| latest preview date =
| typing = Static, weak, manifest, nominal
| scope =
| programming language = Implementation specific
| platform =
| operating_system =
| license =
| file_ext = .cl .clcpp
| file format =
| website =
| implementations = AMD, Gallium Compute, IBM, Intel NEO, Intel SDK, Texas Instruments, Nvidia, POCL, Arm
| dialects =
| influenced_by = C99, CUDA, C++14, C++17
| influenced =
}}
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies a programming language (based on C99) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
OpenCL is an open standard maintained by the Khronos Group, a non-profit, open standards organisation. Conformant implementations (i.e., those that have passed the Conformance Test Suite) are available from a range of companies including AMD, Arm, Cadence, Google, Imagination, Intel, Nvidia, Qualcomm, Samsung, SPI and Verisilicon.{{cite web |title=Conformant Companies |url=https://www.khronos.org/conformance/adopters/conformant-companies#opencl |publisher=Khronos Group |access-date=September 19, 2024}}{{cite web |last1=Gianelli |first1=Silvia E. |title=Xilinx SDAccel Development Environment for OpenCL, C, and C++, Achieves Khronos Conformance |url=http://www.prnewswire.com/news-releases/xilinx-sdaccel-development-environment-for-opencl-c-and-c-achieves-khronos-conformance-300020285.html |website=PR Newswire |publisher=Xilinx |access-date=April 27, 2015 |date=January 14, 2015}}
Overview
OpenCL views a computing system as consisting of a number of compute devices, which might be central processing units (CPUs) or "accelerators" such as graphics processing units (GPUs), attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called "kernels".{{cite web |last1=Howes |first1=Lee |title=The OpenCL Specification Version: 2.1 Document Revision: 23 |url=https://www.khronos.org/registry/cl/specs/opencl-2.1.pdf |publisher=Khronos OpenCL Working Group |access-date=November 16, 2015 |date=November 11, 2015}}{{rp|17}} A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel. How a compute device is subdivided into compute units and PEs is up to the vendor; a compute unit can be thought of as a "core", but the notion of core is hard to define across all the types of devices supported by OpenCL (or even within the category of "CPUs"),{{r|Gaster}}{{rp|49–50}} and the number of compute units may not correspond to the number of cores claimed in vendors' marketing literature (which may actually be counting SIMD lanes).{{Cite web |title=An Introduction to the OpenCL Programming Model |first1=Jonathan |last1=Tompson |first2=Kristofer |last2=Schlachter |url=http://www.cs.nyu.edu/~lerner/spring12/Preso07-OpenCL.pdf |year=2012 |access-date=July 6, 2015 |publisher=New York University Media Research Lab |archive-url=https://web.archive.org/web/20150706143727/http://www.cs.nyu.edu/~lerner/spring12/Preso07-OpenCL.pdf |archive-date=July 6, 2015 |url-status=dead }}
In addition to its C-like programming language, OpenCL defines an application programming interface (API) that allows programs running on the host to launch kernels on the compute devices and manage device memory, which is (at least conceptually) separate from host memory. Programs in the OpenCL language are intended to be compiled at run-time, so that OpenCL-using applications are portable between implementations for various host devices.{{cite journal |first1=John E. |last1=Stone |first2=David |last2=Gohara |first3=Guochin |last3=Shi |year=2010 |title=OpenCL: a parallel programming standard for heterogeneous computing systems |journal=Computing in Science & Engineering |doi=10.1109/MCSE.2010.69 |pmid=21037981 |volume=12 |issue=3 |pages=66–73|pmc=2964860 |bibcode=2010CSE....12c..66S }} The OpenCL standard defines host APIs for C and C++; third-party APIs exist for other programming languages and platforms such as Python,{{Cite journal | last1 = Klöckner | first1 = Andreas | last2 = Pinto | first2 = Nicolas | last3 = Lee | first3 = Yunsup | last4 = Catanzaro | first4 = Bryan | last5 = Ivanov | first5 = Paul | last6 = Fasih | first6 = Ahmed | year = 2012 | title = PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation | journal = Parallel Computing | volume = 38 | issue = 3 | pages = 157–174 | doi = 10.1016/j.parco.2011.09.001 | arxiv = 0911.3456 }} Java, Perl,{{cite web|url = https://metacpan.org/pod/OpenCL |title = OpenCL - Open Computing Language Bindings |access-date = August 18, 2018 |publisher = metacpan.org }} D{{cite web|url = https://code.dlang.org/packages/opencl|title = D binding for OpenCL|access-date = June 29, 2021|publisher = dlang.org}} and .NET.{{cite book |title=Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition |first1=Benedict |last1=Gaster |first2=Lee |last2=Howes |first3=David R. |last3=Kaeli |author-link3=David R. Kaeli |first4=Perhaad |last4=Mistry |first5=Dana |last5=Schaa |year=2012 |publisher=Morgan Kaufmann}}{{rp|15}} An implementation of the OpenCL standard consists of a library that implements the API for C and C++, and an OpenCL C compiler for the compute devices targeted.
In order to open the OpenCL programming model to other languages or to protect the kernel source from inspection, the Standard Portable Intermediate Representation (SPIR){{cite web |url=https://www.khronos.org/spir/ |title=SPIR – The first open standard intermediate language for parallel compute and graphics |publisher=Khronos Group |date=2014-01-21 }} can be used as a target-independent way to ship kernels between a front-end compiler and the OpenCL back-end.
More recently, the Khronos Group has ratified SYCL,{{cite web |url=https://www.khronos.org/sycl/ |title=SYCL – C++ Single-source Heterogeneous Programming for OpenCL |publisher=Khronos Group |date=2014-01-21 |access-date=October 24, 2016 |archive-date=January 18, 2021 |archive-url=https://web.archive.org/web/20210118094449/https://www.khronos.org/sycl/ |url-status=dead }} a higher-level programming model for OpenCL: a single-source embedded domain-specific language based on pure C++17, intended to improve programming productivity. Developers interested in C++ kernels but not in the SYCL single-source programming style can use C++ features in compute kernel sources written in the "C++ for OpenCL" language.{{Cite web|title=C++ for OpenCL, OpenCL-Guide|url=https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/cpp_for_opencl.md|access-date=2021-04-18|website=GitHub|language=en}}
= Memory hierarchy =
OpenCL defines a four-level memory hierarchy for the compute device:
- global memory: shared by all processing elements, but has high access latency ({{mono|__global}});
- read-only memory: smaller, low latency, writable by the host CPU but not the compute devices ({{mono|__constant}});
- local memory: shared by a group of processing elements ({{mono|__local}});
- per-element private memory (registers; {{mono|__private}}).
Not every device needs to implement each level of this hierarchy in hardware. Consistency between the various levels in the hierarchy is relaxed, and only enforced by explicit synchronization constructs, notably barriers.
Devices may or may not share memory with the host CPU. The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices.
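As an illustrative sketch only (the kernel, its buffers, and the partial-sum computation are hypothetical examples rather than code from the specification), the address-space qualifiers and a work-group barrier typically appear in OpenCL C as follows:
// Hypothetical kernel computing one partial sum per work-group.
// "data" and "result" live in global memory, "scale" in constant memory,
// "partial" in work-group local memory, and "acc" in private memory.
__kernel void partial_sums(__global const float *data,
                           __constant float *scale,
                           __local float *partial,
                           __global float *result)
{
    __private float acc = data[get_global_id(0)] * scale[0];
    partial[get_local_id(0)] = acc;
    barrier(CLK_LOCAL_MEM_FENCE); // make local-memory writes visible to the whole work-group
    if (get_local_id(0) == 0) {
        float sum = 0.0f;
        for (size_t j = 0; j < get_local_size(0); j++)
            sum += partial[j];
        result[get_group_id(0)] = sum; // one partial sum per work-group
    }
}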
OpenCL kernel language
The programming language used to write compute kernels is called the kernel language. OpenCL adopts C/C++-based languages to specify the kernel computations performed on the device, with some restrictions and additions to facilitate efficient mapping to the heterogeneous hardware resources of accelerators. Traditionally, OpenCL C was used to program the accelerators in the OpenCL standard; later, the C++ for OpenCL kernel language was developed, which inherits all functionality from OpenCL C but allows the use of C++ features in kernel sources.
= OpenCL C language =
OpenCL C{{cite web |title=The OpenCL C Specification, Version 2.0 |editor-first=Aaftab |editor-last=Munshi |year=2014 |url=https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf |access-date=June 24, 2014}} is a C99-based language dialect adapted to fit the device model in OpenCL. Memory buffers reside in specific levels of the memory hierarchy, and pointers are annotated with the region qualifiers {{mono|__global}}, {{mono|__local}}, {{mono|__constant}}, and {{mono|__private}}, reflecting this. Instead of a device program having a {{mono|main}} function, OpenCL C functions are marked {{mono|__kernel}} to signal that they are entry points into the program to be called from the host program. Function pointers, bit fields and variable-length arrays are omitted, and recursion is forbidden.{{cite web|url=http://developer.amd.com/zones/OpenCLZone/courses/Documents/Introduction_to_OpenCL_Programming%20(201005).pdf|title=Introduction to OpenCL Programming 201005|publisher=AMD|pages=89–90|archive-url=https://web.archive.org/web/20110516092008/http://developer.amd.com/zones/OpenCLZone/courses/Documents/Introduction_to_OpenCL_Programming%20(201005).pdf|archive-date=May 16, 2011|url-status=dead|access-date=August 8, 2017}} The C standard library is replaced by a custom set of standard functions, geared toward math programming.
OpenCL C is extended to facilitate use of parallelism with vector types and operations, synchronization, and functions to work with work-items and work-groups. In particular, besides scalar types such as {{mono|float}} and {{mono|double}}, which behave similarly to the corresponding types in C, OpenCL provides fixed-length vector types such as {{mono|float4}} (4-vector of single-precision floats); such vector types are available in lengths two, three, four, eight and sixteen for various base types.{{rp|§ 6.1.2}} Vectorized operations on these types are intended to map onto SIMD instructions sets, e.g., SSE or VMX, when running OpenCL programs on CPUs. Other specialized types include 2-d and 3-d image types.{{r|openclc}}{{rp|10–11}}
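As an illustrative, hypothetical sketch (the kernel name and buffers are not taken from the specification), a kernel written with the {{mono|float4}} type performs four scalar multiply–adds per statement, which an implementation may map to SIMD instructions:
// Hypothetical SAXPY-style kernel: each work-item updates four floats at once.
__kernel void saxpy4(const float a,
                     __global const float4 *x,
                     __global float4 *y)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i]; // component-wise multiply-add on a float4
}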
== Example: matrix–vector multiplication ==
File:Matrix multiplication qtl5.svg
The following is a matrix–vector multiplication algorithm in OpenCL C.
// Multiplies A*x, leaving the result in y.
// A is a row-major matrix, meaning the (i,j) element is at A[i*ncols+j].
__kernel void matvec(__global const float *A, __global const float *x,
                     uint ncols, __global float *y)
{
    size_t i = get_global_id(0);           // Global id, used as the row index
    __global float const *a = &A[i*ncols]; // Pointer to the i'th row
    float sum = 0.f;                       // Accumulator for dot product
    for (size_t j = 0; j < ncols; j++) {
        sum += a[j] * x[j];
    }
    y[i] = sum;
}
The kernel function {{mono|matvec}} computes, in each invocation, the dot product of a single row of a matrix {{mvar|A}} and a vector {{mvar|x}}: <math>y_i = \sum_j a_{i,j} x_j.</math>
To extend this into a full matrix–vector multiplication, the OpenCL runtime maps the kernel over the rows of the matrix. On the host side, the {{mono|clEnqueueNDRangeKernel}} function does this; it takes as arguments the kernel to execute, its arguments, and a number of work-items, corresponding to the number of rows in the matrix {{mvar|A}}.
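A host-side sketch of such a launch might look as follows; the buffer handles {{mono|d_A}}, {{mono|d_x}}, {{mono|d_y}} and the variables {{mono|nrows}}/{{mono|ncols}} are hypothetical and assumed to have been created earlier, and error handling is omitted.
// Pass the kernel arguments, then enqueue one work-item per matrix row.
clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_x);
clSetKernelArg(kernel, 2, sizeof(cl_uint), &ncols);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &d_y);

size_t global_work_size = nrows; // one work-item per row of A
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       &global_work_size, NULL, 0, NULL, NULL);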
== Example: computing the FFT ==
This example will load a fast Fourier transform (FFT) implementation and execute it. The implementation is shown below.{{cite web |url=http://s08.idav.ucdavis.edu/munshi-opencl.pdf |title=OpenCL |access-date=August 14, 2008 |publisher=SIGGRAPH2008 |date=August 14, 2008 |archive-url=https://web.archive.org/web/20120216135129/http://s08.idav.ucdavis.edu/munshi-opencl.pdf |archive-date=February 16, 2012 |url-status=dead }} The code asks the OpenCL library for the first available graphics card, creates memory buffers for reading and writing (from the perspective of the graphics card), JIT-compiles the FFT-kernel and then finally asynchronously runs the kernel. The result from the transform is not read in this example.
#include <stdio.h>
#include <stdlib.h>
#include "CL/opencl.h"

#define NUM_ENTRIES 1024
int main() // (int argc, const char* argv[])
{
    // CONSTANTS
    // The source code of the kernel is represented as a string
    // located inside file: "fft1D_1024_kernel_src.cl". For the details see the next listing.
    const char *KernelSource =
        #include "fft1D_1024_kernel_src.cl"
    ;

    // Looking up the available GPUs
    cl_uint num = 0;
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num);

    cl_device_id devices[1];
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, devices, NULL);

    // create a compute context with GPU device
    cl_context context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

    // create a command queue
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_DEFAULT, 1, devices, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, devices[0], 0, NULL);

    // allocate the buffer memory objects
    cl_mem memobjs[] = { clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL),
                         clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL) };

    // create the compute program
    // const char* fft1D_1024_kernel_src[1] = { };
    cl_program program = clCreateProgramWithSource(context, 1, (const char **)&KernelSource, NULL, NULL);

    // build the compute program executable
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

    // create the compute kernel
    cl_kernel kernel = clCreateKernel(program, "fft1D_1024", NULL);

    // set the args values
    size_t local_work_size[1] = { 256 };
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
    clSetKernelArg(kernel, 2, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);
    clSetKernelArg(kernel, 3, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);

    // create N-D range object with work-item dimensions and execute kernel
    size_t global_work_size[1] = { 256 };
    global_work_size[0] = NUM_ENTRIES;
    local_work_size[0] = 64; // Nvidia: 192 or 256
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
}
The actual calculation inside file "fft1D_1024_kernel_src.cl" (based on "Fitting FFT onto the G80 Architecture"):{{cite web | url=http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf | title=Fitting FFT onto G80 Architecture | access-date=November 14, 2008 | publisher=Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report |date=May 2008}}
R"(
// This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
// calls to a radix 16 function, another radix 16 function and then a radix 4 function
__kernel void fft1D_1024 (__global float2 *in, __global float2 *out,
                          __local float *sMemx, __local float *sMemy) {
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];

    // starting index of data to/from global memory
    in = in + blockIdx;  out = out + blockIdx;

    globalLoads(data, in, 64); // coalesced global reads
    fftRadix16Pass(data);      // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);

    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);               // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication

    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));

    // four radix-4 function calls
    fftRadix4Pass(data);      // radix-4 function number 1
    fftRadix4Pass(data + 4);  // radix-4 function number 2
    fftRadix4Pass(data + 8);  // radix-4 function number 3
    fftRadix4Pass(data + 12); // radix-4 function number 4

    // coalesced global writes
    globalStores(data, out, 64);
}
)"
A full, open source implementation of an OpenCL FFT can be found on Apple's website.{{cite web | url=https://developer.apple.com/library/archive/samplecode/OpenCL_FFT/Introduction/Intro.html | title=OpenCL_FFT | access-date=June 18, 2022 | publisher=Apple | date=June 26, 2012}}
= C++ for OpenCL language =
In 2020, Khronos announced{{Cite web|last=Trevett|first=Neil|date=28 April 2020|title=Khronos Announcements and Panel Discussion|url=https://www.iwocl.org/wp-content/uploads/iwocl-syclcon-2020-panel-slides.pdf}} the transition to the community-driven C++ for OpenCL programming language,{{cite book |doi=10.1145/3388333.3388647 |chapter=The C++ for OpenCL Programming Language |title=Proceedings of the International Workshop on OpenCL |date=2020 |last1=Stulova |first1=Anastasia |last2=Hickey |first2=Neil |last3=Van Haastregt |first3=Sven |last4=Antognini |first4=Marco |last5=Petit |first5=Kevin |pages=1–2 |isbn=978-1-4503-7531-3 }} which provides features from C++17 in combination with the traditional OpenCL C features. This language allows developers to leverage a rich variety of language features from standard C++ while preserving backward compatibility with OpenCL C. It opens up a smooth transition path to C++ functionality for OpenCL kernel code developers, as they can continue using a familiar programming flow and even tools, as well as leverage existing extensions and libraries available for OpenCL C.
The language semantics are described in the documentation published in the releases of the OpenCL-Docs{{Citation|title=KhronosGroup/OpenCL-Docs|date=2021-04-16|url=https://github.com/KhronosGroup/OpenCL-Docs|publisher=The Khronos Group|access-date=2021-04-18}} repository hosted by the Khronos Group, but the language has not been ratified by the Khronos Group. It is not documented in a stand-alone specification; instead, it is based on the specifications of C++ and OpenCL C. The open source Clang compiler has supported C++ for OpenCL since release 9.{{Cite web|date=September 2019|title=Clang release 9 documentation, OpenCL support|url=https://releases.llvm.org/9.0.0/tools/clang/docs/ReleaseNotes.html#opencl-kernel-language-changes-in-clang|access-date=2021-04-18|website=releases.llvm.org}}
C++ for OpenCL was originally developed as a Clang compiler extension and appeared in release 9.{{Cite web|date=September 2019|title=Clang 9, Language Extensions, OpenCL|url=https://releases.llvm.org/9.0.0/tools/clang/docs/LanguageExtensions.html#opencl-features|access-date=2021-04-18|website=releases.llvm.org}} As it was tightly coupled with OpenCL C and did not contain any Clang-specific functionality, its documentation was re-hosted to the OpenCL-Docs repository of the Khronos Group, along with the sources of other specifications and reference cards. The first official release of this document, describing C++ for OpenCL version 1.0, was published in December 2020.{{Cite web|date=December 2020|title=Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 · KhronosGroup/OpenCL-Docs|url=https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r1|access-date=2021-04-18|website=GitHub|language=en}} C++ for OpenCL 1.0 contains features from C++17 and is backward compatible with OpenCL C 2.0. In December 2021, a new provisional C++ for OpenCL version 2021 was released, which is fully compatible with the OpenCL 3.0 standard.{{Cite web|date=December 2021|title=Release of Documentation of C++ for OpenCL kernel language, version 1.0 and 2021 · KhronosGroup/OpenCL-Docs|url=https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-docrev2021.12|access-date=2022-12-02|website=GitHub|language=en}} A work-in-progress draft of the latest C++ for OpenCL documentation can be found on the Khronos website.{{Cite web|title=The C++ for OpenCL 1.0 Programming Language Documentation|url=https://www.khronos.org/opencl/assets/CXX_for_OpenCL.html|access-date=2021-04-18|website=www.khronos.org}}
== Features ==
C++ for OpenCL supports most of the features (syntactically and semantically) from OpenCL C, except for nested parallelism and blocks.{{Cite web|date=March 2021|title=Release of C++ for OpenCL Kernel Language Documentation, version 1.0, revision 2 · KhronosGroup/OpenCL-Docs|url=https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r2|access-date=2021-04-18|website=GitHub|language=en}} However, there are minor differences in some supported features, mainly related to differences in semantics between C++ and C. For example, C++ is stricter about implicit type conversions and does not support the {{Mono|restrict}} type qualifier. The following C++ features are not supported by C++ for OpenCL: virtual functions, the {{Mono|dynamic_cast}} operator, non-placement {{Mono|new}}/{{Mono|delete}} operators, exceptions, pointers to member functions, references to functions, and the C++ standard libraries. C++ for OpenCL extends the concept of separate memory regions (address spaces) from OpenCL C to C++ features – functional casts, templates, class members, references, lambda functions, and operators. Several C++ features are not available for kernel functions, e.g. overloading, templates, and arbitrary class layouts in parameter types.
== Example: complex-number arithmetic ==
The following code snippet illustrates how kernels with complex-number arithmetic can be implemented in C++ for OpenCL language with convenient use of C++ features.
// Define a class Complex, that can perform complex-number computations with
// various precision when different types for T are used - double, float, half.
template<typename T>
class complex_t {
    T m_re; // Real component.
    T m_im; // Imaginary component.

public:
    complex_t(T re, T im) : m_re{re}, m_im{im} {};
    // Define operator for complex-number multiplication.
    complex_t operator*(const complex_t &other) const
    {
        return {m_re * other.m_re - m_im * other.m_im,
                m_re * other.m_im + m_im * other.m_re};
    }
    T get_re() const { return m_re; }
    T get_im() const { return m_im; }
};

// A helper function to compute multiplication over complex numbers read from
// the input buffer and to store the computed result into the output buffer.
template<typename T>
void compute_helper(__global T *in, __global T *out) {
    auto idx = get_global_id(0);
    // Every work-item uses 4 consecutive items from the input buffer
    // - two for each complex number.
    auto offset = idx * 4;
    auto num1 = complex_t{in[offset], in[offset + 1]};
    auto num2 = complex_t{in[offset + 2], in[offset + 3]};
    // Perform complex-number multiplication.
    auto res = num1 * num2;
    // Every work-item writes 2 consecutive items to the output buffer.
    out[idx * 2] = res.get_re();
    out[idx * 2 + 1] = res.get_im();
}

// This kernel is used for complex-number multiplication in single precision.
__kernel void compute_sp(__global float *in, __global float *out) {
    compute_helper(in, out);
}

#ifdef cl_khr_fp16
// This kernel is used for complex-number multiplication in half precision when
// it is supported by the device.
#pragma OPENCL EXTENSION cl_khr_fp16: enable
__kernel void compute_hp(__global half *in, __global half *out) {
    compute_helper(in, out);
}
#endif
== Tooling and execution environment ==
The C++ for OpenCL language can be used for the same applications or libraries, and in the same way, as the OpenCL C language. Thanks to the rich variety of C++ language features, applications written in C++ for OpenCL can express complex functionality more conveniently than applications written in OpenCL C; in particular, the generic programming paradigm of C++ is very attractive to library developers.
C++ for OpenCL sources can be compiled by OpenCL drivers that support the cl_ext_cxx_for_opencl extension.{{Cite web|date=September 2020|title=cl_ext_cxx_for_opencl|url=https://www.khronos.org/registry/OpenCL/extensions/ext/cl_ext_cxx_for_opencl.html|access-date=2021-04-18|website=www.khronos.org}} Arm announced support for this extension in December 2020.{{Cite web|date=December 2020|title=Mali SDK Supporting Compilation of Kernels in C++ for OpenCL|url=https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/mali-sdk-supporting-compilation-of-kernels|access-date=2021-04-18|website=community.arm.com|language=en}} However, due to the increasing complexity of the algorithms accelerated on OpenCL devices, it is expected that more applications will compile C++ for OpenCL kernels offline, using stand-alone compilers such as Clang,{{Cite web|title=Clang Compiler User's Manual — C++ for OpenCL Support|url=https://clang.llvm.org/docs/UsersManual.html#cxx-for-opencl|access-date=2021-04-18|website=clang.llvm.org}} into an executable binary format or a portable binary format such as SPIR-V.{{Cite web|title=OpenCL-Guide, Offline Compilation of OpenCL Kernel Sources|url=https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/os_tooling.md|access-date=2021-04-18|website=GitHub|language=en}} Such a binary can then be loaded during OpenCL application execution using a dedicated OpenCL API.{{Cite web|title=OpenCL-Guide, Programming OpenCL Kernels|url=https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/programming_opencl_kernels.md|access-date=2021-04-18|website=GitHub|language=en}}
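As a sketch of that loading step (variable names are placeholders and error handling is omitted), a pre-compiled SPIR-V module can be handed to the runtime through {{mono|clCreateProgramWithIL}}, available since OpenCL 2.1:
// "spirv_bytes"/"spirv_size" hold a SPIR-V module produced offline (e.g. by Clang);
// "context" and "device" are assumed to have been created earlier.
cl_int err;
cl_program program = clCreateProgramWithIL(context, spirv_bytes, spirv_size, &err);
clBuildProgram(program, 1, &device, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "compute_sp", &err);
Here {{mono|compute_sp}} refers to the kernel from the complex-number example above.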
Binaries compiled from C++ for OpenCL 1.0 sources can be executed on OpenCL 2.0 conformant devices. Depending on the language features used in such kernel sources, they can also be executed on devices supporting earlier OpenCL versions or OpenCL 3.0.
Aside from OpenCL drivers, kernels written in C++ for OpenCL can be compiled for execution on Vulkan devices using the clspv compiler and the clvk{{Citation|last=Petit|first=Kévin|title=Experimental implementation of OpenCL on Vulkan|date=2021-04-17|url=https://github.com/kpet/clvk|access-date=2021-04-18}} runtime layer, in the same way as OpenCL C kernels.
== Contributions ==
C++ for OpenCL is an open language developed by the community of contributors listed in its documentation. New contributions to the language's semantic definition or to open source tooling support are accepted from anyone interested, provided that they are aligned with the main design philosophy and are reviewed and approved by the experienced contributors.
History
OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed{{cite press release |url=https://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/ |title=Khronos Launches Heterogeneous Computing Initiative |access-date=June 18, 2008 |publisher=Khronos Group |date=June 16, 2008 |url-status=dead |archive-url=https://web.archive.org/web/20080620123431/http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/ |archive-date=June 20, 2008 |df=mdy-all }} with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008.{{cite web |url=http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2 |title=OpenCL gets touted in Texas |publisher=MacWorld |date=November 20, 2008 |access-date=June 12, 2009}} This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.{{cite press release|title=The Khronos Group Releases OpenCL 1.0 Specification|date=December 8, 2008|publisher=Khronos Group|url=https://www.khronos.org/news/press/the_khronos_group_releases_opencl_1.0_specification|access-date=December 4, 2016}}
= OpenCL 1.0 =
OpenCL 1.0 was released with Mac OS X Snow Leopard on August 28, 2009. According to an Apple press release:{{cite press release |url=https://www.apple.com/pr/library/2008/06/09snowleopard.html |title=Apple Previews Mac OS X Snow Leopard to Developers |access-date=June 9, 2008 |publisher=Apple Inc. |date=June 9, 2008 |url-status=dead |archive-url=https://web.archive.org/web/20120318064324/http://www.apple.com/pr/library/2008/06/09Apple-Previews-Mac-OS-X-Snow-Leopard-to-Developers.html |archive-date=March 18, 2012 |df=mdy-all }}
Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.
AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework.{{cite press release |url=https://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.html |title=AMD Drives Adoption of Industry Standards in GPGPU Software Development |access-date=August 14, 2008 |publisher=AMD |date=August 6, 2008}}{{cite web |url=http://www.eweek.com/c/a/Desktops-and-Notebooks/AMD-Backing-OpenCL-and-Microsoft-DirectX-11/ |title=AMD Backs OpenCL, Microsoft DirectX 11 |access-date=August 14, 2008 |publisher=eWeek |date=August 6, 2008 |archive-date=December 6, 2012 |archive-url=https://archive.today/20121206024506/http://www.eweek.com/c/a/Desktops-and-Notebooks/AMD-Backing-OpenCL-and-Microsoft-DirectX-11/ |url-status=dead }} RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface.{{cite web |url=http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_and_Standards_Projects.html |title=HPCWire: RapidMind Embraces Open Source and Standards Projects |access-date=November 11, 2008 |publisher=HPCWire |date=November 10, 2008 |url-status=dead |archive-url=https://web.archive.org/web/20081218113648/http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_and_Standards_Projects.html |archive-date=December 18, 2008}} On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.{{cite press release |url=http://www.nvidia.com/object/io_1228825271885.html |title=Nvidia Adds OpenCL To Its Industry Leading GPU Computing Toolkit |access-date=December 10, 2008 |publisher=Nvidia |date=December 9, 2008}} On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers.{{cite web |url=http://www.alphaworks.ibm.com/tech/opencl |title=OpenCL Development Kit for Linux on Power |access-date=October 30, 2009 |publisher=alphaWorks |date=October 30, 2009 |archive-date=August 9, 2011 |archive-url=https://web.archive.org/web/20110809160855/http://www.alphaworks.ibm.com/tech/opencl |url-status=dead }}
Accelerations of calculations by a factor of up to 1000 compared with a normal CPU are possible with OpenCL on graphics cards.{{fact|date=April 2025}}
Some features that became important in later versions of OpenCL, such as double-precision or half-precision operations, are optional in 1.0.{{cite web|url=https://developer.amd.com/wordpress/media/2012/10/opencl-1.0.48.pdf|title=The OpenCL Specification Version: 1.0 Document Revision: 48|publisher=Khronos OpenCL Working Group}}
= OpenCL 1.1 =
OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010,{{cite web |url=https://www.khronos.org/news/press/releases/khronos-group-releases-opencl-1-1-parallel-computing-standard/ |title=Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification |access-date=February 24, 2016 |archive-url=https://web.archive.org/web/20160302231506/https://www.khronos.org/news/press/releases/khronos-group-releases-opencl-1-1-parallel-computing-standard/ |archive-date=March 2, 2016 |url-status=dead }} and adds significant functionality for enhanced parallel programming flexibility, functionality, and performance including:
- New data types including 3-component vectors and additional image formats;
- Handling commands from multiple host threads and processing buffers across multiple devices;
- Operations on regions of a buffer including read, write and copy of 1D, 2D, or 3D rectangular regions;
- Enhanced use of events to drive and control command execution;
- Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
- Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.
= OpenCL 1.2 =
On November 15, 2011, the Khronos Group announced the OpenCL 1.2 specification,{{cite web |url=https://www.khronos.org/news/press/khronos-releases-opencl-1.2-specification |title=Khronos Releases OpenCL 1.2 Specification |publisher=Khronos Group |date=November 15, 2011 |access-date=June 23, 2015}} which added significant functionality over the previous versions in terms of performance and features for parallel programming. Most notable features include:
- Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
- Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
- Enhanced image support (optional): 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow for OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
- Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of underlying hardware. Examples include video encoding/decoding and digital signal processors.
- DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.
- The ability to force IEEE 754 compliance for single-precision floating-point math: OpenCL by default allows the single-precision versions of the division, reciprocal, and square root operation to be less accurate than the correctly rounded values that IEEE 754 requires.{{cite web |url=https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf |title=OpenCL 1.2 Specification |publisher=Khronos Group |access-date=June 23, 2015}} If the programmer passes the "-cl-fp32-correctly-rounded-divide-sqrt" command line argument to the compiler, these three operations will be computed to IEEE 754 requirements if the OpenCL implementation supports this, and will fail to compile if the OpenCL implementation does not support computing these operations to their correctly rounded values as defined by the IEEE 754 specification. This ability is supplemented by the ability to query the OpenCL implementation to determine if it can perform these operations to IEEE 754 accuracy.
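A host-side sketch of that query-and-build sequence (the {{mono|program}} and {{mono|device}} handles are placeholders and error handling is omitted):
// Ask the device whether correctly rounded fp32 divide/sqrt is supported,
// and only request the build option if it is.
cl_device_fp_config fp_cfg;
clGetDeviceInfo(device, CL_DEVICE_SINGLE_FP_CONFIG,
                sizeof(fp_cfg), &fp_cfg, NULL);
const char *options = (fp_cfg & CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT)
                          ? "-cl-fp32-correctly-rounded-divide-sqrt"
                          : "";
clBuildProgram(program, 1, &device, options, NULL, NULL);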
= OpenCL 2.0 =
On November 18, 2013, the Khronos Group announced the ratification and public release of the finalized OpenCL 2.0 specification.{{cite web |url=https://www.khronos.org/news/press/khronos-finalizes-opencl-2.0-specification-for-heterogeneous-computing |title=Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing |date=November 18, 2013 |access-date=February 10, 2014 |publisher=Khronos Group}} Updates and additions to OpenCL 2.0 include:
- Shared virtual memory
- Nested parallelism (device-side kernel enqueue)
- Generic address space
- Enhanced images, including 3D image writes
- C11 atomics
- Pipes
- Android installable client driver extension
= OpenCL 2.1 =
The ratification and release of the OpenCL 2.1 provisional specification was announced on March 3, 2015, at the Game Developers Conference in San Francisco. It was released on November 16, 2015.{{cite web |title=Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.1-and-spir-v-1.0-specifications-for-heterogeneous |publisher=Khronos Group |access-date=November 16, 2015 |date=November 16, 2015}} It introduced the OpenCL C++ kernel language, based on a subset of C++14, while maintaining support for the preexisting OpenCL C kernel language. Vulkan and OpenCL 2.1 share SPIR-V as an intermediate representation allowing high-level language front-ends to share a common compilation target. Updates to the OpenCL API include:
- Additional subgroup functionality
- Copying of kernel objects and states
- Low-latency device timer queries
- Ingestion of SPIR-V code by runtime
- Execution priority hints for queues
- Zero-sized dispatches from host
AMD, ARM, Intel, HPC, and YetiWare have declared support for OpenCL 2.1.{{cite web |title=Khronos Announces OpenCL 2.1: C++ Comes to OpenCL |url=http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl |publisher=AnandTech |access-date=April 8, 2015 |date=March 3, 2015}}{{cite web |title=Khronos Releases OpenCL 2.1 Provisional Specification for Public Review |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.1-provisional-specification-for-public-review |publisher=Khronos Group |access-date=April 8, 2015 |date=March 3, 2015}}
= OpenCL 2.2 =
OpenCL 2.2 brings the OpenCL C++ kernel language into the core specification for significantly enhanced parallel programming productivity.{{cite web |url= https://www.khronos.org/opencl/ |title= OpenCL Overview |publisher= Khronos Group|date= 2013-07-21 }}{{cite web |url= https://www.khronos.org/news/press/khronos-releases-opencl-2.2-provisional-spec-opencl-c-kernel-language |title= Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language for Parallel Programming |date=April 18, 2016 |publisher= Khronos Group }}{{cite web |title= OpenCL – A State of the Union |url= http://www.iwocl.org/wp-content/uploads/iwocl-2016-opencl-state-union.pdf|first=Neil|last=Trevett|website=IWOCL|publisher=Khronos Group |location=Vienna |date= April 2016 |access-date= January 2, 2017 }} It was released on May 16, 2017.{{cite web |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2 |title=Khronos Releases OpenCL 2.2 With SPIR-V 1.2 |date=May 16, 2017 |publisher=Khronos Group }} A maintenance update with bug fixes was released in May 2018.{{Cite web|url=https://www.khronos.org/blog/opencl-2.2-maintenance-update-released|title=OpenCL 2.2 Maintenance Update Released|date=May 14, 2018|website=The Khronos Group}}
- The OpenCL C++ kernel language is a static subset of the C++14 standard and includes classes, templates, lambda expressions, function overloads and many other constructs for generic and meta-programming.
- Uses the new Khronos SPIR-V 1.1 intermediate language which fully supports the OpenCL C++ kernel language.
- OpenCL library functions can now use the C++ language to provide increased safety and reduced undefined behavior while accessing features such as atomics, iterators, images, samplers, pipes, and device queue built-in types and address spaces.
- Pipe storage is a new device-side type in OpenCL 2.2 that is useful for FPGA implementations by making connectivity size and type known at compile time, enabling efficient device-scope communication between kernels.
- OpenCL 2.2 also includes features for enhanced optimization of generated code: applications can provide the values of specialization constants at SPIR-V compilation time, a new query can detect non-trivial constructors and destructors of program-scope global objects, and user callbacks can be set at program release time.
- Runs on any OpenCL 2.0-capable hardware (only a driver update is required).
= OpenCL 3.0 =
The OpenCL 3.0 specification was released on September 30, 2020, after being in preview since April 2020. OpenCL 1.2 functionality has become a mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features were made optional. The specification retains the OpenCL C language and deprecates the OpenCL C++ Kernel Language, replacing it with the C++ for OpenCL language based on a Clang/LLVM compiler which implements a subset of C++17 and SPIR-V intermediate code.{{Cite web|url=https://www.phoronix.com/review/opencl-30-spec|title=OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions|website=www.phoronix.com}}{{Cite web|url=https://www.khronos.org/news/press/khronos-group-releases-opencl-3.0|title = Khronos Group Releases OpenCL 3.0|date = April 26, 2020}}{{cite web|url=https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf|title=The OpenCL Specification|publisher=Khronos OpenCL Working Group}}
Version 3.0.7 of C++ for OpenCL, together with some Khronos OpenCL extensions, was presented at IWOCL 2021.{{Cite web |last=Trevett |first=Neil |date=2021 |title=State of the Union: OpenCL Working Group |url=https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf |page=9}} The current revision is 3.0.11, with some new extensions and corrections.
NVIDIA, working closely with the Khronos OpenCL Working Group, improved Vulkan interop with semaphores and memory sharing.{{Cite web|url=https://developer.nvidia.com/blog/using-semaphore-and-memory-sharing-extensions-for-vulkan-interop-with-opencl/|title=Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL|date=February 24, 2022}} The latest minor update is 3.0.14, with bug fixes and a new extension for multiple devices.{{cite web | url=https://www.phoronix.com/news/OpenCL-3.0.14 | title=OpenCL 3.0.14 Released with New Extension for Command Buffer Multi-Device }}
Roadmap
File:IWOCL2017.jpg – International Workshop on OpenCL (IWOCL) held by the Khronos Group
When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs.{{Cite web|url=https://www.pcper.com/reviews/General-Tech/Breaking-OpenCL-Merging-Roadmap-Vulkan|title=Breaking: OpenCL Merging Roadmap into Vulkan | PC Perspective|website=www.pcper.com|access-date=May 17, 2017|archive-url=https://web.archive.org/web/20171101062642/https://www.pcper.com/reviews/General-Tech/Breaking-OpenCL-Merging-Roadmap-Vulkan|archive-date=November 1, 2017|url-status=dead}}{{Cite web|url=https://www.phoronix.com/scan.php?page=article&item=siggraph-2018-khr&num=2|title=SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving – Phoronix|website=www.phoronix.com}} This has now been demonstrated by Adobe's Premiere Rush using the clspv{{Citation|title=Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders: google/clspv|date=2019-08-17|url=https://github.com/google/clspv|access-date=2019-08-20}} open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android.{{Cite web|url=https://www.khronos.org/assets/uploads/developers/library/2019-siggraph/Vulkan-01-Update-SIGGRAPH-Jul19.pdf|title=Vulkan Update SIGGRAPH 2019}} OpenCL has a forward-looking roadmap independent of Vulkan, with 'OpenCL Next' under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan / OpenCL Interop, Scratch-Pad Memory Management, Extended Subgroups, SPIR-V 1.4 ingestion and SPIR-V Extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a "flexible profile" for deployment flexibility on multiple accelerator types.{{Cite web|url=https://www.khronos.org/assets/uploads/developers/library/2019-embedded-vision-summit/1b%20Khronos-and-OpenCL-Overview-EVS-Workshop_May19.pdf|title=Khronos and OpenCL Overview EVS Workshop May19|last=Trevett|first=Neil|date=May 23, 2019|website=Khronos Group}}
Open source implementations
OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every class of vendor that the runtime needs to support. For example, in order to support Nvidia devices on a Linux platform, the Nvidia ICD must be installed such that the OpenCL runtime (the ICD loader) is able to locate the ICD for the vendor and redirect the calls appropriately. The standard OpenCL header is used by the consumer application; calls to each function are then proxied by the OpenCL runtime to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in their driver.{{cite web |url=https://www.khronos.org/registry/cl/extensions/khr/cl_khr_icd.txt |title=OpenCL ICD Specification |access-date=June 23, 2015}}
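The following minimal host program is an illustrative sketch only (it is not part of the ICD specification); it enumerates the platforms exposed by the installed ICDs through the loader:
// Each installed vendor ICD is reported to the application as one platform.
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[16];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(16, platforms, &num_platforms);

    cl_uint found = num_platforms < 16 ? num_platforms : 16;
    for (cl_uint i = 0; i < found; i++) {
        char name[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("Platform %u: %s\n", i, name);
    }
    return 0;
}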
The Apple,{{cite web|url=http://llvm.org/Users.html#Apple|title=Apple entry on LLVM Users page|access-date=August 29, 2009}} Nvidia,{{cite web|url=http://llvm.org/Users.html|title=Nvidia entry on LLVM Users page|access-date=August 6, 2009}} ROCm, RapidMind{{cite web|url=http://llvm.org/Users.html|title=Rapidmind entry on LLVM Users page|access-date=October 1, 2009}} and Gallium3D{{cite web|url=http://zrusin.blogspot.com/2009/02/opencl.html|title=Zack Rusin's blog post about the Gallium3D OpenCL implementation|access-date=October 1, 2009|date=February 2009}} implementations of OpenCL are all based on the LLVM Compiler technology and use the Clang compiler as their frontend.
; MESA Gallium Compute
: An implementation of OpenCL (currently incomplete OpenCL 1.1, mostly working for AMD Radeon GCN) for a number of platforms is maintained as part of the Gallium Compute Project,{{cite web |url=http://dri.freedesktop.org/wiki/GalliumCompute/ |title=GalliumCompute |publisher=dri.freedesktop.org |access-date=June 23, 2015}} which builds on the work of the Mesa project to support multiple platforms. Formerly this was known as CLOVER.{{Cite web|url=https://www.x.org/wiki/Events/XDC2013/XDC2013TomStellardCloverStatus/XDC2013TomStellardCloverStatus.pdf|title=Clover Status Update}} Current development consists mostly of keeping the still incomplete framework running with current LLVM and Clang, plus some new features such as fp16 in Mesa 17.3.{{Cite web|url=https://cgit.freedesktop.org/mesa/mesa/log/?qt=grep&q=clover|title=mesa/mesa – The Mesa 3D Graphics Library|website=cgit.freedesktop.org}} The target is complete OpenCL 1.0, 1.1 and 1.2 for AMD and Nvidia. New basic development is done by Red Hat with SPIR-V, also for Clover.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Gallium-Clover-NIR-SPIR-V-XDC18|title=Gallium Clover With SPIR-V & NIR Opening Up New Compute Options Inside Mesa – Phoronix|website=www.phoronix.com|access-date=December 13, 2018|archive-date=October 22, 2020|archive-url=https://web.archive.org/web/20201022034403/https://www.phoronix.com/scan.php?page=news_item&px=Gallium-Clover-NIR-SPIR-V-XDC18|url-status=dead}}{{cite web|url=https://xdc2018.x.org/slides/clover.pdf|title=OpenCL support inside mesa through SPIR-V and NIR|date=2018|first1=Rob|last1=Clark|first2=Karol|last2=Herbst}} The new target is a modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is available on Mesamatrix; image support is the current focus of development.
: RustiCL is a new implementation for Gallium compute written in Rust instead of C. In Mesa 22.2, an experimental implementation is available with OpenCL 3.0 support and an image-extension implementation for programs like Darktable.{{cite web | url=https://www.phoronix.com/scan.php?page=news_item&px=Rusticl-Darktable-Milestone | title=Mesa's 'Rusticl' Implementation Now Manages to Handle Darktable OpenCL }} Intel Xe (Arc) and AMD GCN+ are supported in Mesa 22.3+. AMD R600 and Nvidia Kepler+ are also targets for hardware support.{{cite web | url=https://www.phoronix.com/news/Rusticl-OpenCL-3.0-Conformance | title=Mesa's Rusticl Achieves Official OpenCL 3.0 Conformance }}{{cite web | url=https://www.phoronix.com/news/Mesa-22.3-Released | title=Mesa 22.3 Released with RDNA3 Vulkan, Rusticl OpenCL, Better Intel Arc Graphics }}{{cite web | url=https://www.phoronix.com/news/Rusticl-RadeonSI-Near | title=Mesa's Rusticl OpenCL Driver Nearly Ready with AMD Radeon GPU Support }} RustiCL outperforms AMD ROCm on Radeon RX 6700 XT hardware in the Luxmark benchmark.{{cite web | url=https://www.phoronix.com/news/Rusticl-Outperformed-ROCm | title=Mesa's Rusticl OpenCL Implementation Can Outperform Radeon's ROCm Compute Stack }} Mesa 23.1 supports RustiCL officially. In Mesa 23.2, support for the important fp64 feature is at an experimental level.
: Microsoft's Windows 11 on Arm added support for OpenCL 1.2 via CLon12, an open source OpenCL implementation on top of DirectX 12 via Mesa Gallium.{{Cite web |date=2022-03-13 |title=State of Windows on Arm64: a high-level perspective |url=https://chipsandcheese.com/2022/03/13/state-of-windows-on-arm64-a-high-level-perspective/ |access-date=2023-10-23 |website=Chips and Cheese |language=en-US}}{{Cite web |title=Introducing OpenCL and OpenGL on DirectX |url=https://www.collabora.com/news-and-blog/news-and-events/introducing-opencl-and-opengl-on-directx.html |access-date=2023-10-23 |website=Collabora {{!}} Open Source Consulting |language=en}}{{Cite web |title=Deep dive into OpenGL over DirectX layering |url=https://www.collabora.com/news-and-blog/blog/2020/07/09/deep-dive-into-opengl-over-directx-layering/ |access-date=2023-10-23 |website=Collabora {{!}} Open Source Consulting |language=en}}
; BEIGNET
: An implementation by Intel for its Ivy Bridge and newer hardware was released in 2013.{{cite web |first=Michael |last=Larabel |author-link=Michael Larabel |date=January 10, 2013 |title=Beignet: OpenCL/GPGPU Comes For Ivy Bridge On Linux |website=Phoronix |url=https://www.phoronix.com/scan.php?page=news_item&px=MTI3MTU}} This software from Intel's China team has attracted criticism from developers at AMD and Red Hat,{{cite web |first=Michael |last=Larabel |author-link=Michael Larabel |date=April 16, 2013 |title=More Criticism Comes Towards Intel's Beignet OpenCL |website=Phoronix |url=https://www.phoronix.com/scan.php?page=news_item&px=MTM1MzM}} as well as from Michael Larabel of Phoronix.{{cite web |last=Larabel |first=Michael |date=December 24, 2013 |title=Intel's Beignet OpenCL Is Still Slowly Baking |website=Phoronix |url=https://www.phoronix.com/scan.php?page=news_item&px=MTU1MjA}} The current version 1.3.2 supports OpenCL 1.2 completely (Ivy Bridge and higher) and OpenCL 2.0 optionally for Skylake and newer.{{cite web |url= https://freedesktop.org/wiki/Software/Beignet/ |title= Beignet |publisher= freedesktop.org}}{{Cite web|url=https://cgit.freedesktop.org/beignet/|title=beignet – Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs|website=cgit.freedesktop.org}} Support for Android has been added to Beignet.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Intel-Beignet-Android|title=Intel Brings Beignet To Android For OpenCL Compute – Phoronix|website=www.phoronix.com}} Current development targets only OpenCL 1.2 and 2.0; the road to OpenCL 2.1, 2.2 and 3.0 has moved to NEO.
; NEO: An implementation by Intel for Gen 8 Broadwell and Gen 9 hardware, released in 2018.{{Cite web|url=https://01.org/compute-runtime|title=01.org Intel Open Source – Compute Runtime|date=2018-02-07}} This driver replaces the Beignet implementation for supported platforms (not the older generation 6 to Haswell). NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms.{{Cite web|url=https://github.com/intel/compute-runtime/blob/master/README.md|title=NEO GitHub README|website=GitHub|date=2019-03-21}} As of 2020, the Gen 11 Ice Lake and Gen 12 Tiger Lake graphics generations are also supported. The new OpenCL 3.0 is available for Alder Lake and Tiger Lake down to Broadwell with version 20.41 and later. It now includes the complete optional OpenCL 2.0 and 2.1 features and some of 2.2.
; ROCm
: Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and 9), as well as Intel Gen7.5+ CPUs (only with PCIe 3.0).{{cite web|url=https://radeonopencompute.github.io/|title=ROCm|website=GitHub|archive-url=https://web.archive.org/web/20161008220038/https://radeonopencompute.github.io/|archive-date=October 8, 2016|url-status=dead}}{{cite web|url=https://github.com/RadeonOpenCompute/ROCm|title=RadeonOpenCompute/ROCm: ROCm – Open Source Platform for HPC and Ultrascale GPU Computing|publisher=GitHub|date=2019-03-21}} With version 1.9, support was experimentally extended in some respects to hardware with PCIe 2.0 and without atomics. An overview of the ongoing work was given at XDC2018.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=ROCm-Compute-Stack-Overview|title=A Nice Overview Of The ROCm Linux Compute Stack – Phoronix|website=www.phoronix.com}}{{Cite web|url=https://drive.google.com/file/d/1ePlNzxYryveh6iFL-cqJO7ycSsYJ4LbC/view?usp=embed_facebook|title=XDC Lightning.pdf|website=Google Docs}} ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations remain on the to-do list.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Radeon-ROCm-2.0-Arrives|title=Radeon ROCm 2.0 Officially Out With OpenCL 2.0 Support, TensorFlow 1.12, Vega 48-bit VA – Phoronix|website=www.phoronix.com}}{{Cite web|url=https://www.phoronix.com/scan.php?page=article&item=radeon-rocm-20&num=1|title=Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive – Phoronix|website=www.phoronix.com}} Version 3.3 improved on details.https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v3.3.pdf{{Dead link|date=March 2021}} Version 3.5 supports OpenCL 2.2.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Radeon-ROCm-3.5-Released|title = Radeon ROCm 3.5 Released with New Features but Still No Navi Support – Phoronix}} Version 3.10 came with improvements and new APIs.{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Radeon-ROCm-3.10-Released|title = Radeon ROCm 3.10 Released with Data Center Tool Improvements, New APIs – Phoronix}} ROCm 4.0, with support for the AMD Instinct MI100 compute card, was announced at SC20.{{Cite web|url=https://www.phoronix.com/scan.php?page=article&item=amd-mi100-rocm4&num=1|title = AMD Launches Arcturus as the Instinct MI100, Radeon ROCm 4.0 – Phoronix}} Current documentation for 5.5.1 and earlier is available at GitHub.{{Cite web|url=https://rocm-documentation.readthedocs.io/en/latest/|title = Welcome to AMD ROCm™ Platform — ROCm Documentation 1.0.0 documentation}}{{cite web |url=https://docs.amd.com/ |title=Home |website=docs.amd.com}}{{cite web | url=https://docs.amd.com/category/ROCm™%20v5.x | title=AMD Documentation – Portal }} OpenCL 3.0 is available. ROCm 5.5.x and later supports only GFX 9 Vega and newer, so the alternatives for older hardware are older ROCm releases or, in the future, RustiCL.
; POCL: A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), built on Clang and LLVM.{{Cite journal |journal=Int'l J Parallel Programming |doi=10.1007/s10766-014-0320-y |year=2016 |title=pocl: A Performance-Portable OpenCL Implementation |first1=Pekka |last1=Jääskeläinen |first2=Carlos |last2=Sánchez de La Lama |first3=Erik |last3=Schnetter |first4=Kalle |last4=Raiskila |first5=Jarmo |last5=Takala |first6=Heikki |last6=Berg |volume=43 |issue=5 |pages=752–785 |arxiv=1611.07083 |bibcode=2016arXiv161107083J }} With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features.{{Cite web|url=http://portablecl.org/|title=pocl home page|website=pocl}} Version 1.2 uses LLVM/Clang 6.0 and 7.0 and offers full OpenCL 1.2 support, with all tickets in Milestone 1.2 closed.{{Cite web|url=https://github.com/pocl/pocl|title=GitHub – pocl/pocl: pocl: Portable Computing Language.|date=March 14, 2019|via=GitHub}} OpenCL 2.0 is nearly fully implemented.{{Cite web|url=http://portablecl.org/docs/html/hsa_status.html#opencl-2-0-atomics-and-hsa-memory-scope|title=HSA support implementation status as of 2016-05-17 — Portable Computing Language (pocl) 1.3-pre documentation|website=portablecl.org}} Version 1.3 supports Mac OS X.{{Cite web|url=http://portablecl.org/pocl-1.3.html|title = PoCL home page}} Version 1.4 includes support for LLVM 8.0 and 9.0.{{Cite web|url=http://portablecl.org/pocl-1.4.html|title = PoCL home page}} Version 1.5 implements LLVM/Clang 10 support.{{Cite web|url=http://portablecl.org/pocl-1.5.html|title = PoCL home page}} Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration.{{Cite web |url=https://www.phoronix.com/scan.php?page=news_item&px=POCL-1.6-RC1-Released |title=POCL 1.6-RC1 Released with Better CUDA Performance – Phoronix |access-date=December 3, 2020 |archive-date=January 17, 2021 |archive-url=https://web.archive.org/web/20210117000353/https://www.phoronix.com/scan.php?page=news_item&px=POCL-1.6-RC1-Released |url-status=dead }} Current goals are complete OpenCL 2.x and 3.0 support and improved performance; with manual optimization, POCL 1.6 reaches the level of the Intel compute runtime.{{Cite web |last1=Baumann |first1=Tobias |last2=Noack |first2=Matthias |last3=Steinke |first3=Thomas |date=2021 |title=Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs |url=https://www.iwocl.org/wp-content/uploads/30-iwocl-syclcon-2021-baumann-slides.pdf |page=51}} Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features.{{Cite web|url=http://portablecl.org/pocl-1.7.html|title = PoCL home page}} Version 1.8 implements LLVM/Clang 13 support.{{cite web | url=http://portablecl.org/pocl-1.8.html | title=PoCL home page }} Version 3.0 implements OpenCL 3.0 at the minimum level and LLVM/Clang 14.{{cite web | url=http://portablecl.org/pocl-3.0.html | title=PoCL home page }} Version 3.1 works with LLVM/Clang 15 and improved SPIR-V support.{{cite web | url=http://portablecl.org/pocl-3.1.html | title=PoCL home page }}
; Shamrock: A port of Mesa Clover for ARM with full support of OpenCL 1.2;{{cite web|title=About|url=https://git.linaro.org/gpgpu/shamrock.git/about/|website=Git.Linaro.org}}{{cite web|title=LCA14-412: GPGPU on ARM SoC|url=https://s3.amazonaws.com/connect.linaro.org/lca14/presentations/LCA14-412-%20GPGPU%20on%20ARM%20SoC%20session.pdf|first1=T.|last1=Gall|first2=G.|last2=Pitney|website=Amazon Web Services|date=March 6, 2014|access-date=January 22, 2017|archive-date=July 26, 2020|archive-url=https://web.archive.org/web/20200726023253/https://s3.amazonaws.com/connect.linaro.org/lca14/presentations/LCA14-412-%20GPGPU%20on%20ARM%20SoC%20session.pdf|url-status=dead}} there is no current development towards 2.0.
; FreeOCL: A CPU-focused implementation of OpenCL 1.2 that uses an external compiler to create a more reliable platform;{{Cite web|url=https://github.com/zuzuf/freeocl|title=zuzuf/freeocl|website=GitHub|language=en|access-date=April 13, 2017}} it is no longer under active development.
; MOCL: An OpenCL implementation based on POCL, developed by NUDT researchers for the Matrix-2000 and released in 2018. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. The programming framework is built on top of LLVM v5.0 and reuses some code from POCL. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of kernel atomics is significantly improved. The framework has been deployed on the TH-2A system and is readily available to the public.{{cite conference |url=https://jianbinfang.github.io/files/2018-03-15-mocl.pdf |conference=Proc. Int'l Conf. on Computing Frontiers |doi=10.1145/3203217.3203244 |title=MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture |year=2018 |last1=Zhang |first1=Peng |last2=Fang |first2=Jianbin |last3=Yang |first3=Canqun |last4=Tang |first4=Tao |last5=Huang |first5=Chun |last6=Wang | first6=Zheng}} Some of this work is planned to be ported back into POCL.
; VC4CL: An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in Raspberry Pi models prior to the Raspberry Pi 4.{{Cite web|url=https://github.com/doe300/VC4CL|title = Status|website = GitHub|date = March 16, 2022}}
Vendor implementations
= Timeline of vendor implementations =
- June 2008: During Apple's WWDC conference, an early beta of Mac OS X Snow Leopard was made available to participants; it included the first beta implementation of OpenCL, about six months before the final version 1.0 specification was approved in late 2008. Apple also showed two demos. One was a grid of 8×8 screens, each displaying the screen of an emulated Apple II machine – 64 independent instances in total, each running a famous karate game – demonstrating task parallelism on the CPU. The other was an N-body simulation running on the GPU of a Mac Pro, a data-parallel task.
- December 10, 2008: AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores while Nvidia showed a GPU-accelerated demo.{{cite web |url=https://www.youtube.com/watch?v=sLv_fhQlqis |title=OpenCL Demo, AMD CPU |website=YouTube |date=December 10, 2008 |access-date=March 28, 2009}}{{cite web |url=https://www.youtube.com/watch?v=PJ1jydg8mLg |title=OpenCL Demo, Nvidia GPU |website=YouTube |date=December 10, 2008 |access-date=March 28, 2009}}
- March 16, 2009: at the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the first GPU of this company to feature OpenCL support.{{cite web |url=http://www.imgtec.com/News/Release/index.asp?NewsID=449 |title=Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family |publisher=Imagination Technologies |date=March 19, 2009 |access-date=January 30, 2011 |archive-date=April 3, 2014 |archive-url=https://web.archive.org/web/20140403000618/http://www.imgtec.com/News/Release/index.asp?NewsID=449 |url-status=dead }}
- March 26, 2009: at GDC 2009, AMD and Havok demonstrated the first working implementation for OpenCL accelerating Havok Cloth on ATI Radeon HD 4000 series GPU.{{cite web |url=http://www.pcper.com/comments.php?nid=6954 |title=AMD and Havok demo OpenCL accelerated physics |publisher=PC Perspective |date=March 26, 2009 |access-date=March 28, 2009 |archive-url=https://web.archive.org/web/20090405072046/http://www.pcper.com/comments.php?nid=6954 |archive-date=April 5, 2009}}
- April 20, 2009: Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.{{cite web |url=http://www.nvidia.com/object/io_1240224603372.html |title=Nvidia Releases OpenCL Driver To Developers |publisher=Nvidia |date=April 20, 2009 |access-date=April 27, 2009 |archive-url=https://web.archive.org/web/20120204190623/http://www.nvidia.com/object/io_1240224603372.html |archive-date=February 4, 2012 |url-status=dead }}
- August 5, 2009: AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.{{cite web |url=http://arst.ch/5te |title=AMD does reverse GPGPU, announces OpenCL SDK for x86 |publisher=Ars Technica |date=August 5, 2009 |access-date=August 6, 2009 }}{{Dead link|date=March 2021 |bot=InternetArchiveBot |fix-attempted=yes }}
- August 28, 2009: Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.{{cite web |first1=Dan |last1=Moren |first2=Jason |last2=Snell |url=http://www.macworld.com/article/140897/2009/06/keynote.html |title=Live Update: WWDC 2009 Keynote |website=MacWorld.com |publisher=MacWorld |date=June 8, 2009 |access-date=June 12, 2009}}
- September 28, 2009: Nvidia released its own OpenCL drivers and SDK implementation.
- October 13, 2009: AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/HD 5000 GPUs and SSE3 capable CPUs. The SDK is available for both Linux and Windows.{{cite web |url=http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx#one |title=ATI Stream Software Development Kit (SDK) v2.0 Beta Program |access-date=October 14, 2009 |url-status=dead |archive-url=https://web.archive.org/web/20090809065559/http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx |archive-date=August 9, 2009}}
- October 27, 2009: S3 released their first product supporting native OpenCL 1.0 – the Chrome 5400E embedded graphics processor.{{cite web |url=http://www.s3graphics.com/en/news/news_detail.aspx?id=44 |title=S3 Graphics launched the Chrome 5400E embedded graphics processor |access-date=October 27, 2009 |url-status=dead |archive-url=https://web.archive.org/web/20091202065250/http://www.s3graphics.com/en/news/news_detail.aspx?id=44 |archive-date=December 2, 2009}}
- November 26, 2009: Nvidia released drivers for OpenCL 1.0 (rev 48).
- December 10, 2009: VIA released their first product supporting OpenCL 1.0 – ChromotionHD 2.0 video processor included in VN1000 chipset.{{cite web |url=http://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_no=4327 |title=VIA Brings Enhanced VN1000 Graphics Processor |access-date=December 10, 2009 |url-status=dead |archive-url=https://web.archive.org/web/20091215090119/http://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_no=4327 |archive-date=December 15, 2009 |df=mdy-all }}
- December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0,{{cite web |url=http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx |title=ATI Stream SDK v2.0 with OpenCL 1.0 Support |access-date=October 23, 2009 |url-status=dead |archive-url=https://web.archive.org/web/20091101061303/http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx |archive-date=November 1, 2009 |df=mdy-all }} which provides OpenCL 1.0 support for HD 5000 GPUs and beta support for R700 GPUs.
- June 1, 2010: ZiiLABS released details of their first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.{{cite web |url=http://www.ziilabs.com/opencl |title=OpenCL |publisher=ZiiLABS |access-date=June 23, 2015}}
- June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.
- September 13, 2010: Intel released details of their first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge integrates Intel's newest graphics chip technology directly onto the central processing unit.{{cite web |url=http://news.cnet.com/8301-13924_3-20016302-64.html |title=Intel discloses new Sandy Bridge technical details |access-date=September 13, 2010 |archive-date=October 31, 2013 |archive-url=https://web.archive.org/web/20131031085949/http://news.cnet.com/8301-13924_3-20016302-64.html |url-status=dead }}
- November 15, 2010: Wolfram Research released Mathematica 8 with OpenCLLink{{cite web | url=http://reference.wolfram.com/mathematica/OpenCLLink/tutorial/Overview.html | title=OpenCLLink Overview—Wolfram Mathematica 9 Documentation }} package.
- March 3, 2011: Khronos Group announces the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a Web browser.{{cite web |url=https://www.khronos.org/news/categories/C251 |title=WebCL related stories |publisher=Khronos Group |access-date=June 23, 2015}}{{cite web |url=https://www.khronos.org/news/press/releases/khronos-releases-final-webgl-1.0-specification |title=Khronos Releases Final WebGL 1.0 Specification |publisher=Khronos Group |access-date=June 23, 2015 |url-status=dead |archive-url=https://web.archive.org/web/20150709134803/https://www.khronos.org/news/press/releases/khronos-releases-final-webgl-1.0-specification |archive-date=July 9, 2015 |df=mdy-all }}
- March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.{{Cite web|url=https://developer.ibm.com/community/|title=IBM Developer|website=developer.ibm.com}}
- April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on x86 Architecture.{{Cite web|url=https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Wbf059a58a9b9_459d_aca4_493655c96370/page/OpenCL+Common+Runtime|title=Welcome to Wikis|date=October 20, 2009|website=www.ibm.com}}
- May 4, 2011: Nokia Research releases an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.{{cite web |url=https://www.khronos.org/news/permalink/nokia-research-releases-webcl-prototype |title=Nokia Research releases WebCL prototype |publisher=Khronos Group |date=May 4, 2011 |access-date=June 23, 2015 |archive-date=December 5, 2020 |archive-url=https://web.archive.org/web/20201205063140/https://www.khronos.org/news/permalink/nokia-research-releases-webcl-prototype |url-status=dead }}
- July 1, 2011: Samsung Electronics releases an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.{{cite web |last=KamathK |first=Sharath |url=https://github.com/SRA-SiliconValley/webkit-webcl |title=Samsung's WebCL Prototype for WebKit |publisher=Github.com |access-date=June 23, 2015 |url-status=dead |archive-url=https://web.archive.org/web/20150218105743/https://github.com/SRA-SiliconValley/webkit-webcl |archive-date=February 18, 2015 |df=mdy-all }}
- August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.{{cite web |url=https://www.amd.com/us/press-releases/Pages/app-sdk-2011aug08.aspx |title=AMD Opens the Throttle on APU Performance with Updated OpenCL Software Development |publisher=Amd.com |date=August 8, 2011 |access-date=June 16, 2013}}
- December 12, 2011: AMD released AMD APP SDK v2.6{{cite web |url=http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=157108 |title=AMD APP SDK v2.6 |publisher=Forums.amd.com |date=March 13, 2015 |access-date=June 23, 2015}}{{dead link|date=March 2021}} which contains a preview of OpenCL 1.2.
- February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.{{cite web |url=http://www.anandtech.com/show/5607/the-portland-group-announces-opencl-compiler-for-stericsson-armbased-novathor-socs |title=The Portland Group Announces OpenCL Compiler for ST-Ericsson ARM-Based NovaThor SoCs |access-date=May 4, 2012}}
- April 17, 2012: Khronos released a WebCL working draft.{{cite web |url=https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.html |title=WebCL Latest Spec |publisher=Khronos Group |date=November 7, 2013 |access-date=June 23, 2015 |archive-url=https://web.archive.org/web/20140801133503/https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.html |archive-date=August 1, 2014 |url-status=dead }}
- May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0.{{cite web |url=http://newsroom.altera.com/press-releases/altera-opens-the-world-of-fpgas-to-software-programmers-with-broad-availability-of-sdk-and-off-the-shelf-boards-for-opencl.htm |title=Altera Opens the World of FPGAs to Software Programmers with Broad Availability of SDK and Off-the-Shelf Boards for OpenCL |publisher=Altera.com |access-date=January 9, 2014 |archive-url=https://web.archive.org/web/20140109220211/http://newsroom.altera.com/press-releases/altera-opens-the-world-of-fpgas-to-software-programmers-with-broad-availability-of-sdk-and-off-the-shelf-boards-for-opencl.htm |archive-date=January 9, 2014 |url-status=dead }} It is conformant to OpenCL 1.0.{{cite web |url=http://newsroom.altera.com/press-releases/nr-altera-sdk-opencl-conformance.htm |title=Altera SDK for OpenCL is First in Industry to Achieve Khronos Conformance for FPGAs |publisher=Altera.com |access-date=January 9, 2014 |archive-url=https://web.archive.org/web/20140109083533/http://newsroom.altera.com/press-releases/nr-altera-sdk-opencl-conformance.htm |archive-date=January 9, 2014 |url-status=dead }}
- November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.{{cite web |url=https://www.khronos.org/news/press/khronos-finalizes-opencl-2.0-specification-for-heterogeneous-computing |title=Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing |publisher=Khronos Group |date=November 18, 2013 |access-date=June 23, 2015}}
- March 19, 2014: Khronos releases the WebCL 1.0 specification.{{cite web |url=https://www.khronos.org/news/press/khronos-releases-webcl-1.0-specification |title=WebCL 1.0 Press Release |publisher=Khronos Group |date=March 19, 2014 |access-date=June 23, 2015}}{{cite web |url=https://www.khronos.org/registry/webcl/specs/1.0.0/ |title=WebCL 1.0 Specification |publisher=Khronos Group |date=March 14, 2014 |access-date=June 23, 2015}}
- August 29, 2014: Intel releases HD Graphics 5300 driver that supports OpenCL 2.0.{{Cite web|url=https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=24245|title=Intel OpenCL 2.0 Driver|access-date=October 14, 2014|archive-url=https://archive.today/20140917185715/https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=24245|archive-date=September 17, 2014|url-status=dead}}
- September 25, 2014: AMD releases Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.{{cite web |url=http://support.amd.com/en-us/kb-articles/Pages/OpenCL2-Driver.aspx |title=AMD OpenCL 2.0 Driver |website=Support.AMD.com |date=June 17, 2015 |access-date=June 23, 2015}}
- January 14, 2015: Xilinx Inc. announces SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance.{{Cite web|url=https://www.khronos.org/news/permalink/xilinx-sdaccel-development-environment-for-opencl-c-and-c-achieves-khronos|title=Xilinx SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance – khronos.org news|website=The Khronos Group|language=en|access-date=June 26, 2017}}
- April 13, 2015: Nvidia releases WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures.{{cite web |url=https://a248.e.akamai.net/f/248/10/10/us.download.nvidia.com/Windows/350.12/350.12-win8-win7-winvista-desktop-release-notes.pdf |title=Release 349 Graphics Drivers for Windows, Version 350.12 |date=April 13, 2015 |access-date=February 4, 2016}} Driver 340+ supports OpenCL 1.1 for Tesla and Fermi.
- August 26, 2015: AMD released AMD APP SDK v3.0{{cite web |url=http://developer.amd.com/community/blog/2015/08/26/introducing-app-sdk-30-opencl-2/ |title=AMD APP SDK 3.0 Released |website=Developer.AMD.com |date=August 26, 2015 |access-date=September 11, 2015}} which contains full support of OpenCL 2.0 and sample code.
- November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.{{cite web |url=https://www.khronos.org/news/press/khronos-releases-opencl-2.1-and-spir-v-1.0-specifications-for-heterogeneous |title=Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming |date=November 16, 2015 |publisher=Khronos Group }}
- April 18, 2016: Khronos announced the provisional specification for OpenCL 2.2.
- November 3, 2016: Intel added OpenCL 2.1 support for Gen7+ in SDK 2016 r3.{{cite web|url=https://software.intel.com/en-us/whats-new-code-builder-2016-r3|title=What's new? Intel® SDK for OpenCL™ Applications 2016, R3|publisher=Intel Software}}
- February 17, 2017: Nvidia begins evaluation support of OpenCL 2.0 with driver 378.66.{{cite web |url=https://www.khronos.org/news/permalink/nvidia-378.66-drivers-for-windows-offer-opencl-2.0-evaluation-support |title=NVIDIA 378.66 drivers for Windows offer OpenCL 2.0 evaluation support |date=February 17, 2017 |publisher=Khronos Group |access-date=March 17, 2017 |archive-date=August 6, 2020 |archive-url=https://web.archive.org/web/20200806235148/https://www.khronos.org/news/permalink/nvidia-378.66-drivers-for-windows-offer-opencl-2.0-evaluation-support |url-status=dead }}{{Cite web|url=https://streamhpc.com/blog/2017-02-22/nvidia-enables-opencl-2-0-beta-support/|title=NVIDIA enables OpenCL 2.0 beta-support|first=Jakub|last=Szuppe|date=February 22, 2017}}{{Cite web|url=https://streamhpc.com/blog/2017-03-06/nvidia-beta-support-opencl-2-0-linux/|title=NVIDIA beta-support for OpenCL 2.0 works on Linux too|first=Jakub|last=Szuppe|date=March 6, 2017}}
- May 16, 2017: Khronos announced that the specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.{{Cite web|url=https://www.khronos.org/news/permalink/khronos-releases-opencl-2.2-with-spir-v-1.2|title=The Khronos Group|date=March 21, 2019|website=The Khronos Group}}
- May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.
- April 27, 2020: Khronos announced the provisional version of OpenCL 3.0.
- June 1, 2020: Intel released the NEO runtime with OpenCL 3.0 support for the new Tiger Lake.
- June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.{{Cite web|url=https://github.com/RadeonOpenCompute/ROCm/tree/roc-3.5.0|title = GitHub – RadeonOpenCompute/ROCm at roc-3.5.0|website = GitHub}}
- September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
- October 16, 2020: Intel announced OpenCL 3.0 support in NEO 20.41 (including most optional OpenCL 2.x features).
- April 6, 2021: Nvidia released OpenCL 3.0 support for Ampere; Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465 and newer.{{Cite web|url=https://developer.nvidia.com/blog/nvidia-is-now-opencl-3-0-conformant/|title = NVIDIA is Now OpenCL 3.0 Conformant|date = April 12, 2021}}
- August 20, 2022: Intel Arc Alchemist GPUs (Arc A380, A350M, A370M, A550M, A730M and A770M) are conformant with OpenCL 3.0.{{Cite web |date=2022-12-12 |title=The Khronos Group |url=https://www.khronos.org/adopters/conformant-products/opencl |access-date=2022-12-12 |website=The Khronos Group |language=en}}
- October 14, 2022: Arm Mali-G615 and Mali-G715-Immortalis are conformant with OpenCL 3.0.
- November 11, 2022: The Rusticl OpenCL Library is conformant with OpenCL 3.0.{{Cite web |title=Mesa's Rusticl Achieves Official OpenCL 3.0 Conformance |url=https://www.phoronix.com/news/Rusticl-OpenCL-3.0-Conformance |access-date=2022-12-12 |website=www.phoronix.com |language=en}}
Devices
As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.
= Khronos Conformance Test Suite =
To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program.{{Cite web|url=https://www.khronos.org/adopters|title=The Khronos Group|date=2019-08-20|website=The Khronos Group|language=en|access-date=2019-08-20}} The Khronos CTS code for all OpenCL versions has been available in open source since 2017.{{cite web|url=https://github.com/KhronosGroup/OpenCL-CTS|title=KhronosGroup/OpenCL-CTL: The OpenCL Conformance Tests|date=2019-03-21|publisher=GitHub}}
= Conformant products =
The Khronos Group maintains an extended list of OpenCL-conformant products.
All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and similar feature set).{{cite web|title=clinfo by Simon Leblanc|website=GitHub|url=https://github.com/simleb/clinfo|access-date=January 27, 2017}}{{cite web|title=clinfo by Oblomov|website=GitHub|url=https://github.com/Oblomov/clinfo|access-date=January 27, 2017}}{{cite web|title=clinfo: openCL INFOrmation|date=April 2, 2013 |url=https://sourceforge.net/projects/clinfo/|access-date=January 27, 2017}}
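For illustration, the following minimal sketch (not part of the clinfo tools themselves) shows how such a query works at the API level, enumerating each installed platform and its devices through the standard OpenCL C API; error handling and buffer management are deliberately simplified assumptions.
<syntaxhighlight lang="c">
/* Minimal sketch: enumerate OpenCL platforms and devices, similar in spirit
 * to the clinfo tools. Error checking is omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[16];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(16, platforms, &num_platforms);
    if (num_platforms > 16) num_platforms = 16;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char pname[256], pversion[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);
        clGetPlatformInfo(platforms[p], CL_PLATFORM_VERSION, sizeof pversion, pversion, NULL);
        printf("Platform: %s (%s)\n", pname, pversion);

        cl_device_id devices[16];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);
        if (num_devices > 16) num_devices = 16;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dname[256], dversion[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_VERSION, sizeof dversion, dversion, NULL);
            printf("  Device: %s (%s)\n", dname, dversion);
        }
    }
    return 0;
}
</syntaxhighlight>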
= Version support =
Products and their version of OpenCL support include:{{Cite web |url=https://www.khronos.org/conformance/adopters/conformant-products#opencl |title=Khronos Products |website=The Khronos Group |access-date=May 15, 2017}}
== OpenCL 3.0 support ==
All hardware with OpenCL 1.2+ can support OpenCL 3.0, since the OpenCL 2.x features are optional; the Khronos Test Suite has been available since October 2020.{{Cite web|url=https://github.com/KhronosGroup/OpenCL-CTS/tree/master/test_conformance|title = OpenCL-CTS/Test_conformance at main · KhronosGroup/OpenCL-CTS|website = GitHub}}{{Cite web|url=https://github.com/KhronosGroup/OpenCL-CTS/issues?q=label%3Amust_fix_3.0+is%3Aclosed|title=Issues · KhronosGroup/OpenCL-CTS|website=GitHub}}
- (2020) Intel NEO Compute: 20.41+ for Gen 12 Tiger Lake to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2){{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Intel-Compute-20.43.18277|title = Intel Compute-Runtime 20.43.18277 Brings Alder Lake Support}}
- (2020) Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with latest Intel Windows graphics driver
- (2021) Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with latest Intel Windows graphics driver
- (2021) Arm Mali-G78, Mali-G310, Mali-G510, Mali-G610, Mali-G710 and Mali-G78AE.
- (2022) Intel 13th gen processors (Raptor Lake) with latest Intel Windows graphics driver
- (2022) Intel Arc discrete graphics with latest Intel Arc Windows graphics driver
- (2021) Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465+.
- (2022) Nvidia Ada Lovelace with Nvidia graphics driver 525+.
- (2022) Samsung Xclipse 920 GPU (based on AMD RDNA2)
- (2023) Intel 14th gen processors (Raptor Lake Refresh) with latest Intel Windows graphics driver
- (2023) Intel Core Ultra Series 1 processors (Meteor Lake) with latest Intel Windows graphics driver
== OpenCL 2.2 support ==
None yet: the Khronos Test Suite is ready; with a driver update, all hardware with 2.0 and 2.1 support could become conformant.
- Intel NEO Compute: work in progress for current products{{Cite web|url=https://01.org/compute-runtime|title=compute-runtime|date=February 7, 2018|website=01.org}}
- ROCm: version 3.5+ implements most features
== OpenCL 2.1 support ==
- (2018+) Support backported to Intel 5th and 6th gen processors (Broadwell, Skylake)
- (2017+) Intel 7th, 8th, 9th, 10th gen processors (Kaby Lake, Coffee Lake, Comet Lake, Ice Lake)
- (2017+) Intel Xeon Phi processors (Knights Landing) (experimental runtime)
- Khronos: with a driver update, all hardware with 2.0 support could become conformant
== OpenCL 2.0 support ==
- (2011+) AMD GCN GPUs (HD 7700+/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000 series); some 1st-gen GCN parts support only 1.2 with some extensions
- (2013+) AMD GCN APUs (Jaguar-, Steamroller-, Puma-, Excavator- and Zen-based)
- (2014+) Intel 5th & 6th gen processors (Broadwell, Skylake)
- (2015+) Qualcomm Adreno 5xx series
- (2018+) Qualcomm Adreno 6xx series
- (2017+) ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
- (2018+) ARM Mali (Bifrost) G31, G52, G72 and G76
- (2017+) incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series) with driver version 378.66+
== OpenCL 1.2 support ==
- (2011+) some AMD 1st-gen GCN parts: certain OpenCL 2.0 features remain unavailable, but many more extensions are supported than on TeraScale
- (2009+) AMD TeraScale 2 & 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 & 7000 series)
- (2011+) AMD TeraScale APUs (K10-, Bobcat- & Piledriver-based)
- (2012+) Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900, 10, 16, 20 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series)
- (2012+) Intel 3rd & 4th gen processors (Ivy Bridge, Haswell)
- (2013+) Intel Xeon Phi coprocessors (Knights Corner)
- (2013+) Qualcomm Adreno 4xx series
- (2013+) ARM Mali Midgard 3rd gen (T760)
- (2015+) ARM Mali Midgard 4th gen (T8xx)
== OpenCL 1.1 support ==
== OpenCL 1.0 support ==
- most hardware was updated to 1.1 and 1.2 after the initial driver supported only 1.0
Portability, performance and alternatives
A key feature of OpenCL is portability, via its abstracted memory and execution model: the programmer cannot directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs without giving up portability to other platforms. It is possible to run any OpenCL kernel on any conformant implementation.
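As an illustration of this portability, the short sketch below of an element-wise vector-addition kernel uses only standard OpenCL C, with no vendor-specific extensions, so any conformant implementation – CPU, GPU, DSP or FPGA – can build and execute it unchanged.
<syntaxhighlight lang="c">
// Sketch: a device-agnostic OpenCL C kernel. It relies only on standard
// OpenCL C (no inline PTX or other vendor-specific code), so any conformant
// implementation can compile and run it.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    size_t i = get_global_id(0);   // one work-item per output element
    c[i] = a[i] + b[i];
}
</syntaxhighlight>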
However, performance of the kernel is not necessarily portable across platforms. Existing implementations have been shown to be competitive when kernel code is properly tuned, though, and auto-tuning has been suggested as a solution to the performance portability problem, yielding "acceptable levels of performance" in experimental linear algebra kernels.{{cite journal |last1=Du |first1=Peng |last2=Weber |first2=Rick |last3=Luszczek |first3=Piotr |last4=Tomov |first4=Stanimire |last5=Peterson |first5=Gregory |last6=Dongarra |first6=Jack |title=From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming |journal=Parallel Computing |date=August 2012 |volume=38 |issue=8 |pages=391–407 |doi=10.1016/j.parco.2011.10.002 |author-link6=Jack Dongarra }} Portability of an entire application containing multiple kernels with differing behaviors was also studied, and shows that portability only required limited tradeoffs.{{cite book |last1=Dolbeau |first1=Romain |title=2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS) |pages=1–6 |last2=Bodin |first2=François |last3=de Verdière |first3=Guillaume Colin |date=September 7, 2013 |chapter=One OpenCL to rule them all? |doi=10.1109/MuCoCoS.2013.6633603 |isbn=978-1-4799-1010-6 }}
A study at Delft University from 2011 that compared CUDA programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to differences in the programming model (especially the memory model) and to NVIDIA's compiler optimizations for CUDA compared to those for OpenCL.{{cite conference |conference=Proc. Int'l Conf. on Parallel Processing |doi=10.1109/ICPP.2011.45 |chapter=A Comprehensive Performance Comparison of CUDA and OpenCL |year=2011 |last1=Fang |first1=Jianbin |last2=Varbanescu |first2=Ana Lucia |last3=Sips |first3=Henk|title=2011 International Conference on Parallel Processing |pages=216–225 |isbn=978-1-4577-1336-1 }}
Another study at D-Wave Systems Inc. found that "The OpenCL kernel’s performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA's performance.{{cite arXiv |first1=Kamran |last1=Karimi |first2=Neil G. |last2=Dickson |first3=Firas |last3=Hamze |eprint=1005.2581v3 |year=2011 |title=A Performance Comparison of CUDA and OpenCL|class=cs.PF }}
Because OpenCL allows workloads to be shared by CPU and GPU, executing the same programs, programmers can exploit both by dividing work among the devices.{{cite journal |title=A Survey of CPU-GPU Heterogeneous Computing Techniques |journal=ACM Computing Surveys |year=2015}} This leads to the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices. Machine learning has been suggested to solve this problem: Grewe and O'Boyle describe a system of support-vector machines trained on compile-time features of the program that can decide the device partitioning problem statically, without actually running the programs to measure their performance.{{cite conference |chapter=A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL |first1=Dominik |last1=Grewe |first2=Michael F. P. |last2=O'Boyle |title=Compiler Construction |series=Lecture Notes in Computer Science |conference=Proc. Int'l Conf. on Compiler Construction |year=2011 |volume=6601 |pages=286–305 |doi=10.1007/978-3-642-19861-8_16|isbn=978-3-642-19860-1 |doi-access=free }}
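A static split can be expressed directly in the host code. The following hedged sketch assumes a single context containing one CPU and one GPU device with buffers and kernel arguments already set up, and uses an illustrative fixed 25/75 partition of a one-dimensional index range; a learned model such as Grewe and O'Boyle's would instead predict this ratio from program features. The function name and parameters are hypothetical.
<syntaxhighlight lang="c">
/* Sketch: statically dividing one data-parallel kernel launch between a CPU
 * and a GPU command queue in the same context. The 25/75 ratio is purely
 * illustrative; choosing it well is exactly the partitioning problem
 * discussed above. Error checking is omitted. */
#include <CL/cl.h>

void run_split(cl_kernel kernel, cl_command_queue cpu_q,
               cl_command_queue gpu_q, size_t n)
{
    size_t cpu_items  = n / 4;           /* assumed: 25% of the range on the CPU */
    size_t gpu_items  = n - cpu_items;   /* remaining 75% on the GPU */
    size_t gpu_offset = cpu_items;       /* GPU work-items start after the CPU's */

    /* Enqueue the same kernel on both devices over disjoint index ranges. */
    clEnqueueNDRangeKernel(cpu_q, kernel, 1, NULL,        &cpu_items, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(gpu_q, kernel, 1, &gpu_offset, &gpu_items, NULL, 0, NULL, NULL);

    /* Block until both devices have finished their share of the work. */
    clFinish(cpu_q);
    clFinish(gpu_q);
}
</syntaxhighlight>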
A comparison of then-current AMD RDNA 2 and Nvidia RTX series graphics cards showed inconclusive results in OpenCL tests. Possible performance increases from the use of Nvidia CUDA or OptiX were not tested.{{Cite web|url=https://www.phoronix.com/review/amd-rx6800-opencl|title=Radeon RX 6800 Series Has Excellent ROCm-Based OpenCL Performance On Linux|website=www.phoronix.com}}
See also
{{div col|colwidth=18em}}
- Advanced Simulation Library
- AMD FireStream
- BrookGPU
- C++ AMP
- Close to Metal
- CUDA
- DirectCompute
- GPGPU
- HIP
- Larrabee
- Lib Sh
- List of OpenCL applications
- OpenACC
- OpenGL
- OpenHMPP
- OpenMP
- Metal
- RenderScript
- SequenceL
- SIMD
- SYCL
- Vulkan
- WebCL
{{div col end}}
References
{{Reflist|30em}}
External links
{{Commons category}}
- {{Official website|https://www.khronos.org/opencl/}}
- {{Official website|https://www.khronos.org/webcl/}} for WebCL
- [https://www.iwocl.org/ International Workshop on OpenCL] {{Webarchive|url=https://web.archive.org/web/20210126191545/https://www.iwocl.org/ |date=January 26, 2021 }} (IWOCL) sponsored by The Khronos Group
{{Khronos Group standards}}
{{Parallel computing}}
Category:Application programming interfaces