CuPy

{{Short description|Numerical programming library for the Python programming language}}

{{Infobox software

| name = CuPy

| logo =

| screenshot =

| caption =

| collapsible =

| author = Seiya Tokui

| developer = Community, Preferred Networks, Inc.

| released = {{start date and age|2015|9|2}}{{cite web|url=https://github.com/chainer/chainer/releases/v1.3.0|title=Release v1.3.0 – chainer/chainer|via=GitHub|access-date=25 June 2022}}

| latest release version = v13.3.0{{cite web|url=https://github.com/cupy/cupy/releases|title=Releases – cupy/cupy|via=GitHub|access-date=8 September 2024}}

| latest release date = {{start date and age|2024|8|22}}

| programming language = Python, Cython, CUDA

| repo = {{URL|http://github.com/cupy/cupy}}

| operating system = Linux, Windows

| platform = Cross-platform

| size =

| language =

| genre = Numerical analysis

| license = MIT

| website = {{URL|https://cupy.dev/}}

}}

CuPy is an open source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them.

{{cite conference

| url = http://learningsys.org/nips17/assets/papers/paper_16.pdf

| title = CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations

| last1 = Okuta

| first1 = Ryosuke

| last2 = Unno

| first2 = Yuya

| last3 = Nishino

| first3 = Daisuke

| last4 = Hido

| first4 = Shohei

| last5 = Loomis

| first5 = Crissman

| date = 2017

| publisher = Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS)

}}

CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the Nvidia CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.

{{cite web

| url = https://www.phoronix.com/scan.php?page=news_item&px=CuPy-9.0-Released

| title = CuPy 9.0 Brings AMD GPU Support To This Numpy-Compatible Library - Phoronix

| date = 29 April 2021

| access-date = 21 June 2022

| website = Phoronix

}}{{Cite web

| title = AMD Leads High Performance Computing Towards Exascale and Beyond

| date = 28 June 2021

| access-date = 21 June 2022

| url = https://ir.amd.com/news-events/press-releases/detail/1012/amd-leads-high-performance-computing-towards-exascale-and

| quote = Most recently, CuPy, an open-source array library with Python, has expanded its traditional GPU support with the introduction of version 9.0 that now offers support for the ROCm stack for GPU-accelerated computing.

}}

CuPy was initially developed as a backend of the Chainer deep learning framework, and was established as an independent project in 2017.

{{cite web

| url = https://www.preferred.jp/en/news/pr20170602/

| date = 2 June 2017

| title = Preferred Networks released Version 2 of Chainer, an Open Source framework for Deep Learning - Preferred Networks, Inc.

| access-date = 18 June 2022

}}

CuPy is part of the NumPy ecosystem of array libraries

{{cite web

| url = https://numpy.org/

| title = NumPy

| work = numpy.org

| access-date = 21 June 2022

}} and is widely adopted for using GPUs with Python,{{cite book

| last1 = Gorelick

| first1 = Micha

| last2 = Ozsvald

| first2 = Ian

| title = High Performance Python: Practical Performant Programming for Humans

| publisher = O'Reilly Media, Inc.

| edition = 2nd

| date = April 2020

| isbn = 9781492055020

| page = 190

}} especially in high-performance computing environments such as Summit,{{cite web

| url = https://docs.olcf.ornl.gov/software/python/cupy.html

| title = Installing CuPy

| author = Oak Ridge Leadership Computing Facility

| work = OLCF User Documentation

| access-date = 21 June 2022

}} Perlmutter,{{cite web

| url = https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#cupy

| title = Using Python on Perlmutter

| author = National Energy Research Scientific Computing Center

| work = NERSC Documentation

| access-date = 21 June 2022

}} EULER,{{cite web

| url = https://scicomp.ethz.ch/wiki/CuPy

| title = CuPy

| author = ETH Zurich

| work = ScientificComputing

| access-date = 21 June 2022

}} and ABCI.{{Cite web

| title = Chainer

| author = National Institute of Advanced Industrial Science and Technology

| work = ABCI 2.0 User Guide

| access-date = 21 June 2022

| url = https://docs.abci.ai/en/apps/chainer/

}}

CuPy is a NumFOCUS sponsored project.{{cite web

| url = https://numfocus.org/sponsored-projects

| title = Sponsored Projects - NumFOCUS

| access-date = 8 September 2024

}}

Features

CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs.{{cite web

| url = https://docs.cupy.dev/en/latest/overview.html

| title = Overview

| work = CuPy documentation

| access-date = 18 June 2022}}{{cite web

| url = https://docs.cupy.dev/en/latest/reference/comparison.html

| title = Comparison Table

| work = CuPy documentation

| access-date = 18 June 2022

}}

= NumPy-compatible APIs =

The same set of APIs defined in the NumPy package ({{code|numpy.*}}) is available under the {{code|cupy.*}} package.
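Because the two packages expose the same function names, a routine can be written against a generic array module and run on either backend. The following is a minimal sketch under that assumption; the helper name `normalize_rows` and the module alias `xp` are illustrative and are not part of CuPy. It falls back to NumPy when CuPy or a usable GPU is not available.

```python
import numpy as np

# Select the array module: CuPy when it is installed and a GPU is usable,
# otherwise NumPy. `xp` is only a conventional alias, not a CuPy API.
xp = np
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() > 0:
        xp = cp
except Exception:
    pass


def normalize_rows(xp, a):
    """Scale each row of `a` to unit Euclidean length using module `xp`."""
    norms = xp.sqrt((a * a).sum(axis=1, keepdims=True))
    return a / norms


a = xp.array([[3.0, 4.0], [6.0, 8.0]])
unit = normalize_rows(xp, a)

# Copy the result back to host memory when it lives on the GPU.
host = unit.get() if xp is not np else unit
print(host)
```

The body of `normalize_rows` is identical for both backends; only the module passed in changes.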

= SciPy-compatible APIs =

The same set of APIs defined in the SciPy package ({{code|scipy.*}}) is available under the {{code|cupyx.scipy.*}} package.

= User-defined GPU kernels =

  • Kernel templates for element-wise and reduction operations
  • Raw kernel (CUDA C/C++)
  • Just-in-time transpiler (JIT)
  • Kernel fusion
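As an illustration of the element-wise kernel template, the sketch below defines a kernel with `cupy.ElementwiseKernel` that computes a squared difference. The wrapper function `squared_diff` and the `HAVE_GPU` flag are hypothetical names added for this example, and the NumPy branch exists only so that the sketch also runs on a machine without CuPy or a GPU.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    HAVE_GPU = False


def squared_diff(x, y):
    """Return (x - y)**2 element-wise as a host NumPy float32 array."""
    if not HAVE_GPU:
        # CPU fallback (assumption of this sketch, not a CuPy feature).
        x = np.asarray(x, dtype=np.float32)
        y = np.asarray(y, dtype=np.float32)
        return (x - y) ** 2

    kernel = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameter
        'z = (x - y) * (x - y)',  # body, evaluated once per element
        'squared_diff',           # kernel name
    )
    gx = cp.asarray(x, dtype=cp.float32)
    gy = cp.asarray(y, dtype=cp.float32)
    return cp.asnumpy(kernel(gx, gy))


result = squared_diff([1, 2, 3], [3, 2, 0])
print(result)
```

The template takes care of indexing and launch configuration, so only the per-element expression has to be written in CUDA C/C++.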

= Distributed computing =

  • Distributed communication package ({{code|cupyx.distributed}}), providing collective and peer-to-peer primitives

= Low-level CUDA features =

  • Stream and event
  • Memory pool
  • Profiler
  • Host API binding
  • CUDA Python support{{cite web

|url = https://developer.nvidia.com/cuda-python

|title = CUDA Python {{!}} NVIDIA Developer

|access-date = 21 June 2022

}}

= Interoperability =

  • DLPack{{cite web

| url = https://dmlc.github.io/dlpack/latest/

| title = Welcome to DLPack's documentation!

| work = DLPack 0.6.0 documentation

| access-date = 21 June 2022

}}

  • CUDA Array Interface{{Cite web

| title = CUDA Array Interface (Version 3)

| work = Numba documentation

| access-date = 21 June 2022

| url = https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html

}}

  • NEP 13 ({{code|__array_ufunc__}}){{Cite web

| title = NEP 13 — A mechanism for overriding Ufuncs — NumPy Enhancement Proposals

| work = numpy.org

| access-date = 21 June 2022

| url = https://numpy.org/neps/nep-0013-ufunc-overrides.html

}}

  • NEP 18 ({{code|__array_function__}}){{Cite web

| title = NEP 18 — A dispatch mechanism for NumPy's high level array functions — NumPy Enhancement Proposals

| work = numpy.org

| access-date = 21 June 2022

| url = https://numpy.org/neps/nep-0018-array-function-protocol.html

}}{{cite Q|Q99413970|display-authors=3}}

  • Array API Standard{{Cite web

| title = 2021 report - Python Data APIs Consortium

| access-date = 21 June 2022

| url = https://data-apis.org/files/2021_annual_report_DataAPIs_Consortium.pdf

}}{{Cite web

| title = Purpose and scope

| work = Python array API standard 2021.12 documentation

| access-date = 21 June 2022

| url = https://data-apis.org/array-api/latest/purpose_and_scope.html

}}
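One practical effect of the NEP 18 protocol is that a function written with plain `numpy` calls can operate on a CuPy array, with NumPy dispatching those calls to CuPy. The sketch below assumes this dispatch behaviour; the function name `center` is illustrative, and the data stays a NumPy array when CuPy or a GPU is unavailable.

```python
import numpy as np


def center(a):
    """Subtract the column means from `a`."""
    # For a CuPy array, np.mean is handed to CuPy via __array_function__,
    # so the result stays on the GPU.
    return a - np.mean(a, axis=0)


data = np.array([[1.0, 2.0], [3.0, 6.0]])
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() > 0:
        data = cp.asarray(data)  # move the input to the GPU
except Exception:
    pass

centered = center(data)
print(type(centered).__module__)

# Bring the result back to host memory if it is a GPU array.
host = centered.get() if hasattr(centered, "get") else centered
print(host)
```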

Examples

= Array creation =

>>> import cupy as cp
>>> x = cp.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = cp.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

= Basic operations =

>>> import cupy as cp
>>> x = cp.arange(12).reshape(3, 4).astype(cp.float32)
>>> x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)
>>> x.sum(axis=1)
array([ 6., 22., 38.], dtype=float32)

= Raw CUDA C/C++ kernel =

>>> import cupy as cp
>>> kern = cp.RawKernel(r'''
... extern "C" __global__
... void multiply_elemwise(const float* in1, const float* in2, float* out) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     out[tid] = in1[tid] * in2[tid];
... }
... ''', 'multiply_elemwise')
>>> in1 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> in2 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> out = cp.zeros((4, 4), dtype=cp.float32)
>>> kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
>>> out
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.]], dtype=float32)
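The raw kernel above launches exactly 4 × 4 = 16 threads for 16 elements, so no thread indexes past the end of the arrays. For arbitrary sizes, a common pattern is to pass the element count to the kernel, guard the body with a bounds check, and round the grid size up. The sketch below shows one way to do this; the helper names `launch_config` and `multiply`, the extra `n` parameter, and the block size of 128 are assumptions of this example rather than part of CuPy.

```python
import numpy as np

KERNEL_SOURCE = r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // threads past the end of the arrays do nothing
        out[tid] = in1[tid] * in2[tid];
    }
}
'''


def launch_config(n, block_size=128):
    """Return (grid, block) tuples that cover n elements."""
    grid_size = (n + block_size - 1) // block_size  # ceiling division
    return (grid_size,), (block_size,)


def multiply(in1, in2):
    """Multiply two contiguous float32 CuPy arrays of equal size (needs a GPU)."""
    import cupy as cp

    kern = cp.RawKernel(KERNEL_SOURCE, 'multiply_elemwise')
    out = cp.empty_like(in1)
    n = in1.size
    grid, block = launch_config(n)
    # The scalar is passed as a 32-bit integer to match `int n` in the kernel.
    kern(grid, block, (in1, in2, out, np.int32(n)))
    return out


print(launch_config(16, block_size=4))
print(launch_config(1000))
```

With 1000 elements and 128 threads per block, eight blocks are launched and the last 24 threads fall outside the arrays, which is why the bounds check is needed.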

Applications

  • spaCy{{cite web

| title = Install spaCy

| work = spaCy Usage Documentation

| url = https://spacy.io/usage#gpu

| access-date = 21 June 2022

}}{{cite book

| last1 = Patel

| first1 = Ankur A.

| last2 = Arasanipalai

| first2 = Ajay Uppili

| title = Applied Natural Language Processing in the Enterprise

| publisher = O'Reilly Media, Inc.

| edition = 1st

| date = May 2021

| isbn = 9781492062578

| page = 68

}}

  • XGBoost{{cite web

| title = Python Package Introduction

| work = xgboost 1.6.1 documentation

| access-date = 21 June 2022

| url = https://xgboost.readthedocs.io/en/stable/python/python_intro.html#data-interface

}}

  • turboSETI{{cite web

| title = UCBerkeleySETI/turbo_seti: turboSETI -- python based SETI search algorithm.

| website = GitHub

| url = https://github.com/UCBerkeleySETI/turbo_seti#turbo_seti

| access-date = 21 June 2022

}}

  • NVIDIA RAPIDS{{Cite web

| title = Open GPU Data Science {{!}} RAPIDS

| access-date = 21 June 2022

| url = https://rapids.ai/

}}{{Cite web

| title = API Docs

| work = RAPIDS Docs

| access-date = 21 June 2022

| url = https://docs.rapids.ai/api

}}{{Cite web

| title = Efficient Data Sharing between CuPy and RAPIDS

| access-date = 21 June 2022

| url = https://medium.com/rapids-ai/using-rapids-memory-manager-with-cupy-8d08fe8f58fa

}}{{Cite web

| title = 10 Minutes to cuDF and CuPy

| access-date = 21 June 2022

| url = https://medium.com/rapids-ai/10-minutes-to-cudf-and-cupy-e131cac0439b

}}

  • {{not a typo|einops}}

{{cite conference

| url = https://openreview.net/forum?id=oapKSVM2bcj

| title = Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation

| last1 = Rogozhnikov

| first1 = Alex

| date = 2022

| publisher = International Conference on Learning Representations

}}{{Cite web

| title = arogozhnikov/einops: Deep learning operations reinvented (for pytorch, tensorflow, jax and others)

| website = GitHub

| url = https://github.com/arogozhnikov/einops

| access-date = 21 June 2022

}}

  • scikit-learn{{cite web

| title = Array API support (experimental) — scikit-learn documentation

| url = https://scikit-learn.org/stable/modules/array_api.html

| access-date = 8 September 2024

}}

  • Chainer

{{cite conference

| url = https://dl.acm.org/doi/10.1145/3292500.3330756

| title = Chainer: A Deep Learning Framework for Accelerating the Research Cycle

| last1 = Tokui

| first1 = Seiya

| last2 = Okuta

| first2 = Ryosuke

| last3 = Akiba

| first3 = Takuya

| last4 = Niitani

| first4 = Yusuke

| last5 = Ogawa

| first5 = Toru

| last6 = Saito

| first6 = Shunta

| last7 = Suzuki

| first7 = Shuji

| last8 = Uenishi

| first8 = Kota

| last9 = Vogel

| first9 = Brian

| last10 = Vincent

| first10 = Hiroyuki Yamazaki

| date = 2019

| publisher = Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

| doi = 10.1145/3292500.3330756

| url-access = subscription

}}

See also

References

{{reflist}}