Zarr (data format)

{{Short description|Storage format for multidimensional arrays}}

{{Infobox file format

| name = Zarr

| icon =

| iconcaption =

| icon_size =

| screenshot =

| screenshot_size =

| caption =

|_noextcode =

| extension = .zarr

|_nomimecode =

| mime =

| type_code =

| uniform_type =

| conforms_to =

| magic =

| max_size =

| developer =

| released =

| latest_release_version = 3

| latest_release_date =

| type = Multidimensional array

| container_for =

| contained_by =

| extended_from =

| extended_to =

| standard =

| free = Yes

| open = Yes

| url = {{URL|https://zarr.dev/}}

}}

Zarr is an open standard for storing large multidimensional array data. It specifies a protocol and data format, and is designed to be "cloud ready": data is divided into subsets called chunks, which allows random access to parts of an array.

{{cite web

| title = Zarr - chunked, compressed, N-dimensional arrays

| url = https://zarr.dev/

| website = zarr.dev

| access-date = 2024-09-12}}

{{cite web

| title = Cloud-Optimized Geospatial Formats Guide: Zarr

| url = https://guide.cloudnativegeo.org/zarr/intro.html

| website = guide.cloudnativegeo.org

| access-date = 2024-09-12}}

Zarr can be used from many programming languages, including Python, Java, JavaScript, C++, Rust and Julia.

{{cite web

| title = Zarr Implementations

| url = https://zarr.dev/implementations/

| website = zarr.dev

| access-date = 2025-01-09}}

It has been used by organizations such as Google and Microsoft to publish large datasets.

{{cite web

| title = Google Cloud: ERA5 data

| url = https://cloud.google.com/storage/docs/public-datasets/era5

| website = cloud.google.com

| access-date = 2024-09-12}}

{{cite web

| title = Microsoft Planetary Computer: Reading Zarr Data

| url = https://planetarycomputer.microsoft.com/docs/quickstarts/reading-zarr-data/

| website = planetarycomputer.microsoft.com

| access-date = 2024-09-12}}

The earliest versions of Zarr were released in 2015 by Alistair Miles.

{{cite web

|url=https://pypi.org/project/zarr/#history

|title=zarr - PyPI

|access-date=2025-02-10

}}

{{cite web

|url=https://alimanfoo.github.io/2016/04/14/to-hdf5-and-beyond.html

|title=To HDF5 and beyond

|date=2016-04-14

|access-date=2025-02-10

|author=Alistair Miles

}}

Zarr is designed to support high-throughput distributed I/O on different storage systems, which is a common requirement in cloud computing. A Zarr array can be read efficiently by multiple operations in parallel, and likewise written by multiple operations in parallel.

{{cite web

| title = Zarr - Tutorial

| url = https://zarr.readthedocs.io/en/stable/tutorial.html

| website = zarr.readthedocs.io

| access-date = 2024-09-12}}
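
The idea behind parallel access can be illustrated with a minimal sketch in plain Python. It does not use any Zarr library: the in-memory dictionary standing in for a storage system, the array and chunk shapes, the dot-separated key scheme and the helper names <code>write_chunk</code> and <code>read_element</code> are all assumptions made for illustration. Each chunk is stored under its own key, so workers that handle different chunks never touch the same stored object.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

CHUNK = (2, 2)      # hypothetical chunk shape
SHAPE = (4, 6)      # hypothetical array shape
store = {}          # stand-in for a key-value store (directory, object store, ...)


def write_chunk(coords):
    """Fill one chunk and store it under its own key; touches no other key."""
    rows, cols = CHUNK
    r0, c0 = coords[0] * rows, coords[1] * cols
    # Each element holds its own row-major position in the full array.
    block = [[(r0 + r) * SHAPE[1] + (c0 + c) for c in range(cols)] for r in range(rows)]
    store[".".join(map(str, coords))] = block   # illustrative key scheme, e.g. "1.2"


def read_element(row, col):
    """Random access: fetch only the chunk that holds the requested element."""
    key = f"{row // CHUNK[0]}.{col // CHUNK[1]}"
    return store[key][row % CHUNK[0]][col % CHUNK[1]]


grid = product(range(SHAPE[0] // CHUNK[0]), range(SHAPE[1] // CHUNK[1]))
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_chunk, grid))   # each task writes a different chunk

value = read_element(3, 5)              # reads a single chunk, not the whole array
```

In this sketch, two workers writing to the same chunk at the same time would still need some form of coordination; only work on distinct chunks is independent.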

Format description

The main data structure in Zarr is the multidimensional array. To allow parallel access, each array is stored and accessed as a grid of "chunks". The actual on-disk representation depends on the compressor and storage plugins selected by the user.
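
The arithmetic of a chunk grid can be sketched in a few lines of plain Python. This is an illustration of the general technique, not code from any Zarr implementation: the function names and the array and chunk shapes are hypothetical, and regular (equally sized) chunks are assumed.

```python
import math
from itertools import product


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def locate(index, chunks):
    """Split an element index into (chunk coordinates, offset inside that chunk)."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunks))
    offset = tuple(i % c for i, c in zip(index, chunks))
    return chunk_coords, offset


def chunks_for_region(start, stop, chunks):
    """Coordinates of every chunk overlapping the non-empty half-open region [start, stop)."""
    ranges = [range(a // c, (b - 1) // c + 1) for a, b, c in zip(start, stop, chunks)]
    return list(product(*ranges))


shape = (10_000, 10_000)   # hypothetical array shape
chunks = (1_000, 1_000)    # hypothetical chunk shape

grid = chunk_grid_shape(shape, chunks)                      # chunks per dimension
coords, offset = locate((2_500, 7_250), chunks)             # where one element lives
needed = chunks_for_region((900, 0), (1_100, 500), chunks)  # chunks a slice must read
```

Reading a small region therefore requires only the chunks it overlaps, which is what makes partial and parallel access practical.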

File:Zarr-scipy2019-storage.png

Zarr's design was influenced by that of HDF5, and so it includes similar features for metadata and grouping: arrays can be grouped into named hierarchies, and they can also be annotated with key-value metadata stored alongside the array.
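
A hierarchy of groups and arrays with attached key-value metadata can be modelled, as a rough sketch, by path-like keys in a key-value store with one JSON metadata document per node. The code below is plain Python and only an assumption-laden illustration: the key name <code>meta.json</code>, the document fields, the example names and the helper functions are hypothetical, and the actual metadata documents are defined by the Zarr specification.

```python
import json

store = {}  # stand-in for a key-value store; keys act like paths


def put_metadata(path, kind, attributes=None, **fields):
    """Write a JSON metadata document for a group or array at `path`."""
    doc = {"kind": kind, "attributes": attributes or {}, **fields}
    store[f"{path}/meta.json".lstrip("/")] = json.dumps(doc)   # hypothetical key name


def children(path):
    """Names of the nodes directly below the group at `path`."""
    prefix = f"{path}/".lstrip("/")
    names = set()
    for key in store:
        if key.startswith(prefix):
            rest = key[len(prefix):].split("/")
            if len(rest) > 1:          # skip the group's own metadata document
                names.add(rest[0])
    return sorted(names)


# Hypothetical hierarchy: a root group, one subgroup, two annotated arrays.
put_metadata("", "group", {"project": "example"})
put_metadata("weather", "group")
put_metadata("weather/temperature", "array", {"units": "K"},
             shape=[365, 180, 360], chunks=[1, 180, 360])
put_metadata("weather/pressure", "array", {"units": "Pa"},
             shape=[365, 180, 360], chunks=[1, 180, 360])

units = json.loads(store["weather/temperature/meta.json"])["attributes"]["units"]
members = children("weather")
```

Because the metadata sits next to the data under the same path prefix, a reader can discover the structure of a hierarchy by listing keys, without opening any chunk.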

Applications

For bioimaging such as microscopy, a consortium called the Open Microscopy Environment (OME) created a format called "OME-Zarr", based on Zarr with some discipline-specific extensions.

{{cite journal

| author = Moore, Josh

| title = OME-Zarr: a cloud-optimized bioimaging file format with international community support

| journal = Histochemistry and Cell Biology

| volume = 160

| issue = 3

| publisher = Springer Science and Business Media LLC

| date = 2023

| pages = 223–251

| issn = 1432-119X

| doi = 10.1007/s00418-023-02209-1

| pmid = 37428210

| pmc = 10492740

| hdl = 1721.1/151126

| hdl-access = free

}}

Similarly, Zarr is being used to publish weather and satellite data

{{cite web

| title = Lazy loading: Making it easier to access vast datasets of weather & satellite data

| url = https://openclimatefix.org/post/lazy-loading-making-it-easier-to-access-vast-datasets-of-weather-satellite-data

| website = openclimatefix.org

| access-date = 2024-09-12}}

and energy data,

{{cite journal

| last1 = Sansal

| first1 = Altay

| last2 = Kainkaryam

| first2 = Sribharath

| last3 = Lasscock

| first3 = Ben

| last4 = Valenciano

| first4 = Alejandro

| author-link =

| title = MDIO: Open-source format for multidimensional energy data

| journal = The Leading Edge

| volume = 42

| issue = 7

| pages = 465–473

| publisher = Society of Exploration Geophysicists

| date = 2023

| issn = 1938-3789

| doi = 10.1190/tle42070465.1

| bibcode = 2023LeaEd..42..465S

}}

among others.

References

{{reflist}}