Rasdaman
{{Short description|Database management system}}
{{lowercase title}}
{{Infobox software
| name = rasdaman
| logo = rasdaman logo.png
| author = Peter Baumann
| developer = rasdaman GmbH
| latest_release_version = rasdaman v10.3
| latest_release_date = {{start date |2024|03|13}}
| operating_system = Ubuntu
| programming language = C++{{cite web |url=https://www.openhub.net/p/rasdaman |title=The rasdaman Open Source Project on Open Hub |work=Open Hub |publisher=Black Duck Software |accessdate=2020-01-14}}
| genre = Array DBMS
| license = GPL v3 (server) / LGPL v3 (client) or proprietary{{cite web|url=http://rasdaman.org/wiki/License |title=Rasdaman License |publisher=rasdaman.org |date= |accessdate=2016-08-01}}
| website = {{URL|https://rasdaman.org}}, {{URL|https://rasdaman.com}}
}}
rasdaman ("raster data manager") is an Array DBMS: a database management system that adds capabilities for storing and retrieving massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently used synonym for arrays is raster data, as in 2-D raster graphics; this motivated the name rasdaman. However, rasdaman places no limit on the number of dimensions: it can serve, for example, 1-D measurement data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z exploration data, 4-D ocean and climate data, and even data with dimensions beyond space and time.
History
In 1989, Peter Baumann, then at the Fraunhofer Computer Graphics Institute, began research on database support for images. Following an in-depth investigation of raster data formalizations in imaging, in particular the AFATL Image Algebra, he established a database model for multi-dimensional arrays, including a data model and a declarative query language, thereby pioneering the field of array databases. Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/journals/vldb/vldb3.html#Baumann94 On the Management of Multidimensional Discrete Data]. VLDB Journal 4(3)1994, Special Issue on Spatial Database Systems, pp. 401 - 444 Today, multi-dimensional arrays are also known as datacubes.
At TU Munich, within the EU-funded basic research project RasDaMan, a first prototype was built on top of the O2 object-oriented DBMS and tested in Earth and life science applications.{{cite web |url=http://cordis.europa.eu/result/rcn/20754_en.html |title=Raster data management in databases |website=Community Research and Development Information Service (CORDIS)}} In further EU-funded projects, the system was completed and extended to support relational DBMSs.
A dedicated research spin-off, rasdaman GmbH,{{cite web|url=http://www.rasdaman.com |title=rasdaman, the Big Data Analytics Server |publisher=Rasdaman.com |date= |accessdate=2022-09-11}} was established to provide commercial support, while research continued at Jacobs University.{{cite web |url=http://www.rasdaman.com/News/archive.php |title=Rasdaman - the Agile Array Analytics Engine |website=www.rasdaman.com |access-date=15 January 2022 |archive-url=https://web.archive.org/web/20150924084739/http://www.rasdaman.com/News/archive.php |archive-date=24 September 2015 |url-status=dead}} Since then, both entities have collaborated on the further development and use of the rasdaman technology.
Concepts
= Data model =
Based on an array algebra specifically developed for database purposes, rasdaman adds a new attribute type, array, to the relational model. Baumann, P.: [https://doi.org/10.1007/3-540-48521-X_7 A Database Array Algebra for Spatio-Temporal Data and Beyond]. Proc. NGITS’99, LNCS 1649, Springer 1999, pp.76-93 As this array definition is parametrized, it constitutes a second-order construct or template; this is reflected by the second-order functionals in the algebra and query language.
For historical reasons, tables are called collections, as the initial design emphasized embedding into the object-oriented database standard ODMG. Anticipating full integration with SQL, rasdaman collections represent a binary relation whose first attribute is an object identifier and whose second attribute is the array. This allows foreign key references to be established between arrays and regular relational tuples.
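The collection model can be pictured with a minimal sketch in plain Python. The `Collection` class, its sequential OID generator, and the `metadata` table are hypothetical illustrations, not rasdaman's storage layout or API; the point is only that a collection is a set of (OID, array) pairs, so an ordinary relational row can reference an array through its OID like a foreign key.

```python
from itertools import count


class Collection:
    """Hypothetical model of a rasdaman collection: a binary relation
    of (object identifier, array) pairs."""

    def __init__(self, name):
        self.name = name
        self.rows = {}          # OID -> array value
        self._oids = count(1)   # sequential OIDs (an assumption for this sketch)

    def insert(self, array):
        """Store an array and return the OID assigned to it."""
        oid = next(self._oids)
        self.rows[oid] = array
        return oid


# A collection of tiny 2x2 "images", stored as nested lists.
scenes = Collection("LandsatArchive")
oid_a = scenes.insert([[10, 20], [30, 40]])
oid_b = scenes.insert([[50, 60], [70, 80]])

# A regular relational table (hypothetical) whose scene_oid column
# acts as a foreign key into the collection.
metadata = [
    {"scene_oid": oid_a, "year": 2015},
    {"scene_oid": oid_b, "year": 2020},
]

# Join: fetch the arrays of all scenes recorded in 2020.
recent = [scenes.rows[m["scene_oid"]] for m in metadata if m["year"] == 2020]
```

Here `recent` holds only the second array, found by following the OID stored in the relational row.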
= Raster Query Language =
The rasdaman query language, rasql, embeds itself into standard SQL and its set-oriented processing.
On the new attribute type, multi-dimensional arrays, a set of additional operations is provided, all based on a minimal set of algebraically defined core operators: an array constructor (which establishes a new array and fills it with values) and an array condenser (which, similarly to SQL aggregates, derives scalar summary information from an array). The query language is declarative (and hence optimizable) and safe in evaluation: every query is guaranteed to terminate after a finite number of processing steps.
The rasql query guide n.n.: [http://doc.rasdaman.org Rasdaman Query Language Guide] provides details; the following examples illustrate its use:
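How the two core operators can generate the other operations is sketched below in plain Python, loosely following the general constructor and condenser expressions of rasql. Arrays are modelled as dictionaries from integer coordinates to values, and a spatial domain as a list of inclusive (lo, hi) bounds; these representations and the function names are assumptions of the sketch. Because every domain is finite, both operators visit finitely many cells, which is the intuition behind safe evaluation.

```python
from itertools import product
from operator import add


def cells(sdom):
    """Yield every integer coordinate of a domain [(lo, hi), ...]; bounds inclusive."""
    return product(*(range(lo, hi + 1) for lo, hi in sdom))


def marray(sdom, f):
    """Array constructor: a new array over sdom whose cell p holds f(p)."""
    return {p: f(p) for p in cells(sdom)}


def condense(sdom, op, f):
    """Array condenser: fold the binary operator op over f(p) for all p in sdom."""
    values = (f(p) for p in cells(sdom))
    acc = next(values)  # sdom is assumed to be non-empty
    for v in values:
        acc = op(acc, v)
    return acc


sdom = [(0, 1), (0, 2)]                      # a 2x3 array
a = marray(sdom, lambda p: 3 * p[0] + p[1])  # cells hold 0..5

# Induced (cell-wise) operations are constructors in disguise:
doubled = marray(sdom, lambda p: 2 * a[p])
mask = marray(sdom, lambda p: a[p] > 4)

# Aggregates are condensers in disguise:
total = condense(sdom, add, lambda p: a[p])                     # 15
some = condense(sdom, lambda x, y: x or y, lambda p: a[p] > 4)  # True, like some_cells
```

The same pattern yields `count_cells`, averages, or `all_cells` by changing the operator and the cell expression.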
- "From all 4-D x/y/z/t climate simulation data cubes, a cutout containing everything along x, the range 100 to 200 along y, everything along z, and a slice at t position 42 (resulting in a 3-D x/y/z cube)":
select c[ *:*, 100:200, *:*, 42 ]
from ClimateSimulations as c
- "In all Landsat satellite images, suppress all non-green areas":
select img * (img.green > 130)
from LandsatArchive as img
Note: this is a very naive phrasing of vegetation search; in practice one would use the NDVI formula, null values for cloud masking, and further techniques.
- "All MRI images where, in some region defined by the bit masks, intensity exceeds a threshold of 250":
select img
from MRI as img, Masks as m
where some_cells( img > 250 and m )
- "A 2-D x/y slice from all 4-D climate simulation data cubes, each one encoded in PNG format":
select png( c[ *:*, *:*, 100, 42 ] )
from ClimateSimulations as c
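The subsetting used in these queries (trimming with `lo:hi`, `*` for an open bound, and slicing with a single coordinate, which removes that axis) can be emulated in a short Python sketch. It assumes inclusive bounds, treats `*` as the array's current bound on that axis, and keeps the original coordinates of trimmed axes; the dictionary-based array and the `subset` function are hypothetical and only mirror the semantics described above.

```python
from itertools import product


def cells(sdom):
    """Yield every integer coordinate of a domain [(lo, hi), ...]; bounds inclusive."""
    return product(*(range(lo, hi + 1) for lo, hi in sdom))


def subset(array, sdom, spec):
    """Apply a rasql-style subset to a dict-based array.

    spec has one entry per axis: an int slices that axis away, a (lo, hi)
    pair trims it, and None in a pair stands for '*' (the current bound).
    Trimmed axes keep their original coordinates.
    """
    if len(spec) != len(sdom):
        raise ValueError("spec must have one entry per axis")
    new_sdom, kept, fixed = [], [], {}
    for axis, ((lo, hi), s) in enumerate(zip(sdom, spec)):
        if isinstance(s, int):
            if not lo <= s <= hi:
                raise IndexError(f"slice {s} outside axis {axis}")
            fixed[axis] = s
        else:
            tlo = lo if s[0] is None else s[0]
            thi = hi if s[1] is None else s[1]
            if not lo <= tlo <= thi <= hi:
                raise IndexError(f"trim {s} outside axis {axis}")
            new_sdom.append((tlo, thi))
            kept.append(axis)
    result = {}
    for q in cells(new_sdom):
        p = [0] * len(sdom)
        for axis, v in fixed.items():
            p[axis] = v
        for axis, v in zip(kept, q):
            p[axis] = v
        result[q] = array[tuple(p)]
    return result, new_sdom


# Toy x/y/t cube: each cell value encodes its coordinates.
cube_sdom = [(0, 3), (0, 3), (0, 1)]
cube = {p: 100 * p[0] + 10 * p[1] + p[2] for p in cells(cube_sdom)}

# Analogue of c[ *:*, 1:2, 1 ]: all of x, y from 1 to 2, slice at t = 1.
sub, sub_sdom = subset(cube, cube_sdom, [(None, None), (1, 2), 1])
```

The result is 2-D with domain x in 0..3 and y in 1..2, just as the 4-D queries above drop one or two dimensions by slicing.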
Architecture
= Storage management =
Figure: Sample tiling of an array for storage in rasdaman
Raster objects are partitioned into tiles. Furtado, P., Baumann, P.: [https://www.informatik.uni-trier.de/~ley/db/conf/icde/icde99.html#FurtadoB99 Storage of Multidimensional Arrays based on Arbitrary Tiling]. Proc. ICDE'99, March 23–26, 1999, Sydney, Australia, pp. 328-336 Aside from a regular subdivision, any user- or system-generated partitioning is possible. As tiles form the unit of disk access, the tiling pattern should match the query access patterns; several tiling strategies assist in establishing a well-performing tiling. A geo index is employed to quickly determine the tiles affected by a query. Optionally, tiles are compressed using one of several methods, including lossless and lossy (wavelet) algorithms. A. Dehmel: A Compression Engine for Multidimensional Array Database Systems. PhD Thesis, TU Munich, 2001 Independently of this, query results can be compressed for transfer to the client. Both tiling strategy and compression are database tuning parameters.
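The simplest tiling strategy, a regular subdivision, and the lookup of tiles touched by a query can be sketched in Python as follows. This covers only aligned, equally sized tiles, whereas rasdaman also supports arbitrary tilings, and the plain linear scan in `affected_tiles` merely stands in for the geo index; all function names are hypothetical.

```python
from itertools import product


def regular_tiling(sdom, tile_shape):
    """Partition a domain [(lo, hi), ...] (inclusive) into aligned tiles of
    tile_shape; tiles on the upper edge may be smaller."""
    per_axis = [
        [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]
        for (lo, hi), size in zip(sdom, tile_shape)
    ]
    return [list(tile) for tile in product(*per_axis)]


def intersects(a, b):
    """True if boxes a and b share at least one cell."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def affected_tiles(tiles, query_box):
    """Stand-in for the index lookup: a plain linear scan over all tiles."""
    return [t for t in tiles if intersects(t, query_box)]


tiles = regular_tiling([(0, 9), (0, 9)], (4, 4))  # 3 x 3 = 9 tiles
hits = affected_tiles(tiles, [(2, 5), (0, 1)])    # touches 2 tiles
```

Only the two tiles in `hits` would have to be read from disk for this query, which is why matching the tile shape to typical access patterns matters.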
Tiles and the tile index are stored on disk in the rasdaman database, together with the data dictionary needed by rasdaman's dynamic type system. P. Furtado: Storage Management of Multidimensional Arrays in Database Management Systems. PhD Thesis, TU Munich, 2000 For arrays larger than disk space, hierarchical storage management (HSM) support has been developed. B. Reiner et al: Hierarchical Storage Support and Management for Large-Scale Multidimensional Array Database Management Systems. Proc. DEXA, Aix-en-Provence, France, September 2-6, 2002, Springer LNCS 2453, pp. 689 -700, doi: 10.1007/3-540-46146-9_68
= Query processing =
Query execution follows a tile streaming paradigm. N. Widmann: Efficient Operation Execution on Multidimensional Array Data. PhD Thesis, TU Munich, 1999 Whenever possible, the array tiles addressed by a query are fetched sequentially, and each tile is discarded after processing. This makes the architecture scalable to data volumes that exceed server main memory by orders of magnitude.
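The streaming idea can be illustrated with a Python generator: tiles are loaded one at a time, each contributes to a running result, and rebinding the loop variable releases the previous tile. The loader `fake_fetch` is a hypothetical stand-in for reading a tile from disk, and the maximum is just one example of an operation that can be evaluated tile by tile.

```python
def stream_tiles(tile_ids, fetch):
    """Yield tiles lazily, one per tile id; fetch(tile_id) is a hypothetical loader."""
    for tile_id in tile_ids:
        yield fetch(tile_id)


def streamed_max(tiles):
    """Compute the maximum cell value while keeping only the current tile."""
    best = None
    for tile in tiles:  # rebinding `tile` releases the previous tile
        tile_max = max(tile)
        best = tile_max if best is None else max(best, tile_max)
    return best


loaded = []


def fake_fetch(tile_id):
    """Hypothetical loader: tile k holds the values 10*k .. 10*k + 4."""
    loaded.append(tile_id)
    return [10 * tile_id + i for i in range(5)]


result = streamed_max(stream_tiles(range(4), fake_fetch))  # 34
```

Memory use stays bounded by roughly one tile regardless of how many tiles the array has.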
Queries are extensively optimized. R. Ritsch: Optimization and Evaluation of Array Queries in Database Management Systems. PhD Thesis, TU Munich, 1999 The server applies algebraic (heuristic) optimization rules to the query tree where applicable; of its 150 algebraic rewriting rules, 110 actually optimize, while the other 40 transform the query into canonical form. In addition, cost-based optimization is applied. Parsing and optimization together take less than a millisecond on a laptop.
Queries are also parallelized. K. Hahn: Parallele Anfrageverarbeitung in multidimensionalen Array-Datenbanksystemen. PhD Thesis, TU Munich, 2003 Rasdaman offers inter-query parallelism (a dispatcher schedules requests into a pool of server processes on a per-transaction basis) and intra-query parallelism (transparent distribution of query subtrees across available cores, GPUs, or cloud nodes).
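As an illustration of algebraic rewriting, the Python sketch below applies one rule of the kind such optimizers use: a trim over an induced addition, (a + b)[r], is rewritten into a[r] + b[r], so only the cells inside r ever need to be read or computed. The tuple-based expression tree and the function name are hypothetical, and the rule is illustrative rather than a quotation of rasdaman's rule set.

```python
# Expression trees as tuples (a hypothetical representation):
#   ("ref", name)       a stored array
#   ("plus", e1, e2)    induced cell-wise addition
#   ("trim", e, box)    subset of e to box


def push_trim_down(expr):
    """Rewrite (e1 + e2)[box] into e1[box] + e2[box], recursively."""
    kind = expr[0]
    if kind == "plus":
        return ("plus", push_trim_down(expr[1]), push_trim_down(expr[2]))
    if kind == "trim":
        inner, box = push_trim_down(expr[1]), expr[2]
        if inner[0] == "plus":
            # Distribute the trim over both operands of the addition.
            return ("plus",
                    push_trim_down(("trim", inner[1], box)),
                    push_trim_down(("trim", inner[2], box)))
        return ("trim", inner, box)
    return expr  # "ref": nothing to rewrite


box = ((0, 9), (0, 9))
query = ("trim", ("plus", ("ref", "a"), ("ref", "b")), box)
optimized = push_trim_down(query)
# optimized == ("plus", ("trim", ("ref", "a"), box), ("trim", ("ref", "b"), box))
```

A real optimizer combines many such rules with cost estimates; this sketch shows only how one rewrite moves work closer to the data.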
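Intra-query parallelism can be pictured with a minimal Python sketch: a condenser is split into per-tile partial results computed concurrently, which are then merged. A thread pool stands in for the cores, GPUs, or cloud nodes mentioned above, and `parallel_condense` is a hypothetical name. The merge is only correct when the combining operator is associative, as sum and max are.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def parallel_condense(tiles, partial, combine, workers=4):
    """Evaluate partial(tile) for every tile concurrently, then merge the
    per-tile results with combine (which must be associative)."""
    if not tiles:
        raise ValueError("at least one tile is required")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial, tiles))  # results keep tile order
    return reduce(combine, partials)


tiles = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
total = parallel_condense(tiles, sum, add)  # 45
peak = parallel_condense(tiles, max, max)   # 9
```

In CPython, CPU-bound work gains little from threads because of the global interpreter lock; swapping in `ProcessPoolExecutor` keeps the same structure while using separate processes.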
= Client APIs =
The primary interface to rasdaman is the query language. C++ and Java APIs allow queries to be invoked and offer client-side convenience functions for array handling. Arrays are delivered in the main-memory format of the client language and processor architecture, ready for further processing. Data format codecs allow arrays to be retrieved in common raster formats, such as CSV, PNG, and NetCDF.
The Web toolkit raswct simplifies the creation of Web query frontends, including graphical widgets for parametrized queries, such as sliders for thresholds.
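What a data format codec does can be shown with the simplest of the formats named above. The Python sketch below encodes a 2-D array as CSV text with one array row per line and decodes it again, using only the standard `csv` module. It is a generic illustration under that layout assumption; rasdaman's own CSV encoder may arrange values differently, and `encode_csv`/`decode_csv` are hypothetical names.

```python
import csv
import io


def encode_csv(rows):
    """Encode a 2-D array (list of rows) as CSV text, one array row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def decode_csv(text, cell_type=int):
    """Inverse of encode_csv for numeric cells."""
    return [[cell_type(v) for v in row] for row in csv.reader(io.StringIO(text))]


text = encode_csv([[0, 1, 2], [3, 4, 5]])  # "0,1,2\n3,4,5\n"
```

Binary formats such as PNG or NetCDF follow the same principle but also carry cell types and, for NetCDF, dimension metadata.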
= Geo Web Services =
Status and license model
Today, rasdaman is a full-fledged implementation offering select, insert, update, and delete array query functionality. It is used in both research and commercial installations.
In a collaboration between the original code owner, rasdaman GmbH, and Jacobs University, a code split was performed in 2008-2009, resulting in rasdaman community,{{cite web|url=http://www.rasdaman.org |title=rasdaman |publisher=rasdaman |date=2022-02-28 |accessdate=2022-09-11}} an open-source branch, and rasdaman enterprise, the commercial branch. Since then, rasdaman community has been maintained by Jacobs University, whereas rasdaman enterprise remains proprietary to rasdaman GmbH.
The two variants differ mainly in performance boosters (such as specific optimization techniques) intended to support particularly large databases, high user numbers, and complex queries; details are available on the rasdaman community website.[http://rasdaman.org/wiki/License rasdaman license model]
The rasdaman community license releases the server under the GPL and all client parts under the LGPL, allowing the system to be used in any license environment.
Impact
As the first Array DBMS to ship (with a first prototype available in 1996), rasdaman has shaped this young database research domain. Concepts of its data and query model (declarativeness and, in some cases, the choice of operators) recur in more recent approaches. An in-depth comparison of Array DBMSs and related technology was performed by the Research Data Alliance in 2018. Baumann, P.: [https://www.rd-alliance.org/app/uploads/2018/03/Array-Databases_final-report.pdf Array Databases: Concepts, Standards, Implementations]. Research Data Alliance, Array Database Assessment Working Group, DOI: https://dx.doi.org/10.15497/RDA00024
= Standards =
In 2008, the Open Geospatial Consortium released the Web Coverage Processing Service (WCPS) standard, which defines a raster (often called "datacube") query language based on the concept of a coverage. Its operator semantics are influenced by the rasdaman array algebra. Baumann, P.: [https://archive.today/20130203043448/http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s10707-009-0087-2 The OGC Web Coverage Processing Service (WCPS) Standard]. Geoinformatica, 14(4)2010, pp. 447-479
In 2016, INSPIRE (Infrastructure for Spatial Information in the European Community, the EU legal framework for spatial information{{cite web|url=https://knowledge-base.inspire.ec.europa.eu/index_en |title=EU INSPIRE |publisher=EU |date=2022-02-28 |accessdate=2024-07-18}}) adopted WCPS as an optional component of INSPIRE WCS.{{cite web | url = https://knowledge-base.inspire.ec.europa.eu/publications/technical-guidance-implementation-inspire-download-services-using-web-coverage-services-wcs_en | title = Technical Guidance for the implementation of INSPIRE Download Services using Web Coverage Services (WCS) | date = 2016-12-16}}
In 2023, WCPS was adopted by ISO TC211 as ISO 19123-3:2023.{{cite web | url = https://committee.iso.org/sites/tc211/home/projects/projects---complete-list/iso-19123-3.html | title = ISO 19123-3:2023: Geographic information - Schema for coverage geometry and functions - Part 3: Processing fundamentals | date = 2023-06-21}}
In 2024, OGC adopted the same specification as Abstract Specification Topic 6.3.{{cite web | url = https://docs.ogc.org/as/21-060r2/21-060r2.pdf | title = 2024-07-05 | date = 2024-07-05}}
In 2019, ISO adopted the rasql array query language as an extension to the SQL standard, Part 15: Multi-Dimensional Arrays (MDA),{{cite web | url = https://www.iso.org/standard/84807.html | title = ISO 9075-15:2023: Information technology – Database languages – SQL – Part 15: Multi-Dimensional Arrays (MDA) | date = 2023-06-01}} with only minor syntactic adaptations to SQL.
= Project Use =
Two selected projects illustrate the use of rasdaman in geo services.
The open Earth Datacube Playground {{cite web|url=https://standards.rasdaman.com/ |title=Earth Datacube Playground |publisher=rasdaman |date= |accessdate=2024-07-18}} is a showcase for actionable geo datacubes, offering 1-D through 4-D use cases of raster data access and ad-hoc processing. The showcase is built using rasdaman.
EarthServer{{cite web|url=https://www.earthserver.eu |title=EarthServer |publisher=Earthserver.eu |date= |accessdate=2022-09-11}} is a global federation of independent datacube providers. The combined datacube offerings are made available as a single, homogenized datacube space. Access is completely location-transparent: any node can receive a query and answer it, regardless of which federation node holds the data; this includes automatic distributed data fusion. Participation is free for datacube contributors.