A comprehensive, systematic overview of datacube technologies and functionalities has recently been published in the Springer Open Journal of Big Data. An unprecedented line-up of 19 different tools is scrutinized against 30+ criteria.
Performance figures were obtained in a systematic benchmark, published on GitHub, that compares rasdaman, Open Data Cube, SciDB, and PostGIS Raster. The results show that rasdaman can be up to 304x faster than the other tools. Eurac Research has independently confirmed this, reporting an average speedup of as much as 400x for rasdaman over Open Data Cube.
Past papers have already compared datacube models and formalisms, and benchmarks have been undertaken as well. Typically, however, these studies were rather constrained: only two systems were compared, and testing was mostly driven by cherry-picked examples rather than by a systematic, justifiable methodology. Each of these studies represents valuable research. However, to the best of our knowledge, no comprehensive survey exists that combines model, access interfaces, architecture, practical usability, and performance evaluation. The study also stands out for its scale: 19 systems are compared, and four of them are benchmarked to an extent and depth clearly exceeding previous papers in the field. For example, the subsetting tests were designed so that systems cannot be tuned specifically to these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field, as well as clear guidance to those who need to choose the datacube tool best suited to their application.
The article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the RDA Big Data Interest Group, with technical support from IEEE GRSS and CODATA Germany. The group has elicited the state of the art in Array Databases and related technology to answer the question: how can data scientists and engineers benefit from Array Database technology, commonly called “datacube” engines?