Zarr can represent very large array datasets in a simple, scalable way, and is compatible with cloud object storage – making it ideal for analysis-ready geospatial data.
29 June 2021: The Open Geospatial Consortium (OGC) seeks public comment on the draft Zarr Storage Specification 2.0 Community Standard. Comments are due by July 29, 2021.
Zarr is an open-source specification for the storage of multi-dimensional arrays of data (also known as data cubes, N-dimensional arrays, ND-arrays, or tensors). Such arrays are ubiquitous in scientific research and engineering.
Zarr stores metadata as JSON text documents and array data as binary chunks, which can optionally be compressed. Zarr can store data in most storage systems, including databases, standard directory-based file systems, and cloud object stores such as Amazon S3. This flexibility allows implementations to experiment with novel storage technologies while maintaining a uniform API for downstream libraries and users.
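To make that layout concrete, here is a minimal sketch that uses only the Python standard library. It is not the Zarr library or its API. It assumes a plain dict as a stand-in for any key-value store, and the helper names `write_array` and `read_chunk` are hypothetical. The `.zarray` metadata key, its fields, and the `i.j` chunk keys are intended to mirror the version 2 layout, but the sketch only handles 2-D integer arrays whose shape divides evenly into chunks.

```python
import json
import struct
import zlib


def write_array(store, rows, chunk_shape):
    """Write a 2-D list of ints into `store` using a Zarr-v2-like layout."""
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    # Simplification: this sketch requires the shape to divide evenly into chunks.
    assert n_rows % c_rows == 0 and n_cols % c_cols == 0

    # Array metadata is a small JSON document stored under the ".zarray" key.
    meta = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<i4",  # little-endian 32-bit signed integers
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": 0,
        "order": "C",  # row-major layout inside each chunk
        "filters": None,
    }
    store[".zarray"] = json.dumps(meta).encode("utf-8")

    # Each chunk is packed in row-major order, compressed, and stored under "i.j".
    for i in range(n_rows // c_rows):
        for j in range(n_cols // c_cols):
            values = [
                rows[r][c]
                for r in range(i * c_rows, (i + 1) * c_rows)
                for c in range(j * c_cols, (j + 1) * c_cols)
            ]
            raw = struct.pack(f"<{len(values)}i", *values)
            store[f"{i}.{j}"] = zlib.compress(raw, 1)


def read_chunk(store, i, j):
    """Fetch and decode a single chunk without touching any other chunk."""
    meta = json.loads(store[".zarray"])
    c_rows, c_cols = meta["chunks"]
    raw = zlib.decompress(store[f"{i}.{j}"])
    flat = struct.unpack(f"<{c_rows * c_cols}i", raw)
    return [list(flat[r * c_cols:(r + 1) * c_cols]) for r in range(c_rows)]


# A dict stands in for a directory, a database, or an object store bucket.
store = {}
data = [[r * 6 + c for c in range(6)] for r in range(4)]
write_array(store, data, chunk_shape=(2, 3))
```

Because every chunk lives under its own key, a reader can call `read_chunk(store, 1, 1)` and fetch only that piece of the array, which is what makes this kind of layout a good fit for object stores and parallel access.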
Zarr arose in genomics research in 2016. It was created by Alistair Miles of Oxford University as a library optimized for massively parallel array analytics. It has since grown into a community project with a range of developers and users from fields such as genomics, bioimaging, astronomy, physics, quantitative finance, oceanography, atmospheric science, climate science, and geospatial imaging.
Because it can represent very large array datasets in a simple, scalable way, and is compatible with cloud object storage, Zarr is an ideal format for analysis-ready geospatial data in the cloud. Indeed, Zarr has already been adopted by several OGC communities as a format for cloud-optimized, analysis-ready geospatial data. Examples include:
- Climate Science: The CMIP6 Google Cloud Public Dataset
- Oceanography: The ECCOv4r3 Ocean State Estimate
- Atmospheric Science: Global cloud-resolving aquaplanet simulations with the System for Atmospheric Modeling
While Zarr is not inherently a geospatial-specific format, it has been put forward by the Zarr Steering Council for adoption as an OGC community standard because of its rapid growth and adoption in geospatial and related fields.
An approved OGC Community Standard is an official standard of OGC that is considered to be a widely used, mature specification, but that was developed outside of OGC’s standards development and approval process. The originator of the standard brings to OGC a “snapshot” of their work that is then endorsed by OGC membership so that it can become part of the OGC Standards Baseline.
The candidate Zarr Storage Specification 2.0 Community Standard is available for review and comment on the OGC Portal. Comments are due by July 29, 2021, and should be submitted via the method outlined on the Zarr Storage Specification 2.0 Community Standard’s public comment request page.
The Open Geospatial Consortium (OGC) is an international consortium of more than 500 businesses, government agencies, research organizations, and universities driven to make geospatial (location) information and services FAIR – Findable, Accessible, Interoperable, and Reusable.
OGC’s member-driven consensus process creates royalty-free, publicly available geospatial standards. Existing at the cutting edge, OGC actively analyzes and anticipates emerging tech trends, and runs an agile, collaborative Research and Development (R&D) lab that builds and tests innovative prototype solutions to members’ use cases.
OGC members together form a global forum of experts and communities that use location to connect people with technology and improve decision-making at all levels. OGC is committed to creating a sustainable future for us, our children, and future generations.
Visit ogc.org for more info on our work.