DAP2: OPeNDAP
DAP2 (Data Access Protocol version 2) is the core protocol used by OPeNDAP (Open-source Project for a Network Data Access Protocol) for data transport and access over HTTP. DAP is widely used in the scientific data community as a remote access protocol compatible with the NetCDF data model. It enables general-purpose remote access and subsetting of datasets and is supported by numerous geospatial and earth-science tools, including the NetCDF library itself.
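Under the hood, a DAP2 client issues plain HTTP GET requests: a dataset's structure, attributes, and data are served at URL suffixes `.dds`, `.das`, and `.dods`, with an optional constraint expression selecting a subset. A sketch of the request URLs a client builds (the base URL and variable name here are hypothetical placeholders):

```python
# Sketch of the three DAP2 request types for one dataset.
# The base URL and variable name are hypothetical placeholders.
base_url = "https://example.com/opendap/dataset"

dds_url = base_url + ".dds"   # dataset descriptor: variable types and shapes
das_url = base_url + ".das"   # dataset attributes
# Binary data, optionally subset with a constraint expression
# (index ranges per dimension, e.g. var[start:stop]):
dods_url = base_url + ".dods?t2m[0:40][0:40]"

print(dds_url)
print(das_url)
print(dods_url)
```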
References
See the OPeNDAP standards documents and website for the full protocol specification.
Arraylake implements the DAP2 standard.
Activating DAP2 for Arraylake Datasets
DAP2 can be activated using the Arraylake command line interface (CLI):
al compute enable {org} dap2
Arraylake currently supports enabling DAP2 on an organization-wide basis.
DAP2 URL Structure
DAP2 endpoints can be accessed via the following URL schema:
https://compute.earthmover.io/v1/services/dap2/{org}/{repo}/{branch|commit|tag}/{path/to/group}/opendap
Where:

- `{org}` is the name of your Arraylake organization
- `{repo}` is the name of the Repo
- `{branch|commit|tag}` is the branch, commit, or tag within the Repo to use to fulfill the request
- `{path/to/group}` is the path to the group within the Repo that contains an xarray Dataset

This URL will be called `{base_url}` in the following examples. All examples use the HTTP `GET` method.
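As a concrete illustration, the base URL for the GFS demo Repo used throughout this guide can be assembled from its parts (a sketch; the values are the demo org, repo, branch, and group path used in the examples below):

```python
# Assemble a DAP2 base URL from its components.
# These values come from the earthmover-demos GFS example used in this guide.
org = "earthmover-demos"
repo = "gfs"
ref = "main"      # a branch, commit, or tag
group = "solar"   # path to the group holding the xarray Dataset

base_url = f"https://compute.earthmover.io/v1/services/dap2/{org}/{repo}/{ref}/{group}/opendap"
print(base_url)
# https://compute.earthmover.io/v1/services/dap2/earthmover-demos/gfs/main/solar/opendap
```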
Integrating with Tools
The DAP2 protocol can be used by a variety of tools to subset and load datasets from Arraylake over HTTP.
Specific examples in this guide will use the Arraylake GFS Repo. The underlying GFS dataset has the following structure:
<xarray.Dataset> Size: 5TB
Dimensions: (longitude: 1440, latitude: 721, time: 736, step: 209)
Coordinates:
* latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
* step (step) timedelta64[ns] 2kB 00:00:00 01:00:00 ... 16 days 00:00:00
* time (time) datetime64[ns] 6kB 2024-05-12T18:00:00 ... 2024-11-12T1...
Data variables:
gust (longitude, latitude, time, step) float32 639GB ...
prate (longitude, latitude, time, step) float32 639GB ...
r2 (longitude, latitude, time, step) float32 639GB ...
t2m (longitude, latitude, time, step) float32 639GB ...
tcc (longitude, latitude, time, step) float32 639GB ...
Attributes:
description: GFS data ingested for forecasting demo
NetCDF Library
The NetCDF C library has built-in support for DAP2. Any application which uses the NetCDF C library should be able to connect to this service.
For example, the ncdump command line utility can be used to extract and export data.
ncdump -h https://compute.earthmover.io/v1/services/dap2/earthmover-demos/gfs/main/solar/opendap
Subsetting and Exporting
The NCO toolkit is a suite of tools for manipulating and analyzing data stored in netCDF format. The `ncks` (netCDF Kitchen Sink) tool can be used to download a subset of the GFS dataset from Arraylake in NetCDF format. For example, the following command downloads only the 2-meter temperature (`t2m`) variable at the first timestep of the first available model run, where the latitude is between 10 and 20 degrees and the longitude is between 30 and 40 degrees.
ncks -v t2m \
-d step,0 \
-d time,0 \
-d latitude,10.0,20.0 \
-d longitude,30.0,40.0 \
https://compute.earthmover.io/v1/services/dap2/earthmover-demos/gfs/main/solar/opendap \
subset.nc
When finished, the resulting NetCDF file is only 12 KB on disk! Taking a look with `ncdump`, we can see the coordinates of our downloaded subset and confirm they match our request:
ncdump -h -c subset.nc
# netcdf output {
# dimensions:
# latitude = 41 ;
# longitude = 41 ;
# step = 1 ;
# time = 1 ;
# variables:
# ...
# float t2m(longitude, latitude, time, step) ;
# t2m:GRIB_NV = 0 ;
# t2m:GRIB_Nx = 1440 ;
# t2m:GRIB_Ny = 721 ;
# t2m:GRIB_cfName = "air_temperature" ;
# t2m:GRIB_cfVarName = "t2m" ;
#
# ...
# latitude = 20, 19.75, 19.5, 19.25, 19, 18.75, 18.5, 18.25, 18, 17.75, 17.5,
# 17.25, 17, 16.75, 16.5, 16.25, 16, 15.75, 15.5, 15.25, 15, 14.75, 14.5,
# 14.25, 14, 13.75, 13.5, 13.25, 13, 12.75, 12.5, 12.25, 12, 11.75, 11.5,
# 11.25, 11, 10.75, 10.5, 10.25, 10 ;
#
# longitude = 30, 30.25, 30.5, 30.75, 31, 31.25, 31.5, 31.75, 32, 32.25, 32.5,
# 32.75, 33, 33.25, 33.5, 33.75, 34, 34.25, 34.5, 34.75, 35, 35.25, 35.5,
# 35.75, 36, 36.25, 36.5, 36.75, 37, 37.25, 37.5, 37.75, 38, 38.25, 38.5,
# 38.75, 39, 39.25, 39.5, 39.75, 40 ;
#
# step = 0 ;
#
# time = 0 ;
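The 41-point latitude and longitude dimensions follow directly from the grid spacing: at 0.25-degree resolution, a closed 10-degree interval contains 41 grid points. A quick check:

```python
# The GFS grid has 0.25-degree spacing, so a closed 10-degree interval
# (e.g. latitude 10-20 or longitude 30-40) contains 41 points.
spacing = 0.25
n_points = int((20.0 - 10.0) / spacing) + 1  # endpoints inclusive
print(n_points)  # 41
```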
Xarray
Xarray includes support for loading datasets via DAP (through its netCDF4 or pydap backends). We can try it out using the GFS dataset above:
import xarray as xr
ds = xr.open_dataset("https://compute.earthmover.io/v1/services/dap2/earthmover-demos/gfs/main/solar/opendap")
ds
# <xarray.Dataset> Size: 5TB
# Dimensions: (step: 209, latitude: 721, longitude: 1440, time: 1138)
# Coordinates:
# * step (step) timedelta64[ns] 2kB 00:00:00 01:00:00 ... 16 days 00:00:00
# * latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
# * longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
# * time (time) datetime64[ns] 9kB 2024-05-12T18:00:00 ... 2025-02-21
# Data variables:
# prate (longitude, latitude, time, step) float32 988GB ...
# tcc (longitude, latitude, time, step) float32 988GB ...
# t2m (longitude, latitude, time, step) float32 988GB ...
# gust (longitude, latitude, time, step) float32 988GB ...
# r2 (longitude, latitude, time, step) float32 988GB ...
# Attributes:
# description: GFS data ingested for forecasting demo
This gives lazy, read-only access to the dataset over HTTP while retaining the full Xarray API for analysis.
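One subtlety when reproducing the `ncks` subset in Xarray: latitude is stored in descending order, so a label-based slice must run from the high value to the low value. A minimal offline sketch with a synthetic stand-in for the GFS latitude grid (the same `.sel` pattern applies to the remote dataset):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the GFS grid: descending latitude at 0.25-degree spacing.
lat = np.arange(90.0, -90.25, -0.25)
da = xr.DataArray(np.zeros(lat.size), coords={"latitude": lat}, dims="latitude")

# Because the coordinate is descending, slice from high to low:
subset = da.sel(latitude=slice(20.0, 10.0))
print(subset.sizes["latitude"])  # 41 points, matching the ncks subset above
```

Against the remote dataset, combining this with `.isel(time=0, step=0)` and a matching longitude slice selects the same 41 × 41 region the `ncks` example downloaded, and no data transfers until `.load()` or `.to_netcdf()` is called.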