Searching and filtering

Introduction

Arraylake allows you to search through a repository using metadata present in the Zarr attrs property. A recommended workflow is to:

Use the Repo.tree visualization to iterate on your filter query, and then
Use Repo.filter_metadata to obtain a list of matching Zarr paths that can be read into an Xarray dataset.

Both methods express queries in JMESPath (pronounced "james path"), a rich query language for JSON. Arraylake supports only the filtering functionality, not aggregation or projection. For more, see the JMESPath tutorial.

Setup

import arraylake as al
import numpy as np
import xarray as xr
import zarr

client = al.Client()

An example climate model dataset

This example dataset is loosely inspired by the CMIP datasets, though we use longer names to keep the examples readable.

The hierarchy looks like: project/model_name/experiment_name/data_stream/spatial_grid/variable_name. For each model name, we will assign variables and appropriate CF attributes.

important

Arraylake does not assume or impose any particular convention for the attributes set on Zarr arrays or Zarr groups. Instead, it lets you use JMESPath, a query language for JSON, to run anything from simple to complex queries on the attributes.

We begin by using the zarr library to create a nested tree of arrays with attributes that we will later be able to search over.

caution

You'll need to change the organization name 👇 from earthmover to your-org-name.

cmip_repo = client.get_or_create_repo("earthmover/search-cmip-like")

root = cmip_repo.root_group
varnames = {
    "atm": ["pr", "co2", "tas"],
    "land": ["rootd", "tasmin", "tasmax"],
}

attrs = {
    "pr": {"standard_name": "precipitation_flux"},
    "co2": {"standard_name": "mole_fraction_of_carbon_dioxide_in_air"},
    "tas": {"standard_name": "air_temperature", "cell_methods": "time:mean"},
    "tasmin": {"standard_name": "air_temperature", "cell_methods": "time:min"},
    "tasmax": {"standard_name": "air_temperature", "cell_methods": "time:max"},
    "rootd": {"standard_name": "root_depth", "units": "m"},
}

for mip in ["CMIP", "ScenarioMIP"]:
    for model in ["model1", "model2"]:
        for experiment_id in ["historical"]:
            for stream in ["atm_daily", "land_daily", "land_monthly"]:
                if mip == "ScenarioMIP" and stream != "atm_daily":
                    continue
                for grid_id in ["native", "latlon"]:
                    if grid_id == "latlon" and model == "model2":
                        continue
                    frequency = "mon" if "mon" in stream else "day"
                    component, _ = stream.split("_")
                    path = f"{mip}/{model}/{experiment_id}/{stream}/{grid_id}"
                    group = root.create_group(path, overwrite=True)

                    for variable in varnames[component]:
                        path = f"{mip}/{model}/{experiment_id}/{stream}/{grid_id}/{variable}"
                        array = root.create_dataset(
                            path,
                            shape=(4, 64, 128),
                            overwrite=True,
                            fill_value=np.nan,
                            dtype=np.float64,
                        )
                        if variable in attrs:
                            array.attrs.update(attrs[variable])
                            array.attrs.update(
                                {
                                    "frequency": frequency,
                                    "grid": grid_id,
                                    "experiment_id": experiment_id,
                                    "_ARRAY_DIMENSIONS": ["time", "nlat", "nlon"]
                                    if grid_id == "native"
                                    else ["time", "latitude", "longitude"],
                                }
                            )
cmip_repo.commit("demo dataset commit")
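
To check which groups the nested loops above create without touching a repository, the same skip rules can be replayed in plain Python. This is a minimal sketch that only builds path strings; the helper name expected_group_paths is hypothetical, and no Arraylake or Zarr calls are involved.

```python
from itertools import product


def expected_group_paths():
    """Replay the loop logic above and return the group paths it would create."""
    paths = []
    for mip, model, stream, grid_id in product(
        ["CMIP", "ScenarioMIP"],
        ["model1", "model2"],
        ["atm_daily", "land_daily", "land_monthly"],
        ["native", "latlon"],
    ):
        # ScenarioMIP only carries the daily atmosphere stream.
        if mip == "ScenarioMIP" and stream != "atm_daily":
            continue
        # model2 has no output on the latlon grid.
        if grid_id == "latlon" and model == "model2":
            continue
        paths.append(f"{mip}/{model}/historical/{stream}/{grid_id}")
    return paths


paths = expected_group_paths()
print(len(paths))  # 12 groups, each holding three arrays
```

Comparing this list against the tree output is a quick way to confirm that nothing was skipped by accident.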

Here is the full tree:

cmip_repo.tree()

Output
/
├── 📁 CMIP
│   ├── 📁 model1
│   │   └── 📁 historical
│   │       ├── 📁 land_daily
│   │       │   ├── 📁 latlon
│   │       │   │   ├── 🇦 rootd (4, 64, 128) float64
│   │       │   │   ├── 🇦 tasmax (4, 64, 128) float64
│   │       │   │   └── 🇦 tasmin (4, 64, 128) float64
│   │       │   └── 📁 native
│   │       │       ├── 🇦 tasmin (4, 64, 128) float64
│   │       │       ├── 🇦 rootd (4, 64, 128) float64
│   │       │       └── 🇦 tasmax (4, 64, 128) float64
│   │       ├── 📁 atm_daily
│   │       │   ├── 📁 latlon
│   │       │   │   ├── 🇦 tas (4, 64, 128) float64
│   │       │   │   ├── 🇦 co2 (4, 64, 128) float64
│   │       │   │   └── 🇦 pr (4, 64, 128) float64
│   │       │   └── 📁 native
│   │       │       ├── 🇦 co2 (4, 64, 128) float64
│   │       │       ├── 🇦 tas (4, 64, 128) float64
│   │       │       └── 🇦 pr (4, 64, 128) float64
│   │       └── 📁 land_monthly
│   │           ├── 📁 latlon
│   │           │   ├── 🇦 rootd (4, 64, 128) float64
│   │           │   ├── 🇦 tasmin (4, 64, 128) float64
│   │           │   └── 🇦 tasmax (4, 64, 128) float64
│   │           └── 📁 native
│   │               ├── 🇦 tasmax (4, 64, 128) float64
│   │               ├── 🇦 tasmin (4, 64, 128) float64
│   │               └── 🇦 rootd (4, 64, 128) float64
│   └── 📁 model2
│       └── 📁 historical
│           ├── 📁 land_daily
│           │   └── 📁 native
│           │       ├── 🇦 tasmax (4, 64, 128) float64
│           │       ├── 🇦 tasmin (4, 64, 128) float64
│           │       └── 🇦 rootd (4, 64, 128) float64
│           ├── 📁 atm_daily
│           │   └── 📁 native
│           │       ├── 🇦 co2 (4, 64, 128) float64
│           │       ├── 🇦 tas (4, 64, 128) float64
│           │       └── 🇦 pr (4, 64, 128) float64
│           └── 📁 land_monthly
│               └── 📁 native
│                   ├── 🇦 tasmax (4, 64, 128) float64
│                   ├── 🇦 rootd (4, 64, 128) float64
│                   └── 🇦 tasmin (4, 64, 128) float64
└── 📁 ScenarioMIP
    ├── 📁 model1
    │   └── 📁 historical
    │       └── 📁 atm_daily
    │           ├── 📁 native
    │           │   ├── 🇦 co2 (4, 64, 128) float64
    │           │   ├── 🇦 tas (4, 64, 128) float64
    │           │   └── 🇦 pr (4, 64, 128) float64
    │           └── 📁 latlon
    │               ├── 🇦 pr (4, 64, 128) float64
    │               ├── 🇦 tas (4, 64, 128) float64
    │               └── 🇦 co2 (4, 64, 128) float64
    └── 📁 model2
        └── 📁 historical
            └── 📁 atm_daily
                └── 📁 native
                    ├── 🇦 co2 (4, 64, 128) float64
                    ├── 🇦 tas (4, 64, 128) float64
                    └── 🇦 pr (4, 64, 128) float64

tip

There is no order to the nodes in the tree.
The tree is a lot nicer to navigate with ipytree installed.

We see a mix of variables from the land and atmosphere component models.

Matching the value of a single attribute

Now let's filter to include only variables whose CF attribute standard_name is "air_temperature".

cmip_repo.tree(filter="standard_name == 'air_temperature'")

Output
/
├── 📁 CMIP
│   ├── 📁 model1
│   │   └── 📁 historical
│   │       ├── 📁 atm_daily
│   │       │   ├── 📁 native
│   │       │   │   └── 🇦 tas (4, 64, 128) float64
│   │       │   └── 📁 latlon
│   │       │       └── 🇦 tas (4, 64, 128) float64
│   │       ├── 📁 land_daily
│   │       │   ├── 📁 latlon
│   │       │   │   ├── 🇦 tasmax (4, 64, 128) float64
│   │       │   │   └── 🇦 tasmin (4, 64, 128) float64
│   │       │   └── 📁 native
│   │       │       ├── 🇦 tasmin (4, 64, 128) float64
│   │       │       └── 🇦 tasmax (4, 64, 128) float64
│   │       └── 📁 land_monthly
│   │           ├── 📁 latlon
│   │           │   ├── 🇦 tasmin (4, 64, 128) float64
│   │           │   └── 🇦 tasmax (4, 64, 128) float64
│   │           └── 📁 native
│   │               ├── 🇦 tasmax (4, 64, 128) float64
│   │               └── 🇦 tasmin (4, 64, 128) float64
│   └── 📁 model2
│       └── 📁 historical
│           ├── 📁 land_daily
│           │   └── 📁 native
│           │       ├── 🇦 tasmax (4, 64, 128) float64
│           │       └── 🇦 tasmin (4, 64, 128) float64
│           ├── 📁 land_monthly
│           │   └── 📁 native
│           │       ├── 🇦 tasmax (4, 64, 128) float64
│           │       └── 🇦 tasmin (4, 64, 128) float64
│           └── 📁 atm_daily
│               └── 📁 native
│                   └── 🇦 tas (4, 64, 128) float64
└── 📁 ScenarioMIP
    ├── 📁 model1
    │   └── 📁 historical
    │       └── 📁 atm_daily
    │           ├── 📁 native
    │           │   └── 🇦 tas (4, 64, 128) float64
    │           └── 📁 latlon
    │               └── 🇦 tas (4, 64, 128) float64
    └── 📁 model2
        └── 📁 historical
            └── 📁 atm_daily
                └── 📁 native
                    └── 🇦 tas (4, 64, 128) float64

Very nice! We get back only the arrays whose standard_name is air_temperature: tas, tasmin, and tasmax. Let's filter further to select only daily-frequency output on the native grid. The CMIP (CMOR) convention is to specify the frequency by setting the frequency attribute to 'day'.

important

It is safer to use backticks (`), that is, to specify literal values, for comparisons. While single quotes conveniently work for equality comparisons with strings, they will not work for comparisons with other data types. So repo.tree(filter="standard_name == 'root_depth'") will work but is not recommended.

cmip_repo.tree(
    filter="standard_name == `air_temperature` && frequency == `day` && grid==`native`"
)

Output
/
├── 📁 CMIP
│   ├── 📁 model1
│   │   └── 📁 historical
│   │       ├── 📁 atm_daily
│   │       │   └── 📁 native
│   │       │       └── 🇦 tas (4, 64, 128) float64
│   │       └── 📁 land_daily
│   │           └── 📁 native
│   │               ├── 🇦 tasmin (4, 64, 128) float64
│   │               └── 🇦 tasmax (4, 64, 128) float64
│   └── 📁 model2
│       └── 📁 historical
│           ├── 📁 land_daily
│           │   └── 📁 native
│           │       ├── 🇦 tasmax (4, 64, 128) float64
│           │       └── 🇦 tasmin (4, 64, 128) float64
│           └── 📁 atm_daily
│               └── 📁 native
│                   └── 🇦 tas (4, 64, 128) float64
└── 📁 ScenarioMIP
    ├── 📁 model1
    │   └── 📁 historical
    │       └── 📁 atm_daily
    │           └── 📁 native
    │               └── 🇦 tas (4, 64, 128) float64
    └── 📁 model2
        └── 📁 historical
            └── 📁 atm_daily
                └── 📁 native
                    └── 🇦 tas (4, 64, 128) float64
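
When a filter combines several attributes, assembling the string by hand is error-prone. The sketch below builds such a filter from a dictionary. The helper build_filter is hypothetical (not part of Arraylake), and it assumes the service accepts standard JMESPath syntax: double-quoted keys and backtick-wrapped JSON literals such as `"day"`, which is the fully quoted form of the shorter literals used above.

```python
import json


def build_filter(conditions):
    """Join equality conditions on attributes into one JMESPath filter string."""
    clauses = []
    for key, value in conditions.items():
        # Double quotes mark a key (identifier); this also covers keys containing ":".
        identifier = json.dumps(key)
        # Backticks wrap a JSON literal; escape any backtick inside the literal.
        literal = json.dumps(value).replace("`", "\\`")
        clauses.append(f"{identifier} == `{literal}`")
    return " && ".join(clauses)


query = build_filter(
    {"standard_name": "air_temperature", "frequency": "day", "grid": "native"}
)
print(query)
```

The resulting string could then be passed as the filter argument to tree or filter_metadata. Because values go through json.dumps, numbers stay numbers and strings stay strings.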

From tree to Xarray

Now let's read that into an Xarray object. Use Repo.filter_metadata to get the matching nodes as a list of paths.

tip

The results are unsorted!

results = cmip_repo.filter_metadata(
    filter="standard_name == `air_temperature` && frequency == `day` && grid == `native` && experiment_id == `historical`"
)
results

Output
['CMIP/model1/historical/land_daily/native/tasmax',
 'CMIP/model2/historical/land_daily/native/tasmin',
 'ScenarioMIP/model2/historical/atm_daily/native/tas',
 'CMIP/model1/historical/land_daily/native/tasmin',
 'CMIP/model2/historical/atm_daily/native/tas',
 'CMIP/model1/historical/atm_daily/native/tas',
 'CMIP/model2/historical/land_daily/native/tasmax',
 'ScenarioMIP/model1/historical/atm_daily/native/tas']

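One way to work with these results is to read each path into its own DataArray. Note that the dictionary below is keyed by group, so when several matching arrays share a group (for example tasmin and tasmax), only the last one read is kept.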

datasets = {}
for full_path in results:
    group, array = full_path.rsplit("/", maxsplit=1)
    da = cmip_repo.to_xarray(group=group)[array]
    datasets[group] = da
datasets

Output
{'CMIP/model1/historical/land_daily/native': <xarray.DataArray 'tasmin' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:min
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature,
 'CMIP/model2/historical/land_daily/native': <xarray.DataArray 'tasmax' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:max
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature,
 'ScenarioMIP/model2/historical/atm_daily/native': <xarray.DataArray 'tas' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:mean
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature,
 'CMIP/model2/historical/atm_daily/native': <xarray.DataArray 'tas' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:mean
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature,
 'CMIP/model1/historical/atm_daily/native': <xarray.DataArray 'tas' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:mean
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature,
 'ScenarioMIP/model1/historical/atm_daily/native': <xarray.DataArray 'tas' (time: 4, nlat: 64, nlon: 128)>
 [32768 values with dtype=float64]
 Dimensions without coordinates: time, nlat, nlon
 Attributes:
     cell_methods:   time:mean
     experiment_id:  historical
     frequency:      day
     grid:           native
     standard_name:  air_temperature}

The resulting dictionary of DataArrays can be manipulated to a more useful form using xarray's combining functions, or the excellent, but experimental, xarray-datatree library.
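
The loop above keys its dictionary by group, so the eight result paths collapse to six entries. One way to keep every match is to first collect the array names per group. This is a minimal sketch in plain Python using the paths listed in the output above; the helper name arrays_by_group is hypothetical.

```python
from collections import defaultdict

# Paths as returned by filter_metadata in the example above.
results = [
    "CMIP/model1/historical/land_daily/native/tasmax",
    "CMIP/model2/historical/land_daily/native/tasmin",
    "ScenarioMIP/model2/historical/atm_daily/native/tas",
    "CMIP/model1/historical/land_daily/native/tasmin",
    "CMIP/model2/historical/atm_daily/native/tas",
    "CMIP/model1/historical/atm_daily/native/tas",
    "CMIP/model2/historical/land_daily/native/tasmax",
    "ScenarioMIP/model1/historical/atm_daily/native/tas",
]


def arrays_by_group(paths):
    """Map each parent group to the sorted names of its matching arrays."""
    grouped = defaultdict(list)
    for full_path in paths:
        group, array = full_path.rsplit("/", maxsplit=1)
        grouped[group].append(array)
    return {group: sorted(names) for group, names in grouped.items()}


grouped = arrays_by_group(results)
for group, names in sorted(grouped.items()):
    print(group, names)
```

Assuming to_xarray returns an Xarray Dataset, as the indexing in the loop above suggests, each group could then be opened once and subset with its list of names.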

An example STAC-like dataset

We now introduce a second dataset, inspired by STAC (SpatioTemporal Asset Catalogs).

stac_repo = client.get_or_create_repo("earthmover/search-stac-like")
stac_repo

Output
<arraylake.repo.Repo 'earthmover/search-stac-like'>

for itime in range(1, 6):
    group = stac_repo.root_group.create_group(f"staclike/sensor/time{itime}", overwrite=True)
    group.attrs.update(
        {"created:at:time": f"2022-05-{itime:02d}", "timestamp_number": itime}
    )
    for band in ["nir", "blue"]:
        array = group.create_dataset(
            band,
            shape=(1, 64, 128),
            overwrite=True,
            fill_value=np.nan,
            dtype=np.float64,
        )
        array.attrs.update({"_ARRAY_DIMENSIONS": ["time", "y", "x"]})
stac_repo.commit("added STAC-like dataset")

stac_repo.tree()

Output
/
└── 📁 staclike
    └── 📁 sensor
        ├── 📁 time4
        │   ├── 🇦 blue (1, 64, 128) float64
        │   └── 🇦 nir (1, 64, 128) float64
        ├── 📁 time5
        │   ├── 🇦 blue (1, 64, 128) float64
        │   └── 🇦 nir (1, 64, 128) float64
        ├── 📁 time1
        │   ├── 🇦 blue (1, 64, 128) float64
        │   └── 🇦 nir (1, 64, 128) float64
        ├── 📁 time2
        │   ├── 🇦 blue (1, 64, 128) float64
        │   └── 🇦 nir (1, 64, 128) float64
        └── 📁 time3
            ├── 🇦 blue (1, 64, 128) float64
            └── 🇦 nir (1, 64, 128) float64

Comparing values

Comparisons against literal values that are not strings are allowed. As earlier, use backticks (`) to specify literal values for comparisons.

For example, compare dates:

stac_repo.tree(filter='"created:at:time" <= `2022-05-03`')

Output
/
└── 📁 staclike
    └── 📁 sensor
        ├── 📁 time3
        ├── 📁 time2
        └── 📁 time1

info

JMESPath does not treat dates specially. The example above compares strings, meaning the 'created:at:time' entry in the attribute dictionary must contain a string.
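
String comparison gives the right answer here only because the dates are zero-padded ISO 8601 strings, whose alphabetical order matches calendar order. The following plain-Python sketch illustrates the point; it does not call Arraylake, and the values mirror the attributes written above.

```python
from datetime import date

# The same date strings that were stored in the "created:at:time" attribute.
stamps = [f"2022-05-{day:02d}" for day in range(1, 6)]

# Lexicographic comparison of zero-padded ISO 8601 strings follows calendar order.
selected = [s for s in stamps if s <= "2022-05-03"]
print(selected)  # ['2022-05-01', '2022-05-02', '2022-05-03']

# Without zero padding, string order and calendar order disagree.
print("2022-5-2" <= "2022-05-03")  # False, although 2 May precedes 3 May
print(date(2022, 5, 2) <= date(2022, 5, 3))  # True
```

Storing dates in a consistent, zero-padded format therefore matters if they are meant to be compared in filters.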

Compare to integers (again, specify literals using backticks (`)):

stac_repo.tree(filter='"timestamp_number" < `3`')

Output
/
└── 📁 staclike
    └── 📁 sensor
        ├── 📁 time2
        └── 📁 time1

Handling queries with no results

How about if there are no results for a query?

warning

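The comparison of two missing keys is truthy! In JMESPath, double quotes denote a key (an identifier), not a string, so the following filter string compares the keys foo and bar and will match all entries if neither exists as an attribute.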

filter='foo == "bar"'

stac_repo.tree(filter="foo == bar")

Output
/
└── 📁 staclike
    └── 📁 sensor
        ├── 📁 time3
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time2
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time4
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time5
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        └── 📁 time1
            ├── 🇦 blue (1, 64, 128) float64
            └── 🇦 nir (1, 64, 128) float64

Ouch! That's just the whole repository!

Use the contains function to check for the presence of the key before asserting equality, and write the values as literals rather than double-quoted keys:

stac_repo.tree(filter="contains(keys(@), `foo`) && foo == `bar`")

No results, since none of our groups or arrays have the attribute key "foo". This is much more sensible. For more, see the JMESPath docs on functions.
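
The pitfall can be reproduced with an ordinary Python dictionary. In JMESPath a reference to a missing key evaluates to null, and null equals null. The sketch below is only an analogy using dict.get; the function names naive_match and guarded_match are hypothetical and nothing here calls Arraylake.

```python
def naive_match(attrs, left, right):
    """Compare two key references; a missing key evaluates to None (null)."""
    return attrs.get(left) == attrs.get(right)


def guarded_match(attrs, key, value):
    """Require the key to exist before comparing its value to a literal."""
    return key in attrs and attrs[key] == value


attrs = {"created:at:time": "2022-05-01", "timestamp_number": 1}

print(naive_match(attrs, "foo", "bar"))  # True: both keys are missing
print(guarded_match(attrs, "foo", "bar"))  # False: the key check fails first
print(guarded_match(attrs, "timestamp_number", 1))  # True
```

The guarded form corresponds to combining a contains(keys(@), ...) check with the equality test.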

Advanced Examples

Inexact matches

Use the contains function to search for a particular substring in a value. Here we search for all variables whose cell_methods attribute contains 'max'; the key check comes first so that nodes without cell_methods are skipped.

cmip_repo.tree(
    filter="contains(keys(@), `cell_methods`) && contains(cell_methods, 'max')"
)

Output
/
└── 📁 CMIP
    ├── 📁 model2
    │   └── 📁 historical
    │       ├── 📁 land_monthly
    │       │   └── 📁 native
    │       │       └── 🇦 tasmax (4, 64, 128) float64
    │       └── 📁 land_daily
    │           └── 📁 native
    │               └── 🇦 tasmax (4, 64, 128) float64
    └── 📁 model1
        └── 📁 historical
            ├── 📁 land_daily
            │   ├── 📁 latlon
            │   │   └── 🇦 tasmax (4, 64, 128) float64
            │   └── 📁 native
            │       └── 🇦 tasmax (4, 64, 128) float64
            └── 📁 land_monthly
                ├── 📁 latlon
                │   └── 🇦 tasmax (4, 64, 128) float64
                └── 📁 native
                    └── 🇦 tasmax (4, 64, 128) float64

Filtering arrays by dimension names

While Arraylake does not enforce a particular metadata convention, we can take advantage of conventions in Zarr. For example, dimension names are stored under the special key _ARRAY_DIMENSIONS.

This is a more complicated way of getting back just the variables on the latlon grid; that is, we could just use filter="grid == 'latlon'".

cmip_repo.tree(filter="_ARRAY_DIMENSIONS == ['time', 'latitude', 'longitude']")

Output
/
├── 📁 CMIP
│   └── 📁 model1
│       └── 📁 historical
│           ├── 📁 atm_daily
│           │   └── 📁 latlon
│           │       ├── 🇦 co2 (4, 64, 128) float64
│           │       ├── 🇦 tas (4, 64, 128) float64
│           │       └── 🇦 pr (4, 64, 128) float64
│           ├── 📁 land_daily
│           │   └── 📁 latlon
│           │       ├── 🇦 tasmax (4, 64, 128) float64
│           │       ├── 🇦 rootd (4, 64, 128) float64
│           │       └── 🇦 tasmin (4, 64, 128) float64
│           └── 📁 land_monthly
│               └── 📁 latlon
│                   ├── 🇦 tasmin (4, 64, 128) float64
│                   ├── 🇦 rootd (4, 64, 128) float64
│                   └── 🇦 tasmax (4, 64, 128) float64
└── 📁 ScenarioMIP
    └── 📁 model1
        └── 📁 historical
            └── 📁 atm_daily
                └── 📁 latlon
                    ├── 🇦 pr (4, 64, 128) float64
                    ├── 🇦 co2 (4, 64, 128) float64
                    └── 🇦 tas (4, 64, 128) float64

Search for entries in the _ARRAY_DIMENSIONS list by position:

stac_repo.tree(filter="_ARRAY_DIMENSIONS[0] == 'time'")  # just all the data

Output
/
└── 📁 staclike
    └── 📁 sensor
        ├── 📁 time2
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time3
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time4
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        ├── 📁 time5
        │   ├── 🇦 nir (1, 64, 128) float64
        │   └── 🇦 blue (1, 64, 128) float64
        └── 📁 time1
            ├── 🇦 blue (1, 64, 128) float64
            └── 🇦 nir (1, 64, 128) float64

We can check if an array's dimensions contain a specific dimension name:

cmip_repo.tree(filter="contains(_ARRAY_DIMENSIONS, 'time')")

Output
/
├── 📁 ScenarioMIP
│   ├── 📁 model1
│   │   └── 📁 historical
│   │       └── 📁 atm_daily
│   │           ├── 📁 latlon
│   │           │   ├── 🇦 tas (4, 64, 128) float64
│   │           │   ├── 🇦 pr (4, 64, 128) float64
│   │           │   └── 🇦 co2 (4, 64, 128) float64
│   │           └── 📁 native
│   │               ├── 🇦 tas (4, 64, 128) float64
│   │               ├── 🇦 co2 (4, 64, 128) float64
│   │               └── 🇦 pr (4, 64, 128) float64
│   └── 📁 model2
│       └── 📁 historical
│           └── 📁 atm_daily
│               └── 📁 native
│                   ├── 🇦 co2 (4, 64, 128) float64
│                   ├── 🇦 pr (4, 64, 128) float64
│                   └── 🇦 tas (4, 64, 128) float64
└── 📁 CMIP
    ├── 📁 model1
    │   └── 📁 historical
    │       ├── 📁 atm_daily
    │       │   ├── 📁 latlon
    │       │   │   ├── 🇦 tas (4, 64, 128) float64
    │       │   │   ├── 🇦 co2 (4, 64, 128) float64
    │       │   │   └── 🇦 pr (4, 64, 128) float64
    │       │   └── 📁 native
    │       │       ├── 🇦 pr (4, 64, 128) float64
    │       │       ├── 🇦 tas (4, 64, 128) float64
    │       │       └── 🇦 co2 (4, 64, 128) float64
    │       ├── 📁 land_daily
    │       │   ├── 📁 latlon
    │       │   │   ├── 🇦 rootd (4, 64, 128) float64
    │       │   │   ├── 🇦 tasmax (4, 64, 128) float64
    │       │   │   └── 🇦 tasmin (4, 64, 128) float64
    │       │   └── 📁 native
    │       │       ├── 🇦 tasmin (4, 64, 128) float64
    │       │       ├── 🇦 rootd (4, 64, 128) float64
    │       │       └── 🇦 tasmax (4, 64, 128) float64
    │       └── 📁 land_monthly
    │           ├── 📁 latlon
    │           │   ├── 🇦 rootd (4, 64, 128) float64
    │           │   ├── 🇦 tasmin (4, 64, 128) float64
    │           │   └── 🇦 tasmax (4, 64, 128) float64
    │           └── 📁 native
    │               ├── 🇦 tasmax (4, 64, 128) float64
    │               ├── 🇦 rootd (4, 64, 128) float64
    │               └── 🇦 tasmin (4, 64, 128) float64
    └── 📁 model2
        └── 📁 historical
            ├── 📁 atm_daily
            │   └── 📁 native
            │       ├── 🇦 co2 (4, 64, 128) float64
            │       ├── 🇦 tas (4, 64, 128) float64
            │       └── 🇦 pr (4, 64, 128) float64
            ├── 📁 land_daily
            │   └── 📁 native
            │       ├── 🇦 tasmax (4, 64, 128) float64
            │       ├── 🇦 tasmin (4, 64, 128) float64
            │       └── 🇦 rootd (4, 64, 128) float64
            └── 📁 land_monthly
                └── 📁 native
                    ├── 🇦 tasmax (4, 64, 128) float64
                    ├── 🇦 rootd (4, 64, 128) float64
                    └── 🇦 tasmin (4, 64, 128) float64
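
The contains function has two modes in JMESPath: on a string it tests for a substring, and on a list it tests for an element equal to the search value. Python's in operator behaves the same way for str and list, so it serves as a rough local analogy. The sketch below uses attribute values from the example dataset and makes no Arraylake calls.

```python
cell_methods = "time:max"
dims = ["time", "nlat", "nlon"]

# String subject: substring test, as in contains(cell_methods, 'max').
print("max" in cell_methods)  # True

# List subject: whole-element test, as in contains(_ARRAY_DIMENSIONS, 'time').
print("time" in dims)  # True

# A partial dimension name does not match a list element ...
print("lat" in dims)  # False

# ... even though it is a substring of one of the names.
print("lat" in "nlat")  # True
```

This is why searching _ARRAY_DIMENSIONS for 'lat' would not match the nlat or latitude dimensions.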

Recommendations

Handle special characters in keys, for example ":", by quoting the key with double quotes, as in "created:at:time". Single quotes create a string literal rather than a key reference, so
```
repo.filter_metadata("'created:at:time' <= `2022-05-03`")
```
will not return any results but will not raise an error.
NaNs are strings, so NaN comparisons should use raw strings with single quotes
```
     "someKey == 'NaN'"
```
The following will not match NaN values:
```
    "someNaN == NaN"
    "someNaN == `NaN`"
```
The comparison of two missing keys is truthy! The following filter string will match all entries if 'foo' is not an attribute that exists.
```
    filter='foo == "bar"'
```

Not supported at the moment

Filtering by group name or array name is not supported at the moment.
It is also not possible to limit results to only arrays or only groups at the moment.
