Client Library Installation and Configuration
To use Arraylake from your data science environment, install the Python client library.
Installing the Client Package
The Arraylake Client supports Python versions 3.10 - 3.12.
To install the minimal version of the client, run
pip install "arraylake[icechunk]"
in your Python environment of choice.
There are several optional extras you can install.
- The virtual extra includes the ability to create virtual datasets
- The xarray extra includes a compatible version of Xarray and its dependencies
- The widgets extra includes integrations with Ipywidgets (e.g. repo.tree())
Most users will probably want both the xarray and virtual extras, which can be installed via
pip install "arraylake[icechunk,xarray,virtual]"
The client can also be installed via Conda:
conda install --channel conda-forge arraylake icechunk
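To confirm the installation from inside Python, a quick check along the following lines may help. This is a minimal sketch using only the standard library. The helper names python_supported and installed_version are illustrative and not part of the Arraylake API, and the sketch assumes the distributions are named arraylake and icechunk, as in the install commands above.

```python
import sys
from importlib import metadata


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter falls in the documented 3.10 - 3.12 range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    # Distribution names are assumed to match the install commands above.
    for name in ("arraylake", "icechunk"):
        print(name, "->", installed_version(name) or "not installed")
```

If either package is reported as not installed, re-run the pip or Conda command in the same environment that this interpreter belongs to.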
Dependencies
The table below outlines Arraylake's core and optional dependencies.
Arraylake Dependencies
Package | Versions | Notes |
---|---|---|
python | >=3.10,<3.13 | |
aiobotocore[boto3] | >=1.33.2,<2.0 | |
aioitertools | ^0.11.0 | |
boto3 | >=2.10.0,<3.0 | |
botocore | >=1.33.2,<2.0 | |
cachetools | >=5.3.2,<6.0 | |
cfgrib | ~=0.9 | Optional, required for virtual datasets (GRIB) |
cf_xarray | 0.10.4 | Optional, helpful for xarray integration |
click | ~=8.1 | |
donfig | >=0.7,<1.0 | |
eccodes | >=2.37 | Optional, required for virtual datasets (GRIB) |
fsspec | >=2024.2.0 | |
gcsfs | >=2024.2.0 | |
h5py | | Optional, required for virtual datasets (NetCDF4, HDF5) |
httpx | >=0.23.0,<0.28 | |
humanize | ~4.9.0 | |
imagecodecs | | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
ipytree | ~=0.2.2 | Optional, required for Tree widget |
kerchunk | >=0.2.8, <0.3 | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
numcodecs | >=0.13.1 | |
numpy | ^1.23 | |
packaging | >=0.23 | |
pydantic[email] | >=2.3,<2.10 | |
python-dateutil | ^2.8 | |
rich | >=12.6,<14.0 | |
ruamel-yaml | ~=0.17 | |
s3fs | >=2024.02.0 | |
structlog | ^24.1.0 | |
sqlitedict | ~=2.1 | |
tifffile | >=2023.2.27 | Optional, required for virtual datasets (TIFF) |
typer | >=0.6.1,<1.0 | |
xarray | >=v2024.10.0 | Optional, required for Xarray integration |
zarr | >=2.18, !=3.0.3, !=3.0.5 | |
uvloop | >=0.17,<1.0 | Optional, POSIX only |
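Several rows in the table use caret constraints such as ^0.11.0, which pip does not accept directly. The sketch below translates a caret constraint into an explicit range, assuming the table follows Poetry-style caret semantics (the upper bound bumps the leftmost non-zero component). That is an assumption, since the table does not say which tool's notation it uses, and caret_to_range is a hypothetical helper, not part of Arraylake.

```python
def caret_to_range(constraint):
    """Translate a caret constraint like '^0.11.0' into a '>=lower,<upper' range."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    lower = constraint[1:]
    parts = [int(p) for p in lower.split(".")]
    # Find the leftmost non-zero component; if all are zero, use the last one.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    # The upper bound keeps everything before that component and bumps it by one.
    upper = parts[:idx] + [parts[idx] + 1]
    # Pad so the upper bound always has at least major.minor.
    upper += [0] * max(0, 2 - len(upper))
    return f">={lower},<{'.'.join(map(str, upper))}"


# Caret constraints taken from the table above
for spec in ("^0.11.0", "^1.23", "^2.8", "^24.1.0"):
    print(spec, "->", caret_to_range(spec))
```

Under that assumption, ^0.11.0 corresponds to >=0.11.0,<0.12 and ^1.23 to >=1.23,<2.0.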
Client Configuration
Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.
Setting Configuration Options
Configuration options can be set via CLI or Python API.
- CLI
# print your current configuration
arraylake config list
- Python
from arraylake import config
# print your current configuration
config.pprint()
Config options can also be temporarily overridden in the CLI and in Python:
- CLI
# temporarily set a config setting
arraylake --config service.ssl.verify=False repo create myorg/myrepo
- Python
from arraylake import config, Client
# the override applies only inside the with block
with config.set({"service.ssl.verify": False}):
    client = Client()
    client.create_repo("myorg/myrepo")
    ...
Configuration Reference
Field | Type | Example |
---|---|---|
service.uri | string | https://api.earthmover.io |
service.ssl.verify | bool | True |
service.ssl.cafile | string | /path/to/cert.pem |
user.diagnostics | bool | True |
service.uri
- The Earthmover backend service to talk to.
service.ssl.verify
- Whether to verify SSL certificates for https requests.
service.ssl.cafile
- Trusted certificates to use for SSL verification in PEM format.
user.diagnostics
- The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.
Deprecated Options
The following options are deprecated and should not be modified.
Field | Type | Example |
---|---|---|
server_managed_sessions | bool | True |
chunkstore.uri | string | s3://mychunkstore |
s3.endpoint_url | string | https://s3.wasabisys.com |
s3.anon | bool | True |
s3.verify | bool | True |
gs.project | string | my-gs-project |
gs.token | string | anon |
user.org | string | earthmover |
chunkstore.hash_method | string | hashlib.sha256 |
chunkstore.use_delegated_credentials | bool | True |
async.batch_size | int | 10 |
The following settings can now be set in Icechunk instead.
Field | Type | Example |
---|---|---|
chunkstore.inline_threshold_bytes | int | 512 |
chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |
The following can now be set in Zarr instead.
Field | Type | Example |
---|---|---|
async.concurrency | int | 4 |