Client Library Installation and Configuration
To use Arraylake from your data science environment, install the Python client library.
Installing the Client Package
The Arraylake Client supports Python versions 3.10 - 3.12.
To install the minimal version of the client, run
pip install "arraylake[icechunk]"
in your Python environment of choice.
There are several optional extras you can install.
- The virtual extra includes the ability to create virtual datasets
- The xarray extra includes a compatible version of Xarray and its dependencies
- The widgets extra includes integrations with ipywidgets (e.g. repo.tree())
Most users will probably want both the xarray and virtual extras alongside icechunk, which can be installed via
pip install "arraylake[icechunk,xarray,virtual]"
The client can also be installed via Conda:
conda install --channel conda-forge arraylake icechunk
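After installing, a quick sanity check of the environment can save some debugging later. The sketch below is only an illustration using the standard library: it checks the interpreter against the 3.10 to 3.12 range stated above and reports whether a few modules can be found. The helper names `python_is_supported` and `is_installed` are hypothetical and are not part of the Arraylake client.

```python
import importlib.util
import sys

# Supported interpreter range stated above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    # Module names are assumed to match the package names used above.
    for name in ("arraylake", "icechunk", "xarray"):
        print(f"{name} installed:", is_installed(name))
```

Running this inside the environment you installed into shows at a glance whether the interpreter is in range and which of the packages are visible to it.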
Dependencies
The table below outlines Arraylake's core and optional dependencies.
Arraylake Dependencies
Package | Versions | Notes |
---|---|---|
python | >=3.10,<3.13 | |
aiobotocore[boto3] | >=1.33.2,<2.0 | |
aioitertools | ^0.11.0 | |
boto3 | >=2.10.0,<3.0 | |
botocore | >=1.33.2,<2.0 | |
cachetools | >=5.3.2,<6.0 | |
cfgrib | ~=0.9 | Optional, required for virtual datasets (GRIB) |
click | ~=8.1 | |
donfig | >=0.7,<1.0 | |
eccodes | >=2.37 | Optional, required for virtual datasets (GRIB) |
fsspec | >=2024.2.0 | |
gcsfs | >=2024.2.0 | |
h5py | | Optional, required for virtual datasets (NetCDF4, HDF5) |
httpx | >=0.23.0,<0.28 | |
humanize | ~4.9.0 | |
imagecodecs | | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
ipytree | ~=0.2.2 | Optional, required for Tree widget |
kerchunk | ~=0.1,!=0.2.1 | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
numcodecs | >=0.13.1 | |
numpy | ^1.23 | |
packaging | >=0.23 | |
pydantic[email] | >=2.3,<2.10 | |
python-dateutil | ^2.8 | |
rich | >=12.6,<14.0 | |
ruamel-yaml | ~=0.17 | |
s3fs | >=2024.02.0 | |
structlog | ^24.1.0 | |
sqlitedict | ~=2.1 | |
tifffile | >=2023.2.27 | Optional, required for virtual datasets (TIFF) |
typer | >=0.6.1,<1.0 | |
xarray | >=2022.12.0,<=2024.09.0 | Optional, required for Xarray integration |
zarr | >=2.18,<3 | |
uvloop | >=0.17,<1.0 | Optional, POSIX only |
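To see which of the optional packages in the table are present in an environment, the installed distributions can be queried with the standard library. This is a minimal sketch, assuming the PyPI distribution names match the package names in the table; the grouping and the helper names `installed_version` and `report_optional_packages` are hypothetical and only for illustration. It reports versions but does not validate them against the listed ranges.

```python
from importlib.metadata import PackageNotFoundError, version

# Optional packages from the table above, grouped by the feature they enable.
# Assumption: distribution names match the names shown in the table.
OPTIONAL_PACKAGES = {
    "virtual datasets": ["kerchunk", "h5py", "imagecodecs", "tifffile", "cfgrib", "eccodes"],
    "Xarray integration": ["xarray"],
    "Tree widget": ["ipytree"],
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report_optional_packages(groups=OPTIONAL_PACKAGES):
    """Map each feature to a {package: version-or-None} dictionary."""
    return {
        feature: {name: installed_version(name) for name in names}
        for feature, names in groups.items()
    }


if __name__ == "__main__":
    for feature, packages in report_optional_packages().items():
        print(feature)
        for name, found in packages.items():
            print(f"  {name}: {found or 'not installed'}")
```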
Client Configuration
Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.
Setting Configuration Options
Configuration options can be set via CLI or Python API.
CLI:
# print your current configuration
arraylake config list
Python:
from arraylake import config
# print your current configuration
config.pprint()
Config options can also be temporarily overridden in the CLI and in Python:
CLI:
# temporarily set a config setting
arraylake --config service.ssl.verify=False repo create myorg/myrepo
Python:
from arraylake import config, Client
# temporarily set a config setting
with config.set({"service.ssl.verify": False}):
    client = Client()
    client.create_repo("myorg/myrepo")
    ...
Configuration Reference
Field | Type | Example |
---|---|---|
service.uri | string | https://api.earthmover.io |
service.ssl.verify | bool | True |
service.ssl.cafile | string | /path/to/cert.pem |
user.diagnostics | bool | True |
service.uri
- The Earthmover backend service to talk to.
service.ssl.verify
- Whether to verify SSL certificates for HTTPS requests.
service.ssl.cafile
- Trusted certificates to use for SSL verification, in PEM format.
user.diagnostics
- The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.
Deprecated Options
The following options are deprecated and should not be modified.
Deprecated Options
Field | Type | Example |
---|---|---|
server_managed_sessions | bool | True |
chunkstore.uri | string | s3://mychunkstore |
s3.endpoint_url | string | https://s3.wasabisys.com |
s3.anon | bool | True |
s3.verify | bool | True |
gs.project | string | my-gs-project |
gs.token | string | anon |
user.org | string | earthmover |
chunkstore.hash_method | string | hashlib.sha256 |
chunkstore.use_delegated_credentials | bool | True |
async.batch_size | int | 10 |
The following settings can now be set in Icechunk instead.
Field | Type | Example |
---|---|---|
chunkstore.inline_threshold_bytes | int | 512 |
chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |
The following can now be set in Zarr instead.
Field | Type | Example |
---|---|---|
async.concurrency | int | 4 |