Client Library Installation and Configuration

To use Arraylake from your data science environment, install the Python client library.

Installing the Client Package

info

The Arraylake Client supports Python versions 3.10 - 3.12.
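
Before installing, it can help to confirm that the active interpreter falls inside that range. The snippet below is a minimal sketch using only the standard library; the helper name is_supported is hypothetical and is not part of the Arraylake client.

```python
import sys

# Supported range taken from the note above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range.

    `version` is any tuple-like starting with (major, minor); it defaults to
    the running interpreter's version. Hypothetical helper, for illustration.
    """
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```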

To install the minimal version of the client, run

pip install "arraylake[icechunk]"

in your Python environment of choice.

There are several optional extras you can install.

  • The virtual extra includes the ability to create virtual datasets
  • The xarray extra includes a compatible version of Xarray and its dependencies
  • The widgets extra includes integrations with ipywidgets (e.g. repo.tree())

Most users will probably want the xarray and virtual extras in addition to icechunk, which can be installed via

pip install "arraylake[icechunk,xarray,virtual]"

The client can also be installed via Conda:

conda install --channel conda-forge arraylake icechunk

Dependencies

The table below outlines Arraylake's core and optional dependencies.

Arraylake Dependencies

Package             Versions                  Notes
python              >=3.10,<3.13
aiobotocore[boto3]  >=1.33.2,<2.0
aioitertools        ^0.11.0
boto3               >=2.10.0,<3.0
botocore            >=1.33.2,<2.0
cachetools          >=5.3.2,<6.0
cfgrib              ~=0.9                     Optional, required for virtual datasets (GRIB)
cf_xarray           0.10.4                    Optional, helpful for xarray integration
click               ~=8.1
donfig              >=0.7,<1.0
eccodes             >=2.37                    Optional, required for virtual datasets (GRIB)
fsspec              >=2024.2.0
gcsfs               >=2024.2.0
h5py                                          Optional, required for virtual datasets (NetCDF4, HDF5)
httpx               >=0.23.0,<0.28
humanize            ~4.9.0
imagecodecs                                   Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB)
ipytree             ~=0.2.2                   Optional, required for Tree widget
kerchunk            >=0.2.8, <0.3             Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB)
numcodecs           >=0.13.1
numpy               ^1.23
packaging           >=0.23
pydantic[email]     >=2.3,<2.10
python-dateutil     ^2.8
rich                >=12.6,<14.0
ruamel-yaml         ~=0.17
s3fs                >=2024.02.0
structlog           ^24.1.0
sqlitedict          ~=2.1
tifffile            >=2023.2.27               Optional, required for virtual datasets (TIFF)
typer               >=0.6.1,<1.0
xarray              >=2024.10.0               Optional, required for Xarray integration
zarr                >=2.18, !=3.0.3, !=3.0.5
uvloop              >=0.17,<1.0               Optional, POSIX only
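
To compare an existing environment against these dependencies, the installed versions can be read from package metadata. This is a rough sketch using the standard library's importlib.metadata; the helper name installed_versions and the selection of packages are illustrative assumptions, and the sketch only reports versions rather than validating them against the listed ranges.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of the Arraylake client.
    """
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # A few distribution names from the install commands and the dependency list.
    names = ["arraylake", "icechunk", "xarray", "zarr", "kerchunk"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```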

Client Configuration

Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.

Setting Configuration Options

Configuration options can be set via the CLI or the Python API.

# print your current configuration
arraylake config list

Config options can also be temporarily overridden in the CLI and in Python:

# temporarily set a config setting
arraylake --config service.ssl.verify=False repo create myorg/myrepo
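
When the CLI is driven from a Python script, the same temporary override can be assembled as an argument list. The sketch below only builds and prints the command shown above; the helper name with_config_override is hypothetical, and it deliberately handles a single key=value override because that is the only form shown here.

```python
import shlex


def with_config_override(key, value, *command):
    """Build an argv list for an arraylake subcommand with one temporary override.

    Hypothetical helper: it mirrors the `--config key=value` form shown above
    and does not call into the Arraylake client itself.
    """
    return ["arraylake", "--config", f"{key}={value}", *command]


argv = with_config_override(
    "service.ssl.verify", False, "repo", "create", "myorg/myrepo"
)

# Print the shell-equivalent command line for inspection.
print(shlex.join(argv))

# To actually run it (requires the arraylake CLI to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting issues if a value ever contains spaces.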

Configuration Reference

Field                                 Type    Example
service.uri                           string  https://api.earthmover.io
service.ssl.verify                    bool    True
service.ssl.cafile                    string  /path/to/cert.pem
icechunk.scatter_initial_credentials  bool    True
user.diagnostics                      bool    True

  • service.uri - The Earthmover backend service to talk to.

  • service.ssl.verify - Whether to verify SSL certificates for https requests.

  • service.ssl.cafile - Trusted certificates to use for SSL verification in PEM format.

  • icechunk.scatter_initial_credentials - If True, immediately fetch and cache a set of temporary credentials using Icechunk's get_credentials() function. This only applies to Icechunk repositories backed by object storage buckets that are configured to use a customer-managed IAM role.

    This option is helpful if you're going to pickle and distribute the Icechunk repository or session object (e.g., for parallel processing), as it avoids triggering fresh credential requests in each worker.

    warning

    If you enable this, the credentials will be stored in the object and may be included when the object is serialized (e.g., pickled). If that object is transmitted over the network or saved to disk, the credentials will go with it. Use with care in distributed settings.

    If not set in the runtime config, scatter_initial_credentials will be set to True. See the Icechunk docs for details.

  • user.diagnostics - The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.
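
The warning under icechunk.scatter_initial_credentials can be illustrated without Arraylake or Icechunk at all. In the sketch below, a plain dict stands in for a repository or session object that has cached temporary credentials; the field names and the secret value are made up for illustration. It shows that anything stored on the object ends up in the pickled bytes.

```python
import pickle

# Hypothetical stand-in for an object holding cached temporary credentials.
# The keys and values are invented for this example.
session_state = {
    "repo": "myorg/myrepo",
    "credentials": {"secret_access_key": "EXAMPLE-SECRET"},
}

# Serializing the object, as a parallel framework would before sending it
# to workers, also serializes the credentials it carries.
payload = pickle.dumps(session_state)

# The secret is present in the raw serialized bytes.
leaked = b"EXAMPLE-SECRET" in payload
print(f"secret present in pickled bytes: {leaked}")

# Whoever receives the payload can recover the credentials.
restored = pickle.loads(payload)
print(restored["credentials"]["secret_access_key"])
```

Treat any pickled copy of such an object, whether on disk or in transit, with the same care as the credentials themselves.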

Deprecated Options

The following options are deprecated and should not be modified.

Deprecated Options

Field                                 Type    Example
server_managed_sessions               bool    True
chunkstore.uri                        string  s3://mychunkstore
s3.endpoint_url                       string  https://s3.wasabisys.com
s3.anon                               bool    True
s3.verify                             bool    True
gs.project                            string  my-gs-project
gs.token                              string  anon
user.org                              string  earthmover
chunkstore.hash_method                string  hashlib.sha256
chunkstore.use_delegated_credentials  bool    True
async.batch_size                      int     10

The following settings can now be set in Icechunk instead.

Field                                                Type    Example
chunkstore.inline_threshold_bytes                    int     512
chunkstore.unsafe_use_fill_value_for_missing_chunks  bool    False

The following can now be set in Zarr instead.

Field              Type  Example
async.concurrency  int   4