Client Library Installation and Configuration

To use Arraylake from your data science environment, install the Python client library.

Installing the Client Package

Info: The Arraylake client supports Python versions 3.10 through 3.12.
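Because the supported range is bounded on both sides, it can be worth checking the interpreter before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and not part of the Arraylake client.

```python
import sys

# Supported range taken from the note above: 3.10 through 3.12 inclusive.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) falls within the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if not is_supported_python():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} is outside the supported range")
```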

To install the minimal version of the client, run

pip install "arraylake[icechunk]"

in your Python environment of choice.

There are several optional extras you can install.

  • The virtual extra includes the ability to create virtual datasets
  • The xarray extra includes a compatible version of Xarray and its dependencies
  • The widgets extra includes integrations with Ipywidgets (e.g. repo.tree())

Most users will want the xarray and virtual extras, which can be installed via

pip install "arraylake[icechunk,xarray,virtual]"

The client can also be installed via Conda:

conda install --channel conda-forge arraylake icechunk
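After installing, a quick way to confirm which pieces ended up in the environment is to look them up without importing them. This sketch uses only the standard library; the helper name `installed_status` is hypothetical, and the module names checked for the optional extras (`xarray`, `kerchunk`, `ipytree`) are assumptions drawn from the dependency table in the next section.

```python
from importlib import metadata, util


def installed_status(modules):
    """Map each top-level module name to whether it can be found."""
    return {name: util.find_spec(name) is not None for name in modules}


# Core client and Icechunk, then modules assumed to back the optional extras.
status = installed_status(["arraylake", "icechunk", "xarray", "kerchunk", "ipytree"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")

try:
    print("arraylake version:", metadata.version("arraylake"))
except metadata.PackageNotFoundError:
    print("arraylake is not installed in this environment")
```

A module reported as missing only means the corresponding extra was not installed; the core client does not need it.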

Dependencies

The table below outlines Arraylake's core and optional dependencies.

Arraylake Dependencies
| Package | Versions | Notes |
|---|---|---|
| python | >=3.10,<3.13 | |
| aiobotocore[boto3] | >=1.33.2,<2.0 | |
| aioitertools | ^0.11.0 | |
| boto3 | >=2.10.0,<3.0 | |
| botocore | >=1.33.2,<2.0 | |
| cachetools | >=5.3.2,<6.0 | |
| cfgrib | ~=0.9 | Optional, required for virtual datasets (GRIB) |
| click | ~=8.1 | |
| donfig | >=0.7,<1.0 | |
| eccodes | >=2.37 | Optional, required for virtual datasets (GRIB) |
| fsspec | >=2024.2.0 | |
| gcsfs | >=2024.2.0 | |
| h5py | | Optional, required for virtual datasets (NetCDF4, HDF5) |
| httpx | >=0.23.0,<0.28 | |
| humanize | ~4.9.0 | |
| imagecodecs | | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
| ipytree | ~=0.2.2 | Optional, required for Tree widget |
| kerchunk | ~=0.1,!=0.2.1 | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
| numcodecs | >=0.13.1 | |
| numpy | ^1.23 | |
| packaging | >=0.23 | |
| pydantic[email] | >=2.3,<2.10 | |
| python-dateutil | ^2.8 | |
| rich | >=12.6,<14.0 | |
| ruamel-yaml | ~=~0.17 | |
| s3fs | >=2024.02.0 | |
| structlog | ^24.1.0 | |
| sqlitedict | ~=2.1 | |
| tifffile | >=2023.2.27 | Optional, required for virtual datasets (TIFF) |
| typer | >=0.6.1,<1.0 | |
| xarray | >=v2022.12.0,<= 2024.09.0 | Optional, required for Xarray integration |
| zarr | >=2.18,<3 | |
| uvloop | >=0.17,<1.0 | Optional, POSIX only |

Client Configuration

Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.

Setting Configuration Options

Configuration options can be set via CLI or Python API.

# print your current configuration
arraylake config list

Config options can also be temporarily overridden in the CLI and in Python:

# temporarily set a config setting
arraylake --config service.ssl.verify=False repo create myorg/myrepo
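When scripting the CLI, for example from a notebook or a CI job, the temporary override can be assembled programmatically. The sketch below only builds the argument list for the command shown above; `override_command` is a hypothetical helper, not part of Arraylake, and it handles a single `--config key=value` pair because that is the only form shown here.

```python
import shlex


def override_command(key, value, *args):
    """Build an `arraylake` argv list with one temporary config override."""
    return ["arraylake", "--config", f"{key}={value}", *args]


argv = override_command("service.ssl.verify", False, "repo", "create", "myorg/myrepo")
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` rather than a shell string avoids quoting problems when a value contains spaces, such as a certificate path.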

Configuration Reference

| Field | Type | Example |
|---|---|---|
| service.uri | string | https://api.earthmover.io |
| service.ssl.verify | bool | True |
| service.ssl.cafile | string | /path/to/cert.pem |
| user.diagnostics | bool | True |

  • service.uri - The Earthmover backend service to talk to.
  • service.ssl.verify - Whether to verify SSL certificates for https requests.
  • service.ssl.cafile - Trusted certificates to use for SSL verification in PEM format.
  • user.diagnostics - The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.

Deprecated Options

The following options are deprecated and should not be modified.

Deprecated Options
| Field | Type | Example |
|---|---|---|
| server_managed_sessions | bool | True |
| chunkstore.uri | string | s3://mychunkstore |
| s3.endpoint_url | string | https://s3.wasabisys.com |
| s3.anon | bool | True |
| s3.verify | bool | True |
| gs.project | string | my-gs-project |
| gs.token | string | anon |
| user.org | string | earthmover |
| chunkstore.hash_method | string | hashlib.sha256 |
| chunkstore.use_delegated_credentials | bool | True |
| async.batch_size | int | 10 |

The following settings can now be set in Icechunk instead.

| Field | Type | Example |
|---|---|---|
| chunkstore.inline_threshold_bytes | int | 512 |
| chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |

The following can now be set in Zarr instead.

| Field | Type | Example |
|---|---|---|
| async.concurrency | int | 4 |