
Client Library Installation and Configuration

To use Arraylake from your data science environment, you need to install the Python client library.

Installing the Client Package


The Arraylake Client supports Python versions 3.10 - 3.12.
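Before installing, it can help to confirm that the interpreter in your current environment falls inside that range. The sketch below is a minimal check, assuming only the bounds stated above (3.10 up to, but not including, 3.13); the helper name `is_supported_python` is hypothetical and is not part of Arraylake.

```python
import sys

# Bounds taken from the supported range above: >=3.10,<3.13.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Hypothetical helper: True if (major, minor) lies in [3.10, 3.13)."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} is in the supported range.")
    else:
        print(f"Python {current} is outside the supported range (3.10 - 3.12).")
```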

To install the minimal version of the client, run

pip install arraylake

in your Python environment of choice.

There are several optional extras you can install.

  • The virtual extra includes the ability to create virtual datasets
  • The xarray extra includes a compatible version of Xarray and its dependencies
  • The widgets extra includes integrations with Ipywidgets (e.g. repo.tree())

Most users will probably want the xarray and virtual extras, which can be installed via

pip install "arraylake[xarray,virtual]"
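To see which optional integrations are already usable in an environment, you can check whether the packages behind each extra can be imported. The mapping below is an assumption assembled from the Notes column of the dependency table further down this page (the real extras may pin different or additional packages), and `EXTRA_PACKAGES` and `missing_packages` are hypothetical names.

```python
from importlib.util import find_spec

# Hypothetical mapping from each extra to the import names of the packages the
# dependency table marks as optional for that feature. The virtual entry also
# lists format-specific packages (cfgrib/eccodes for GRIB, tifffile for TIFF).
EXTRA_PACKAGES = {
    "xarray": ["xarray"],
    "virtual": ["kerchunk", "imagecodecs", "h5py", "tifffile", "cfgrib", "eccodes"],
    "widgets": ["ipytree"],
}


def missing_packages(extra: str) -> list[str]:
    """Return the packages for `extra` that cannot be found (without importing them)."""
    return [name for name in EXTRA_PACKAGES[extra] if find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRA_PACKAGES:
        missing = missing_packages(extra)
        status = "available" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra}: {status}")
```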

The client can also be installed via Conda:

conda install --channel conda-forge arraylake

Dependencies

The table below outlines Arraylake's core and optional dependencies.

Arraylake Dependencies
| Package | Versions | Notes |
| --- | --- | --- |
| python | >=3.10,<3.13 | |
| aiobotocore[boto3] | >=1.33.2,<2.0 | |
| aioitertools | ^0.11.0 | |
| boto3 | >=2.10.0,<3.0 | |
| botocore | >=1.33.2,<2.0 | |
| cachetools | >=5.3.2,<6.0 | |
| cfgrib | ~=0.9 | Optional, required for virtual datasets (GRIB) |
| click | ~=8.1 | |
| donfig | >=0.7,<1.0 | |
| eccodes | >=2.37 | Optional, required for virtual datasets (GRIB) |
| fsspec | >=2024.2.0 | |
| gcsfs | >=2024.2.0 | |
| h5py | | Optional, required for virtual datasets (NetCDF4, HDF5) |
| httpx | >=0.23.0,<0.28 | |
| humanize | ~4.9.0 | |
| imagecodecs | | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
| ipytree | ~=0.2.2 | Optional, required for the Tree widget |
| kerchunk | ~=0.1,!=0.2.1 | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
| numcodecs | >=0.13.1 | |
| numpy | ^1.23 | |
| packaging | >=0.23 | |
| pydantic[email] | >=2.3,<2.10 | |
| python-dateutil | ^2.8 | |
| rich | >=12.6,<14.0 | |
| ruamel-yaml | ~=~0.17 | |
| s3fs | >=2024.02.0 | |
| structlog | ^24.1.0 | |
| sqlitedict | ~=2.1 | |
| tifffile | >=2023.2.27 | Optional, required for virtual datasets (TIFF) |
| typer | >=0.6.1,<1.0 | |
| xarray | >=2022.12.0,<=2024.09.0 | Optional, required for Xarray integration |
| zarr | >=2.18 | |
| uvloop | >=0.17,<1.0 | Optional, POSIX only |
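The Versions column mixes PEP 440 operators (>=, <, !=, ~=) with Poetry-style caret (^) and tilde (~) shorthands. As a reading aid, the sketch below expands the two shorthands into explicit bounds, assuming Poetry's documented meanings for them; `expand_constraint` is a hypothetical helper, not something provided by Arraylake or pip.

```python
def _bump(parts: list[int], index: int) -> str:
    """Increment parts[index] and zero every later component."""
    bumped = parts[:index] + [parts[index] + 1] + [0] * (len(parts) - index - 1)
    return ".".join(str(p) for p in bumped)


def expand_constraint(spec: str) -> str:
    """Hypothetical helper: expand a Poetry-style ^ or ~ constraint.

    Assumes Poetry semantics and purely numeric versions. Any other
    specifier (>=, <, ~=, !=, ...) is returned unchanged.
    """
    spec = spec.strip()
    if spec.startswith("^"):
        version = spec[1:]
        parts = [int(p) for p in version.split(".")]
        # Caret: bump the left-most non-zero component (the last one if all are zero).
        nonzero = [i for i, p in enumerate(parts) if p != 0]
        index = nonzero[0] if nonzero else len(parts) - 1
    elif spec.startswith("~") and not spec.startswith("~="):
        version = spec[1:]
        parts = [int(p) for p in version.split(".")]
        # Tilde: bump the minor component if one is given, otherwise the major.
        index = 1 if len(parts) >= 2 else 0
    else:
        return spec
    return f">={version},<{_bump(parts, index)}"


if __name__ == "__main__":
    for spec in ["^0.11.0", "~4.9.0", "^1.23", "~=8.1"]:
        print(f"{spec:>8} -> {expand_constraint(spec)}")
```

Under these assumptions, aioitertools ^0.11.0 allows 0.11.x only, while numpy ^1.23 allows any 1.x release from 1.23 onward.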

Client Configuration

Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.

Setting Configuration Options

Configuration options can be set via the CLI or the Python API.

# set or update a configuration setting
arraylake config set chunkstore.inline_threshold_bytes 128

# get a configuration setting
arraylake config get chunkstore.inline_threshold_bytes

# print your current configuration
arraylake config list
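The dotted setting names (chunkstore.inline_threshold_bytes, service.ssl.verify, and so on) read as paths into nested sections of the configuration. Purely as an illustration of that naming scheme, the stdlib sketch below shows how a dotted key can address a value in a nested dictionary. `set_option` and `get_option` are hypothetical helpers; they are not Arraylake's Python API, and the nested-dict layout is an assumption.

```python
from typing import Any


def set_option(config: dict, key: str, value: Any) -> None:
    """Hypothetical helper: set a dotted key such as 'chunkstore.inline_threshold_bytes'."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        # Create intermediate sections (e.g. 'chunkstore') on demand.
        node = node.setdefault(part, {})
    node[leaf] = value


def get_option(config: dict, key: str, default: Any = None) -> Any:
    """Hypothetical helper: look up a dotted key, returning `default` if any level is missing."""
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


if __name__ == "__main__":
    config: dict = {}
    set_option(config, "chunkstore.inline_threshold_bytes", 128)
    set_option(config, "service.ssl.verify", True)
    print(get_option(config, "chunkstore.inline_threshold_bytes"))  # 128
```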

Config options can also be overridden temporarily, both in the CLI and in Python. In the CLI, pass --config key=value before the subcommand:

# temporarily set a config setting
arraylake --config chunkstore.inline_threshold_bytes=128 repo create myorg/myrepo
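A temporary override applies a setting for the duration of one operation and then restores the previous value. The sketch below illustrates that pattern with a Python context manager over a hypothetical nested settings dictionary; `temporary_option` is a hypothetical name, and Arraylake's own Python mechanism for temporary overrides is not shown on this page.

```python
from contextlib import contextmanager
from typing import Any, Iterator

_MISSING = object()  # sentinel: the key was not set before the override


@contextmanager
def temporary_option(config: dict, key: str, value: Any) -> Iterator[dict]:
    """Hypothetical helper: set a dotted key in a nested dict, restoring it on exit."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        # Intermediate sections created here are left in place (empty) after exit.
        node = node.setdefault(part, {})
    previous = node.get(leaf, _MISSING)
    node[leaf] = value
    try:
        yield config
    finally:
        # Restore the old value, or remove the key if it did not exist before.
        if previous is _MISSING:
            del node[leaf]
        else:
            node[leaf] = previous


if __name__ == "__main__":
    settings = {"chunkstore": {"inline_threshold_bytes": 512}}
    with temporary_option(settings, "chunkstore.inline_threshold_bytes", 128):
        print(settings["chunkstore"]["inline_threshold_bytes"])  # 128
    print(settings["chunkstore"]["inline_threshold_bytes"])  # 512
```

Restoring in a finally clause means the previous value comes back even if the operation inside the with block raises.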

Configuration Reference

| Field | Type | Example |
| --- | --- | --- |
| service.uri | string | https://api.earthmover.io |
| service.ssl.verify | bool | True |
| service.ssl.cafile | string | /path/to/cert.pem |
| chunkstore.hash_method | string | hashlib.sha256 |
| chunkstore.inline_threshold_bytes | int | 512 |
| chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |
| chunkstore.use_delegated_credentials | bool | True |
| user.diagnostics | bool | True |
| async.batch_size | int | 10 |
| async.concurrency | int | 4 |

  • service.uri - The Earthmover backend service to connect to.
  • service.ssl.verify - Whether to verify SSL certificates for HTTPS requests.
  • service.ssl.cafile - Path to a file of trusted certificates, in PEM format, to use for SSL verification.
  • chunkstore.hash_method - The hash function used to compute hashes (for example, hashlib.sha256).
  • chunkstore.inline_threshold_bytes - Chunks whose size is less than or equal to this number of bytes are stored in the metastore rather than the chunkstore.
  • chunkstore.unsafe_use_fill_value_for_missing_chunks - An advanced feature that should be used with caution. This option disables the errors that would otherwise be raised when an expected chunk is not found in the chunkstore. We recommend using it only when debugging an issue with your object store.
  • chunkstore.use_delegated_credentials - Whether to activate Arraylake's credential delegation mechanism. See Storage for more details.
  • user.diagnostics - When you log in, the Arraylake client logs a limited set of diagnostics about your environment. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.
  • async.batch_size and async.concurrency - Fine-tuning parameters that control how data is transferred from storage to memory.
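To make the two chunkstore settings concrete, the sketch below hashes a chunk with the function named by chunkstore.hash_method and then applies the inline_threshold_bytes rule described above: chunks no larger than the threshold go to the metastore, larger ones to the chunkstore. The helpers `resolve_hash_method` and `place_chunk` are hypothetical, and the assumption that the hash is computed over the chunk's raw bytes is ours, not something this page documents.

```python
import hashlib
from typing import Any, Callable


def resolve_hash_method(name: str) -> Callable[..., Any]:
    """Hypothetical helper: map a setting like 'hashlib.sha256' to a hashlib constructor."""
    module, _, func = name.partition(".")
    # Accept only hashlib's guaranteed algorithms; SHAKE variants are excluded
    # because their hexdigest() requires an explicit length.
    if (
        module != "hashlib"
        or func not in hashlib.algorithms_guaranteed
        or func.startswith("shake_")
    ):
        raise ValueError(f"unsupported hash method: {name!r}")
    return getattr(hashlib, func)


def place_chunk(
    data: bytes,
    inline_threshold_bytes: int = 512,  # example value from the table, not necessarily the default
    hash_method: str = "hashlib.sha256",
) -> tuple[str, str]:
    """Hypothetical helper: return (destination, hex digest) for a chunk's bytes."""
    digest = resolve_hash_method(hash_method)(data).hexdigest()
    # Chunks whose size is less than or equal to the threshold are inlined.
    destination = "metastore" if len(data) <= inline_threshold_bytes else "chunkstore"
    return destination, digest


if __name__ == "__main__":
    for size in (100, 4096):
        destination, digest = place_chunk(bytes(size))
        print(size, destination, digest[:12])
```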
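This page does not spell out exactly how async.batch_size and async.concurrency are applied. Assuming batch_size is the number of items requested per batch and concurrency caps how many batches are in flight at once, the general pattern looks roughly like the asyncio sketch below. `fetch_all` and `fake_fetch` are hypothetical stand-ins, not Arraylake functions; the defaults simply mirror the example values in the table.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def fetch_all(
    keys: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], Awaitable[list[R]]],
    batch_size: int = 10,
    concurrency: int = 4,
) -> list[R]:
    """Hypothetical helper: fetch keys in batches, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    batches = [keys[i : i + batch_size] for i in range(0, len(keys), batch_size)]

    async def run(batch: Sequence[T]) -> list[R]:
        async with semaphore:  # waits while `concurrency` batches are already running
            return await fetch_batch(batch)

    # gather preserves input order, so results line up with `keys`.
    results = await asyncio.gather(*(run(b) for b in batches))
    return [item for batch_result in results for item in batch_result]


async def fake_fetch(batch: Sequence[str]) -> list[str]:
    """Stand-in for a storage read (hypothetical): map each key to a placeholder value."""
    await asyncio.sleep(0.01)
    return [f"data-for-{key}" for key in batch]


if __name__ == "__main__":
    keys = [f"chunk-{i}" for i in range(25)]
    values = asyncio.run(fetch_all(keys, fake_fetch))
    print(len(values))  # 25
```

Larger batches mean fewer requests; higher concurrency means more requests overlap. Which trade-off is best depends on your storage backend and network.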

Deprecated Options

The following options are deprecated and should not be modified.

Deprecated Options
| Field | Type | Example |
| --- | --- | --- |
| server_managed_sessions | bool | True |
| chunkstore.uri | string | s3://mychunkstore |
| s3.endpoint_url | string | https://s3.wasabisys.com |
| s3.anon | bool | True |
| s3.verify | bool | True |
| gs.project | string | my-gs-project |
| gs.token | string | anon |
| user.org | string | earthmover |