Client Library Installation and Configuration
To use Arraylake from your data science environment, install the Python client library.
Installing the Client Package
The Arraylake Client supports Python versions 3.10 - 3.12.
To install the minimal version of the client, run
pip install arraylake
in your Python environment of choice.
There are several optional extras you can install.
- The virtual extra includes the ability to create virtual datasets
- The xarray extra includes a compatible version of Xarray and its dependencies
- The widgets extra includes integrations with Ipywidgets (e.g. repo.tree())
Most users will want both the xarray and virtual extras, which can be installed via
pip install "arraylake[xarray,virtual]"
The client can also be installed via Conda:
conda install --channel conda-forge arraylake
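After installing, a quick sanity check can confirm that the interpreter falls within the supported range and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_arraylake_version` are illustrative and are not part of the Arraylake client.

```python
import sys
from importlib import metadata

# Supported range as stated above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def installed_arraylake_version():
    """Return the installed arraylake version string, or None if it is not installed."""
    try:
        return metadata.version("arraylake")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("arraylake version:", installed_arraylake_version())
```

Running this in the environment where you ran pip or conda shows whether the install landed in the interpreter you expect.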
Dependencies
The table below outlines Arraylake's core and optional dependencies.
Arraylake Dependencies
Package | Versions | Notes |
---|---|---|
python | >=3.10,<3.13 | |
aiobotocore[boto3] | >=1.33.2,<2.0 | |
aioitertools | ^0.11.0 | |
boto3 | >=2.10.0,<3.0 | |
botocore | >=1.33.2,<2.0 | |
cachetools | >=5.3.2,<6.0 | |
cfgrib | ~=0.9 | Optional, required for virtual datasets (GRIB) |
click | ~=8.1 | |
donfig | >=0.7,<1.0 | |
eccodes | >=2.37 | Optional, required for virtual datasets (GRIB) |
fsspec | >=2024.2.0 | |
gcsfs | >=2024.2.0 | |
h5py | | Optional, required for virtual datasets (NetCDF4, HDF5) |
httpx | >=0.23.0,<0.28 | |
humanize | ~4.9.0 | |
imagecodecs | | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
ipytree | ~=0.2.2 | Optional, required for Tree widget |
kerchunk | ~=0.1,!=0.2.1 | Optional, required for virtual datasets (NetCDF, HDF5, TIFF, and GRIB) |
numcodecs | >=0.13.1 | |
numpy | ^1.23 | |
packaging | >=0.23 | |
pydantic[email] | >=2.3,<2.10 | |
python-dateutil | ^2.8 | |
rich | >=12.6,<14.0 | |
ruamel-yaml | ~=0.17 | |
s3fs | >=2024.02.0 | |
structlog | ^24.1.0 | |
sqlitedict | ~=2.1 | |
tifffile | >=2023.2.27 | Optional, required for virtual datasets (TIFF) |
typer | >=0.6.1,<1.0 | |
xarray | >=2022.12.0,<=2024.09.0 | Optional, required for Xarray integration |
zarr | >=2.18 | |
uvloop | >=0.17,<1.0 | Optional, POSIX only |
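To see which of the optional packages from the table are present in a given environment, the installed versions can be read with the standard library. This is a minimal sketch; `installed_versions` and `OPTIONAL_PACKAGES` are names made up for this example, and the script only reports versions without checking them against the ranges in the table.

```python
from importlib import metadata

# Optional packages named in the table above (distribution names as listed there).
OPTIONAL_PACKAGES = [
    "cfgrib", "eccodes", "h5py", "imagecodecs", "ipytree",
    "kerchunk", "tifffile", "xarray", "uvloop",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(OPTIONAL_PACKAGES).items():
        print(f"{name:12s} {version or 'not installed'}")
```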
Client Configuration
Client-side configuration options are used to customize client behavior. Most users won't have to modify these settings.
Setting Configuration Options
Configuration options can be set via CLI or Python API.
CLI:
# set or update a configuration setting
arraylake config set chunkstore.inline_threshold_bytes 128
# get a configuration setting
arraylake config get chunkstore.inline_threshold_bytes
# print your current configuration
arraylake config list
Python:
from arraylake import config
# set or update a configuration setting
config.set({"chunkstore.inline_threshold_bytes": 128})
# get a configuration setting
config.get("chunkstore.inline_threshold_bytes")
# print your current configuration
config.pprint()
Config options can also be temporarily overridden in the CLI and in Python:
CLI:
# temporarily set a config setting
arraylake --config chunkstore.inline_threshold_bytes=128 repo create myorg/myrepo
Python:
from arraylake import config, Client
# temporarily set a config setting for the duration of the with block
with config.set({"chunkstore.inline_threshold_bytes": 128}):
    client = Client()
    client.create_repo("myorg/myrepo")
    ...
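The `with config.set(...)` form above applies a setting only for the duration of the block. The stand-in below illustrates that scoping pattern with the standard library alone. `ScopedConfig` is a hypothetical class written for this example and is not Arraylake's implementation, and the starting value of 512 is arbitrary.

```python
from contextlib import contextmanager


class ScopedConfig:
    """Hypothetical flat key/value config with temporary overrides."""

    def __init__(self, initial):
        self._values = dict(initial)

    def get(self, key):
        return self._values[key]

    @contextmanager
    def override(self, updates):
        """Apply updates inside a with-block, then restore the previous state."""
        previous = dict(self._values)
        self._values.update(updates)
        try:
            yield self
        finally:
            # Restore the earlier values even if the block raises.
            self._values = previous


settings = ScopedConfig({"chunkstore.inline_threshold_bytes": 512})

with settings.override({"chunkstore.inline_threshold_bytes": 128}):
    inside = settings.get("chunkstore.inline_threshold_bytes")

outside = settings.get("chunkstore.inline_threshold_bytes")
```

Here `inside` holds the overridden value and `outside` holds the original one, which is the behavior to expect from a scoped override.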
Configuration Reference
Field | Type | Example |
---|---|---|
service.uri | string | https://api.earthmover.io |
service.ssl.verify | bool | True |
service.ssl.cafile | string | /path/to/cert.pem |
chunkstore.hash_method | string | hashlib.sha256 |
chunkstore.inline_threshold_bytes | int | 512 |
chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |
chunkstore.use_delegated_credentials | bool | True |
user.diagnostics | bool | True |
async.batch_size | int | 10 |
async.concurrency | int | 4 |
service.uri
- The Earthmover backend service to talk to.
service.ssl.verify
- Whether to verify SSL certificates for https requests.
service.ssl.cafile
- Trusted certificates to use for SSL verification, in PEM format.
chunkstore.hash_method
- How to compute hashes.
chunkstore.inline_threshold_bytes
- Chunks whose size is less than or equal to this number of bytes are stored in the metastore rather than the chunkstore.
chunkstore.unsafe_use_fill_value_for_missing_chunks
- An advanced feature that should be used with caution. This option disables the errors that would otherwise be raised when an expected chunk is not found in the chunkstore. We recommend using it only when debugging an issue with your object store.
chunkstore.use_delegated_credentials
- Whether to activate Arraylake's credential delegation mechanism. See Storage for more details.
user.diagnostics
- The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set this option to False.
async.batch_size and async.concurrency
- Fine tuning for how data is transferred from storage to memory.
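To make the `chunkstore.inline_threshold_bytes` rule concrete, the sketch below applies the size comparison described above to two payloads. The function `storage_target` is a hypothetical illustration of the rule and is not the client's internal code; the threshold of 512 is the example value from the reference table.

```python
def storage_target(chunk: bytes, inline_threshold_bytes: int) -> str:
    """Return where a chunk would be stored under the rule described above."""
    if len(chunk) <= inline_threshold_bytes:
        return "metastore"
    return "chunkstore"


threshold = 512
small = bytes(512)  # exactly at the threshold, so it is stored inline
large = bytes(513)  # one byte over, so it goes to the chunkstore

small_target = storage_target(small, threshold)
large_target = storage_target(large, threshold)
```

Raising the threshold moves more small chunks into the metastore, while lowering it sends them to the chunkstore.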
Deprecated Options
The following options are deprecated and should not be modified.
Field | Type | Example |
---|---|---|
server_managed_sessions | bool | True |
chunkstore.uri | string | s3://mychunkstore |
s3.endpoint_url | string | https://s3.wasabisys.com |
s3.anon | bool | True |
s3.verify | bool | True |
gs.project | string | my-gs-project |
gs.token | string | anon |
user.org | string | earthmover |