Configuration
First Time Use
Before using Arraylake for the first time, set some basic configuration options via the CLI:
arraylake config init
This will walk you through the basic setup process and only needs to be run once.
arraylake config init is only required for legacy Arraylake users. From Arraylake 0.9.5 onward, the chunkstore is configured through the BucketConfig interface described in Manage Storage.
Managing Configuration
After the initial setup (with arraylake config init), you can manage your configuration with the Arraylake CLI or with the config object in Python.
CLI:
# set or update a configuration setting
arraylake config set user.org earthmover
# get a configuration setting
arraylake config get user.org
# print your current configuration
arraylake config list

Python:
from arraylake import config
# set or update a configuration setting
config.set({"chunkstore.hash_method": "hashlib.sha256"})
# get a configuration setting
config.get("chunkstore.uri")
# print your current configuration
config.pprint()
Config options can also be temporarily overridden in the CLI and in Python:
CLI:
# temporarily set a config setting
arraylake --config foo.bar=spam repo create myorg/myrepo

Python:
from arraylake import config, Client
# the override applies only inside the with block
with config.set({"chunkstore.hash_method": "hashlib.sha256"}):
    client = Client()
    client.create_repo("myorg/myrepo")
    ...
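To illustrate what a temporary override means, here is a toy sketch of a context manager that applies a setting and restores the previous value on exit. It is not Arraylake's implementation: `ToyConfig`, `override`, and the placeholder value `"eggs"` are hypothetical names chosen for this example.

```python
from contextlib import contextmanager


class ToyConfig:
    """Hypothetical stand-in for a config object; not Arraylake's code."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, key):
        return self._values[key]

    @contextmanager
    def override(self, updates):
        # Save the current state, apply the updates, and restore the saved
        # state on exit, even if the body of the with-block raises.
        saved = dict(self._values)
        self._values.update(updates)
        try:
            yield self
        finally:
            self._values = saved


toy = ToyConfig({"foo.bar": "eggs"})
with toy.override({"foo.bar": "spam"}):
    inside = toy.get("foo.bar")   # value seen inside the block
after = toy.get("foo.bar")        # value seen once the block has exited
print(inside, after)
```

The point of the pattern is that code outside the `with` block never observes the overridden value.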
Chunkstore Configuration
Arraylake relies on an object storage service for storing chunks. Arraylake currently supports two flavors of object storage:
- AWS S3-compatible object stores. In addition to AWS itself, many other object storage services implement an S3-compatible API. Arraylake works with all S3-compatible object stores. S3-compatible object stores should use a chunkstore.uri configuration parameter that begins with s3://.
- Google Cloud Storage. For Google Cloud Storage, the chunkstore.uri parameter should begin with gs://.
The location for storing new chunks is specified by the chunkstore.uri configuration option. This option requires at least a bucket name, e.g. s3://my-bucket. Optionally, you can specify an additional prefix under which to store data, e.g. s3://my-bucket/prefix.
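The URI structure described above (scheme, bucket, optional prefix) can be sketched with the standard library. The helper `split_chunkstore_uri` is a hypothetical name used only for illustration; it is not part of the Arraylake API and does not reflect how Arraylake parses the setting internally.

```python
from urllib.parse import urlparse


def split_chunkstore_uri(uri):
    """Split a chunkstore URI into (scheme, bucket, prefix).

    Hypothetical helper for illustration; not part of the Arraylake API.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported scheme in {uri!r}; expected s3:// or gs://")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    # Everything after the bucket name is the optional prefix.
    prefix = parsed.path.strip("/")
    return parsed.scheme, parsed.netloc, prefix


print(split_chunkstore_uri("s3://my-bucket"))
print(split_chunkstore_uri("s3://my-bucket/prefix"))
```

A URI with no prefix yields an empty string as the third element, which matches the "bucket name only" case in the text.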
Setting the S3 chunkstore.uri via the Arraylake configuration is deprecated and will be removed in the future. Going forward, users will configure access to the chunkstore through Arraylake's managed bucket configuration described in Manage Storage.
Credentials
Object store credentials can be managed in two ways:
- Role-based access delegation (AWS S3 only): Under this bucket authorization configuration, object store credentials are governed by a user-managed AWS IAM role. The role's policy gives the Earthmover authorization service permission to assume the role and generate temporary, scoped credentials to access the chunkstore bucket. For more information on how to configure this IAM role, see Manage Storage.
To use role-based access delegation for object store credential management, you must set chunkstore.use_delegated_credentials to True in the config.
- Self-managed: Under the self-managed bucket authorization configuration, object store credentials are not managed by Arraylake. You should configure your client environment with appropriate permissions to read (and, if desired, write) to your bucket.
S3-Compatible Object Storage
Arraylake determines you are using S3-compatible object storage if the chunkstore.uri configuration parameter begins with s3://.
To configure your client environment to read and write to an S3-compatible object store, use AWS configuration and credentials files. This is only needed when using self-managed auth.
Custom configuration for S3-compatible object storage can be provided via the s3 configuration namespace. The parameters in this namespace will be passed as arguments when creating a boto3 client.
For standard AWS S3 object storage, no extra config is required. For interacting with non-AWS S3 object storage services, the following options may be helpful:
- s3.endpoint_url - can be used to point at a non-AWS S3 service. For example, to host a chunkstore on Wasabi Cloud, set s3.endpoint_url to https://s3.wasabisys.com:
  arraylake config set s3.endpoint_url https://s3.wasabisys.com
  Warning: Setting s3.endpoint_url via the Arraylake configuration is deprecated and will be removed in the future. Going forward, users will configure access to the chunkstore through Arraylake's managed bucket configuration described in Manage Storage.
- s3.verify - to bypass verification of SSL certificates (sometimes needed with on-prem object storage such as Ceph), set s3.verify to False.
- s3.anon - This is a special option (not part of the official boto3 API) which can be used to trigger anonymous access. Suitable for read-only access to public data.
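As a rough illustration of how settings in the s3 namespace could map onto boto3 client arguments, the sketch below collects `s3.*` keys from a flat settings dictionary. The helper `s3_client_kwargs` and the flat-dictionary layout are assumptions made for this example; Arraylake's internal handling may differ.

```python
def s3_client_kwargs(settings):
    """Collect `s3.*` settings into keyword arguments for a boto3 client.

    Hypothetical helper for illustration; not Arraylake's internal code.
    """
    prefix = "s3."
    kwargs = {
        key[len(prefix):]: value
        for key, value in settings.items()
        if key.startswith(prefix)
    }
    # `anon` is not a boto3 argument, so it is split off and returned separately.
    anon = kwargs.pop("anon", False)
    return kwargs, anon


settings = {
    "s3.endpoint_url": "https://s3.wasabisys.com",
    "s3.verify": False,
    "user.org": "earthmover",
}
kwargs, anon = s3_client_kwargs(settings)
print(kwargs, anon)
# The result could then be passed on as boto3.client("s3", **kwargs).
```

Both `endpoint_url` and `verify` are accepted by `boto3.client`, which is why they can be forwarded unchanged, while `anon` needs separate handling.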
Google Cloud Storage
Arraylake determines you are using Google Cloud Storage if the chunkstore.uri configuration parameter begins with gs://.
To configure your client environment to read and write to Google Cloud Storage, you can use any of the supported Google Cloud Authentication methods.
For standard Google Cloud Storage, no extra config is required.
Custom configuration for Google Cloud Storage can be provided via the gs configuration namespace. Common parameters include:
- gs.project - The project to use.
- gs.token - A custom authentication token. Use anon for anonymous access.
Diagnostics Configuration
The Arraylake client logs a limited set of diagnostics about the user's environment when logging in.
The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set user.diagnostics to False.
Config Reference
Example config options are shown in the table below:
Field | Type | Example |
---|---|---|
service.uri | string | https://api.earthmover.io |
server_managed_sessions | bool | True |
chunkstore.uri | string | s3://mychunkstore |
chunkstore.hash_method | string | hashlib.sha256 |
chunkstore.inline_threshold_bytes | int | 512 |
chunkstore.unsafe_use_fill_value_for_missing_chunks | bool | False |
chunkstore.use_delegated_credentials | bool | False |
s3.endpoint_url | string | https://s3.wasabisys.com |
s3.anon | bool | True |
gs.project | string | my-gs-project |
gs.token | string | anon |
user.org | string | earthmover |
user.diagnostics | bool | True |
async.batch_size | int | 10 |
async.concurrency | int | 4 |
Setting chunkstore.unsafe_use_fill_value_for_missing_chunks via the Arraylake configuration is an advanced feature and should be used with caution. The intent of this option is to provide a fallback should a chunk go missing from the object store. We recommend using it only when debugging an issue with your object store.
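The field types listed in the reference table can be checked before settings are applied. The following is a minimal sketch under stated assumptions: `EXPECTED_TYPES` is transcribed from the table, `find_type_errors` is a hypothetical helper, and fields not in the table are skipped because the table is described as a set of examples rather than a complete schema.

```python
# Field types transcribed from the reference table above.
EXPECTED_TYPES = {
    "service.uri": str,
    "server_managed_sessions": bool,
    "chunkstore.uri": str,
    "chunkstore.hash_method": str,
    "chunkstore.inline_threshold_bytes": int,
    "chunkstore.unsafe_use_fill_value_for_missing_chunks": bool,
    "chunkstore.use_delegated_credentials": bool,
    "s3.endpoint_url": str,
    "s3.anon": bool,
    "gs.project": str,
    "gs.token": str,
    "user.org": str,
    "user.diagnostics": bool,
    "async.batch_size": int,
    "async.concurrency": int,
}


def find_type_errors(settings):
    """Return a list of messages for values whose type differs from the table.

    Hypothetical helper for illustration; not part of the Arraylake API.
    """
    errors = []
    for key, value in settings.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            continue  # field not listed in the table; nothing to check
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


print(find_type_errors({"async.batch_size": "10", "user.diagnostics": True}))
```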