Configuration

First Time Use

Before using Arraylake for the first time, set some basic configuration options via the CLI:

arraylake config init

This will walk you through the basic setup process and only needs to be run once.

note

arraylake config init is only required for legacy Arraylake users. From Arraylake 0.9.5 onward, the chunkstore is configured through the BucketConfig interface described in Manage Storage.

Managing Configuration

After the initial setup (arraylake config init), you can manage your configuration with the Arraylake CLI or from Python.

# set or update a configuration setting
arraylake config set user.org earthmover

# get a configuration setting
arraylake config get user.org

# print your current configuration
arraylake config list

Config options can also be temporarily overridden in the CLI and in Python:

# temporarily set a config setting
arraylake --config foo.bar=spam repo create myorg/myrepo
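
To illustrate what a temporary override means, here is a minimal sketch in plain Python. It is not Arraylake's implementation or API: the helper apply_override and the nested-dictionary layout are assumptions made purely for illustration. It shows a KEY=VALUE override being layered on top of a stored configuration without modifying it.

```python
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `dotted.key=value` override applied.

    Hypothetical helper for illustration only; not part of Arraylake.
    """
    key, sep, value = override.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {override!r}")
    result = copy.deepcopy(config)  # leave the stored configuration untouched
    node = result
    *parents, leaf = key.split(".")
    for part in parents:
        # walk (or create) the nested sections named by the dotted key
        node = node.setdefault(part, {})
    node[leaf] = value  # kept as a string; type coercion is out of scope here
    return result


stored = {"user": {"org": "earthmover"}}
temporary = apply_override(stored, "foo.bar=spam")
```

After this runs, temporary contains the extra foo.bar entry while stored is unchanged, which mirrors the idea that the override applies to a single invocation only.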

Chunkstore Configuration

Arraylake relies on an object storage service for storing chunks. Arraylake currently supports two flavors of object storage:

  • AWS S3-compatible object stores. In addition to AWS itself, many other object storage services implement an S3-compatible API. Arraylake works with all S3-compatible object stores. S3-compatible object stores should use a chunkstore.uri configuration parameter that begins with s3://.
  • Google Cloud Storage. For Google Cloud Storage, the chunkstore.uri parameter should begin with gs://.

The location for storing new chunks is specified by the chunkstore.uri configuration option. This option requires at least a bucket name, e.g. s3://my-bucket. Optionally, you can specify an additional prefix under which to store data, e.g. s3://my-bucket/prefix.
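
The structure of a chunkstore.uri value (scheme, bucket, optional prefix) can be sketched with the Python standard library. This is only an illustration under stated assumptions: parse_chunkstore_uri is a hypothetical helper, and Arraylake's own parsing and validation may differ.

```python
from urllib.parse import urlsplit

# The two URI schemes described in this section.
SUPPORTED_SCHEMES = {
    "s3": "S3-compatible object storage",
    "gs": "Google Cloud Storage",
}


def parse_chunkstore_uri(uri):
    """Split a chunkstore URI into (scheme, bucket, prefix).

    Hypothetical helper for illustration only; not part of Arraylake.
    """
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme in {uri!r}; expected s3:// or gs://")
    if not parts.netloc:
        raise ValueError(f"{uri!r} does not contain a bucket name")
    prefix = parts.path.strip("/")  # empty string when no prefix is given
    return parts.scheme, parts.netloc, prefix


scheme, bucket, prefix = parse_chunkstore_uri("s3://my-bucket/prefix")
```

A URI with only a bucket name, such as s3://my-bucket, yields an empty prefix in this sketch.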

warning

Setting the S3 chunkstore.uri via the Arraylake configuration is deprecated and will be removed in the future. Going forward, users will configure access to the chunkstore through Arraylake's managed bucket configuration described in Manage Storage.

Object store credentials are not managed by Arraylake. You should configure your client environment with appropriate permissions to read (and, if desired, write) to your bucket.

S3-Compatible Object Storage

Arraylake determines you are using S3-compatible object storage if the chunkstore.uri configuration parameter begins with s3://.

To configure your client environment to read and write to an S3-compatible object store, use AWS configuration and credentials files.

Custom configuration for S3-compatible object storage can be provided via the s3 configuration namespace. The parameters in this namespace will be passed as arguments when creating a boto3 client.

For standard AWS S3 object storage, no extra config is required. For interacting with non-AWS S3 object storage services, the following options may be helpful:

  • s3.endpoint_url - can be used to point at a non-AWS S3 service. For example, to host a chunkstore on Wasabi Cloud, set s3.endpoint_url to https://s3.wasabisys.com:

    arraylake config set s3.endpoint_url https://s3.wasabisys.com

    warning

    Setting s3.endpoint_url via the Arraylake configuration is deprecated and will be removed in the future. Going forward, users will configure access to the chunkstore through Arraylake's managed bucket configuration described in Manage Storage.

  • s3.verify - to bypass verification of SSL certificates (sometimes needed with on-prem object storage such as Ceph), set s3.verify to False.

  • s3.anon - This is a special option (not part of the official boto3 API) which can be used to trigger anonymous access. Suitable for read-only access to public data.
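
As a rough sketch of how settings in the s3 namespace could be turned into arguments for a boto3 client, the snippet below separates the s3.* keys from the rest of a flat configuration and treats s3.anon specially, since it is not a boto3 argument. The helper s3_client_options and the flat dictionary layout are assumptions for illustration; Arraylake's internal handling is not shown in this document and may differ. The snippet does not import boto3, so it runs without it.

```python
def s3_client_options(config):
    """Split flat `s3.*` settings into (client keyword arguments, anon flag).

    Hypothetical helper for illustration only; not part of Arraylake.
    """
    prefix = "s3."
    kwargs = {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }
    # `anon` is not a boto3 argument, so remove it from the kwargs
    anon = bool(kwargs.pop("anon", False))
    return kwargs, anon


settings = {
    "s3.endpoint_url": "https://s3.wasabisys.com",
    "s3.verify": False,
    "s3.anon": True,
    "user.org": "earthmover",  # ignored: outside the s3 namespace
}
kwargs, anon = s3_client_options(settings)

# With boto3 installed, the remaining kwargs could be forwarded like this:
#   client = boto3.client("s3", **kwargs)
# One common way to make unsigned (anonymous) requests with boto3 is
#   botocore.config.Config(signature_version=botocore.UNSIGNED)
# Whether Arraylake does exactly this is an assumption.
```

Both endpoint_url and verify are genuine keyword arguments of boto3.client, which is why they can be passed through unchanged.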

Google Cloud Storage

Arraylake determines you are using Google Cloud Storage if the chunkstore.uri configuration parameter begins with gs://.

To configure your client environment to read and write to Google Cloud Storage, you can use any of the supported Google Cloud Authentication methods.

For standard Google Cloud Storage, no extra config is required. Custom configuration for Google Cloud Storage can be provided via the gs configuration namespace. Common parameters include:

  • gs.project - The project to use.
  • gs.token - A custom authentication token. Use anon for anonymous access.

Diagnostics Configuration

The Arraylake client logs a limited set of diagnostics about the user's environment when logging in. The contents of these diagnostics can be inspected with arraylake --diagnostics. To disable logging of user diagnostics, set user.diagnostics to False.

Config Reference

Example config options are shown in the table below:

Field                              Type    Example
service.uri                        string  https://api.earthmover.io
server_managed_sessions            bool    True
chunkstore.uri                     string  s3://mychunkstore
chunkstore.hash_method             string  hashlib.sha256
chunkstore.inline_threshold_bytes  int     512
s3.endpoint_url                    string  https://s3.wasabisys.com
s3.anon                            bool    True
gs.project                         string  my-gs-project
gs.token                           string  anon
user.org                           string  earthmover
user.diagnostics                   bool    True
async.batch_size                   int     10
async.concurrency                  int     4
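
The types listed in the table can be used for a quick sanity check of a set of config values before applying them. The sketch below is a hypothetical helper (check_config_types is not part of Arraylake) that covers only the example fields shown above, and it assumes values are held as native Python types.

```python
# Expected types, transcribed from the Type column of the table above.
EXPECTED_TYPES = {
    "service.uri": str,
    "server_managed_sessions": bool,
    "chunkstore.uri": str,
    "chunkstore.hash_method": str,
    "chunkstore.inline_threshold_bytes": int,
    "s3.endpoint_url": str,
    "s3.anon": bool,
    "gs.project": str,
    "gs.token": str,
    "user.org": str,
    "user.diagnostics": bool,
    "async.batch_size": int,
    "async.concurrency": int,
}


def check_config_types(config):
    """Return a list of type mismatches; an empty list means none were found.

    Hypothetical helper for illustration only; not part of Arraylake.
    """
    problems = []
    for key, value in config.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            continue  # key is not in the table; this sketch does not judge it
        if expected is int and isinstance(value, bool):
            ok = False  # bool is a subclass of int, so reject it explicitly
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


problems = check_config_types({"async.batch_size": "10", "user.diagnostics": True})
```

Here problems reports the string passed for async.batch_size, while the boolean user.diagnostics value passes the check.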