Arraylake Quick Start

Get started with Arraylake in 5 minutes by following along with this quick tutorial:

Install the Arraylake client and the dependencies for this quickstart:

# with pip
pip install "arraylake-client[cli]" xarray netcdf4 dask pooch
# or with conda
conda install -c conda-forge arraylake-client xarray netcdf4 dask pooch

Configure your client:

arraylake config init

Log in:

arraylake auth login

Create your first repository:

arraylake repo create myorg/myrepo

Here, myorg is the name of your organization and myrepo is the name of your repository.

Once you've created your repo, you can see it alongside the rest of your organization's repos:

arraylake repo list myorg

Configure environment variables:

Before using the Arraylake client, set the ZARR_V3_EXPERIMENTAL_API=1 environment variable:

export ZARR_V3_EXPERIMENTAL_API=1

In addition to this setting, AWS credentials with appropriate write access to your target S3 bucket should be available in your environment.
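If you work from a notebook or script rather than a shell, the same setup can be sketched in Python. This is a minimal sketch, not part of the Arraylake client: it assumes the flag has to be in place before zarr is first imported, and the helper missing_aws_vars is a hypothetical name introduced here. It checks only the two standard AWS environment variables, so a miss is a hint rather than an error, since credentials can also come from a shared credentials file or an instance role.

```python
import os

# Assumption: the flag is read when zarr is first imported, so set it
# before any import of zarr, xarray's zarr backend, or the Arraylake client.
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"

# Standard AWS environment variable names. Credentials may also be supplied
# through a shared credentials file or an instance role, which this check
# cannot see.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=os.environ):
    """Return the AWS credential variables that are unset or empty in env."""
    return [name for name in AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Not set in this environment:", ", ".join(missing))
```

Running this at the top of a session gives an early warning before the first write to S3 is attempted.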

Write data with the Python client:

Now we'll switch to Python where we'll put some data into the repo and do a few quick tasks. First we'll connect our Client:

from arraylake_client import Client

client = Client()
repo = client.get_repo("myorg/myrepo")
repo.checkout()

Next, we'll pull some of Xarray's tutorial data and write it to Arraylake:

import xarray as xr

air_temp = xr.tutorial.open_dataset("air_temperature").chunk("1mb")
rasm = xr.tutorial.open_dataset("rasm").chunk("1mb")

air_temp.to_zarr(repo.store, group='air_temperature', zarr_version=3)
rasm.to_zarr(repo.store, group='rasm', zarr_version=3)
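To get a feel for what a "1mb" chunk target means, here is a rough back-of-the-envelope estimate in plain Python. It is a sketch under stated assumptions: the shape (2920, 25, 53) and the 4-byte element size are assumed values for the air temperature variable, "1mb" is taken to mean 1,000,000 bytes, and min_chunks is a hypothetical helper. The real chunk layout is chosen by dask and will generally differ, so treat the result as an approximate lower bound.

```python
import math


def min_chunks(shape, itemsize, limit_bytes):
    """Return (total_bytes, fewest chunks that fit under limit_bytes each)."""
    total_bytes = math.prod(shape) * itemsize
    return total_bytes, math.ceil(total_bytes / limit_bytes)


# Assumed values, for illustration only: (time, lat, lon) shape,
# 4 bytes per element, and a 1,000,000-byte chunk target.
total_bytes, n_chunks = min_chunks((2920, 25, 53), itemsize=4, limit_bytes=1_000_000)
print(f"{total_bytes} bytes -> at least {n_chunks} chunks")
```

After running the quickstart code, air_temp.chunks shows the chunking that was actually applied.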

Now that we've put some data into Arraylake, we can commit our changes:

commit_id = repo.commit("My first commit 🥹")

Next, we can go back and access the data in our store:

ds = xr.open_zarr(repo.store, zarr_version=3, group="rasm")

And you're off to the races. Check out the Manage Zarr Data Tutorial to go deeper.