Arraylake Quick Start
Get started with Arraylake in 5 minutes by following along with this quick tutorial:
Install the Arraylake client and the dependencies for this quick start:
# with pip
pip install "arraylake-client[cli]" xarray netcdf4 dask pooch
# or with conda
conda install -c conda-forge arraylake-client xarray netcdf4 dask pooch
Configure your client:
arraylake config init
arraylake auth login
Create your first repository:
arraylake repo create myorg/myrepo
Where myorg is the name of your organization and myrepo is the name of your repository.
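The repository argument joins the organization and repository names with a single slash. As a minimal sketch, the hypothetical helper below (split_repo_name is an illustrative name, not part of the Arraylake client or CLI) splits such a string and rejects malformed input before it reaches a command:

```python
# Hypothetical helper for illustration; not part of the Arraylake client or CLI.
def split_repo_name(full_name):
    """Split 'org/repo' into (org, repo); raise ValueError on malformed input."""
    org, sep, repo = full_name.partition("/")
    # Require exactly one slash with a non-empty name on each side.
    if not sep or not org or not repo or "/" in repo:
        raise ValueError(f"expected 'org/repo', got {full_name!r}")
    return org, repo


org, repo_name = split_repo_name("myorg/myrepo")
```

Whether Arraylake applies further naming rules (allowed characters, length limits) is not covered here.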
Once you've created your repo, you can see it alongside the rest of your organization's repos:
arraylake repo list myorg
Configure environment variables:
Before using the Arraylake client, you must set the ZARR_V3_EXPERIMENTAL_API=1 environment variable.
export ZARR_V3_EXPERIMENTAL_API=1
In addition to this setting, AWS credentials with write access to your target S3 bucket should be available in your environment.
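Before moving on, it can help to confirm that the environment is set up as described. The following is a minimal sketch of such a check; check_environment is a hypothetical helper, not part of the Arraylake client. It assumes that AWS credentials are supplied through the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables, so credentials that come from a shared credentials file or an instance role will be reported as missing even though they may work.

```python
import os


# Hypothetical helper for illustration; not part of the Arraylake client.
def check_environment(env=None):
    """Return a list of human-readable problems found in the environment."""
    env = os.environ if env is None else env
    problems = []
    # The quick start requires this flag to be set to 1.
    if env.get("ZARR_V3_EXPERIMENTAL_API") != "1":
        problems.append("ZARR_V3_EXPERIMENTAL_API is not set to 1")
    # Assumption: credentials are passed via the standard AWS variables.
    # Shared config files and instance roles are not detected by this check.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Calling check_environment() with no arguments inspects the current process environment; an empty list means that all three variables were found.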
Write data with the Python client:
Now we'll switch to Python, where we'll put some data into the repo and do a few quick tasks. First, we'll connect our client:
from arraylake_client import Client
client = Client()
repo = client.get_repo("myorg/myrepo")
repo.checkout()
Next, we'll pull some of Xarray's tutorial data and dump it into Arraylake:
import xarray as xr
air_temp = xr.tutorial.open_dataset("air_temperature").chunk("1mb")
rasm = xr.tutorial.open_dataset("rasm").chunk("1mb")
air_temp.to_zarr(repo.store, group='air_temperature', zarr_version=3)
rasm.to_zarr(repo.store, group='rasm', zarr_version=3)
Now that we've put some data into Arraylake, we can commit our changes:
commit_id = repo.commit("My first commit 🥹")
Next, we can go back and access the data in our store:
ds = xr.open_zarr(repo.store, zarr_version=3, group="rasm")
And you are off to the races. Check out the Manage Zarr Data Tutorial to go deeper.