Coiled
Arraylake and Coiled work well together: you can use Coiled to manage your cloud infrastructure, run your computations in parallel with Dask, and use Arraylake as your cloud data lake platform. Coiled provides four interfaces for initializing resources:
- Dask clusters
- Serverless functions
- CLI jobs
- Jupyter notebooks
General pattern
The code snippet below shows the general pattern for using Coiled and Arraylake together in a simple workflow.
import coiled
import arraylake as al
import xarray as xr
# Start a Dask cluster of 100 machines on AWS, GCP, or Azure
cluster = coiled.Cluster(
    n_workers=100,
)
dask_client = cluster.get_client()
# Connect to Arraylake and open a repo by its 'organization/repo' name
al_client = al.Client()
repo = al_client.get_repo("my-climate-company/ocean-data")
# Begin a writable Icechunk session on the "main" branch
session = repo.writable_session("main")
# Read array data from Arraylake via Icechunk
ds = xr.open_dataset(
    session.store,
    group="xarray/ocean-temp",
    engine="zarr",
    consolidated=False,  # Icechunk stores do not use consolidated metadata
    chunks="auto",  # Use Dask for parallelism
)
# Run your computation in parallel on the cloud:
# average over time within each season (DJF, MAM, JJA, SON)
temps = ds.groupby("time.season").mean("time").compute()
# Write the result to Arraylake
temps.to_zarr(
    session.store,
    group="xarray/avg-season-temps",
    consolidated=False,
)
# Commit the change to the Arraylake repo
session.commit("wrote ocean temp data to repo")
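To see what the seasonal averaging step computes without a cluster or an Arraylake repo, the sketch below runs a seasonal mean (averaging over `time` within each season) on a small synthetic dataset. The dataset, its `temp` variable, and the grid are made up for illustration; only xarray, NumPy, and pandas are assumed to be installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for an ocean-temperature dataset: one year of daily
# values on a tiny 2 x 3 grid, filled with random numbers.
time = pd.date_range("2023-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(seed=0)
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), rng.normal(15.0, 5.0, (time.size, 2, 3)))},
    coords={"time": time, "lat": [0.0, 1.0], "lon": [10.0, 11.0, 12.0]},
)

# Group time steps by meteorological season (DJF, MAM, JJA, SON) and average
# over "time" within each group; the "time" dimension is replaced by "season".
seasonal = ds.groupby("time.season").mean("time")

print(dict(seasonal["temp"].sizes))
print(seasonal["season"].values)
```

On a real repo opened with `chunks="auto"`, the same call builds a lazy Dask graph, and `.compute()` runs it on the cluster workers instead of in the local process.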
Specific examples
The following sections detail how to use Arraylake with the different Coiled APIs. To start, you will need an Arraylake API token (these begin with "ema_").
- Dask cluster
- Serverless functions
- CLI jobs
- Jupyter notebooks
Dask cluster
Arraylake access: Set the ARRAYLAKE_TOKEN environment variable to your Arraylake API token.
Dask is a general-purpose library for parallel computing that is closely integrated with the PyData ecosystem (Zarr, Xarray, GeoTIFF, etc.) to scale out your workflows. Coiled deploys Dask clusters in the cloud.
Parallelize workflows involving Arraylake by spinning up a Dask cluster with a set number of workers. Before initializing the cluster, set the ARRAYLAKE_TOKEN environment variable to your API token so that you can authenticate with Arraylake. The following example demonstrates initializing a cluster of Dask workers, reading a dataset, and writing it as a Zarr data cube to an Arraylake repo.
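If the token also needs to be available on the Dask workers, one way to forward it explicitly is sketched below. `arraylake_worker_env` and `start_arraylake_cluster` are hypothetical helpers, not part of Coiled or Arraylake. The sketch assumes your Coiled version provides `Cluster.send_private_envs`, which is intended for secrets; passing `environ={...}` to `coiled.Cluster` is an alternative.

```python
import os


def arraylake_worker_env(env=None):
    """Return the environment variables Dask workers need to reach Arraylake.

    Hypothetical helper: reads ARRAYLAKE_TOKEN from `env` (defaults to
    os.environ) and sanity-checks it before any cloud machines are started.
    """
    env = os.environ if env is None else env
    token = env.get("ARRAYLAKE_TOKEN", "")
    if not token:
        raise RuntimeError("ARRAYLAKE_TOKEN is not set; set it to your Arraylake API token")
    if not token.startswith("ema_"):
        # Arraylake API tokens begin with "ema_"; anything else is likely a mistake
        raise ValueError("ARRAYLAKE_TOKEN does not look like an Arraylake API token")
    return {"ARRAYLAKE_TOKEN": token}


def start_arraylake_cluster(n_workers=10):
    """Start a Coiled cluster and forward the Arraylake token to it (sketch)."""
    import coiled  # imported here so arraylake_worker_env works without Coiled

    env = arraylake_worker_env()  # fail fast, before paying for machines
    cluster = coiled.Cluster(n_workers=n_workers)
    # Assumed API: send the token to the scheduler and workers as a private env var
    cluster.send_private_envs(env)
    return cluster
```

Usage: `cluster = start_arraylake_cluster(n_workers=10)`, then `client = cluster.get_client()`. Checking the token locally first means a missing or mistyped token is reported before any cloud machines start.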
In a Python session:
import coiled
import arraylake as al
import xarray as xr
import zarr
cluster = coiled.Cluster(n_workers=10)
Coiled then creates the cluster of Dask workers and reports its progress:
Package Sync for arraylake
  Fetching latest package priorities   0:00:00
  Scanning 208 conda packages          0:00:00
  Scanning 338 python packages         0:00:00
  Running pip check                    0:00:02
  Validating environment               0:00:03
  Creating wheel for arraylake         0:00:07
  Uploading arraylake                  0:00:00
  Requesting package sync build        0:00:00
Package Info
  Package     Note
  arraylake   Wheel built from ~/Desktop/earthmover/arraylake/client
Coiled Cluster
  https://cloud.coiled.io/clusters/537310?account=dask
Overview
  Name: dask-1b0d5c8f
  Scheduler Status: started
  Dashboard: https://cluster-upast.dask.host?token=U-3fkZ5GRwezON1C
Configuration
  Region: us-east-2
  Scheduler: m6i.xlarge
  Workers: m6i.xlarge (2)
  Workers Requested: 2
(2024/07/26 12:54:51 MDT)
  All workers ready.