
Managing Repos

Once your Arraylake organization has been fully configured and you have installed the client library, you're ready to start managing data! 🎉

Create a Repo

For the purposes of this example, our org name will be earthmover. If running these commands interactively, replace earthmover with your org name.

For this example, we are going to create a Repo called ocean to hold oceanography data 🌊.

arraylake repo create earthmover/ocean

You can optionally add a description and/or metadata when you create a repo. Descriptions are limited to 255 characters, and repo metadata can be at most 4 kB in total. Metadata must be a mapping of key-value pairs where values are strings, numbers, lists, booleans, or None.

arraylake repo create earthmover/ocean --description "This is some oceanography data!" --metadata '{"type": ["climate", "coastal", "environmental"], "model": "FVCOM"}'
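The limits above can be checked locally before calling the CLI. The following is a minimal sketch, not part of Arraylake: the helper name check_repo_fields is hypothetical, and it assumes that "4 kB" means 4096 bytes of JSON-encoded metadata and that list values hold only scalars. The service may measure size or validate types differently.

```python
import json

MAX_DESCRIPTION_CHARS = 255
MAX_METADATA_BYTES = 4096  # assumption: "4 kB" read as 4096 bytes of JSON text


def _is_allowed_value(value):
    """Allow str, numbers, bool, None, or a list of those (assumed rule)."""
    scalar = (str, int, float, bool, type(None))
    if isinstance(value, list):
        return all(isinstance(item, scalar) for item in value)
    return isinstance(value, scalar)


def check_repo_fields(description, metadata):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds 255 characters")
    if not isinstance(metadata, dict):
        problems.append("metadata must be a mapping of key-value pairs")
        return problems
    bad_keys = [
        key for key, value in metadata.items()
        if not isinstance(key, str) or not _is_allowed_value(value)
    ]
    for key in bad_keys:
        problems.append(f"unsupported key or value for {key!r}")
    if not bad_keys:
        # Only measure the size once we know the mapping is JSON-serializable.
        size = len(json.dumps(metadata).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            problems.append("metadata exceeds 4096 bytes when JSON-encoded")
    return problems


example = {"type": ["climate", "coastal", "environmental"], "model": "FVCOM"}
print(check_repo_fields("This is some oceanography data!", example))
```

An empty list here only means the inputs pass this local check; the server remains the authority on what is accepted.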

You can also modify the description and/or metadata of an existing repo.

arraylake repo modify earthmover/ocean --description "This is an updated description for some oceanography data!" -a '{"source": "NOAA"}' -r "type" -u '{"model": "ROMS"}'
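To reason about what the modify command leaves behind, it can help to model the metadata as a plain dictionary. The sketch below is an illustration only: it assumes that -a adds keys, -r removes a key, and -u overwrites existing keys, and it applies removals first. The function apply_metadata_changes is hypothetical, and the real command's ordering and conflict handling are not described here.

```python
def apply_metadata_changes(metadata, add=None, remove=None, update=None):
    """Return a new metadata dict after assumed add/remove/update steps."""
    result = dict(metadata)  # copy so the caller's dict is left untouched
    for key in remove or []:
        result.pop(key, None)  # assumed: removing a missing key is a no-op
    result.update(add or {})
    result.update(update or {})
    return result


before = {"type": ["climate", "coastal", "environmental"], "model": "FVCOM"}
after = apply_metadata_changes(
    before,
    add={"source": "NOAA"},
    remove=["type"],
    update={"model": "ROMS"},
)
print(after)
```

Under these assumptions, the repo from the earlier create example would end up with only the model and source keys.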

Where is Repo data stored?

Arraylake lets you configure the storage location for a Repo's data using org-level bucket configurations. Choose a specific bucket by providing bucket_config_nickname to create_repo. If not specified, the organization's default bucket is used.

Within the bucket and prefix set by an org-level bucket configuration, the data for each new Icechunk Repo is stored under an additional prefix. By default, this extra prefix is the Repo name prefixed with 8 random characters. To choose a specific extra prefix, pass the prefix kwarg to create_repo. For example, for a bucket configured with bucket='my-bucket-name' and prefix='my-bucket-prefix':

  1. create_repo("repo-A") stores data in my-bucket-name/my-bucket-prefix/[8-RANDOM_CHARACTERS]_repo_A
  2. create_repo("repo-B", prefix='zoo') stores data in my-bucket-name/my-bucket-prefix/zoo/
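The two cases above can be sketched as a small path-building function. This is not Arraylake's implementation: repo_storage_location is a hypothetical helper, the random characters are assumed to be lowercase letters and digits, and the Repo name is used as-is without any normalization.

```python
import secrets
import string


def repo_storage_location(bucket, bucket_prefix, repo_name, prefix=None):
    """Build the storage path for a new Repo under a bucket configuration."""
    if prefix is None:
        # Assumed alphabet; the docs only say "8 random characters".
        alphabet = string.ascii_lowercase + string.digits
        random_part = "".join(secrets.choice(alphabet) for _ in range(8))
        prefix = f"{random_part}_{repo_name}"
    parts = (bucket, bucket_prefix, prefix)
    # Skip empty parts and stray slashes so the path has single separators.
    return "/".join(part.strip("/") for part in parts if part)


print(repo_storage_location("my-bucket-name", "my-bucket-prefix", "repo-A"))
print(repo_storage_location("my-bucket-name", "my-bucket-prefix", "repo-B", prefix="zoo"))
```

The first call prints a different path on every run because of the random component, which is why an explicit prefix is useful when the location needs to be predictable.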

Open a Repo

If you're working in Python, you can open a Repo through an Arraylake client (client in the line below) and start interacting with your data.

repo = client.get_repo("earthmover/ocean")

List Repos

You can list repos associated with an organization.

arraylake repo list earthmover

You can also filter repos on repo metadata. Filtering is inclusive and will return repos that match all of the provided metadata. Metadata filters must be a mapping of key-value pairs where values are strings, numbers, lists, booleans, or None.

arraylake repo list earthmover --filter-metadata '{"source": "NOAA"}'
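The "match all of the provided metadata" rule can be illustrated with plain dictionaries. The sketch below assumes that a match means exact equality on every key in the filter, and that repos may carry extra keys the filter does not mention. The helper names and the repos atmos and ice are hypothetical, and how the service compares list values is not covered here.

```python
def matches_filter(repo_metadata, metadata_filter):
    """True if every key-value pair in the filter is present in the repo metadata."""
    return all(
        key in repo_metadata and repo_metadata[key] == value
        for key, value in metadata_filter.items()
    )


def filter_repos(repos, metadata_filter):
    """Return the names of repos whose metadata satisfies the whole filter."""
    return [
        name for name, metadata in repos.items()
        if matches_filter(metadata, metadata_filter)
    ]


# Hypothetical org contents, keyed by repo name.
repos = {
    "ocean": {"source": "NOAA", "model": "ROMS"},
    "atmos": {"source": "ECMWF"},
    "ice": {},
}

print(filter_repos(repos, {"source": "NOAA"}))
print(filter_repos(repos, {"source": "NOAA", "model": "FVCOM"}))
```

Adding more keys to the filter can only narrow the result, since each repo must satisfy all of them at once.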

Delete a Repo

Finally, we can delete a repo.

Warning: Deleting a repo cannot be undone! Use this operation carefully.

arraylake repo delete earthmover/ocean