Managing Repos
Once your Arraylake organization has been fully configured and you have installed the client library, you're ready to start managing data! 🎉
Create a Repo
For the purposes of this example, our org name will be earthmover.
If running these commands interactively, replace earthmover with your org name.
For this example, we are going to create a Repo called ocean to hold oceanography data 🌊.
CLI:
arraylake repo create earthmover/ocean
Python:
from arraylake import Client
client = Client()
client.create_repo("earthmover/ocean")
Python (asyncio):
from arraylake import AsyncClient
aclient = AsyncClient()
await aclient.create_repo("earthmover/ocean")
You can optionally add a description and/or metadata to your repo. Descriptions are limited to 255 characters, while repo metadata can be at most 4 kB in total. Metadata must be a mapping of key-value pairs where values are strings, numbers, lists, booleans, or None.
CLI:
arraylake repo create earthmover/ocean --description "This is some oceanography data!" --metadata '{"type": ["climate", "coastal", "environmental"], "model": "FVCOM"}'
Python:
from arraylake import Client
client = Client()
client.create_repo(
    "earthmover/ocean",
    description="This is some oceanography data!",
    metadata={"type": ["climate", "coastal", "environmental"], "model": "FVCOM"},
)
Python (asyncio):
from arraylake import AsyncClient
aclient = AsyncClient()
await aclient.create_repo(
    "earthmover/ocean",
    description="This is some oceanography data!",
    metadata={"type": ["climate", "coastal", "environmental"], "model": "FVCOM"},
)
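The limits described above can be checked locally before calling create_repo. The following is a minimal sketch, not part of the Arraylake client: the helper name check_repo_metadata is hypothetical, and it assumes that "4kB" means 4096 bytes of UTF-8 encoded JSON and that list items are themselves simple values. The service may measure size or validate types differently.

```python
import json

MAX_DESCRIPTION_CHARS = 255      # limit stated in the text
MAX_METADATA_BYTES = 4 * 1024    # assumption: "4kB" read as 4096 bytes of UTF-8 JSON

# Value types named in the text: strings, numbers, booleans, None (lists handled below).
_SCALARS = (str, int, float, bool, type(None))


def check_repo_metadata(description=None, metadata=None):
    """Return a list of problems found; an empty list means the inputs look acceptable."""
    problems = []
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description has {len(description)} characters (limit {MAX_DESCRIPTION_CHARS})"
        )
    if metadata is not None:
        if not isinstance(metadata, dict):
            return problems + ["metadata must be a mapping of key-value pairs"]
        for key, value in metadata.items():
            # Assumption: items inside a list must be scalars, not nested containers.
            items = value if isinstance(value, list) else [value]
            if not all(isinstance(item, _SCALARS) for item in items):
                problems.append(f"value for {key!r} has an unsupported type")
        # Measure the serialized size; unsupported values are stringified so this never raises.
        size = len(json.dumps(metadata, default=str, skipkeys=True).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            problems.append(f"metadata is {size} bytes (limit {MAX_METADATA_BYTES})")
    return problems
```

Calling check_repo_metadata with the description and metadata from the example above returns an empty list, so those values fit within the limits as interpreted here.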
You can also modify the description and/or metadata of an existing repo.
CLI:
arraylake repo modify earthmover/ocean --description "This is an updated description for some oceanography data!" -a '{"source": "NOAA"}' -r "type" -u '{"model": "ROMS"}'
Python:
from arraylake import Client
client = Client()
client.modify_repo(
    "earthmover/ocean",
    description="This is an updated description for some oceanography data!",
    add_metadata={"source": "NOAA"},
    remove_metadata=["type"],
    update_metadata={"model": "ROMS"},
)
Python (asyncio):
from arraylake import AsyncClient
aclient = AsyncClient()
await aclient.modify_repo(
    "earthmover/ocean",
    description="This is an updated description for some oceanography data!",
    add_metadata={"source": "NOAA"},
    remove_metadata=["type"],
    update_metadata={"model": "ROMS"},
)
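To see what the metadata might look like after the modify call above, the three operations can be modeled on a plain dictionary. This is only an illustrative sketch: apply_metadata_changes is a hypothetical helper, and the order of operations (remove, then update, then add) and the lenient handling of missing or existing keys are assumptions. The real service may reject, for example, adding a key that already exists.

```python
def apply_metadata_changes(current, add=None, remove=None, update=None):
    """Return a new dict with the changes applied; `current` is left untouched."""
    result = dict(current)
    # Assumption: removing a key that is absent is silently ignored.
    for key in remove or []:
        result.pop(key, None)
    # Assumption: update and add are both treated as plain assignments here.
    result.update(update or {})
    result.update(add or {})
    return result


# Metadata from the create_repo example earlier on this page.
before = {"type": ["climate", "coastal", "environmental"], "model": "FVCOM"}
after = apply_metadata_changes(
    before,
    add={"source": "NOAA"},
    remove=["type"],
    update={"model": "ROMS"},
)
print(after)
```

Under these assumptions the repo ends up with model set to ROMS, a new source key, and no type key.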
Where is Repo data stored?
Arraylake lets you configure the storage location for a Repo's data using org-level bucket configurations.
Choose a specific bucket by providing bucket_config_nickname to create_repo.
If not specified, the organization's default bucket is used.
Within the bucket and prefix set by an org-level bucket configuration, data for each new Icechunk Repo is stored under an additional prefix.
By default, this extra prefix is the Repo name prefixed with 8 random characters.
Choose a specific extra prefix by passing the prefix kwarg to create_repo.
For example, for a bucket configured with bucket='my-bucket-name' and prefix='my-bucket-prefix',
- create_repo("repo-A") stores data in my-bucket-name/my-bucket-prefix/[8-RANDOM_CHARACTERS]_repo_A
- create_repo("repo-B", prefix='zoo') stores data in my-bucket-name/my-bucket-prefix/zoo/
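The path rule above can be sketched as a small function. This only mimics the layout described in the text and is not how Arraylake itself computes locations: storage_location is a hypothetical helper, and the alphabet used for the 8 random characters, the underscore separator, and the absence of a trailing slash are assumptions.

```python
import secrets
import string


def storage_location(bucket, bucket_prefix, repo_name, prefix=None):
    """Build the bucket/prefix/extra-prefix path described in the text."""
    if prefix is None:
        # Assumption: the random part is drawn from lowercase letters and digits.
        alphabet = string.ascii_lowercase + string.digits
        random_part = "".join(secrets.choice(alphabet) for _ in range(8))
        prefix = f"{random_part}_{repo_name}"
    # Drop empty segments so a bucket without an org-level prefix still works.
    parts = [bucket, bucket_prefix.strip("/"), prefix.strip("/")]
    return "/".join(part for part in parts if part)


# Default: a random 8-character string is prepended to the repo name.
print(storage_location("my-bucket-name", "my-bucket-prefix", "ocean"))
# Explicit extra prefix, as with create_repo("repo-B", prefix='zoo').
print(storage_location("my-bucket-name", "my-bucket-prefix", "repo-B", prefix="zoo"))
```

The first call prints a different path on every run because of the random component; the second always resolves to the zoo prefix inside the configured bucket and prefix.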
Open a Repo
If you're working in Python, you can open a Repo and start interacting with your data.
Python:
repo = client.get_repo("earthmover/ocean")
Python (asyncio):
arepo = await aclient.get_repo("earthmover/ocean")  # returns an AsyncRepo object
List Repos
You can list the repos associated with an organization.
CLI:
arraylake repo list earthmover
Python:
client.list_repos("earthmover")
Python (asyncio):
await aclient.list_repos("earthmover")
You can also filter repos on repo metadata. Filtering is inclusive and will return repos that match all of the provided metadata. Metadata filters must be a mapping of key-value pairs where values are strings, numbers, lists, booleans, or None.
CLI:
arraylake repo list earthmover --filter-metadata '{"source": "NOAA"}'
Python:
client.list_repos("earthmover", filter_metadata={"source": "NOAA"})
Python (asyncio):
await aclient.list_repos("earthmover", filter_metadata={"source": "NOAA"})
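The "match all of the provided metadata" rule can be illustrated with a local predicate. This is a sketch of one plausible reading, not the service's implementation: matches_filter is a hypothetical helper, the comparison is assumed to be exact equality per key (list values included), and the repos in the example are made up for illustration.

```python
def matches_filter(repo_metadata, filter_metadata):
    """True when every key in the filter is present in the repo metadata with an equal value."""
    return all(
        key in repo_metadata and repo_metadata[key] == value
        for key, value in filter_metadata.items()
    )


# Hypothetical repos and metadata, for illustration only.
repos = {
    "ocean": {"model": "ROMS", "source": "NOAA"},
    "land": {"model": "example-model", "source": "example-source"},
}

selected = [
    name
    for name, metadata in repos.items()
    if matches_filter(metadata, {"source": "NOAA"})
]
print(selected)
```

A repo may carry extra keys beyond those in the filter and still match; it only fails when a filtered key is missing or has a different value.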
Delete a Repo
Finally, we can delete a repo.
Deleting a repo cannot be undone! Use this operation carefully.
CLI:
arraylake repo delete earthmover/ocean
Python:
client.delete_repo("earthmover/ocean", imsure=True, imreallysure=True)
Python (asyncio):
await aclient.delete_repo("earthmover/ocean", imsure=True, imreallysure=True)
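Because deletion is irreversible, scripts may want an extra guard in front of delete_repo. The sketch below is one possible pattern, not something the client provides: delete_repo_confirmed is a hypothetical wrapper that only forwards the call, with the imsure and imreallysure flags shown above, when the caller has re-typed the exact repo name.

```python
def delete_repo_confirmed(client, repo_name, typed_name):
    """Delete `repo_name` only if `typed_name` matches it exactly.

    `client` is any object exposing delete_repo(name, imsure=..., imreallysure=...),
    such as the synchronous Client used elsewhere on this page.
    Returns True if the delete call was made, False otherwise.
    """
    if typed_name != repo_name:
        # Mismatch: do nothing rather than risk deleting the wrong repo.
        return False
    client.delete_repo(repo_name, imsure=True, imreallysure=True)
    return True
```

In an interactive script, typed_name could come from input("Type the repo name to confirm: "), so that a typo aborts the deletion instead of carrying it out.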