Understanding Write Conflicts
Arraylake's version control system lets multiple users collaborate safely on the same datasets. However, when multiple clients edit the same metadata files or array data objects, their changes can conflict with one another. This tutorial illustrates the scenarios that give rise to these conflicts. See Transactions and Version Control for a detailed discussion of these concepts.
Learning Goals
- Understand how Arraylake:
  - Uses version control to allow multiple users to modify the same Zarr dataset while ensuring that the dataset is always consistent.
  - Detects whether multiple writes modify the same object.
  - Uses sessions for version control.
  - Employs an 'optimistic concurrency' strategy for conflict resolution.
- Understand when conflicts may or may not arise in the following situations:
  - Multiple users modify metadata.
  - Multiple users modify chunk data.
  - Multiple users modify chunk data and metadata.
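To make the idea of optimistic concurrency concrete before working with a real Repo, here is a minimal toy model in plain Python. It is only a sketch and not Arraylake's implementation: the names ToyStore, ToySession, and ConflictError are hypothetical, and the rule it applies (a commit fails only if it touches a key that another commit changed after the session began) is an assumption chosen to mirror the object-level conflict detection described above.

```python
class ConflictError(Exception):
    """Raised when a commit touches keys changed since the session began."""


class ToyStore:
    """Hypothetical in-memory store; not part of Arraylake."""

    def __init__(self):
        self.data = {}      # key -> committed value
        self.history = []   # one set of changed keys per commit

    def session(self):
        return ToySession(self)


class ToySession:
    def __init__(self, store):
        self.store = store
        self.base = len(store.history)  # number of commits seen at session start
        self.changes = {}               # uncommitted writes, local to this session

    def set(self, key, value):
        # Writes are staged without taking any lock (the "optimistic" part).
        self.changes[key] = value

    def commit(self):
        # Collect every key committed by other sessions since this one began.
        newer = set().union(*self.store.history[self.base:])
        overlap = newer.intersection(self.changes)
        if overlap:
            raise ConflictError(sorted(overlap))
        self.store.data.update(self.changes)
        self.store.history.append(set(self.changes))
        self.base = len(self.store.history)
        self.changes = {}


store = ToyStore()

# Two sessions that write different keys: both commits succeed.
a, b = store.session(), store.session()
a.set("temp/.zattrs", {"units": "K"})
b.set("temp/c/0", b"\x00")
a.commit()
b.commit()

# Two sessions that write the same key: the second commit is rejected.
c, d = store.session(), store.session()
c.set("temp/.zattrs", {"units": "C"})
d.set("temp/.zattrs", {"units": "F"})
c.commit()
try:
    d.commit()
    conflict = None
except ConflictError as err:
    conflict = err.args[0]
```

In this toy model neither writer ever waits for the other. Conflicts are checked only at commit time, and the writer who commits second is responsible for resolving them.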
# condense error messages
%xmode minimal
import arraylake as al
import numpy as np
import zarr
Exception reporting mode: Minimal
Overview
The examples in this section relate to the scenario of two (or more) users in an organization actively working on and writing common data objects. This is also analogous to multiple writers on different machines within a distributed system modifying the same datasets. To illustrate these examples, we will create two connections to the same Arraylake Repo. Throughout these examples, they will be referred to as 'User A' and 'User B', and they will interact with the Arraylake Repository through Repo objects, repo_a and repo_b, respectively.
Two clients interacting with Arraylake
First, we will create two connections to the same Arraylake Repository, one for each user. We will walk through the initial process of users making changes, committing them to a Repository, and checking out changes made by another user, before looking at commit conflicts in the next section.
client = al.Client()
repo_a = client.get_or_create_repo("earthmover/conflict-resolution")
repo_b = client.get_or_create_repo("earthmover/conflict-resolution")
This will create two different sessions, which we can see by looking at the session_id given by repo.status().
repo_a.status()