
Icechunk 2

Icechunk 2 is the latest version of the Icechunk storage format that powers Arraylake. Existing repositories can be migrated with one click.

Compatibility

All new repositories created in Arraylake use Icechunk 2 by default. The Icechunk v2.x library is fully backwards compatible with v1.x repos, but the reverse does not hold: versions of the Icechunk library prior to v2.x, and versions of Arraylake prior to v0.33.0, cannot read or write Icechunk 2 repos.

What's new

Icechunk 2 brings new features, stronger consistency guarantees, and higher performance:

  • JavaScript and WebAssembly bindings. Full support for Node.js and browser environments via the @earthmover/icechunk npm package.
  • Move and rename. Move or rename arrays and groups with session.move(). This is a cheap metadata-only operation that does not copy any chunks.
  • Shift and reindex chunks. Transform chunk coordinates without copying data using shift_array() and reindex_array().
  • Rectilinear chunk grids. Support for Zarr 3 variable-sized chunk grids, where each chunk along a dimension can have a different size.
  • Version control improvements. Ops log for full operation history, amend to replace previous commits, anonymous snapshots, visual ancestry graphs in the terminal and Jupyter, and empty commits.
  • Repository status and feature flags. Repos can be marked read-only. Feature flags allow enabling or disabling specific operations (e.g. move_node, create_tag).
  • Repository-level metadata. Key-value metadata on the repo itself, not just on snapshots.
  • HTTP and redirect storage backends. Read-only access to repos served over HTTP/HTTPS, plus redirect storage for CDN scenarios.
  • Relative virtual chunks (vcc:// URLs). Virtual chunk locations can reference named containers, so assets can be relocated without breaking references.
  • Performance and consistency. Redesigned snapshot structure for faster reads and writes. Stronger consistency guarantees for concurrent operations.
  • Filtered subscriptions. Marketplace listings backed by Icechunk 2 repositories support variable-level subsetting, allowing subscribers to select only the data they need.
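With a rectilinear grid, the chunk that holds a given element can no longer be found by integer division by a single chunk size. The snippet below is a minimal, library-independent sketch of that lookup along one dimension. It is not Icechunk or Zarr code: it assumes the chunk sizes for the dimension are available as a list of positive integers, and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_boundaries(chunk_sizes):
    """Return the cumulative end offset of each chunk along one dimension."""
    if not chunk_sizes or any(size <= 0 for size in chunk_sizes):
        raise ValueError("chunk sizes must be a non-empty list of positive integers")
    return list(accumulate(chunk_sizes))


def locate(index, chunk_sizes):
    """Map a global element index to (chunk_index, offset_within_chunk)."""
    ends = chunk_boundaries(chunk_sizes)
    if not 0 <= index < ends[-1]:
        raise IndexError(f"index {index} is outside [0, {ends[-1]})")
    # bisect_right finds the first chunk whose end offset is greater than `index`,
    # i.e. the chunk that contains the element.
    chunk = bisect_right(ends, index)
    start = ends[chunk - 1] if chunk > 0 else 0
    return chunk, index - start
```

For example, a daily time axis chunked by calendar month (31, 28, 31 and 30 days for January to April of a non-leap year) places day index 59, March 1, at offset 0 of chunk 2: `locate(59, [31, 28, 31, 30])` returns `(2, 0)`.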
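Shifting and reindexing can be cheap because, conceptually, they rewrite the mapping from chunk coordinates to stored chunks rather than the chunks themselves. The sketch below models that idea with a plain dictionary standing in for a chunk manifest. It is an illustration under assumptions, not Icechunk's implementation or its shift_array()/reindex_array() API: the manifest layout, the function names, and the choice to drop chunks shifted to negative coordinates are all hypothetical.

```python
from typing import Callable, Optional

Coord = tuple[int, ...]
# Hypothetical manifest: chunk-grid coordinate -> reference to already-stored chunk bytes.
Manifest = dict[Coord, str]


def reindex_manifest(manifest: Manifest, mapping: Callable[[Coord], Optional[Coord]]) -> Manifest:
    """Return a new manifest with each chunk coordinate passed through `mapping`.

    Chunks for which `mapping` returns None are dropped. The reference values are
    reused as-is, so no chunk bytes are read, written, or copied.
    """
    remapped: Manifest = {}
    for coord, ref in manifest.items():
        new_coord = mapping(coord)
        if new_coord is None:
            continue
        if new_coord in remapped:
            raise ValueError(f"two chunks would map to {new_coord}")
        remapped[new_coord] = ref
    return remapped


def shift_manifest(manifest: Manifest, offset: Coord) -> Manifest:
    """Shift every chunk coordinate by `offset`, dropping chunks that land below zero."""

    def move(coord: Coord) -> Optional[Coord]:
        if len(coord) != len(offset):
            raise ValueError("offset must have one entry per dimension")
        shifted = tuple(c + o for c, o in zip(coord, offset))
        # Assumption for this sketch: chunks pushed to negative coordinates fall off the grid.
        return None if any(c < 0 for c in shifted) else shifted

    return reindex_manifest(manifest, move)
```

Shifting by (-1, 0) moves every chunk one step toward the origin along the first dimension: `shift_manifest({(0, 0): "ref-a", (1, 0): "ref-b", (2, 0): "ref-c"}, (-1, 0))` returns `{(0, 0): "ref-b", (1, 0): "ref-c"}`, and the reference values are the same objects as before.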

For the full changelog, see the Icechunk 2.0.0 release notes.

Migrating existing repositories

Existing repositories continue to work without any changes. When you are ready to migrate:

  1. Update your client. Install the latest version of the arraylake package. Arraylake v0.33.0 and later support both Icechunk 1.x and Icechunk 2.x formats transparently.
  2. Update your pipelines. Any pipeline or service writing to the repository must also use the updated client before you migrate.
  3. Migrate the repository. In the Arraylake web app, navigate to the "Optimization" tab in your repository's settings and click "Migrate now". During migration the repository is placed in "maintenance mode", which blocks most write operations. Migration rewrites only metadata (no chunks are touched), so it is fast: most repos migrate in seconds.
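Because every writer must be upgraded before migration, it can help to audit client versions first. The following sketch flags writers running an Arraylake client older than v0.33.0, the threshold given under Compatibility. The writer inventory and the function names are hypothetical; in practice the versions would come from wherever your pipelines record their environments.

```python
import re

# Oldest Arraylake client that can read and write Icechunk 2 repos (per the Compatibility section).
MIN_ARRAYLAKE = (0, 33, 0)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse the leading MAJOR.MINOR[.PATCH] numbers of a version string such as "0.33.1" or "v0.34.0".

    Pre-release or local suffixes are ignored, so "0.33.0rc1" is treated as 0.33.0.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def outdated_writers(writers: dict[str, str]) -> list[str]:
    """Return, sorted by name, the writers whose client is older than MIN_ARRAYLAKE."""
    return sorted(
        name for name, version in writers.items() if parse_version(version) < MIN_ARRAYLAKE
    )
```

For example, `outdated_writers({"nightly-ingest": "0.33.2", "dashboard": "0.32.1", "backfill": "v0.35.0"})` returns `["dashboard"]`, meaning that service must be upgraded before you click "Migrate now". On any single machine, `importlib.metadata.version("arraylake")` from the Python standard library returns the installed client version string.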

[Screenshot: "Migrate to Icechunk 2" button in repository settings]

Warning: Migration is irreversible. Once a repository has been migrated, older clients that do not support Icechunk 2 will not be able to open it. Those clients will see a prompt to upgrade when they attempt a read or write operation.

After migration

All existing data, branches, and tags are preserved. Once migrated, filtered subscriptions can be created for Marketplace listings backed by the repository.