ECMWF Reanalysis 5 (ERA5) Surface
Data source
The source files come from the NSF NCAR Curated ECMWF Reanalysis 5 (ERA5) dataset.
ERA5 is the fifth-generation global reanalysis produced by the Copernicus Climate Change Service at ECMWF. It provides hourly estimates of atmospheric, land-surface and ocean variables from January 1940 to the present on a ~31 km (0.25°) grid, resolving 137 model levels up to 0.01 hPa; a 10-member 4D-Var ensemble supplies uncertainty information. The NSF NCAR Curated ERA5 collection republishes these data as CF-compliant NetCDF-4 files on AWS, delivering the hourly analyses and 12-h forecasts from both the high-resolution run and its ten-member ensemble on the same 0.25° grid—ready for large-scale research and ML/AI weather-model training.
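The 0.25° regular latitude-longitude grid fixes the array shape that appears in the dataset listing further down. Here is a minimal stdlib sketch. It assumes both poles are included in latitude and that longitude runs from 0° up to, but not including, 360°, which matches the coordinates xarray prints later on this page.

```python
# Derive the shape of a regular 0.25° global lat-lon grid.
RES = 0.25  # grid spacing in degrees

# Latitude includes both poles: 90, 89.75, ..., -90 (north to south).
n_lat = int(180 / RES) + 1
latitudes = [90 - i * RES for i in range(n_lat)]

# Longitude wraps around, so 360° (same meridian as 0°) is omitted.
n_lon = int(360 / RES)
longitudes = [i * RES for i in range(n_lon)]

print(n_lat, n_lon, n_lat * n_lon)    # 721 1440 1038240
print(latitudes[0], latitudes[-1])    # 90.0 -90.0
print(longitudes[0], longitudes[-1])  # 0.0 359.75
```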
The Arraylake / Icechunk edition is an analysis-ready dataset (ARD) that keeps only the single-level surface fields. It contains 18 hourly variables, such as 2 m temperature, 10 m wind, surface pressure, cloud cover, snow depth, and total-column water vapour, spanning 1975-01-01 to 2024-12-31. The data are rechunked into a ~30 TB Zarr v3 store managed by Icechunk. This layout supports fast spatial slicing, which makes the dataset well suited to climate research and to training ML/AI weather-forecast models.
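The hourly 1975-01-01 to 2024-12-31 span determines the length of the time dimension. A quick stdlib check follows. It assumes timestamps run from 00:00 UTC on the first day through 23:00 UTC on the last day with no gaps.

```python
from datetime import datetime, timedelta

# First and last hourly timestamps in the store (both inclusive).
start = datetime(1975, 1, 1, 0)
end = datetime(2024, 12, 31, 23)

# Count hourly steps, including both endpoints.
n_hours = (end - start) // timedelta(hours=1) + 1
print(n_hours)  # 438312
```

This agrees with the `time: 438312` dimension in the dataset listing.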
Arraylake Repo
The ERA5-Surface Arraylake repo is named earthmover-public/era5-surface-aws and can be browsed at:
https://app.earthmover.io/earthmover-public/era5-surface-aws
Current sub-groups
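spatial – 18 hourly single-level surface variables on the 0.25° grid (1975-01-01 → 2024-12-31). Chunks are (time=1, latitude=721, longitude=1440), so each chunk holds one full global map for one hour. As a result, any global or regional map for a given hour is read from a single chunk per variable. The trade-off is that a long time series at one grid point must read one chunk per hour.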
Additional groups optimised for time-series access will be added in future releases.
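The following back-of-the-envelope sketch makes the chunking trade-off concrete. It counts how many chunks, and how many uncompressed bytes, two kinds of query touch under the (time=1, latitude=721, longitude=1440) layout. It assumes float32 values (4 bytes, as in the dataset listing) and ignores compression, so actual transfer sizes will differ. The helper `chunks_touched` and the index ranges are hypothetical, written only for this illustration.

```python
# Chunk shape of the "spatial" group, as described above.
CHUNK = {"time": 1, "latitude": 721, "longitude": 1440}
ITEMSIZE = 4  # bytes per float32 value

# Uncompressed bytes in one chunk of one variable.
chunk_bytes = ITEMSIZE
for size in CHUNK.values():
    chunk_bytes *= size


def chunks_touched(start: dict, stop: dict) -> int:
    """Hypothetical helper: number of chunks intersected by the integer
    index box [start, stop) along every dimension."""
    total = 1
    for dim, size in CHUNK.items():
        first = start[dim] // size
        last = (stop[dim] - 1) // size
        total *= last - first + 1
    return total


# One hour over an arbitrary regional box (indices chosen for illustration).
regional = chunks_touched(
    {"time": 0, "latitude": 160, "longitude": 940},
    {"time": 1, "latitude": 261, "longitude": 1181},
)
print(regional, chunk_bytes)  # 1 4152960

# One year (8760 hours) at a single grid point.
series = chunks_touched(
    {"time": 0, "latitude": 200, "longitude": 1000},
    {"time": 8760, "latitude": 201, "longitude": 1001},
)
print(series, f"{series * chunk_bytes / 1e9:.1f} GB")  # 8760 36.4 GB
```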
Open Repo and explore contents
Let’s instantiate an Arraylake client, point it at the Earthmover API, and grab the earthmover-public/era5-surface-aws repo so we can open it with xarray.
from arraylake import Client
client = Client()
repo = client.get_repo("earthmover-public/era5-surface-aws")
Establish a read-only session pinned to the current snapshot of the “main” branch
session = repo.readonly_session("main")
Open the 'spatial' group with Xarray
import xarray as xr
ds = xr.open_dataset(
session.store,
engine="zarr",
consolidated=False,
zarr_format=3,
chunks=None,
group="spatial",
)
ds is now an xarray Dataset object that streams data on demand, ready for inspection or analysis.
print(ds)
<xarray.Dataset> Size: 33TB
Dimensions: (time: 438312, latitude: 721, longitude: 1440)
Coordinates:
* latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
* time (time) datetime64[ns] 4MB 1975-01-01 ... 2024-12-31T23:00:00
Data variables: (12/18)
d2 (time, latitude, longitude) float32 2TB ...
blh (time, latitude, longitude) float32 2TB ...
cape (time, latitude, longitude) float32 2TB ...
skt (time, latitude, longitude) float32 2TB ...
mslp (time, latitude, longitude) float32 2TB ...
sp (time, latitude, longitude) float32 2TB ...
... ...
u10 (time, latitude, longitude) float32 2TB ...
t2 (time, latitude, longitude) float32 2TB ...
tcc (time, latitude, longitude) float32 2TB ...
u100 (time, latitude, longitude) float32 2TB ...
v10 (time, latitude, longitude) float32 2TB ...
v100 (time, latitude, longitude) float32 2TB ...
Attributes:
DATA_SOURCE: ECMWF: https://cds.climate.copernicus.eu, Copernicus Climat...
Conventions: CF-1.6
history: Created by Earthmover PBC on 2025-07-07 22:26:57 by combini...
Total dataset size in TiB (tebibytes, 1024**4 bytes)
size_tib = ds.nbytes / 1024**4
print(f"Dataset size: {size_tib:.1f} TiB")
Dataset size: 29.8 TiB
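The xarray repr reports Size: 33TB, while the size computed above comes out as 29.8. The difference is between decimal terabytes (10**12 bytes) and binary tebibytes (1024**4 bytes). The stdlib sketch below reproduces both numbers from the array shape. It assumes all 18 data variables are float32, like the ones listed, and it ignores the coordinate arrays, which add only a few MB. Both figures describe the uncompressed in-memory size (ds.nbytes), not necessarily the compressed footprint in object storage.

```python
# Shape of each data variable and number of variables in the "spatial" group.
n_time, n_lat, n_lon = 438_312, 721, 1440
n_vars = 18
itemsize = 4  # float32

total_bytes = n_vars * n_time * n_lat * n_lon * itemsize

print(f"{total_bytes / 1000**4:.1f} TB (decimal)")  # 32.8 TB (decimal)
print(f"{total_bytes / 1024**4:.1f} TiB (binary)")  # 29.8 TiB (binary)
```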
The dataset contains:
- ➕ 18 hourly surface variables (e.g., skt, cape, u10, swvl1)
- 📐 Dimensions: time (438 312) × latitude (721) × longitude (1 440)
- 🌍 Coordinates: latitudes 90 → −90° (stored north to south), longitudes 0 → 359.75°
- 💾 Total size: ~30 TiB uncompressed (≈33 TB decimal), stored as Zarr v3 arrays in an Icechunk repository
- 📝 Global attributes and CF-1.6 metadata (data source, history, conventions)
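Latitude is stored north to south (90 → −90) and longitude uses the 0 → 360° convention, so coordinates given in the −180 → 180° convention must be wrapped before lookup. The sketch below maps a lat/lon pair to the nearest grid indices on this 0.25° grid, assuming the coordinate layout shown in the dataset listing. The function `nearest_indices` is a hypothetical helper, not part of Arraylake or xarray.

```python
RES = 0.25
N_LON = 1440


def nearest_indices(lat: float, lon: float) -> tuple[int, int]:
    """Hypothetical helper: nearest (latitude, longitude) array indices on a
    0.25° grid with latitude descending from 90 and longitude in [0, 360)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must lie in [-90, 90]")
    # Row 0 is the North Pole, so count steps southward from 90°.
    i = round((90.0 - lat) / RES)
    # Wrap -180..180 input into 0..360, then wrap index 1440 back to 0.
    j = round((lon % 360.0) / RES) % N_LON
    return i, j


print(nearest_indices(40.0, -105.0))  # (200, 1020)
print(nearest_indices(-90.0, 359.9))  # (720, 0)
```

In xarray, the equivalent point lookup would be ds.t2.sel(latitude=40.0, longitude=-105.0 % 360, method="nearest"). Because latitude is descending, a regional box must also be sliced from high to low, e.g. ds.sel(latitude=slice(50, 25), longitude=slice(235, 295)).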
Spatial Query and Plot
Extract the 2 m air temperature field for 1 Jan 2018 12 UTC and render it as a global map
ds.t2.sel(time="2018-01-01 12:00").plot(cmap="coolwarm", robust=True)
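Each chunk in the spatial group holds one hour, so the map above is drawn from exactly one t2 chunk. The stdlib sketch below computes which integer time index that chunk sits at. It assumes the time axis starts at 1975-01-01 00:00 and is gap-free hourly, consistent with its 438,312 steps. The function `time_index` is a hypothetical helper written for this illustration.

```python
from datetime import datetime, timedelta

T0 = datetime(1975, 1, 1, 0)  # first timestamp in the store


def time_index(ts: datetime) -> int:
    """Hypothetical helper: integer position of an hourly timestamp,
    assuming a gap-free hourly axis starting at T0."""
    offset = ts - T0
    if offset % timedelta(hours=1):
        raise ValueError("timestamp is not on the hour")
    return offset // timedelta(hours=1)


idx = time_index(datetime(2018, 1, 1, 12))
print(idx)  # 376956
```

Under these assumptions, ds.t2.isel(time=376956) should select the same field as the .sel call above, which can serve as a sanity check when debugging index-based access.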