This vignette demonstrates a modern workflow for processing Sentinel-2 Level-2A data from the EOPF (Earth Observation Processing Framework) STAC API.
The Earth Observation data landscape is rapidly evolving, with EOPF
positioning itself as a key component of this transformation. There is a
strong industry movement towards adopting Zarr, a modern cloud-native
data format designed for efficient storage and access of large
multi-dimensional arrays. While the tooling ecosystem around Zarr and
EOPF-Zarr specifically is still maturing, this vignette provides a
practical example of working with these data formats using the
vrtility
package.
This vignette also demonstrates contemporary approaches to cloud-masking and image compositing using OmniCloudMask and the geometric median algorithm, respectively.
This example prioritizes data quality and modern processing methods over raw speed, demonstrating how to generate analysis-ready data using contemporary cloud-native workflows.
We also demonstrate a powerful new feature of vrtility
:
the ability to create derived bands using
vrt_derived_block()
. This feature enables you to generate
spectral indices (such as NDVI and EVI) and other derived metrics
directly within the VRT framework. Under the hood, it uses muparser expressions
executed through VRT pixel functions, providing efficient computation
without requiring intermediate file storage.
Prerequisites
This workflow requires specific system dependencies that may need additional setup:
EOPFZARR GDAL Plugin: You’ll need to build and install the EOPFZARR plugin to read EOPF-Zarr formatted data.
muparser Support: Your GDAL installation must include muparser support for derived band calculations. You can verify this by running:
vrtility::check_muparser()
.
Query the EOPF STAC API
This example demonstrates the standard workflow for discovering and filtering satellite imagery through a STAC (SpatioTemporal Asset Catalog) API. We’ll query the EOPF STAC collection and apply filters to obtain the specific datasets we need.
The process involves: 1. Querying the STAC collection for available imagery 2. Filtering by cloud coverage to ensure data quality 3. Applying coverage filters to guarantee sufficient spatial overlap
In this example, we’re targeting a bounding box around Copenhagen,
Denmark, and using stac_coverage_filter()
to ensure we only
retrieve images with adequate coverage of our area of interest.
library(vrtility)
bbox <- gdalraster::bbox_from_wkt(
wkt = "POINT (12.56 55.67)",
extend_x = 0.3,
extend_y = 0.2
)
te <- bbox_to_projected(bbox)
trs <- attr(te, "wkt")
eopf_s2 <- stac_query(
bbox = bbox,
stac_source = "https://stac.core.eopf.eodc.eu/",
collection = "sentinel-2-l2a",
start_date = "2025-05-01",
end_date = "2025-07-30"
)
eopf_filter <- rstac::assets_select(
eopf_s2,
asset_names = c("B02_10m", "B03_10m", "B04_10m", "B08_10m")
) |>
stac_cloud_filter(max_cloud_cover = 30) |>
stac_coverage_filter(bbox, min_coverage = 0.5)
Download and warp the images
Here we virtually warp the images and then download them to disk
using multiple processes within an isolated set of mirai daemons. Note
that we call mirai::daemons()
within the
with()
function to ensure that the daemons are properly
cleaned up after use.
Critically, we must specify the driver = "EOPFZARR"
argument in vrt_collect()
because GDAL will, by default,
attempt to read the files using the built-in Zarr driver, which does not
fully support the EOPF-Zarr format.
Since we want to continue processing this image set, we use the
recollect = TRUE
argument to ensure that a new VRT
collection is returned rather than file paths.
with(mirai::daemons(10), {
eopf_s2_chgn <- vrt_collect(
eopf_filter,
vsi_prefix = "/vsicurl/",
driver = "EOPFZARR"
) |>
vrt_warp(
t_srs = trs,
te = te,
tr = c(10, 10)
) |>
vrt_compute(
outfile = fs::file_temp(ext = ".tif"),
engine = "gdalraster",
recollect = TRUE
)
})
Cloud masking
Here we generate an OmniCloudMask for each image in the VRT collection using the red, green, and NIR bands. The cloud mask identifies pixels affected by clouds, cloud shadows, and other atmospheric interference. We then apply this mask to filter out low-quality pixels and compute the results to disk.
Note that we use recollect = TRUE
to return a new VRT
collection that references the newly created cloud-masked GeoTIFF files,
allowing us to continue working with the data in a virtualized format
for subsequent processing steps.
eopf_s2_mask <- eopf_s2_chgn |>
vrt_create_mask(
inbands = c(red = 3, green = 2, nir = 4),
maskfun = create_omnicloudmask()
) |>
vrt_set_maskfun(
mask_band = "omnicloudmask",
mask_values = 1:3
) |>
vrt_compute(
outfile = fs::file_temp(ext = ".tif"),
engine = "warp",
recollect = TRUE
)
purrr::walk(
seq_len(eopf_s2_mask$n_items),
~ plot(eopf_s2_mask, item = .x, c(3, 2, 1), na_col = "#ffa928")
)
Geometric median composite
Here we generate a geometric median composite from the cloud-masked
images using multiband_reduce()
with the
geomedian()
function. The geometric median is a robust
statistical method that effectively reduces noise and preserves spectral
characteristics while minimizing the influence of outliers and residual
cloud contamination.
We use mirai::daemons()
to parallelize the computation
across multiple cores, significantly improving processing speed for this
computationally intensive operation.
with(mirai::daemons(10), {
eopf_s2_median <- multiband_reduce(
eopf_s2_mask,
reduce_fun = geomedian(),
outfile = fs::file_temp(ext = ".tif"),
recollect = TRUE
)
})
plot(
eopf_s2_median,
c(3, 2, 1),
na_col = "#ffa928",
pal = grDevices::hcl.colors(10, "YlGnBu")
)
Derived bands
Spectral indices are essential tools in remote sensing for extracting meaningful information from multi-band imagery. Common indices like NDVI (Normalized Difference Vegetation Index), EVI (Enhanced Vegetation Index), and others help identify vegetation health, water content, and other surface characteristics.
The vrtility
package provides
vrt_derived_block()
for efficient creation of these derived
bands. This function uses R’s intuitive formula syntax to define band
calculations, which are then automatically translated to muparser
expressions and executed using VRT pixel functions. This approach
provides memory-efficient computation without requiring intermediate
file storage.
In this example, we’ll calculate several useful indices and visualize the results:
exptest <- vrt_derived_block(
eopf_s2_median,
ndvi ~ (B08_10m - B04_10m) / (B08_10m + B04_10m),
ndti ~ (B04_10m - B03_10m) / (B04_10m + B03_10m),
evi ~ 2.5 * (B08_10m - B04_10m) / (B08_10m + 6 * B04_10m - 7.5 * B02_10m + 1),
vgnirbi ~ (B03_10m - B08_10m) / (B03_10m + B08_10m)
)
print(exptest, pixfun = TRUE)
#> → <VRT Block>
#> VRT XML: [hidden]
#> run print(x, xml = TRUE) to view
#> Pixel Function:
#> ndvi ~ ((B08_10m * 0.0001) - (B04_10m * 0.0001))/((B08_10m * 0.0001) + (B04_10m * 0.0001))
#> ndti ~ ((B04_10m * 0.0001) - (B03_10m * 0.0001))/((B04_10m * 0.0001) + (B03_10m * 0.0001))
#> evi ~ 2.5 * ((B08_10m * 0.0001) - (B04_10m * 0.0001))/((B08_10m * 0.0001) + 6 * (B04_10m * 0.0001) - 7.5 * (B02_10m * 0.0001) + 1)
#> vgnirbi ~ ((B03_10m * 0.0001) - (B08_10m * 0.0001))/((B03_10m * 0.0001) + (B08_10m * 0.0001))
#>
#>
#> VRT SRS:
#> PROJCS["unknown",GEOGCS["unknown",DATUM["Unknown based on WGS 84 ellipsoid",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",55.67],PARAMETER["longitude_of_center",12.56],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
#> Bounding Box: -18980 -22270 18980 22310
#> Pixel res: 10, 10
#> Assets: ndvi, ndti, evi, vgnirbi
#> No Data Value(s): 0, 0, 0, 0
#> Date Time: 2025-06-21 22:21:41 UTC
x <- vrt_compute(
exptest,
outfile = fs::file_temp(ext = ".tif")
)
purrr::walk(
1:4,
~ plot_raster_src(
x,
band = .x,
minmax_def = if (.x == 3) c(-1, 1) else NULL,
pal = grDevices::hcl.colors(10, "Inferno"),
legend = FALSE,
axes = FALSE
)
)