Introduction
The Harmonized Landsat Sentinel-2 (HLS) project is a fantastic dataset offering global, harmonized surface reflectance data from the Landsat 8 and Sentinel-2 satellites. The combined measurements enable global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The data include a high-quality cloud/shadow bitmask layer, enabling the creation of excellent cloud-free composites. Further information on HLS can be found on the NASA HLS project pages.
Combining the Harmonized Landsat and Sentinel-2 collections does require some additional work because the two collections have different numbers of bands with different names.
This vignette outlines a workflow for combining the two collections into a single median composite image that retains all available bands.
You will need to have an Earthdata account to access the HLS data. You can create an account at https://urs.earthdata.nasa.gov/users/new.
Workflow setup
First we load the vrtility package and set up our multiprocessing daemons using the mirai package. Using mirai daemons allows several of the vrtility processing steps to be sped up automatically.
Next we define our area of interest. Here we use the gdalraster package, but you can just as easily provide a numeric vector (length 4) of lon/lat coordinates ordered xmin, ymin, xmax, ymax.
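For reference, a plain numeric bounding box equivalent to the gdalraster-derived extent used below (the point 144.3, -7.6 extended by 0.17 and 0.125 degrees) would look like this; bbox_alt is simply an illustrative name:
# An illustrative lon/lat bounding box as a plain numeric vector
# (xmin, ymin, xmax, ymax), matching the extent constructed below.
bbox_alt <- c(144.13, -7.725, 144.47, -7.475)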
When working with any raster data it is often more convenient to use
a projected coordinate system. The bbox_to_projected
function provides a convenient way to reproject a bounding box, either
with a specific spatial reference system (SRS) or using a default SRS
appropriate for the area of interest. The default SRS is an equal area
projection centered around the centroid of the bounding box.
library(vrtility)
mirai::daemons(10)
bbox <- gdalraster::bbox_from_wkt(
wkt = wk::wkt("POINT (144.3 -7.6)"),
extend_x = 0.17,
extend_y = 0.125
)
te <- bbox_to_projected(bbox)
trs <- attr(te, "wkt")
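If you want a quick, optional sanity check of the reprojection, you could print the projected extent and the start of the SRS definition:
# Optional: inspect the projected extent and the beginning of the SRS WKT
# returned by bbox_to_projected().
print(te)
cat(substr(trs, 1, 80), "...\n")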
Now we need to find the data. We need to query both the HLS Landsat and Sentinel-2 collections; the hls_stac_query function provides a convenient way to do this. First we will query the HLS Landsat collection. Here we set a maximum cloud cover of 35% and a date range of 2023-01-01 to 2023-12-31. We can see that, of a total of 70 images, only 6 had less than 35% cloud cover.
hlssl_stac <- hls_stac_query(
bbox = bbox,
start_date = "2023-01-01",
end_date = "2023-12-31",
max_cloud_cover = 35,
collection = "HLSL30_2.0"
)
print(hlssl_stac)
#> ###Items
#> - matched feature(s): 70
#> - features (6 item(s) / 64 not fetched):
#> - HLS.L30.T55MBM.2023017T003226.v2.0
#> - HLS.L30.T54MZS.2023017T003226.v2.0
#> - HLS.L30.T55MBM.2023065T003210.v2.0
#> - HLS.L30.T54MZS.2023065T003210.v2.0
#> - HLS.L30.T55MBM.2023337T003219.v2.0
#> - HLS.L30.T54MZS.2023337T003219.v2.0
#> - assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11, Fmask
#> - item's fields:
#> assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
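The print output above suggests that the query returns an rstac items object; assuming that structure, the per-item cloud cover of the fetched features could be inspected like this (the "eo:cloud_cover" property name is an assumption based on the STAC eo extension):
# Assuming an rstac-style items object: pull "eo:cloud_cover" from the
# properties of each fetched feature.
vapply(
  hlssl_stac$features,
  function(f) f$properties[["eo:cloud_cover"]],
  numeric(1)
)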
And now we will query the HLS Sentinel-2 collection using the same parameters. Again, fewer than 10% of the images (8 of 98) have less than 35% cloud cover.
hlsss_stac <- hls_stac_query(
bbox = bbox,
start_date = "2023-01-01",
end_date = "2023-12-31",
max_cloud_cover = 35,
collection = "HLSS30_2.0"
)
print(hlsss_stac)
#> ###Items
#> - matched feature(s): 98
#> - features (8 item(s) / 90 not fetched):
#> - HLS.S30.T54MZS.2023029T004701.v2.0
#> - HLS.S30.T55MBM.2023074T004709.v2.0
#> - HLS.S30.T55MBM.2023079T004701.v2.0
#> - HLS.S30.T55MBM.2023109T004701.v2.0
#> - HLS.S30.T54MZS.2023109T004701.v2.0
#> - HLS.S30.T55MBM.2023319T004701.v2.0
#> - HLS.S30.T55MBM.2023364T004709.v2.0
#> - HLS.S30.T54MZS.2023364T004709.v2.0
#> - assets:
#> B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, Fmask
#> - item's fields:
#> assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
In order to download the data we need to set up our Earthdata credentials. The simplest way to do this is with the earthdatalogin package. However, because we are using asynchronous mirai daemons, we need to set the credentials on all daemons, which we can do with mirai::everywhere().
mirai::everywhere(earthdatalogin::edl_netrc(
username = Sys.getenv("EARTHDATA_USER"),
password = Sys.getenv("EARTHDATA_PASSWORD")
))
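This assumes that the EARTHDATA_USER and EARTHDATA_PASSWORD environment variables are already set. If they are not, you could set them for the current session (or, better, in your .Renviron) before running the above; the values here are placeholders:
# Set Earthdata credentials for the current session (placeholder values).
# Sys.setenv(
#   EARTHDATA_USER = "your-username",
#   EARTHDATA_PASSWORD = "your-password"
# )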
Now we can begin forming the VRT pipeline. Here we “collect” all of the assets from the STAC collections; the result is essentially a list of virtual rasters. This can take a little time due to VRT validation. However, GDAL caching makes re-accessing these remote files faster in the subsequent parts of the workflow.
hls_sl_col <- vrt_collect(hlssl_stac)
print(hls_sl_col)
#> → <VRT Collection>
#>
#> VRT SRS:
#> PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#>
#> PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-17 00:32:26 UTC
#> End Date: 2023-12-03 00:32:19 UTC
#> Number of Items: 6
#> Assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11, Fmask
hls_ss_col <- vrt_collect(hlsss_stac)
print(hls_ss_col)
#> → <VRT Collection>
#>
#> VRT SRS:
#> PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#>
#> PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-29 00:49:07 UTC
#> End Date: 2023-12-30 00:49:10 UTC
#> Number of Items: 8
#> Assets: B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B10, B11, B12, Fmask
The print method for vrt_collection objects gives us a high-level overview of the imagery we will download. Our collections contain images with two different SRSs. We can also see that the number of bands differs between the two collections.
Now we set the mask function required for the HLS data. The “Fmask” band is a true bitmask (unlike other datasets, which may use integer masks). Therefore we must use the build_bitmask function to set the mask, specifying the bits that we wish to set as nodata across all bands.
hls_sl_col_mask <- vrt_set_maskfun(
hls_sl_col,
mask_band = "Fmask",
mask_values = c(0, 1, 2, 3),
build_mask_pixfun = build_bitmask(),
drop_mask_band = TRUE
)
hls_ss_col_mask <- vrt_set_maskfun(
hls_ss_col,
mask_band = "Fmask",
mask_values = c(0, 1, 2, 3),
build_mask_pixfun = build_bitmask(),
drop_mask_band = TRUE
)
# let's check out a vrt and inspect the mask function that we used.
print(hls_sl_col_mask, maskfun = TRUE)
#> → <VRT Collection>
#> Mask Function:
#> import numpy as np
#> def build_bitmask(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
#> raster_ysize, buf_radius, gt, **kwargs):
#> # Convert comma-separated bit positions to integers
#> bit_positions = [int(x) for x in kwargs['mask_values'].decode().split(',')]
#>
#> # Initialize mask array with zeros
#> mask = np.zeros_like(in_ar[0], dtype=bool)
#>
#> # Combine masks for each bit position using OR
#> for bit in bit_positions:
#> mask |= np.bitwise_and(in_ar[0], np.left_shift(1, bit)) > 0
#>
#> # Set output: 255 for valid pixels (mask True), 0 for invalid
#> out_ar[:] = np.where(mask, 0, 1)
#>
#>
#> VRT SRS:
#> PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#>
#> PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-17 00:32:26 UTC
#> End Date: 2023-12-03 00:32:19 UTC
#> Number of Items: 6
#> Assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11
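The printed mask function shows the bitwise logic that will be applied. As an aside, in the HLS v2.0 user guide bits 0-3 of Fmask correspond to cirrus, cloud, adjacent to cloud/shadow and cloud shadow; the same decoding can be sketched in base R for a single (made-up) value:
# Decode which of bits 0-3 are set in an example Fmask value (66 is
# purely illustrative); this mirrors the logic of build_bitmask().
fmask_value <- 66L
vapply(
  0:3,
  function(bit) bitwAnd(fmask_value, bitwShiftL(1L, bit)) > 0,
  logical(1)
)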
Now we need to align these data so that we can composite all the imagery in one go. We can do this by simply adding empty (nodata) bands where required. In the case of the HLS Landsat collection, we also need to move the cirrus band so that its position matches that of the HLS Sentinel-2 collection.
Similarly, with HLS Sentinel-2, we need to add the thermal bands that are absent. Critically, note that we need to specify the scale value for these bands: if it isn't provided, the scale from the first band is used (which in this case is 0.0001) or, if there is no scale, it is ignored. Automating this is tricky, but if you miss an appropriate scale you will see warnings when you later call vrt_stack.
hls_ls_align <- vrt_set_band_names(
hls_sl_col_mask,
c("A", "B", "G", "R", "N2", "S1", "S2", "C", "T1", "T2")
) |>
vrt_add_empty_band(after = 4, description = "RE1") |>
vrt_add_empty_band(after = 5, description = "RE2") |>
vrt_add_empty_band(after = 6, description = "RE3") |>
vrt_add_empty_band(after = 7, description = "N") |>
vrt_add_empty_band(after = 9, description = "WV") |>
vrt_move_band(band_idx = 13, after = 10)
hls_ss_align <- vrt_set_band_names(
hls_ss_col_mask,
c("A", "B", "G", "R", "RE1", "RE2", "RE3", "N", "N2", "WV", "C", "S1", "S2")
) |>
vrt_add_empty_band(after = 13, description = "T1", scale_value = 0.01) |>
vrt_add_empty_band(after = 14, description = "T2", scale_value = 0.01)
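Before combining the collections it is worth confirming that both now carry the same 15 bands in the same order; printing each aligned collection, as we did earlier, is an easy way to do this:
# Both collections should now report the same band names in the same order.
print(hls_ls_align)
print(hls_ss_align)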
Now we can combine the two collections into a single collection using
the c
method. This is a simple concatenation of the two
collections.
hls_merge_coll <- c(
hls_ls_align,
hls_ss_align
)
Now we need to warp all vrt_blocks in the collection to the same spatial reference system, extent and pixel size. This is particularly important in this case because of the multiple SRSs in the collection. Also, to make use of the fastest vrt_compute engine, we need to ensure that all blocks are in the same SRS with equal dimensions.
hls_warp <- vrt_warp(
hls_merge_coll,
t_srs = trs,
te = te,
tr = c(30, 30),
resampling = "bilinear"
)
Now we can stack the warped vrt collection - this can be thought of as a virtual raster cube. Then we can set a pixel function, which defines how the multiple images for each band are summarised. Here we use the median_numpy pixel function to compute the cell-level median.
hls_stack <- vrt_stack(hls_warp) |>
vrt_set_pixelfun(pixfun = median_numpy())
Finally, we compute all of this using vrt_compute. Note that we specify the gdalraster engine. This is the fastest engine for computing large VRTs, as it parallelises downloads and processing across bands and also across raster rows using the nsplits argument. At the start of this workflow we set 10 mirai daemons. Because these are already set, they will be used as “outer” daemons processing across bands - each of these daemons will then spawn 2 “inner” daemons to process the rows of each band. This means that we will have 20 parallel processes running at once.
outfile <- tempfile(fileext = ".tif")
hls_composite <- vrt_compute(hls_stack, outfile, engine = "gdalraster", nsplits = 2)
#> ! Active mirai daemons have been detected, but fewer than the number of bands.
#> ℹ No changes were made to this mirai configuartion but this could result in performance issues
Now let’s take a look at a false-colour composite!
plot_raster_src(
hls_composite,
bands = c(9, 3, 2),
minmax_pct_cut = c(25, 97)
)
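A true-colour version can be plotted in the same way, simply swapping in the red, green and blue bands (4, 3 and 2 in the band order we set earlier):
# True-colour composite: R, G and B are bands 4, 3 and 2 after alignment.
plot_raster_src(
  hls_composite,
  bands = c(4, 3, 2),
  minmax_pct_cut = c(25, 97)
)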
And all the other bands too:
par(mfrow = c(5, 3), mar = c(0, 0, 1, 0))
purrr::walk(1:15, function(i) {
plot_raster_src(
hls_composite,
bands = i,
pal = hcl.colors(256, "Rocket"),
legend = FALSE,
axes = FALSE
)
})