Skip to contents

Introduction

The Harmonized Landsat Sentinel-2 (HLS) project is fantastic dataset offering global, harmonized surface reflectance data from Landsat 8 and Sentinel-2 satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The data has a high quality cloud/shadow bitmask layer, enabling the creation of excellent cloud-free composites. Further information on HLS can be found at the following links:

Combining the Harmonized landsat and sentinel collections does require some additional work because the two collections have different numbers of bands with different names.

This Vignette outlines a workflow to combine the two collections into a single median composite image that maintains all available bands.

You will need to have an Earthdata account to access the HLS data. You can create an account at https://urs.earthdata.nasa.gov/users/new.

Workflow setup

First we load the vrtility package and set up our multiprocessing daemons using the mirai package. Using mirai daemons allows for the automated speed up of several of the vrtility processing steps.

Next we define our area of interest, we do this using the gdalraster package but you can as easily simply provide a numeric vector (length 4) with lon/lat coordinates ordered by xmin, ymin, xmax, ymax.

When working with any raster data it is often more convenient to use a projected coordinate system. The bbox_to_projected function provides a convenient way to reproject a bounding box, either with a specific spatial reference system (SRS) or using a default SRS appropriate for the area of interest. The default SRS is an equal area projection centered around the centroid of the bounding box.

library(vrtility)

mirai::daemons(10)
#> Error in mirai::daemons(10): daemons already set for `default` compute profile

bbox <- gdalraster::bbox_from_wkt(
  wkt = wk::wkt("POINT (144.3 -7.6)"),
  extend_x = 0.17,
  extend_y = 0.125
)

te <- bbox_to_projected(bbox)
trs <- attr(te, "wkt")

Now we need to find the data. we need to query both the HLS Landsat and Sentinel-2 collections. The hls_stac_query function provides a convenient way to do this. First we will query the HLS Landsat collection. Here we set a maximum cloud cover of 35% and a date range of 2023-01-01 to 2023-12-31. we can see that of a total of 70 images, only 6 had less than 35% cloud cover.

hlssl_stac <- hls_stac_query(
  bbox = bbox,
  start_date = "2023-01-01",
  end_date = "2023-12-31",
  max_cloud_cover = 35,
  collection = "HLSL30_2.0"
)
print(hlssl_stac)
#> ###Items
#> - matched feature(s): 70
#> - features (6 item(s) / 64 not fetched):
#>   - HLS.L30.T55MBM.2023017T003226.v2.0
#>   - HLS.L30.T54MZS.2023017T003226.v2.0
#>   - HLS.L30.T55MBM.2023065T003210.v2.0
#>   - HLS.L30.T54MZS.2023065T003210.v2.0
#>   - HLS.L30.T55MBM.2023337T003219.v2.0
#>   - HLS.L30.T54MZS.2023337T003219.v2.0
#> - assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11, Fmask
#> - item's fields: 
#> assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

And now we will query the HLS Sentinel-2 collection using the same parameters. Again we can see that < 10% of images have less than 35% cloud cover.

hlsss_stac <- hls_stac_query(
  bbox = bbox,
  start_date = "2023-01-01",
  end_date = "2023-12-31",
  max_cloud_cover = 35,
  collection = "HLSS30_2.0"
)
print(hlsss_stac)
#> ###Items
#> - matched feature(s): 98
#> - features (8 item(s) / 90 not fetched):
#>   - HLS.S30.T54MZS.2023029T004701.v2.0
#>   - HLS.S30.T55MBM.2023074T004709.v2.0
#>   - HLS.S30.T55MBM.2023079T004701.v2.0
#>   - HLS.S30.T55MBM.2023109T004701.v2.0
#>   - HLS.S30.T54MZS.2023109T004701.v2.0
#>   - HLS.S30.T55MBM.2023319T004701.v2.0
#>   - HLS.S30.T55MBM.2023364T004709.v2.0
#>   - HLS.S30.T54MZS.2023364T004709.v2.0
#> - assets: 
#> B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, Fmask
#> - item's fields: 
#> assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

In order to download the data we need to set up our Earthdata credentials. The simplest way to do this is with the earthdatalogin package. But, because we are using asynchronous mirai daemons, we need to set the credentials for all daemons.

mirai::everywhere(earthdatalogin::edl_netrc(
  username = Sys.getenv("EARTHDATA_USER"),
  password = Sys.getenv("EARTHDATA_PASSWORD")
))

Now we can begin forming the VRT pipeline. Here we “collect” all of the assets from the STAC collections. This is essentially just a list of virtual rasters. This can take a little time due to VRT validation - however, GDAL caching makes re-accessing these remote files faster, for the subsequent parts of the workflow.

hls_sl_col <- vrt_collect(hlssl_stac)
print(hls_sl_col)
#> → <VRT Collection>
#> 
#>  VRT SRS: 
#> PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#> 
#>  PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-17 00:32:26 UTC
#> End Date: 2023-12-03 00:32:19 UTC
#> Number of Items: 6
#> Assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11, Fmask

hls_ss_col <- vrt_collect(hlsss_stac)
print(hls_ss_col)
#> → <VRT Collection>
#> 
#>  VRT SRS: 
#> PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#> 
#>  PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-29 00:49:07 UTC
#> End Date: 2023-12-30 00:49:10 UTC
#> Number of Items: 8
#> Assets: B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B10, B11, B12, Fmask

The print method for vrt_collection objects gives us a high level overview of the imagery we will download. Our collections contain images with two different SRS. We can also see that the number of bands differs across the two collections.

Now we set the mask function required for the HLS data. The “Fmask” band is a true bitmask (unlike other datasets which may use ineger masks). Therefore we must use the build_bitmask function to set the mask and can specify the bits that we wish to set as nodata across all bands.

hls_sl_col_mask <- vrt_set_maskfun(
  hls_sl_col,
  mask_band = "Fmask",
  mask_values = c(0, 1, 2, 3),
  build_mask_pixfun = build_bitmask(),
  drop_mask_band = TRUE
)

hls_ss_col_mask <- vrt_set_maskfun(
  hls_ss_col,
  mask_band = "Fmask",
  mask_values = c(0, 1, 2, 3),
  build_mask_pixfun = build_bitmask(),
  drop_mask_band = TRUE
)

# let's check out a vrt and inspect the mask function that we used.
print(hls_sl_col_mask, maskfun = TRUE)
#> → <VRT Collection>
#> Mask Function:
#> import numpy as np
#> def build_bitmask(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
#>                   raster_ysize, buf_radius, gt, **kwargs):
#>     # Convert comma-separated bit positions to integers
#>     bit_positions = [int(x) for x in kwargs['mask_values'].decode().split(',')]
#>     
#>     # Initialize mask array with zeros
#>     mask = np.zeros_like(in_ar[0], dtype=bool)
#>     
#>     # Combine masks for each bit position using OR
#>     for bit in bit_positions:
#>         mask |= np.bitwise_and(in_ar[0], np.left_shift(1, bit)) > 0
#>     
#>     # Set output: 255 for valid pixels (mask True), 0 for invalid
#>     out_ar[:] = np.where(mask, 0, 1)
#> 
#> 
#>  VRT SRS: 
#> PROJCS["WGS 84 / UTM zone 55N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32655"]]
#> 
#>  PROJCS["WGS 84 / UTM zone 54N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",141],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32654"]]
#> Bounding Box: NA
#> Pixel res: 30, 30
#> Start Date: 2023-01-17 00:32:26 UTC
#> End Date: 2023-12-03 00:32:19 UTC
#> Number of Items: 6
#> Assets: B01, B02, B03, B04, B05, B06, B07, B09, B10, B11

Now we need to align these data so that we can composite all the imagery in one go. We can do this by simply adding nodata bands where required. In the case of the HLS Landsat collection, we also need to move the position of the cirrus band to match that of the HLS Sentinel-2 collection.

Similarly, with HLS Sentinel-2, we need to add the thermal bands that are absent, but critically, note that we need to specify the scale value for these bands. If this isn’t provided then the scale from the first band is used (which in this case is 0.0001) or if there is no scale, it is ignored. Automating this is tricky but if you miss an appropriate scale, you will see warnings when you then use vrt_stack.

hls_ls_align <- vrt_set_band_names(
  hls_sl_col_mask,
  c("A", "B", "G", "R", "N2", "S1", "S2", "C", "T1", "T2")
) |>
  vrt_add_empty_band(after = 4, description = "RE1") |>
  vrt_add_empty_band(after = 5, description = "RE2") |>
  vrt_add_empty_band(after = 6, description = "RE3") |>
  vrt_add_empty_band(after = 7, description = "N") |>
  vrt_add_empty_band(after = 9, description = "WV") |>
  vrt_move_band(band_idx = 13, after = 10)



hls_ss_align <- vrt_set_band_names(
  hls_ss_col_mask,
  c("A", "B", "G", "R", "RE1", "RE2", "RE3", "N", "N2", "WV", "C", "S1", "S2")
) |>
  vrt_add_empty_band(after = 13, description = "T1", scale_value = 0.01) |>
  vrt_add_empty_band(after = 14, description = "T2", scale_value = 0.01)

Now we can combine the two collections into a single collection using the c method. This is a simple concatenation of the two collections.

hls_merge_coll <- c(
  hls_ls_align,
  hls_ss_align
)

Now we need to warp all vrt_blocks in the collection to the same spatial reference system, extent and pixel size. This is particularly important in this case because of the mutliple SRS in the collection. Also to make use of the fastest vrt_compute engine, we need to ensure that all blocks are in the same SRS with equal dimensions.

hls_warp <- vrt_warp(
  hls_merge_coll,
  t_srs = trs,
  te = te,
  tr = c(30, 30),
  resampling = "bilinear"
)

Now we can stack the warped vrt collection - this can be thought of as a virtual raster cube. Then, we can set a pixel function which defines how the mutliple images for each band are summarised. Here we use the median_numpy pixel function to compute the cell-level median.

hls_stack <- vrt_stack(hls_warp) |>
  vrt_set_pixelfun(pixfun = median_numpy())

Finally we compute all of this using vrt_compute. Note that we specify the gdalraster engine. This is the fastest engine for computing large VRTs As it parallises downloads and processing across bands and also across raster rows using the nsplits argument. At the start of this workflow we set 10 mirai daemons. Because these are already set, they will be used as “outer” daemons processing across bands - each of these daemons will then spawn 2 “inner” daemons to process the rows of each band. This means that we will have 20 parallel processes running at once.

outfile <- tempfile(fileext = ".tif")
hls_composite <- vrt_compute(hls_stack, outfile, engine = "gdalraster", nsplits = 2)
#> ! Active mirai daemons have been detected, but fewer than the number of bands.
#> ℹ No changes were made to this mirai configuartion but this could result in performance issues

Now let’s take a look at a false-colour composite!

plot_raster_src(
  hls_composite,
  bands = c(9, 3, 2),
  minmax_pct_cut = c(25, 97)
)

And all the other bands too:

par(mfrow = c(5, 3), mar = c(0, 0, 1, 0))
purrr::walk(1:15, function(i) {
  plot_raster_src(
    hls_composite,
    bands = i,
    pal = hcl.colors(256, "Rocket"),
    legend = FALSE,
    axes = FALSE
  )
})

Note.

This vignette is a work in progress and will be updated as the package matures. Principally, the major goal is to use the geometric median for compositing.