Skip to contents

This vignette aims to provide a “cookbook” walking through common use cases and code patterns for rsi. If you’ve got a problem that it seems like rsi should be able to solve, hopefully this document can help – otherwise, open an issue and we’ll see if we can get you started! And if you’ve got a use case that took you a second to figure out, please feel free to open a PR to add it as an example to this document.

With that introduction out of the way, we’ll go ahead and load rsi:

And start answering: How can I…

Get one composite per year, month, or other interval?

If you’re looking to get separate files for several intervals, you’ll need to call get_stac_data() separately for each of those intervals. The easiest way to do this is through something like vapply() or a for-loop. Iterate along your intervals of interest, construct your start_date and end_date inside of each iteration, and call get_stac_data() using those dates as arguments:

aoi <- sf::st_point(c(-74.912131, 44.080410))
aoi <- sf::st_set_crs(sf::st_sfc(aoi), 4326)
aoi <- sf::st_buffer(sf::st_transform(aoi, 5070), 1000)

downloaded_years <- vapply(
  2018:2020,
  function(year) {
    get_stac_data(
      aoi = aoi,
      start_date = glue::glue("{year}-01-01"),
      end_date = glue::glue("{year}-12-31"),
      asset_names = "lcpri",
      stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1",
      collection = "usgs-lcmap-conus-v13",
      output_filename = tempfile(fileext = ".tif")
    )
  },
  character(1)
)

downloaded_years
#> [1] "/tmp/RtmpwkKpr9/file28725a50761.tif" 
#> [2] "/tmp/RtmpwkKpr9/file287250d4cd3f.tif"
#> [3] "/tmp/RtmpwkKpr9/file287257b6f008.tif"

This ensures that get_stac_data() is run using the same arguments each time, so your outputs should be standardized across each interval!

invisible(lapply(lapply(downloaded_years, terra::rast), terra::plot))

Filter the imagery I download by cloud cover (or other metadata)?

If you want to refine the outputs from your STAC query, the best approach is to write a custom query function using CQL2. CQL2 is a complicated topic, which is covered in a bit more detail both in the Downloading Data vignette as well as in the STAC website’s tutorials, but at its core is a query language that lets us filter our results down using a spatiotemporal area of interest as well as other item-level metadata.

This requires knowing what metadata your STAC API provides for the items you’re querying! Luckily, a good number of these fields are standardized via the STAC standard and various STAC extensions. For instance, items implementing the electro-optical extension will have an eo:cloud_cover field which we could use to filter the results of our query.

Writing a query function to filter using this field might look like this:

aoi <- sf::st_point(c(-74.912131, 44.080410))
aoi <- sf::st_set_crs(sf::st_sfc(aoi), 4326)
aoi <- sf::st_buffer(sf::st_transform(aoi, 5070), 1000)

custom_query_function <- function(bbox, 
                                  stac_source, 
                                  collection, 
                                  start_date, 
                                  end_date, 
                                  limit, 
                                  ...) {
  geometry <- rstac::cql2_bbox_as_geojson(bbox) 
  datetime <- rstac::cql2_interval(start_date, end_date)
  
  request <- rstac::ext_filter(
    rstac::stac(stac_source),
    collection == {{collection}} &&
      t_intersects(datetime, {{datetime}}) &&
      s_intersects(geometry, {{geometry}}) && 
      `eo:cloud_cover` < 50
  )
  rstac::items_fetch(rstac::post_request(request))
}

And we can use this to filter down how much data we’ll download! For instance, we could download Landsat imagery for our area of interest without using our new query function and setting composite_function to NULL, so that we’ll get one file per Landsat image:

unfiltered_landsat_images <- get_landsat_imagery(
  aoi,
  start_date = "2023-06-01",
  end_date = "2023-09-01",
  # Only downloading one asset, because we aren't using this data for anything
  asset_names = landsat_band_mapping$planetary_computer_v1["red"],
  mask_function = NULL,
  mask_band = NULL,
  output_filename = tempfile(fileext = ".tif"),
  composite_function = NULL
)
length(unfiltered_landsat_images)
#> [1] 12

And we could compare that to a query using our custom query function, to see how many images are filtered out by our CQL2 query:

filtered_landsat_images <- get_landsat_imagery(
  aoi,
  start_date = "2023-06-01",
  end_date = "2023-09-01",
  query_function = custom_query_function,
  asset_names = landsat_band_mapping$planetary_computer_v1["red"],
  mask_function = NULL,
  mask_band = NULL,
  output_filename = tempfile(fileext = ".tif"),
  composite_function = NULL
)
length(filtered_landsat_images)
#> [1] 7

We only wind up downloading about half the number of images!