Introduction

If you are reading this vignette, you most probably want to contribute to the mapme.biodiversity package. This is great news and we are very happy to receive pull requests extending the package’s functionality! Below you will find important in-depth information about how to add resources and indicators to make the process as seamless as possible for both you and the package’s maintainers. Please make sure to read and understand this guide before opening a PR. If in doubt, especially if you feel that the framework does not support your use case, always feel free to raise an issue and we will happily discuss how we can support your ideas. If you have not already done so, make sure to read the Terminology vignette to get familiar with the most important concepts of this package.

Note that we use the tidyverse style guide for the package. That specifically means that function and variable names should follow the snake case pattern. We also use the arrow assignment operator (<-). When submitting a PR that does not consistently follow the tidyverse style guide, the maintainers of the package might change the code to adhere to this code style without further notice before accepting the PR.

Getting started

Ideally, you clone the GitHub repository with the git command on the command line on Linux and macOS, or with the GitHub Desktop application on Windows. On Linux, the command looks like this:

git clone https://github.com/mapme-initiative/mapme.biodiversity

We do not accept pushes to main, so the first step is to create a dedicated branch for your extension. In this tutorial, we will pretend to re-implement the nasa_srtm resource and the associated elevation indicator, so we will create a branch reflecting this. Don’t forget to check out the newly created branch!

git branch add-elevation
git checkout add-elevation

Below, we will assume that you develop your extension to the package in RStudio. The general guidelines also apply if you choose different tooling for your development process, but other tooling is not covered in this vignette. We assume that all R development dependencies are installed. The easiest way to ensure this is using devtools:

devtools::install_dev_deps()

Adding a resource

Overview of adding a resource

A resource is a supported dataset that, from a user’s perspective, can be made available by passing one or more resource functions to get_resources(). Currently, the package supports only raster and vector resources. If you wish to add support for a new resource, please be aware that we will only accept new resources if they are associated with at least one indicator calculation. The very first step to adding a resource is to create a new file that will hold the required code.
Once you have checked out the new branch and opened the project in RStudio, adapt the following command to open a new resource file:

file.edit("R/get_<your-new-resource>.R")
# e.g. file.edit("R/get_soilgrids.R")

Documenting the new resource

In the first part of a resource function, make sure to include detailed documentation. This documentation should explain what the resource represents, where it comes from (including a citation), and the user-facing arguments that should be specified during runtime.

Importantly, this documentation MUST receive the roxygen tag @keywords resource, so that the documentation will be identified as a resource. Also, add the bare name of the resource as the @name tag (e.g. in the case of our example that translates to @name nasa_srtm).

#' Short title
#'
#' One or more description paragraphs might follow here. Please describe
#' the spatio-temporal structure of your resource here briefly.
#'
#' @name <the short name of your resource>
#' @param <any user-facing arguments>
#' @keywords resource <identifies the documentation as a resource>
#' @references <ideally a citable scientific publication>
#' @source <a link in the \url{} tag linking to an online documentation>
#' @returns A function that makes a resource available for a portfolio
#' @include register.R
#' @export

The last two tags are important to add as well. The include statement is mandatory for the register functionality (more on that below) to be loaded before your resource function. The export tag is important so that the resource is actually exposed to the users of the package.

Constructing a resource function - Outer level

Resource functions are constructed as closures, i.e. functions that return a function. The outer level exposes arguments that users set to control the behavior of the function. It is important to check the user input in this outer level so that warnings or errors about any misspecification are raised immediately.

For nasa_srtm, this outer level does not look very exciting because there are no user-facing arguments to be checked (we will see how to check user-facing arguments when constructing the indicator below):

get_nasa_srtm <- function() {
  # .... inner function level
}

Note that there are some exported helper functions for recurring argument checks that you are free to use (e.g. check_available_years() in case you ask the user for a time frame). The arguments defined in the outer level of the resource function are then ready to be used in the inner level, which we will look at next.
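As a rough illustration of the closure pattern with an argument check in the outer level, consider the following minimal sketch. The resource name get_example_resource, the years argument and the range of available years are all made up for this example and are not part of the package. The inner function is also simplified and does not follow the mandatory signature described in the next section.

```r
# Hypothetical resource with one user-facing argument; not part of the package.
get_example_resource <- function(years = 2000:2005) {
  # Assumed availability range, chosen only for illustration.
  available_years <- 2000:2020
  years <- as.integer(years)
  unsupported <- setdiff(years, available_years)
  if (length(unsupported) > 0) {
    # Fail in the outer level, before any inner function is returned.
    stop(
      "The following years are not available: ",
      paste(unsupported, collapse = ", ")
    )
  }

  # The inner function can use `years` because the closure captures it.
  function(x) {
    sprintf("example_%d.tif", years)
  }
}

fetch <- get_example_resource(years = 2001:2002)
fetch(x = NULL)
```

Because the check runs when get_example_resource() is called, a user who requests an unsupported year sees the error immediately rather than during a later download.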

Constructing a resource function - Inner level

The inner level of a resource function has a mandatory function signature that will be checked during run-time. Your function is required to exactly specify the below signature. For the nasa_srtm resource, this looks like this:

function(x,
         name = "nasa_srtm",
         type = "raster",
         outdir = mapme_options()[["outdir"]],
         verbose = mapme_options()[["verbose"]],
         testing = mapme_options()[["testing"]]) {
  # ... function body
}

The x argument here represents the portfolio object handed over by the user when calling get_resources(). It is an sf object and can thus be used to derive the spatial extent of the portfolio. Next come the name and the type of the resource, which are required for the backend to correctly handle the output and log the resource once it has been made available.

The other arguments should default to their respective values in mapme_options(). They represent a character vector for the output directory, a logical controlling verbosity, and another logical indicating whether the code is currently executed in testing mode. We will look at how these things come together as we peek into the actual body of a resource function.
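To get a feeling for what an exact signature means, the following sketch compares the argument names of a function against the names used by nasa_srtm above. The helper has_resource_signature() is hypothetical and is not how the package performs its own check. The defaults that would normally come from mapme_options() are replaced by literal values so that the example runs on its own.

```r
# Argument names taken from the nasa_srtm signature shown above.
required_args <- c("x", "name", "type", "outdir", "verbose", "testing")

# Hypothetical helper, not the package's own run-time check.
has_resource_signature <- function(fun) {
  identical(names(formals(fun)), required_args)
}

# Literal defaults stand in for the mapme_options() values.
inner <- function(x,
                  name = "nasa_srtm",
                  type = "raster",
                  outdir = tempdir(),
                  verbose = TRUE,
                  testing = FALSE) {
  NULL
}

has_resource_signature(inner)
has_resource_signature(function(x, name) NULL)
```

The first call returns TRUE, while the second returns FALSE because arguments are missing.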

Constructing a resource function - Body

The expected output of a resource function is a character vector of file paths to either raster or vector sources that represent GDAL-readable spatial data sets on the local file system. If you need to download or write intermediate files, you can create them in the directory returned by tempdir().

The output files should match the spatial extent of the portfolio and respect the arguments specified by the user. Output files should be constructed based on the outdir argument. The function should not re-download already existing files in the output directory. For flat files you can use the download_or_skip() helper function to make sure that existing files are not re-downloaded.

Use the verbose argument to decide if informative messages should be printed, e.g. to inform users about download progress. Errors or warnings should be emitted in either case.

Please include a check that returns example filenames early if testing = TRUE, without actually downloading anything. This is mandatory for automated test checks of the package on CI platforms where it is not possible to conduct lengthy downloads.

If there is no intersection between the x object and your resource, make sure to return NA as early as possible.
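The requirements above can be sketched as a simplified control flow. The helper .fetch_example() and the example.com URLs are hypothetical. An empty urls vector stands in for the case where the portfolio does not intersect the resource, and the actual download step is only indicated by a comment.

```r
# Hypothetical body helper; `urls` stands in for the remote files that a
# real resource would select based on the spatial extent of x.
.fetch_example <- function(urls, outdir, verbose = TRUE, testing = FALSE) {
  # No matching files stands in for "no intersection" in this sketch.
  if (length(urls) == 0) {
    return(NA)
  }

  # Output paths are constructed from the outdir argument.
  filenames <- file.path(outdir, basename(urls))

  # In testing mode, return example file names without downloading.
  if (testing) {
    return(basename(filenames))
  }

  # Only files missing from the output directory would be downloaded.
  to_download <- !file.exists(filenames)
  if (verbose) {
    message(sum(to_download), " of ", length(filenames), " files need downloading.")
  }
  # A real resource would download urls[to_download] to filenames[to_download] here.

  filenames
}

outdir <- tempfile("resource_")
dir.create(outdir)
file.create(file.path(outdir, "tile_a.tif"))
urls <- c("https://example.com/tile_a.tif", "https://example.com/tile_b.tif")

.fetch_example(urls, outdir, testing = TRUE)
.fetch_example(character(0), outdir)
```

In a real resource function, the download_or_skip() helper mentioned above can take over the part that is only hinted at in the comment.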

Adding sample resource for package internal testing

We ask you to provide a small subset of your resource in inst/res/resource_name so that indicators that depend on the resource can be tested without the need to actually download the resource.

Because there are some restrictions on the final size of the package, we ask you to put substantial effort into reducing the size of the files to a minimum. This includes cropping all resource samples to the spatial extent of the polygon provided in inst/extdata/sierra_de_neiba_478140.gpkg, or to a polygon of similar size supplied by you in case that spatial extent does not intersect with your resource.

For raster resources, if the original raster is encoded as float, consider changing the data type to integer by introducing a scale factor. Also, please use a compression algorithm to further reduce the file size. For vector resources, consider reducing the number of vertices in case the geometries are very complex.
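The idea behind a scale factor can be shown with plain numbers. The values and the factor of 10000 below are made up for illustration. For an actual raster, you would apply the same arithmetic to the cell values before writing the file with an integer data type.

```r
# Made-up float values, e.g. a fraction between 0 and 1.
values <- c(0.1234, 0.5, 0.9876)

# Store the values as integers using an assumed scale factor of 10000 ...
scale_factor <- 10000
stored <- as.integer(round(values * scale_factor))

# ... and divide by the same factor when reading the data back.
restored <- stored / scale_factor

stored
restored
```

The scale factor determines how many decimal places survive the conversion, so it should be chosen according to the precision the indicator actually needs.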

Finally, put your processing script of the resource into data-raw to ensure reproducibility. Then, you are required to write a unit-test for your resource function, which should execute as much of your code as possible without actually conducting a download.

A note on dependencies for resources

Note that a resource SHALL NOT add additional dependencies to the package. If you add dependencies, we require you to add a supporting statement to your PR explaining why these dependencies are needed and why other approaches would fail. Before accepting your PR, we might request that you change your code to remove these dependencies if it is feasible to achieve the same functionality without them.

Adding an indicator

The process of adding an indicator is very similar to the one for resources. However, some input-output requirements are different. Note that if you added a new resource, we also expect your PR to contain a new indicator taking advantage of that resource.

As you will see, there are two new important concepts to have in mind when adding an indicator. These are the processing mode and computational engines. We will briefly explain these concepts below, however, you can also head over to the Terminology vignette if you are interested in a more comprehensive definition of these two terms.

Overview of adding a new indicator

An indicator is a logical routine depending on one or more resources that extracts numeric outputs for all assets in a portfolio. From a user’s perspective, indicators are processed via the calc_indicators() function. You as a developer will have to construct an indicator function as a closure, i.e. a function that returns another function. The outer level exposes user-facing arguments and checks that they are correctly specified, while the inner level is required to follow a specified signature and returns a tibble.

Once you have checked out the new branch and opened the project in RStudio, adapt the following command to open a new indicator file:

file.edit("R/calc_<your-new-indicator>.R")
# e.g. file.edit("R/calc_precipitation.R")

Documenting the new indicator

In the first part of an indicator function, make sure to include detailed documentation. This documentation should explain which resources are required to calculate the indicator, the user-facing arguments that should be specified during runtime and the structure of the output tibble. Importantly, this documentation MUST receive the roxygen tag @keywords indicator, so that the documentation will be identified as an indicator. Also, add the bare name of the indicator as the @name tag (e.g. @name elevation).

#' Short title
#'
#' One or more description paragraphs might follow here. Please describe
#' required resource and user arguments here.
#' Please document which processing engines are available for your indicator
#' and briefly describe how the indicator is derived from its inputs.
#'
#' @name <the short name of your indicator, same as in the backlog>
#' @param <any user-facing arguments>
#' @keywords indicator <identifies the documentation as an indicator>
#' @returns A function that calculates an indicator for a portfolio
#' @include register.R
#' @export

The last two tags are important to add as well. The include statement is mandatory for the register functionality (more on that below) to be loaded before your indicator function. The export tag is important so that the indicator is actually exposed to the users of the package.

Constructing an indicator function - Outer level

Indicator functions are constructed as closures, i.e. functions that return a function. The outer level exposes arguments that users set to control the behavior of the function. It is important to check the user input in this outer level so that warnings or errors about any misspecification are raised immediately.

For elevation, this outer level could look something like this:

calc_elevation <- function(engine = "extract",
                           stats = "mean") {
  engine <- check_engine(engine)
  stats <- check_stats(stats)

  # ... inner function level
}

There are some exported helper functions for recurring argument checks that you are free to use (e.g. check_engine()). Note that the arguments defined this way in the outer level of the indicator function are then ready to be used in the inner level, which we will look at next.

Constructing an indicator function - Inner level

The inner level of an indicator function has a mandatory function signature that will be checked during run-time. Your function is required to exactly specify the below signature. For the elevation indicator, this looks like this:

function(x,
         nasa_srtm = NULL,
         name = "elevation",
         mode = "asset",
         verbose = mapme_options()[["verbose"]]) {
  # ... function body
}

The x argument here represents the portfolio object handed over by the user when calling calc_indicators(), which is an sf object with geometries of type 'POLYGON'. Next come the name(s) of the required resource(s) and the name of the indicator. What follows is the processing mode, which must be one of "asset" or "portfolio".

We realized that for large (potentially global) portfolios, depending on the spatial resolution of a resource, different processing modes substantially impact the time needed for a computation. For high to medium resolution raster resources, processing on the asset level benefits computation time. However, spatially cropping coarse resolution datasets for a high number of assets introduces significant overhead, so processing these resources on the portfolio level is more efficient.

If neither of the two processing modes leads to satisfactory processing times for your indicator, please open an issue or leave a comment to discuss the addition of another processing mode with the maintainers of the package.

The argument verbose defaults to the corresponding package-wide option and should control the verbosity of your indicator function.

Constructing an indicator function - Body

The expected output of an indicator function is a tibble. Depending on the mode specified for processing, it is a single tibble for mode = "asset", or a list of tibbles with one element per row of x for mode = "portfolio".
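The difference between the two output shapes can be illustrated with a toy example. Plain data frames stand in for the sf portfolio and the output tibbles. The function names, the column names and the computation are invented for this sketch and do not reflect the package's internal output format.

```r
# Three made-up assets; a plain data.frame stands in for the sf portfolio.
x <- data.frame(asset_id = 1:3, area_ha = c(10, 25, 40))

# mode = "asset": the function sees a single asset and returns one table.
asset_fun <- function(x) {
  data.frame(variable = "area_doubled", value = x$area_ha * 2)
}

# mode = "portfolio": the function sees all assets at once and returns
# a list with one table per row of x.
portfolio_fun <- function(x) {
  lapply(seq_len(nrow(x)), function(i) asset_fun(x[i, ]))
}

single <- asset_fun(x[1, ])
many <- portfolio_fun(x)

# The list has one element per asset.
length(many) == nrow(x)
```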

You may use helper functions provided by the package for a common interface, for example select_engine() for vector-raster zonal statistics.

You are encouraged to write your own helper functions that are needed for your indicator processor. These should be located in the same file as the main processor, start with a dot, and should not be exported.

If you wish to include roxygen documentation for your helpers, make sure to add the @keywords internal and @noRd tags to your functions. If you feel that one or more of your helper functions would be of benefit to more than just one indicator, please comment in an issue or pull request to discuss with the package maintainers whether your helper function could be moved to R/utils.R.
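A documented, non-exported helper could look roughly like the following sketch. The function .elevation_range() is hypothetical and only serves to show the naming convention and the two roxygen tags.

```r
#' Range of elevation values (hypothetical helper)
#'
#' @param values A numeric vector of elevation values.
#' @returns A single numeric, the difference between maximum and minimum.
#' @keywords internal
#' @noRd
.elevation_range <- function(values) {
  # Ignore missing values, e.g. cells outside the raster footprint.
  values <- values[!is.na(values)]
  if (length(values) == 0) {
    return(NA_real_)
  }
  max(values) - min(values)
}

.elevation_range(c(120, 450, NA, 300))
```

The leading dot and the absence of an @export tag keep the helper out of the package's public interface.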

Use the verbose argument to decide if informative messages should be printed, e.g. to inform users about processing progress. Errors or warnings should be emitted in either case.

If there is no intersection between the x object and the required resources, or for any other reason why your indicator might not be calculated with the given configuration, make sure to return NA as early as possible.
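An early return might be sketched as follows. This example assumes that a resource argument is NULL when no data intersects the asset, which is suggested by the NULL default in the signature above but should be verified against the package. The function name and output columns are made up.

```r
# Hypothetical inner function; `resource` is assumed to be NULL when
# nothing intersects x.
example_indicator <- function(x, resource = NULL, verbose = TRUE) {
  if (is.null(resource)) {
    if (verbose) {
      message("No resource data available for this asset.")
    }
    # Return NA before any expensive processing starts.
    return(NA)
  }
  data.frame(variable = "example_mean", value = mean(resource))
}

example_indicator(x = NULL, resource = NULL, verbose = FALSE)
example_indicator(x = NULL, resource = c(100, 200, 300))
```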

Adding unit tests for an indicator

You are required to add unit tests for your indicator using the package-internal example data sets for resources. Make sure to properly test for misspecification of user-facing arguments and also check the correctness of the numerical results of your indicator.
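Both kinds of tests can be sketched with testthat on a toy indicator. The constructor calc_example() and its set of supported statistics are hypothetical and exist only to make the example self-contained. The sketch assumes that testthat is installed.

```r
library(testthat)

# Hypothetical indicator constructor used only to demonstrate the tests.
calc_example <- function(stats = "mean") {
  available <- c("mean", "median", "sd")
  if (!all(stats %in% available)) {
    stop("'stats' must be one or more of: ", paste(available, collapse = ", "))
  }
  function(values) {
    # Apply each requested statistic; the result is named after `stats`.
    vapply(stats, function(s) do.call(s, list(values)), numeric(1))
  }
}

test_that("calc_example validates arguments and computes correctly", {
  # A misspecified argument should fail in the outer level.
  expect_error(calc_example(stats = "unknown"), "must be one or more of")

  # Numerical results are compared against values worked out by hand.
  ce <- calc_example(stats = c("mean", "median"))
  expect_equal(ce(c(1, 2, 6)), c(mean = 3, median = 2))
})
```

Comparing against values worked out by hand, rather than against the output of the same code, is what makes the numerical check meaningful.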

You might not need to construct a portfolio from scratch to test your indicator function. Instead, you can directly call the returned function on an appropriate polygon with the respective required resource. For the elevation indicator, this looks like this:

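library(sf)
library(terra)
library(testthat)
library(mapme.biodiversity)

# Example polygon shipped with the package
x <- read_sf(system.file(
  "extdata", "sierra_de_neiba_478140.gpkg",
  package = "mapme.biodiversity"
))

# Sample files of the nasa_srtm resource shipped with the package
nasa_srtm <- list.files(
  system.file(
    "res", "nasa_srtm",
    package = "mapme.biodiversity"
  ),
  pattern = "\\.tif$", full.names = TRUE
)

nasa_srtm <- rast(nasa_srtm)
ce <- calc_elevation(stats = c("mean", "median", "sd"))
# Call the returned function directly on the polygon and the resource
result_multi_stat <- ce(x, nasa_srtm)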

expect_equal(
  names(result_multi_stat),
  c("elevation_mean", "elevation_median", "elevation_sd")
)