
## Introduction

If you are reading this vignette, you most probably want to contribute to the mapme.biodiversity package. This is great news and we are very happy to receive pull requests extending the package’s functionality! Below you will find in-depth information about how to add resources and indicators, to make the process as seamless as possible for both you and the package’s maintainers. Please make sure to read and understand this guide before opening a PR. If in doubt, especially if you feel that the framework does not support your specific use case, always feel free to raise an issue and we will happily discuss how we can support your ideas. If you have not already done so, make sure to read the Terminology vignette to get familiar with the most important concepts of this package.

Note that we use the tidyverse style guide for the package. That specifically means that function and variable names should follow the snake case pattern. We also use the arrow assignment operator (<-). When submitting a PR that does not consistently follow the tidyverse style guide, the maintainers of the package might change the code to adhere to this code style without further notice before accepting the PR.

## Getting started

Ideally, you clone the GitHub repository via the git command in a command line on Linux and macOS systems or via the GitHub Desktop application on Windows. On Linux, the command would look like this:

git clone https://github.com/mapme-initiative/mapme.biodiversity

We do not accept pushes to main, thus the first step is to create a dedicated branch for your extension. In this tutorial, we will pretend to reimplement the soilgrids resource and the associated soilproperties indicator, so we create a branch reflecting this. Don’t forget to check out the newly created branch!

git branch add-soilgrid-indicators
git checkout add-soilgrid-indicators

Below, we will assume that you develop your extension to the package in RStudio. The general guidelines also apply if you choose different tooling for your development process, but other tooling is not covered in this vignette. We assume that all R development dependencies for the state of the package at the time you ran git clone are installed. The easiest way to ensure this is to use devtools from within the package’s directory:

devtools::install_dev_deps()

## Adding a resource

### Checklist

• Add the new resource to R/resources_backlog.R following the standardized template
• Create a file for all necessary code to download your resource (R/get_resource_name.R)
• Include roxygen documentation for your resource following the provided template
• Check user-specified arguments (if any) for correctness
• Retrieve portfolio-wide parameters of interest for your resource from the portfolio
• Match the spatio-temporal extent of the portfolio with your resource
• Provide your own download functionality or use .download_or_skip()
• Delete any intermediate files that are no longer required
• Return the absolute file paths for all matching resource files
• Write a testthat script testing all the newly added functionality (except the actual download) and save it to tests/testthat/test-get_resource_name.R
• Add a small example data set of your resource to inst/res/resource_name/
• Added a new dependency? Make sure to include a supporting statement for that dependency in your PR!

### Introducing a new resource to the backlog

A resource is a supported dataset that, from a user’s perspective, can be downloaded via the get_resources() function. Currently, the package supports only raster and vector resources. If you wish to submit support for a new resource, please be aware that we will only accept new resources if they are associated with at least one indicator calculation. The very first step in adding a resource is to add it to the internal resource backlog function so that the package is aware of its existence. Once you have checked out the new branch and opened the project in RStudio, issue the following command to open the resource backlog file:

file.edit("R/resources_backlog.R")

This file keeps track of all supported resources in a list object. You will see that all resources share a common structure. The name of the list object is the name the package uses to identify a specific resource. Most importantly, the type argument specifies whether a resource is of type ‘raster’ or ‘vector’. If applicable, the source argument shall contain a URL pointing to a webpage documenting the resource. The downloader argument is the name of the package-internal function that is used to download the resource. This function is the most important piece of code for a new resource. Then, arguments and their default values that govern the download process can be specified. If no additional arguments are needed, just enter an empty list. For the soilgrids resource, the internal backlog looks like the entry below. Don’t spend too much effort on understanding the arguments just yet; they will become clearer when we “write” the downloader. When contributing a new resource, it is usually an iterative process between the backlog and the downloader to identify those arguments that need to be specified by users:

soilgrids <- list(
  type = "raster",
  source = "https://www.isric.org/explore/soilgrids",
  downloader = ".get_soilgrids",
  arguments = list(
    layers = "clay",
    depths = "0-5cm",
    stats = "mean"
  )
)

With the resource being backlogged, the package can now find a resource called soilgrids of type raster, and it can also identify the downloader function. In this specific case, the package can also determine the default values of three arguments in case users did not specify anything. This is important information that determines how the get_resources() function works when called by users.
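
To illustrate why the defaults in the backlog matter, the following sketch shows one way user-supplied arguments could be combined with the backlog defaults. The helper resolve_args() is hypothetical and not part of the package; the actual logic inside get_resources() may differ.

```r
# Backlog entry as shown above
soilgrids <- list(
  type = "raster",
  source = "https://www.isric.org/explore/soilgrids",
  downloader = ".get_soilgrids",
  arguments = list(
    layers = "clay",
    depths = "0-5cm",
    stats = "mean"
  )
)

# Hypothetical helper (not part of the package): user-supplied values
# override the backlog defaults, unspecified arguments keep their defaults
resolve_args <- function(backlog_entry, user_args = list()) {
  unknown <- setdiff(names(user_args), names(backlog_entry$arguments))
  if (length(unknown) > 0) {
    stop("Unknown argument(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(backlog_entry$arguments, user_args)
}

# only layers is specified, depths and stats fall back to the defaults
args <- resolve_args(soilgrids, list(layers = c("clay", "sand")))
str(args)
```

Rejecting unknown argument names early is a design choice of this sketch; it keeps typos from being silently ignored.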

### Documenting the new resource

By convention, the filename of a downloader MUST consist of the prefix get_ followed by the name of the resource, i.e. get_<resource_name>.R. In the case of the soilgrids resource that translates to get_soilgrids.R. In the first part of such a downloader, make sure to include detailed documentation. This documentation should explain what the resource represents, where it comes from (including a citation), and the arguments users should specify to control what is downloaded. Importantly, this documentation MUST receive the roxygen tag @docType data as well as the @keywords resource tag, so that the documentation can be identified as a resource. The NULL value below the documentation MUST be included. Below is a template that should be used for documenting a resource.

#' Short title
#'
#' One or more description paragraphs might follow here. Please describe
#' required user arguments here, ideally as itemized lists.
#'
#' @name <the short name of your resource, same as in the backlog>
#' @docType data <we document resources as a dataset>
#' @keywords resource <identifies the documentation as a resource>
#' @format <one sentence on data format and spatial extent>
#' @references <ideally a citable scientific publication>
#' @source <a link in the \url{} tag linking to an online documentation>
NULL

### Function inputs for resources

After documenting the resource, you can get started with implementing the actual downloader. The downloader is a package-internal function that users do not interact with directly. By convention, we prefix the names of package-internal functions with a dot. Similar to the filename itself, the names of resource downloaders should follow the pattern .get_<resource_name>. The first argument is always x, which corresponds to the portfolio object. Important attributes (e.g. the spatio-temporal extent) can be derived from this object. Additional user-facing arguments might follow. After these, each resource downloader receives the argument rundir, which by default should point to the output of tempdir(); when the downloader is invoked by users through the package, it will point to the output directory on disk where the resource shall be written. Finally, a logical called verbose, by default set to TRUE, controls the verbosity of the downloader, followed by the dots argument (...). For the soilgrids resource, the function header thus looks like this:

.get_soilgrids <- function(x, layers, depths, stats,
                           rundir = tempdir(),
                           verbose = TRUE,
                           ...) {
  # downloader code goes here
}
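
As a rough sketch of how such a function body could be organized, the example below follows the same argument convention and returns the absolute paths of the files it writes. The function .get_example and the file names are hypothetical and only serve as an illustration; no data is downloaded, and a plain data.frame stands in for a real portfolio object.

```r
# Hypothetical downloader skeleton; `.get_example` and the file names are
# illustrative only and do not exist in the package
.get_example <- function(x, layers,
                         rundir = tempdir(),
                         verbose = TRUE,
                         ...) {
  if (verbose) message("Preparing ", length(layers), " layer(s)")
  # one target file per requested layer inside the run directory
  filenames <- file.path(rundir, paste0("example_", layers, ".tif"))
  # a real downloader would fetch the data here; this sketch only creates
  # empty placeholder files so that it runs offline
  file.create(filenames)
  # return the absolute paths of all matching resource files
  normalizePath(filenames)
}

portfolio <- data.frame(id = 1:2) # stand-in for a real portfolio object
paths <- .get_example(portfolio, layers = c("clay", "sand"), verbose = FALSE)
```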

### Check arguments and retrieval of portfolio-wide parameters

Before actually conducting any downloads, it is important that you, as the provider of the new resource, check extensively that all required arguments were correctly specified. This applies specifically to the user-defined variables that your downloader requires. The package framework cannot check the correctness of these arguments; each downloader has to take care of that itself. If some arguments are wrongly specified, the function should fail (via stop()) and gracefully inform users which arguments were misspecified and which values are valid. You can head over to the soilgrids downloader (use file.edit("R/get_soilgrids.R")) and analyse the first few lines of the file (up to line ~130) to see how the inputs are checked for the soilgrids resource.
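
The kind of check described above could look roughly like the following sketch. The helper .check_soilgrids_args is hypothetical, and the valid values listed are an illustrative subset chosen for this example, not the complete set used by the package.

```r
# Hypothetical validation helper; the valid values listed here are an
# illustrative subset, not the complete set used by the package
.check_soilgrids_args <- function(layers, depths, stats) {
  valid <- list(
    layers = c("clay", "sand", "silt"),
    depths = c("0-5cm", "5-15cm", "15-30cm"),
    stats = c("mean", "Q0.5")
  )
  given <- list(layers = layers, depths = depths, stats = stats)
  for (arg in names(given)) {
    bad <- setdiff(given[[arg]], valid[[arg]])
    if (length(bad) > 0) {
      # name the offending argument, its values, and the valid alternatives
      stop(
        sprintf(
          "Invalid value(s) for '%s': %s. Valid values are: %s.",
          arg,
          paste(bad, collapse = ", "),
          paste(valid[[arg]], collapse = ", ")
        ),
        call. = FALSE
      )
    }
  }
  invisible(TRUE)
}

.check_soilgrids_args("clay", "0-5cm", "mean") # passes silently

# a misspecified layer fails with an informative message
msg <- tryCatch(
  .check_soilgrids_args("gold", "0-5cm", "mean"),
  error = function(e) conditionMessage(e)
)
```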

Some portfolio-wide parameters that might be important to your specific downloader can be determined by analysing the x portfolio attributes. Currently, the following attributes with regard to a resource download are set, when users initialize their portfolio:

attr(x, "nitems") <- nrow(x)
attr(x, "bbox") <- st_bbox(x)
attr(x, "years") <- years
attr(x, "cores") <- cores
attr(x, "aria_bin") <- aria_bin

Your resource downloader should ensure that, given these user-specified arguments and the portfolio-wide parameters, the files matching the spatio-temporal extent of the portfolio are downloaded. The attributes can be queried with attr(), for example attr(x, "years") for the temporal extent of the portfolio.
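
A minimal sketch of such a query, assuming the attributes are set as listed above. A plain data.frame stands in for a real portfolio object here, and the years are example values only.

```r
# A plain data.frame stands in for a real portfolio object (assumption)
x <- data.frame(id = 1:3)
attr(x, "years") <- 2000:2020 # example values only

# query the temporal extent of the portfolio
target_years <- attr(x, "years")
range(target_years)
```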

### Using helper functions

There are some package-internal helper functions that we found to be of use for multiple indicators and that you are free to use in your indicator processor. You will find them in R/utils.R. These helpers currently are:

• .check_available_years(): Checks if a given target year vector is available for a given indicator
• .check_engine(): Checks if a user-specified engine is available
• .check_stats(): Checks if a user-specified zonal statistic is available

You are encouraged to write your own helper functions where your indicator processor needs them. These should be located in the same file as the main processor, start with a dot, and should not be exported. If you wish to include roxygen documentation for your helpers, make sure to add the @keywords internal and @noRd tags to your functions. If you feel that one or more of your helper functions would be of benefit to more than just one indicator, please comment in an issue or pull request to discuss with the package maintainers whether your helper function could be moved to R/utils.R.
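
As an illustration of these conventions, a documented internal helper might look like the sketch below. The function .check_available_depths is hypothetical and not part of the package.

```r
#' Check that requested depths are known
#'
#' Hypothetical helper for illustration only; it is not part of the package.
#'
#' @param depths A character vector of requested depths.
#' @param available A character vector of depths the indicator supports.
#' @return The subset of `depths` that is available.
#' @keywords internal
#' @noRd
.check_available_depths <- function(depths, available) {
  missing_depths <- setdiff(depths, available)
  if (length(missing_depths) == length(depths)) {
    stop("None of the requested depths are available.")
  }
  if (length(missing_depths) > 0) {
    warning(
      "Dropping unavailable depths: ",
      paste(missing_depths, collapse = ", ")
    )
  }
  intersect(depths, available)
}

# "5-10cm" is not available and is dropped with a warning
kept <- suppressWarnings(
  .check_available_depths(c("0-5cm", "5-10cm"), c("0-5cm", "5-15cm"))
)
```

The leading dot and the @noRd tag keep the helper out of the exported namespace and the generated manual, while the roxygen comments remain readable in the source file.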

### Adding engines to your indicator processor

In writing this package we realized that depending on the structure of a portfolio (i.e. the number of assets, their size and geometric complexity), different engines might lead to better processing times. We thus included three different engines for most of our indicators, and we would invite you to do the same for your contribution. Engines are mostly used in the very last step of an indicator calculation, that is when some kind of zonal statistics are calculated for a specific asset. The currently used engines are:

• terra::extract(): Takes a SpatRaster and a SpatVector as input and computes a zonal statistic for all pixels within the SpatVector
• terra::zonal(): Takes two SpatRasters as input, one with the target variable(s), the other representing the rasterized input polygon. Then a zonal statistic for the pixels that correspond to the asset extent is calculated
• exactextractr::exact_extract(): Takes a SpatRaster and an sf-object as input and calculates a zonal statistic. It is implemented in C++, thus promising fast processing even for very large extents.

If you wish to include another processing engine for your indicator, please indicate this in a comment so that it can be discussed with the package’s maintainers. Note that indicators should ideally not add new dependencies. If they do, please add a supporting statement explaining why this dependency is necessary for your indicator. We also ask you to add such dependencies to the Suggests field of the DESCRIPTION file and to check whether the dependency is installed at the beginning of your indicator routine.
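
One way to check for a suggested dependency at the start of an indicator routine is base R's requireNamespace(). The wrapper .check_dependency below is a hypothetical sketch and not part of the package.

```r
# Hypothetical helper (not part of the package): stop early with an
# informative message if a suggested package is not installed
.check_dependency <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this indicator. ", pkg),
      sprintf("Please install it via install.packages(\"%s\").", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with every R installation, so this check passes
ok <- .check_dependency("stats")

# a package that is not installed triggers the error
msg <- tryCatch(
  .check_dependency("surelyNotAnInstalledPackage"),
  error = function(e) conditionMessage(e)
)
```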

### Differences between processing in asset or portfolio mode

An important difference between the asset and the portfolio mode is that in the asset mode, the package handles the parallelization. If processing is conducted in the portfolio mode, however, we ask you as a developer to iterate over all assets in the portfolio using the pbapply package. Specifically, you can retrieve the number of available cores via the portfolio’s attributes:

cores <- attributes(shp)$cores
results <- pbapply::pblapply(seq_len(nrow(shp)), function(i) {
  # processing logic goes here; it should create the output tibble obj
  obj$.id <- i
  obj
}, cl = cores)

It is important that you add a variable .id to the output tibble indicating the row number of the asset.

### Defining the output of indicator functions

Indicator functions should return a tibble in long format as their output, without “hiding” any variables in column names. Apart from that requirement, the output of your indicator does not need to follow any specific shape, except that the columns shall be identical across all assets. If you cannot calculate the indicator for a specific asset (e.g. because the extents do not overlap), simply return NA. The package handles these values internally and fills in a single row for that asset, with the same column names as for any other asset and all values set to NA.
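
The difference between a long-format output and one that hides variables in column names can be sketched as follows. The function .calc_example, the column names, and all values are made up for illustration, and a plain data.frame stands in for a tibble to keep the example free of dependencies.

```r
# Hypothetical indicator output for one asset; values are made up for
# illustration and a data.frame stands in for a tibble
.calc_example <- function(overlaps) {
  if (!overlaps) {
    return(NA) # the package fills in a row of NAs for this asset
  }
  data.frame(
    layer = c("clay", "clay"),
    depth = c("0-5cm", "5-15cm"),
    stat = c("mean", "mean"),
    value = c(21.3, 24.8)
  )
}

# long format: every variable has its own column
good <- .calc_example(overlaps = TRUE)

# avoid this wide layout: it hides layer, depth and stat in column names
bad <- data.frame(
  `clay_0-5cm_mean` = 21.3,
  `clay_5-15cm_mean` = 24.8,
  check.names = FALSE
)
```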